Re: DIH Issues
> On Apr 25, 2017, at 10:28 AM, AJ Lemke wrote:
>
> Thanks for the thought Alex!
> The fields that have this happen most often are numeric and boolean fields.
> These fields have real data (id numbers, true/false, etc.)
>
> AJ

We had an identical problem a few months ago, and there was no question that the field was populated in all MySQL records. We figured out how to use another field in the schema to do the same query, so we ended up deleting the troublesome field. We never did discover why; all ideas failed. In our case, the same data populated two different fields: one worked, one did not, and we never found a good reason for that.

I'd love to know if you figure it out, as it could be the reason why ours did the same thing. Ours is a much older version, though. We figured it was some sort of rare bug. We played around with it for several weeks. Hope you can find it.

Steve
Re: DIH Issues
I wonder if it is possible to write a component/URP/something that will intercept exceptions like these and dump out full record.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced
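As a rough illustration of the "intercept the exception and dump the full record" idea, here is a minimal sketch in plain Java. It does not use any Solr classes: `RecordDump`, `indexWithDump`, and the map-based record are hypothetical names made up for this example, and a real update request processor would have to be written against Solr's own API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical helper, not part of Solr: wraps a per-record indexing step. */
final class RecordDump {

    /**
     * Runs the given indexing step for one record. If the step throws,
     * the exception is rethrown with the complete record in the message,
     * so the log shows exactly what the indexer was handed.
     */
    static void indexWithDump(Map<String, Object> record,
                              Consumer<Map<String, Object>> indexer) {
        try {
            indexer.accept(record);
        } catch (RuntimeException e) {
            throw new IllegalStateException("Indexing failed for record " + record, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 7);
        record.put("active", null); // assumed example of a value that went missing

        try {
            // Stand-in for the real indexing call; it always fails here.
            indexWithDump(record, r -> {
                throw new IllegalArgumentException("missing required field: active");
            });
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
            System.out.println("caused by: " + e.getCause().getMessage());
        }
    }
}
```

The point of the wrapper is only that the failing record travels with the exception; where such a hook could live inside DIH or an update chain is left open here.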
Re: DIH Issues
You say your SQL database always has the values, but does the output from the SQL query you actually use have them? I've been fooled before when the query I formed "somehow" didn't return a value for every field I expected.

You could also crank the logging level up enough to see the docs that are indexed, although that would probably only confirm that the fields weren't in the docs, which you know already, and not tell you why they are missing. Pull the SQL out and run it independently, perhaps?

I sound a bit like a broken record, but this is why I like SolrJ: I can actually debug that:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick
RE: DIH Issues
Thanks for the thought Alex!
The fields that have this happen most often are numeric and boolean fields.
These fields have real data (id numbers, true/false, etc.)

AJ
Re: DIH Issues
Maybe the content gets simplified away between the database and the Solr schema. For example, if your field contains just spaces and you have UpdateRequestProcessors to do trim and removal of empty fields?

Schemaless mode will remove empty fields, but will not trim, for example.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 25 April 2017 at 09:21, AJ Lemke wrote:
> Hey all,
>
> We are using 6.3.0 and we have issues with DIH throwing errors. We are
> seeing an intermittent issue where on a full index a single error will be
> thrown. The error is always "missing required field: fieldname".
> Our SQL database always has data in the field that comes up with the error.
> Most of the errors are coming on fields that SQL has marked as required.
>
> Would anyone have any hints or ideas where to look to remedy this situation.
>
> As always if you need more information let me know.
>
> Thanks
> AJ
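To make the "trim, then drop empty fields" scenario concrete, here is a small plain-Java sketch. It only imitates that behaviour on a map; `BlankFieldCheck` and its methods are hypothetical names for this example and are not Solr code. It shows how a value that looks populated in SQL (whitespace only) could end up reported as a missing required field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Solr: imitates "trim, then drop empty fields". */
final class BlankFieldCheck {

    /** Returns a copy of the row with strings trimmed and null/empty values removed. */
    static Map<String, Object> trimAndDropBlank(Map<String, Object> row) {
        Map<String, Object> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            Object value = e.getValue();
            if (value instanceof String) {
                value = ((String) value).trim();
            }
            if (value == null || "".equals(value)) {
                continue; // dropped, as an empty-field remover would do
            }
            cleaned.put(e.getKey(), value);
        }
        return cleaned;
    }

    /** Lists the required field names that are absent once the row has been cleaned. */
    static List<String> missingRequired(Map<String, Object> row, List<String> required) {
        Map<String, Object> cleaned = trimAndDropBlank(row);
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!cleaned.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 42);
        row.put("active", "   "); // whitespace only: non-null in the database

        // prints [active]
        System.out.println(missingRequired(row, Arrays.asList("id", "active")));
    }
}
```

Running the same kind of check over the rows the import query actually returns would show whether any value is null or blank before Solr ever sees it.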
Re: DIH issues with 4.7.1
> due to things like NTP, etc.

The full sentence is very important. NTP is not the only way for this to happen: you also have leap seconds, daylight savings time, internet clock sync, a whole host of things that affect currentTimeMillis and not nanoTime. nanoTime is without question the way to go to even hope for monotonicity.

--
Mark Miller
about.me/markrmiller
Re: DIH issues with 4.7.1
NTP works very hard to keep the clock positive monotonic. But nanoTime is intended for elapsed time measurement anyway, so it is the right choice.

You can get some pretty fun clock behavior by running on virtual machines, like in AWS. And some system real time clocks don't tick during a leap second. And Windows system clocks are probably still hopeless.

If you want to run the clock backwards, we don't need NTP, we can set it with "date".

wunder
Re: DIH issues with 4.7.1
My answer remains the same. I guess if you want more precise terminology, nanoTime will generally be monotonic and currentTimeMillis will not be, due to things like NTP, etc. You want monotonicity for measuring elapsed times.

--
Mark Miller
about.me/markrmiller
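For anyone following along, the elapsed-time point can be sketched in a few lines of Java. This is only an illustration, not the code in Solr's getConnection: `ElapsedCheck` and `isExpired` are made-up names, and the 10 ms and 10 s timeouts are arbitrary. Because System.nanoTime() counts nanoseconds, a timeout expressed in milliseconds has to be converted before the comparison.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: timeout check based on System.nanoTime() readings. */
final class ElapsedCheck {

    /**
     * Returns true when more than timeoutMillis has elapsed between two
     * System.nanoTime() readings. Comparing the difference (now - start),
     * rather than the raw values, stays correct even if the counter wraps.
     */
    static boolean isExpired(long startNanos, long nowNanos, long timeoutMillis) {
        long elapsedNanos = nowNanos - startNanos;
        return elapsedNanos > TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(50);
        long now = System.nanoTime();

        System.out.println("past a 10 ms timeout: " + isExpired(start, now, 10));
        System.out.println("past a 10 s timeout:  " + isExpired(start, now, 10_000));
    }
}
```

The readings are only meaningful as differences within one JVM; they are not wall-clock timestamps and should not be logged or stored as such.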
Re: DIH issues with 4.7.1
NTP should slew the clock rather than jump it. I haven't checked recently, but that is how it worked in the 90's when I was organizing the NTP hierarchy at HP.

It only does step changes if the clock is really wrong. That is most likely at reboot, when other daemons aren't running yet.

wunder

--
Walter Underwood
wun...@wunderwood.org
Re: DIH issues with 4.7.1
System.currentTimeMillis can jump around due to NTP, etc. If you are trying to count elapsed time, you don’t want to use a method that can jump around with the results.

--
Mark Miller
about.me/markrmiller
Re: DIH issues with 4.7.1
Hi Rafał Kuć,

I got it: the point is that many operating systems measure time in units of tens of milliseconds, and System.currentTimeMillis() is simply based on the operating system.

In my case, I just run DIH from a crontab. Is there any possibility of running into that trouble? I really cannot picture what situation would lead to the problem.

Thanks very much.
Re: DIH issues with 4.7.1
Hi Mark Miller,

Sorry to drag you into this discussion.
I notice that you reported this issue in https://issues.apache.org/jira/browse/SOLR-5734, with reference to https://issues.apache.org/jira/browse/SOLR-5721, but that only happened with ZooKeeper.
If I just run DIH with JdbcDataSource, I do not think I will hit that problem.
Please give me some hints.

Bonus: here is the last mail I sent about the problem.

Thank you very much.

2014-04-26 20:31 GMT+08:00 YouPeng Yang :

> Hi
> I have just compared versions 4.6.0 and 4.7.1. I notice that the time
> in the getConnection function is taken with System.nanoTime in 4.7.1,
> while 4.6.0 used System.currentTimeMillis().
> I am curious about the reason for the change and its benefit. Is it
> necessary?
> I have read SOLR-5734, https://issues.apache.org/jira/browse/SOLR-5734
> I googled the difference between currentTimeMillis and nanoTime, but
> still cannot figure it out.
>
> 2014-04-26 2:24 GMT+08:00 Shawn Heisey :
>
>> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:
>>
>>> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH
>>> process that we are using takes 4x as long to complete. The only odd
>>> thing I notice is when I enable debug logging for the dataimporthandler
>>> process, it appears that in the new version each sql query is resulting
>>> in a new connection opened through jdbcdatasource (log:
>>> http://pastebin.com/JKh4gpmu). Were there any changes that would affect
>>> the speed of running a full import?
>>
>> This is most likely the problem you are experiencing:
>>
>> https://issues.apache.org/jira/browse/SOLR-5954
>>
>> The fix will be in the new 4.8 version. The release process for 4.8 is
>> underway right now. A second release candidate was required yesterday. If
>> no further problems are encountered, the release should be made around the
>> middle of next week. If problems are encountered, the release will be
>> delayed.
>>
>> Here's something very important that has been mentioned before: Solr 4.8
>> will require Java 7. Previously, Java 6 was required. Java 7u55 (the
>> current release from Oracle as I write this) is recommended as a minimum.
>>
>> If a 4.7.3 version is built, this is a fix that we should backport.
>>
>> Thanks,
>> Shawn
Re: DIH issues with 4.7.1
Hello!

Look at the javadocs for both. The granularity of System.currentTimeMillis() depends on the operating system, so calls to that method that are 1 millisecond apart may still return the same value. This is not the case with System.nanoTime(): http://docs.oracle.com/javase/7/docs/api/java/lang/System.html

--
Regards,
Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
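To make the difference concrete: System.currentTimeMillis() follows the wall clock, which the operating system or an administrator may adjust, while System.nanoTime() readings are only meaningful as differences and are intended for measuring elapsed time. The sketch below measures the same interval both ways. The class and method names (ElapsedTimeDemo, elapsedMillis) are made up for this illustration and are not taken from Solr.

```java
import java.util.concurrent.TimeUnit;

public class ElapsedTimeDemo {

    /** Converts the difference of two System.nanoTime() readings to whole milliseconds. */
    static long elapsedMillis(long startNanos, long endNanos) {
        return TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        long wallStart = System.currentTimeMillis(); // wall-clock reading
        long monoStart = System.nanoTime();          // elapsed-time reading

        Thread.sleep(50); // stand-in for the work being timed

        long wallElapsed = System.currentTimeMillis() - wallStart;
        long monoElapsed = elapsedMillis(monoStart, System.nanoTime());

        // The two values are normally close, but only the nanoTime-based one
        // is unaffected by a wall-clock adjustment during the interval.
        System.out.println("currentTimeMillis elapsed: " + wallElapsed + " ms");
        System.out.println("nanoTime elapsed:          " + monoElapsed + " ms");
    }
}
```

Note that a nanoTime() value on its own says nothing about the time of day, so it cannot replace currentTimeMillis() for timestamps.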
Re: DIH issues with 4.7.1
Hi,

I have just compared versions 4.6.0 and 4.7.1. I noticed that the time in the getConnection function is taken with System.nanoTime() in 4.7.1, whereas 4.6.0 used System.currentTimeMillis(). I am curious about the reason for the change and its benefit. Is it necessary?

I have read SOLR-5734, https://issues.apache.org/jira/browse/SOLR-5734, and googled the difference between currentTimeMillis and nanoTime, but I still cannot figure it out.
Re: DIH issues with 4.7.1
On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:
> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH
> process that we are using takes 4x as long to complete. The only odd
> thing I notice is when I enable debug logging for the dataimporthandler
> process, it appears that in the new version each sql query is resulting in
> a new connection opened through jdbcdatasource (log:
> http://pastebin.com/JKh4gpmu). Were there any changes that would affect
> the speed of running a full import?

This is most likely the problem you are experiencing:

https://issues.apache.org/jira/browse/SOLR-5954

The fix will be in the new 4.8 version. The release process for 4.8 is underway right now. A second release candidate was required yesterday. If no further problems are encountered, the release should be made around the middle of next week. If problems are encountered, the release will be delayed.

Here's something very important that has been mentioned before: Solr 4.8 will require Java 7. Previously, Java 6 was required. Java 7u55 (the current release from Oracle as I write this) is recommended as a minimum.

If a 4.7.3 version is built, this is a fix that we should backport.

Thanks,
Shawn
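The thread does not spell out why every query opened a new connection. One way a switch from millisecond to nanosecond timestamps can produce exactly that symptom is an idle-timeout constant that is still expressed in milliseconds. The sketch below only illustrates that kind of unit mismatch; the class, constants, and methods (ConnectionIdleCheck, IDLE_LIMIT_MILLIS, IDLE_LIMIT_NANOS, mustReconnect) are hypothetical and are not copied from the Solr source or from SOLR-5954.

```java
import java.util.concurrent.TimeUnit;

public class ConnectionIdleCheck {

    // Hypothetical idle limit of 10 seconds, written once per unit.
    static final long IDLE_LIMIT_MILLIS = 10 * 1000L;
    static final long IDLE_LIMIT_NANOS = TimeUnit.SECONDS.toNanos(10);

    /** Mixed units: a nanosecond difference compared against a millisecond limit. */
    static boolean mustReconnectMixedUnits(long lastUsedNanos, long nowNanos) {
        // 10 * 1000 ns is only 10 microseconds, so almost any gap exceeds it.
        return nowNanos - lastUsedNanos > IDLE_LIMIT_MILLIS;
    }

    /** Consistent units: a nanosecond difference compared against a nanosecond limit. */
    static boolean mustReconnect(long lastUsedNanos, long nowNanos) {
        return nowNanos - lastUsedNanos > IDLE_LIMIT_NANOS;
    }

    public static void main(String[] args) {
        long lastUsed = 0L;
        long now = TimeUnit.MILLISECONDS.toNanos(1); // connection idle for 1 ms

        System.out.println("mixed units:      " + mustReconnectMixedUnits(lastUsed, now));
        System.out.println("consistent units: " + mustReconnect(lastUsed, now));
    }
}
```

With mixed units, a connection that was idle for a single millisecond is already treated as stale, which would match a debug log showing a new connection for each SQL query.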
Re: DIH issues with 4.7.1
Hi Jonathan,

It's a known bug: https://issues.apache.org/jira/browse/SOLR-5954. It'll be fixed in 4.8, which is being voted on now.

Alan Woodward
www.flax.co.uk

On 25 Apr 2014, at 18:56, Hutchins, Jonathan wrote:

> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH
> process that we are using takes 4x as long to complete. The only odd
> thing I notice is when I enable debug logging for the dataimporthandler
> process, it appears that in the new version each sql query is resulting in
> a new connection opened through jdbcdatasource (log:
> http://pastebin.com/JKh4gpmu). Were there any changes that would affect
> the speed of running a full import?
>
> Thanks!
>
> - Jonathan Hutchins