Re: DIH Issues

2017-04-25 Thread Sales

> On Apr 25, 2017, at 10:28 AM, AJ Lemke  wrote:
> 
> Thanks for the thought Alex!
> The fields that have this happen most often are numeric and boolean fields. 
> These fields have real data (id numbers, true/false, etc.)
> 
> AJ
> 

We had an identical problem a few months ago, and there was no question that 
the field was populated in all MySQL records. We figured out how to use another 
field in the schema to do the same query, so we ended up deleting the troublesome 
field. We never did discover why; all our ideas failed. In our case, the same data 
populated two different fields: one worked and one did not, but we never found a good 
reason for that. I'd love to know if you figure it out, as it could be the 
reason ours did the same thing. Ours is a much older version, though. We 
figured it's some sort of rare bug. We played around with it for several weeks. Hope 
you can find it. 

Steve

Re: DIH Issues

2017-04-25 Thread Alexandre Rafalovitch
I wonder if it is possible to write a component/URP/something that
will intercept exceptions like these and dump out the full record.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced
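Alex's idea could be prototyped roughly as follows. This is a plain-Java stand-in rather than Solr's API: `DocSink` and `DumpingSink` are hypothetical names, and a document is modeled as a `Map` of field names to values. In Solr itself the same pattern would presumably live in a custom `UpdateRequestProcessor` placed before `RunUpdateProcessorFactory`, wrapping `super.processAdd(cmd)` in a try/catch and printing `cmd.getSolrInputDocument()` on failure; that wiring is an untested assumption.

```java
import java.util.Map;

// Hypothetical stand-in for "the next stage of the update chain".
interface DocSink {
    void add(Map<String, Object> doc) throws Exception;
}

// Wraps another sink; when adding a document fails, it dumps every field of
// that document into the log before rethrowing, so the bad record is visible.
final class DumpingSink implements DocSink {
    private final DocSink next;
    private final StringBuilder log;

    DumpingSink(DocSink next, StringBuilder log) {
        this.next = next;
        this.log = log;
    }

    @Override
    public void add(Map<String, Object> doc) throws Exception {
        try {
            next.add(doc);
        } catch (Exception e) {
            log.append("FAILED: ").append(e.getMessage()).append('\n');
            for (Map.Entry<String, Object> field : doc.entrySet()) {
                Object v = field.getValue();
                // Quote the value and show its type so blanks and odd types stand out.
                log.append("  ").append(field.getKey()).append(" = ")
                   .append(v == null ? "null" : "'" + v + "' (" + v.getClass().getSimpleName() + ")")
                   .append('\n');
            }
            throw e;
        }
    }
}
```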


Re: DIH Issues

2017-04-25 Thread Erick Erickson
You say your SQL database always has the values, but does the output
from the SQL query you actually use have them? I've been fooled before
when the query I formed "somehow" didn't return a value for all the fields I
expected.

You could also crank the logging level up enough to see the docs that
are indexed, although that would probably only confirm that the fields
weren't in the docs which you know already, not tell you why they are
missing. Pull the SQL out and run it independently perhaps?

I sound a bit like a broken record, but this is why I like SolrJ, I
can actually debug that:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick
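Erick's first question (does the exact SQL that DIH runs return a value in every required column?) can be checked outside Solr with plain JDBC. A minimal sketch, assuming you pass in the same query your DIH entity uses and list the Solr-required columns yourself; `RequiredColumnCheck` is a hypothetical helper, and treating whitespace-only strings as missing is an assumption that only matters if your update chain trims and drops blank values.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

final class RequiredColumnCheck {

    // A value counts as "missing" if it is SQL NULL or a string that is empty after trimming.
    static boolean isMissing(Object value) {
        if (value == null) {
            return true;
        }
        return value instanceof String && ((String) value).trim().isEmpty();
    }

    // Runs the import query and returns one message per row/column that would
    // reach Solr without a usable value. Column names are JDBC column labels,
    // i.e. the alias used in the SELECT if there is one.
    static List<String> scan(Connection conn, String sql, String keyColumn,
                             List<String> requiredColumns) throws SQLException {
        List<String> problems = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                for (String col : requiredColumns) {
                    if (isMissing(rs.getObject(col))) {
                        problems.add(keyColumn + "=" + rs.getObject(keyColumn) + " missing " + col);
                    }
                }
            }
        }
        return problems;
    }
}
```

A `Connection` from `java.sql.DriverManager.getConnection(url, user, password)`, using the same JDBC URL and credentials as the DIH `dataSource`, is all it needs; an empty result list suggests the query itself is not the culprit.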



RE: DIH Issues

2017-04-25 Thread AJ Lemke
Thanks for the thought Alex!
The fields that have this happen most often are numeric and boolean fields. 
These fields have real data (id numbers, true/false, etc.)

AJ



Re: DIH Issues

2017-04-25 Thread Alexandre Rafalovitch
Maybe the content gets simplified away between the database and the
Solr schema. For example, what if your field contains just spaces and you
have UpdateRequestProcessors that trim values and remove empty fields?

Schemaless mode will remove empty fields but, for example, will not trim values.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 25 April 2017 at 09:21, AJ Lemke  wrote:
> Hey all,
>
> We are using 6.3.0 and we have issues with DIH throwing errors.  We are 
> seeing an intermittent issue where, on a full index, a single error will be 
> thrown.  The error is always "missing required field: fieldname".
> Our SQL database always has data in the field that comes up with the error.  
> Most of the errors occur on fields that SQL has marked as required.
>
> Would anyone have any hints or ideas about where to look to remedy this situation?
>
> As always if you need more information let me know.
>
> Thanks
> AJ
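Alex's hypothesis can be illustrated without Solr. If the update chain trims string values and then drops empty ones (the job of Solr's `TrimFieldUpdateProcessorFactory` and `RemoveBlankFieldUpdateProcessorFactory`), a whitespace-only column that is non-NULL in SQL reaches the required-field check as an absent field. The sketch models that sequence in plain Java; `ChainSimulation` and the field names below are hypothetical, and this is not Solr's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ChainSimulation {

    // Step 1: trim String values (other types pass through unchanged).
    // Step 2: drop fields whose value is null or now an empty string.
    static Map<String, Object> simulateChain(Map<String, Object> input) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : input.entrySet()) {
            Object v = e.getValue();
            if (v == null) {
                continue; // assumption: NULL columns never become fields at all
            }
            if (v instanceof String) {
                v = ((String) v).trim();
            }
            if (v instanceof String && ((String) v).isEmpty()) {
                continue; // the field disappears before the schema check
            }
            out.put(e.getKey(), v);
        }
        return out;
    }

    // Step 3: mimic the required-field check, returning the fields that are absent.
    static List<String> missingRequired(Map<String, Object> doc, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String field : required) {
            if (!doc.containsKey(field)) {
                missing.add(field);
            }
        }
        return missing;
    }
}
```

For example, a row `{id="42", sku="   ", in_stock=true}` comes out of `simulateChain` without `sku`, so the required check reports `[sku]`. In this model only string values can vanish this way, so it would not by itself account for the numeric and boolean fields AJ mentions.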


Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
bq. due to things like NTP, etc.

The full sentence is very important. NTP is not the only way for this to happen; 
you also have leap seconds, daylight saving time, internet clock sync, a 
whole host of things that affect currentTimeMillis and not nanoTime. It is 
without question the way to go if you even hope for monotonicity.
-- 
Mark Miller
about.me/markrmiller


Re: DIH issues with 4.7.1

2014-04-26 Thread Walter Underwood
NTP works very hard to keep the clock positive monotonic. But nanoTime is 
intended for elapsed time measurement anyway, so it is the right choice.

You can get some pretty fun clock behavior by running on virtual machines, like 
in AWS. And some system real time clocks don't tick during a leap second. And 
Windows system clocks are probably still hopeless.

If you want to run the clock backwards, we don't need NTP; we can set it with 
"date".

wunder


Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
My answer remains the same. I guess if you want more precise terminology, 
nanoTime will generally be monotonic and currentTimeMillis will not be, due to 
things like NTP, etc. You want monotonicity for measuring elapsed times.
-- 
Mark Miller
about.me/markrmiller
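When the elapsed time feeds a timeout decision, the `System.nanoTime()` Javadoc also recommends comparing differences (`t1 - t0 < 0`) rather than raw values (`t1 < t0`), because the values can overflow. A sketch of a deadline check written that way; `Deadline` is a hypothetical class, not taken from Solr.

```java
import java.util.concurrent.TimeUnit;

final class Deadline {
    private final long deadlineNanos;

    private Deadline(long deadlineNanos) {
        this.deadlineNanos = deadlineNanos;
    }

    static Deadline after(long amount, TimeUnit unit) {
        return new Deadline(System.nanoTime() + unit.toNanos(amount));
    }

    boolean isExpired() {
        return isExpired(System.nanoTime(), deadlineNanos);
    }

    // Overflow-safe: compare the difference, never the raw nanoTime values.
    static boolean isExpired(long nowNanos, long deadlineNanos) {
        return nowNanos - deadlineNanos >= 0;
    }

    // Time left before the deadline (negative once it has passed).
    long remaining(TimeUnit unit) {
        return unit.convert(deadlineNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }
}
```

The wrap-around case is the reason for the subtraction: a deadline just below `Long.MAX_VALUE` and a "now" that has wrapped to just above `Long.MIN_VALUE` still compare correctly.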


Re: DIH issues with 4.7.1

2014-04-26 Thread Walter Underwood
NTP should slew the clock rather than jump it. I haven't checked recently, but 
that is how it worked in the '90s when I was organizing the NTP hierarchy at HP.

It only does step changes if the clock is really wrong. That is most likely at 
reboot, when other daemons aren't running yet.

wunder

--
Walter Underwood
wun...@wunderwood.org
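The slew-versus-step distinction Walter describes can be observed from inside a JVM by sampling both clocks twice and comparing how far each moved: a slew changes the rate only slightly, while a step shows up as a sudden difference. A sketch; `ClockDrift` is a hypothetical helper, and nothing in DIH does this.

```java
import java.util.concurrent.TimeUnit;

final class ClockDrift {

    // A paired reading of the wall clock and the monotonic clock.
    static final class Sample {
        final long wallMillis;
        final long monoNanos;

        Sample(long wallMillis, long monoNanos) {
            this.wallMillis = wallMillis;
            this.monoNanos = monoNanos;
        }

        static Sample now() {
            return new Sample(System.currentTimeMillis(), System.nanoTime());
        }
    }

    // How far the wall clock moved relative to the monotonic clock between two samples.
    // Near zero: no adjustment or a gentle slew. Large positive/negative: a step.
    static long wallSkewMillis(Sample a, Sample b) {
        long wallDelta = b.wallMillis - a.wallMillis;
        long monoDelta = TimeUnit.NANOSECONDS.toMillis(b.monoNanos - a.monoNanos);
        return wallDelta - monoDelta;
    }
}
```

Taking `Sample.now()` at the start and end of an import and logging `wallSkewMillis` would show whether the wall clock was adjusted in between.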





Re: DIH issues with 4.7.1

2014-04-26 Thread Mark Miller
System.currentTimeMillis can jump around due to NTP, etc. If you are trying to 
count elapsed time, you don’t want to use a method that can jump around with 
the results.
-- 
Mark Miller
about.me/markrmiller
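Mark's advice corresponds to the standard Java idiom for elapsed time: subtract two `System.nanoTime()` readings and convert the difference with `TimeUnit`. A minimal sketch; `ElapsedTimer` is a hypothetical name.

```java
import java.util.concurrent.TimeUnit;

final class ElapsedTimer {
    // nanoTime has no fixed origin; only differences between readings
    // taken in the same JVM are meaningful.
    private final long startNanos = System.nanoTime();

    // Unlike System.currentTimeMillis() - start, this cannot go negative or
    // jump if the wall clock is stepped while we are measuring.
    long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }
}
```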



Re: DIH issues with 4.7.1

2014-04-26 Thread YouPeng Yang
Hi Rafał Kuć,
  I got it: the point is that many operating systems measure time in units of
tens of milliseconds, and System.currentTimeMillis() just relies on the
operating system.
  In my case, I just run DIH from a crontab. Is there any possibility of running
into that trouble? I really cannot picture what situation would lead to
the problem.


Thanks very much.
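The remark about coarse timers comes from the `System.currentTimeMillis()` Javadoc. The step size on a particular machine can be estimated by spinning until the value changes; a rough sketch, with `TimerGranularity` as a hypothetical name. Results depend on the OS and JVM, and a busy-wait like this is only suitable for a one-off diagnostic.

```java
final class TimerGranularity {

    // Busy-waits until currentTimeMillis() ticks twice and returns the size of
    // the second tick in milliseconds (the first wait only aligns to a tick boundary).
    static long currentTimeMillisStep() {
        long t0 = System.currentTimeMillis();
        long t1;
        do {
            t1 = System.currentTimeMillis();
        } while (t1 == t0);
        long t2;
        do {
            t2 = System.currentTimeMillis();
        } while (t2 == t1);
        return t2 - t1;
    }

    // Smallest observed step over several trials, to filter out scheduling noise.
    static long minStep(int trials) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < trials; i++) {
            best = Math.min(best, currentTimeMillisStep());
        }
        return best;
    }
}
```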


2014-04-26 20:49 GMT+08:00 YouPeng Yang :

> Hi Mark Miller,
>   Sorry to drag you into this discussion.
>   I notice that you reported this issue in
> https://issues.apache.org/jira/browse/SOLR-5734 according to
> https://issues.apache.org/jira/browse/SOLR-5721, but that only happened with
> ZooKeeper.
>   If I just do DIH with JDBCDataSource, I do not think I will hit the
> problem.
>   Please give some hints.
>
>  >> Bonus: here is the last mail I sent about the problem:
>
>    I have just compared versions 4.6.0 and 4.7.1.
> I notice that the time in the getConnection function is taken with
> System.nanoTime in 4.7.1, while 4.6.0 used System.currentTimeMillis().
>   I am curious about the reason for the change and its benefit. Is it
> necessary?
>    I have read SOLR-5734,
> https://issues.apache.org/jira/browse/SOLR-5734
>    and did some googling about the difference between currentTimeMillis and
> nanoTime, but still cannot figure it out.
>
> Thank you very much.
>
>
> 2014-04-26 20:31 GMT+08:00 YouPeng Yang :
>
>> Hi
>>    I have just compared versions 4.6.0 and
>> 4.7.1. I notice that the time in the getConnection function is taken
>> with System.nanoTime in 4.7.1, while 4.6.0 used System.currentTimeMillis().
>>   I am curious about the reason for the change and its benefit. Is it
>> necessary?
>>    I have read SOLR-5734,
>> https://issues.apache.org/jira/browse/SOLR-5734
>>    and did some googling about the difference between currentTimeMillis and
>> nanoTime, but still cannot figure it out.
>>
>>
>>
>>
>> 2014-04-26 2:24 GMT+08:00 Shawn Heisey :
>>
>> On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:
>>>
 I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH
 process that we are using takes 4x as long to complete.  The only odd
 thing I notice is when I enable debug logging for the dataimporthandler
 process, it appears that in the new version each sql query is resulting
 in
 a new connection opened through jdbcdatasource (log:
 http://pastebin.com/JKh4gpmu).  Were there any changes that would
 affect
 the speed of running a full import?

>>>
>>> This is most likely the problem you are experiencing:
>>>
>>> https://issues.apache.org/jira/browse/SOLR-5954
>>>
>>> The fix will be in the new 4.8 version.  The release process for 4.8 is
>>> underway right now.  A second release candidate was required yesterday.  If
>>> no further problems are encountered, the release should be made around the
>>> middle of next week.  If problems are encountered, the release will be
>>> delayed.
>>>
>>> Here's something very important that has been mentioned before:  Solr
>>> 4.8 will require Java 7.  Previously, Java 6 was required.  Java 7u55 (the
>>> current release from Oracle as I write this) is recommended as a minimum.
>>>
>>> If a 4.7.3 version is built, this is a fix that we should backport.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>
>


Re: DIH issues with 4.7.1

2014-04-26 Thread YouPeng Yang
Hi Mark Miller,
  Sorry to drag you into this discussion.
  I notice that you reported this issue in
https://issues.apache.org/jira/browse/SOLR-5734 based on
https://issues.apache.org/jira/browse/SOLR-5721, but that problem only
happened with ZooKeeper.
  If I just do DIH with JDBCDataSource, I do not think it will hit the
problem.
  Please give me some hints.

 >> Bonus: here is the last mail I sent about the problem:
   I have just compared the differences between versions 4.6.0 and 4.7.1.
I noticed that the time in the getConnection function is taken with
System.nanoTime in 4.7.1, while 4.6.0 uses System.currentTimeMillis().
  I am curious about the reason for the change and its benefit. Is it
necessary?
   I have read SOLR-5734,
https://issues.apache.org/jira/browse/SOLR-5734
   and did some googling about the difference between currentTimeMillis and
nanoTime, but still cannot figure it out.

Thank you very much.




Re: DIH issues with 4.7.1

2014-04-26 Thread Rafał Kuć
Hello!

Look at the javadocs for both. The granularity of
System.currentTimeMillis() depends on the operating system, so two calls
made 1 millisecond apart may still return the same value.
System.nanoTime() is intended for measuring elapsed time, and its
resolution is at least as good as that of currentTimeMillis() -
http://docs.oracle.com/javase/7/docs/api/java/lang/System.html
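To see the granularity difference on a particular machine, a small probe can spin until each clock changes and report the step it observed. This is only a sketch: `ClockGranularity`, `millisStep` and `nanosStep` are hypothetical names, and the numbers printed depend entirely on the OS and JVM in use.

```java
// Hypothetical probe: how far does each clock jump when it ticks over?
class ClockGranularity {
    /** Busy-waits until currentTimeMillis() changes and returns the observed step in ms. */
    static long millisStep() {
        long start = System.currentTimeMillis();
        long next;
        while ((next = System.currentTimeMillis()) == start) {
            // spin until the wall-clock value changes
        }
        // Can be negative if the wall clock is stepped backwards (e.g. by NTP).
        return next - start;
    }

    /** Same idea for nanoTime(): the smallest nonzero gap between two consecutive reads. */
    static long nanosStep() {
        long start = System.nanoTime();
        long next;
        while ((next = System.nanoTime()) == start) {
            // spin until the value changes
        }
        return next - start;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.printf("currentTimeMillis step: %d ms, nanoTime step: %d ns%n",
                    millisStep(), nanosStep());
        }
    }
}
```

On a system with a coarse wall clock, the first column shows steps of several milliseconds while the second stays small.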

-- 
Regards,
 Rafał Kuć
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/





Re: DIH issues with 4.7.1

2014-04-26 Thread YouPeng Yang
Hi
   I have just compared the differences between versions 4.6.0 and 4.7.1.
I noticed that the time in the getConnection function is taken with
System.nanoTime in 4.7.1, while 4.6.0 uses System.currentTimeMillis().
  I am curious about the reason for the change and its benefit. Is it
necessary?
   I have read SOLR-5734,
https://issues.apache.org/jira/browse/SOLR-5734
   and did some googling about the difference between currentTimeMillis and
nanoTime, but still cannot figure it out.
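One thing to watch when switching a timing check from currentTimeMillis() to nanoTime() is that every constant the difference is compared against must also be in nanoseconds. The sketch below is hypothetical (it is not the Solr source; `IdleTimeoutCheck` and `CONN_TIME_OUT_MS` are made-up names) and shows how a leftover millisecond timeout would make an idle-connection check fire on almost every call, which would match the one-connection-per-query symptom in this thread. Whether this is exactly what SOLR-5954 fixed should be confirmed against the issue itself.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a nanoTime()/millisecond unit mismatch.
class IdleTimeoutCheck {
    // Hypothetical idle timeout, expressed in milliseconds.
    static final long CONN_TIME_OUT_MS = 10_000;

    /** Buggy: compares a nanoTime() difference against a millisecond constant. */
    static boolean isStaleBuggy(long lastUsedNanos, long nowNanos) {
        // 10_000 is effectively read as 10,000 ns (10 microseconds),
        // so almost any gap between queries counts as "stale".
        return nowNanos - lastUsedNanos > CONN_TIME_OUT_MS;
    }

    /** Fixed: convert the timeout to nanoseconds before comparing. */
    static boolean isStale(long lastUsedNanos, long nowNanos) {
        // Comparing the difference (not the raw values) is the overflow-safe nanoTime idiom.
        return nowNanos - lastUsedNanos > TimeUnit.MILLISECONDS.toNanos(CONN_TIME_OUT_MS);
    }

    public static void main(String[] args) {
        long last = System.nanoTime();
        long oneMsLater = last + TimeUnit.MILLISECONDS.toNanos(1);
        System.out.println("buggy check, 1 ms idle: " + isStaleBuggy(last, oneMsLater)); // true
        System.out.println("fixed check, 1 ms idle: " + isStale(last, oneMsLater));      // false
    }
}
```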






Re: DIH issues with 4.7.1

2014-04-25 Thread Shawn Heisey

On 4/25/2014 11:56 AM, Hutchins, Jonathan wrote:

> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH
> process that we are using takes 4x as long to complete.  The only odd
> thing I notice is when I enable debug logging for the dataimporthandler
> process, it appears that in the new version each sql query is resulting in
> a new connection opened through jdbcdatasource (log:
> http://pastebin.com/JKh4gpmu).  Were there any changes that would affect
> the speed of running a full import?


This is most likely the problem you are experiencing:

https://issues.apache.org/jira/browse/SOLR-5954
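For contrast with the behavior described above (a new JDBC connection per SQL query), here is a minimal sketch of the reuse pattern: keep one connection and reopen it only after it has been idle longer than a timeout. `ReusableConnection` is a hypothetical class, not DIH's actual data source; the opener is a plain `Supplier` so the sketch runs without a database, and the clock is injected so the timing can be tested.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical cache that reuses one resource until it has been idle too long. */
class ReusableConnection<T> {
    private final Supplier<T> opener;      // e.g. a lambda that opens a JDBC connection
    private final LongSupplier clockNanos; // System::nanoTime in real use; injectable for tests
    private final long idleTimeoutNanos;   // all time arithmetic kept in nanoseconds
    private T current;
    private long lastUsedNanos;
    private int opens;

    ReusableConnection(Supplier<T> opener, LongSupplier clockNanos, long idleTimeout, TimeUnit unit) {
        this.opener = opener;
        this.clockNanos = clockNanos;
        this.idleTimeoutNanos = unit.toNanos(idleTimeout);
    }

    T get() {
        long now = clockNanos.getAsLong();
        if (current == null || now - lastUsedNanos > idleTimeoutNanos) {
            current = opener.get(); // a real version would close the old connection first
            opens++;
        }
        lastUsedNanos = now;
        return current;
    }

    int opens() {
        return opens;
    }

    public static void main(String[] args) {
        long[] fakeNow = {0};
        ReusableConnection<Object> conn =
                new ReusableConnection<>(Object::new, () -> fakeNow[0], 10, TimeUnit.SECONDS);
        for (int i = 0; i < 100; i++) { // 100 queries, 1 ms apart
            conn.get();
            fakeNow[0] += TimeUnit.MILLISECONDS.toNanos(1);
        }
        System.out.println("opens after 100 quick queries: " + conn.opens()); // 1
        fakeNow[0] += TimeUnit.SECONDS.toNanos(11); // idle longer than the timeout
        conn.get();
        System.out.println("opens after idle period: " + conn.opens()); // 2
    }
}
```

Converting the timeout once with `TimeUnit` in the constructor means the hot path never mixes milliseconds and nanoseconds.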

The fix will be in the new 4.8 version.  The release process for 4.8 is 
underway right now.  A second release candidate was required yesterday.  
If no further problems are encountered, the release should be made 
around the middle of next week.  If problems are encountered, the 
release will be delayed.


Here's something very important that has been mentioned before:  Solr 
4.8 will require Java 7.  Previously, Java 6 was required.  Java 7u55 
(the current release from Oracle as I write this) is recommended as a 
minimum.
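A quick way to check what a given JVM reports against that minimum is to parse the `java.version` system property. This is a sketch: `JavaVersionCheck` and `meetsMinimum` are hypothetical names, and it only understands the legacy `1.7.0_55` format and the `11.0.2`-style format used since Java 9.

```java
// Hypothetical helper: does a java.version string meet "Java 7 update 55 or later"?
class JavaVersionCheck {
    static boolean meetsMinimum(String version) {
        String[] parts = version.split("[._\\-+]");
        int major;
        int update = 0;
        if (parts[0].equals("1")) {
            // Legacy scheme: 1.<major>.0_<update>, possibly followed by a build suffix
            major = Integer.parseInt(parts[1]);
            int underscore = version.indexOf('_');
            if (underscore >= 0) {
                // Keep only the leading digits after '_' (drops suffixes like "-b13")
                update = Integer.parseInt(version.substring(underscore + 1).replaceAll("\\D.*", ""));
            }
        } else {
            // Java 9+ scheme: <major>[.<minor>.<security>]
            major = Integer.parseInt(parts[0]);
        }
        return major > 7 || (major == 7 && update >= 55);
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.version");
        System.out.println(running + " meets Java 7u55 minimum: " + meetsMinimum(running));
    }
}
```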


If a 4.7.3 version is built, this is a fix that we should backport.

Thanks,
Shawn



Re: DIH issues with 4.7.1

2014-04-25 Thread Alan Woodward
Hi Jonathan,

It's a known bug: https://issues.apache.org/jira/browse/SOLR-5954.  It'll be 
fixed in 4.8, which is being voted on now.

Alan Woodward
www.flax.co.uk


On 25 Apr 2014, at 18:56, Hutchins, Jonathan wrote:

> I recently upgraded from 4.6.1 to 4.7.1 and have found that the DIH
> process that we are using takes 4x as long to complete.  The only odd
> thing I notice is when I enable debug logging for the dataimporthandler
> process, it appears that in the new version each sql query is resulting in
> a new connection opened through jdbcdatasource (log:
> http://pastebin.com/JKh4gpmu).  Were there any changes that would affect
> the speed of running a full import?
> 
> Thanks!
> 
> - Jonathan Hutchins
> 
>