Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

2008-02-11 Thread robert engels
I am not disputing that there is a speed improvement. I am disputing that the performance gain of many of these patches is worth the additional complexity in the code. Clear code will allow for more radical improvements as more eyes will be able to easily understand the inner workings a

[jira] Resolved: (LUCENE-325) [PATCH] new method expungeDeleted() added to IndexWriter

2008-02-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-325. --- Resolution: Fixed I just committed this. Thanks John! And sorry for the long de

[jira] Commented: (LUCENE-1173) index corruption autoCommit=false

2008-02-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567849#action_12567849 ] Michael McCandless commented on LUCENE-1173: Yes this is one awesome test case

[jira] Updated: (LUCENE-1173) index corruption autoCommit=false

2008-02-11 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-1173: - Attachment: indexstress.patch Thanks Mike! Attaching new version of test that correctly deals w

[jira] Updated: (LUCENE-1174) outdated information in Analyzer javadoc

2008-02-11 Thread Daniel Naber (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-1174: - Attachment: analyzer-javadoc.diff > outdated information in Analyzer javadoc > -

[jira] Created: (LUCENE-1174) outdated information in Analyzer javadoc

2008-02-11 Thread Daniel Naber (JIRA)
outdated information in Analyzer javadoc Key: LUCENE-1174 URL: https://issues.apache.org/jira/browse/LUCENE-1174 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Versi

[jira] Commented: (LUCENE-1173) index corruption autoCommit=false

2008-02-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567819#action_12567819 ] Michael McCandless commented on LUCENE-1173: Uh oh ... I'll take this! > inde

[jira] Commented: (LUCENE-1173) index corruption autoCommit=false

2008-02-11 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567818#action_12567818 ] Yonik Seeley commented on LUCENE-1173: -- Note: if I reduce the test to indexing with a

[jira] Updated: (LUCENE-1173) index corruption autoCommit=false

2008-02-11 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-1173: - Attachment: indexstress.patch Attaching a patch that can reproduce. With autoCommit=true, the te

[jira] Created: (LUCENE-1173) index corruption autoCommit=false

2008-02-11 Thread Yonik Seeley (JIRA)
index corruption autoCommit=false - Key: LUCENE-1173 URL: https://issues.apache.org/jira/browse/LUCENE-1173 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3

Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

2008-02-11 Thread Michael McCandless
Grant Ingersoll wrote: Also, perhaps we should spin off another thread to discuss how to make DocsWriter easier to maintain. My biggest concern is understanding how the various threads work together, and a few other areas but, like I said, let's spin up a separate thread to brainstorm w

[jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown

2008-02-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1044. Resolution: Fixed > Behavior on hard power shutdown >

Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

2008-02-11 Thread Grant Ingersoll
OK, I am convinced that this one is useful. Also, perhaps we should spin off another thread to discuss how to make DocsWriter easier to maintain. My biggest concern is understanding how the various threads work together, and a few other areas but, like I said, let's spin up a separate thr

Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

2008-02-11 Thread Doug Cutting
Michael McCandless wrote: In fact I've found you need to pursue both the 2x type gains and also the many smaller ones, to reach good performance. +1 Put another way, you must address both the asymptotic behavior and the constant factors. A good order-of-algorithms implementation is worthles

[jira] Updated: (LUCENE-1173) index corruption autoCommit=false

2008-02-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1173: --- Attachment: LUCENE-1173.patch I just sent email to java-user to give a heads up on t

Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

2008-02-11 Thread robert engels
One final thing: the guys responsible for the sorting in Arrays.java are Joshua Bloch and Neal Gafter. Now I KNOW there must be a very good reason for the choices they made... On Feb 11, 2008, at 9:35 AM, robert engels wrote: Also, these couple of pages have some very good information on sor

Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

2008-02-11 Thread robert engels
Also, these couple of pages have some very good information on sorting, and why heapsort is even faster than quicksort... http://users.aims.ac.za/~mackay/sorting/sorting.html http://www.azillionmonkeys.com/qed/sort.html On Feb 11, 2008, at 9:29 AM, robert engels wrote: My intent was not to

Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

2008-02-11 Thread robert engels
My intent was not to diminish your hard work. We all appreciate it. I was only trying to caution that 4% gains are not all they seem to be. If you look at Arrays.java in the 1.5 JDK, and read through the javadoc, you will quickly see that the sorting is well thought out. They use a

Re: [jira] Updated: (LUCENE-1173) index corruption autoCommit=false

2008-02-11 Thread Michael Busch
Michael McCandless (JIRA) wrote: > [ > https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel > ] > > Michael McCandless updated LUCENE-1173: > --- > > Attachment: LUCENE-1173.patch > > I

[jira] Assigned: (LUCENE-1173) index corruption autoCommit=false

2008-02-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1173: -- Assignee: Michael McCandless > index corruption autoCommit=false > ---

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-11 Thread Tim Jones
I am guessing that the idea behind not putting the indexes in HDFS is (1) maximize performance; (2) they are relatively transient - meaning the data they are created from could be in HDFS, but the indexes themselves are just local. To avoid having to recreate them, a backup copy could be k

[jira] Commented: (LUCENE-1175) occasional MergeException while indexing

2008-02-11 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567900#action_12567900 ] Yonik Seeley commented on LUCENE-1175: -- Another exception, this time during IndexRead

[jira] Commented: (LUCENE-1175) occasional MergeException while indexing

2008-02-11 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567880#action_12567880 ] Yonik Seeley commented on LUCENE-1175: -- OK, not much info to reproduce at this point,

[jira] Created: (LUCENE-1175) occasional MergeException while indexing

2008-02-11 Thread Yonik Seeley (JIRA)
occasional MergeException while indexing Key: LUCENE-1175 URL: https://issues.apache.org/jira/browse/LUCENE-1175 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.3 Reporter:

[jira] Resolved: (LUCENE-1171) Make DocumentsWriter more robust on hitting OOM

2008-02-11 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1171. Resolution: Fixed > Make DocumentsWriter more robust on hitting OOM >

Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

2008-02-11 Thread Michael McCandless
In fact I've found you need to pursue both the 2x type gains and also the many smaller ones, to reach good performance. And it requires a lot of ongoing vigilance to keep good performance. You lose 3-4% here and there and very quickly, very easily you're 2X slower. These tests are very real. I'm
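
As a rough check on the compounding claim in the message above, here is a small worked calculation. It assumes a model that is not stated in the thread: each regression multiplies total run time by a factor of 1 + r, and the regressions are independent. Under that assumption, the number n of regressions needed to double the run time is:

```latex
\[
(1 + r)^n = 2
\quad\Longrightarrow\quad
n = \frac{\ln 2}{\ln(1 + r)}
\]
\[
r = 0.03:\; n = \frac{0.6931}{0.02956} \approx 23.4
\qquad
r = 0.04:\; n = \frac{0.6931}{0.03922} \approx 17.7
\]
```

So under this simplified model, somewhere around 18 to 24 uncorrected 3-4% regressions would be enough to make the code roughly 2X slower.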

Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

2008-02-11 Thread eks dev
Again, as long as you do not make one step forward into actual code, we will continue to have what we have today, as this is the best we have. You made your statement: "Clear code will allow for more radical improvements as more eyes will be able to easily understand the inner workings an

[jira] Commented: (LUCENE-167) [PATCH] QueryParser not handling queries containing AND and OR

2008-02-11 Thread Graham Maloon (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567556#action_12567556 ] Graham Maloon commented on LUCENE-167: -- I see that very little has been done with this

[jira] Commented: (LUCENE-1170) query with AND and OR not retrieving correct results

2008-02-11 Thread Graham Maloon (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567550#action_12567550 ] Graham Maloon commented on LUCENE-1170: --- Lucene-167 has a patch for the version in 2

Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

2008-02-11 Thread robert engels
The reason it needs to be (or should be) done on Unix is that it is much easier (and better, I think) at reporting the "real" timings. What the reporter stated was (most likely) in real time, which is not the best way to measure performance, especially on multi-user/multitasking OSes. The unix time f
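
The distinction drawn above between wall-clock ("real") time and CPU time can also be observed from inside the JVM. The following is only a minimal sketch, not the benchmark used in the thread: the class name `TimingSketch`, the `measure` helper, and the sorting workload are all hypothetical, and `ThreadMXBean` reports CPU time for the current thread only, so it is not a full substitute for the user/sys figures of the Unix `time` command.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, for illustration only: compares wall-clock time
// with the CPU time consumed by the current thread while running a task.
public class TimingSketch {

    /** Wall-clock and CPU time for one run, in nanoseconds. */
    public static final class Timing {
        public final long wallNanos;
        public final long cpuNanos; // -1 if CPU time is not available

        Timing(long wallNanos, long cpuNanos) {
            this.wallNanos = wallNanos;
            this.cpuNanos = cpuNanos;
        }
    }

    public static Timing measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Not every JVM/OS combination supports per-thread CPU time.
        boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported()
                && threads.isThreadCpuTimeEnabled();

        long wallStart = System.nanoTime();
        long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : 0L;

        task.run();

        long cpuEnd = cpuSupported ? threads.getCurrentThreadCpuTime() : 0L;
        long wallEnd = System.nanoTime();

        return new Timing(wallEnd - wallStart,
                cpuSupported ? cpuEnd - cpuStart : -1L);
    }

    public static void main(String[] args) {
        // Assumed workload: sort one million pseudo-random ints.
        int[] data = new Random(42).ints(1_000_000).toArray();
        Timing t = measure(() -> Arrays.sort(data));
        System.out.printf("wall: %.2f ms, cpu: %.2f ms%n",
                t.wallNanos / 1e6, t.cpuNanos / 1e6);
    }
}
```

On a busy multi-user machine the wall-clock figure can grow well beyond the CPU figure, which is the effect the message above warns about when timings are reported in real time only.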

Re: [jira] Commented: (LUCENE-1173) index corruption autoCommit=false

2008-02-11 Thread Michael Busch
Yonik Seeley (JIRA) wrote: > [ > https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567878#action_12567878 > ] > > Yonik Seeley commented on LUCENE-1173: > -- > > Hol

[jira] Commented: (LUCENE-1173) index corruption autoCommit=false

2008-02-11 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567878#action_12567878 ] Yonik Seeley commented on LUCENE-1173: -- Hold up a bit... my random testing may have h

[jira] Commented: (LUCENE-1173) index corruption autoCommit=false

2008-02-11 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567873#action_12567873 ] Yonik Seeley commented on LUCENE-1173: -- Patch looks good (heh... a one liner!) At lea

Re: [jira] Updated: (LUCENE-1173) index corruption autoCommit=false

2008-02-11 Thread Michael McCandless
OK I'll backport this fix. I'd also like to backport LUCENE-1168 (another corruption case when autoCommit=false) and LUCENE-1171 (deadlock on hitting OOM). Mike Michael Busch wrote: Michael McCandless (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1173? page=com.atlass

Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

2008-02-11 Thread eks dev
Robert, you may or may not be right, I do not know. The only way to prove it would be to show you can do it better, no? If you are so convinced this is wrong, you could, much better than quoting textbooks: a) write a better patch, get attention with something you think is "better bottleneck"