Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter
I am not disputing that there is a speed improvement. I am arguing that the performance gain of many of these patches is not worth the additional complexity in the code.

Clear code will allow for more radical improvements, as more eyes will be able to easily understand the inner workings and offer better algorithms, not just micro improvements that the JVM (eventually) can probably figure out on its own. It is a value judgement, and regretfully I don't have another 30 years to pass down the full knowledge behind my reasoning. Luckily, however, there are some very good books available on the subject...

It's not the fault of the submitter, but many of these timings are suspect due to the difficulty of measuring the improvements accurately. Here is a simple example: you can configure the JVM to not perform aggressive garbage collection, and write a program that generates a lot of garbage - but it runs very fast (not GCing), until the GC eventually occurs (if the program runs long enough). It may be overall much slower than an alternative that runs slower as it executes, but has code to manage the objects as they are created, and rarely if ever hits a GC cycle. But then the JVM (e.g. generational GC) can implement improvements that make choice A faster (and the better choice)... and the cycle continues...

Without detailed timings and other metrics (GC pauses, IO, memory utilization, native compilation, etc.) most benchmarks are not very accurate or useful. There are a lot of variables to consider - maybe more than can reasonably be considered. That is why a 4% gain is highly suspect. If the gain were 25%, or 50%, or 100%, you would have a better chance of it being an innate improvement, and not just the interaction of some other factors.

On Feb 11, 2008, at 2:32 AM, eks dev wrote:

Robert, you may or may not be right, I do not know. The only way to prove it would be to show you can do it better, no?
If you are so convinced this is wrong, you could, much better than quoting textbooks:
a) write a better patch, get attention with something you think is a "better bottleneck"
b) provide realistic "performance tests", as you dispute the measurement provided here

It has to be that concrete; academic discussions are cool, but at the end of the day, it is the code that executes that counts.

cheers, eks

----- Original Message -----
From: robert engels <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Sunday, 10 February, 2008 9:15:30 PM
Subject: Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

I am not sure these numbers matter. I think they are skewed because you are probably running too short a test, and the index is in memory (or OS cache). Once you use a real index that needs to read/write from the disk, the percentage change will be negligible. This is the problem with many of these "performance changes" - they just aren't real-world enough. Even if they were, I would argue that code simplicity/maintainability is worth more than 6 seconds on an operation that takes 4 minutes to run...

There are many people who believe micro benchmarks are next to worthless. A good rule of thumb is that if the optimization doesn't result in a 2x speedup, it probably shouldn't be done. In most cases any efficiency gains are later lost to maintainability issues. See http://en.wikipedia.org/wiki/Optimization_(computer_science) - almost always there is a better bottleneck somewhere.

On Feb 10, 2008, at 1:37 PM, Michael McCandless wrote:

Yonik Seeley wrote:
> I wonder how well a single generic quickSort(Object[] arr, int low, int high) would perform vs the type-specific ones? I guess the main overhead would be a cast from Object to the specific class to do the compare? Too bad Java doesn't have true generics/templates.

OK I tested this.
Starting from the patch on LUCENE-1172, which has 3 quickSort methods (one per type), I created a single quickSort method on Object[] that takes a Comparator, and made 3 Comparators instead.

Mac OS X 10.4 (JVM 1.5):
  original patch   --> 247.1
  simplified patch --> 254.9 (3.2% slower)

Windows Server 2003 R64 (JVM 1.6):
  original patch   --> 440.6
  simplified patch --> 452.7 (2.7% slower)

The times are the best of 10 runs. I'm running all tests with these JVM args: -Xms1024M -Xmx1024M -Xbatch -server

I think this is a big enough difference in performance that it's worth keeping 3 separate quickSorts in DocumentsWriter.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
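A minimal sketch of the two shapes being compared above - one generic quicksort driven by a Comparator, versus a type-specialized quicksort whose compare is inlined. This is hypothetical illustration code, not the actual LUCENE-1172 patch; the overhead difference comes from the virtual compare() call and Object handling in the generic version:

```java
import java.util.Comparator;

// Hypothetical sketch of the two quicksort shapes discussed above.
public class SortShapes {

    // Generic version: every comparison is a virtual call through the
    // Comparator, and elements are handled as Object references.
    public static void quickSort(Object[] a, int lo, int hi, Comparator<Object> c) {
        if (lo >= hi) return;
        Object pivot = a[(lo + hi) >>> 1];
        int i = lo, j = hi;
        while (i <= j) {
            while (c.compare(a[i], pivot) < 0) i++;
            while (c.compare(a[j], pivot) > 0) j--;
            if (i <= j) { Object t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
        }
        quickSort(a, lo, j, c);
        quickSort(a, i, hi, c);
    }

    // Specialized version: the compare is an inlined primitive comparison,
    // which is what keeping a per-type quickSort buys.
    public static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[(lo + hi) >>> 1];
        int i = lo, j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
        }
        quickSort(a, lo, j);
        quickSort(a, i, hi);
    }
}
```

The ~3% gap Mike measured is consistent with the per-comparison overhead of the generic shape; whether the JIT can devirtualize the Comparator call depends on how many Comparator implementations are live at the call site.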
[jira] Resolved: (LUCENE-325) [PATCH] new method expungeDeleted() added to IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-325.
---------------------------------------
Resolution: Fixed

I just committed this. Thanks John! And sorry for the long delay. I also added a "these APIs are experimental" warning on top of MergePolicy and MergeScheduler (which I should have done before 2.3 :(, though I don't expect a lot of usage of these).

> [PATCH] new method expungeDeleted() added to IndexWriter
>
>                 Key: LUCENE-325
>                 URL: https://issues.apache.org/jira/browse/LUCENE-325
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: CVS Nightly - Specify date in submission
>         Environment: Operating System: Windows XP
>                      Platform: All
>            Reporter: John Wang
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: attachment.txt, IndexWriter.patch, IndexWriter.patch, LUCENE-325.patch, TestExpungeDeleted.java
>
> We make use of the docIDs in lucene. I need a way to compact the docIDs in segments to remove the "holes" created from doing deletes. The only way to do this is by calling IndexWriter.optimize(). This is a very heavy call; for the cases where the index is large but with a very small number of deleted docs, calling optimize is not practical.
> I need a new method: expungeDeleted(), which finds all the segments that have deleted documents and merges only those segments.
> I have implemented this method and have discussed with Otis about submitting a patch. I don't see where I can attach the patch. I will do according to the patch guideline and email the lucene mailing list.
> Thanks
> -John

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1173) index corruption autoCommit=false
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567849#action_12567849 ]

Michael McCandless commented on LUCENE-1173:
--------------------------------------------
Yes, this is one awesome test case :) Thanks.

> index corruption autoCommit=false
>
>                 Key: LUCENE-1173
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1173
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Yonik Seeley
>            Assignee: Michael McCandless
>            Priority: Critical
>         Attachments: indexstress.patch, indexstress.patch
>
> In both Lucene 2.3 and trunk, the index becomes corrupted when autoCommit=false
[jira] Updated: (LUCENE-1173) index corruption autoCommit=false
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-1173:
---------------------------------
Attachment: indexstress.patch

Thanks Mike! Attaching a new version of the test that correctly deals with terms with no docs (because of deletions). Other variations were failing before; now it's just those with autoCommit=false.

Note that it's possible to trigger this bug by indexing only 3 documents:
  mergeFactor=2;
  maxBufferedDocs=2;
  Map docs = indexRandom(1, 3, 2, dir1);

I love random testing :-)

> index corruption autoCommit=false
>
>                 Key: LUCENE-1173
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1173
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Yonik Seeley
>            Assignee: Michael McCandless
>            Priority: Critical
>         Attachments: indexstress.patch, indexstress.patch
>
> In both Lucene 2.3 and trunk, the index becomes corrupted when autoCommit=false
[jira] Updated: (LUCENE-1174) outdated information in Analyzer javadoc
[ https://issues.apache.org/jira/browse/LUCENE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Naber updated LUCENE-1174:
---------------------------------
Attachment: analyzer-javadoc.diff

> outdated information in Analyzer javadoc
>
>                 Key: LUCENE-1174
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1174
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Javadocs
>    Affects Versions: 2.3
>            Reporter: Daniel Naber
>            Priority: Minor
>         Attachments: analyzer-javadoc.diff
>
> I'm sure you find more ways to improve the javadoc, so feel free to change and extend my patch.
[jira] Created: (LUCENE-1174) outdated information in Analyzer javadoc
outdated information in Analyzer javadoc
----------------------------------------

                Key: LUCENE-1174
                URL: https://issues.apache.org/jira/browse/LUCENE-1174
            Project: Lucene - Java
         Issue Type: Bug
         Components: Javadocs
   Affects Versions: 2.3
           Reporter: Daniel Naber
           Priority: Minor
        Attachments: analyzer-javadoc.diff

I'm sure you find more ways to improve the javadoc, so feel free to change and extend my patch.
[jira] Commented: (LUCENE-1173) index corruption autoCommit=false
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567819#action_12567819 ]

Michael McCandless commented on LUCENE-1173:
--------------------------------------------
Uh oh ... I'll take this!

> index corruption autoCommit=false
>
>                 Key: LUCENE-1173
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1173
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Yonik Seeley
>            Priority: Critical
>         Attachments: indexstress.patch
>
> In both Lucene 2.3 and trunk, the index becomes corrupted when autoCommit=false
[jira] Commented: (LUCENE-1173) index corruption autoCommit=false
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567818#action_12567818 ]

Yonik Seeley commented on LUCENE-1173:
--------------------------------------
Note: if I reduce the test to indexing with a single thread, it still fails.
  Map docs = indexRandom(1, 50, 50, dir1);

The test still does the indexing in a different thread than the close(), so it's not quite a single-threaded test.

Another thing to note: all of the terms are matching up (the test succeeds if I don't test the stored fields).

> index corruption autoCommit=false
>
>                 Key: LUCENE-1173
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1173
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Yonik Seeley
>            Priority: Critical
>         Attachments: indexstress.patch
>
> In both Lucene 2.3 and trunk, the index becomes corrupted when autoCommit=false
[jira] Updated: (LUCENE-1173) index corruption autoCommit=false
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-1173:
---------------------------------
Attachment: indexstress.patch

Attaching a patch that can reproduce the corruption. With autoCommit=true, the test passes; change it to false and it fails.

The test basically uses multiple threads to update documents. The last document for any id is kept, and then all these docs are indexed again serially. The two indices are then compared.

> index corruption autoCommit=false
>
>                 Key: LUCENE-1173
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1173
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Yonik Seeley
>            Priority: Critical
>         Attachments: indexstress.patch
>
> In both Lucene 2.3 and trunk, the index becomes corrupted when autoCommit=false
[jira] Created: (LUCENE-1173) index corruption autoCommit=false
index corruption autoCommit=false
---------------------------------

                Key: LUCENE-1173
                URL: https://issues.apache.org/jira/browse/LUCENE-1173
            Project: Lucene - Java
         Issue Type: Bug
         Components: Index
   Affects Versions: 2.3
           Reporter: Yonik Seeley
           Priority: Critical

In both Lucene 2.3 and trunk, the index becomes corrupted when autoCommit=false
Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter
Grant Ingersoll wrote:

> Also, perhaps we should spin off another thread to discuss how to make DocsWriter easier to maintain. My biggest concern is understanding how the various threads work together, and a few other areas, but, like I said, let's spin up a separate thread to brainstorm what is needed.

I agree we should work on simplifying it with time, and spreading the knowledge of how it works.

> Note that there is some risk in just using Wikipedia for profiling, given its distribution of terms, etc.

Good point. Previously I was using Europarl, but that corpus is just too fast to index. Are you thinking Wikipedia is somewhat "dirty" (lots of extra terms not normally seen with clean content)? Since I'm using StandardAnalyzer and not an analyzer based on the new WikipediaTokenizer, I'm getting extra terms. Also, I think we'd need an HTMLFilter in the chain since Wikipedia content uses HTML markup. Grant, what analyzer chain do you use when you index Wikipedia?

> I also wonder if using the LineDocMaker is all that realistic a profiling scenario. While it is really useful in that it minimizes IO interaction, etc., I can't help but feel that it isn't at all close to typical usage. Most users are not going to have all their docs rolled up into a single file, 1 doc per line, so I wonder if we potentially lose insight into how Lucene performs, given that other issues like I/O/memory used for loading files may force the JVM/Lucene to not have the resources it needs. Of course, I do know it is good to try to isolate things so we can focus just on Lucene, but we also should try to make some accounting for how it lives in the wild.

I agree, this part is not realistic, and the intention is to measure just the indexing time. In fact I expect most apps spend quite a bit more time building up a Document (filtering binary docs, etc) than actually indexing it.
The only real-world app I can think of that would be close to LineDocMaker is using Lucene to search big log files, where one line = one Document.

> Last, I think it would be good to always attach/check in the .alg file that is used when running the test, so that others can verify on different systems/configurations, etc.

I did post the alg (under LUCENE-1172), though I see I forgot to {code} it and it looks messed up now. My recent test to try a single quickSort(Object[]) used the same alg, just repeated 10 times instead of 3. But I agree we should always post the alg for all tests...

Mike
[jira] Resolved: (LUCENE-1044) Behavior on hard power shutdown
[ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1044.
----------------------------------------
Resolution: Fixed

> Behavior on hard power shutdown
>
>                 Key: LUCENE-1044
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1044
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>         Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 1.5
>            Reporter: venkat rangan
>            Assignee: Michael McCandless
>             Fix For: 2.4
>
>         Attachments: FSyncPerfTest.java, LUCENE-1044.patch, LUCENE-1044.take2.patch, LUCENE-1044.take3.patch, LUCENE-1044.take4.patch, LUCENE-1044.take5.patch, LUCENE-1044.take6.patch, LUCENE-1044.take7.patch, LUCENE-1044.take8.patch
>
> When indexing a large number of documents, upon a hard power failure (e.g. pull the power cord), the index seems to get corrupted. We start a Java application as a Windows Service, and feed it documents. In some cases (after an index size of 1.7GB, with 30-40 index segment .cfs files), the following is observed.
> The 'segments' file contains only zeros. Its size is 265 bytes - all bytes are zeros.
> The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes are zeros.
> Before corruption, the segments file and deleted file appear to be correct. After this corruption, the index is corrupted and lost.
> This is a problem observed in Lucene 1.4.3. We are not able to upgrade our customer deployments to 1.9 or a later version, but would be happy to back-port a patch, if the patch is small enough and if this problem is already solved.
Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter
OK, I am convinced that this one is useful.

Also, perhaps we should spin off another thread to discuss how to make DocsWriter easier to maintain. My biggest concern is understanding how the various threads work together, and a few other areas, but, like I said, let's spin up a separate thread to brainstorm what is needed.

Note that there is some risk in just using Wikipedia for profiling, given its distribution of terms, etc.

I also wonder if using the LineDocMaker is all that realistic a profiling scenario. While it is really useful in that it minimizes IO interaction, etc., I can't help but feel that it isn't at all close to typical usage. Most users are not going to have all their docs rolled up into a single file, 1 doc per line, so I wonder if we potentially lose insight into how Lucene performs, given that other issues like I/O/memory used for loading files may force the JVM/Lucene to not have the resources it needs. Of course, I do know it is good to try to isolate things so we can focus just on Lucene, but we also should try to make some accounting for how it lives in the wild.

Last, I think it would be good to always attach/check in the .alg file that is used when running the test, so that others can verify on different systems/configurations, etc.

-Grant

On Feb 11, 2008, at 6:14 AM, Michael McCandless wrote:

In fact I've found you need to pursue both the 2x type gains and also the many smaller ones, to reach good performance. And it requires a lot of ongoing vigilance to keep good performance. You lose 3-4% here and there, and very quickly, very easily, you're 2X slower.

These tests are very real. I'm indexing Wikipedia content, using StandardAnalyzer, running under contrib/benchmark. It's true, in a real app more time will be spent pulling documents from the source, but I'm intentionally trying to minimize that in order to measure just the indexing time. Getting a 4% gain by replacing mergesort with quicksort is real.
If the profiler found other 4% gains, with such a small increase in code complexity, I would passionately argue for those as well. So far it hasn't. Robert, if you have some concrete ideas for the 2X type gains, I'm all ears :) I certainly agree there is a point where the complexity cost doesn't offset the performance gain, but I think this particular change is well before that point. Lucene's indexing throughput is an important metric in its competitiveness with other search engines. And I want Lucene to be the best.

Mike

eks dev wrote:

again, as long as you do not make one step forward into actual code, we will continue to have what we have today, as this is the best we have. you made your statement: "Clear code will allow for more radical improvements as more eyes will be able to easily understand the inner workings and offer better algorithms". Not a single person here would ever dispute this statement, but unfortunately there is no compiler that executes such statements. Make a patch that utilizes this "clear-code" paradigm, show us these better algorithms on an actual example, and then say: "without LUCENE-1172 I was able to improve XYZ feature by using ABC algorithm". That would work smoothly. Anyhow, I am not going to write more on this topic, sorry for the noise... And Robert, please do not get this wrong, I see your point and I respect it! I just felt a slight unfairness to the people who get their hands dirty writing as clear and fast code as possible.

----- Original Message -----
From: robert engels <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Monday, 11 February, 2008 9:55:02 AM
Subject: Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

I am not disputing that there is a speed improvement. I am arguing that the performance gain of many of these patches is not worth the additional complexity in the code.
Clear code will allow for more radical improvements, as more eyes will be able to easily understand the inner workings and offer better algorithms, not just micro improvements that the JVM (eventually) can probably figure out on its own. It is a value judgement, and regretfully I don't have another 30 years to pass down the full knowledge behind my reasoning. Luckily, however, there are some very good books available on the subject...

It's not the fault of the submitter, but many of these timings are suspect due to the difficulty of measuring the improvements accurately. Here is a simple example: you can configure the JVM to not perform aggressive garbage collection, and write a program that generates a lot of garbage - but it runs very fast (not GCing), until the GC eventually occurs (if the program runs long enough). It may be overall much slower than an alternative that runs slower as it executes, but has code to manage the objects as they are created, and rarely if ever hits a GC cycle. But then the JVM (e.g. generational GC) can implement improvements that make choice A faster (and the better choice)... and the cycle continues...
Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter
Michael McCandless wrote:
> In fact I've found you need to pursue both the 2x type gains and also the many smaller ones, to reach good performance.

+1

Put another way, you must address both the asymptotic behavior and the constant factors. A good order-of-algorithms implementation is worthless if its constant factors are huge, and vice-versa.

Doug
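Doug's point can be made concrete with a toy cost model (the constants below are invented purely for illustration): an O(n^2) routine with a small constant factor beats an O(n log n) routine with a large one until n crosses a threshold, so neither the asymptotics nor the constants can be ignored on their own.

```java
public class Crossover {
    // Toy cost model with made-up constants: a cheap quadratic routine
    // versus an expensive n log n routine.
    public static double quadraticCost(long n) {
        return 1.0 * n * n;
    }

    public static double linearithmicCost(long n) {
        return 100.0 * n * (Math.log(n) / Math.log(2));
    }

    // Smallest n at which the asymptotically better routine actually wins.
    // Below this point the "worse" algorithm is the faster one.
    public static long crossover() {
        for (long n = 2; ; n++) {
            if (linearithmicCost(n) < quadraticCost(n)) return n;
        }
    }
}
```

With these particular constants the quadratic routine wins for all inputs up to roughly a thousand elements, which is exactly the regime the per-type quickSorts in DocumentsWriter operate in.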
[jira] Updated: (LUCENE-1173) index corruption autoCommit=false
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1173:
---------------------------------------
Attachment: LUCENE-1173.patch

I just sent email to java-user to give a heads up on this. Attached patch fixes the issue. All tests pass. I think we should spin 2.3.1 for this one?

> index corruption autoCommit=false
>
>                 Key: LUCENE-1173
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1173
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Yonik Seeley
>            Assignee: Michael McCandless
>            Priority: Critical
>         Attachments: indexstress.patch, indexstress.patch, LUCENE-1173.patch
>
> In both Lucene 2.3 and trunk, the index becomes corrupted when autoCommit=false
Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter
One final thing: the guys responsible for the sorting in Arrays.java are Joshua Bloch and Neal Gafter. Now I KNOW there must be a very good reason for the choices they made...

On Feb 11, 2008, at 9:35 AM, robert engels wrote:

Also, these couple of pages have some very good information on sorting, and why heapsort is even faster than quicksort...

http://users.aims.ac.za/~mackay/sorting/sorting.html
http://www.azillionmonkeys.com/qed/sort.html

On Feb 11, 2008, at 9:29 AM, robert engels wrote:

My intent was not to diminish your hard work. We all appreciate it. I was only trying to caution that 4% gains are not all that they seem to be.

If you look at Arrays.java in the 1.5 JDK, and read through the javadoc, you will quickly see that the sorting is well thought out. They use a tuned quicksort for primitives, which offers O(n log n) performance, and a modified mergesort for Objects, guaranteeing O(n log n) performance. A standard quicksort has worst-case performance of O(n^2)! Both use an insertion sort if the number of elements is small.

I can only assume that in their testing they chose a mergesort for Objects either: 1. to have stable sort times, or, more likely, 2. because the merge sort has a better chance of being optimized by the JIT, and/or the sequential access of elements makes for more efficient object access in the JVM. These people, who are far more capable than me, chose one over the other for what I assume are very good reasons - I just wish I knew what they were.

On Feb 11, 2008, at 5:14 AM, Michael McCandless wrote:

In fact I've found you need to pursue both the 2x type gains and also the many smaller ones, to reach good performance. And it requires a lot of ongoing vigilance to keep good performance. You lose 3-4% here and there, and very quickly, very easily, you're 2X slower. These tests are very real. I'm indexing Wikipedia content, using StandardAnalyzer, running under contrib/benchmark.
It's true, in a real app more time will be spent pulling documents from the source, but I'm intentionally trying to minimize that in order to measure just the indexing time. Getting a 4% gain by replacing mergesort with quicksort is real. If the profiler found other 4% gains, with such a small increase in code complexity, I would passionately argue for those as well. So far it hasn't. Robert, if you have some concrete ideas for the 2X type gains, I'm all ears :) I certainly agree there is a point where the complexity cost doesn't offset the performance gain, but I think this particular change is well before that point. Lucene's indexing throughput is an important metric in its competitiveness with other search engines. And I want Lucene to be the best.

Mike

eks dev wrote:

again, as long as you do not make one step forward into actual code, we will continue to have what we have today, as this is the best we have. you made your statement: "Clear code will allow for more radical improvements as more eyes will be able to easily understand the inner workings and offer better algorithms". Not a single person here would ever dispute this statement, but unfortunately there is no compiler that executes such statements. Make a patch that utilizes this "clear-code" paradigm, show us these better algorithms on an actual example, and then say: "without LUCENE-1172 I was able to improve XYZ feature by using ABC algorithm". That would work smoothly. Anyhow, I am not going to write more on this topic, sorry for the noise... And Robert, please do not get this wrong, I see your point and I respect it! I just felt a slight unfairness to the people who get their hands dirty writing as clear and fast code as possible.

----- Original Message -----
From: robert engels <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Monday, 11 February, 2008 9:55:02 AM
Subject: Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

I am not disputing that there is a speed improvement.
I am arguing that the performance gain of many of these patches is not worth the additional complexity in the code.

Clear code will allow for more radical improvements, as more eyes will be able to easily understand the inner workings and offer better algorithms, not just micro improvements that the JVM (eventually) can probably figure out on its own. It is a value judgement, and regretfully I don't have another 30 years to pass down the full knowledge behind my reasoning. Luckily, however, there are some very good books available on the subject...

It's not the fault of the submitter, but many of these timings are suspect due to the difficulty of measuring the improvements accurately. Here is a simple example: you can configure the JVM to not perform aggressive garbage collection, and write a program that generates a lot of garbage - but it runs very fast (not GCing), until the GC eventually occurs (if the program runs long enough). It may be overall much slower than an alternative that runs slower as it executes, but has code to manage the objects as they are created, and rarely if ever hits a GC cycle.
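One observable consequence of the mergesort-for-Objects choice discussed in this thread is stability: Arrays.sort on an Object[] is specified to preserve the input order of elements that compare equal, while the tuned quicksort used for primitives makes no such guarantee. A small stdlib-only demonstration (the class and method names here are made up for illustration):

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch: Arrays.sort on Objects is specified to be stable
// (a modified mergesort in the 1.5/1.6 JDK), so ties keep their input
// order. This is one plausible reason for the primitives/Objects split
// speculated about in the mail above.
public class StabilityDemo {
    public static String[] sortByLength(String[] words) {
        String[] copy = words.clone();
        // "bb" and "aa" compare equal by length; a stable sort must keep
        // "bb" before "aa" if it appeared first in the input.
        Arrays.sort(copy, Comparator.comparingInt(String::length));
        return copy;
    }
}
```

For primitives stability is unobservable (equal ints are indistinguishable), which is why the JDK is free to use the faster tuned quicksort there.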
Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter
Also, these couple of pages have some very good information on sorting, and why heapsort is even faster than quicksort...

http://users.aims.ac.za/~mackay/sorting/sorting.html
http://www.azillionmonkeys.com/qed/sort.html

On Feb 11, 2008, at 9:29 AM, robert engels wrote:

My intent was not to diminish your hard work. We all appreciate it. I was only trying to caution that 4% gains are not all that they seem to be.

If you look at Arrays.java in the 1.5 JDK, and read through the javadoc, you will quickly see that the sorting is well thought out. They use a tuned quicksort for primitives, which offers O(n log n) performance, and a modified mergesort for Objects, guaranteeing O(n log n) performance. A standard quicksort has worst-case performance of O(n^2)! Both use an insertion sort if the number of elements is small.

I can only assume that in their testing they chose a mergesort for Objects either: 1. to have stable sort times, or, more likely, 2. because the merge sort has a better chance of being optimized by the JIT, and/or the sequential access of elements makes for more efficient object access in the JVM. These people, who are far more capable than me, chose one over the other for what I assume are very good reasons - I just wish I knew what they were.

On Feb 11, 2008, at 5:14 AM, Michael McCandless wrote:

In fact I've found you need to pursue both the 2x type gains and also the many smaller ones, to reach good performance. And it requires a lot of ongoing vigilance to keep good performance. You lose 3-4% here and there, and very quickly, very easily, you're 2X slower. These tests are very real. I'm indexing Wikipedia content, using StandardAnalyzer, running under contrib/benchmark. It's true, in a real app more time will be spent pulling documents from the source, but I'm intentionally trying to minimize that in order to measure just the indexing time. Getting a 4% gain by replacing mergesort with quicksort is real.
If the profiler found other 4% gains, with such a small increase in code complexity, I would passionately argue for those as well. So far it hasn't. Robert, if you have some concrete ideas for the 2X-type gains, I'm all ears :) I certainly agree there is a point where complexity cost doesn't offset the performance gain, but I think this particular change is well before that point. Lucene's indexing throughput is an important metric in its competitiveness with other search engines. And I want Lucene to be the best. Mike eks dev wrote: again, as long as you do not make one step forward into actual code, we will continue to have what we have today, as this is the best we have. you made your statement: "Clear code will allow for more radical improvements as more eyes will be able to easily understand the inner workings and offer better algorithms". Not a single person here would ever dispute this statement, but unfortunately there is no compiler that executes such statements. Make a patch that utilizes this "clear-code" paradigm, show us these better algorithms on an actual example, and then say: "without LUCENE-1172 I was able to improve XYZ feature by using ABC algorithm". That would work smoothly. Anyhow, I am not going to write more on this topic, sorry for the noise... And Robert, please do not take this wrong, I see your point and I respect it! I just felt a slight unfairness to the people that get their hands dirty writing as clear and fast code as possible. - Original Message From: robert engels <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Monday, 11 February, 2008 9:55:02 AM Subject: Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter I am not disputing that there is a speed improvement. I am disputing that the performance gain of many of these patches is worth the additional complexity in the code. 
Clear code will allow for more radical improvements, as more eyes will be able to easily understand the inner workings and offer better algorithms, not just micro-improvements that the JVM (eventually) can probably figure out on its own. It is a value judgement, and regrettably I don't have another 30 years to pass down the full knowledge behind my reasoning. Luckily, however, there are some very good books available on the subject... It's not the fault of the submitter, but many of these timings are suspect due to the difficulty of measuring the improvements accurately. Here is a simple example: You can configure the JVM to not perform aggressive garbage collection, and write a program that generates a lot of garbage - but it runs very fast (not GCing), until the GC eventually occurs (if the program runs long enough). It may be overall much slower than an alternative that runs slower as it executes, but has code to manage the objects as they are created, and rarely if ever hits a GC cycle. But then the JVM (e.g. generational GC) can implement improvements that make choice A faster (and the better choice)... and the cycle continues...
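[Editorial aside, not part of the original thread: the stability property Robert attributes to the JDK's object sort is easy to demonstrate in a self-contained sketch. `Arrays.sort` on objects is a stable merge sort, so elements that compare equal keep their original relative order; the pair data below is purely hypothetical.]

```java
import java.util.Arrays;
import java.util.Comparator;

public class StableSortDemo {
    public static void main(String[] args) {
        // Items tagged with insertion order; "a1" and "a2" share key 'a'.
        String[] items = {"b1", "a1", "b2", "a2"};
        // Arrays.sort on objects is a stable merge sort (O(n log n) worst
        // case), so equal keys keep their original relative order.
        Arrays.sort(items, Comparator.comparing((String s) -> s.charAt(0)));
        System.out.println(Arrays.toString(items)); // [a1, a2, b1, b2]
        if (!Arrays.toString(items).equals("[a1, a2, b1, b2]"))
            throw new AssertionError("expected stable order");
    }
}
```

A quicksort makes no such guarantee, which is one plausible reason the JDK authors reserved it for primitives, where stability is unobservable.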
Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter
My intent was not to diminish your hard work. We all appreciate it. I was only trying to caution that 4% gains are not always what they seem to be. If you look at Arrays.java in the 1.5 JDK, and read through the javadoc, you will quickly see that the sorting is well thought out. They use a tuned quicksort for primitives, which offers O(n log n) performance, and a modified mergesort for Objects, guaranteeing O(n log n) performance. A standard quicksort has worst-case performance of O(n^2)! Both use an insertion sort if the number of elements is small. I can only assume that in their testing they chose a mergesort for objects either to: 1. have stable sort times, or, more likely, 2. because the merge sort has a better chance of being optimized by the JIT, and/or the sequential access of elements makes for more efficient object access in the JVM. These people, who are far more capable than me, chose one over the other for what I assume are very good reasons - I just wish I knew what they were. On Feb 11, 2008, at 5:14 AM, Michael McCandless wrote: In fact I've found you need to pursue both the 2x-type gains and also the many smaller ones to reach good performance. And it requires a lot of ongoing vigilance to keep good performance. You lose 3-4% here and there and very quickly, very easily, you're 2X slower. These tests are very real. I'm indexing Wikipedia content, using StandardAnalyzer, running under contrib/benchmark. It's true, in a real app more time will be spent pulling documents from the source, but I'm intentionally trying to minimize that in order to measure just the indexing time. Getting a 4% gain by replacing mergesort with quicksort is real. If the profiler found other 4% gains, with such a small increase in code complexity, I would passionately argue for those as well. So far it hasn't. 
Robert, if you have some concrete ideas for the 2X-type gains, I'm all ears :) I certainly agree there is a point where complexity cost doesn't offset the performance gain, but I think this particular change is well before that point. Lucene's indexing throughput is an important metric in its competitiveness with other search engines. And I want Lucene to be the best. Mike eks dev wrote: again, as long as you do not make one step forward into actual code, we will continue to have what we have today, as this is the best we have. you made your statement: "Clear code will allow for more radical improvements as more eyes will be able to easily understand the inner workings and offer better algorithms". Not a single person here would ever dispute this statement, but unfortunately there is no compiler that executes such statements. Make a patch that utilizes this "clear-code" paradigm, show us these better algorithms on an actual example, and then say: "without LUCENE-1172 I was able to improve XYZ feature by using ABC algorithm". That would work smoothly. Anyhow, I am not going to write more on this topic, sorry for the noise... And Robert, please do not take this wrong, I see your point and I respect it! I just felt a slight unfairness to the people that get their hands dirty writing as clear and fast code as possible. - Original Message From: robert engels <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Monday, 11 February, 2008 9:55:02 AM Subject: Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter I am not disputing that there is a speed improvement. I am disputing that the performance gain of many of these patches is worth the additional complexity in the code. Clear code will allow for more radical improvements, as more eyes will be able to easily understand the inner workings and offer better algorithms, not just micro-improvements that the JVM (eventually) can probably figure out on its own. 
It is a value judgement, and regrettably I don't have another 30 years to pass down the full knowledge behind my reasoning. Luckily, however, there are some very good books available on the subject... It's not the fault of the submitter, but many of these timings are suspect due to the difficulty of measuring the improvements accurately. Here is a simple example: You can configure the JVM to not perform aggressive garbage collection, and write a program that generates a lot of garbage - but it runs very fast (not GCing), until the GC eventually occurs (if the program runs long enough). It may be overall much slower than an alternative that runs slower as it executes, but has code to manage the objects as they are created, and rarely if ever hits a GC cycle. But then the JVM (e.g. generational GC) can implement improvements that make choice A faster (and the better choice)... and the cycle continues... Without detailed timings and other metrics (GC pauses, IO, memory utilization, native compilation, etc.) most benchmarks are not very accurate or useful. There are a lot of variables to consider - maybe more so than can reasonably be considered. That is why a 4% gain is highly suspect.
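[Editorial aside, not part of the original thread: the GC-skew pitfall described above can be made visible by recording collection counts around the timed region. This is an illustrative sketch using the standard java.lang.management beans; the allocation-heavy workload is hypothetical.]

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcAwareTiming {
    // Sum collection counts across all collectors (count is -1 if undefined).
    static long gcCount() {
        long n = 0;
        for (GarbageCollectorMXBean b : ManagementFactory.getGarbageCollectorMXBeans())
            n += Math.max(0, b.getCollectionCount());
        return n;
    }

    public static void main(String[] args) {
        long gcBefore = gcCount();
        long t0 = System.nanoTime();
        // Workload that allocates heavily; a short run may finish before a
        // major collection ever happens, making it look deceptively fast.
        long sum = 0;
        for (int i = 0; i < 100_000; i++)
            sum += new StringBuilder("x").append(i).length();
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        long gcDelta = gcCount() - gcBefore;
        // Reporting GC cycles next to the elapsed time exposes whether the
        // benchmark simply deferred its collection cost past the finish line.
        System.out.println("elapsed=" + elapsedMs + "ms gcCycles=" + gcDelta);
        if (sum <= 0) throw new AssertionError("workload did not run");
    }
}
```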
Re: [jira] Updated: (LUCENE-1173) index corruption autoCommit=false
Michael McCandless (JIRA) wrote: > [ > https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel > ] > > Michael McCandless updated LUCENE-1173: > --- > > Attachment: LUCENE-1173.patch > > I just sent email to java-user to give a heads up on this. > > Attached patch fixes the issue. All tests pass. > > I think we should spin 2.3.1 for this one? > +1 - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Assigned: (LUCENE-1173) index corruption autoCommit=false
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1173: -- Assignee: Michael McCandless > index corruption autoCommit=false > - > > Key: LUCENE-1173 > URL: https://issues.apache.org/jira/browse/LUCENE-1173 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.3 >Reporter: Yonik Seeley >Assignee: Michael McCandless >Priority: Critical > Attachments: indexstress.patch > > > In both Lucene 2.3 and trunk, the index becomes corrupted when > autoCommit=false -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lucene-based Distributed Index Leveraging Hadoop
I am guessing that the idea behind not putting the indexes in HDFS is (1) to maximize performance, and (2) that they are relatively transient - meaning the data they are created from could be in HDFS, but the indexes themselves are just local. To avoid having to recreate them, a backup copy could be kept in HDFS. Since a goal is to be able to update them (frequently), this seems like a good approach to me. Tim Andrzej Bialecki wrote: Doug Cutting wrote: My primary difference with your proposal is that I would like to support online indexing. Documents could be inserted and removed directly, and shards would synchronize changes amongst replicas, with an "eventual consistency" model. Indexes would not be stored in HDFS, but directly on the local disk of each node. Hadoop would perhaps not play a role. In many ways this would resemble CouchDB, but with explicit support for sharding and failover from the outset. It's true that searching over HDFS is slow - but I'd hate to lose all other HDFS benefits and have to start from scratch ... I wonder what would be the performance of FsDirectory over an HDFS index that is "pinned" to a local disk, i.e. a full local replica is available, with the block size of each index file equal to the file size. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1175) occasional MergeException while indexing
[ https://issues.apache.org/jira/browse/LUCENE-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567900#action_12567900 ] Yonik Seeley commented on LUCENE-1175: -- Another exception, this time during IndexReader.open() after an indexing run. {code} java.io.FileNotFoundException: _a.fdt at org.apache.lucene.store.RAMDirectory.openInput(RAMDirectory.java:234) at org.apache.lucene.store.Directory.openInput(Directory.java:104) at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:75) at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197) at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:55) at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:91) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:651) at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:79) at org.apache.lucene.index.IndexReader.open(IndexReader.java:209) at org.apache.lucene.index.IndexReader.open(IndexReader.java:192) at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:161) at org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:72) {code} > occasional MergeException while indexing > > > Key: LUCENE-1175 > URL: https://issues.apache.org/jira/browse/LUCENE-1175 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 2.3 >Reporter: Yonik Seeley > > TestStressIndexing2.testMultiConfig occasionally hits merge exceptions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1175) occasional MergeException while indexing
[ https://issues.apache.org/jira/browse/LUCENE-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567880#action_12567880 ] Yonik Seeley commented on LUCENE-1175: -- OK, not much info to reproduce at this point, except to set the iterations to 100 on testMultiConfig and let it run for a while. Here is an example exception: {code} Exception in thread "Lucene Merge Thread #1" org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: _5_1.del at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:320) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:297) Caused by: java.io.FileNotFoundException: _5_1.del at org.apache.lucene.store.RAMDirectory.fileLength(RAMDirectory.java:167) at org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:216) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3750) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3354) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:211) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:266) {code} It could potentially either be a problem in indexing, or in RAMDirectory. > occasional MergeException while indexing > > > Key: LUCENE-1175 > URL: https://issues.apache.org/jira/browse/LUCENE-1175 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 2.3 >Reporter: Yonik Seeley > > TestStressIndexing2.testMultiConfig occasionally hits merge exceptions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-1175) occasional MergeException while indexing
occasional MergeException while indexing Key: LUCENE-1175 URL: https://issues.apache.org/jira/browse/LUCENE-1175 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.3 Reporter: Yonik Seeley TestStressIndexing2.testMultiConfig occasionally hits merge exceptions -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Resolved: (LUCENE-1171) Make DocumentsWriter more robust on hitting OOM
[ https://issues.apache.org/jira/browse/LUCENE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1171. Resolution: Fixed > Make DocumentsWriter more robust on hitting OOM > --- > > Key: LUCENE-1171 > URL: https://issues.apache.org/jira/browse/LUCENE-1171 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.3 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.4 > > Attachments: LUCENE-1171.patch > > > I've been stress testing DocumentsWriter by indexing wikipedia, but not > giving enough memory to the JVM, in varying heap sizes to tickle the > different interesting cases. Sometimes DocumentsWriter can deadlock; > other times it will hit a subsequent NPE or AIOOBE or assertion > failure. > I've fixed all the cases I've found, and added some more asserts. Now > it just produces plain OOM exceptions. All changes are contained to > DocumentsWriter.java. > All tests pass. I plan to commit in a day or two! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
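[Editorial aside, not part of the original thread: the hardening described in LUCENE-1171 follows a general pattern: treat an OutOfMemoryError as a poison state so later calls fail cleanly instead of deadlocking or operating on half-updated state. The class below is a generic, hypothetical sketch of that pattern, not the actual DocumentsWriter code.]

```java
public class OomHardenedBuffer {
    private final StringBuilder buf = new StringBuilder();
    private boolean aborting = false;

    // On OutOfMemoryError, mark this writer as aborting and rethrow, so a
    // later flush cannot run against partially updated internal state.
    public void add(String doc) {
        if (aborting)
            throw new IllegalStateException("writer hit OOM; must be aborted/closed");
        try {
            buf.append(doc);
        } catch (OutOfMemoryError e) {
            aborting = true;
            throw e;
        }
    }

    public String flush() {
        if (aborting)
            throw new IllegalStateException("cannot flush after OOM");
        String out = buf.toString();
        buf.setLength(0);
        return out;
    }

    public static void main(String[] args) {
        OomHardenedBuffer w = new OomHardenedBuffer();
        w.add("doc1");
        if (!w.flush().equals("doc1")) throw new AssertionError();
        System.out.println("ok");
    }
}
```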
Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter
In fact I've found you need to pursue both the 2x-type gains and also the many smaller ones to reach good performance. And it requires a lot of ongoing vigilance to keep good performance. You lose 3-4% here and there and very quickly, very easily, you're 2X slower. These tests are very real. I'm indexing Wikipedia content, using StandardAnalyzer, running under contrib/benchmark. It's true, in a real app more time will be spent pulling documents from the source, but I'm intentionally trying to minimize that in order to measure just the indexing time. Getting a 4% gain by replacing mergesort with quicksort is real. If the profiler found other 4% gains, with such a small increase in code complexity, I would passionately argue for those as well. So far it hasn't. Robert, if you have some concrete ideas for the 2X-type gains, I'm all ears :) I certainly agree there is a point where complexity cost doesn't offset the performance gain, but I think this particular change is well before that point. Lucene's indexing throughput is an important metric in its competitiveness with other search engines. And I want Lucene to be the best. Mike eks dev wrote: again, as long as you do not make one step forward into actual code, we will continue to have what we have today, as this is the best we have. you made your statement: "Clear code will allow for more radical improvements as more eyes will be able to easily understand the inner workings and offer better algorithms". Not a single person here would ever dispute this statement, but unfortunately there is no compiler that executes such statements. Make a patch that utilizes this "clear-code" paradigm, show us these better algorithms on an actual example, and then say: "without LUCENE-1172 I was able to improve XYZ feature by using ABC algorithm". That would work smoothly. Anyhow, I am not going to write more on this topic, sorry for the noise... And Robert, please do not take this wrong, I see your point and I respect it! 
I just felt a slight unfairness to the people that get their hands dirty writing as clear and fast code as possible. - Original Message From: robert engels <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Monday, 11 February, 2008 9:55:02 AM Subject: Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter I am not disputing that there is a speed improvement. I am disputing that the performance gain of many of these patches is worth the additional complexity in the code. Clear code will allow for more radical improvements, as more eyes will be able to easily understand the inner workings and offer better algorithms, not just micro-improvements that the JVM (eventually) can probably figure out on its own. It is a value judgement, and regrettably I don't have another 30 years to pass down the full knowledge behind my reasoning. Luckily, however, there are some very good books available on the subject... It's not the fault of the submitter, but many of these timings are suspect due to the difficulty of measuring the improvements accurately. Here is a simple example: You can configure the JVM to not perform aggressive garbage collection, and write a program that generates a lot of garbage - but it runs very fast (not GCing), until the GC eventually occurs (if the program runs long enough). It may be overall much slower than an alternative that runs slower as it executes, but has code to manage the objects as they are created, and rarely if ever hits a GC cycle. But then the JVM (e.g. generational GC) can implement improvements that make choice A faster (and the better choice)... and the cycle continues... Without detailed timings and other metrics (GC pauses, IO, memory utilization, native compilation, etc.) most benchmarks are not very accurate or useful. There are a lot of variables to consider - maybe more so than can reasonably be considered. That is why a 4% gain is highly suspect. 
If the gain was 25%, or 50%, or 100%, you have a better chance of it being an innate improvement, and not just the interaction of some other factors. On Feb 11, 2008, at 2:32 AM, eks dev wrote: Robert, you may or may not be right, I do not know. The only way to prove it would be to show you can do it better, no? If you are so convinced this is wrong, you could, much better than quoting textbooks: a) write a better patch, get attention with something you think is a "better bottleneck" b) provide realistic "performance tests", as you dispute the measurement provided here. It has to be that concrete; academic discussions are cool, but at the end of the day, it is the code that executes that counts. cheers, eks - Original Message From: robert engels <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Sunday, 10 February, 2008 9:15:30 PM Subject: Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter I am not sure these numbers matter. I think they are skewed because you are probably running too short a test, and the index is in memory (or OS cache).
Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter
again, as long as you do not make one step forward into actual code, we will continue to have what we have today, as this is the best we have. you made your statement: "Clear code will allow for more radical improvements as more eyes will be able to easily understand the inner workings and offer better algorithms". Not a single person here would ever dispute this statement, but unfortunately there is no compiler that executes such statements. Make a patch that utilizes this "clear-code" paradigm, show us these better algorithms on an actual example, and then say: "without LUCENE-1172 I was able to improve XYZ feature by using ABC algorithm". That would work smoothly. Anyhow, I am not going to write more on this topic, sorry for the noise... And Robert, please do not take this wrong, I see your point and I respect it! I just felt a slight unfairness to the people that get their hands dirty writing as clear and fast code as possible. - Original Message From: robert engels <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Monday, 11 February, 2008 9:55:02 AM Subject: Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter I am not disputing that there is a speed improvement. I am disputing that the performance gain of many of these patches is worth the additional complexity in the code. Clear code will allow for more radical improvements, as more eyes will be able to easily understand the inner workings and offer better algorithms, not just micro-improvements that the JVM (eventually) can probably figure out on its own. It is a value judgement, and regrettably I don't have another 30 years to pass down the full knowledge behind my reasoning. Luckily, however, there are some very good books available on the subject... It's not the fault of the submitter, but many of these timings are suspect due to the difficulty of measuring the improvements accurately. 
Here is a simple example: You can configure the JVM to not perform aggressive garbage collection, and write a program that generates a lot of garbage - but it runs very fast (not GCing), until the GC eventually occurs (if the program runs long enough). It may be overall much slower than an alternative that runs slower as it executes, but has code to manage the objects as they are created, and rarely if ever hits a GC cycle. But then the JVM (e.g. generational GC) can implement improvements that make choice A faster (and the better choice)... and the cycle continues... Without detailed timings and other metrics (GC pauses, IO, memory utilization, native compilation, etc.) most benchmarks are not very accurate or useful. There are a lot of variables to consider - maybe more so than can reasonably be considered. That is why a 4% gain is highly suspect. If the gain was 25%, or 50%, or 100%, you have a better chance of it being an innate improvement, and not just the interaction of some other factors. On Feb 11, 2008, at 2:32 AM, eks dev wrote: > Robert, > > you may or may not be right, I do not know. The only way to prove > it would be to show you can do it better, no? > If you are so convinced this is wrong, you could, much better than > quoting textbooks: > > a) write a better patch, get attention with something you think is > a "better bottleneck" > b) provide realistic "performance tests", as you dispute the > measurement provided here > > It has to be that concrete; academic discussions are cool, but at > the end of the day, it is the code that executes that counts. > > cheers, > eks > > - Original Message > From: robert engels <[EMAIL PROTECTED]> > To: java-dev@lucene.apache.org > Sent: Sunday, 10 February, 2008 9:15:30 PM > Subject: Re: [jira] Created: (LUCENE-1172) Small speedups to > DocumentsWriter > > I am not sure these numbers matter. I think they are skewed because > you are probably running too short a test, and the index is in memory > (or OS cache). 
> > Once you use a real index that needs to read/write from the disk, the > percentage change will be negligible. > > This is the problem with many of these "performance changes" - they > just aren't real world enough. Even if they were, I would argue that > code simplicity/maintainability is worth more than 6 seconds on an > operation that takes 4 minutes to run... > > There are many people that believe micro benchmarks are next to > worthless. A good rule of thumb is that if the optimization doesn't > result in a 2x speedup, it probably shouldn't be done. In most cases > any efficiency gains are later lost in maintainability issues. > > See http://en.wikipedia.org/wiki/Optimization_(computer_science) > > Almost always there is a better bottleneck somewhere. > > On Feb 10, 2008, at 1:37 PM, Michael McCandless wrote: > >> >> Yonik Seeley wrote: >> >>> I wonder how well a single generic quickSort(Object[] arr, int low, >>> int high) would perform vs the type-specific ones? I guess the main >>> overhead would be a cast from Object to the specific class to do >>> the compare? Too bad Java doesn't have true generics/templates. >> >> OK I tested this.
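[Editorial aside, not part of the original thread: a single generic quickSort along the lines Yonik describes might look like the sketch below; the per-comparison cast to `Comparable` is exactly the overhead he asks about. This is an illustrative version, not the code that was actually benchmarked.]

```java
public class GenericQuickSort {
    // One Object[] quicksort for all element types, paying a cast to
    // Comparable on every comparison instead of keeping a type-specific
    // copy of the sort per class.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static void quickSort(Object[] arr, int low, int high) {
        int i = low, j = high;
        Comparable pivot = (Comparable) arr[(low + high) >>> 1];
        while (i <= j) {
            while (((Comparable) arr[i]).compareTo(pivot) < 0) i++;
            while (((Comparable) arr[j]).compareTo(pivot) > 0) j--;
            if (i <= j) {
                Object tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
                i++; j--;
            }
        }
        if (low < j) quickSort(arr, low, j);
        if (i < high) quickSort(arr, i, high);
    }

    public static void main(String[] args) {
        Integer[] a = {5, 1, 4, 2, 3};
        quickSort(a, 0, a.length - 1);
        System.out.println(java.util.Arrays.toString(a)); // [1, 2, 3, 4, 5]
        if (!java.util.Arrays.toString(a).equals("[1, 2, 3, 4, 5]"))
            throw new AssertionError();
    }
}
```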
[jira] Commented: (LUCENE-167) [PATCH] QueryParser not handling queries containing AND and OR
[ https://issues.apache.org/jira/browse/LUCENE-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567556#action_12567556 ] Graham Maloon commented on LUCENE-167: -- I see that very little has been done with this since 2005. Are there any plans to incorporate a fix into the current build? How can I get my hands on a copy of the fix that will work with 2.3.0? > [PATCH] QueryParser not handling queries containing AND and OR > -- > > Key: LUCENE-167 > URL: https://issues.apache.org/jira/browse/LUCENE-167 > Project: Lucene - Java > Issue Type: Bug > Components: QueryParser >Affects Versions: unspecified > Environment: Operating System: Linux > Platform: PC >Reporter: Morus Walter >Assignee: Erik Hatcher > Attachments: LuceneTest.java, QueryParser.jj.patch, QueryParser.patch > > > The QueryParser does not seem to handle boolean queries containing AND and OR > operators correctly: > e.g. > a AND b OR c AND d gets parsed as +a +b +c +d. > The attached patch fixes this by changing the vector of boolean clauses into a > vector of vectors of boolean clauses in the addClause method of the query > parser. A new sub-vector is created whenever an explicit OR operator is used. > Queries using explicit AND/OR are grouped by precedence of AND over OR. That > is, > a OR b AND c gets parsed as a OR (b AND c). > Queries using implicit AND/OR (depending on the default operator) are handled > as > before (so one can still use a +b -c to create one boolean query, where b is > required, c forbidden and a optional). > It's less clear how a query using both explicit AND/OR and implicit operators > should be handled. > Since the patch groups on explicit OR operators, a query > a OR b c is read as a (b c) > whereas > a AND b c as +a +b c > (given that default operator or is used). > There's one issue left: > The old query parser reads a query > `a OR NOT b' as `a -b' which is the same as `a AND NOT b'. > The modified query parser reads this as `a (-b)'. 
> While this looks better (at least to me), it does not produce the result of a > OR > NOT b. Instead the (-b) part seems to be silently dropped. > While I understand that this query is illegal (just searching for one negative > term) I don't think that silently dropping this part is an appropriate way to > deal with that. But I don't think that's a query parser issue. > The only question is if the query parser should take care of that. > I attached the patch (made against 1.3rc3 but working for 1.3final as well) > and > a test program. > The test program parses a number of queries with the default-or and default-and > operators and reparses the result of the toString method of the created query. > It outputs the initial query, the parsed query with default or, its reparsed > query, the parsed query with default and, and its reparsed query. > If called with a -q option, it also runs the queries against an index > consisting of all documents containing one or none of a, b, c or d. > Using an unpatched and a patched version of lucene in the classpath one can > look > at the effect of the patch in detail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
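[Editorial aside, not part of the original thread: the precedence rule the patch describes (AND binds tighter than OR, so clauses are grouped into sub-vectors at each explicit OR) can be sketched with plain string handling. This toy version only handles flat queries with explicit operators and is not the QueryParser code itself.]

```java
import java.util.ArrayList;
import java.util.List;

public class AndOrPrecedence {
    // Split on OR first, then let AND bind tighter inside each group:
    // "a OR b AND c" -> "a OR (b AND c)".
    static String parse(String query) {
        List<String> orGroups = new ArrayList<>();
        for (String group : query.split("\\s+OR\\s+")) {
            String[] terms = group.split("\\s+AND\\s+");
            orGroups.add(terms.length == 1 ? terms[0]
                    : "(" + String.join(" AND ", terms) + ")");
        }
        return String.join(" OR ", orGroups);
    }

    public static void main(String[] args) {
        System.out.println(parse("a OR b AND c"));       // a OR (b AND c)
        System.out.println(parse("a AND b OR c AND d")); // (a AND b) OR (c AND d)
        if (!parse("a OR b AND c").equals("a OR (b AND c)"))
            throw new AssertionError();
        if (!parse("a AND b OR c AND d").equals("(a AND b) OR (c AND d)"))
            throw new AssertionError();
    }
}
```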
[jira] Commented: (LUCENE-1170) query with AND and OR not retrieving correct results
[ https://issues.apache.org/jira/browse/LUCENE-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567550#action_12567550 ] Graham Maloon commented on LUCENE-1170: --- Lucene-167 has a patch for the version in 2005. Has this not been incorporated into the newer releases to fix this problem? > query with AND and OR not retrieving correct results > > > Key: LUCENE-1170 > URL: https://issues.apache.org/jira/browse/LUCENE-1170 > Project: Lucene - Java > Issue Type: Bug > Components: QueryParser >Affects Versions: 2.3 > Environment: linux and windows >Reporter: Graham Maloon > > I was working with Lucene 1.4, and have now upgraded to 2.3.0 but there is > still a problem that I am experiencing with the Queryparser > > I am passing the following queries: > > "big brother" - works fine > "big brother" AND dubai - works fine > "big brother" AND football - works fine > "big brother" AND dubai OR football - returns extra documents which contain > "big brother" but do not contain either dubai or football. > "big brother" AND (dubai OR football) gives the same as the one above > > Am I doing something wrong? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter
The reason it needs to (or should) be done on Unix is that it is much easier (and better, I think) at reporting the "real" timings. What the reporter stated was (most likely) real time, which is not the best way to measure performance - especially on multi-user/multitasking OSes. The Unix time facilities give a better picture of exactly why the program took the amount of time to execute. On Feb 11, 2008, at 1:07 AM, Mike Klaas wrote: Certainly others do agree with you to some degree that this case is on the cost/benefit borderline. Again, this case wasn't really the point. My point was it feels to me that you have, on occasion, been over-quick to criticize without paying sufficiently respectful attention to the details of what is being discussed. For instance, the criticism of "these tests should be done on a *nix platform" to someone who has repeated the tests on OS X (yes, a *nix) and Windows. Or that the test is too short and the index in memory (it was 10MM docs with term vecs on FSDirectory. It is possible that some of the index wasn't fsync'd at the end of each test, I suppose, but I would expect this to be a small amount and equivalent in pre- and post-patch scenarios). Or calling a full index run of 10MM docs a "micro benchmark". I do think that I was unchill in sending the original post to the list instead of to you via personal mail. I shouldn't have. regards, -Mike On 10-Feb-08, at 7:33 PM, robert engels wrote: Please chill. You are inferring something that was not implied. You may think it lacks perspective and respect (I disagree on both), but it certainly doesn't lack in correctness. First, depending on how you measure it, a 2x speedup equates to a 50% reduction in time. In my review of the changes that brought about the biggest performance gains from 1.9 on, almost all were related to avoiding disk accesses by buffering more documents and doing more processing in memory. 
I don't think many of the micro-benchmarks mattered much, and in a JVM environment it is very difficult to prove, as it is going to be heavily JVM and configuration dependent. The main point was that ANY disk access is going to be ORDERS OF MAGNITUDE slower than any of these sorts of optimizations. So either you are loading the index completely in memory (only small indexes, so the difference in speed is not going to matter much), or you might be using a federated system of memory indices (to form a large index), but USUALLY at some point the index must first be created in a persistent store (which is what is covered here), in order to provide realistic restart times, etc.

The author of the patch and timings gives no information as to disk speed, IO speed, controllers, RAID configuration, etc. When creating an index in a persistent store, these factors matter more than a 2-4% speedup. Creating an index completely in memory is then bound by the reading of the data from the disk and/or the network - all much slower than the actual indexing.

Usually optimizations like this only matter in areas of development where the data set is small but the processing large (a lot of numerical analysis). In some cases the data set may also be "large", but then usually the processing is exponentially larger. The building of the index in Lucene is not very computationally expensive.

If you are going to spend hundreds of hours "optimizing", you had best be optimizing the right things. That was the point of the link I sent (the quotes are from people far more capable than I). I was trying to make the point that a 2-4% speedup probably doesn't amount to much in a real environment given all of the other factors, so it is probably better for the project/community to err on the side of code clarity and ease of maintenance. The project can continue to do what it wants (obviously) - but what I was pointing out should be nothing new to experienced designers/developers - I was only offering a reminder.
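[Editorial note] The claim above - that buffering in memory dwarfs instruction-level tweaks because each disk access is orders of magnitude slower - can be sketched with a self-contained comparison of per-byte writes against a buffered stream. Nothing here is Lucene-specific; the byte count and file names are arbitrary choices for the demo:

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BufferingDemo {
    // Write `bytes` single bytes through `out` and return elapsed milliseconds.
    static long timeWrite(OutputStream out, int bytes) throws IOException {
        long t0 = System.nanoTime();
        for (int i = 0; i < bytes; i++) out.write(i & 0xFF);  // one write call per byte
        out.close();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("raw", ".bin");
        Path b = Files.createTempFile("buf", ".bin");
        int n = 200_000;

        // Unbuffered: every write is a trip to the OS.
        long raw = timeWrite(new FileOutputStream(a.toFile()), n);
        // Buffered: writes accumulate in memory and flush in large chunks.
        long buf = timeWrite(new BufferedOutputStream(new FileOutputStream(b.toFile())), n);

        System.out.println("unbuffered: " + raw + " ms, buffered: " + buf + " ms");
        Files.delete(a);
        Files.delete(b);
    }
}
```

On a typical machine the buffered variant wins by a wide margin, which is the same effect as batching more documents in memory before touching the index files.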
It is my observation (others will disagree!), but I think a lot of Lucene has some unneeded esoteric code, where the benefit doesn't match the cost.

On Feb 10, 2008, at 5:48 PM, Mike Klaas wrote:

While I agree in general that excessive optimization at the expense of code clarity is undesirable, you are overstating the point. 2x is a ridiculous threshold to apply to something as performance-critical as a full-text search engine. If search were twice as slow, Lucene would be utterly unusable for me. Indexing is less important than search, of course, but a 2x slowdown would be quite painful there too.

I don't have an opinion in this case: I believe that there is a tradeoff, but that it is the responsibility of the committer(s) to achieve the correct balance--they are the ones who will be maintaining the code, after all. I find your persistence surprising and your tone dangerously near condescending. Telling the gu
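[Editorial note] The "real" vs. CPU time distinction robert wants from the unix time facilities can also be observed from inside the JVM. A minimal sketch using ThreadMXBean (assumes the JVM supports per-thread CPU timing, which most HotSpot builds do):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class TimingDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            System.out.println("per-thread CPU timing unsupported on this JVM");
            return;
        }
        long wallStart = System.nanoTime();
        long cpuStart = bean.getCurrentThreadCpuTime();

        Thread.sleep(200);              // blocked: wall clock advances, CPU time does not
        long x = 0;
        for (int i = 0; i < 50_000_000; i++) x += i;   // busy loop: both advance

        long wallMs = (System.nanoTime() - wallStart) / 1_000_000;
        long cpuMs = (bean.getCurrentThreadCpuTime() - cpuStart) / 1_000_000;
        // Analogous to `time`'s real vs. user: the 200 ms of sleep shows up
        // only in wall-clock time, just as IO waits and scheduling delays would.
        System.out.println("real ~" + wallMs + " ms, cpu ~" + cpuMs + " ms (x=" + x + ")");
    }
}
```

A benchmark reporting only wall-clock time on a multi-tasking OS conflates the program's own work with GC pauses, IO waits, and other processes - which is the measurement complaint made in this thread.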
Re: [jira] Commented: (LUCENE-1173) index corruption autoCommit=false
Yonik Seeley (JIRA) wrote:
> [
> https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567878#action_12567878
> ]
>
> Yonik Seeley commented on LUCENE-1173:
> --------------------------------------
>
> Hold up a bit... my random testing may have hit another bug -
> testMultiConfig hit an error at some point when I cranked up the
> iterations... I'm trying to reproduce.
>

OK, I suggest that we wait a couple of days before we cut 2.3.1 in case there are more problems. We should backport the patches and commit them to the 2.3 branch. At the end of this week I'll then create a 2.3.1 tag, build the release artifacts, and call a vote. Sounds good?

-Michael
[jira] Commented: (LUCENE-1173) index corruption autoCommit=false
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567878#action_12567878 ]

Yonik Seeley commented on LUCENE-1173:
--------------------------------------

Hold up a bit... my random testing may have hit another bug -
testMultiConfig hit an error at some point when I cranked up the
iterations... I'm trying to reproduce.

> index corruption autoCommit=false
> ---------------------------------
>
>                 Key: LUCENE-1173
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1173
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Yonik Seeley
>            Assignee: Michael McCandless
>            Priority: Critical
>         Attachments: indexstress.patch, indexstress.patch, LUCENE-1173.patch
>
>
> In both Lucene 2.3 and trunk, the index becomes corrupted when
> autoCommit=false
[jira] Commented: (LUCENE-1173) index corruption autoCommit=false
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567873#action_12567873 ]

Yonik Seeley commented on LUCENE-1173:
--------------------------------------

Patch looks good (heh... a one-liner!) At least it won't break previously working code, since autoCommit=true is the default. The only risk is people trying out the new setting and not realizing it can break things.

2.3.1 might be nice, but I'll leave it to others (who have the actual time to do the work) to decide.

> index corruption autoCommit=false
> ---------------------------------
>
>                 Key: LUCENE-1173
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1173
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Yonik Seeley
>            Assignee: Michael McCandless
>            Priority: Critical
>         Attachments: indexstress.patch, indexstress.patch, LUCENE-1173.patch
>
>
> In both Lucene 2.3 and trunk, the index becomes corrupted when
> autoCommit=false
Re: [jira] Updated: (LUCENE-1173) index corruption autoCommit=false
OK, I'll backport this fix. I'd also like to backport LUCENE-1168 (another corruption case when autoCommit=false) and LUCENE-1171 (deadlock on hitting OOM).

Mike

Michael Busch wrote:

Michael McCandless (JIRA) wrote:
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1173:
---------------------------------------

Attachment: LUCENE-1173.patch

I just sent email to java-user to give a heads-up on this. Attached patch fixes the issue. All tests pass. I think we should spin 2.3.1 for this one?

+1
Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter
Robert, you may or may not be right, I do not know. The only way to prove it would be to show you can do it better, no?

If you are so convinced this is wrong, you could, much better than quoting textbooks:
a) write a better patch, and get attention with something you think is a "better bottleneck"
b) provide realistic "performance tests", as you dispute the measurements provided here

It has to be that concrete; academic discussions are cool, but at the end of the day, it is the code that executes that counts.

cheers, eks

----- Original Message -----
From: robert engels <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Sunday, 10 February, 2008 9:15:30 PM
Subject: Re: [jira] Created: (LUCENE-1172) Small speedups to DocumentsWriter

I am not sure these numbers matter. I think they are skewed because you are probably running too short a test, and the index is in memory (or the OS cache). Once you use a real index that needs to read/write from the disk, the percentage change will be negligible.

This is the problem with many of these "performance changes" - they just aren't real-world enough. Even if they were, I would argue that code simplicity/maintainability is worth more than 6 seconds on an operation that takes 4 minutes to run...

There are many people who believe micro benchmarks are next to worthless. A good rule of thumb is that if the optimization doesn't result in a 2x speedup, it probably shouldn't be done. In most cases any efficiency gains are later lost in maintainability issues.

See http://en.wikipedia.org/wiki/Optimization_(computer_science)

Almost always there is a better bottleneck somewhere.

On Feb 10, 2008, at 1:37 PM, Michael McCandless wrote:
>
> Yonik Seeley wrote:
>
>> I wonder how well a single generic quickSort(Object[] arr, int low,
>> int high) would perform vs the type-specific ones? I guess the main
>> overhead would be a cast from Object to the specific class to do the
>> compare? Too bad Java doesn't have true generics/templates.
>
> OK I tested this.
>
> Starting from the patch on LUCENE-1172, which has 3 quickSort methods
> (one per type), I created a single quickSort method on Object[] that
> takes a Comparator, and made 3 Comparators instead.
>
> Mac OS X 10.4 (JVM 1.5):
>
>   original patch   --> 247.1
>   simplified patch --> 254.9 (3.2% slower)
>
> Windows Server 2003 R64 (JVM 1.6):
>
>   original patch   --> 440.6
>   simplified patch --> 452.7 (2.7% slower)
>
> The times are the best of 10 runs. I'm running all tests with these
> JVM args:
>
>   -Xms1024M -Xmx1024M -Xbatch -server
>
> I think this is a big enough difference in performance that it's
> worth keeping 3 separate quickSorts in DocumentsWriter.
>
> Mike
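[Editorial note] The ~3% gap Michael measured between the type-specific quickSorts and a single generic one comes down to paying a virtual Comparator call (plus casts) per comparison, where the type-specific version inlines to a direct primitive compare. A minimal sketch of the two shapes - standalone illustrative methods, not the actual DocumentsWriter patch code:

```java
import java.util.Comparator;

public class SortShapes {
    // Generic form: one quicksort for any Object[], paying a Comparator
    // call (and casts inside the Comparator) on every comparison.
    static void quickSort(Object[] a, int lo, int hi, Comparator<Object> c) {
        if (lo >= hi) return;
        Object pivot = a[(lo + hi) >>> 1];
        int i = lo, j = hi;
        while (i <= j) {
            while (c.compare(a[i], pivot) < 0) i++;
            while (c.compare(a[j], pivot) > 0) j--;
            if (i <= j) {
                Object t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
        }
        quickSort(a, lo, j, c);
        quickSort(a, i, hi, c);
    }

    // Type-specific form: the comparison is a direct int compare the JIT
    // can inline, which is where the measured 2.7-3.2% comes back.
    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[(lo + hi) >>> 1];
        int i = lo, j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
        }
        quickSort(a, lo, j);
        quickSort(a, i, hi);
    }
}
```

Duplicating the method per type trades a small amount of code clarity for avoiding that per-comparison indirection - which is precisely the clarity-vs-speed trade being argued over in this thread.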