We just discovered this problem as well. Here's a test case that fails
without the fix.
Index: src/test/org/apache/lucene/index/TestCompoundFile.java
===
--- src/test/org/apache/lucene/index/TestCompoundFile.java
(revision 382277)
++
One big performance problem with IndexWriter.addIndexes() is that it has
to optimize the index both before and after adding the segments. When
you have a very large index, to which you are adding batches of small
updates, these calls to optimize make using addIndexes() impossible. It
makes parall
TermScorer caches values unnecessarily
--
Key: LUCENE-502
URL: http://issues.apache.org/jira/browse/LUCENE-502
Project: Lucene - Java
Type: Improvement
Components: Search
Versions: 1.9
Reporter: Steven Tamm
[ http://issues.apache.org/jira/browse/LUCENE-502?page=all ]
Steven Tamm updated LUCENE-502:
---
Attachment: TermScorer.patch
Here's the patch
Sorry about my lack of proofreading; I saved right as I was leaving work.
The main point is that the look
Type: Improvement
Components: Index
Versions: 1.9
Environment: Patch is against Lucene 1.9 trunk (as of Mar 1 06)
Reporter: Steven Tamm
Attachments: NormFactors.patch
MultiReader.norms() is very inefficient: it has to construct a byte array
that's as long as all the docu
[ http://issues.apache.org/jira/browse/LUCENE-505?page=all ]
Steven Tamm updated LUCENE-505:
---
Attachment: NormFactors.patch
This patch doesn't include my previous change to TermScorer. It passes all of
the lucene unit tests in addition to our s
[ http://issues.apache.org/jira/browse/LUCENE-505?page=all ]
Steven Tamm updated LUCENE-505:
---
Attachment: NormFactors.patch
Sorry, I didn't remove whitespace in the previous patch. This one's easier to
read.
svn diff --diff-cmd diff -x "
[
http://issues.apache.org/jira/browse/LUCENE-505?page=comments#action_12368381 ]
Steven Tamm commented on LUCENE-505:
I made the change less for MultiReader than to prevent the instantiation of the
fakeNorms array (which is an extra 1MB of useless
[
http://issues.apache.org/jira/browse/LUCENE-505?page=comments#action_12368389 ]
Steven Tamm commented on LUCENE-505:
> I also worry about performance with this change. Have you benchmarked this
> while searching large indexes?
Yes. See
/LUCENE-506
Project: Lucene - Java
Type: Improvement
Components: Index
Versions: 2.0
Environment: Patch against Lucene 1.9 trunk as of Mar 1 06
Reporter: Steven Tamm
Summary: Provide a way to avoid loading the TermInfoIndex into memory if you
know all the terms you
[ http://issues.apache.org/jira/browse/LUCENE-506?page=all ]
Steven Tamm updated LUCENE-506:
---
Attachment: Prefetching.patch
This also includes two additional test cases. The public exposure to the
prefetching is controlled solely by IndexReader.open
System: other
Platform: Other
Reporter: Steven Tamm
Assigned to: Lucene Developers
Priority: Minor
Seems I'm the only person who has the "unused variable" warning turned on in
Eclipse :-) This patch removes those unused variables and imports (for now
only in the "se
[
http://issues.apache.org/jira/browse/LUCENE-507?page=comments#action_12368403 ]
Steven Tamm commented on LUCENE-507:
In Lucene 1.9, there are a lot of local variable and unused import warnings.
> CLONE -[PATCH] remove unused variab
, 2.0
Environment: Lucene Trunk
Reporter: Steven Tamm
When you're iterating a SegmentTermEnum and you go past the end of the docs,
you end up with a state where the nextBuffer = null and the prevBuffer is the
penultimate term, not the last term. This patch fixes it. (It's also r
[ http://issues.apache.org/jira/browse/LUCENE-505?page=all ]
Steven Tamm updated LUCENE-505:
---
Attachment: LazyNorms.patch
Here's a patch where, if you set LOAD_NORMS_INTO_MEM to false, the norms are
read from disk on every access instead of being loaded into memory.
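To illustrate the trade-off behind a switch like LOAD_NORMS_INTO_MEM, here is a minimal sketch. It is not the code in LazyNorms.patch: the class name, the one-byte-per-document file layout, and the use of RandomAccessFile are all assumptions made for illustration. Only the flag name comes from the comment above.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical illustration only; not taken from LazyNorms.patch. */
final class LazyNormsSketch {
    // Flag name from the comment above; its placement here is an assumption.
    static boolean LOAD_NORMS_INTO_MEM = false;

    private final RandomAccessFile file; // assumed layout: one norm byte per document
    private byte[] cached;               // populated only when LOAD_NORMS_INTO_MEM is true

    LazyNormsSketch(RandomAccessFile file) {
        this.file = file;
    }

    /** Returns the norm byte for the given document number. */
    byte norm(int doc) throws IOException {
        if (LOAD_NORMS_INTO_MEM) {
            if (cached == null) {
                // Pay the memory cost once, then serve every lookup from the array.
                cached = new byte[(int) file.length()];
                file.seek(0);
                file.readFully(cached);
            }
            return cached[doc];
        }
        // No array is allocated; every lookup costs a seek and a one-byte read.
        file.seek(doc);
        return file.readByte();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("norms", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.write(new byte[] {10, 20, 30});
            LazyNormsSketch norms = new LazyNormsSketch(raf);
            System.out.println(norms.norm(2)); // read straight from disk
        }
    }
}
```

Reading from disk on every call trades lookup speed for memory, which may only pay off when few documents are scored per query.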
[
http://issues.apache.org/jira/browse/LUCENE-505?page=comments#action_12368443 ]
Steven Tamm commented on LUCENE-505:
We're still using TermScorer, which generates the fakeNorms() regardless of
whether omitNorms is on or off. ConstantTermScorer is a step i
: Index
Versions: 1.9, 2.0
Reporter: Steven Tamm
Attachments: DocField.patch
If you just want to retrieve a single field from a Document, the only way to do
it is to retrieve all the fields from the Document and then search them. This
patch is an optimization that allows you
[ http://issues.apache.org/jira/browse/LUCENE-509?page=all ]
Steven Tamm updated LUCENE-509:
---
Attachment: DocField.patch
Adds
Field IndexReader.docField(int doc, String fieldName)
which is more efficient than document(doc).getField(fieldName
[
http://issues.apache.org/jira/browse/LUCENE-509?page=comments#action_12368563 ]
Steven Tamm commented on LUCENE-509:
Ahh yes. I actually just hit this problem with Japanese... I'll post a fix
soon.
> Performance optimization when retr
[ http://issues.apache.org/jira/browse/LUCENE-509?page=all ]
Steven Tamm updated LUCENE-509:
---
Attachment: DocField_2.patch
Now calls readChar() instead of just skipping.
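
One plausible reading of this fix, sketched under stated assumptions: if a stored string is prefixed with its length in characters rather than bytes, a reader cannot skip it by advancing that many bytes, because non-ASCII text such as Japanese takes more than one byte per character. The class and method below are hypothetical and are not taken from DocField_2.patch. The sketch also assumes characters in the Basic Multilingual Plane only, encoded as one to three UTF-8 bytes each.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not taken from DocField_2.patch. */
final class SkipCharsSketch {
    /**
     * Advances past charCount UTF-8 encoded characters starting at pos and
     * returns the new byte offset. Assumes BMP characters (1 to 3 bytes each).
     */
    static int skipChars(byte[] buf, int pos, int charCount) {
        for (int i = 0; i < charCount; i++) {
            int lead = buf[pos] & 0xFF;
            if (lead < 0x80) {
                pos += 1;            // ASCII: single byte
            } else if ((lead & 0xE0) == 0xC0) {
                pos += 2;            // two-byte sequence
            } else {
                pos += 3;            // three-byte sequence (covers Japanese text)
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        String value = "\u65E5\u672C\u8A9E"; // three Japanese characters
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);

        int naive = value.length();                        // treats the char count as a byte count
        int decoded = skipChars(bytes, 0, value.length()); // inspects each lead byte

        System.out.println("naive=" + naive + " decoded=" + decoded
                + " actual=" + bytes.length);
        // prints "naive=3 decoded=9 actual=9"
    }
}
```

With ASCII-only data the two offsets agree, which is why a bug of this kind can go unnoticed until a non-ASCII field is read.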
> Performance optimization when retrieving a single field from a docum
[ http://issues.apache.org/jira/browse/LUCENE-509?page=all ]
Steven Tamm updated LUCENE-509:
---
Attachment: DocField_3.patch
This includes a UTF-8 test. It fails with the first patch and works with the
second.
I specifically chose to return only the
[ http://issues.apache.org/jira/browse/LUCENE-511?page=all ]
Steven Tamm updated LUCENE-511:
---
Attachment: BufferedIndexOutput.patch
Here are the files in patch format.
> New BufferedIndexOutput optimization fails to update buf
[ http://issues.apache.org/jira/browse/LUCENE-509?page=all ]
Steven Tamm updated LUCENE-509:
---
Attachment: DocField_4.patch
My previous change didn't affect MultiReader, so it was useless for some of our
indexes (i.e. it would work but not be effi
[ http://issues.apache.org/jira/browse/LUCENE-509?page=all ]
Steven Tamm updated LUCENE-509:
---
Attachment: DocField_4b.patch
Sorry about that. I didn't regenerate the patch after I added the test case.
Apologies: this version works.
> Per
[
http://issues.apache.org/jira/browse/LUCENE-502?page=comments#action_12368768 ]
Steven Tamm commented on LUCENE-502:
If you're using a WildcardTermEnum, this optimization saves a ton. We usually
do wildcard searches which retrieve 50-5000
[
http://issues.apache.org/jira/browse/LUCENE-502?page=comments#action_12368775 ]
Steven Tamm commented on LUCENE-502:
The main point is this: When you are using TermScorer to score one document,
it is doing a lot of extra work. It's reading 31
[
http://issues.apache.org/jira/browse/LUCENE-502?page=comments#action_12368784 ]
Steven Tamm commented on LUCENE-502:
> The conjunctive scorer does not call score(HitCollector,int). This is only
> called in a few cases anymore.
However, i
[
http://issues.apache.org/jira/browse/LUCENE-507?page=comments#action_12369106 ]
Steven Tamm commented on LUCENE-507:
I haven't included a patch, although I can generate one easily.
As a matter of course, javadoc references that aren't incl
[ http://issues.apache.org/jira/browse/LUCENE-507?page=all ]
Steven Tamm updated LUCENE-507:
---
Attachment: Unused.patch
This fixes unnecessary casts, unused imports, unused private methods, and
unused private variables. Most of the changes were in the
Optimization for IndexWriter.addIndexes()
-
Key: LUCENE-528
URL: http://issues.apache.org/jira/browse/LUCENE-528
Project: Lucene - Java
Type: Improvement
Components: Index
Reporter: Steven Tamm
Priority: Minor
[ http://issues.apache.org/jira/browse/LUCENE-528?page=all ]
Steven Tamm updated LUCENE-528:
---
Attachment: AddIndexes.patch
> Optimization for IndexWriter.addIndexes()
> -
>
> Key: LUCENE-528
>
[ http://issues.apache.org/jira/browse/LUCENE-505?page=all ]
Steven Tamm updated LUCENE-505:
---
Attachment: NormFactors20.patch
There was a bug in MultiReader.java where I wasn't handling the caches
correctly, specifically in getNormFactors and doSe
[
http://issues.apache.org/jira/browse/LUCENE-509?page=comments#action_12442154 ]
Steven Tamm commented on LUCENE-509:
[[ Old comment, sent by email on Fri, 23 Jun 2006 09:16:17 -0700 ]]
It does. There isn't a real perfor
[
https://issues.apache.org/jira/browse/LUCENE-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556717#action_12556717
]
Steven Tamm commented on LUCENE-508:
Thanks!
-Steven
> SegmentTermEn