RE: 1.9 RC1

2006-03-02 Thread Steven Tamm
We just discovered this problem as well. Here's a test case that fails without the fix. Index: src/test/org/apache/lucene/index/TestCompoundFile.java === --- src/test/org/apache/lucene/index/TestCompoundFile.java (revision 382277) ++

Optimization for IndexWriter.addIndexes()

2006-03-15 Thread Steven Tamm
One big performance problem with IndexWriter.addIndexes() is that it has to optimize the index both before and after adding the segments. When you have a very large index, to which you are adding batches of small updates, these calls to optimize make using addIndexes() impossible. It makes parall

[jira] Created: (LUCENE-502) TermScorer caches values unnecessarily

2006-02-28 Thread Steven Tamm (JIRA)
TermScorer caches values unnecessarily -- Key: LUCENE-502 URL: http://issues.apache.org/jira/browse/LUCENE-502 Project: Lucene - Java Type: Improvement Components: Search Versions: 1.9 Reporter: Steven Tamm

[jira] Updated: (LUCENE-502) TermScorer caches values unnecessarily

2006-02-28 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-502?page=all ] Steven Tamm updated LUCENE-502: --- Attachment: TermScorer.patch Here's the patch. Sorry about my lack of proofreading; I saved right as I was leaving work. The main point is that the look

[jira] Created: (LUCENE-505) MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object

2006-03-01 Thread Steven Tamm (JIRA)
Type: Improvement Components: Index Versions: 1.9 Environment: Patch is against Lucene 1.9 trunk (as of Mar 1 06) Reporter: Steven Tamm Attachments: NormFactors.patch MultiReader.norms() is very inefficient: it has to construct a byte array that's as long as all the docu

[jira] Updated: (LUCENE-505) MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object

2006-03-01 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-505?page=all ] Steven Tamm updated LUCENE-505: --- Attachment: NormFactors.patch This patch doesn't include my previous change to TermScorer. It passes all of the lucene unit tests in addition to our s

[jira] Updated: (LUCENE-505) MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object

2006-03-01 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-505?page=all ] Steven Tamm updated LUCENE-505: --- Attachment: NormFactors.patch Sorry, I didn't remove whitespace in the previous patch. This one's easier to read. svn diff --diff-cmd diff -x "

[jira] Commented: (LUCENE-505) MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object

2006-03-01 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-505?page=comments#action_12368381 ] Steven Tamm commented on LUCENE-505: I made the change less for MultiReader than to prevent the instantiation of the fakeNorms array (which is an extra 1MB of useless

[jira] Commented: (LUCENE-505) MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object

2006-03-01 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-505?page=comments#action_12368389 ] Steven Tamm commented on LUCENE-505: > I also worry about performance with this change. Have you benchmarked this > while searching large indexes? yes. see

[jira] Created: (LUCENE-506) Optimize Memory Use for Short-Lived Indexes (Do not load TermInfoIndex if you know the queries ahead of time)

2006-03-01 Thread Steven Tamm (JIRA)
/LUCENE-506 Project: Lucene - Java Type: Improvement Components: Index Versions: 2.0 Environment: Patch against Lucene 1.9 trunk as of Mar 1 06 Reporter: Steven Tamm Summary: Provide a way to avoid loading the TermInfoIndex into memory if you know all the terms you

[jira] Updated: (LUCENE-506) Optimize Memory Use for Short-Lived Indexes (Do not load TermInfoIndex if you know the queries ahead of time)

2006-03-01 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-506?page=all ] Steven Tamm updated LUCENE-506: --- Attachment: Prefetching.patch This also includes two additional test cases. The public exposure to the prefetching is controlled solely by IndexReader.open

[jira] Created: (LUCENE-507) CLONE -[PATCH] remove unused variables

2006-03-01 Thread Steven Tamm (JIRA)
System: other Platform: Other Reporter: Steven Tamm Assigned to: Lucene Developers Priority: Minor Seems I'm the only person who has the "unused variable" warning turned on in Eclipse :-) This patch removes those unused variables and imports (for now only in the "se

[jira] Commented: (LUCENE-507) CLONE -[PATCH] remove unused variables

2006-03-01 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-507?page=comments#action_12368403 ] Steven Tamm commented on LUCENE-507: In Lucene 1.9, there are a lot of unused local variable and unused import warnings. > CLONE -[PATCH] remove unused variab

[jira] Created: (LUCENE-508) SegmentTermEnum.next() doesn't maintain prevBuffer at end

2006-03-01 Thread Steven Tamm (JIRA)
, 2.0 Environment: Lucene Trunk Reporter: Steven Tamm When you're iterating a SegmentTermEnum and you go past the end of the docs, you end up with a state where the nextBuffer = null and the prevBuffer is the penultimate term, not the last term. This patch fixes it. (It's also r
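
The end-of-enumeration behaviour described in this report can be illustrated with a small stand-alone sketch. This is not Lucene's SegmentTermEnum and not the attached patch; `TermCursor` and its methods are hypothetical names invented for illustration. The point it shows is that the "previous" slot should be updated even on the call that steps past the last term, so it ends up holding the last term rather than the penultimate one.

```java
import java.util.List;

/** Hypothetical stand-in for a term enumerator; not Lucene's SegmentTermEnum. */
final class TermCursor {
    private final List<String> terms;
    private int position = -1;       // index of the current term, -1 before the first next()
    private String current = null;   // term returned by the latest successful next()
    private String previous = null;  // term that preceded current

    TermCursor(List<String> terms) {
        this.terms = terms;
    }

    /** Advances one term; returns false once the enumeration is exhausted. */
    boolean next() {
        // Shift current into previous before checking for the end, so that
        // stepping past the last term leaves previous equal to the last term.
        // The null check keeps repeated calls past the end from erasing it.
        if (current != null) {
            previous = current;
        }
        if (position + 1 >= terms.size()) {
            current = null;
            position = terms.size();
            return false;
        }
        position++;
        current = terms.get(position);
        return true;
    }

    String term() { return current; }

    String prev() { return previous; }
}
```

With the terms "a", "b", "c", looping until `next()` returns false leaves `term()` as null and `prev()` as "c".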

[jira] Updated: (LUCENE-505) MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object

2006-03-01 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-505?page=all ] Steven Tamm updated LUCENE-505: --- Attachment: LazyNorms.patch Here's a patch where, if you set LOAD_NORMS_INTO_MEM to false, it will instead load the norms from disk every time.
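
The trade-off behind such a switch can be sketched without Lucene. The class below is hypothetical and is not taken from LazyNorms.patch; it assumes a norms file that stores exactly one byte per document, so a document's norm sits at the file offset equal to its id. With the flag on, the whole file is read once and kept in memory; with it off, every lookup seeks and reads a single byte.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical norms accessor; assumes one norm byte per document in the file. */
final class NormsSource {
    private final RandomAccessFile file;
    private final boolean loadIntoMemory;
    private byte[] cached; // filled on first use, only when loadIntoMemory is true

    NormsSource(RandomAccessFile file, boolean loadIntoMemory) {
        this.file = file;
        this.loadIntoMemory = loadIntoMemory;
    }

    /** Returns the norm byte for doc, from the cached array or straight from disk. */
    byte norm(int doc) throws IOException {
        if (loadIntoMemory) {
            if (cached == null) {
                // One-time cost: memory proportional to the number of documents.
                cached = new byte[(int) file.length()];
                file.seek(0);
                file.readFully(cached);
            }
            return cached[doc];
        }
        // No array is allocated; each call pays for a seek and a one-byte read.
        file.seek(doc);
        return file.readByte();
    }
}
```

The in-memory path is faster per lookup but costs one byte of heap per document; the on-disk path keeps the heap flat at the price of I/O on every call.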

[jira] Commented: (LUCENE-505) MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object

2006-03-01 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-505?page=comments#action_12368443 ] Steven Tamm commented on LUCENE-505: We're still using TermScorer, which generates the fakeNorms() regardless of omitNorms on or off. ConstantTermScorer is a step i

[jira] Created: (LUCENE-509) Performance optimization when retrieving a single field from a document

2006-03-02 Thread Steven Tamm (JIRA)
: Index Versions: 1.9, 2.0 Reporter: Steven Tamm Attachments: DocField.patch If you just want to retrieve a single field from a Document, the only way to do it is to retrieve all the fields from the Document and then search it. This patch is an optimization that allows you

[jira] Updated: (LUCENE-509) Performance optimization when retrieving a single field from a document

2006-03-02 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-509?page=all ] Steven Tamm updated LUCENE-509: --- Attachment: DocField.patch Adds Field IndexReader.docField(int doc, String fieldName) which is more efficient than document(doc).getField(fieldName

[jira] Commented: (LUCENE-509) Performance optimization when retrieving a single field from a document

2006-03-02 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-509?page=comments#action_12368563 ] Steven Tamm commented on LUCENE-509: Ahh yes. I actually just hit this problem with Japanese... I'll post a fix soon. > Performance optimization when retr

[jira] Updated: (LUCENE-509) Performance optimization when retrieving a single field from a document

2006-03-02 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-509?page=all ] Steven Tamm updated LUCENE-509: --- Attachment: DocField_2.patch Now calls readChar() instead of just skipping. > Performance optimization when retrieving a single field from a docum
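
Why skipping has to go character by character can be shown with a stand-alone sketch. It rests on an assumption not stated in the snippet: that a stored string's length prefix counts characters, while the text itself is written in a UTF-8 style encoding of one to three bytes per character. Under that assumption, jumping ahead by "length" bytes only works for ASCII. `Utf8Skip.skipChars` is a hypothetical helper, not code from DocField_2.patch.

```java
/** Hypothetical helper; illustrates skipping by characters rather than bytes. */
final class Utf8Skip {
    /**
     * Returns the offset just past charCount encoded characters starting at pos.
     * Handles one- to three-byte sequences only (code points up to U+FFFF).
     */
    static int skipChars(byte[] buf, int pos, int charCount) {
        for (int i = 0; i < charCount; i++) {
            int lead = buf[pos] & 0xFF;
            if (lead < 0x80) {
                pos += 1;                       // 0xxxxxxx: ASCII
            } else if ((lead & 0xE0) == 0xC0) {
                pos += 2;                       // 110xxxxx: two-byte sequence
            } else if ((lead & 0xF0) == 0xE0) {
                pos += 3;                       // 1110xxxx: three-byte sequence
            } else {
                throw new IllegalArgumentException("unsupported lead byte at offset " + pos);
            }
        }
        return pos;
    }
}
```

For a three-character Japanese string encoded as UTF-8, skipping three characters advances nine bytes, whereas a naive skip of three bytes would stop after the first character.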

[jira] Updated: (LUCENE-509) Performance optimization when retrieving a single field from a document

2006-03-02 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-509?page=all ] Steven Tamm updated LUCENE-509: --- Attachment: DocField_3.patch This includes a UTF-8 test. It fails with the first patch and works with the second. I specifically chose to return only the

[jira] Updated: (LUCENE-511) New BufferedIndexOutput optimization fails to update bufferStart

2006-03-02 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-511?page=all ] Steven Tamm updated LUCENE-511: --- Attachment: BufferedIndexOutput.patch Here are the files in patch format. > New BufferedIndexOutput optimization fails to update buf

[jira] Updated: (LUCENE-509) Performance optimization when retrieving a single field from a document

2006-03-02 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-509?page=all ] Steven Tamm updated LUCENE-509: --- Attachment: DocField_4.patch My previous change didn't affect MultiReader, so it was useless for some of our indexes (i.e. it would work but not be effi

[jira] Updated: (LUCENE-509) Performance optimization when retrieving a single field from a document

2006-03-02 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-509?page=all ] Steven Tamm updated LUCENE-509: --- Attachment: DocField_4b.patch Sorry about that. I didn't regenerate the patch after I added the test case. Apologies: this version works. > Per

[jira] Commented: (LUCENE-502) TermScorer caches values unnecessarily

2006-03-03 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-502?page=comments#action_12368768 ] Steven Tamm commented on LUCENE-502: If you're using a WildcardTermEnum, this optimization saves a ton. We usually do wildcard searches which retrieve 50-5000

[jira] Commented: (LUCENE-502) TermScorer caches values unnecessarily

2006-03-03 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-502?page=comments#action_12368775 ] Steven Tamm commented on LUCENE-502: The main point is this: When you are using TermScorer to score one document, it is doing a lot of extra work. It's reading 31

[jira] Commented: (LUCENE-502) TermScorer caches values unnecessarily

2006-03-03 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-502?page=comments#action_12368784 ] Steven Tamm commented on LUCENE-502: > The conjunctive scorer does not call score(HitCollector,int). This is only > called in a few cases anymore. However, i

[jira] Commented: (LUCENE-507) CLONE -[PATCH] remove unused variables

2006-03-06 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-507?page=comments#action_12369106 ] Steven Tamm commented on LUCENE-507: I haven't included a patch, although I can generate one easily. As a matter of course, javadoc references that aren't incl

[jira] Updated: (LUCENE-507) CLONE -[PATCH] remove unused variables

2006-03-06 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-507?page=all ] Steven Tamm updated LUCENE-507: --- Attachment: Unused.patch This fixes unnecessary casts, unused imports, unused private methods, and unused private variables. Most of the changes were in the

[jira] Created: (LUCENE-528) Optimization for IndexWriter.addIndexes()

2006-03-21 Thread Steven Tamm (JIRA)
Optimization for IndexWriter.addIndexes() - Key: LUCENE-528 URL: http://issues.apache.org/jira/browse/LUCENE-528 Project: Lucene - Java Type: Improvement Components: Index Reporter: Steven Tamm Priority: Minor

[jira] Updated: (LUCENE-528) Optimization for IndexWriter.addIndexes()

2006-03-21 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-528?page=all ] Steven Tamm updated LUCENE-528: --- Attachment: AddIndexes.patch > Optimization for IndexWriter.addIndexes() > - > > Key: LUCENE-528 >

[jira] Updated: (LUCENE-505) MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object

2006-04-12 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-505?page=all ] Steven Tamm updated LUCENE-505: --- Attachment: NormFactors20.patch There was a bug in MultiReader.java where I wasn't handling the caches correctly, specifically in getNormFactors and doSe

[jira] Commented: (LUCENE-509) Performance optimization when retrieving a single field from a document

2006-10-13 Thread Steven Tamm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-509?page=comments#action_12442154 ] Steven Tamm commented on LUCENE-509: [[ Old comment, sent by email on Fri, 23 Jun 2006 09:16:17 -0700 ]] It does. There isn't a real perfor

[jira] Commented: (LUCENE-508) SegmentTermEnum.next() doesn't maintain prevBuffer at end

2008-01-07 Thread Steven Tamm (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556717#action_12556717 ] Steven Tamm commented on LUCENE-508: Thanks! -Steven > SegmentTermEn