Re: Using FastVectorHighlighter for snippets
One more observation: the length of the snippet returned is not equal to the fragment length specified. Does anyone know the reason why?

On Wed, Sep 22, 2010 at 3:05 PM, Devshree Sane wrote:
> Thanks for your reply, Koji.
>
> On Wed, Sep 22, 2010 at 4:51 AM, Koji Sekiguchi wrote:
>> (10/09/22 3:24), Devshree Sane wrote:
>>> I am a bit confused about the parameters that are passed to the
>>> FastVectorHighlighter.getBestFragments() method. One parameter is a
>>> document id and another is the maximum number of fragments. Does it mean
>>> that only the maximum number of fragments will be retrieved from the
>>> document with the given id (even if there are more fragments in the same
>>> document)?
>>
>> Correct.
>
> I did a little experiment with this. Here are my observations: increasing
> the fragment length from 100 to 1000 characters decreased the number of
> fragments returned.
>
> Is this because the document text was covered by a few 1000-character
> fragments? If so, then one fragment can contain more than one occurrence
> of the query term. Is this so? If yes, is there a way to find the number
> of occurrences of the query term inside a particular snippet/fragment?
>
> Also, is there a way to get the beginning and ending positions/offsets in
> the document of the snippet/fragment being returned?
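For context, a minimal sketch of the getBestFragments() call under discussion, against the Lucene 3.x vectorhighlight API (the field name "content" and the fragment sizes are illustrative assumptions, not from the thread):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.vectorhighlight.FastVectorHighlighter;
    import org.apache.lucene.search.vectorhighlight.FieldQuery;

    public class SnippetExample {
        /** Returns up to maxNumFragments snippets of roughly fragCharSize chars each. */
        public static String[] snippets(IndexReader reader, Query query, int docId)
                throws Exception {
            FastVectorHighlighter highlighter = new FastVectorHighlighter();
            FieldQuery fieldQuery = highlighter.getFieldQuery(query);
            // "content" is an illustrative field name; the field must be indexed
            // with term vectors (positions + offsets) for FVH to work at all.
            return highlighter.getBestFragments(fieldQuery, reader, docId,
                    "content", 100 /* fragCharSize */, 3 /* maxNumFragments */);
        }
    }

As far as I can tell, fragCharSize is a target used when slicing fragments around matched terms rather than a hard limit, which would explain snippets coming back shorter or longer than requested; the FieldFragList that FVH builds internally also carries per-fragment start/end offsets, which may help with the offsets question.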
Problem with Numeric range query.
I have a set of documents that all have a "timestamp" field, stored as a long integer. The field is indexed in my Lucene index as a number using NumericField with a precision step of 8:

    Field field = new NumericField("timestamp", 8);
    field.setLongValue(timestampValue);

I do this so I can run numeric range queries to retrieve all documents that fall within a specific time range.

The query I construct has two parts: a query and a filter. I get the document hits as follows:

    IndexReader reader = ...; // some index reader
    IndexSearcher searcher = new IndexSearcher(reader);

    Filter filter = NumericRangeFilter.newLongRange("timestamp", 8, startTime, endTime, false, true);
    Query query = new MatchAllDocsQuery();
    searcher.search(query, filter, myCollector); // myCollector is a subclass of Collector that saves all hits

Occasionally, I have a single document with a very specific timestamp I want to retrieve. Supposing that timestamp is timeX, I create the filter as follows:

    Filter filter = NumericRangeFilter.newLongRange("timestamp", 8, timeX-1, timeX, false, true);

But with this filter, the document that should be found is never found. I have even tried expanding the time range as follows, with no success:

    Filter filter = NumericRangeFilter.newLongRange("timestamp", 8, timeX-1, timeX+500, false, true);

Strangely, a filter that should NOT have found the document actually did find it:

    Filter filter = NumericRangeFilter.newLongRange("timestamp", 8, timeX, timeX+1000, false, true);

This filter should NOT have found the document, since the minInclusive argument is false.

I have also noticed that sometimes, when I have several documents with exactly the same timestamp, a query will return some, but not all, of them.

I have also tried to use a NumericRangeQuery as follows:

    Query query = NumericRangeQuery.newLongRange("timestamp", 8, timeX-1, timeX, false, true);
    searcher.search(query, null, myCollector);

This also fails to return my document(s).

Am I doing something wrong here? Have I misunderstood how this is supposed to work? Has anyone else had problems like this?

Thanks for any help, guidance, or tips you can give me,

-Daniel Sanders
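A quick sanity check on the interval semantics (a sketch; the field name "timestamp", the precision step 8, and the variable timeX are taken from the post above):

    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.NumericRangeFilter;

    public class TimestampFilters {
        // (timeX-1, timeX]: the exclusive lower bound skips timeX-1 itself,
        // so the only value this range can select is timeX.
        public static Filter halfOpen(long timeX) {
            return NumericRangeFilter.newLongRange("timestamp", 8, timeX - 1, timeX, false, true);
        }

        // [timeX, timeX]: the same single-value range, stated directly.
        public static Filter exact(long timeX) {
            return NumericRangeFilter.newLongRange("timestamp", 8, timeX, timeX, true, true);
        }
    }

Both forms should match exactly the documents whose timestamp equals timeX, so if the first misses a document that the second finds, something other than the bounds is wrong.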
RE: Problem with Numeric range query.
Hi,

Can you provide a self-contained test case that shows your problem? In most cases these problems are caused by not committing changes to the IndexWriter before opening the IndexReader.

Additionally, if you only want to look up exactly one timestamp (like a TermQuery), use a NumericRangeQuery with upper and lower inclusive = true, and use the specific value you are searching for as both the upper and the lower bound.

You may also be hitting a bug that is already fixed in SVN (it happens when the lower bound is near Long.MAX_VALUE or the upper bound is near Long.MIN_VALUE): https://issues.apache.org/jira/browse/LUCENE-2541

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
RE: Problem with Numeric range query.
Thank you for your timely response. It's going to take me longer to create an isolated test case you can try; I will see what I can do. In the meantime, I have some follow-up information in response to your other suggestions.

1) I don't think my problem is that the IndexWriter has not committed the document. Here's why: in my test case, I first retrieve a document using a different Lucene query on a different field. From that document I extract the value of the timestamp field and then perform the NumericRangeQuery on that value as described below. I was doing this as a way to create a unit test that would verify that the NumericRangeQuery was working properly. I think the fact that the first query found the document is evidence that the IndexWriter had committed the document. Hence, I would expect that if I follow that query with a NumericRangeQuery, it should be able to find the same document.

2) I also don't think my problem is values near Long.MIN_VALUE or Long.MAX_VALUE. My values are all timestamps, which are positive integers nowhere near those two extremes. The values originally come from the java.util.Date.getTime() method.

3) I will try upper and lower inclusive = true with the same value for min and max, although I don't see how that will change anything. I have actually debugged through the code for NumericRangeQuery: if minInclusive == false, then min is incremented, and if maxInclusive == false, then max is decremented. So my query:

    NumericRangeQuery.newLongRange("timestamp", 8, timeX-1, timeX, false, true)

is essentially equivalent to the query you suggest trying:

    NumericRangeQuery.newLongRange("timestamp", 8, timeX, timeX, true, true)

right?

-Daniel Sanders
RE: Problem with Numeric range query.
Hi,

> Thank you for your timely response.

:-)

> It's going to take me longer to create an isolated test case you can
> try; I will see what I can do.

That would be fine. Often those errors disappear in a simple test, because they turn out to be a problem in the logic somewhere else :) But you should try it in any case.

> In the meantime, I have some follow-up information in response to your
> other suggestions.
>
> 1) I don't think my problem is that the IndexWriter has not committed
> the document. Here's why: in my test case, I first retrieve a document
> using a different Lucene query on a different field. From that document
> I extract the value of the timestamp field and then perform the
> NumericRangeQuery on that value as described below. I was doing this as
> a way to create a unit test that would verify that the NumericRangeQuery
> was working properly. I think the fact that the first query found the
> document is evidence that the IndexWriter had committed the document.
> Hence, I would expect that if I follow that query with a
> NumericRangeQuery, it should be able to find the same document.

Yes. But are you sure that the timestamp is also indexed? If it is stored only, the query would not find it. Or maybe the other way round.

> 2) I also don't think my problem is values near Long.MIN_VALUE or
> Long.MAX_VALUE. My values are all timestamps, which are positive
> integers nowhere near those two extremes. The values originally come
> from the java.util.Date.getTime() method.
>
> 3) I will try upper and lower inclusive = true with the same value for
> min and max, although I don't see how that will change anything. I have
> actually debugged through the code for NumericRangeQuery: if
> minInclusive == false, then min is incremented, and if maxInclusive ==
> false, then max is decremented. So my query:
>
>     NumericRangeQuery.newLongRange("timestamp", 8, timeX-1, timeX, false, true)
>
> is essentially equivalent to the query you suggest trying:
>
>     NumericRangeQuery.newLongRange("timestamp", 8, timeX, timeX, true, true)
>
> right?

Yes, it is the same. The Lucene test TestNumericRangeQuery64.testOneMatchQuery() verifies the upper = lower, inclusive = true case.

Uwe
RE: Problem with Numeric range query.
I'm certain the timestamp field is being indexed. It is created as follows:

    Document doc = new Document();
    NumericField timeField = new NumericField("timestamp", 8); // defaults to indexed = true
    timeField.setLongValue(timeX);
    doc.add(timeField);
    ...
    indexWriter.addDocument(doc);
    ...
    indexWriter.commit();

-Daniel
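A minimal self-contained round trip along the lines Uwe asked for might look like this (a sketch against the Lucene 3.0.x API; the RAMDirectory, the analyzer choice, and the timestamp value are illustrative assumptions):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.NumericField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.NumericRangeQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class NumericRoundTrip {
        public static void main(String[] args) throws Exception {
            long timeX = System.currentTimeMillis(); // illustrative timestamp
            RAMDirectory dir = new RAMDirectory();

            // Index one document with the timestamp, then commit via close().
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_30), true,
                    IndexWriter.MaxFieldLength.UNLIMITED);
            Document doc = new Document();
            doc.add(new NumericField("timestamp", 8).setLongValue(timeX));
            writer.addDocument(doc);
            writer.close();

            // Exact-match range query: lower == upper, both inclusive.
            IndexSearcher searcher = new IndexSearcher(dir, true);
            NumericRangeQuery<Long> query = NumericRangeQuery.newLongRange(
                    "timestamp", 8, timeX, timeX, true, true);
            TopDocs hits = searcher.search(query, 10);
            System.out.println("hits: " + hits.totalHits); // expected: 1
            searcher.close();
        }
    }

If this passes but the real application still misses documents, the difference between the two setups is the place to look.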
ArrayIndexOutOfBoundsException when iterating over TermDocs
Hi,

A user got this very strange exception, and I managed to get the index that it happens on. Basically, iterating over the TermDocs causes an AIOOB exception. I easily reproduced it using the FieldCache, which does exactly that (the field in question is indexed as numeric). Here is the exception:

    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114
        at org.apache.lucene.util.BitVector.get(BitVector.java:104)
        at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
        at org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183)
        at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470)
        at TestMe.main(TestMe.java:56)

It happens on the following segment: _26t docCount: 914 delCount: 1 delFileName: _26t_1.del

As you can see, it smells like a corner case (it fails for document number 912; the AIOOB happens from the deleted docs). The code to recreate it is simple:

    FSDirectory dir = FSDirectory.open(new File("index"));
    IndexReader reader = IndexReader.open(dir, true);

    IndexReader[] subReaders = reader.getSequentialSubReaders();
    for (IndexReader subReader : subReaders) {
        // java.lang.reflect.Field, used to pull out the private "si" member
        Field field = subReader.getClass().getSuperclass().getDeclaredField("si");
        field.setAccessible(true);
        SegmentInfo si = (SegmentInfo) field.get(subReader);
        System.out.println("--> " + si);
        if (si.getDocStoreSegment().contains("_26t")) {
            // this is the problematic one...
            System.out.println("problematic one...");
            FieldCache.DEFAULT.getLongs(subReader, "__documentdate", FieldCache.NUMERIC_UTILS_LONG_PARSER);
        }
    }

Here is the result of a CheckIndex run on that segment:

    8 of 10: name=_26t docCount=914
      compound=true
      hasProx=true
      numFiles=2
      size (MB)=1.641
      diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.}
      has deletions [delFileName=_26t_1.del]
      test: open reader.........OK [1 deleted docs]
      test: fields..............OK [32 fields]
      test: field norms.........OK [32 fields]
      test: terms, freq, prox...ERROR [114]
    java.lang.ArrayIndexOutOfBoundsException: 114
        at org.apache.lucene.util.BitVector.get(BitVector.java:104)
        at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
        at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102)
        at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
        at TestMe.main(TestMe.java:47)
      test: stored fields.......ERROR [114]
    java.lang.ArrayIndexOutOfBoundsException: 114
        at org.apache.lucene.util.BitVector.get(BitVector.java:104)
        at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
        at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
        at TestMe.main(TestMe.java:47)
      test: term vectors........ERROR [114]
    java.lang.ArrayIndexOutOfBoundsException: 114
        at org.apache.lucene.util.BitVector.get(BitVector.java:104)
        at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
        at org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:515)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
        at TestMe.main(TestMe.java:47)

The creation of the index does not do anything fancy (all defaults), though there is usage of the near-real-time aspect (IndexWriter#getReader), which does complicate deleted-docs handling. It seems like the deleted docs got written without matching the number of docs? Sadly, I don't have something that recreates this from scratch, but I do have the index if someone wants to have a look at it (mail me directly and I will provide a download link).

I will continue to investigate why this might happen; I'm just wondering if someone has stumbled on this exception before. Lucene 3.0.2 is used.

-shay.banon
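For completeness, a CheckIndex run like the one above can also be reproduced programmatically; a sketch against the 3.0.x API (the index path is illustrative):

    import java.io.File;
    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.FSDirectory;

    public class CheckMyIndex {
        public static void main(String[] args) throws Exception {
            FSDirectory dir = FSDirectory.open(new File("index")); // path is illustrative
            CheckIndex checker = new CheckIndex(dir);
            checker.setInfoStream(System.out); // print the per-segment details
            CheckIndex.Status status = checker.checkIndex();
            System.out.println(status.clean ? "index is clean" : "index is corrupt");
            dir.close();
        }
    }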
In lucene 2.3.2, needs to stop optimization?
Hi,

We are using Lucene 2.3.2, and we now need to index each document as fast as possible, so that users can search it almost immediately. I am therefore considering stopping IndexWriter merging during normal hours; then, in a relatively quiet period like late night, we can call the IndexWriter optimize method explicitly once.

What is the most efficient way to completely turn off IndexWriter merging in Lucene 2.3.2?

Thanks very much for any help,

Lisheng
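A sketch of the usual 2.3.x workaround, assuming the goal is simply that segments are never merged during indexing (the index path and analyzer are illustrative assumptions):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class NoMergeIndexing {
        public static void main(String[] args) throws Exception {
            // create=false appends to an existing index; the path is illustrative.
            IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), false);
            // With a huge merge factor, segments are (practically) never merged
            // during indexing; run an explicit optimize() off-peak instead.
            writer.setMergeFactor(Integer.MAX_VALUE);
            // ... writer.addDocument(...) calls here ...
            writer.close();
        }
    }

The trade-off, as noted later in this thread, is that search across many small unmerged segments slows down, so the deferred optimize() cannot be skipped indefinitely.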
Re: ArrayIndexOutOfBoundsException when iterating over TermDocs
Shay, would you mind opening a Jira issue for that?

simon
RE: In lucene 2.3.2, needs to stop optimization?
Hi,

I read the documentation and code and did some experiments. One possibility is to raise mergeFactor to a very high value, say close to 2 billion; then a lot of small files are created, and after more than 500 docs had been indexed separately, search speed dropped sharply.

I noticed that with our current data, if I add one doc and then call optimize(), it takes about 7s; this is too slow for real-time search.

If I keep mergeFactor at 10 and do not call optimize(), does that mean IndexWriter will merge in the background from time to time, and when that happens it may take a few seconds (so indexing will be delayed by a few seconds)?

Should I use a high mergeFactor and optimize once a day, or use the default mergeFactor and not call optimize()? Maybe the latter is better, but I am concerned about the occasional slowness.

Currently I do not plan to keep the IndexWriter constantly open, but to open/close it for each index request.

Any suggestions for improvement would be appreciated,

Lisheng
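If the plan is a high mergeFactor plus a once-a-day optimize, the off-peak call could be scheduled from within the application; a sketch using java.util.Timer (the 3 AM schedule, index path, and analyzer are illustrative assumptions, and it presumes no other IndexWriter holds the index lock at that time):

    import java.util.Calendar;
    import java.util.Date;
    import java.util.Timer;
    import java.util.TimerTask;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class NightlyOptimize {
        public static void main(String[] args) {
            // Non-daemon timer so it keeps the JVM alive in this sketch;
            // in a real application this would run inside the server process.
            Timer timer = new Timer("nightly-optimize");
            Calendar threeAm = Calendar.getInstance();
            threeAm.set(Calendar.HOUR_OF_DAY, 3);
            threeAm.set(Calendar.MINUTE, 0);
            threeAm.set(Calendar.SECOND, 0);
            if (threeAm.getTime().before(new Date())) {
                threeAm.add(Calendar.DAY_OF_MONTH, 1); // next 3 AM
            }
            timer.scheduleAtFixedRate(new TimerTask() {
                public void run() {
                    try {
                        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), false);
                        writer.optimize(); // merge everything down to one segment, off-peak
                        writer.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }, threeAm.getTime(), 24L * 60 * 60 * 1000);
        }
    }

This pays the multi-second merge cost once a night instead of on every document add.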