custom FieldCache costs too much time. how can I preload the custom FieldCache when a new segment exists?
Custom FieldCache loading costs too much time, so every time we reopen a new
reader the first search suffers. I hope someone can tell me how I can preload
the custom FieldCache when a new segment exists. Thanks again!

Here is the source, from our FieldComparator.setNextReader method:

    ((C2CFieldManager) fieldManager).lCommID =
        FieldCache.DEFAULT.getLongs(reader, "1", new LongParser() {
          public long parseLong(String documentIDStr) {
            documentIDStr = documentIDStr.substring(16);
            long documentID = Long.parseLong(documentIDStr, 16);
            return documentID;
          }
        });
Re: custom FieldCache costs too much time. how can I preload the custom FieldCache when a new segment exists?
How are you opening a new reader?  If it's a near-real-time reader
(IndexWriter.getReader), or you use IndexReader.reopen, it should only be the
newly created segments that have to generate the field cache entry, which most
of the time should be fast.

If you are already using those APIs and it's still not fast enough, then you
should just warm the reader before using it (additionally, for a near-real-time
reader you should warm newly merged segments by installing a
mergedSegmentWarmer on the writer).

Mike

On Sat, Feb 27, 2010 at 3:35 AM, luocanrao wrote:
> custom FieldCache cost too much time.
> [...]
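For readers hitting the same issue, a minimal sketch of the reopen-and-warm
idea. The field name "1" and the substring/hex parser come from the original
post; the class and method names are illustrative, not from this thread:

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;

    public class ReaderWarming {
      // Share one parser instance: FieldCache keys its entries on the field
      // name and the parser, so warming and searching must use the same
      // parser object for the warmed entry to be reused.
      public static final FieldCache.LongParser COMM_ID_PARSER = new FieldCache.LongParser() {
        public long parseLong(String documentIDStr) {
          return Long.parseLong(documentIDStr.substring(16), 16);
        }
      };

      public static IndexReader reopenAndWarm(IndexReader current) throws IOException {
        IndexReader newReader = current.reopen();
        if (newReader != current) {
          // Only segments created since the last reopen pay the FieldCache
          // loading cost here; the other segments are already cached.
          for (IndexReader sub : newReader.getSequentialSubReaders()) {
            FieldCache.DEFAULT.getLongs(sub, "1", COMM_ID_PARSER);
          }
        }
        return newReader;
      }
    }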
Re: recovering payload from fields
You can also access payloads through the TermPositions enum, but this is by
term and then by doc.

It sounds like you need to iterate through all terms sequentially in a given
field in the doc, accessing offset & payload?  In which case reanalyzing at
search time may be the best way to go.

You can store term vectors in the index, which will store offsets (if you ask
it to), but payloads are not currently stored with term vectors.

Mike

On Fri, Feb 26, 2010 at 7:42 PM, Christopher Condit wrote:
>> Payload data is accessed through PayloadSpans, so using SpanQueries is the
>> entry point, it seems. There are tools like PayloadSpanUtil that convert
>> other queries into SpanQueries for this purpose if needed, but the bottom
>> line is that the API for payloads goes through Spans.
>
> So there's no way to iterate through all the payloads for a given field? I
> can't use the SpanQuery mechanism because in this case the entire field will
> be displayed - and I can't search for "*". Is there some trick I'm not
> thinking of?
>
>> this is the tip of the iceberg; a big dangerous iceberg...
>
> Yes - I'm beginning to see that...
>
> Thanks,
> -Chris
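As a rough illustration of the TermPositions route Mike mentions, a sketch
that walks every posting of one term and reads its payloads. The field
"contents" and the term text "someTerm" are placeholders, not from this
thread:

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermPositions;

    public class PayloadsByTerm {
      public static void dump(IndexReader reader) throws IOException {
        TermPositions tp = reader.termPositions(new Term("contents", "someTerm"));
        try {
          while (tp.next()) {                   // each document containing the term
            int freq = tp.freq();
            for (int i = 0; i < freq; i++) {    // each position within that document
              int position = tp.nextPosition();
              if (tp.isPayloadAvailable()) {
                byte[] payload = tp.getPayload(new byte[tp.getPayloadLength()], 0);
                System.out.println("doc=" + tp.doc() + " pos=" + position
                    + " payload bytes=" + payload.length);
              }
            }
          }
        } finally {
          tp.close();
        }
      }
    }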
Re: Infinite loop when searching empty index
I turned this into a unit test... but I don't see it never returning... the
test passes.

How did you create your empty reader?

Patch:

Index: src/test/org/apache/lucene/search/TestMatchAllDocsQuery.java
===================================================================
--- src/test/org/apache/lucene/search/TestMatchAllDocsQuery.java	(revision 916939)
+++ src/test/org/apache/lucene/search/TestMatchAllDocsQuery.java	(working copy)
@@ -27,6 +27,8 @@
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.queryParser.QueryParser;
 import org.apache.lucene.store.RAMDirectory;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.MockRAMDirectory;

 import org.apache.lucene.util.LuceneTestCase;

@@ -115,6 +117,21 @@
     dir.close();
   }

+  public void testNeverReturns() throws Exception {
+    Directory dir = new MockRAMDirectory();
+    IndexWriter w = new IndexWriter(dir, new StandardAnalyzer(TEST_VERSION_CURRENT), IndexWriter.MaxFieldLength.UNLIMITED);
+    IndexReader r = w.getReader();
+    w.close();
+
+    assertEquals(0, r.numDocs());  // empty index
+    IndexSearcher s = new IndexSearcher(r);
+    TopDocsCollector collector = TopScoreDocCollector.create(0, true);
+    s.search(new MatchAllDocsQuery(), collector);  // never returns
+    s.close();
+    r.close();
+    dir.close();
+  }
+
   public void testEquals() {
     Query q1 = new MatchAllDocsQuery();
     Query q2 = new MatchAllDocsQuery();

Mike

On Fri, Feb 26, 2010 at 4:54 PM, Justin wrote:
> Is this a bug in Lucene Java as of tr...@915399?
>
>     int numDocs = reader.numDocs();  // = 0 (empty index)
>     TopDocsCollector collector = TopScoreDocCollector.create(numDocs, true);
>     searcher.search(new MatchAllDocsQuery(), collector);  // never returns
>
>     // Searcher
>     public void search(Query query, Collector collector) throws IOException {
>       search(createWeight(query), null, collector);  // never returns
>     }
>
>     // extends IndexSearcher
>     public void search(Weight weight, Filter filter, final Collector collector) throws IOException {
>       boolean topScorer = (filter == null) ? true : false;
>       Scorer scorer = weight.scorer(reader, true, topScorer);
>       if (scorer != null && topScorer) {
>         scorer.score(collector);  // never returns
>
>     // Scorer
>     public void score(Collector collector) throws IOException {
>       collector.setScorer(this);
>       int doc;
>       while ((doc = nextDoc()) != NO_MORE_DOCS) {  // doc = 0 (infinite)
>         collector.collect(doc);
>       }
>     }
>
> Thanks for any feedback,
> Justin
RE: Infinite loop when searching empty index
I was doing the same; MatchAllDocsScorer is fine and also AbstractAllTermDocs.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Saturday, February 27, 2010 11:52 AM
> To: java-user@lucene.apache.org
> Subject: Re: Infinite loop when searching empty index
>
> I turned this into a unit test... but I don't see it never
> returning... the test passes.
>
> How did you create your empty reader?
>
> [...]
Re: custom FieldCache costs too much time. how can I preload the custom FieldCache when a new segment exists?
I set the merge factor to 4, and every five minutes I reopen the reader.
Yes, most of the time it is very fast, but sometimes it is very slow. For
example, when the program starts, the first query consumes 10s! And when a
newly created segment has been generated, a query consumes more than 1s.
Performance is a key point for us. Sorry, my English is not good!

I hope I can preload the custom FieldCache in another thread, not the query
thread, so I will not have a performance issue.

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: February 27, 2010 18:37
To: java-user@lucene.apache.org
Subject: Re: custom FieldCache costs too much time. how can I preload the custom FieldCache when a new segment exists?

How are you opening a new reader?  If it's a near-real-time reader
(IndexWriter.getReader), or you use IndexReader.reopen, it should only be the
newly created segments that have to generate the field cache entry, which most
of the time should be fast.
[...]
Re: custom FieldCache costs too much time. how can I preload the custom FieldCache when a new segment exists?
PS: in our environment a document has more than ten fields, and in a short
time there may be many updates.

About installing a mergedSegmentWarmer on the writer: can you give me a small
example? Thanks very much!

-----Original Message-----
From: luocanrao [mailto:luocan19826...@sohu.com]
Sent: February 27, 2010 19:09
To: java-user@lucene.apache.org
Subject: Re: custom FieldCache costs too much time. how can I preload the custom FieldCache when a new segment exists?

I set the merge factor to 4, and every five minutes I reopen the reader.
[...]
Re: NumericField exact match
On Fri, Feb 26, 2010 at 3:33 PM, Ivan Vasilev wrote:
> Does the precision step matter when I use NumericRangeQuery for exact
> matches?

No.  There is a full-precision version of the value indexed regardless of the
precision step, and that's used for an exact match query.

> I mean, if I use the default precision step when indexing that field, is it
> guaranteed that:
> 1. With this query I will always hit the docs that contain "val" for the
> "field";
> 2. I will never hit docs that have a different "val" for the "field";

Correct.

-Yonik
http://www.lucidimagination.com
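For illustration, a tiny sketch of such an exact-match query; the field name
and value are made up, only the min == max, both-ends-inclusive pattern
matters:

    import org.apache.lucene.search.NumericRangeQuery;
    import org.apache.lucene.search.Query;

    public class ExactNumericMatch {
      public static Query exactLong(String field, long value) {
        // min == max with both ends inclusive is an exact match on the
        // full-precision value, whatever precision step was used at index time.
        return NumericRangeQuery.newLongRange(field, value, value, true, true);
      }
    }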
Re: If you could have one feature in Lucene...
Pluggable compression allowing for alternatives to gzip for text compression
when storing fields.

Specifically I am interested in bzip2 [1] as implemented in Apache Commons
Compress [2].  While bzip2 compression is considerably slower than gzip
(although decompression is not too much slower than gzip), it compresses much
better than gzip (especially text).

Having the choice would be helpful, and for Lucene usage with non-text
indexing, content-specific compression algorithms may outperform the default
gzip.

And in these days of multi-core / multi-threading, perhaps we could convince
the Apache Commons Compress team to implement a parallel Java version of bzip2
compression (theirs is single threaded), like pbzip2 [3].

-glen

[1] http://en.wikipedia.org/wiki/Bzip2
[2] http://commons.apache.org/compress/
[3] http://compression.ca/pbzip2/

On 24 February 2010 08:42, Grant Ingersoll wrote:
> What would it be?
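For anyone curious what this would look like on the application side, a
hedged sketch using the bzip2 streams from Apache Commons Compress. It assumes
that library is on the classpath; the helper class itself is illustrative:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
    import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;

    public class Bzip2Util {
      public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        BZip2CompressorOutputStream bz = new BZip2CompressorOutputStream(bos);
        bz.write(raw);
        bz.close();                               // finishes the bzip2 stream
        return bos.toByteArray();
      }

      public static byte[] decompress(byte[] compressed) throws IOException {
        BZip2CompressorInputStream in =
            new BZip2CompressorInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        for (int n; (n = in.read(buffer)) != -1; ) {
          out.write(buffer, 0, n);
        }
        in.close();
        return out.toByteArray();
      }
    }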
RE: If you could have one feature in Lucene...
Hi Glen,

> Pluggable compression allowing for alternatives to gzip for text
> compression for storing.
> Specifically I am interested in bzip2[1] as implemented in Apache
> Commons Compress[2].
> [...]

Since version 3.0 / 2.9 of Lucene, compression support has been removed
entirely (in 2.9 it is still available, but deprecated).  All you now have to
do is simply store your compressed stored fields as a byte[] (see the Field
javadocs).  That way you can use any compression.  The problems with gzip and
the other available compression algorithms led us to remove the compression
support from Lucene (it had lots of problems).

In general the way to go is: create a ByteArrayOutputStream and wrap it with
any compression filter, then feed your data in and use
"new Field(name, stream.toByteArray())".  On the client side just do the
inverse (Document.getBinaryValue(), create an input stream on top of the
byte[] and decompress).

Uwe
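A minimal sketch of Uwe's suggestion, following his description of the binary
Field constructor and Document.getBinaryValue. The field name "body" is
assumed, and the actual compressor is left to the application (gzip, bzip2, or
anything else with a stream API):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class CompressedFieldExample {
      // indexing side: the byte[] is already compressed by whatever codec you chose
      public static void addCompressedBody(Document doc, byte[] compressedBody) {
        doc.add(new Field("body", compressedBody));   // binary fields are always stored
      }

      // search side: get the raw bytes back and decompress with the matching codec
      public static byte[] getCompressedBody(Document hit) {
        return hit.getBinaryValue("body");
      }
    }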
Re: If you could have one feature in Lucene...
Hello Uwe.

That will teach me for not keeping up with the versions! :-)

So it is up to the application to keep track of what it used for compression.
Understandable.

Thanks!
Glen

On 27 February 2010 10:17, Uwe Schindler wrote:
> Hi Glen,
>
>> Pluggable compression allowing for alternatives to gzip for text
>> compression for storing.
>> [...]
>
> Since Version 3.0 / 2.9 of Lucene compression support was removed entirely
> (in 2.9 still avail as deprecated). All you now have to do is simply store
> your compressed stored fields as a byte[] (see Field javadocs).
> [...]
Re: FieldCache costs too much time. how can I preload the custom FieldCache when a new segment exists?
Sounds like you should simply open & warm the reader in a background thread...

You might want to use the SearcherManager class from upcoming Lucene in Action
2nd edition (NOTE: I'm a co-author).  You can download the source code @
http://manning.com/hatcher3.

Mike
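Not the LIA2 SearcherManager itself, but a rough sketch of the background
open-and-warm idea under simplifying assumptions. In particular it closes the
old reader immediately, which is only safe if no search is still running
against it (proper reference counting is exactly what SearcherManager adds):

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    public class BackgroundWarmer implements Runnable {
      private volatile IndexSearcher current;

      public BackgroundWarmer(IndexReader initialReader) {
        this.current = new IndexSearcher(initialReader);
      }

      public IndexSearcher getSearcher() {
        return current;                 // query threads read the volatile reference
      }

      public void run() {               // schedule this, e.g. every few minutes
        try {
          IndexReader old = current.getIndexReader();
          IndexReader reopened = old.reopen();
          if (reopened != old) {
            warm(reopened);             // build FieldCache entries off the query path
            current = new IndexSearcher(reopened);
            old.close();                // unsafe if a search still holds the old reader
          }
        } catch (IOException e) {
          // log and try again on the next run
        }
      }

      private void warm(IndexReader reader) throws IOException {
        // application-specific warming, e.g. FieldCache.DEFAULT.getLongs(reader, ...)
      }
    }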
Re: custom FieldCache costs too much time. how can I preload the custom FieldCache when a new segment exists?
If you look at the javadocs for IndexWriter it explains how to do it.  You
just provide a class that implements the warm method, and inside that method
you do whatever app-specific things you need to do to the provided IndexReader
to warm it.

Note that the SearcherManager class from LIA2 handles setting up the
MergedSegmentWarmer, if you implement the warm method.

Mike
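A small, hedged example of what installing such a warmer could look like. The
FieldCache call stands in for whatever your warm method needs to touch; use
the same field and parser your comparator uses so the warmed cache entry
actually matches:

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.FieldCache;

    public class InstallWarmer {
      public static void install(IndexWriter writer) {
        writer.setMergedSegmentWarmer(new IndexWriter.IndexReaderWarmer() {
          public void warm(IndexReader reader) throws IOException {
            // Called with the newly merged segment's reader before it becomes
            // visible to a near-real-time reader.
            FieldCache.DEFAULT.getLongs(reader, "1");
          }
        });
      }
    }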
Re: Infinite loop when searching empty index
Thanks for checking.  I think I tracked down the problem.  We apparently
extended most of these classes, and more work was necessary to accommodate the
latest API.  I just didn't dig deep enough, into nextDoc(), which I thought
too trivial to step into.  The extended Scorer repeatedly returned the last
doc ID instead of NO_MORE_DOCS.  Sorry for the wild goose chase!

I do think I've run across another problem, which I may report in a new
thread: ParallelReader.reopen() appears to be taking up file descriptors to
the same files without letting the old ones go.  Our Java process hits the
'ulimit -n' barrier (in the thousands) after a few minutes.  My current
workaround is to check isCurrent(), close, then open.  I wonder if the changes
to support near-real-time search inadvertently broke this.

----- Original Message -----
From: Uwe Schindler
To: java-user@lucene.apache.org
Sent: Sat, February 27, 2010 4:55:55 AM
Subject: RE: Infinite loop when searching empty index

I was doing the same, MatchAllDocsScorer is fine and also AbstractAllTermDocs.
[...]
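A hedged sketch of the close-then-open workaround Justin describes, with
illustrative directories and read-only sub-readers (it is a workaround, not a
fix for the reopen() behaviour he reports):

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.ParallelReader;
    import org.apache.lucene.store.Directory;

    public class ParallelReopenWorkaround {
      public static ParallelReader refresh(ParallelReader current,
                                           Directory dirA, Directory dirB) throws IOException {
        if (current.isCurrent()) {
          return current;                          // nothing changed, keep it
        }
        current.close();                           // release all file descriptors
        ParallelReader fresh = new ParallelReader();
        fresh.add(IndexReader.open(dirA, true));   // read-only sub-readers
        fresh.add(IndexReader.open(dirB, true));
        return fresh;
      }
    }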
Re: Infinite loop when searching empty index
Hmm -- can you give more details on the possible file descriptor leak?  Or a
test case?

Thanks.

Mike

On Sat, Feb 27, 2010 at 12:24 PM, Justin wrote:
> Thanks for checking. I think I tracked down the problem. We apparently
> extended most of these classes and more work was necessary to accommodate
> the latest API. I just didn't dig deep enough, into nextDoc() which I
> thought too trivial to step into. The extended Scorer repeatedly returned
> the last doc ID instead of NO_MORE_DOCS. Sorry for the wild goose chase!
>
> I do think I've run across another problem which I may report in a new
> thread: ParallelReader.reopen() appears to be taking up file descriptors to
> the same files without letting the old ones go. Our Java process hits the
> 'ulimit -n' barrier (in the thousands) after a few minutes. My current
> workaround is to check isCurrent(), close, then open. I wonder if the
> changes to support near-real-time search inadvertently broke this.
>
> [...]
Changing TF method
Hi,

I want to change Lucene's similarity in a way that lets me add fuzzy
memberships to the terms of a document.  Thus the TF value of a term in one
document is not always 1; it can be, say, 0.7 (in my application, each term is
contained in a document at most once).  This membership value is available
before index time.  On the other hand, each occurrence of a word should then
not be counted as 1 toward the document frequency in the IDF formula.

I was wondering if I can change the TF and IDF values of the terms like this.
So far, I know that I can change the impact of TF values on the scoring, but
that is not what I'm looking for.

Best,
Reza
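Not a complete answer to the per-term membership idea, but a sketch of the
hook Lucene does expose: a Similarity subclass can reshape how raw term
frequency and document frequency enter the score. The concrete numbers below
are assumptions for illustration only; per-document membership values would
still have to be supplied some other way (for example via payloads written at
index time):

    import org.apache.lucene.search.DefaultSimilarity;

    public class MembershipSimilarity extends DefaultSimilarity {
      @Override
      public float tf(float freq) {
        // With at most one occurrence per document, freq is 0 or 1 here;
        // the fixed 0.7 factor is only illustrative.
        return freq == 0f ? 0f : 0.7f;
      }

      @Override
      public float idf(int docFreq, int numDocs) {
        // Illustrative only: flatten the IDF contribution entirely.
        return 1.0f;
      }
    }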
RE: recovering payload from fields
> It sounds like you need to iterate through all terms sequentially in a given
> field in the doc, accessing offset & payload?  In which case reanalyzing at
> search time may be the best way to go.

If it matters, it doesn't need to be sequential.  I just need access to all
the payloads for a given doc in the index.  If reanalyzing is the best option
I suppose I'll do that.  Or perhaps build some auxiliary structure to cache
the information.

Thanks for the clarification,
-Chris
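For completeness, a hedged sketch of the reanalysis route: run the stored
text back through your analyzer at search time and read the attributes off
the TokenStream. It assumes the analysis chain actually sets payloads (for
example via DelimitedPayloadTokenFilter); the field name "contents" is
illustrative:

    import java.io.IOException;
    import java.io.StringReader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
    import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;
    import org.apache.lucene.index.Payload;

    public class ReanalyzeExample {
      public static void dumpPayloads(Analyzer analyzer, String storedText) throws IOException {
        TokenStream ts = analyzer.tokenStream("contents", new StringReader(storedText));
        TermAttribute term = ts.addAttribute(TermAttribute.class);
        OffsetAttribute offsets = ts.addAttribute(OffsetAttribute.class);
        PayloadAttribute payloads = ts.addAttribute(PayloadAttribute.class);
        while (ts.incrementToken()) {
          Payload payload = payloads.getPayload();   // null if the chain set none
          System.out.println(term.term()
              + " [" + offsets.startOffset() + "," + offsets.endOffset() + "]"
              + " payload=" + (payload == null ? "none" : payload.length() + " bytes"));
        }
        ts.close();
      }
    }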