custom FieldCache costs too much time. How can I preload the custom FieldCache when a new segment exists?

2010-02-27 Thread luocanrao
Building the custom FieldCache costs too much time.
So every time we reopen a new reader, the first search's performance suffers.
I hope someone can tell me how I can preload the custom FieldCache when a
new segment exists.
Thanks again!

Here is the source, in the FieldComparator.setNextReader method:

((C2CFieldManager) fieldManager).lCommID =
    FieldCache.DEFAULT.getLongs(reader, "1", new LongParser() {
      // parse the stored hex document ID: keep everything after the first
      // 16 characters and read it as a base-16 long
      public long parseLong(String documentIDStr) {
        documentIDStr = documentIDStr.substring(16);
        long documentID = Long.parseLong(documentIDStr, 16);
        return documentID;
      }
    });


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: custom FieldCache costs too much time. How can I preload the custom FieldCache when a new segment exists?

2010-02-27 Thread Michael McCandless
How are you opening a new reader?

If it's a near-real-time reader (IndexWriter.getReader), or you use
IndexReader.reopen, it should only be the newly created segments that
have to generate the field cache entry, which most of the time should
be fast.

If you are already using those APIs and it's still not fast enough,
then you should just warm the reader before using it (additionally,
for a near-real-time reader you should warm newly merged segments by
installing a mergedSegmentWarmer on the writer).
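
For example, a rough warming sketch (currentReader is whatever reader you are
searching with now, and customParser is the custom LongParser from the first
message) could look like this:

    IndexReader newReader = currentReader.reopen();
    if (newReader != currentReader) {
      IndexReader[] subReaders = newReader.getSequentialSubReaders();
      if (subReaders != null) {
        for (IndexReader sub : subReaders) {
          // populates the per-segment cache; only brand-new segments do real work
          FieldCache.DEFAULT.getLongs(sub, "1", customParser);
        }
      }
      // now publish newReader to the search threads, and close the old reader
      // once no in-flight search is still using it
    }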

Mike

On Sat, Feb 27, 2010 at 3:35 AM, luocanrao  wrote:
> custom FieldCache cost too much time.
> So every first time,reopen the new reader ,it interfere the performance of
> search
> I hope someone can tell me,how can I preload the the custom fieldCache when
> new segment exits!
> Thanks again!
>
> here is source  , In FieldComparator.setNextReader method
> ((C2CFieldManager)fieldManager).lCommID =
> FieldCache.DEFAULT.getLongs(reader, "1",new LongParser(){
>                                public long parseLong(String documentIDStr)
> {
>                                        documentIDStr =
> documentIDStr.substring(16);
>                                        long documentID =
> Long.parseLong(documentIDStr,16);
>                                        return documentID;
>                                }
>
>                });
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: recovering payload from fields

2010-02-27 Thread Michael McCandless
You can also access payloads through the TermPositions enum, but, this
is by term and then by doc.

It sounds like you need to iterate through all terms sequentially in a
given field in the doc, accessing offset & payload?  In which case
reanalyzing at search time may be the best way to go.

You can store term vectors in the index, which will store offsets (if
you ask it to), but, payloads are not currently stored with term
vectors.
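
A rough sketch of that TermPositions route (field name and term text here are
only placeholders):

    TermPositions tp = reader.termPositions(new Term("body", "sometext"));
    try {
      while (tp.next()) {                       // iterate the docs matching this term
        int freq = tp.freq();
        for (int i = 0; i < freq; i++) {
          int position = tp.nextPosition();
          if (tp.isPayloadAvailable()) {
            byte[] payload = tp.getPayload(new byte[tp.getPayloadLength()], 0);
            // decode the payload for this (term, doc, position)
          }
        }
      }
    } finally {
      tp.close();
    }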

Mike

On Fri, Feb 26, 2010 at 7:42 PM, Christopher Condit  wrote:
> Payload data is accessed through PayloadSpans, so using SpanQueries is the
> entry point, it seems.  There are tools like PayloadSpanUtil that convert other
> queries into SpanQueries for this purpose if needed, but the bottom line is that
> the API for payloads goes through Spans.
>
> So there's no way to iterate through all the payloads for a given field? I 
> can't use the SpanQuery mechanism because in this case the entire field will 
> be displayed - and I can't search for "*". Is there some trick I'm not 
> thinking of?
>
>> this is the tip of the iceberg; a big dangerous iceberg...
>
> Yes - I'm beginning to see that...
>
> Thanks,
> -Chris
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Infinite loop when searching empty index

2010-02-27 Thread Michael McCandless
I turned this into a unit test... but I don't see it never
returning... the test passes.

How did you create your empty reader?

Patch:

Index: src/test/org/apache/lucene/search/TestMatchAllDocsQuery.java
===================================================================
--- src/test/org/apache/lucene/search/TestMatchAllDocsQuery.java   (revision 916939)
+++ src/test/org/apache/lucene/search/TestMatchAllDocsQuery.java   (working copy)
@@ -27,6 +27,8 @@
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.queryParser.QueryParser;
 import org.apache.lucene.store.RAMDirectory;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.MockRAMDirectory;
 
 import org.apache.lucene.util.LuceneTestCase;
 
@@ -115,6 +117,21 @@
     dir.close();
   }
 
+  public void testNeverReturns() throws Exception {
+    Directory dir = new MockRAMDirectory();
+    IndexWriter w = new IndexWriter(dir, new StandardAnalyzer(TEST_VERSION_CURRENT), IndexWriter.MaxFieldLength.UNLIMITED);
+    IndexReader r = w.getReader();
+    w.close();
+
+    assertEquals(0, r.numDocs()); // empty index
+    IndexSearcher s = new IndexSearcher(r);
+    TopDocsCollector collector = TopScoreDocCollector.create(0, true);
+    s.search(new MatchAllDocsQuery(), collector);  // never returns
+    s.close();
+    r.close();
+    dir.close();
+  }
+
   public void testEquals() {
     Query q1 = new MatchAllDocsQuery();
     Query q2 = new MatchAllDocsQuery();
Mike

On Fri, Feb 26, 2010 at 4:54 PM, Justin  wrote:
> Is this a bug in Lucene Java as of tr...@915399?
>
>    int numDocs = reader.numDocs(); // = 0 (empty index)
>    TopDocsCollector collector = TopScoreDocCollector.create(numDocs,
> true);
>    searcher.search(new MatchAllDocsQuery(), collector);  // never
> returns
>
>    // Searcher
>    public void search(Query query, Collector collector)
>      throws IOException {
>      search(createWeight(query), null, collector); // never returns
>    }
>
>    // extends IndexSearcher
>    public void search(Weight weight, Filter filter, final Collector 
> collector) throws IOException {
>      boolean topScorer = (filter == null) ? true : false;
>      Scorer scorer = weight.scorer(reader, true, topScorer);
>      if (scorer != null && topScorer) {
>        scorer.score(collector); // never returns
>
>    // Scorer
>    public void score(Collector collector) throws IOException {
>      collector.setScorer(this);
>      int doc;
>      while ((doc = nextDoc()) != NO_MORE_DOCS) { // doc = 0 (infinite)
>        collector.collect(doc);
>      }
>    }
>
>
> Thanks for any feedback,
> Justin
>
>
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: Infinite loop when searching empty index

2010-02-27 Thread Uwe Schindler
I was doing the same, MatchAllDocsScorer is fine and also AbstractAllTermDocs.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Saturday, February 27, 2010 11:52 AM
> To: java-user@lucene.apache.org
> Subject: Re: Infinite loop when searching empty index
> 
> I turned this into a unit test... but I don't see it never
> returning... the test passes.
> 
> How did you create your empty reader?
> 
> Patch:
> 
> Index: src/test/org/apache/lucene/search/TestMatchAllDocsQuery.java
> ===
> --- src/test/org/apache/lucene/search/TestMatchAllDocsQuery.java
>   (revision
> 916939)
> +++ src/test/org/apache/lucene/search/TestMatchAllDocsQuery.java
>   (working copy)
> @@ -27,6 +27,8 @@
>  import org.apache.lucene.index.IndexReader;
>  import org.apache.lucene.queryParser.QueryParser;
>  import org.apache.lucene.store.RAMDirectory;
> +import org.apache.lucene.store.Directory;
> +import org.apache.lucene.store.MockRAMDirectory;
> 
>  import org.apache.lucene.util.LuceneTestCase;
> 
> @@ -115,6 +117,21 @@
>  dir.close();
>}
> 
> +  public void testNeverReturns() throws Exception {
> +Directory dir = new MockRAMDirectory();
> +IndexWriter w = new IndexWriter(dir, new
> StandardAnalyzer(TEST_VERSION_CURRENT),
> IndexWriter.MaxFieldLength.UNLIMITED);
> +IndexReader r = w.getReader();
> +w.close();
> +
> +assertEquals(0, r.numDocs()); // empty index
> +IndexSearcher s = new IndexSearcher(r);
> +TopDocsCollector collector = TopScoreDocCollector.create(0, true);
> +s.search(new MatchAllDocsQuery(), collector);  // never returns
> +s.close();
> +r.close();
> +dir.close();
> +  }
> +
>public void testEquals() {
>  Query q1 = new MatchAllDocsQuery();
>  Query q2 = new MatchAllDocsQuery();
> 
> Mike
> 
> On Fri, Feb 26, 2010 at 4:54 PM, Justin  wrote:
> > Is this a bug in Lucene Java as of tr...@915399?
> >
> >int numDocs = reader.numDocs(); // = 0 (empty index)
> >TopDocsCollector collector = TopScoreDocCollector.create(numDocs,
> > true);
> >searcher.search(new MatchAllDocsQuery(), collector);  // never
> > returns
> >
> >// Searcher
> >public void search(Query query, Collector collector)
> >  throws IOException {
> >  search(createWeight(query), null, collector); // never returns
> >}
> >
> >// extends IndexSearcher
> >public void search(Weight weight, Filter filter, final Collector
> collector) throws IOException {
> >  boolean topScorer = (filter == null) true : false;
> >  Scorer scorer = weight.scorer(reader, true, topScorer);
> >  if (scorer != null && topScorer) {
> >scorer.score(collector); // never returns
> >
> >// Scorer
> >public void score(Collector collector) throws IOException {
> >  collector.setScorer(this);
> >  int doc;
> >  while ((doc = nextDoc()) != NO_MORE_DOCS) { // doc = 0
> (infinite)
> >collector.collect(doc);
> >  }
> >}
> >
> >
> > Thanks for any feedback,
> > Justin
> >
> >
> >
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
> 
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: custom FieldCache costs too much time. How can I preload the custom FieldCache when a new segment exists?

2010-02-27 Thread luocanrao
I set the merge factor to 4, and I reopen the reader every five minutes.
Yes, most of the time it is very fast, but sometimes it is very slow.
For example, when the program starts, the first query consumes 10s!
And when a newly created segment is generated, a query consumes more than 1s.
Performance is a key point for us.
Sorry, my English is not good!
I hope I can preload the custom FieldCache in another thread, not in the query thread,
so that I will not have this performance issue.
-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: February 27, 2010 18:37
To: java-user@lucene.apache.org
Subject: Re: custom FieldCache costs too much time. How can I preload the custom
FieldCache when a new segment exists?

How are you opening a new reader?

If it's a near-real-time reader (IndexWriter.getReader), or you use
IndexReader.reopen, it should only be the newly created segments that
have to generate the field cache entry, which most of the time should
be fast.

If you are already using those APIs and its still not fast enough,
then you should just warm the reader before using it (additionally,
for a near-real-time reader you should warm newly merged segments by
installing a mergedSegmentWarmer on the writer).

Mike

On Sat, Feb 27, 2010 at 3:35 AM, luocanrao  wrote:
> custom FieldCache cost too much time.
> So every first time,reopen the new reader ,it interfere the performance of
> search
> I hope someone can tell me,how can I preload the the custom fieldCache when
> new segment exits!
> Thanks again!
>
> here is source  , In FieldComparator.setNextReader method
> ((C2CFieldManager)fieldManager).lCommID =
> FieldCache.DEFAULT.getLongs(reader, "1",new LongParser(){
>public long parseLong(String documentIDStr)
> {
>documentIDStr =
> documentIDStr.substring(16);
>long documentID =
> Long.parseLong(documentIDStr,16);
>return documentID;
>}
>
>});
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: custom FieldCache costs too much time. How can I preload the custom FieldCache when a new segment exists?

2010-02-27 Thread luocanrao
PS: In our environment, a document has more than ten fields, and in a short time
there may be many updates.
About installing a mergedSegmentWarmer on the writer: can you give me a small example?
Thanks very much!

-Original Message-
From: luocanrao [mailto:luocan19826...@sohu.com]
Sent: February 27, 2010 19:09
To: java-user@lucene.apache.org
Subject: Re: custom FieldCache costs too much time. How can I preload the custom
FieldCache when a new segment exists?

I set merge factor 4, every five minute I reopen the reader.
yes most of the time is very fast. But sometimes it is very slow.
For example,when start the program,the first query cosume 10s!
So when newly created segment generated,the query cosume more than 1s.
Our performance is key point.
Sorry ,my English is not good!
I hope I can preload custom fieldCache in a another thread,not the query thread.
So I will not have performace issue.
-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: February 27, 2010 18:37
To: java-user@lucene.apache.org
Subject: Re: custom FieldCache costs too much time. How can I preload the custom
FieldCache when a new segment exists?

How are you opening a new reader?

If it's a near-real-time reader (IndexWriter.getReader), or you use
IndexReader.reopen, it should only be the newly created segments that
have to generate the field cache entry, which most of the time should
be fast.

If you are already using those APIs and its still not fast enough,
then you should just warm the reader before using it (additionally,
for a near-real-time reader you should warm newly merged segments by
installing a mergedSegmentWarmer on the writer).

Mike

On Sat, Feb 27, 2010 at 3:35 AM, luocanrao  wrote:
> custom FieldCache cost too much time.
> So every first time,reopen the new reader ,it interfere the performance of
> search
> I hope someone can tell me,how can I preload the the custom fieldCache when
> new segment exits!
> Thanks again!
>
> here is source  , In FieldComparator.setNextReader method
> ((C2CFieldManager)fieldManager).lCommID =
> FieldCache.DEFAULT.getLongs(reader, "1",new LongParser(){
>public long parseLong(String documentIDStr)
> {
>documentIDStr =
> documentIDStr.substring(16);
>long documentID =
> Long.parseLong(documentIDStr,16);
>return documentID;
>}
>
>});
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: NumericField exact match

2010-02-27 Thread Yonik Seeley
On Fri, Feb 26, 2010 at 3:33 PM, Ivan Vasilev  wrote:
> Does the precision step matter when I use NumericRangeQuery for exact
> matches?

No.  There is a full-precision version of the value indexed regardless
of the precision step, and that's used for an exact match query.
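
For example (field name and value here are only illustrative), an exact match is
just a degenerate range with both ends inclusive:

    long value = 20100227L;  // the exact value that was indexed
    Query q = NumericRangeQuery.newLongRange("timestamp", value, value, true, true);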

> I mean, if I use the default precision step when indexing that field, is it
> guaranteed that:
> 1. With this query I will always hit the docs that contain "val" for the
> "field";
> 2. I will never hit docs that have a different "val" for the "field";

Correct.

-Yonik
http://www.lucidimagination.com

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: If you could have one feature in Lucene...

2010-02-27 Thread Glen Newton
Pluggable compression, allowing alternatives to gzip for compressing stored fields.
Specifically, I am interested in bzip2[1] as implemented in Apache
Commons Compress[2].
While bzip2 compression is considerably slower than gzip (although
decompression is not much slower), it compresses much better than gzip
(especially text).

Having the choice would be helpful, and for Lucene deployments that index
non-text content, content-specific compression algorithms may outperform the
default gzip.

And in these days of multi-core / multi-threading, perhaps we could
convince the Apache Commons Compress team to implement a parallel Java
version of bzip2 compression (theirs is single threaded), like
pbzip2[3].
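
For reference, compressing a stored value with Commons Compress looks roughly
like this (variable names are only illustrative):

    // uses org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    BZip2CompressorOutputStream bz = new BZip2CompressorOutputStream(bos);
    bz.write(text.getBytes("UTF-8"));
    bz.close();
    byte[] compressed = bos.toByteArray();  // e.g. store this as a binary field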

-glen


[1]http://en.wikipedia.org/wiki/Bzip2
[2]http://commons.apache.org/compress/
[3]http://compression.ca/pbzip2/

On 24 February 2010 08:42, Grant Ingersoll  wrote:
> What would it be?
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>



-- 

-

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: If you could have one feature in Lucene...

2010-02-27 Thread Uwe Schindler
Hi Glen,

 
> Pluggable compression allowing for alternatives to gzip for text
> compression for storing.
> Specifically I am interested in bzip2[1] as implemented in Apache
> Commons Compress[2].
> While bzip2 compression is considerable slower than gzip (although
> decompression is not too much slower than gzip) it compresses much
> better than gzip (especially text).
> 
> Having the choice would be helpful, and for Lucene usage for non-text
> indexing, content specific compression algorithms may outperform the
> default gzip.

Since version 2.9 / 3.0 of Lucene, compression support has been removed entirely (in 
2.9 it is still available, but deprecated). All you now have to do is store your 
compressed stored fields as a byte[] (see the Field javadocs). That way you can use 
any compression. The problems with gzip and the other available compression 
algos led us to remove the compression support from Lucene (it had lots 
of problems).

In general the way to go is: create a ByteArrayOutputStream, wrap it with any 
compression filter, feed your data in, and use the Field constructor that takes a 
byte[] (e.g. "new Field(name, stream.toByteArray(), Field.Store.YES)"). On the client 
side just do the inverse (Document.getBinaryValue(), create an input stream on top of 
the byte[] and decompress).
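
For example, a rough sketch with java.util.zip (field and variable names here are
made up, and any other codec could be swapped in):

    // indexing side
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    OutputStream zip = new GZIPOutputStream(bos);
    zip.write(text.getBytes("UTF-8"));
    zip.close();
    doc.add(new Field("body_z", bos.toByteArray(), Field.Store.YES));

    // search side
    byte[] compressed = hitDoc.getBinaryValue("body_z");
    InputStream unzip = new GZIPInputStream(new ByteArrayInputStream(compressed));
    // read the uncompressed bytes back from 'unzip'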

Uwe


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: If you could have one feature in Lucene...

2010-02-27 Thread Glen Newton
Hello Uwe.

That will teach me for not keeping up with the versions! :-)
So it is up to the application to keep track of what it used for compression.
Understandable.
Thanks!

Glen

On 27 February 2010 10:17, Uwe Schindler  wrote:
> Hi Glen,
>
>
>> Pluggable compression allowing for alternatives to gzip for text
>> compression for storing.
>> Specifically I am interested in bzip2[1] as implemented in Apache
>> Commons Compress[2].
>> While bzip2 compression is considerable slower than gzip (although
>> decompression is not too much slower than gzip) it compresses much
>> better than gzip (especially text).
>>
>> Having the choice would be helpful, and for Lucene usage for non-text
>> indexing, content specific compression algorithms may outperform the
>> default gzip.
>
> Since Version 3.0 / 2.9 of Lucene compression support was removed entirely 
> (in 2.9 still avail as deprecated). All you now have to do is simply store 
> your compressed stored fields as a byte[] (see Field javadocs). By that you 
> can use any compression. The problems with gzip and the other available 
> compression algos lead us to removing the compression support from Lucene (as 
> it had lots of problems). In general the way to go is: Create a 
> ByteArrayOutputStream and wrap with any compression filter, then feed your 
> data in and use "new Field(name,stream.getBytes())". On the client side just 
> use the inverse (Document.getBinaryValue(), create input stream on top of 
> byte[] and decompress).
>
> Uwe
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>



-- 

-

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: FieldCache costs too much time. How can I preload the custom FieldCache when a new segment exists?

2010-02-27 Thread Michael McCandless
Sounds like you should simply open & warm the reader in a background thread...

You might want to use the SearcherManager class from upcoming Lucene
in Action 2nd edition (NOTE: I'm a co-author).  You can download the
source code @ http://manning.com/hatcher3.
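
Until then, a rough sketch of the background-thread approach (warm(IndexReader)
is a placeholder for whatever FieldCache loading you need, 'searcher' is a shared
volatile field, and reference counting and error handling are omitted):

    new Thread() {
      public void run() {
        try {
          IndexReader newReader = searcher.getIndexReader().reopen();
          if (newReader != searcher.getIndexReader()) {
            warm(newReader);                          // e.g. build the FieldCache entries
            IndexSearcher old = searcher;
            searcher = new IndexSearcher(newReader);  // queries now see the warmed reader
            old.getIndexReader().close();             // once in-flight searches are done
          }
        } catch (IOException e) {
          // log and retry on the next cycle
        }
      }
    }.start();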

Mike

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: custom FieldCache costs too much time. How can I preload the custom FieldCache when a new segment exists?

2010-02-27 Thread Michael McCandless
If you look at the javadocs for IndexWriter it explains how to do it.
You just provide a class that implements the warm method, and inside
that method you do whatever app-specific things you need to do to the
provided IndexReader to warm it.

Note that the SearcherManager class from LIA2 handles setting up the
MergedSegmentWarmer, if you implement the warm method.
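
A minimal sketch, assuming the custom parser from the earlier messages is
reachable as customParser:

    writer.setMergedSegmentWarmer(new IndexWriter.IndexReaderWarmer() {
      public void warm(IndexReader reader) throws IOException {
        // pre-build the custom FieldCache entry for the newly merged segment
        FieldCache.DEFAULT.getLongs(reader, "1", customParser);
      }
    });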

Mike

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Infinite loop when searching empty index

2010-02-27 Thread Justin
Thanks for checking.  I think I tracked down the problem.  We apparently 
extended most of these classes, and more work was necessary to accommodate the 
latest API.  I just didn't dig deep enough into nextDoc(), which I thought too 
trivial to step into.  The extended Scorer repeatedly returned the last doc ID 
instead of NO_MORE_DOCS.  Sorry for the wild goose chase!
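
For anyone who hits the same thing, the contract boils down to this (a sketch of
just the relevant method of a custom Scorer; doc and maxDoc are the subclass's
own fields):

    public int nextDoc() throws IOException {
      doc++;
      if (doc >= maxDoc) {
        doc = DocIdSetIterator.NO_MORE_DOCS;  // must eventually be returned, never the last doc id again
      }
      return doc;
    }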

I do think I've run across another problem, which I may report in a new thread: 
ParallelReader.reopen() appears to be using up file descriptors for the same 
files without letting the old ones go.  Our Java process hits the 'limit -n' 
barrier (in the thousands) after a few minutes.  My current workaround is to 
check isCurrent(), close, then open.  I wonder if the changes to support near 
real-time search inadvertently broke this.
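
The workaround looks roughly like this (directory variables are made up, and
in-flight searches have to finish before the close):

    if (!parallelReader.isCurrent()) {
      parallelReader.close();
      ParallelReader fresh = new ParallelReader();
      fresh.add(IndexReader.open(dir1));
      fresh.add(IndexReader.open(dir2));
      parallelReader = fresh;
    }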





-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Infinite loop when searching empty index

2010-02-27 Thread Michael McCandless
Hmm -- can you give more details on the possible file descriptor leak?
 Or a test case?  Thanks.

Mike

On Sat, Feb 27, 2010 at 12:24 PM, Justin  wrote:
> Thanks for checking.  I think I tracked down the problem.  We apparently 
> extended most of these classes and more work was necessary to facilitate the 
> latest API.  I just didn't dig deep enough, into nextDoc() which I thought 
> too trivial to step into.  The extended Scorer repeatedly returned the last 
> doc ID instead of NO_MORE_DOCS.  Sorry for the wild goose chase!
>
> I do think I've run across another problem which I may report in a new 
> thread: ParallelReader.reopen() appears to be taking up file descriptors to 
> the same files without letting the old ones go.  Our Java process hits the 
> 'limit -n' barrier (in the thousands) after a few minutes.  My current work 
> around is to check isCurrent(), close, then open.  I wonder if the changes to 
> support near real-time search inadvertently broke this.

Changing TF method

2010-02-27 Thread PlusPlus

Hi, 

   I want to change Lucene's similarity so that I can add fuzzy membership
values to the terms of a document. Thus, the TF value of a term in a document
is not always 1; it could, for example, contribute 0.7 to the TF value (in my
application, each term is contained in a document at most once). This
membership value is available before index time.

   On the other hand, each occurrence of a word should not be counted as 1
document frequency in the IDF formula.

   I was wondering if I can change the TF and IDF values of the terms like
this. So far, I know that I can change how TF values impact the scoring, but
not the thing I'm looking for.
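
For the part that is already possible, the tf/idf formulas themselves can be
replaced by subclassing DefaultSimilarity, roughly like this (names are
illustrative, and this alone does not give per-term membership weights):

    public class MembershipSimilarity extends DefaultSimilarity {
      public float tf(float freq) {
        return freq;  // e.g. drop the default sqrt(freq) damping
      }
      public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
      }
    }
    // searcher.setSimilarity(new MembershipSimilarity());
    // writer.setSimilarity(new MembershipSimilarity());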

Best, 
Reza 
-- 
View this message in context: 
http://old.nabble.com/Changing-TF-method-tp27730729p27730729.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: recovering payload from fields

2010-02-27 Thread Christopher Condit
> It sounds like you need to iterate through all terms sequentially in a given
> field in the doc, accessing offset & payload?  In which case reanalyzing at
> search time may be the best way to go.

If it matters it doesn't need to be sequential. I just need access to all the 
payloads for a given doc in the index. If reanalyzing is the best option I 
suppose I'll do that. Or perhaps build some auxiliary structure to cache the 
information.

Thanks for the clarification,
-Chris

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org