Re: Can lucene query this result?

2006-07-20 Thread Heng Mei
I don't think there's an easy built-in way for Lucene to do this. What you can do is implement a HitCollector to process each doc hit and maintain a count for each user_group. You'll need to preload a doc_id - user_group mapping. (Take a look at the code for FieldCacheImpl.getInts() for sample
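
A minimal sketch of that per-group counting, written in plain Java so it runs without the Lucene jar. `DocCollector` is a hypothetical stand-in for Lucene's `HitCollector` (whose callback is `collect(int doc, float score)`). The `docToGroup` array stands in for the preloaded doc_id to user_group mapping. The group names and doc IDs are made up. In real code the collector would extend `HitCollector` and be passed to `Searcher.search(Query, HitCollector)`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's HitCollector callback, collect(int doc, float score),
// so this sketch compiles without the Lucene jar.
interface DocCollector {
    void collect(int doc, float score);
}

// Counts matching docs per user_group using a preloaded doc ID -> group lookup array.
class GroupCountCollector implements DocCollector {
    private final String[] docToGroup;                    // index = doc ID, value = user_group (or null)
    private final Map<String, Integer> counts = new HashMap<>();

    GroupCountCollector(String[] docToGroup) {
        this.docToGroup = docToGroup;
    }

    @Override
    public void collect(int doc, float score) {
        String group = docToGroup[doc];
        if (group != null) {                              // docs without a group are not counted
            counts.merge(group, 1, Integer::sum);
        }
    }

    Map<String, Integer> getCounts() {
        return counts;
    }
}

class GroupCountDemo {
    public static void main(String[] args) {
        // Hypothetical mapping for a five-document index.
        String[] docToGroup = {"admin", "staff", "admin", null, "guest"};
        GroupCountCollector collector = new GroupCountCollector(docToGroup);
        // Simulate a search that matched docs 0, 2, 3 and 4.
        for (int doc : new int[] {0, 2, 3, 4}) {
            collector.collect(doc, 1.0f);
        }
        System.out.println(collector.getCounts());   // admin=2, guest=1 (HashMap order may vary)
    }
}
```

Because `collect` is called once per matching document, each hit costs one array lookup and one map update. The expensive part is building the lookup array, which is why it pays to preload and reuse it.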

Re: Empty fields ...

2006-07-20 Thread Chris Hostetter
: Thanks much for that clarification, it helps a lot. The original request was : to find docs that were NOT NULL, so I'm glad I'm not the only one who : But with your RangeFilter comment, that seems unnecessary. You can use a : RangeFilter with null, null as bounds, then just flip the bits in
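
A minimal sketch of the bit-flipping step. It assumes the filter's result is a `java.util.BitSet` with one bit per doc ID up to the reader's maxDoc, which is the form `Filter.bits(IndexReader)` returns in this Lucene era. The `hasField` bits below are hypothetical and stand in for what a RangeFilter with null, null bounds would mark.

```java
import java.util.BitSet;

class EmptyFieldDemo {
    /**
     * Given the docs that have at least one term in a field (what a RangeFilter
     * with null, null bounds would mark), return the docs that have no term in it.
     * Caveat: deleted doc IDs also end up set after the flip; a real filter
     * should clear them by checking the reader's deletions.
     */
    static BitSet docsMissingField(BitSet hasField, int maxDoc) {
        BitSet missing = (BitSet) hasField.clone();   // leave the caller's bits untouched
        missing.flip(0, maxDoc);                      // flip bits in [0, maxDoc)
        return missing;
    }

    public static void main(String[] args) {
        int maxDoc = 6;
        BitSet hasField = new BitSet(maxDoc);         // hypothetical: docs 0, 3 and 4 have the field
        hasField.set(0);
        hasField.set(3);
        hasField.set(4);
        System.out.println(docsMissingField(hasField, maxDoc));   // prints {1, 2, 5}
    }
}
```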

Re: Index-Format difference between 1.4.3 and 2.0

2006-07-20 Thread lude
As Luke was released with a Lucene-1.9 Where did you get this information? From all I know Luke is based on Lucene Version 1.4.3. On 7/19/06, Nicolas Lalevée [EMAIL PROTECTED] wrote: On Wednesday 19 July 2006 12:32, lude wrote: Hi Nicolas, thanks for answering. You wrote: And

Re: Lucene support for OpenDocument?

2006-07-20 Thread Andrzej Bialecki
Daniel Noll wrote: marbux wrote: Hello, The OpenDocument Fellowship attempts to maintain a directory of applications supporting OpenDocument file formats. http://www.opendocumentfellowship.org/applicationsa. I have been attempting, without success, to determine whether Lucene supports

Re: Index-Format difference between 1.4.3 and 2.0

2006-07-20 Thread yueyu lin
I'm using Luke to manage a Lucene 1.9 index. On 7/20/06, lude [EMAIL PROTECTED] wrote: As Luke was released with a Lucene-1.9 Where did you get this information? From all I know Luke is based on Lucene Version 1.4.3. On 7/19/06, Nicolas Lalevée [EMAIL PROTECTED] wrote: On Wednesday 19

Re: Index-Format difference between 1.4.3 and 2.0

2006-07-20 Thread Andrzej Bialecki
lude wrote: As Luke was released with a Lucene-1.9 Where did you get this information? From all I know Luke is based on Lucene Version 1.4.3. The latest version of Luke was released with an early snapshot of 1.9. I plan to release a 2.0-based version in a few days. -- Best regards,

Re: Index-Format difference between 1.4.3 and 2.0

2006-07-20 Thread Miles Barr
Andrzej Bialecki wrote: lude wrote: As Luke was released with a Lucene-1.9 Where did you get this information? From all I know Luke is based on Lucene Version 1.4.3. The latest version of Luke was released with an early snapshot of 1.9. I plan to release a 2.0-based version in a

PDF documents with MoreLikeThis class

2006-07-20 Thread Davide
Hi, I'm using the MoreLikeThis class to find similar documents, but I'm not sure whether it is correct to pass a PDF file as the argument to the *MoreLikeThis.like()* method. To be clearer: 1) In my Lucene index I add some PDF files (I use PDFBox to extract the text and add fields to the index). 2) Now I want

Re: PDF documents with MoreLikeThis class

2006-07-20 Thread mark harwood
Do I have to extract text from PDF file and then pass an InputStream with the text inside? Yes. Although technically you could pass the content unparsed it will contain a lot of unintelligible garbage in the form of markup and images. All Lucene classes deliberately try and avoid the mucky

Re: Empty fields ...

2006-07-20 Thread Erick Erickson
What? You actually want me to put forth some effort? That's crazy talk <G>. Thanks, I think I've got it now. Best Erick

RE: Date ranges - getting the approach right

2006-07-20 Thread Rob Staveley (Tom)
Sorry for the delayed response. It takes me a while to get my head around Lucene. I've got parallel indexes, which means that chronological ordering by doc ID would need to be a bit more sophisticated. It strikes me that there must be some performance advantage in doing it, though. I'll see if I can
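
If documents are added in date order, a date range maps to a contiguous range of doc IDs, which two binary searches can find. A minimal sketch, assuming a hypothetical `datesByDocId` array of yyyyMMdd ints whose values never decrease. With parallel indexes this only holds if every index receives documents in the same order.

```java
import java.util.Arrays;

class DocIdDateRange {
    /** Index of the first element >= key in a non-decreasing array (lower bound). */
    static int lowerBound(int[] sorted, int key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /**
     * Returns {firstDoc, endDocExclusive} for dates in [from, to], assuming
     * datesByDocId is non-decreasing, i.e. docs were added in date order.
     */
    static int[] docRange(int[] datesByDocId, int from, int to) {
        int first = lowerBound(datesByDocId, from);
        int end = lowerBound(datesByDocId, to + 1);   // first doc dated after `to`
        return new int[] {first, end};
    }

    public static void main(String[] args) {
        // Hypothetical yyyyMMdd dates, one per doc ID, added chronologically.
        int[] dates = {20060101, 20060115, 20060115, 20060203, 20060310};
        System.out.println(Arrays.toString(docRange(dates, 20060110, 20060220)));   // [1, 4]
    }
}
```

How the per-doc date array is built is left open here. The point is only that a chronological doc ID order turns a date range into a doc ID range.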

RE: Date ranges - getting the approach right

2006-07-20 Thread Mike Streeton
This is how we solve the range query problem using filters. The nice part about it is that you can use a range in a query, so several ranges can be ORed/ANDed or NOTed together if required, instead of applying a range filter to the whole query. (Assumes dates in MMDD format) Hope this helps, Mike.
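
The message's code is not included in this excerpt. As a rough illustration of the idea only, and not Mike's implementation, the stdlib sketch below evaluates each date range into its own `java.util.BitSet` so that ranges can be combined with `or`, `and` and `andNot`. It assumes fixed-width, zero-padded yyyyMMdd strings, which is a choice made for this sketch. It also scans a hypothetical per-document date array, whereas a real range filter would walk the index's terms.

```java
import java.util.BitSet;

class DateRangeBits {
    /** Marks docs whose date string lies in [lower, upper], inclusive. */
    static BitSet range(String[] datesByDoc, String lower, String upper) {
        BitSet bits = new BitSet(datesByDoc.length);
        for (int doc = 0; doc < datesByDoc.length; doc++) {
            String d = datesByDoc[doc];
            // For fixed-width, zero-padded dates, lexicographic order equals chronological order.
            if (d != null && d.compareTo(lower) >= 0 && d.compareTo(upper) <= 0) {
                bits.set(doc);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Hypothetical per-document dates (doc ID = array index).
        String[] dates = {"20060105", "20060220", "20060315", "20060401", "20060710"};
        BitSet january = range(dates, "20060101", "20060131");     // {0}
        BitSet marchApril = range(dates, "20060301", "20060430");  // {2, 3}
        BitSet april = range(dates, "20060401", "20060430");       // {3}

        BitSet either = (BitSet) january.clone();
        either.or(marchApril);                // January OR March-April -> {0, 2, 3}
        BitSet withoutApril = (BitSet) either.clone();
        withoutApril.andNot(april);           // ... AND NOT April       -> {0, 2}
        BitSet both = (BitSet) marchApril.clone();
        both.and(april);                      // March-April AND April   -> {3}
        System.out.println(either + " " + withoutApril + " " + both);
    }
}
```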

RE: Date ranges - getting the approach right

2006-07-20 Thread Rob Staveley (Tom)
Wow. Looking at the implementation of http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#open(org.apache.lucene.store.Directory) I've now realised that when you create an IndexReader (clue: it is abstract), you actually instantiate a MultiReader, with an IndexReader
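
One consequence of a reader built from several sub-readers is that every global doc ID has to be routed to the sub-reader that owns it. Below is a minimal stdlib sketch of that routing. It assumes each sub-reader covers a contiguous block of doc IDs, in order. `SegmentRouter`, `starts` and `readerIndex` are hypothetical names chosen for illustration, not a copy of Lucene's MultiReader.

```java
class SegmentRouter {
    // starts[i] = first global doc ID of sub-reader i; the last entry is the total maxDoc.
    private final int[] starts;

    SegmentRouter(int[] maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    /** Binary search for the last sub-reader whose start is <= n (requires 0 <= n < maxDoc()). */
    int readerIndex(int n) {
        int lo = 0, hi = starts.length - 2;           // valid sub-reader indexes
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;            // upper middle so `lo = mid` always progresses
            if (starts[mid] <= n) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Doc ID relative to its own sub-reader. */
    int localDoc(int n) {
        return n - starts[readerIndex(n)];
    }

    int maxDoc() {
        return starts[starts.length - 1];
    }

    public static void main(String[] args) {
        // Hypothetical index with three sub-readers of 10, 5 and 20 docs.
        SegmentRouter router = new SegmentRouter(new int[] {10, 5, 20});
        int doc = 12;
        System.out.println("doc " + doc + " -> sub-reader " + router.readerIndex(doc)
                + ", local doc " + router.localDoc(doc));   // sub-reader 1, local doc 2
    }
}
```

Taking the last start that is not past `n` also skips over empty sub-readers, whose start equals the next one's.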

Performance question

2006-07-20 Thread Scott Smith
I was reading a book on SQL query tuning. The gist of it was that the way to get the best performance (fastest execution) out of a SQL select statement was to create execution plans where the most selective term in the where clause is used first, the next most selective term is used next, etc.

Re: Lock obtain time out (OT: Mailing list settings)

2006-07-20 Thread Paul Borgermans
Hello, I suppose that you are using Gmail? It is just a property of Gmail; take a look at the archives after a few hours and you will find it there ;-) for example: http://mail-archives.apache.org/mod_mbox/lucene-java-user/ hth --paul On 7/19/06, Pasquale Imbemba [EMAIL PROTECTED] wrote:

Re: Performance question

2006-07-20 Thread Doron Cohen
Does it matter what order I add the sub-queries to the BooleanQuery Q. That is, is the execution speed for the search faster (slower) if I do: Q.add(Q1, BooleanClause.Occur.MUST); Q.add(Q2, BooleanClause.Occur.MUST); Q.add(Q3, BooleanClause.Occur.MUST); As
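
The SQL intuition in the question can be illustrated with a generic conjunction over sorted doc ID lists. Starting from the shortest (most selective) list keeps every intermediate result small. This is a stdlib sketch of that idea only. It says nothing about whether clause order affects Lucene's BooleanQuery scoring, which is what the reply addresses.

```java
import java.util.Arrays;
import java.util.Comparator;

class ConjunctionSketch {
    /** Intersects two ascending doc ID arrays with a linear merge. */
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[n++] = a[i];
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    /** ANDs all lists, starting from the most selective (shortest) one. */
    static int[] intersectAll(int[][] postings) {
        if (postings.length == 0) {
            return new int[0];
        }
        int[][] lists = postings.clone();                                // don't reorder the caller's array
        Arrays.sort(lists, Comparator.comparingInt((int[] l) -> l.length));
        int[] result = lists[0].clone();
        for (int k = 1; k < lists.length && result.length > 0; k++) {    // stop early once empty
            result = intersect(result, lists[k]);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] q1 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};   // hypothetical, least selective
        int[] q2 = {2, 4, 6, 8, 10};
        int[] q3 = {4, 8};                            // most selective
        System.out.println(Arrays.toString(intersectAll(new int[][] {q1, q2, q3})));   // [4, 8]
    }
}
```

The result is the same whatever the order. Only the amount of work done along the way changes.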

Understanding Lucene Slop

2006-07-20 Thread Walt Stoneburner
Hello, I'm trying to understand Lucene's slop value a little better, as what I'm able to Google about it seems a little ambiguous. My main goal is to search for a linear sequence of keywords in a specific order over a given range. For instance, I'd like a query of fate ships to find

Re: Understanding Lucene Slop

2006-07-20 Thread Erick Erickson
Have you looked at SpanNearQuery? From what you describe, it looks to be what you want. The constructor takes slop as well as a boolean whether order is relevant. The array of SpanQuerys would probably consist of a bunch of SpanTermQuerys. Best Erick
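
A stdlib sketch of the "in order, within N positions" check described above. This is roughly the kind of test an ordered SpanNearQuery (constructed with `inOrder` set to true) performs over SpanTermQuery clauses. Here, slop means the total number of intervening positions between consecutive terms. That definition is an assumption of the sketch, so check the NearSpans source for the exact semantics in your Lucene version. The document and its positions are made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OrderedNearSketch {
    /** Smallest position in an ascending array that is greater than `after`, or -1. */
    static int nextAfter(int[] sorted, int after) {
        if (sorted == null) {
            return -1;
        }
        for (int p : sorted) {
            if (p > after) {
                return p;
            }
        }
        return -1;
    }

    /**
     * True if the terms occur in this order with at most `slop` intervening
     * positions in total (slop 0 = adjacent, like an exact phrase).
     * `positions` maps each term to its ascending token positions in one document.
     */
    static boolean matchesInOrder(Map<String, int[]> positions, List<String> terms, int slop) {
        if (terms.isEmpty() || positions.get(terms.get(0)) == null) {
            return false;
        }
        for (int start : positions.get(terms.get(0))) {
            int prev = start;
            for (int k = 1; k < terms.size() && prev >= 0; k++) {
                prev = nextAfter(positions.get(terms.get(k)), prev);   // earliest later occurrence
            }
            if (prev < 0) {
                return false;   // no ordered completion from here, so none from any later start
            }
            int intervening = (prev - start) - (terms.size() - 1);
            if (intervening <= slop) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical document "the fate of the ships": positions 0..4.
        Map<String, int[]> pos = new HashMap<>();
        pos.put("the", new int[] {0, 3});
        pos.put("fate", new int[] {1});
        pos.put("of", new int[] {2});
        pos.put("ships", new int[] {4});

        System.out.println(matchesInOrder(pos, Arrays.asList("fate", "ships"), 2));   // true: two words between
        System.out.println(matchesInOrder(pos, Arrays.asList("fate", "ships"), 1));   // false: window too small
        System.out.println(matchesInOrder(pos, Arrays.asList("ships", "fate"), 10));  // false: wrong order
    }
}
```

Taking the earliest later occurrence of each term is enough for a fixed starting position, because it gives the tightest window from that start. Every occurrence of the first term is still tried, since a later start can yield a tighter window.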