Question on method visibility in Highlighter - WeightedSpanTermExtractor class

2012-10-17 Thread Dawn Zoë Raison
Hi folks, is there a reason why the setMaxDocCharsToAnalyze() method of the WeightedSpanTermExtractor class is protected? The class is a perfect fit for my requirement (enumerating the list of terms present in a document that match the current query for subsequent highlighting in a PDF file) with th

Re: Restrict Lucene search in concrete document ids

2012-10-17 Thread Ian Lea
I would expect a filter to be quicker than adding thousands of clauses because Filters are just bit sets and operations are extremely fast. But never take performance predictions, particularly from me, on trust - test it in your app with your index on your hardware. To use a filter here I think yo
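A rough illustration of the "Filters are just bit sets" point above. This is a sketch using only java.util.BitSet, not Lucene's Filter API; the class and method names (DocIdFilterSketch, allowedDocs, restrict) are made up for the example, and it assumes one bit per document id.

```java
import java.util.BitSet;

// Illustration only: plain JDK code, not Lucene. All names here are hypothetical.
public class DocIdFilterSketch {

    // Build a bit set with one bit per document; set bits mark the given doc ids.
    public static BitSet allowedDocs(int maxDoc, int[] docIds) {
        BitSet bits = new BitSet(maxDoc);
        for (int id : docIds) {
            bits.set(id);
        }
        return bits;
    }

    // Restrict a set of matching docs to the allowed set with one bitwise AND.
    // The inputs are left unmodified; a new bit set is returned.
    public static BitSet restrict(BitSet matches, BitSet allowed) {
        BitSet result = (BitSet) matches.clone();
        result.and(allowed);
        return result;
    }

    public static void main(String[] args) {
        BitSet matches = allowedDocs(10, new int[] {1, 3, 5, 7});
        BitSet allowed = allowedDocs(10, new int[] {3, 4, 5});
        System.out.println(restrict(matches, allowed));
    }
}
```

The cost of the AND depends on the size of the bit sets rather than on how many ids are allowed, which is the intuition behind expecting it to beat thousands of query clauses. As the message says, measure it on your own index before relying on that.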

Re: Lucene 4.0 benchmark bug?

2012-10-17 Thread Robert Muir
On Wed, Oct 17, 2012 at 6:48 AM, Zeynep P. wrote: > > My index is created by lucene 3.6. > Second problem, for the same collection MAP = 0.17 with default similarity, > MAP= 0.07 with lucene 4.0 BM25 similarity (b=0.75, k1=1.2). Did you read CHANGES.txt before doing this? This is really importan

Re: Restrict Lucene search in concrete document ids

2012-10-17 Thread sxam
Hi, Well, those are Lucene Document Ids I am talking about. And I know it could be done using a Filter with a Query, but I thought that if I can tell Lucene to somehow look in a specified set of Documents, it would speed up the search substantially. Otherwise it's just the same as adding 5000 Claus

Lucene 4.0 benchmark bug?

2012-10-17 Thread Zeynep P.
Hi to all, I started to use benchmark 4.0 to create submission report files with the following code:

    BufferedReader br = new BufferedReader(fr);
    QualityQuery qqs[] = qReader.readQueries(br);
    QualityQueryParser qqParser = new SimpleQQParser("title", "body");

Re: Lucene updateDocument deletes the document, but the counts keep increasing

2012-10-17 Thread Deepak Shakya
K. Thanks for the clarification. On Wed, Oct 17, 2012 at 7:10 PM, Ian Lea wrote: > I guess so. I don't pay much attention to these figures but > presumably IndexReader.numDocs() and numDeletedDocs() will adjust as > deletions get merged away. Try it and see. > > > -- > Ian. > > > On Wed, Oct 1

Re: Lucene updateDocument deletes the document, but the counts keep increasing

2012-10-17 Thread Ian Lea
I guess so. I don't pay much attention to these figures but presumably IndexReader.numDocs() and numDeletedDocs() will adjust as deletions get merged away. Try it and see. -- Ian. On Wed, Oct 17, 2012 at 2:09 PM, Deepak Shakya wrote: > Oh is it. So whenever in future these segments gets merg

Re: Lucene updateDocument deletes the document, but the counts keep increasing

2012-10-17 Thread Deepak Shakya
Oh, is it? So whenever these segments get merged in the future, my document count will go down, right? On Wed, Oct 17, 2012 at 6:33 PM, Ian Lea wrote: > Yes, IndexWriter.updateDocument() deletes and then adds. See the > javadocs. So your index will have deleted docs. Why do you care? > Th

Re: Lucene updateDocument deletes the document, but the counts keep increasing

2012-10-17 Thread Ian Lea
Yes, IndexWriter.updateDocument() deletes and then adds. See the javadocs. So your index will have deleted docs. Why do you care? They'll go away eventually as segments get merged. If you really do care, see IndexWriter.forceMergeDeletes(). See also the javadoc for that: This is often a horribl
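The delete-then-add behaviour described above can be sketched with a toy model. This is not Lucene code: ToyIndex and mergeAwayDeletes() are hypothetical, and the method names numDocs(), numDeletedDocs() and maxDoc() are borrowed from the thread only to make the counts easy to compare. It assumes a deleted document keeps its slot until a merge compacts it away.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene. It mimics "update = delete + add" bookkeeping.
public class ToyIndex {
    private final List<String> ids = new ArrayList<>();    // one slot per doc ever added
    private final List<Boolean> live = new ArrayList<>();  // false once a doc is deleted

    // Mark any live doc with this id as deleted, then append the new version.
    public void updateDocument(String id) {
        for (int i = 0; i < ids.size(); i++) {
            if (live.get(i) && ids.get(i).equals(id)) {
                live.set(i, false);
            }
        }
        ids.add(id);
        live.add(true);
    }

    // Total slots, including deleted docs that have not been merged away.
    public int maxDoc() {
        return ids.size();
    }

    // Live docs only.
    public int numDocs() {
        int n = 0;
        for (boolean isLive : live) {
            if (isLive) {
                n++;
            }
        }
        return n;
    }

    public int numDeletedDocs() {
        return maxDoc() - numDocs();
    }

    // Hypothetical stand-in for a merge: drop the slots of deleted docs.
    public void mergeAwayDeletes() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            if (live.get(i)) {
                kept.add(ids.get(i));
            }
        }
        ids.clear();
        ids.addAll(kept);
        live.clear();
        for (int i = 0; i < kept.size(); i++) {
            live.add(true);
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        String[] docs = {"a", "b", "c"};
        for (int run = 0; run < 2; run++) {      // index the same docs twice
            for (String id : docs) {
                index.updateDocument(id);
            }
        }
        System.out.println(index.numDocs() + " live, " + index.numDeletedDocs() + " deleted");
        index.mergeAwayDeletes();
        System.out.println(index.numDocs() + " live, " + index.numDeletedDocs() + " deleted");
    }
}
```

In this model the live count stays constant across repeated runs while the deleted count grows, and only the merge step brings the deleted count back down, which matches the behaviour reported in the original question.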

Lucene updateDocument deletes the document, but the counts keep increasing

2012-10-17 Thread Deepak Shakya
I am using the updateDocument() method to update my document in the Lucene index. Here is how I am doing it: writer.updateDocument(new Term(Constants.DOC_ID_FIELD, doc.get(Constants.DOC_ID_FIELD)), doc); I check my index data with Luke, and find that on the second run of the indexing, Luke tells that Del