Luke handler questions
Hi,

I'm looking at an index with the Luke handler and see something that makes no sense to me:

    <lst name="itemid">
      <str name="type">string</str>
      <str name="schema">I-SOl</str>
      <str name="index">I-SO-</str>
      <int name="docs">1138826</int>
      <int name="distinct">1138826</int>
      <lst name="topTerms">
        <int name="INBMA00134320080901">2</int>

Note how docs # == distinct #. That looks good and makes sense - each document has a unique itemid. But then look at topTerms. What does number 2 represent there? I thought it was the term frequency. If so, then the above says there are 2 documents with itemid=INBMA00134320080901 and that conflicts with docs # == distinct #.

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: Luke handler questions
On Thu, Sep 4, 2008 at 1:26 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
> Note how docs # == distinct #. That looks good and makes sense - each document has a unique itemid. But then look at topTerms. What does number 2 represent there? I thought it was the term frequency. If so, then the above says there are 2 documents with itemid=INBMA00134320080901 and that conflicts with docs # == distinct #.

Remember that the Lucene term frequency does not take into account deleted documents. So in this case, INBMA00134320080901 was probably overwritten.

-Yonik
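To make the effect concrete, here is a minimal sketch of the behaviour described above. It is a toy model, not Lucene's actual data structures: the `ToyIndex` class, its method names, and the second itemid `OTHER0001` are all hypothetical. It assumes that overwriting a document means marking the old copy as deleted and appending a new one, and that the per-term count still includes the deleted copy until a merge physically removes it.

```python
class ToyIndex:
    """Toy append-only index (hypothetical; not Lucene's real structures)."""

    def __init__(self):
        self.postings = {}    # term -> internal doc ids; untouched by deletes
        self.deleted = set()  # internal doc ids marked as deleted
        self.next_id = 0

    def add(self, itemid):
        # Overwrite on a unique key: mark any older copy deleted, then append.
        for doc_id in self.postings.get(itemid, []):
            self.deleted.add(doc_id)
        self.postings.setdefault(itemid, []).append(self.next_id)
        self.next_id += 1

    def num_docs(self):
        # Live documents only, like the "docs" value in the report.
        return self.next_id - len(self.deleted)

    def distinct(self):
        # Number of distinct terms in the field.
        return len(self.postings)

    def doc_freq(self, term):
        # Counts every posting, including those of deleted documents.
        return len(self.postings.get(term, []))

    def merge(self):
        # Assumed analogue of a segment merge: drop postings of deleted docs.
        live = {}
        for term, doc_ids in self.postings.items():
            kept = [d for d in doc_ids if d not in self.deleted]
            if kept:
                live[term] = kept
        self.postings = live


index = ToyIndex()
index.add("INBMA00134320080901")
index.add("OTHER0001")            # hypothetical second itemid
index.add("INBMA00134320080901")  # overwrites the first document

# docs == distinct, yet the per-term count is still 2
before = (index.num_docs(), index.distinct(), index.doc_freq("INBMA00134320080901"))

index.merge()
after = (index.num_docs(), index.distinct(), index.doc_freq("INBMA00134320080901"))
print(before, after)
```

In this model the count for the overwritten itemid drops back to 1 only after `merge()`, which would correspond to the deleted copy being purged from a real index, for example when segments are merged.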