I'm using this version of Lucene.NET: https://github.com/apache/lucenenet

I'm trying to get the number of phrase matches per document using a
PhraseQuery and an ExactPhraseScorer like so:

// Some phraseQuery defined here

using (IndexReader indexReader = DirectoryReader.Open(IndexerJob.LuceneDirectory))
{
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);

    TopDocs topDocs = indexSearcher.Search(masterQuery, _MAXSEARCHRESULTS);
    var weight = phraseQuery.CreateWeight(indexSearcher);

    // One scorer per index segment; segments that yield no scorer are skipped.
    var scorers = indexReader.Leaves
        .Select(o => weight.Scorer(o, o.AtomicReader.LiveDocs))
        .Where(o => o != null);
    foreach (var scorer in scorers)
    {
        while (scorer.NextDoc() != DocIdSetIterator.NO_MORE_DOCS)
        {
            int doc = scorer.DocID();  // doc ID relative to the current segment
            int freq = scorer.Freq();  // number of phrase matches in this document
            Console.WriteLine("Document {0} contains {1} matches", doc, freq);
        }
    }
}

But the first call to scorer.NextDoc() always returns
DocIdSetIterator.NO_MORE_DOCS, so the body of the while loop never
executes. The same code works fine with a TermQuery in place of the
PhraseQuery, so the problem appears to be specific to PhraseQuery and
ExactPhraseScorer.

I looked at the source code, and ExactPhraseScorer has this method:

private int PhraseFreq() { ... }

It appears to be responsible for calculating the match count per
document. The int[] fields Counts and Gens are also involved, but I don't
understand what they are doing well enough to diagnose the problem.

Any ideas?

William
