I'm using this version of Lucene.NET: https://github.com/apache/lucenenet
I'm trying to get the number of phrase matches per document using a
PhraseQuery and an ExactPhraseScorer like so:
    using System;
    using Lucene.Net.Index;
    using Lucene.Net.Search;

    // Some phraseQuery defined here
    using (IndexReader indexReader = DirectoryReader.Open(IndexerJob.LuceneDirectory))
    {
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        TopDocs topDocs = indexSearcher.Search(masterQuery, _MAXSEARCHRESULTS);

        // Build the Weight once, then ask it for one Scorer per segment (leaf).
        var weight = phraseQuery.CreateWeight(indexSearcher);
        foreach (AtomicReaderContext leaf in indexReader.Leaves)
        {
            // The scorer can be null when nothing in this segment can match.
            var scorer = weight.Scorer(leaf, leaf.AtomicReader.LiveDocs);
            if (scorer == null)
            {
                continue;
            }

            while (scorer.NextDoc() != DocIdSetIterator.NO_MORE_DOCS)
            {
                // DocID() is relative to this segment; add DocBase for the index-wide ID.
                int doc = leaf.DocBase + scorer.DocID();
                int freq = scorer.Freq();
                Console.WriteLine("Document {0} contains {1} matches", doc, freq);
            }
        }
    }
But scorer.NextDoc() always returns DocIdSetIterator.NO_MORE_DOCS, so the
body of the while loop is never executed. If I use a TermQuery instead of
the PhraseQuery, the same code works fine, so the problem seems to be
specific to PhraseQuery and ExactPhraseScorer.
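One difference I may be overlooking: as far as I can tell from the Java Lucene 4.x source that Lucene.NET ports, a term scorer stops on every document that contains the term, while ExactPhraseScorer.NextDoc() first finds a document containing all of the phrase terms and then only returns it if PhraseFreq() is non-zero there. Documents with a zero phrase count are skipped rather than returned with Freq() == 0. Below is a toy, in-memory model of that difference. All names are hypothetical and this is not Lucene.NET code: a "document" is just a token array and a position is an array index.

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy model (hypothetical, not Lucene.NET) of term vs. exact-phrase matching.
public static class NextDocModel
{
    // Like a term scorer: every document containing the term is a hit.
    public static IEnumerable<(int Doc, int Freq)> TermMatches(IReadOnlyList<string[]> docs, string term)
    {
        for (int d = 0; d < docs.Count; d++)
        {
            int freq = docs[d].Count(token => token == term);
            if (freq > 0) yield return (d, freq);
        }
    }

    // Like an exact-phrase scorer: conjunction first, then a phrase-count filter.
    public static IEnumerable<(int Doc, int Freq)> PhraseMatches(IReadOnlyList<string[]> docs, string[] phrase)
    {
        if (phrase.Length == 0) yield break;
        for (int d = 0; d < docs.Count; d++)
        {
            string[] tokens = docs[d];

            // Step 1: every phrase term must occur somewhere in the document.
            if (!phrase.All(term => tokens.Contains(term))) continue;

            // Step 2: count start positions where the terms appear consecutively.
            int freq = 0;
            for (int start = 0; start + phrase.Length <= tokens.Length; start++)
            {
                bool match = true;
                for (int i = 0; i < phrase.Length && match; i++)
                {
                    match = tokens[start + i] == phrase[i];
                }
                if (match) freq++;
            }

            // A document with all the terms but no exact match is skipped entirely.
            if (freq > 0) yield return (d, freq);
        }
    }
}
```

With the three documents "quick brown fox", "brown quick fox" and "quick brown fox quick brown fox", TermMatches(docs, "quick") returns all three documents, while PhraseMatches(docs, new[] { "quick", "brown", "fox" }) returns only documents 0 and 2, with counts 1 and 2.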
I looked at the source code, and ExactPhraseScorer has this method:
    private int PhraseFreq() { ... }
which is responsible for calculating the match count for each document.
The int[] fields Counts and Gens are also involved, but I don't understand
what they are doing well enough to diagnose the problem.
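My rough understanding of the Counts/Gens idea, written as a simplified stand-alone sketch (hypothetical names; not the actual Lucene.NET code, which as far as I can tell also handles fixed-size position chunks, term ordering and position gaps): each position of the t-th phrase term is shifted back by t to get a candidate phrase start. counts[start] records how many leading phrase terms have lined up at that start, and gens[start] records which call ("generation") wrote the slot, so the arrays can be reused for every document without being cleared.

```csharp
using System;
using System.Collections.Generic;

// Simplified sketch of the counts + gens technique (hypothetical, not Lucene.NET).
// Assumes the phrase terms sit at consecutive phrase positions 0, 1, 2, ...
public sealed class CountsGensSketch
{
    private readonly int[] counts; // counts[start]: leading phrase terms matched at this start
    private readonly int[] gens;   // gens[start]: generation that last wrote counts[start]
    private int gen;               // bumped once per call, so stale slots are ignored

    public CountsGensSketch(int maxPosition)
    {
        counts = new int[maxPosition + 1];
        gens = new int[maxPosition + 1];
    }

    // termPositions[t] = positions in the document of the t-th phrase term.
    // Returns how many times the whole phrase occurs at consecutive positions.
    public int PhraseFreq(IReadOnlyList<int[]> termPositions)
    {
        unchecked { gen++; }
        if (gen == 0)
        {
            // On wraparound, clear the stamps once so old values cannot collide.
            Array.Clear(gens, 0, gens.Length);
            gen = 1;
        }

        int last = termPositions.Count - 1;
        int freq = 0;
        for (int t = 0; t <= last; t++)
        {
            foreach (int pos in termPositions[t])
            {
                int start = pos - t; // candidate phrase start
                if (start < 0 || start >= counts.Length) continue;

                if (t == 0)
                {
                    // The first term opens a candidate and stamps it with this generation.
                    counts[start] = 1;
                    gens[start] = gen;
                    if (last == 0) freq++;
                }
                else if (gens[start] == gen && counts[start] == t)
                {
                    // Terms 0..t-1 already lined up at this start during this call.
                    if (t == last) freq++;
                    else counts[start]++;
                }
            }
        }
        return freq;
    }
}
```

For example, with the phrase "quick brown fox" and a document "the quick brown fox jumps over the quick brown fox", PhraseFreq(new[] { new[] { 1, 7 }, new[] { 2, 8 }, new[] { 3, 9 } }) on a CountsGensSketch(9) returns 2. Calling it again for another document reuses the same arrays: slots stamped with an older generation are simply ignored.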
Any ideas?
William