Hi Marc,
In some shameless self-promotion, I've written up some worked Lucene
examples (maybe a little more focused on Lucene internals than best
practices) over at https://github.com/msfroh/lucene-university. If you have
anything you'd like to understand better, feel free to open issues there
and
Hi Marc,
You seem to hit all the questions we had too :)
The choice between a 10k and a 100k sample size was mainly influenced by
the users: 100k is slower but more accurate, with less chance of missing
the one doc that has the outlier value.
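
As a rough illustration of that trade-off (a sketch only, not the sampling
code from our application): assuming the sample is drawn uniformly at random
without replacement, the chance that it contains one specific doc is simply
sample size divided by total docs. The function name and the 10M-doc index
size below are made up for the example.

```python
def chance_sample_contains_doc(sample_size: int, total_docs: int) -> float:
    """Probability that a uniform random sample, drawn without replacement,
    includes one specific document (e.g. the single outlier)."""
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    if sample_size < 0:
        raise ValueError("sample_size must not be negative")
    # A sample cannot hold more docs than the index contains.
    return min(sample_size, total_docs) / total_docs


# Hypothetical index size, chosen only for illustration.
TOTAL_DOCS = 10_000_000

for sample_size in (10_000, 100_000):
    p = chance_sample_contains_doc(sample_size, TOTAL_DOCS)
    print(f"sample={sample_size:>7,}  P(outlier doc sampled)={p:.2%}")
```

Under these assumptions, going from 10k to 100k makes it ten times more
likely that the single outlier doc ends up in the sample, at the cost of
visiting ten times as many docs.
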
Our basic hybrid approach that we settled on in our application was to show