Fixed the issue (HTML elements that exist on every page were polluting
relevancy)... It serves as an excellent example of how little
effort it takes to add a custom processor in a JesterJ project:
https://github.com/nsoft/index-solr-ref-guide/issues/1
Now a search for q=hdfs matches only 7 pages, not all of them.
Boy, do I remember this "I did a cool thing and nobody looked" feeling for
the Solr RefGuide.
But if it could be useful for this project, my Guide import code is still
public. I actually read the content straight from the AsciiDoc internal
representation rather than going through Tika:
https://github.com/arafalov/sol