Hi Ryan,
Why not preprocess your documents with tools like Apache UIMA, GATE or
OpenNLP before indexing them in Lucene? GATE, for instance, has FST-based
gazetteers which would be perfect for your place names; AFAIK there is also
a Dictionary component for UIMA which would be a good match.
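To make the gazetteer idea concrete, here is a toy sketch of dictionary-based place-name tagging done before indexing. It does NOT use GATE's or UIMA's API: the class name `PlaceNameGazetteer` and its methods are hypothetical, and it uses a plain token trie with longest-match lookup rather than a minimized FST like the one GATE ships. The matched names could then be indexed separately (for example in their own field), which is the general approach suggested above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal gazetteer: a token-level trie with longest-match lookup. */
public class PlaceNameGazetteer {

    /** One trie node; isEntry marks the end of a complete place name. */
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        boolean isEntry;
    }

    private final Node root = new Node();

    /** Adds a (possibly multi-word) place name, split on whitespace and lower-cased. */
    public void add(String placeName) {
        Node node = root;
        for (String token : placeName.toLowerCase().split("\\s+")) {
            node = node.children.computeIfAbsent(token, k -> new Node());
        }
        node.isEntry = true;
    }

    /** Returns place names found in the tokens, taking the longest match at each position. */
    public List<String> annotate(String[] tokens) {
        List<String> matches = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            Node node = root;
            int longestEnd = -1;
            // Walk the trie as far as the tokens allow, remembering the last complete entry.
            for (int j = i; j < tokens.length; j++) {
                node = node.children.get(tokens[j].toLowerCase());
                if (node == null) {
                    break;
                }
                if (node.isEntry) {
                    longestEnd = j;
                }
            }
            if (longestEnd >= 0) {
                // Keep the original casing of the matched span.
                matches.add(String.join(" ", Arrays.copyOfRange(tokens, i, longestEnd + 1)));
                i = longestEnd + 1;
            } else {
                i++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        PlaceNameGazetteer gazetteer = new PlaceNameGazetteer();
        gazetteer.add("New York");
        gazetteer.add("New York City");
        gazetteer.add("Paris");
        String[] tokens = "I flew from New York City to Paris".split(" ");
        System.out.println(gazetteer.annotate(tokens)); // [New York City, Paris]
    }
}
```

Longest match is what lets "New York City" win over the shorter entry "New York"; a real gazetteer would also handle tokenization, normalization and overlapping annotations more carefully.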
Julie
Hi folks,
I was advised to use PrecedenceQueryParser if I want Boolean precedence in
my queries. While examining this class, I noticed that it and its superclass
do not extend QueryParser but have a separate
implementation/hierarchy. All other parsers in that package do extend the
Hi,
I am experimenting with the Lucene trunk (aka 4.0), especially with the new
IndexDocValues feature. I am trying to store some query-independent statistics
such as PageRank, etc. One stat that I am trying to store is the sum of all the
term frequencies in a document. This can be seen as the
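For illustration, the statistic described here (the sum of all term frequencies in one document) is simply the number of tokens in that document, which is why it is later discussed as the document length. The plain-Java sketch below shows that arithmetic outside of Lucene; the class and method names are hypothetical, and it deliberately avoids the trunk IndexDocValues API, whose exact signatures are not assumed here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper showing per-document term-frequency statistics. */
public class TermFrequencyStats {

    /** Counts how often each term occurs in one document's token stream. */
    static Map<String, Integer> termFrequencies(String[] tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    /** Sum of all term frequencies; this equals the token count, i.e. the document length. */
    static long sumOfTermFrequencies(Map<String, Integer> tf) {
        long sum = 0;
        for (int frequency : tf.values()) {
            sum += frequency;
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] tokens = "to be or not to be".split(" ");
        Map<String, Integer> tf = termFrequencies(tokens);
        System.out.println(sumOfTermFrequencies(tf)); // 6
    }
}
```

Because the value depends only on the document, it can be computed once at indexing time and stored per document instead of being recomputed for every query.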
Hey,
On Wed, Jan 4, 2012 at 1:15 PM, Hany Azzam wrote:
> Hi,
>
> I am experimenting with the Lucene trunk (aka 4.0), especially with the new
> IndexDocValues feature. I am trying to store some query-independent
> statistics such as PageRank, etc. One stat that I am trying to store is the
> sum
Hi Simon,
Thank you for your reply. The document length is just an example of what I need
to store. Another stat that I need is a *normalised* sum of the TFs. I can
compute this using my own cache during retrieval by extending
SimilarityBase and storing the values in a cache that is used w