Hi all,
recently I've been working with Solr to enable named entity recognition on
indexed documents. I did this with UIMA, and I wonder whether it could be an
interesting use case for Stanbol as well.

For this purpose I've developed a custom UpdateHandler[1] for Solr which
enriches documents with Apache UIMA as they are being indexed, following
this flow:

   1. user sends documents to Solr
   2. each document received by Solr is sent to a UIMA analysis pipeline
   just before it gets indexed
   3. the UIMA pipeline extracts enrichments, i.e. named entities
   4. the enrichments are written to Solr fields on the basis of a mapping
   configuration
   5. the enriched Solr document is written to the index
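
To make the flow above concrete, here is a minimal, library-free sketch in
Python. It is not the actual Solr/UIMA code: toy_analyzer is a hypothetical
stand-in for the UIMA pipeline (it just tags capitalized tokens), the
"NamedEntity" type and the "entities" field are made-up names, and the
"index" is a plain list. It only illustrates how enrichments could be routed
to fields through a mapping configuration before the document is written.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical shapes, for illustration only.
Document = Dict[str, List[str]]   # field name -> field values
Annotation = Tuple[str, str]      # (annotation type, covered text)


def toy_analyzer(text: str) -> List[Annotation]:
    """Stand-in for the analysis pipeline: tags capitalized tokens."""
    annotations = []
    for token in text.split():
        word = token.strip(".,;:!?")
        if word and word[0].isupper():
            annotations.append(("NamedEntity", word))
    return annotations


def enrich(
    doc: Document,
    analyzer: Callable[[str], List[Annotation]],
    field_mapping: Dict[str, str],
    text_field: str = "text",
) -> Document:
    """Steps 2-4: analyze the text and write enrichments to mapped fields."""
    enriched = {name: list(values) for name, values in doc.items()}
    text = " ".join(doc.get(text_field, []))
    for annotation_type, covered_text in analyzer(text):
        target_field = field_mapping.get(annotation_type)
        if target_field is not None:  # unmapped annotation types are skipped
            enriched.setdefault(target_field, []).append(covered_text)
    return enriched


def index_documents(
    docs: List[Document],
    analyzer: Callable[[str], List[Annotation]],
    field_mapping: Dict[str, str],
) -> List[Document]:
    """Steps 1 and 5: receive documents, enrich each one, then store it."""
    index = []
    for doc in docs:
        index.append(enrich(doc, analyzer, field_mapping))
    return index
```

Since the analyzer is passed in as a plain callable, the same flow would
work with any enrichment engine that returns typed annotations; only the
analyzer and the mapping would change.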

In my opinion the same could be done with the Stanbol Enhancer.
Such an integration could either build on the existing Solr UIMA contrib
module [2][3] or be a separate module written from scratch; both options
have advantages and drawbacks that we can discuss (later?).
What do you think?
Cheers,
Tommaso

[1] : http://wiki.apache.org/solr/SolrPlugins#UpdateHandler
[2] : http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/
[3] : http://wiki.apache.org/solr/SolrUIMA
