Hi all, as you can see from the wiki page Jörn linked, the integration enables calling a UIMA pipeline from inside a Solr instance. This is done via a dedicated component that can be added to the chain of UpdateRequestProcessors, which are responsible for processing documents when they arrive at Solr to be indexed.
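To make the "chain of processors" idea concrete, here is a minimal, self-contained sketch of the pattern in plain Java. It does not use Solr's or UIMA's real classes: UpdateChainSketch, Processor, EnrichingProcessor and IndexingProcessor, as well as the "text" and "token_count" field names, are hypothetical names made up for illustration, and the "analysis" is just a token count standing in for a real pipeline.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of an update processor chain.
// These are NOT Solr's real classes; a document is modelled as a plain Map.
final class UpdateChainSketch {

    /** One link in the chain: does its own work, then hands the document to the next link. */
    static abstract class Processor {
        private final Processor next; // null marks the end of the chain

        Processor(Processor next) {
            this.next = next;
        }

        final void process(Map<String, Object> doc) {
            handle(doc);
            if (next != null) {
                next.process(doc);
            }
        }

        abstract void handle(Map<String, Object> doc);
    }

    /** Stand-in for an enrichment step: reads one field and writes a derived field. */
    static final class EnrichingProcessor extends Processor {
        EnrichingProcessor(Processor next) {
            super(next);
        }

        @Override
        void handle(Map<String, Object> doc) {
            String text = String.valueOf(doc.getOrDefault("text", "")).trim();
            // Placeholder "analysis": count whitespace-separated tokens.
            int tokens = text.isEmpty() ? 0 : text.split("\\s+").length;
            doc.put("token_count", tokens);
        }
    }

    /** Stand-in for the last step, which would write the document to the index. */
    static final class IndexingProcessor extends Processor {
        final List<Map<String, Object>> index = new ArrayList<>();

        IndexingProcessor() {
            super(null);
        }

        @Override
        void handle(Map<String, Object> doc) {
            index.add(new LinkedHashMap<>(doc)); // store a copy of the enriched document
        }
    }

    public static void main(String[] args) {
        IndexingProcessor indexer = new IndexingProcessor();
        Processor chain = new EnrichingProcessor(indexer);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("text", "Solr meets UIMA");

        chain.process(doc);
        System.out.println(indexer.index);
    }
}
```

The point of the design is that the enrichment step does not need to know what comes after it: it only modifies the document and passes it on, which is why a new processor can be slotted into an existing chain.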
So, if you add the UIMAUpdateRequestProcessor to the Solr update chain, the flow goes like this:

1. A document is sent to Solr to be indexed.
2. Solr reads the (configurable) fields that contain the text to be sent to UIMA inside the CAS.
3. Solr sends the CAS to a (configurable) AnalysisEngine.
4. Once the UIMA pipeline has finished, Solr writes the feature values of the UIMA annotations to the Solr document's fields (the mapping is configurable).
5. Solr sends the enriched document to the next processor in the Solr update chain (which eventually leads to the actual write to the index).

The aggregate Analysis Engine shipped with Solr uses some Sandbox (aka UIMA Addons) components (WhitespaceTokenizer, HMMTagger, AlchemyAPIAnnotator, OpenCalaisAnnotator) to demonstrate some basic enrichment capabilities. That can obviously be changed or extended as one wishes.

The current implementation runs UIMA pipelines in the simplest way an external application can [1], but it would be good to add support for CPEs and UIMA-AS, so anyone interested in helping with that is welcome.

Obviously, any feedback is welcome too :)

Tommaso

[1] http://uima.apache.org/d/uimaj-2.3.1/tutorials_and_users_guides.html#ugr.tug.application.using_aes

2011/4/4 Jörn Kottmann <[email protected]>

> Hi all,
>
> some might already know it, the new Solr 3.1 has now support for UIMA:
> http://wiki.apache.org/solr/SolrUIMA
>
> Jörn
