Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The "Solr4UIMA" page has been changed by MogenetiDev:
http://wiki.apache.org/solr/Solr4UIMA?action=diff&rev1=24&rev2=25

  ## page was copied from SolrUIMA
- = Solr UIMA integration =
+ = Solr 4 UIMA Tutorial =
  <!> [[Solr4.1]]

  <<TableOfContents>>

  Solr UIMA contrib enables enhancing of Solr documents using the Unstructured Information Management Architecture ([[http://uima.apache.org|UIMA]]).
- UIMA lets you define custom pipelines of Analysis Engines which incrementally add metadata to the document via annotations.
+ UIMA lets you define custom pipelines of Analysis Engines which incrementally add metadata to the document via annotations. In this tutorial we first install the Eclipse UIMA toolkit, create a custom UIMA Annotator, test the Annotator using the UIMA CAS Visual debugger, create a JAR file for use with Solr 4, and set up Solr to use the Annotator.
+
+ == Set up UIMA toolkit in Eclipse ==
+
+ More details can be found here:
+ [[http://uima.apache.org/downloads/releaseDocs/2.2.2-incubating/docs/html/overview_and_setup/overview_and_setup.html#ugr.ovv.eclipse_setup]]
+
+  1. Install the Eclipse Modeling Framework (EMF) from the Eclipse update site
+  2. Install the Apache UIMA Eclipse tooling from [[http://www.apache.org/dist/uima/eclipse-update-site]]
+  3. Install Apache UIMA from [[http://uima.apache.org/downloads.cgi]]
+  4. Open uimaj-examples (this will enable Run As functionality for e.g. the JCas debugger)
+   * File - Import - General / Existing Projects into workspace - Select apache-uima folder
+   * This will automatically add uimaj-examples to the workspace
+
+ == Create your own UIMA Annotator ==
+
+ More details can be found here:
+ [[http://uima.apache.org/doc-uima-annotator.html]]
+
+  1. Create a new Java project in your Eclipse workspace called RoomNumberAnnotator. To do this, select "File -> New -> Java Project"
+ and use RoomNumberAnnotator as the project name.
Also, in the Project Layout section, make sure the button to
+ "Create separate folders for sources and class files" is checked.
+  2. Add the UIMA nature to the project by right-clicking on the "RoomNumberAnnotator" project and choosing "Add UIMA Nature".
+ Confirm the following dialog with "Yes" to add the UIMA nature, then press "OK" to dismiss the status message dialog.
+ This will create a default directory layout of folders useful for annotator component development.
+  3. Project - Right click - Add UIMA nature
+  4. Configure build path (create Variable UIMA_HOME):
+   * Right-click on the RoomNumberAnnotator project and choose Build Path -> Configure Build Path.
+   * Click the "Add Variable..." button, and select the "UIMA_HOME" variable. If the variable does not exist yet, add it using "Configure Variables...", setting it to the directory where you have UIMA installed.
+   * Click the "Extend..." button and choose the uima-core.jar in the "lib" directory. You could add other jars from the UIMA lib, but the uima-core.jar is the only one needed for this project.
+   * Close all dialogs with the "OK" button.
+  5. Define Annotator type
+   * Right-click on the "desc" folder of your project and choose "New -> Other"
+   * Select "Analysis Engine Descriptor" from the "UIMA" folder and press "Next"
+   * Enter "RoomNumberAnnotatorDescriptor.xml" as file name, and press "Finish"
+  6. Add new type (RoomNumber) to the RoomNumberAnnotatorDescriptor.xml
+   * Open the descriptor using the UIMA Component Descriptor Editor (CDE) by right-clicking the "RoomNumberAnnotatorDescriptor.xml"
+ file and choosing "Open With -> Component Descriptor Editor"
+   * Select the "Type System" tab at the bottom to show the type system definition page.
+   * Press the "Add Type" button to add the new type. Use "org.apache.uima.tutorial.RoomNumber"
+ as type name and finish with "OK". The supertype "uima.tcas.Annotation" is correct.
+  7.
Add new feature (building) to type RoomNumber
+   * Select the "org.apache.uima.tutorial.RoomNumber" type by clicking it.
+   * Click the "Add..." button to add a feature to the type and specify "building" as feature name and "uima.cas.String"
+ as range type. This means that the "building" feature is a String-based feature.
+   * Finish the dialog by clicking "OK".
+   * Save the descriptor file
+  8. Automatically create Java classes:
+   * Open the descriptor file in the Component Descriptor Editor and select the "Type System" tab.
+   * Press the "JCasGen" button to trigger the Java class generation.
+ The generated classes will be added to the "src" folder of your project in a separate package.
+  9. Write Java code for the Annotator
+   * Right-click on the "src" folder and select "New -> Class"
+   * Package: org.apache.uima.tutorial.ex1
+ Name: RoomNumberAnnotator
+ Superclass: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
+  10. Test the Annotator:
+   * Run - Run as - Run configurations - Java Application - UIMA CAS Visual debugger
+   * Select the "User Entries" in the classpath tab and press the "Add Projects..." button
+   * Mark the "RoomNumberAnnotator" project in the dialog that opens and finish with "OK"
+   * Run the CAS Visual Debugger (CVD) by selecting "Run"
+   * Choose "Run -> Load AE" and select the RoomNumberAnnotatorDescriptor.xml file in the desc folder of your Eclipse project
+   * Copy and paste the test text below into the text section of the CVD
+
+ {{{
+ April 7, 2004 Distillery Lunch Seminar
+ UIMA and its Metadata
+ 12:00PM-1:00PM in HAW GN-K35
+
+ April 16, 2004 KM & I Department Tea
+ Title: An Eclipse-based TAE Configurator Tool
+ 3:00PM-4:30PM in HAW GN-K35
+
+ May 11, 2004 UIMA Tutorial
+ 9:00AM-5:00PM in YKT 20-001
+ }}}
+
+   * To run the annotator on the specified text, choose "Run -> Run RoomNumberAnnotatorDescriptor"
+  11. Create JAR file from Project: Right-click on the Project - Export - Java - JAR file
+  12.
Copy the JAR file to SOLR_HOME/example/solr/collection1/lib
+
+ == SolrUIMA UpdateRequestProcessor ==
  The SolrUIMA UpdateRequestProcessor is a custom UpdateRequestProcessor that takes document(s) being indexed, sends them to a UIMA pipeline and then returns the document(s) enriched with the specified metadata.

  === Installation ===
-  1. Go to dev/solr/contrib/uima and run 'ant clean dist'
-  2. get the package apache-solr-uima-4.0-SNAPSHOT.jar together with the jars under the dev/solr/contrib/uima/lib directory and paste everything inside one of the lib directories of your Solr instance (defined inside the solrconfig.xml). You may need to create the lib directory for a specific core.
+  1. Download the latest Solr 4.x release [[http://www.apache.org/dyn/closer.cgi/lucene/solr/]]
+  2. Copy the following files from the Solr release to the Solr document location you are using (in this case solr/example/solr/collection1)
  {{{
  mkdir solr/example/solr/collection1/lib
- cp solr/dist/apache-solr-uima*.jar solr/example/solr/collection1/lib
+ cp solr/dist/solr-uima*.jar solr/example/solr/collection1/lib
  cp solr/contrib/uima/lib/*.jar solr/example/solr/collection1/lib/
- cp solr/build/contrib/solr-uima/lucene-libs/lucene-analyzers-uima-4.0-SNAPSHOT.jar solr/example/solr/collection1/lib/
+ cp solr/contrib/uima/lucene-libs/lucene-analyzers-uima*.jar solr/example/solr/collection1/lib/
  }}}
-  3. modify your Solr instance config files as described in the [[https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/README.txt|solr/contrib/solr-uima/README.txt]]
+  3. Modify your Solr instance config files as described in the [[https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/README.txt|solr/contrib/solr-uima/README.txt]]
-  4. run your Solr instance and enjoy UIMA enriching documents being indexed
+  4.
Run your Solr instance and enjoy UIMA enriching documents being indexed

  === Configuration ===

@@ -57, +138 @@

  see [[https://issues.apache.org/jira/browse/SOLR-2129|SOLR-2129]]

  === UIMA components used ===
- UIMA supports the use of existing analysis engines (see [[http://uima.apache.org/sandbox.html|here]] and [[http://uima.apache.org/external-resources.html|here]]) as well as the creation of custom components.
+ UIMA supports the use of existing analysis engines (see [[http://uima.apache.org/sandbox.html|here]] and [[http://uima.apache.org/external-resources.html|here]]) as well as the creation of custom components.

  The current contrib/uima module uses a predefined set of components:
   1. [[http://uima.apache.org/sandbox.html#whitespace.tokenizer|WhitespaceTokenizer]]

@@ -105, +186 @@

  One can use the default one bundled inside the component or create a new one.

- For example, to use one of the default Dictionary Annotator Analysis Engine descriptors use the following (which runs Whitespace Tokenizer and then Dictionary Annotator):
+ For example, to use one of the default Dictionary Annotator Analysis Engine descriptors use the following (which runs Whitespace Tokenizer and then Dictionary Annotator):
  {{{
  <config>
  ...