Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The "Solr4UIMA" page has been changed by MogenetiDev:
http://wiki.apache.org/solr/Solr4UIMA?action=diff&rev1=24&rev2=25

  ## page was copied from SolrUIMA
- = Solr UIMA integration =
+ = Solr 4 UIMA Tutorial =
  <!> [[Solr4.1]]
  
  <<TableOfContents>>
  
  Solr UIMA contrib enables enhancing of Solr documents using the Unstructured 
Information Management Architecture ([[http://uima.apache.org|UIMA]]).
- UIMA lets you define custom pipelines of Analysis Engines which incrementally 
add metadata to the document via annotations.
+ UIMA lets you define custom pipelines of Analysis Engines which incrementally 
add metadata to the document via annotations. In this tutorial we first install 
the Eclipse UIMA tooling, create a custom UIMA Annotator, test the Annotator 
using the UIMA CAS Visual Debugger, create a JAR file for use with Solr 4, and 
set up Solr to use the Annotator.
+ 
+ == Set up the UIMA toolkit in Eclipse ==
+ 
+ More details can be found here:
+ 
[[http://uima.apache.org/downloads/releaseDocs/2.2.2-incubating/docs/html/overview_and_setup/overview_and_setup.html#ugr.ovv.eclipse_setup]]
+ 
+  1. Install the Eclipse Modeling Framework (EMF) from the Eclipse update site
+  2. Install Apache UIMA eclipse tooling from 
[[http://www.apache.org/dist/uima/eclipse-update-site]]
+  3. Install Apache UIMA from [[http://uima.apache.org/downloads.cgi]]
+  4. Open uimaj-examples (this enables the Run As functionality for tools such 
as the CAS Visual Debugger)
+       * File - Import - General / Existing Projects into workspace - Select 
apache-uima folder
+       * This will automatically add uimaj-examples to the workspace
+ 
+ == Create your own UIMA Annotator ==
+ 
+ More details can be found here:
+ [[http://uima.apache.org/doc-uima-annotator.html]]
+ 
+  1.   Create a new Java project in your Eclipse workspace called 
RoomNumberAnnotator. To do this select "File -> New -> Java Project"
+         and use RoomNumberAnnotator as the project name. Also, in the Project 
Layout section, make sure the button to
+       "Create separate folders for sources and class files" is checked.
+  2.   Add the UIMA nature to the project by right-clicking the 
"RoomNumberAnnotator" project and choosing "Add UIMA Nature".
+       Confirm the dialog that follows with "Yes" to add the UIMA nature, 
then press "OK" to dismiss the status message dialog.
+       This will create a default directory layout of folders useful for 
annotator component development.
+  3.   Project - Right click - Add UIMA nature
+  4.   Configure build path (create Variable UIMA_HOME):
+       *       Right-click the RoomNumberAnnotator project and choose Build 
Path -> Configure Build Path.
+       *       Click the "Add Variable..." button and select the "UIMA_HOME" 
variable. If the variable does not exist yet, create it via "Configure 
Variables..." and set it to the directory where UIMA is installed.
+       *       Click the "Extend..." button and choose uima-core.jar in the 
"lib" directory. You could add other jars from the UIMA lib directory, but 
uima-core.jar is the only one needed for this project.
+       *       Finalize all dialogues with the "OK" button.
+  5.   Define Annotator type
+       *       Right-click on the "desc" folder of your project and choose 
"New -> Other"
+       *       Select "Analysis Engine Descriptor" from the "UIMA" folder and 
press "Next"
+       *       Enter "RoomNumberAnnotatorDescriptor.xml" as file name, and 
press "Finish"
+  6.   Add new type (RoomNumber) to the RoomNumberAnnotatorDescriptor.xml
+       *       Open the descriptor in the UIMA Component Descriptor Editor 
(CDE) by right-clicking the "RoomNumberAnnotatorDescriptor.xml"
+               file and choosing "Open With -> Component Descriptor Editor"
+       *       Select the "Type System" tab at the bottom to show the type 
system definition page.
+       *       Press the "Add Type" button to add the new type. Use 
"org.apache.uima.tutorial.RoomNumber"
+               as type name and finish with "OK". The supertype 
"uima.tcas.Annotation" is correct
+  7.   Add new feature (building) to type RoomNumber
+       *       Select the "org.apache.uima.tutorial.RoomNumber" type by 
clicking it.
+       *       Click the "Add..." button to add a feature to the type and 
specify "building" as feature name and "uima.cas.String"
+               as range type. This means that the "building" feature is a 
String based feature.
+       *       Finish the dialog by clicking "OK".
+       *       Save the descriptor file
+  8.   Automatically create Java classes:
+       *       Open the descriptor file in the Component Descriptor Editor and 
select the "Type System" tab.
+       *       Press the "JCasGen" button that will trigger the Java class 
generation.
+               The generated classes will be added to the "src" folder of your 
project in a separate package.
+  9.   Write Java code for the Annotator
+       *       Right-click on the "src" folder and select "New -> Class"
+       *       Package: org.apache.uima.tutorial.ex1
+               Name: RoomNumberAnnotator
+               Superclass: 
org.apache.uima.analysis_component.JCasAnnotator_ImplBase
+  10.  Test the Annotator:
+       *       Run - Run as - Run configurations - Java Application - UIMA CAS 
Visual debugger
+       *       Select the "User Entries" in the classpath tab and press the 
"Add Projects..." button
+       *       Mark the "RoomNumberAnnotator" project in the upcoming dialog 
and finish with "OK"
+       *       Run the CAS Visual Debugger (CVD) by selecting "Run"
+       *       Choose "Run -> Load AE" and select the 
RoomNumberAnnotatorDescriptor.xml file in the desc folder of your Eclipse 
project
+       *       Copy and paste the test text below into the text section of 
the CVD
+ 
+        {{{
+         April 7, 2004 Distillery Lunch Seminar
+       UIMA and its Metadata
+       12:00PM-1:00PM in HAW GN-K35
+ 
+       April 16, 2004 KM & I Department Tea
+       Title: An Eclipse-based TAE Configurator Tool
+       3:00PM-4:30PM in HAW GN-K35
+ 
+       May 11, 2004 UIMA Tutorial
+       9:00AM-5:00PM in YKT 20-001
+         }}}
+ 
+       *       To run the annotator on the specified text, choose "Run -> 
Run RoomNumberAnnotatorDescriptor"
+  11. Create JAR file from Project: Right-click on the Project - Export - Java 
- JAR file
+  12. Copy the JAR file to SOLR_HOME/example/solr/collection1/lib
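
Step 9 above names the annotator class but does not show its matching logic. The following is a minimal sketch of that logic only, written without any UIMA dependency so it can be run on its own. The class name `RoomNumberSketch`, the method `findRooms`, the `building:room@begin-end` result format and the two regular expressions are assumptions for illustration (the patterns follow the style of the Apache UIMA annotator tutorial linked above). In the real annotator, which extends `JCasAnnotator_ImplBase`, each match would presumably become a `RoomNumber` annotation with its begin and end offsets and the `building` feature set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RoomNumberSketch {
    // Assumed patterns: Yorktown rooms look like "20-001", Hawthorne rooms like "GN-K35".
    private static final Pattern YORKTOWN = Pattern.compile("\\b[0-4]\\d-[0-2]\\d\\d\\b");
    private static final Pattern HAWTHORNE = Pattern.compile("\\b[G1-4][NS]-[A-Z]\\d\\d\\b");

    // Returns one entry per match, formatted as "building:room@begin-end".
    // All Yorktown matches are listed first, then all Hawthorne matches.
    static List<String> findRooms(String text) {
        List<String> rooms = new ArrayList<>();
        collect(YORKTOWN, "Yorktown", text, rooms);
        collect(HAWTHORNE, "Hawthorne", text, rooms);
        return rooms;
    }

    // Scans the text with one pattern and records each match with its character offsets,
    // which correspond to the begin/end values an annotation would carry.
    private static void collect(Pattern pattern, String building, String text, List<String> out) {
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            out.add(building + ":" + matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
    }

    public static void main(String[] args) {
        String sample = "12:00PM-1:00PM in HAW GN-K35\n9:00AM-5:00PM in YKT 20-001";
        for (String room : findRooms(sample)) {
            System.out.println(room);
        }
    }
}
```

Running the sketch against the test text from step 10 is a quick way to check the patterns before wiring them into the annotator and loading it in the CVD.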
+ 
+ 
  
  == SolrUIMA UpdateRequestProcessor ==
  The SolrUIMA UpdateRequestProcessor is a custom UpdateRequestProcessor that 
takes document(s) being indexed, sends them to a UIMA pipeline and then returns 
the document(s) enriched with the specified metadata.
  
  
  === Installation ===
-  1. Go to dev/solr/contrib/uima and run 'ant clean dist'
-  2. get the package apache-solr-uima-4.0-SNAPSHOT.jar together with the jars 
under the dev/solr/contrib/uima/lib directory and paste everything inside one 
of the lib directories of your Solr instance (defined inside the 
solrconfig.xml).  You may need to create the lib directory for a specific core. 
+  1. Download the latest Solr 4.x release from 
[[http://www.apache.org/dyn/closer.cgi/lucene/solr/]]
+  2. Copy the following files from the Solr release to the Solr document 
location you are using (in this case solr/example/solr/collection1)
    {{{
    mkdir solr/example/solr/collection1/lib
-   cp solr/dist/apache-solr-uima*.jar solr/example/solr/collection1/lib
+   cp solr/dist/solr-uima*.jar solr/example/solr/collection1/lib
    cp solr/contrib/uima/lib/*.jar solr/example/solr/collection1/lib/
-   cp 
solr/build/contrib/solr-uima/lucene-libs/lucene-analyzers-uima-4.0-SNAPSHOT.jar 
solr/example/solr/collection1/lib/
+   cp solr/contrib/uima/lucene-libs/lucene-analyzers-uima*.jar 
solr/example/solr/collection1/lib/
    }}}
  
-  3. modify your Solr instance config files as described in the 
[[https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/README.txt|solr/contrib/solr-uima/README.txt]]
+  3. Modify your Solr instance config files as described in the 
[[https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/README.txt|solr/contrib/solr-uima/README.txt]]
-  4. run your Solr instance and enjoy UIMA enriching documents being indexed
+  4. Run your Solr instance and enjoy UIMA enriching documents being indexed
  
  === Configuration ===
  
@@ -57, +138 @@

  see [[https://issues.apache.org/jira/browse/SOLR-2129|SOLR-2129]]
  
  === UIMA components used ===
- UIMA supports the use of existing analysis engines (see 
[[http://uima.apache.org/sandbox.html|here]] and 
[[http://uima.apache.org/external-resources.html|here]]) as long as the 
creation of custom components. 
+ UIMA supports the use of existing analysis engines (see 
[[http://uima.apache.org/sandbox.html|here]] and 
[[http://uima.apache.org/external-resources.html|here]]) as well as the 
creation of custom components.
  
  The current contrib/uima module uses a predefined set of components :
   1. 
[[http://uima.apache.org/sandbox.html#whitespace.tokenizer|WhitespaceTokenizer]]
@@ -105, +186 @@

  
  One can use the default one bundled inside the component or create a new one.
  
- For example to use one of the default Dictionary Annotator Analysis Engine 
descriptors use the following (which runs Whitespace Tokenizer and then 
Dictionary Annotator): 
+ For example, to use one of the default Dictionary Annotator Analysis Engine 
descriptors, use the following (which runs Whitespace Tokenizer and then 
Dictionary Annotator):
  {{{
    <config>
      ...
