I am a complete newbie to the UIMA framework and I'm just following the steps given in this tutorial: http://www.ibm.com/developerworks/webservices/tutorials/ws-uima/section5.html to create an annotator for DNA sequences.
I have installed Apache UIMA SDK 2.4, set the UIMA_HOME variable in .bashrc, and added UIMA_HOME/bin to the PATH. I am using Eclipse 4.2.2 on an Ubuntu 12.10 machine. I also added the UIMA_HOME directory to Eclipse's Classpath variable, and I have successfully imported and run some of the uimaj-examples that came with the SDK.

Now I am trying to create an Annotator and Analysis Engine for DNA sequences, but when I try to run it with the Document Analyzer, I get a pop-up with the following error:

org.apache.uima.resource.ResourceInitializationException: Annotator class "bio.uima.DNASequenceAnnotator" was not found. (Descriptor: file: /home/name/workspace/DNAUima/descriptors/DNASequenceAEDescriptor.xml)
CausedBy: java.lang.ClassNotFoundException: bio.uima.DNASequenceAnnotator

Here are the steps I followed to create the annotator and analysis engine:

1. Created a new Java project in Eclipse. Created package: 'bio.uima'.
2. Created 'data' and 'descriptors' folders in the project root directory.
3. Created a type descriptor file using the Eclipse UIMA plugin in the 'descriptors' folder. Named the file: 'DNASequenceTypeSystemDescriptor.xml'.
4. On the 'Type System' tab, added a new type. Named it: 'bio.uima.DNASequence' with supertype: 'uima.tcas.Annotation'.
5. Added a feature named: 'value' with range type: 'uima.cas.String'. This will hold the actual DNA sequence string.
6. Saved the type descriptor file. This automatically created 'DNASequence.java' and 'DNASequence_Type.java' in the 'bio.uima' package. Also added the required .jar files to the lib folder and configured the build path accordingly. At this point, Eclipse showed no errors, just some warnings in the generated Java files.
7. Created the annotator class: added a new class to the 'bio.uima' package, named DNASequenceAnnotator, that extends JCasAnnotator_ImplBase.
8. Wrote code to search for DNA sequences using a regex in the overridden 'public void process(JCas aCas)' method. Stored the document text from the JCas in a string called txt. Created a new object of DNASequence type for every match using the 'DNASequence(JCas jcas)' constructor.
9. Called annotation.setBegin(matcher.start()), annotation.setEnd(matcher.end()), annotation.setValue(txt.substring(matcher.start(), matcher.end())) and annotation.addToIndexes().
10. Created the Analysis Engine descriptor file in the 'descriptors' folder using the UIMA plugin. Named this file: DNASequenceAEDescriptor.xml. Set the Java class file to: bio.uima.DNASequenceAnnotator. Engine type: Primitive. Name: DNASequenceAEDescriptor.
11. In the Type System tab, clicked Set DataPath and set the value to the descriptors folder. Clicked Add and added DNASequenceTypeSystemDescriptor.xml.
12. In the Capabilities tab, clicked the first line, clicked Add Type, and clicked the Output column for DNASequence. Edited the features to show only begin, end and value instead of all features.
13. Ran the Document Analyzer: Run > Run configurations > UIMA Document Analyzer. Project: uimaj-examples. Main class: org.apache.uima.tools.docanalyzer.DocumentAnalyzer. Clicked Run. This opened the Document Analyzer.
14. Selected the input and output data directories, and selected /descriptors/DNASequenceAEDescriptor.xml as the Analysis Engine XML Descriptor.
15. Clicked Run and got a pop-up with the error described above.

I made sure that I followed all the steps in the tutorial linked above. The exact same steps are described in the PDF on the Project Documentation page. Can anyone help me out with this? What am I missing or doing wrong?
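
For reference, the matching logic from steps 8 and 9 can be sketched roughly as follows. This is a standalone approximation and not the actual annotator code: the UIMA classes are left out so that it runs on its own, the Span class is a hypothetical stand-in for the generated DNASequence annotation, and the pattern [ACGT]{4,} is an assumption, since the real regex is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DnaMatchSketch {

    // Hypothetical stand-in for the generated DNASequence annotation:
    // it only carries the begin offset, end offset and matched value.
    static final class Span {
        final int begin;
        final int end;
        final String value;

        Span(int begin, int end, String value) {
            this.begin = begin;
            this.end = end;
            this.value = value;
        }
    }

    // Assumed pattern: runs of four or more of the letters A, C, G, T.
    private static final Pattern DNA = Pattern.compile("[ACGT]{4,}");

    // Mirrors the body of process(): scan the document text and record
    // one span per regex match, using matcher.start() and matcher.end().
    static List<Span> findSequences(String txt) {
        List<Span> spans = new ArrayList<>();
        Matcher matcher = DNA.matcher(txt);
        while (matcher.find()) {
            String value = txt.substring(matcher.start(), matcher.end());
            spans.add(new Span(matcher.start(), matcher.end(), value));
        }
        return spans;
    }

    public static void main(String[] args) {
        for (Span s : findSequences("sample GATTACA and ACGTACGT here")) {
            System.out.println(s.begin + "-" + s.end + " " + s.value);
        }
    }
}
```

In the real annotator, each Span would instead be a DNASequence created with the 'DNASequence(JCas jcas)' constructor, filled in with setBegin, setEnd and setValue, and then added to the indexes.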
