Not able to run annotator for DNA sequence

Farhang Sat, 30 Mar 2013 12:40:14 -0700

I am a complete newbie to the UIMA framework and I'm just following the steps
given in this tutorial: 
http://www.ibm.com/developerworks/webservices/tutorials/ws-uima/section5.html
to create an annotator for DNA sequences.


I have installed Apache UIMA SDK 2.4, set the UIMA_HOME variable in .bashrc and
added UIMA_HOME/bin to the PATH.
I am using Eclipse 4.2.2 on an Ubuntu 12.10 machine. I also added the UIMA_HOME
directory to Eclipse's Classpath variable.
I have successfully imported and run some of the uimaj-examples that
came with the SDK.

Next I tried to create an Annotator and Analysis Engine for DNA sequences,
but when I try to run it with the Document Analyzer, I get a pop-up with
the following error:
org.apache.uima.resource.ResourceInitializationException: Annotator class 
"bio.uima.DNASequenceAnnotator" was not found. (Descriptor: file: 
/home/name/workspace/DNAUima/descriptors/DNASequenceAEDescriptor.xml)
CausedBy: java.lang.ClassNotFoundException: bio.uima.DNASequenceAnnotator
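
A ClassNotFoundException like the one above is what the JVM throws when a
class loader cannot find the named class on its classpath. As a minimal
sketch, the check below tries to load a class by name and prints the classpath
of the running JVM. ClasspathCheck and isLoadable are hypothetical names for
illustration and are not part of UIMA. It assumes the annotator would be loaded
from the ordinary classpath of the launch configuration, which may not hold if
a custom class loader is involved.

```java
// Hypothetical helper, not part of UIMA: reports whether a class name
// can be found by the class loader that loaded this helper.
class ClasspathCheck {

    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the error message; another name can be passed as an argument
        String name = args.length > 0 ? args[0] : "bio.uima.DNASequenceAnnotator";
        System.out.println(name + " loadable: " + isLoadable(name));
        // Shows which directories and jars the running JVM actually searches
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Running this from the same Eclipse project that launches the Document Analyzer
would show whether that project's classpath includes the compiled annotator
class.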

Here are the steps I followed to create the annotator and analysis engine:
1. Created a new Java project in Eclipse. Created package: 'bio.uima'.
2. Created 'data' and 'descriptors' folders in the project root directory.
3. Created a type descriptor file using the Eclipse UIMA plugin in the 
'descriptors' folder. Named the file: 'DNASequenceTypeSystemDescriptor.xml'
4. On the 'Type System' tab, added a new type. Named it: 'bio.uima.DNASequence' 
with supertype: 'uima.tcas.Annotation'.
5. Added a feature named: 'value' with range type: 'uima.cas.String'. This will 
hold the actual DNA sequence string.
6. Saved the type descriptor file. This automatically created
'DNASequence.java' and 'DNASequence_Type.java' in the 'bio.uima' package. Also 
added the required .jar files to the lib folder and configured the build path 
accordingly. At this point, Eclipse showed no errors, just some warnings in
the generated Java files.
7. Created annotator class: Added new class to 'bio.uima' package, named the 
class DNASequenceAnnotator that extends JCasAnnotator_ImplBase.
8. Wrote code to search for DNA sequences using a regex in the overridden 
'public void process(JCas aCas)' method. Stored the document text from the JCas
in a string called txt. Created a new object of DNASequence type for every
match using the 'DNASequence(JCas jcas)' constructor.
9. Called annotation.setBegin(matcher.start()),
annotation.setEnd(matcher.end()),
annotation.setValue(txt.substring(matcher.start(), matcher.end())) and
annotation.addToIndex().
10. Created the Analysis Engine descriptor file in the 'descriptors' folder
using the UIMA plugin. Named this file: DNASequenceAEDescriptor.xml. Set the
Java class file to: bio.uima.DNASequenceAnnotator. Engine type: Primitive.
Name: DNASequenceAEDescriptor.
11. In the Type System tab, clicked Set DataPath, and set the value to 
descriptors folder. Clicked Add and added DNASequenceTypeSystemDescriptor.xml.
12. In the Capabilities tab, clicked first line, clicked Add Type and clicked 
Output column for DNASequence. Edited features to only show begin, end and
value instead of all features.
13. Ran the Document Analyzer: Run > Run configurations > UIMA Document
Analyzer. Project: uimaj-examples. Main class: 
org.apache.uima.tools.docanalyzer.DocumentAnalyzer. Clicked Run. This
opened up the Document Analyzer.
14. Selected the input and output data directories, and selected 
/descriptors/DNASequenceAEDescriptor.xml as the Analysis Engine XML Descriptor.
15. Clicked Run and got a pop-up with the error described above.
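
The matching in steps 8 and 9 can be sketched in plain Java, without any UIMA
classes, roughly as follows. The pattern [ACGT]{4,} and the names
DnaMatchSketch, Span and findSequences are assumptions for illustration, not
the actual code from the project. In the real annotator each match would become
a DNASequence annotation rather than a Span.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch of the matching in steps 8 and 9; no UIMA classes are used.
class DnaMatchSketch {

    // Assumed pattern: runs of at least four A/C/G/T characters. The real regex may differ.
    static final Pattern DNA = Pattern.compile("[ACGT]{4,}");

    // Holds the three values that step 9 sets on each annotation: begin, end and value.
    static final class Span {
        final int begin;
        final int end;
        final String value;

        Span(int begin, int end, String value) {
            this.begin = begin;
            this.end = end;
            this.value = value;
        }
    }

    // Scans txt and returns one Span per regex match, in document order.
    static List<Span> findSequences(String txt) {
        List<Span> spans = new ArrayList<>();
        Matcher matcher = DNA.matcher(txt);
        while (matcher.find()) {
            int begin = matcher.start();
            int end = matcher.end();
            spans.add(new Span(begin, end, txt.substring(begin, end)));
        }
        return spans;
    }

    public static void main(String[] args) {
        for (Span s : findSequences("Sample ACGTTGCA and GATTACA here")) {
            System.out.println(s.begin + "-" + s.end + ": " + s.value);
        }
    }
}
```

Keeping the matching separate from the UIMA calls makes it possible to check
the regex and the offsets on their own before involving the Document Analyzer.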

I made sure that I followed all the steps in the tutorial linked above. The 
exact same steps have been described in the pdf on the Project Documentation 
page.
Can anyone help me out with this? What am I missing/doing wrong?
