Re: How to create and use a repository for UIMA annotators?

Richard Eckart de Castilho Tue, 01 Mar 2011 23:50:24 -0800

Hello Greg,

> It's sort of a "maven-like" model (i.e. when using a Nexus server).  Or maybe 
> I should just actually use maven and nexus?
> 
> Has anyone out there tried to create a "UIMA Repository" that can be directly 
> referenced from a component descriptor file?  How did you make it work?


We consider ourselves to have a "UIMA Repository" based on Maven - cf. DKPro 
Core http://code.google.com/p/dkpro-core-asl/

I would like to point out that we have largely abandoned static UIMA 
descriptors (except type descriptors).

We feel very comfortable programming on the Java level, dynamically creating 
descriptors using uimaFIT and running our pipelines directly from within Java 
(no CPE GUI or such).
For this scenario, Maven works like a charm for us. We do not even worry too 
much about type systems, because we have packaged their XML descriptors and JCas
wrappers in JARs as well and can simply add them as Maven dependencies. We use 
uimaFIT's automatic type system detection feature to dynamically construct a
global type system description from all type system description files that 
could be found in a well-defined location in the classpath (that is, in the 
aforementioned JARs). A short example:

  * add dependency on de.tudarmstadt.ukp.dkpro.core.io.text-asl (for TextReader)
  * add dependency on de.tudarmstadt.ukp.dkpro.core.tokit-asl (for
    BreakIteratorSegmenter)
  * add dependency on de.tudarmstadt.ukp.dkpro.core.dictionaryannotator-asl
    (for DictionaryAnnotator)
  * dependency on uimaFIT automatically added (for CASDumpWriter)
  * dependencies on type systems and JCas wrappers automatically added by Maven

Then we can immediately assemble and run a pipeline:

    CollectionReader reader = createCollectionReader(TextReader.class,
        TextReader.PARAM_PATH, "src/test/resources/text",
        TextReader.PARAM_PATTERNS, new String[] { "[+]*.txt", "[-]broken.txt" },
        TextReader.PARAM_LANGUAGE, "en");

    AnalysisEngine tokenizer = createPrimitive(BreakIteratorSegmenter.class);

    AnalysisEngine nameFinder = createPrimitive(DictionaryAnnotator.class,
        DictionaryAnnotator.PARAM_PHRASE_FILE,
            "src/test/resources/dictionaries/names.txt",
        DictionaryAnnotator.PARAM_ANNOTATION_TYPE, Name.class.getName());

    AnalysisEngine writer = createPrimitive(CASDumpWriter.class,
        CASDumpWriter.PARAM_OUTPUT_FILE, "target/output.txt");

    SimplePipeline.runPipeline(reader, tokenizer, nameFinder, writer);

Notice that no line references a type system whatsoever. This is because we let 
uimaFIT automatically scan the classpath and simply make all
types it finds available to every created component.
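
To illustrate the general idea of such a classpath scan, here is a minimal 
sketch that uses only the JDK. It is not uimaFIT's actual implementation: the 
class name TypeDescriptorScan and the manifest path META-INF/example/types.txt 
are hypothetical and chosen for illustration only. The assumption is that each 
JAR ships a small text file at a well-known location listing the type system 
descriptors it contains, and that a scanner collects the entries from every 
such file visible on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class TypeDescriptorScan {

    // Hypothetical well-known location; an assumption made for this sketch.
    static final String MANIFEST = "META-INF/example/types.txt";

    // Reads one manifest: one descriptor location per line,
    // skipping blank lines and lines starting with '#'.
    static List<String> parseManifest(BufferedReader in) throws IOException {
        List<String> entries = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty() && !line.startsWith("#")) {
                entries.add(line);
            }
        }
        return entries;
    }

    // Collects the entries of every manifest with the given path that the
    // class loader can see. getResources returns one URL per JAR or
    // directory that contains the path, so each dependency contributes.
    static List<String> scan(ClassLoader loader, String manifestPath)
            throws IOException {
        List<String> found = new ArrayList<>();
        Enumeration<URL> manifests = loader.getResources(manifestPath);
        for (URL url : Collections.list(manifests)) {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(),
                            StandardCharsets.UTF_8))) {
                found.addAll(parseManifest(in));
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        for (String descriptor : scan(loader, MANIFEST)) {
            System.out.println(descriptor);
        }
    }
}
```

With a layout like this, adding a JAR as a Maven dependency is enough to make 
its types discoverable, which is why the pipeline code itself never has to 
name a type system.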

Our approach seems to work well for our researchers, who assemble and run 
pipelines on a single machine. We currently do not scale out UIMA.

Cheers,

Richard

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone +49 (6151) 16-7477, fax -5455, room S2/02/E225
[email protected] 
www.ukp.tu-darmstadt.de 
-------------------------------------------------------------------
