Re: How to create and use a repository for UIMA annotators?

swirl Mon, 06 May 2013 18:48:21 -0700

Richard Eckart de Castilho <eckartde@...> writes:

> 
> Hello Greg,
> 
> > It's sort of a "maven-like" model (i.e. when using a Nexus server).  Or
> > maybe I should just actually use maven and nexus?
> > 
> > Has anyone out there tried to create a "UIMA Repository" that can be
> > directly referenced from a component descriptor file?  How did you make
> > it work?
> 
> We consider ourselves to have a "UIMA Repository" based on Maven - cf.
> DKPro Core http://code.google.com/p/dkpro-core-asl/
> 
> I would like to point out that we have largely abandoned static UIMA
> descriptors (except type descriptors).
> 
> We feel very comfortable programming on the Java level, dynamically
> creating descriptors using uimaFIT and running our pipelines directly
> from within Java (no CPE GUI or such).
> For this scenario, Maven works like a charm for us. We do not even worry
> too much about type systems, because we have packaged their XML
> descriptors and JCas wrappers in JARs as well and can simply add them as
> Maven dependencies. We use uimaFIT's automatic type system detection
> feature to dynamically construct a global type system description from
> all type system description files that could be found in a well-defined
> location in the classpath (that is, in the aforementioned JARs). A short
> example:
> 
>   * add dependency on de.tudarmstadt.ukp.dkpro.core.io.text-asl
>     (for TextReader)
>   * add dependency on de.tudarmstadt.ukp.dkpro.core.tokit-asl
>     (for BreakIteratorSegmenter)
>   * add dependency on de.tudarmstadt.ukp.dkpro.core.dictionaryannotator-asl
>     (for DictionaryAnnotator)
>   * dependency on uimaFIT automatically added (for CASDumpWriter)
>   * dependencies on type systems and JCas wrappers automatically added by
>     Maven
> 
> Then we can immediately assemble and run a pipeline:
> 
>     CollectionReader reader = createCollectionReader(TextReader.class,
>         TextReader.PARAM_PATH, "src/test/resources/text",
>         TextReader.PARAM_PATTERNS, new String[] { "[+]*.txt", "[-]broken.txt" },
>         TextReader.PARAM_LANGUAGE, "en");
> 
>     AnalysisEngine tokenizer = createPrimitive(BreakIteratorSegmenter.class);
> 
>     AnalysisEngine nameFinder = createPrimitive(DictionaryAnnotator.class,
>         DictionaryAnnotator.PARAM_PHRASE_FILE, "src/test/resources/dictionaries/names.txt",
>         DictionaryAnnotator.PARAM_ANNOTATION_TYPE, Name.class.getName());
> 
>     AnalysisEngine writer = createPrimitive(CASDumpWriter.class,
>         CASDumpWriter.PARAM_OUTPUT_FILE, "target/output.txt");
> 
>     SimplePipeline.runPipeline(reader, tokenizer, nameFinder, writer);
> 
> Notice that no line references a type system whatsoever. This is because
> we let uimaFIT automatically scan the classpath and simply make all
> types it finds available to every created component.
> 
> Our approach seems to work great for our researchers to assemble and run
> pipelines on a single machine. We do currently not scale out UIMA.
> 
> Cheers,
> 
> Richard
>



Hi Richard,
Would you mind showing me how uimaFIT is able to "dynamically construct
a global type system description from all type system description files that
could be found in a well-defined location in the classpath"?

Do you rely on the uimafit/types.txt file?
If so, how do you specify it so that it picks up the type description
files on the classpath? I looked into
de.tudarmstadt.ukp.dkpro.core.tokit-asl-1.4.0.jar, but there is no type
description file inside the JAR itself.
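
To make the question concrete, here is a minimal sketch of what I imagine
happens during the classpath scan. It is not uimaFIT's code and uses only the
JDK. The manifest path META-INF/org.uimafit/types.txt, the "one import pattern
per line" format, and the class and method names are all my assumptions. I
picture each JAR shipping its own copy of the manifest with a line such as
classpath*:desc/type/**/*.xml, and the scanner collecting those lines from
every JAR before resolving them to descriptor files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical illustration only; not part of uimaFIT or DKPro Core.
public class TypesManifestSketch {

    // Assumed manifest location (not confirmed in this thread).
    static final String MANIFEST = "META-INF/org.uimafit/types.txt";

    // Reads one manifest: one import pattern per line, blank lines skipped.
    public static List<String> parsePatterns(Reader in) throws IOException {
        List<String> patterns = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(trimmed);
            }
        }
        return patterns;
    }

    // Collects the patterns from every copy of the manifest on the classpath.
    // getResources returns one URL per JAR or directory that contains the file.
    public static List<String> collectPatterns(ClassLoader loader) throws IOException {
        List<String> all = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources(MANIFEST);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            try (Reader in = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
                all.addAll(parsePatterns(in));
            }
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        // Prints every pattern found; prints nothing if no manifest is present.
        for (String pattern : collectPatterns(TypesManifestSketch.class.getClassLoader())) {
            System.out.println(pattern);
        }
    }
}
```

If that is roughly right, then a JAR without any type description file (like
the tokit one) would simply contribute no manifest, and the types would have
to come from some other JAR on the classpath. Is that the idea?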
