Richard,

I find the idea of enabling Maven by packaging components in JARs very
compelling. Have you dealt with third-party code that expects to find
resources at file system locations rather than by classpath names?

-Chris

On 3/2/11 12:49 AM, Richard Eckart de Castilho wrote:
Hello Greg,

> It's sort of a "maven-like" model (i.e. when using a Nexus server). Or maybe I
> should just actually use maven and nexus?
>
> Has anyone out there tried to create a "UIMA Repository" that can be directly
> referenced from a component descriptor file? How did you make it work?

We consider ourselves to have a "UIMA Repository" based on Maven - cf. DKPro 
Core http://code.google.com/p/dkpro-core-asl/

I would like to point out that we have largely abandoned static UIMA
descriptors (except type descriptors).

We feel very comfortable programming on the Java level, dynamically creating 
descriptors using uimaFIT and running our pipelines directly from within Java 
(no CPE GUI or such).
For this scenario, Maven works like a charm for us. We do not even worry too 
much about type systems, because we have packaged their XML descriptors and JCas
wrappers in JARs as well and can simply add them as Maven dependencies. We use 
uimaFIT's automatic type system detection feature to dynamically construct a
global type system description from all type system description files that 
could be found in a well-defined location in the classpath (that is, in the
aforementioned JARs). A short example:

   * add dependency on de.tudarmstadt.ukp.dkpro.core.io.text-asl (for TextReader)
   * add dependency on de.tudarmstadt.ukp.dkpro.core.tokit-asl (for BreakIteratorSegmenter)
   * add dependency on de.tudarmstadt.ukp.dkpro.core.dictionaryannotator-asl (for DictionaryAnnotator)
   * dependency on uimaFIT automatically added (for CASDumpWriter)
   * dependencies on type systems and JCas wrappers automatically added by Maven

Then we can immediately assemble and run a pipeline:

     // Read the *.txt files in src/test/resources/text, skipping broken.txt
     CollectionReader reader = createCollectionReader(TextReader.class,
         TextReader.PARAM_PATH, "src/test/resources/text",
         TextReader.PARAM_PATTERNS, new String[] { "[+]*.txt", "[-]broken.txt" },
         TextReader.PARAM_LANGUAGE, "en");

     // Tokenize the text
     AnalysisEngine tokenizer = createPrimitive(BreakIteratorSegmenter.class);

     // Annotate phrases listed in names.txt as Name annotations
     AnalysisEngine nameFinder = createPrimitive(DictionaryAnnotator.class,
         DictionaryAnnotator.PARAM_PHRASE_FILE,
         "src/test/resources/dictionaries/names.txt",
         DictionaryAnnotator.PARAM_ANNOTATION_TYPE, Name.class.getName());

     // Dump the resulting CAS contents to target/output.txt
     AnalysisEngine writer = createPrimitive(CASDumpWriter.class,
         CASDumpWriter.PARAM_OUTPUT_FILE, "target/output.txt");

     // Run the components in order over every document from the reader
     SimplePipeline.runPipeline(reader, tokenizer, nameFinder, writer);

Notice that no line references a type system whatsoever. This is because we let 
uimaFIT automatically scan the classpath and simply make all
types it finds available to every created component.
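
To give a rough idea of what such a classpath scan involves, here is a minimal
sketch using only the JDK. It is not uimaFIT's actual implementation: the class
name TypeDescriptorScan, the helper findAll, and the marker resource name
META-INF/org.uimafit/types.txt are assumptions chosen for illustration. The
sketch only shows how every JAR on the classpath that ships a resource at an
agreed-upon location can be discovered without naming any of the JARs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class TypeDescriptorScan {
    // Assumed marker location, for illustration only; the real location
    // depends on the uimaFIT version in use.
    static final String MARKER = "META-INF/org.uimafit/types.txt";

    /**
     * Returns one URL per classpath entry (JAR or directory) that provides
     * a resource with the given name. The list is empty if none does.
     */
    static List<URL> findAll(ClassLoader loader, String resourceName) {
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = TypeDescriptorScan.class.getClassLoader();
        // Each hit would point at a file listing type system descriptors.
        for (URL url : findAll(loader, MARKER)) {
            System.out.println("found type manifest: " + url);
        }
    }
}
```

Because the lookup goes through the class loader, adding another type system
JAR as a Maven dependency is enough for it to be picked up by such a scan; no
code has to change.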

Our approach seems to work great for our researchers to assemble and run
pipelines on a single machine. We do not currently scale out UIMA.

Cheers,

Richard



--
Christophe (Chris) Roeder
Software Developer, Professional Research Assistant
Center for Computational Pharmacology, University of Colorado Denver
12801 E 17th Ave, MS 8303,  Aurora, CO 80045 USA
[email protected] / tel: (303) 724-7574
