Richard,
I find the idea of enabling Maven use by packaging components in JARs very
compelling. Have you dealt with third-party code that expects to find
resources at file system locations rather than under classpath names?
-Chris
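One common workaround for third-party code that only accepts file system paths (not something described in Richard's message) is to copy the classpath resource into a temporary file and pass that path along. This is a minimal sketch in plain Java 7+, using no UIMA or uimaFIT APIs; the class and method names are hypothetical:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ClasspathResourceToFile {

    /**
     * Returns a java.io.File for a classpath resource. Resources that are
     * already plain files are returned as-is; anything else (e.g. an entry
     * inside a JAR) is copied to a temporary file deleted on JVM exit.
     */
    public static File toFile(String resourceName) throws IOException {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClasspathResourceToFile.class.getClassLoader();
        }
        URL url = cl.getResource(resourceName);
        if (url == null) {
            throw new FileNotFoundException("Not on the classpath: " + resourceName);
        }
        if ("file".equals(url.getProtocol())) {
            try {
                return new File(url.toURI());
            } catch (URISyntaxException e) {
                // Unusual URL; fall back to copying below.
            }
        }
        // Keep the file extension, since some libraries dispatch on it.
        int slash = resourceName.lastIndexOf('/');
        int dot = resourceName.lastIndexOf('.');
        String suffix = dot > slash ? resourceName.substring(dot) : ".tmp";
        Path tmp = Files.createTempFile("cpres-", suffix);
        tmp.toFile().deleteOnExit();
        try (InputStream in = url.openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp.toFile();
    }
}
```

When the resource sits in an unpacked directory (e.g. target/classes during development), the sketch returns that file directly; resources inside JARs are always copied. Code that needs a whole directory tree (a model folder, say) would need this treatment for every file in it.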
On 3/2/11 12:49 AM, Richard Eckart de Castilho wrote:
Hello Greg,
> It's sort of a "maven-like" model (i.e. when using a Nexus server). Or maybe I
> should just actually use maven and nexus?
> Has anyone out there tried to create a "UIMA Repository" that can be directly
> referenced from a component descriptor file? How did you make it work?
We consider ourselves to have a "UIMA Repository" based on Maven; cf. DKPro
Core: http://code.google.com/p/dkpro-core-asl/
I would like to point out that we have largely abandoned static UIMA
descriptors (except for type system descriptors).
We feel very comfortable programming at the Java level, creating descriptors
dynamically with uimaFIT and running our pipelines directly from Java (no CPE
GUI or the like).
For this scenario, Maven works like a charm for us. We do not even worry much
about type systems, because we have packaged their XML descriptors and JCas
wrappers in JARs as well and can simply add them as Maven dependencies. We use
uimaFIT's automatic type system detection feature to dynamically construct a
global type system description from all type system description files that
can be found in a well-defined location in the classpath (that is, in the
aforementioned JARs). A short example:
* add a dependency on de.tudarmstadt.ukp.dkpro.core.io.text-asl (for
  TextReader)
* add a dependency on de.tudarmstadt.ukp.dkpro.core.tokit-asl (for
  BreakIteratorSegmenter)
* add a dependency on de.tudarmstadt.ukp.dkpro.core.dictionaryannotator-asl
  (for DictionaryAnnotator)
* the dependency on uimaFIT (for CASDumpWriter) is pulled in transitively
* the dependencies on the type systems and JCas wrappers are pulled in
  transitively by Maven
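As far as I know, uimaFIT 1.x locates the descriptors in those type system JARs by reading a small manifest, META-INF/org.uimafit/types.txt, from every JAR on the classpath; each line names a location or pattern for type system descriptor files. The stdlib-only sketch below shows just that discovery step using ClassLoader.getResources, which returns one URL per copy of a resource across all JARs. The class name, and skipping blank and '#' lines, are assumptions of this sketch rather than uimaFIT's actual implementation:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TypeManifestScanner {

    // Assumed manifest name (uimaFIT 1.x convention, to the best of my knowledge).
    static final String MANIFEST = "META-INF/org.uimafit/types.txt";

    /** Parses one manifest: one descriptor location per line; blanks and '#' lines skipped. */
    static List<String> parse(Reader in) throws IOException {
        List<String> locations = new ArrayList<>();
        BufferedReader br = new BufferedReader(in);
        String line;
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty() && !line.startsWith("#")) {
                locations.add(line);
            }
        }
        return locations;
    }

    /** Collects locations from every copy of MANIFEST visible to the loader (one per JAR). */
    static Set<String> scan(ClassLoader cl) throws IOException {
        Set<String> all = new LinkedHashSet<>();
        Enumeration<URL> manifests = cl.getResources(MANIFEST);
        while (manifests.hasMoreElements()) {
            URL url = manifests.nextElement();
            try (Reader r = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
                all.addAll(parse(r));
            }
        }
        return all;
    }
}
```

uimaFIT then resolves the collected patterns (Spring-style ones such as classpath*:desc/type/**/*.xml, if I recall correctly) to descriptor files; this sketch stops at collecting the patterns.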
Then we can immediately assemble and run a pipeline:
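// uimaFIT 1.x and UIMA imports; the DKPro classes (TextReader, BreakIteratorSegmenter,
// DictionaryAnnotator) and the Name JCas type come from the dependencies above.
import static org.uimafit.factory.AnalysisEngineFactory.createPrimitive;
import static org.uimafit.factory.CollectionReaderFactory.createCollectionReader;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.collection.CollectionReader;
import org.uimafit.component.xwriter.CASDumpWriter;
import org.uimafit.pipeline.SimplePipeline;

// Read all .txt files except broken.txt, as English text
CollectionReader reader = createCollectionReader(TextReader.class,
    TextReader.PARAM_PATH, "src/test/resources/text",
    TextReader.PARAM_PATTERNS, new String[] { "[+]*.txt", "[-]broken.txt" },
    TextReader.PARAM_LANGUAGE, "en");
// Split the text into sentences and tokens
AnalysisEngine tokenizer = createPrimitive(BreakIteratorSegmenter.class);
// Annotate every phrase listed in names.txt as a Name
AnalysisEngine nameFinder = createPrimitive(DictionaryAnnotator.class,
    DictionaryAnnotator.PARAM_PHRASE_FILE, "src/test/resources/dictionaries/names.txt",
    DictionaryAnnotator.PARAM_ANNOTATION_TYPE, Name.class.getName());
// Dump the contents of each CAS to a text file
AnalysisEngine writer = createPrimitive(CASDumpWriter.class,
    CASDumpWriter.PARAM_OUTPUT_FILE, "target/output.txt");
SimplePipeline.runPipeline(reader, tokenizer, nameFinder, writer);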
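The PARAM_PATTERNS value above uses a "[+]" prefix for include patterns and "[-]" for exclude patterns. The sketch below is a hypothetical re-implementation of that rule (a file is read if it matches at least one include and no exclude) using Java's glob PathMatcher; the class name is made up, and TextReader's actual pattern syntax may differ in details:

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class IncludeExcludeFilter {

    /** Keeps names matching at least one "[+]" glob and no "[-]" glob. */
    public static List<String> filter(List<String> names, String... patterns) {
        List<PathMatcher> includes = new ArrayList<>();
        List<PathMatcher> excludes = new ArrayList<>();
        for (String p : patterns) {
            if (p.startsWith("[+]")) {
                includes.add(glob(p.substring(3)));
            } else if (p.startsWith("[-]")) {
                excludes.add(glob(p.substring(3)));
            } else {
                throw new IllegalArgumentException("Missing [+] or [-] prefix: " + p);
            }
        }
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            Path path = Paths.get(name);
            boolean included = includes.stream().anyMatch(m -> m.matches(path));
            boolean excluded = excludes.stream().anyMatch(m -> m.matches(path));
            if (included && !excluded) {
                kept.add(name);
            }
        }
        return kept;
    }

    private static PathMatcher glob(String pattern) {
        return FileSystems.getDefault().getPathMatcher("glob:" + pattern);
    }
}
```

With the patterns from the pipeline, a.txt is kept and broken.txt is dropped. In Java globs, "*" does not cross directory boundaries while "**" does, so nested files would need a "**" pattern here.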
Notice that no line references a type system whatsoever. This is because we let
uimaFIT automatically scan the classpath and simply make all
types it finds available to every created component.
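Conceptually, the global type system that uimaFIT makes available is the union of the types declared in every descriptor it finds. The stdlib-only sketch below merges type system descriptor XML strings into one map from type name to supertype and rejects a type declared with two different supertypes. It is a simplification for illustration only; UIMA's own merging (e.g. CasCreationUtils.mergeTypeSystems) also merges features and handles many more cases, and the class name here is hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TypeSystemMerger {

    /** Merges descriptors into one map: type name -> supertype name. */
    public static Map<String, String> merge(String... descriptors) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Map<String, String> types = new TreeMap<>();
        for (String xml : descriptors) {
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList decls = doc.getElementsByTagName("typeDescription");
            for (int i = 0; i < decls.getLength(); i++) {
                Element decl = (Element) decls.item(i);
                String name = childText(decl, "name");
                String supertype = childText(decl, "supertypeName");
                if (name == null) {
                    throw new IllegalArgumentException("typeDescription without <name>");
                }
                if (types.containsKey(name) && !Objects.equals(types.get(name), supertype)) {
                    throw new IllegalStateException("Conflicting supertypes for " + name);
                }
                types.put(name, supertype);
            }
        }
        return types;
    }

    /** Text of the first direct child element with the given tag, or null. */
    private static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }
}
```

Looking only at direct children of each typeDescription matters, because feature declarations nested inside it also use a name element and would otherwise be mistaken for types.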
Our approach seems to work well for our researchers, who assemble and run
pipelines on a single machine. We currently do not scale out UIMA.
Cheers,
Richard
--
Christophe (Chris) Roeder
Software Developer, Professional Research Assistant
Center for Computational Pharmacology, University of Colorado Denver
12801 E 17th Ave, MS 8303, Aurora, CO 80045 USA
[email protected] / tel: (303) 724-7574