Hello Greg,

> It's sort of a "maven-like" model (i.e. when using a Nexus server). Or maybe
> I should just actually use maven and nexus?
>
> Has anyone out there tried to create a "UIMA Repository" that can be directly
> referenced from a component descriptor file? How did you make it work?

We consider ourselves to have a "UIMA Repository" based on Maven, cf. DKPro Core:

  http://code.google.com/p/dkpro-core-asl/

I would like to point out that we have largely abandoned static UIMA descriptors (except type descriptors). We feel very comfortable programming at the Java level, dynamically creating descriptors using uimaFIT and running our pipelines directly from within Java (no CPE GUI or such). For this scenario, Maven works like a charm for us.

We do not even worry too much about type systems, because we have packaged their XML descriptors and JCas wrappers in JARs as well and can simply add them as Maven dependencies. We use uimaFIT's automatic type system detection feature to dynamically construct a global type system description from all type system description files found in a well-defined location on the classpath (that is, in the aforementioned JARs).

A short example:

* add dependency on de.tudarmstadt.ukp.dkpro.core.io.text-asl (for TextReader)
* add dependency on de.tudarmstadt.ukp.dkpro.core.tokit-asl (for BreakIteratorSegmenter)
* add dependency on de.tudarmstadt.ukp.dkpro.core.dictionaryannotator-asl (for DictionaryAnnotator)
* dependency on uimaFIT is added automatically (for CASDumpWriter)
* dependencies on type systems and JCas wrappers are added automatically by Maven

Then we can immediately assemble and run a pipeline:

    // createCollectionReader and createPrimitive are static imports from
    // uimaFIT's CollectionReaderFactory and AnalysisEngineFactory.

    // Reader: include all *.txt files, exclude broken.txt
    CollectionReader reader = createCollectionReader(TextReader.class,
        TextReader.PARAM_PATH, "src/test/resources/text",
        TextReader.PARAM_PATTERNS, new String[] { "[+]*.txt", "[-]broken.txt" },
        TextReader.PARAM_LANGUAGE, "en");

    AnalysisEngine tokenizer = createPrimitive(BreakIteratorSegmenter.class);

    // Annotate every phrase from names.txt as a Name
    AnalysisEngine nameFinder = createPrimitive(DictionaryAnnotator.class,
        DictionaryAnnotator.PARAM_PHRASE_FILE, "src/test/resources/dictionaries/names.txt",
        DictionaryAnnotator.PARAM_ANNOTATION_TYPE, Name.class.getName());

    // Dump the resulting CAS contents to a text file
    AnalysisEngine writer = createPrimitive(CASDumpWriter.class,
        CASDumpWriter.PARAM_OUTPUT_FILE, "target/output.txt");

    SimplePipeline.runPipeline(reader, tokenizer, nameFinder, writer);

Notice that no line references a type system whatsoever. This is because we let uimaFIT automatically scan the classpath and simply make all types it finds available to every created component.

Our approach seems to work great for our researchers to assemble and run pipelines on a single machine. We currently do not scale out UIMA.

Cheers,

Richard

-- 
-------------------------------------------------------------------
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone +49 (6151) 16-7477, fax -5455, room S2/02/E225
[email protected]
www.ukp.tu-darmstadt.de
-------------------------------------------------------------------

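To illustrate the idea of collecting descriptor files from a well-defined classpath location, here is a minimal sketch using only the JDK. It is not uimaFIT's actual implementation: the class name ClasspathDescriptorScan and the resource path META-INF/example/types.txt are hypothetical and chosen for illustration only. The point is that a single resource name can resolve to one URL per JAR on the classpath, which is what makes "add the JAR as a Maven dependency and its types become visible" possible.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

final class ClasspathDescriptorScan {

    // Hypothetical location; NOT the path uimaFIT actually scans.
    static final String EXAMPLE_LOCATION = "META-INF/example/types.txt";

    /**
     * Returns every classpath resource with the given name. Each JAR or
     * directory on the classpath that contains the resource contributes
     * one URL, so the result grows as dependencies are added.
     */
    static List<URL> findAll(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathDescriptorScan.class.getClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Print one line per classpath entry that ships the resource.
        for (URL url : findAll(EXAMPLE_LOCATION)) {
            System.out.println(url);
        }
    }
}
```

Run against a classpath where no JAR ships the example resource, main prints nothing. Each JAR that does ship it adds one line of output, without any change to the calling code.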