Richard Eckart de Castilho <eckartde@...> writes:

> Hello Greg,
>
> > It's sort of a "maven-like" model (i.e. when using a Nexus server). Or maybe I should just actually use
> > maven and nexus?
> >
> > Has anyone out there tried to create a "UIMA Repository" that can be directly referenced from a component
> > descriptor file? How did you make it work?
>
> We consider ourselves to have a "UIMA Repository" based on Maven - cf. DKPro Core http://code.google.com/p/dkpro-core-asl/
>
> I would like to point out that we have largely abandoned static UIMA descriptors (except type descriptors).
> We feel very comfortable programming on the Java level, dynamically creating descriptors using uimaFIT
> and running our pipelines directly from within Java (no CPE GUI or such).
>
> For this scenario, Maven works like a charm for us. We do not even worry too much about type systems, because
> we have packaged their XML descriptors and JCas wrappers in JARs as well and can simply add them as Maven
> dependencies. We use uimaFIT's automatic type system detection feature to dynamically construct a global
> type system description from all type system description files that could be found in a well-defined
> location in the classpath (that is, in the aforementioned JARs).
>
> A short example:
>
> * add dependency on de.tudarmstadt.ukp.dkpro.core.io.text-asl (for TextReader)
> * add dependency on de.tudarmstadt.ukp.dkpro.core.tokit-asl (for BreakIteratorSegmenter)
> * add dependency on de.tudarmstadt.ukp.dkpro.core.dictionaryannotator-asl (for DictionaryAnnotator)
> * dependency on uimaFIT automatically added (for CASDumpWriter)
> * dependencies on type systems and JCas wrappers automatically added by Maven
>
> Then we can immediately assemble and run a pipeline:
>
>   CollectionReader reader = createCollectionReader(TextReader.class,
>       TextReader.PARAM_PATH, "src/test/resources/text",
>       TextReader.PARAM_PATTERNS, new String[] { "[+]*.txt", "[-]broken.txt" },
>       TextReader.PARAM_LANGUAGE, "en");
>
>   AnalysisEngine tokenizer = createPrimitive(BreakIteratorSegmenter.class);
>
>   AnalysisEngine nameFinder = createPrimitive(DictionaryAnnotator.class,
>       DictionaryAnnotator.PARAM_PHRASE_FILE, "src/test/resources/dictionaries/names.txt",
>       DictionaryAnnotator.PARAM_ANNOTATION_TYPE, Name.class.getName());
>
>   AnalysisEngine writer = createPrimitive(CASDumpWriter.class,
>       CASDumpWriter.PARAM_OUTPUT_FILE, "target/output.txt");
>
>   SimplePipeline.runPipeline(reader, tokenizer, nameFinder, writer);
>
> Notice that no line references a type system whatsoever. This is because we let uimaFIT automatically scan
> the classpath and simply make all types it finds available to every created component.
>
> Our approach seems to work great for our researchers to assemble and run pipelines on a single machine.
> We currently do not scale out UIMA.
>
> Cheers,
>
> Richard

Hi Richard,

Would you mind showing me how uimaFIT is able to "dynamically construct a global type system description from all type system description files that could be found in a well-defined location in the classpath"? Do you rely on the uimafit/types.txt file? If so, how do you specify it so that it picks up the type description files in the classpath? I looked into de.tudarmstadt.ukp.dkpro.core.tokit-asl-1.4.0.jar, but there is no type description file inside the JAR itself.
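
To make the question concrete, the kind of lookup being asked about might be sketched as follows. This is only a guess at the mechanism, not uimaFIT's actual code: the manifest path META-INF/org.uimafit/types.txt, the class name TypesTxtScan, and the idea that each line of the file holds one location pattern are all assumptions, and the sketch uses plain JDK calls only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, for illustration only; not part of uimaFIT or DKPro Core.
final class TypesTxtScan {

    // Assumed manifest location; the thread does not confirm this path.
    static final String MANIFEST = "META-INF/org.uimafit/types.txt";

    /** Splits manifest content into trimmed, non-empty lines (one pattern per line). */
    static List<String> parsePatterns(String content) {
        List<String> patterns = new ArrayList<>();
        for (String line : content.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(trimmed);
            }
        }
        return patterns;
    }

    /** Collects patterns from every copy of the manifest visible to the class loader. */
    static List<String> scan(ClassLoader loader) throws IOException {
        List<String> patterns = new ArrayList<>();
        // getResources returns one URL per JAR or directory that contains the file.
        Enumeration<URL> manifests = loader.getResources(MANIFEST);
        while (manifests.hasMoreElements()) {
            try (InputStream in = manifests.nextElement().openStream()) {
                String content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                patterns.addAll(parsePatterns(content));
            }
        }
        return patterns;
    }

    public static void main(String[] args) throws IOException {
        // Prints whatever patterns are found; prints nothing if no manifest exists.
        for (String pattern : scan(Thread.currentThread().getContextClassLoader())) {
            System.out.println(pattern);
        }
    }
}
```

If the mechanism works along these lines, the type descriptor XML files would not need to sit in the component JAR itself (such as the tokit-asl JAR); they could come from a separate JAR pulled in as a Maven dependency, as long as that JAR also carries the manifest pointing at them.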
