Hi Chris,

> I find the idea of enabling maven by packaging components in jars very 
> compelling. Have you dealt with third-party code that expects to find
> resources from file system locations rather than classpath names?


I am not completely sure what you mean, so I hope my answer will satisfy you.

If you want to build a pipeline around a third-party component that cannot deal
with URLs, DKPro Core includes ResourceUtils.getUrlAsFile() [1], which bridges
that gap for you and even supports caching. To resolve "classpath:" URLs, you
can use ResourceUtils.resolveLocation().

   AnalysisEngine ae = createPrimitive(ThirdPartyAE.class,
       ThirdPartyAE.PARAM_RESOURCE_FILE,
       getUrlAsFile(resolveLocation("classpath:/my/packaged/resource.bin"),
           true).getAbsolutePath());
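
To illustrate the idea behind that bridge, here is a minimal sketch using only
the JDK. It is not the DKPro implementation: the class and method names
(UrlToFileSketch, resolveClasspath, urlToTempFile) are made up for this
example, and it does no caching. It just looks up a "classpath:" location with
a class loader and copies the content to a temporary file that a file-only
component can open.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only, not part of DKPro Core.
public class UrlToFileSketch {

    /** Resolves a "classpath:" location to a URL via the context class loader. */
    public static URL resolveClasspath(String location) throws IOException {
        String prefix = "classpath:";
        if (!location.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a classpath location: " + location);
        }
        String name = location.substring(prefix.length());
        // ClassLoader resource names must not start with a slash.
        if (name.startsWith("/")) {
            name = name.substring(1);
        }
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = UrlToFileSketch.class.getClassLoader();
        }
        URL url = cl.getResource(name);
        if (url == null) {
            throw new FileNotFoundException("Resource not found: " + location);
        }
        return url;
    }

    /** Copies whatever the URL points to into a temporary file and returns it. */
    public static File urlToTempFile(URL url) throws IOException {
        File tmp = File.createTempFile("resource", ".bin");
        tmp.deleteOnExit(); // no caching in this sketch, so clean up on JVM exit
        try (InputStream in = url.openStream()) {
            Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }
}
```

The path handed to the third-party component would then be something like
urlToTempFile(resolveClasspath("classpath:/my/packaged/resource.bin"))
.getAbsolutePath().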

If the third-party component supports the UIMA ResourceLoader, you should be 
able to configure that to resolve resources from the file-system.

Some of the components we have implemented support loading resources from the 
classpath. This means we can package resources like tagging models as JARs and 
add them as Maven dependencies as well. 

DKPro includes Ant scripts that automatically create such JARs for TreeTagger 
models and binaries as well as for models of the Stanford Parser and NER. The 
generated JARs can be uploaded to a Maven repository and added to a project 
just like that (due to license restrictions, not to a public repository). The 
TreeTagger component is intelligent enough to load the correct model just by 
looking at the document language set in the CAS. The Stanford Parser and NER 
components currently can't do that; here you'd have to specify a model URL like
"classpath:/resource/Classifiers/FaruquiPado/hgc_GERMAN_175M.ser.gz" (cf. [2]).

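To make the difference concrete, language-based model selection boils down to
a lookup from the document language to a model location, with an explicitly
configured location taking precedence. The sketch below only illustrates that
idea; ModelLocationSketch and its methods are hypothetical names and do not
reflect how the TreeTagger component is actually implemented.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup for illustration only, not DKPro code.
public class ModelLocationSketch {
    private final Map<String, String> byLanguage = new HashMap<>();

    /** Registers the model location to use for a language code such as "de". */
    public void register(String language, String location) {
        byLanguage.put(language.toLowerCase(Locale.ROOT), location);
    }

    /**
     * Returns the explicitly configured location if there is one, otherwise
     * the location registered for the document language.
     */
    public String locate(String documentLanguage, String explicitLocation) {
        if (explicitLocation != null) {
            return explicitLocation;
        }
        String location = documentLanguage == null
                ? null
                : byLanguage.get(documentLanguage.toLowerCase(Locale.ROOT));
        if (location == null) {
            throw new IllegalStateException(
                    "No model registered for language: " + documentLanguage);
        }
        return location;
    }
}
```

With a mapping registered for "de", a component could call
locate(documentLanguage, null) and only fall back to a user-supplied URL when
one is configured.
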
DKPro also includes a powerful base class for CollectionReaders that uses the 
Spring PathMatchingResourcePatternResolver [3], which is also used by uimaFIT 
for automatic type detection. ResourceCollectionReaderBase [4] allows you to 
easily create CollectionReaders capable of loading data from the file system or 
the classpath (or any other location/URL supported by the Spring Resource 
framework) using Ant-like inclusion/exclusion patterns. For example, our
TextReader builds on it:

    CollectionReader reader = createCollectionReader(TextReader.class,
        TextReader.PARAM_PATH, "classpath:/data",
        TextReader.PARAM_PATTERNS, new String[] {
            "[+]text/**/*.txt", "[-]**/broken.txt" },
        TextReader.PARAM_LANGUAGE, "en");
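
In case the "[+]"/"[-]" notation is unclear: a path is read if it matches at
least one inclusion pattern and no exclusion pattern. The following sketch
shows that rule with the JDK's glob matcher. It is not the
ResourceCollectionReaderBase code, PatternFilterSketch is a made-up name, and
JDK globs are only an approximation of Ant patterns (for instance,
"text/**/*.txt" as a JDK glob does not match "text/a.txt", while the Ant
pattern would). Treating a pattern without prefix as an inclusion is also an
assumption of this sketch.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter for illustration only, not DKPro code.
public class PatternFilterSketch {
    private final List<PathMatcher> includes = new ArrayList<>();
    private final List<PathMatcher> excludes = new ArrayList<>();

    public PatternFilterSketch(String... patterns) {
        for (String pattern : patterns) {
            if (pattern.startsWith("[+]")) {
                includes.add(glob(pattern.substring(3)));
            } else if (pattern.startsWith("[-]")) {
                excludes.add(glob(pattern.substring(3)));
            } else {
                // Assumption: a pattern without prefix counts as an inclusion.
                includes.add(glob(pattern));
            }
        }
    }

    private static PathMatcher glob(String pattern) {
        return FileSystems.getDefault().getPathMatcher("glob:" + pattern);
    }

    /** True if the path matches some inclusion and no exclusion pattern. */
    public boolean accepts(String relativePath) {
        Path path = Paths.get(relativePath);
        boolean included = includes.stream().anyMatch(m -> m.matches(path));
        boolean excluded = excludes.stream().anyMatch(m -> m.matches(path));
        return included && !excluded;
    }
}
```

With the two patterns from the reader configuration above, this filter accepts
"text/a/doc.txt" and rejects "text/a/broken.txt".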

Best,

Richard

[1] http://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.api.resources/src/main/java/de/tudarmstadt/ukp/dkpro/core/api/resources/ResourceUtils.java
[2] http://code.google.com/p/dkpro-core-gpl/source/browse/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp/src/test/java/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/StanfordNamedEntityRecognizerTest.java
[3] http://static.springsource.org/spring/docs/2.5.x/api/org/springframework/core/io/support/PathMatchingResourcePatternResolver.html
[4] http://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.api.io/src/main/java/de/tudarmstadt/ukp/dkpro/core/api/io/ResourceCollectionReaderBase.java

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone +49 (6151) 16-7477, fax -5455, room S2/02/E225
[email protected] 
www.ukp.tu-darmstadt.de 
------------------------------------------------------------------- 




