Hi Chris,
> I find the idea of enabling maven by packaging components in jars very
> compelling. Have you dealt with third-party code that expects to find
> resources from file system locations rather than classpath names?
I am not completely sure what you mean, but I hope my answer helps.
If you want to build a pipeline with a third-party component that cannot deal
with URLs, DKPro Core includes ResourceUtils.getUrlAsFile() [1], which bridges
that gap for you and even supports caching. To resolve "classpath:" URLs, you
can use ResourceUtils.resolveLocation().
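// ThirdPartyAE and PARAM_RESOURCE_FILE stand in for your component and its
// file-path parameter; passing true to getUrlAsFile() enables caching.
AnalysisEngine ae = createPrimitive(ThirdPartyAE.class,
    ThirdPartyAE.PARAM_RESOURCE_FILE,
    getUrlAsFile(resolveLocation("classpath:/my/packaged/resource.bin"),
        true).getAbsolutePath());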
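In case it helps to see what the bridging involves, here is a rough, JDK-only
sketch of the idea behind resolveLocation() and getUrlAsFile(). It is not the
DKPro code: UrlFileBridge, resolve() and asFile() are hypothetical names, and
I'm assuming that a "file:" URL can be used directly while anything else (e.g.
a resource packed inside a JAR) has to be copied to a temporary file first.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical stand-in illustrating the idea behind ResourceUtils.resolveLocation()
 * and getUrlAsFile(); this is NOT the DKPro implementation.
 */
final class UrlFileBridge {
    // Keyed by the URL string: URL.equals()/hashCode() may trigger DNS lookups.
    private static final Map<String, File> CACHE = new ConcurrentHashMap<>();

    private UrlFileBridge() {}

    /** Resolves "classpath:" locations, regular URLs and plain file paths to a URL. */
    static URL resolve(String location) throws IOException {
        if (location.startsWith("classpath:")) {
            String path = location.substring("classpath:".length());
            // ClassLoader.getResource() expects paths without a leading slash
            if (path.startsWith("/")) {
                path = path.substring(1);
            }
            URL url = UrlFileBridge.class.getClassLoader().getResource(path);
            if (url == null) {
                throw new IOException("Not found on classpath: " + location);
            }
            return url;
        }
        try {
            return new URL(location);
        } catch (MalformedURLException e) {
            // Not a URL: treat it as a file-system path
            return new File(location).toURI().toURL();
        }
    }

    /** Returns a local file for the URL, copying non-"file:" URLs to a temp file. */
    static File asFile(URL url, boolean cache) throws IOException {
        if ("file".equals(url.getProtocol())) {
            try {
                return new File(url.toURI());
            } catch (URISyntaxException e) {
                throw new IOException(e);
            }
        }
        String key = url.toExternalForm();
        if (cache) {
            File cached = CACHE.get(key);
            if (cached != null && cached.exists()) {
                return cached;
            }
        }
        // e.g. "jar:" or "http:" URLs, which file-based third-party code cannot open
        Path tmp = Files.createTempFile("resource", ".tmp");
        tmp.toFile().deleteOnExit();
        try (InputStream in = url.openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        File file = tmp.toFile();
        if (cache) {
            CACHE.put(key, file);
        }
        return file;
    }
}
```

With cache set to true, repeated lookups of the same URL reuse the temporary
copy instead of extracting the resource again.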
If the third-party component supports the UIMA ResourceLoader, you should be
able to configure it to resolve resources from the file system.
Some of the components we have implemented support loading resources from the
classpath. This means we can package resources like tagging models as JARs and
add them as Maven dependencies as well.
DKPro includes Ant scripts that automatically create such JARs for TreeTagger
models and binaries, as well as for the models of the Stanford Parser and NER.
The generated JARs can be uploaded to a Maven repository (due to license
restrictions, not a public one) and then added to a project like any other
dependency. The TreeTagger component is smart enough to load the correct model
just by looking at the document language set in the CAS. The Stanford Parser
and NER components currently can't do that; for them you have to specify a
model URL like
"classpath:/resource/Classifiers/FaruquiPado/hgc_GERMAN_175M.ser.gz" (cf. [2]).
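The language-based model lookup can be pictured roughly as follows. This is
only a sketch under assumptions: LanguageModelLocator is a hypothetical class,
and the location template ("models/tagger-%s.bin") is made up for illustration;
the real TreeTagger model JARs use their own layout.

```java
import java.net.URL;
import java.util.Locale;
import java.util.Optional;

/**
 * Hypothetical sketch: pick a packaged model based on the CAS document language.
 * The location template is an assumption, not DKPro's actual resource layout.
 */
final class LanguageModelLocator {
    private final String template; // e.g. "models/tagger-%s.bin" (made up)
    private final ClassLoader loader;

    LanguageModelLocator(String template, ClassLoader loader) {
        this.template = template;
        this.loader = loader;
    }

    /** Builds the classpath location for a language such as "de" or "en-US". */
    String locationFor(String documentLanguage) {
        // Keep only the primary language subtag, lower-cased: "en-US" -> "en"
        String lang = documentLanguage.split("[-_]")[0].toLowerCase(Locale.ROOT);
        return String.format(template, lang);
    }

    /** Looks the model up on the classpath; empty if no JAR on the classpath provides it. */
    Optional<URL> find(String documentLanguage) {
        return Optional.ofNullable(loader.getResource(locationFor(documentLanguage)));
    }
}
```

A component would call find() with the language taken from the CAS and fail
with a clear message if no model JAR for that language is on the classpath.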
DKPro also includes a powerful base class for CollectionReaders that uses the
Spring PathMatchingResourcePatternResolver [3], which uimaFIT also uses for
automatic type detection. ResourceCollectionReaderBase [4] lets you easily
create CollectionReaders that load data from the file system or the classpath
(or any other location/URL supported by the Spring Resource framework) using
Ant-like inclusion/exclusion patterns. For example, our TextReader is built on
it:
// Include all .txt files below text/ (relative to PARAM_PATH),
// but skip any file named broken.txt
CollectionReader reader = createCollectionReader(TextReader.class,
    TextReader.PARAM_PATH, "classpath:/data",
    TextReader.PARAM_PATTERNS, new String[] {
        "[+]text/**/*.txt", "[-]**/broken.txt" },
    TextReader.PARAM_LANGUAGE, "en");
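To illustrate how the "[+]"/"[-]" patterns combine, here is a small
stand-alone filter. It is a hypothetical sketch (IncludeExcludeFilter is not
part of DKPro, which hands the matching to Spring), assuming that paths are
matched relative to PARAM_PATH with "/" separators, that "**" spans any number
of directories, and that a file is read if it matches at least one include
pattern and no exclude pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of "[+]"/"[-]" include/exclude patterns with Ant-style
 * wildcards; not DKPro code (DKPro delegates the actual matching to Spring).
 */
final class IncludeExcludeFilter {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    IncludeExcludeFilter(String... patterns) {
        for (String p : patterns) {
            if (p.startsWith("[+]")) {
                includes.add(antToRegex(p.substring(3)));
            } else if (p.startsWith("[-]")) {
                excludes.add(antToRegex(p.substring(3)));
            } else {
                throw new IllegalArgumentException("Pattern needs [+] or [-]: " + p);
            }
        }
    }

    /** Accepts a path if it matches at least one include and no exclude. */
    boolean accepts(String relativePath) {
        return includes.stream().anyMatch(r -> r.matcher(relativePath).matches())
            && excludes.stream().noneMatch(r -> r.matcher(relativePath).matches());
    }

    /** Translates "**", "*" and "?" into a regex over '/'-separated paths. */
    static Pattern antToRegex(String ant) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < ant.length(); i++) {
            char c = ant.charAt(i);
            if (c == '*' && i + 1 < ant.length() && ant.charAt(i + 1) == '*') {
                if (i + 2 < ant.length() && ant.charAt(i + 2) == '/') {
                    re.append("(?:.*/)?"); // "**/" = zero or more directories
                    i += 2;
                } else {
                    re.append(".*");       // trailing "**" = anything
                    i += 1;
                }
            } else if (c == '*') {
                re.append("[^/]*");        // "*" stays within one path segment
            } else if (c == '?') {
                re.append("[^/]");         // "?" = exactly one non-separator char
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }
}
```

With the patterns from the TextReader call, "text/a.txt" and "text/sub/b.txt"
would be accepted, while "text/sub/broken.txt" would be rejected by the exclude
pattern.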
Best,
Richard
[1] http://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.api.resources/src/main/java/de/tudarmstadt/ukp/dkpro/core/api/resources/ResourceUtils.java
[2] http://code.google.com/p/dkpro-core-gpl/source/browse/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp/src/test/java/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/StanfordNamedEntityRecognizerTest.java
[3] http://static.springsource.org/spring/docs/2.5.x/api/org/springframework/core/io/support/PathMatchingResourcePatternResolver.html
[4] http://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.api.io/src/main/java/de/tudarmstadt/ukp/dkpro/core/api/io/ResourceCollectionReaderBase.java
--
-------------------------------------------------------------------
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone +49 (6151) 16-7477, fax -5455, room S2/02/E225
[email protected]
www.ukp.tu-darmstadt.de
-------------------------------------------------------------------