Re: managing resources for UIMA?

Jens Grivolla Tue, 28 May 2013 09:27:59 -0700

Thanks for your pointers, I think this will be very helpful.

However, we use various components and wrappers from outside sources (such as OpenNLP) and don't always control how resources are loaded by the AE. Sometimes it might be sufficient to have the resource on the classpath (in a JAR referenced by Maven), and I think it is acceptable for us to hard-code the resource reference in the AE descriptor. Fully automatic resource resolution as in DKPro would be nice but is not an immediate necessity.

We face a more complicated situation with some components that do not resolve resources from the classpath, e.g. C++ and Python components that need to reference the actual resource files. For those situations we would need to unpack the resource when building the AE and bundle it so it can be accessed with a file path. We currently manually copy the resource file into a /resources folder that gets included in the PEAR packages we generate (and thus unpacked when installing the PEAR).
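
To make the unpacking step concrete, here is a minimal Python sketch, assuming the resource ships inside a JAR (which is a plain ZIP archive). The function name `unpack_resource` and the entry path in the usage line are hypothetical and only illustrate the idea; this is not what our build currently does.

```python
import zipfile
from pathlib import Path


def unpack_resource(jar_path, member, target_dir):
    """Extract one entry from a JAR (a ZIP archive) and return its file path.

    jar_path, member and target_dir are illustrative parameters, not part of
    any UIMA or Maven API.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(jar_path) as jar:
        # extract() recreates the entry's internal directories below target_dir
        extracted = jar.extract(member, path=target_dir)
    return Path(extracted)
```

A call such as `unpack_resource("model.jar", "models/spanish-pos-model.bin", "resources")` would leave the file under `resources/models/`, where a C++ or Python component could open it by path.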

This probably leads to a different problem, which is doing more advanced custom tasks with Maven, such as unpacking archives, moving files around when packaging, etc. I see that you actually use Ant to package your resource Maven artifacts, probably due to the difficulty of doing such tasks directly in Maven.

Would Gradle be a better option to have the dependency management from Maven while being able to more easily define custom manipulations of resources to help with packaging? Is it possible to generate PEAR packages from Gradle? There are AFAIK plugins for Maven and Ant, so would we then reference an Ant task from Gradle? (I'll split this part off as a more general thread about Gradle, I think.)


Thanks,
Jens

On 05/22/2013 06:45 PM, Richard Eckart de Castilho wrote:

For some additional half-decent documentation on how and why we do the model 
packaging as we do it, see:

http://code.google.com/p/dkpro-core-asl/wiki/ResourceProviderAPI
http://code.google.com/p/dkpro-core-asl/wiki/PackagingResources

-- Richard

On 22.05.2013 at 18:43, Richard Eckart de Castilho 
<[email protected]> wrote:

Hi Jens,

for DKPro Core [1], we have packaged a large number of models as Maven 
artifacts and host them in our public Maven repository [2]. We have had good 
experiences with this approach. Please feel free to make use of these 
packages.

To package models, we use a set of Ant macros [3] in various Ant scripts that 
download the models from their original sites and wrap them up in a standard 
layout and naming scheme, for example [4].
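
The wrapping step can be illustrated with a small Python sketch. It is not a port of the Ant macros: the function name `wrap_model` and the `entry_dir` layout are assumptions made for this example, and the download step is left out. It only shows the general idea of placing a model file at a predictable path inside a JAR.

```python
import zipfile
from pathlib import Path


def wrap_model(model_file, jar_path, entry_dir):
    """Write model_file into a new JAR under entry_dir and return the entry name.

    The directory layout is a hypothetical convention chosen for this sketch.
    """
    model_file = Path(model_file)
    entry_name = f"{entry_dir.strip('/')}/{model_file.name}"
    # A JAR is a ZIP archive, so the standard zipfile module is sufficient here
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        jar.write(model_file, arcname=entry_name)
    return entry_name
```

Because the entry name is derived from the directory and the file name alone, code that loads the model can compute the same path without inspecting the archive.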

Cheers,

-- Richard

[1] http://code.google.com/p/dkpro-core-asl
[2] http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-model-releases-local
[3] http://code.google.com/p/dkpro-core-asl/source/browse/built-ant-macros/trunk/ant-macros.xml
[4] https://dkpro-core-gpl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl/src/scripts/build.xml

On 22.05.2013 at 18:31, Jens Grivolla <[email protected]> wrote:

Hi, while not strictly a UIMA issue, we have a problem that seems very relevant 
in the context of UIMA analysis engines: how to manage large binary resources 
such as trained models used by an AE, etc.

So far, we have managed to achieve a good separation between code development 
and the actual AEs, using Maven (and git for version control). An AE thus 
consists only of a POM referencing the code, the AE descriptor, and the 
resources used for the AE. The AE poms are configured to generate PEAR archives 
that include all dependencies and resources.

At this point we have the code in git, and the AEs' POMs and descriptors as 
well, while we manually copy the resources to the directory before running 
`mvn package` (and exclude those resources from git). We're missing a way to 
manage those resources, including versioning etc.

I'm guessing that this is a rather typical problem, so what solutions do you 
use? We're thinking of having all resources also in Maven (e.g. Artifactory) so 
we can reference them with a unique identifier and version. This would also 
help us when moving to more complex pipeline assemblies using uimaFIT instead 
of generating individual PEARs for each component in order to create complete 
packages.

Btw, we have only a few core developers, with most of the team made up of linguists, so 
we want to make it easy for them to save versions of resources they create and assemble 
AEs by just referencing the algorithm and resource (e.g. "create a new OpenNLP 
POStagger using spanish-pos-model.bin, version 1.2.3").
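
To illustrate what referencing a resource by identifier and version could look like, here is a small Python sketch that maps Maven coordinates to the relative path used by the standard Maven repository layout. The group ID `org.example.models` and the function name `repository_path` are made up for this example.

```python
def repository_path(group_id, artifact_id, version, extension="jar", classifier=None):
    """Return the relative path of an artifact in a standard Maven repository."""
    # Dots in the group ID become directory separators
    group_path = group_id.replace(".", "/")
    file_name = f"{artifact_id}-{version}"
    if classifier:
        # An optional classifier is appended after the version
        file_name += f"-{classifier}"
    return f"{group_path}/{artifact_id}/{version}/{file_name}.{extension}"


print(repository_path("org.example.models", "spanish-pos-model", "1.2.3", "bin"))
# prints "org/example/models/spanish-pos-model/1.2.3/spanish-pos-model-1.2.3.bin"
```

With such a scheme, "spanish-pos-model, version 1.2.3" identifies exactly one file in the repository, which is the kind of unique reference we are after.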

Thanks for sharing your experiences with this...

Jens
