Thanks for the pointers; I think they will be very helpful.

However, we use various components and wrappers from outside sources (such as OpenNLP) and don't always control how resources are loaded by the AE. Sometimes it might be sufficient to have the resource on the classpath (in a JAR referenced by Maven), and I think it is acceptable for us to hard-code the resource reference in the AE descriptor, e.g. Having a fully automatic resource resolution as in DKPro would be nice but is not an immediate necessity.

We face a more complicated situation with some components that do not resolve resources from the classpath, e.g. C++ and Python components that need to reference the actual resource files. For those situations we would need to unpack the resource when building the AE and bundle it so it can be accessed with a file path. We currently manually copy the resource file into a /resources folder that gets included in the PEAR packages we generate (and thus unpacked when installing the PEAR).
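As a rough illustration of the unpacking step described above, the sketch below pulls a single model file out of a JAR (which is a ZIP archive) and writes it into a resources directory, so that a non-Java component can open it by file path. The function name `unpack_resource` and the entry and directory names are hypothetical; this is a Python sketch under those assumptions, not how our build currently works.

```python
import shutil
import zipfile
from pathlib import Path


def unpack_resource(jar_path, entry_name, target_dir):
    """Copy one entry from a JAR (ZIP) archive into target_dir and return its path."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the file name so a nested archive path cannot escape target_dir.
    destination = target_dir / Path(entry_name).name
    with zipfile.ZipFile(jar_path) as jar:
        # jar.open raises KeyError if the entry is missing from the archive.
        with jar.open(entry_name) as source, open(destination, "wb") as sink:
            shutil.copyfileobj(source, sink)
    return destination
```

A build step could call something like `unpack_resource("model.jar", "models/spanish-pos-model.bin", "resources")` (again, made-up names) before the PEAR is assembled.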

This probably leads to a different problem, which is doing more advanced custom tasks with Maven, such as unpacking archives, moving files around when packaging, etc. I see that you actually use Ant to package your resource Maven artifacts, probably due to the difficulty of doing such tasks directly in Maven.

Would Gradle be a better option to have the dependency management from Maven while being able to more easily define custom manipulations of resources to help with packaging? Is it possible to generate PEAR packages from Gradle? There are afaik plugins for Maven and Ant, so would we then reference an Ant task from Gradle? (I'll split this part off as a more general thread about Gradle, I think.)

Thanks,
Jens

On 05/22/2013 06:45 PM, Richard Eckart de Castilho wrote:
For some additional half-decent documentation on how and why we do the model 
packaging as we do it, see:

http://code.google.com/p/dkpro-core-asl/wiki/ResourceProviderAPI
http://code.google.com/p/dkpro-core-asl/wiki/PackagingResources

-- Richard

On 22.05.2013 at 18:43, Richard Eckart de Castilho wrote:

Hi Jens,

for DKPro Core [1], we have packaged a large number of models as Maven 
artifacts and host them in our public Maven repository [2]. We have had good 
experiences with this approach. Please feel free to make use of these 
packages.

To package models, we use a set of Ant macros [3] in various Ant scripts that 
download the models from their original sites and wrap them up in a standard 
layout and naming scheme, for example [4].

Cheers,

-- Richard

[1] http://code.google.com/p/dkpro-core-asl
[2] http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-model-releases-local
[3] http://code.google.com/p/dkpro-core-asl/source/browse/built-ant-macros/trunk/ant-macros.xml
[4] https://dkpro-core-gpl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl/src/scripts/build.xml

On 22.05.2013 at 18:31, Jens Grivolla wrote:

Hi, while not strictly a UIMA issue, we have a problem that seems very relevant 
in the context of UIMA analysis engines: how to manage large binary resources 
such as trained models used by an AE, etc.

So far, we have managed to achieve a good separation between code development 
and the actual AEs, using Maven (and git for version control). An AE thus 
consists only of a POM referencing the code, the AE descriptor, and the 
resources used for the AE. The AE poms are configured to generate PEAR archives 
that include all dependencies and resources.

At this point we have the code in git, along with the AEs' POMs and 
descriptors, while we manually copy the resources to the directory before 
running `mvn package` (and exclude those resources from git). We're missing a 
way to manage those resources, including versioning etc.

I'm guessing that this is a rather typical problem, so what solutions do you 
use? We're thinking of also keeping all resources in a Maven repository (e.g. 
Artifactory) so we can reference them with a unique identifier and version. 
This would also help us when we move from generating individual PEARs for each 
component to assembling more complex pipelines with uimaFIT in order to create 
complete packages.

Btw, we are just very few core developers, with most of the team made up of linguists, so 
we want to make it easy for them to save versions of resources they create and assemble 
AEs by just referencing the algorithm and resource (e.g. "create a new OpenNLP 
POStagger using spanish-pos-model.bin, version 1.2.3").
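To make the idea of referencing a resource by identifier and version concrete, here is a small Python sketch of how Maven coordinates map onto the standard Maven repository layout. The group ID `org.example.models`, the artifact ID, and the repository base URL are made up for illustration. Only the path scheme follows the usual Maven convention: the group ID with dots turned into slashes, then the artifact ID, the version, and a file named `artifactId-version.extension`.

```python
def artifact_url(base_url, group_id, artifact_id, version, extension="jar"):
    """Build the download URL of an artifact in a standard-layout Maven repository."""
    # Dots in the group ID become directory separators.
    group_path = group_id.replace(".", "/")
    file_name = f"{artifact_id}-{version}.{extension}"
    return "/".join([base_url.rstrip("/"), group_path, artifact_id, version, file_name])


# Hypothetical coordinates for the model mentioned above.
print(artifact_url("https://repo.example.org/models", "org.example.models",
                   "spanish-pos-model", "1.2.3", "bin"))
```

With coordinates like these, a linguist would only need to name the artifact and its version, and the build could work out where to fetch the file from.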

Thanks for sharing your experiences with this...

Jens



