Hi Jens,

For DKPro Core [1], we have packaged a large number of models as Maven
artifacts and host them in our public Maven repository [2]. We have had good
experiences with this approach. Please feel free to make use of these
packages.
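
As an illustration of what packaging models as Maven artifacts provides: each model gets a groupId, artifactId and version, and a repository using the standard Maven 2 layout stores it under a predictable path derived from those coordinates. The sketch below computes that path. The class name `MavenPath` and the coordinates in `main` are hypothetical and are not DKPro's actual artifact names.

```java
/**
 * Hypothetical helper: maps Maven coordinates to the relative path of an
 * artifact in a repository that uses the standard Maven 2 layout.
 */
final class MavenPath {
    private MavenPath() {}

    static String artifactPath(String groupId, String artifactId,
                               String version, String classifier,
                               String extension) {
        // groupId segments become directories: org.example -> org/example
        String groupPath = groupId.replace('.', '/');
        // File name: artifactId-version[-classifier].extension
        StringBuilder file = new StringBuilder()
                .append(artifactId).append('-').append(version);
        if (classifier != null && !classifier.isEmpty()) {
            file.append('-').append(classifier);
        }
        file.append('.').append(extension);
        return groupPath + "/" + artifactId + "/" + version + "/" + file;
    }

    public static void main(String[] args) {
        // Hypothetical coordinates for a Spanish POS model packaged as a JAR
        System.out.println(artifactPath("org.example.models",
                "opennlp-model-tagger-es-maxent", "1.2.3", null, "jar"));
    }
}
```

Because the version is part of both the directory and the file name, several releases of the same model can coexist in the repository, and a build picks one by its coordinates.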

To package models, we use a set of Ant macros [3] in several Ant scripts that
download the models from their original sites and wrap them up in a standard
layout and naming scheme; see [4] for an example.
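
Once a model JAR is declared as a Maven dependency, it ends up on the classpath, so an AE can locate the model through a naming convention instead of a file-system path. The following is a minimal sketch assuming a hypothetical convention `<package>/lib/<tool>-<language>-<variant>.<ext>`; the class `ModelLocator` and this pattern are illustrative and not necessarily the layout the scripts in [4] produce.

```java
import java.io.InputStream;

/**
 * Hypothetical sketch of a naming scheme for models packaged inside a JAR:
 * <base package>/lib/<tool>-<language>-<variant>.<ext>
 */
final class ModelLocator {
    private ModelLocator() {}

    /** Builds the classpath resource path for a model (assumed convention). */
    static String resourcePath(String basePackage, String tool,
                               String language, String variant, String ext) {
        return basePackage.replace('.', '/') + "/lib/"
                + tool + "-" + language + "-" + variant + "." + ext;
    }

    /** Opens the model from the classpath, or fails with a clear message. */
    static InputStream open(String path) {
        InputStream in = ModelLocator.class.getClassLoader()
                .getResourceAsStream(path);
        if (in == null) {
            throw new IllegalStateException("Model not found on classpath: " + path);
        }
        return in;
    }
}
```

With this split, versioning lives entirely in the POM: moving a pipeline to a newer model means bumping the dependency version, while the lookup by tool, language and variant stays unchanged.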

Cheers,

-- Richard

[1] http://code.google.com/p/dkpro-core-asl
[2] http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-model-releases-local
[3] http://code.google.com/p/dkpro-core-asl/source/browse/built-ant-macros/trunk/ant-macros.xml
[4] https://dkpro-core-gpl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl/src/scripts/build.xml

On 22.05.2013 at 18:31, Jens Grivolla wrote:

> Hi, while not strictly a UIMA issue, we have a problem that seems very 
> relevant in the context of UIMA analysis engines: how to manage large binary 
> resources such as trained models used by an AE, etc.
> 
> So far, we have managed to achieve a good separation between code development 
> and the actual AEs, using Maven (and git for version control). An AE thus 
> consists only of a POM referencing the code, the AE descriptor, and the 
> resources used for the AE. The AE POMs are configured to generate PEAR
> archives that include all dependencies and resources.
> 
> At this point we have the code in git, as well as the AEs' POMs and
> descriptors, while we manually copy the resources into the directory before
> running `mvn package` (and exclude those resources from git). We're missing
> a way to manage those resources, including versioning etc.
> 
> I'm guessing that this is a rather typical problem, so what solutions do you 
> use? We're thinking of having all resources also in Maven (e.g. Artifactory) 
> so we can reference them with a unique identifier and version. This would 
> also help us when moving to more complex pipeline assemblies using uimaFIT
> instead of generating individual PEARs for each component in order to create
> complete packages.
> 
> By the way, we have only a few core developers, and most of the team are
> linguists, so we want to make it easy for them to save versions of the
> resources they create and to assemble AEs just by referencing the algorithm
> and resource (e.g. "create a new OpenNLP POS tagger using
> spanish-pos-model.bin, version 1.2.3").
> 
> Thanks for sharing your experiences with this...
> 
> Jens
