For some additional half-decent documentation on how and why we package 
models the way we do, see:

http://code.google.com/p/dkpro-core-asl/wiki/ResourceProviderAPI
http://code.google.com/p/dkpro-core-asl/wiki/PackagingResources
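
As a rough sketch of what consuming such a packaged model could look like, a
model can be declared as an ordinary Maven dependency in the AE's POM. The
groupId, artifactId, and version below are hypothetical placeholders chosen
for illustration; they are not real coordinates from the DKPro repository.

```xml
<!-- Hypothetical coordinates, for illustration only -->
<dependency>
  <groupId>org.example.models</groupId>
  <artifactId>opennlp-model-tagger-es</artifactId>
  <version>1.2.3</version>
</dependency>
```

With the model on the classpath as a versioned artifact, upgrading to a newer
model amounts to changing the version number.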

-- Richard

On 22.05.2013 at 18:43, Richard Eckart de Castilho 
<[email protected]> wrote:

> Hi Jens,
> 
> for DKPro Core [1], we have packaged a large number of models as Maven 
> artifacts and host them in our public Maven repository [2]. We have had good 
> experiences with this approach. Please feel free to make use of these 
> packages.
> 
> To package models, we use a set of Ant macros [3], called from various Ant 
> scripts that download the original models from their original sites and 
> wrap them up in a standard layout and naming scheme, for example [4].
> 
> Cheers,
> 
> -- Richard
> 
> [1] http://code.google.com/p/dkpro-core-asl
> [2] http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-model-releases-local
> [3] http://code.google.com/p/dkpro-core-asl/source/browse/built-ant-macros/trunk/ant-macros.xml
> [4] https://dkpro-core-gpl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl/src/scripts/build.xml
> 
> On 22.05.2013 at 18:31, Jens Grivolla <[email protected]> wrote:
> 
>> Hi, while not strictly a UIMA issue, we have a problem that seems very 
>> relevant in the context of UIMA analysis engines: how to manage large binary 
>> resources such as trained models used by an AE, etc.
>> 
>> So far, we have managed to achieve a good separation between code 
>> development and the actual AEs, using Maven (and git for version control). 
>> An AE thus consists only of a POM referencing the code, the AE descriptor, 
>> and the resources used for the AE. The AE POMs are configured to generate 
>> PEAR archives that include all dependencies and resources.
>> 
>> At this point we have the code in git, along with the AEs' POMs and 
>> descriptors, while we manually copy the resources to the directory before 
>> running `mvn package` (and exclude those resources from git). We're missing 
>> a way to manage those resources, including versioning etc.
>> 
>> I'm guessing that this is a rather typical problem, so what solutions do you 
>> use? We're thinking of having all resources also in Maven (e.g. Artifactory) 
>> so we can reference them with a unique identifier and version. This would 
>> also help us when we move to assembling more complex pipelines with uimaFIT 
>> to create complete packages, instead of generating individual PEARs for 
>> each component.
>> 
>> Btw, we are just very few core developers, with most of the team made up of 
>> linguists, so we want to make it easy for them to save versions of resources 
>> they create and assemble AEs by just referencing the algorithm and resource 
>> (e.g. "create a new OpenNLP POStagger using spanish-pos-model.bin, version 
>> 1.2.3").
>> 
>> Thanks for sharing your experiences with this...
>> 
>> Jens
