For some additional half-decent documentation on how and why we do the model packaging as we do it, see:
http://code.google.com/p/dkpro-core-asl/wiki/ResourceProviderAPI
http://code.google.com/p/dkpro-core-asl/wiki/PackagingResources

-- Richard

On 22.05.2013, at 18:43, Richard Eckart de Castilho wrote:

> Hi Jens,
>
> for DKPro Core [1], we have packaged a large number of models as Maven
> artifacts and host them in our public Maven repository [2]. We have had good
> experiences with this approach. Please do feel free to make use of these
> packages.
>
> To package models, we use a set of Ant macros [3] in different Ant scripts
> that download the original models from their original sites and wrap them
> up in a standard layout and naming scheme, for example [4].
>
> Cheers,
>
> -- Richard
>
> [1] http://code.google.com/p/dkpro-core-asl
> [2] http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-model-releases-local
> [3] http://code.google.com/p/dkpro-core-asl/source/browse/built-ant-macros/trunk/ant-macros.xml
> [4] https://dkpro-core-gpl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl/src/scripts/build.xml
>
> On 22.05.2013, at 18:31, Jens Grivolla wrote:
>
>> Hi,
>>
>> while not strictly a UIMA issue, we have a problem that seems very
>> relevant in the context of UIMA analysis engines: how to manage large binary
>> resources such as trained models used by an AE, etc.
>>
>> So far, we have managed to achieve a good separation between code
>> development and the actual AEs, using Maven (and git for version control).
>> An AE thus consists only of a POM referencing the code, the AE descriptor,
>> and the resources used by the AE. The AE POMs are configured to generate
>> PEAR archives that include all dependencies and resources.
>>
>> At this point, the code is in git, as are the AEs' POMs and descriptors,
>> but we copy the resources into the directory manually before running
>> `mvn package` (and exclude those resources from git). We're missing a way
>> to manage those resources, including versioning etc.
>>
>> I'm guessing that this is a rather typical problem, so what solutions do you
>> use? We're thinking of putting all resources into Maven as well (e.g.
>> Artifactory), so we can reference them with a unique identifier and version.
>> This would also help us when moving to more complex pipeline assemblies
>> built with uimaFIT, instead of generating an individual PEAR for each
>> component in order to create complete packages.
>>
>> Btw, we are only a few core developers, and most of the team are
>> linguists, so we want to make it easy for them to save versions of the
>> resources they create and to assemble AEs just by referencing the algorithm
>> and the resource (e.g. "create a new OpenNLP POS tagger using
>> spanish-pos-model.bin, version 1.2.3").
>>
>> Thanks for sharing your experiences with this...
>>
>> Jens
