Hi Jens,

for DKPro Core [1], we have packaged a large number of models as Maven artifacts and host them in our public Maven repository [2]. We have had good experiences with this approach. Please feel free to make use of these packages.

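As a rough sketch of what consuming such a packaged model could look like in a POM: the repository URL is our public model repository, while the repository id and the groupId/artifactId/version of the model are hypothetical placeholders for illustration, not actual coordinates from our repository.

```xml
<!-- Sketch only: the repository id and the model coordinates are hypothetical placeholders. -->
<repositories>
  <repository>
    <!-- Hypothetical id; the URL is the public model repository. -->
    <id>ukp-model-releases</id>
    <url>http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-model-releases-local</url>
  </repository>
</repositories>

<dependencies>
  <!-- A model packaged as a plain JAR artifact: the version pins the exact model release. -->
  <dependency>
    <groupId>org.example.models</groupId>
    <artifactId>example-tagger-model-en</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>
```

With a dependency like this, the model JAR ends up on the classpath like any other artifact, so a component can be tied to a specific model version through the POM alone.
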
To package models, we use a set of Ant macros [3] in various Ant scripts that download the models from their original sites and wrap them up in a standard layout and naming scheme, for example [4].

Cheers,

-- Richard

[1] http://code.google.com/p/dkpro-core-asl
[2] http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-model-releases-local
[3] http://code.google.com/p/dkpro-core-asl/source/browse/built-ant-macros/trunk/ant-macros.xml
[4] https://dkpro-core-gpl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl/src/scripts/build.xml

On 22.05.2013 at 18:31, Jens Grivolla <[email protected]> wrote:

> Hi,
>
> while not strictly a UIMA issue, we have a problem that seems very
> relevant in the context of UIMA analysis engines: how to manage large binary
> resources such as trained models used by an AE, etc.
>
> So far, we have managed to achieve a good separation between code development
> and the actual AEs, using Maven (and git for version control). An AE thus
> consists only of a POM referencing the code, the AE descriptor, and the
> resources used for the AE. The AE poms are configured to generate PEAR
> archives that include all dependencies and resources.
>
> At this point we have the code in git, and the AEs' pom and descriptor also,
> while we manually copy the resources to the directory before running `mvn
> package` (and exclude those resources from git). We're missing a way to
> manage those resources, including versioning etc.
>
> I'm guessing that this is a rather typical problem, so what solutions do you
> use? We're thinking of having all resources also in Maven (e.g. Artifactory)
> so we can reference them with a unique identifier and version. This would
> also help us when moving to more complex pipeline assemblies using uimafit
> instead of generating individual PEARS for each component in order to create
> complete packages.
>
> Btw, we are just very few core developers, with most of the team made up of
> linguists, so we want to make it easy for them to save versions of resources
> they create and assemble AEs by just referencing the algorithm and resource
> (e.g. "create a new OpenNLP POStagger using spanish-pos-model.bin, version
> 1.2.3").
>
> Thanks for sharing your experiences with this...
>
> Jens
