Hi, while not strictly a UIMA issue, we have a problem that seems very relevant in the context of UIMA analysis engines: how to manage large binary resources such as trained models used by an AE, etc.

So far, we have managed to achieve a good separation between code development and the actual AEs, using Maven (and git for version control). An AE thus consists only of a POM referencing the code, the AE descriptor, and the resources used for the AE. The AE poms are configured to generate PEAR archives that include all dependencies and resources.

At this point we have the code in git, and the AEs' pom and descriptor also, while we manually copy the resources to the directory before running `mvn package` (and exclude those resources from git). We're missing a way to manage those resources, including versioning etc.

I'm guessing that this is a rather typical problem, so what solutions do you use? We're thinking of having all resources also in Maven (e.g. Artifactory) so we can reference them with a unique identifier and version. This would also help us when moving to more complex pipeline assemblies using uimafit instead of generating individual PEARS for each component in order to create complete packages.

Btw, we are just very few core developers, with most of the team made up of linguists, so we want to make it easy for them to save versions of resources they create and assemble AEs by just referencing the algorithm and resource (e.g. "create a new OpenNLP POStagger using spanish-pos-model.bin, version 1.2.3").

Thanks for sharing your experiences with this...

Jens

Reply via email to