managing resources for UIMA?

Jens Grivolla Wed, 22 May 2013 09:32:34 -0700

Hi, while not strictly a UIMA issue, we have a problem that seems very relevant in the context of UIMA analysis engines: how to manage large binary resources, such as trained models used by an AE, etc.

So far, we have managed to achieve a good separation between code development and the actual AEs, using Maven (and git for version control). An AE thus consists only of a POM referencing the code, the AE descriptor, and the resources used by the AE. The AE POMs are configured to generate PEAR archives that include all dependencies and resources.

At this point we keep the code in git, along with the AEs' POMs and descriptors, while we manually copy the resources into the directory before running `mvn package` (and exclude those resources from git). What we're missing is a way to manage those resources, including versioning, etc.

I'm guessing that this is a rather typical problem, so what solutions do you use? We're thinking of putting all resources into Maven as well (e.g. Artifactory), so we can reference them with a unique identifier and version. This would also help us when moving to more complex pipeline assemblies using uimaFIT, instead of generating individual PEARs for each component, in order to create complete packages.

Btw, we are only a few core developers, and most of the team is made up of linguists, so we want to make it easy for them to save versions of the resources they create and to assemble AEs just by referencing the algorithm and the resource (e.g. "create a new OpenNLP POS tagger using spanish-pos-model.bin, version 1.2.3").
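To illustrate what "referencing a resource by identifier and version" could look like once models live in a Maven repository, here is a minimal sketch. It assumes the model has been deployed as a Maven artifact with extension `bin` under a hypothetical group `org.example.models`, and it maps a coordinate written as `groupId:artifactId[:extension]:version` to the file's path in the standard Maven repository layout. The class name and the group ID are made up for illustration; classifiers and timestamped SNAPSHOT versions are not handled.

```java
import java.nio.file.Path;

/** Hypothetical helper: maps a Maven-style coordinate to a path in a Maven 2 repository layout. */
public final class ResourceCoordinate {
    public final String groupId;
    public final String artifactId;
    public final String extension;
    public final String version;

    public ResourceCoordinate(String groupId, String artifactId, String extension, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.extension = extension;
        this.version = version;
    }

    /** Parses "groupId:artifactId[:extension]:version"; the extension defaults to "jar". */
    public static ResourceCoordinate parse(String coords) {
        String[] parts = coords.split(":");
        if (parts.length == 3) {
            return new ResourceCoordinate(parts[0], parts[1], "jar", parts[2]);
        }
        if (parts.length == 4) {
            return new ResourceCoordinate(parts[0], parts[1], parts[2], parts[3]);
        }
        throw new IllegalArgumentException(
            "Expected groupId:artifactId[:extension]:version, got " + coords);
    }

    /** Path relative to the repository root, e.g. org/example/.../artifactId-version.ext */
    public String relativePath() {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
            + artifactId + "-" + version + "." + extension;
    }

    /** Resolves the artifact against a repository root such as a local ~/.m2/repository. */
    public Path resolve(Path repositoryRoot) {
        return repositoryRoot.resolve(relativePath());
    }

    public static void main(String[] args) {
        ResourceCoordinate model =
            parse("org.example.models:spanish-pos-model:bin:1.2.3");
        // prints org/example/models/spanish-pos-model/1.2.3/spanish-pos-model-1.2.3.bin
        System.out.println(model.relativePath());
    }
}
```

In practice Maven (or Artifactory) would do this resolution itself when the model is declared as a dependency; the point is only that a linguist could name "spanish-pos-model, version 1.2.3" and the build could find exactly that file.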


Thanks for sharing your experiences with this...

Jens
