Thanks for your pointers; I think they will be very helpful.
However, we use various components and wrappers from outside sources
(such as OpenNLP) and don't always control how resources are loaded by
the AE. Sometimes it might be sufficient to have the resource on the
classpath (in a JAR referenced by Maven), and I think it is acceptable
for us to hard-code the resource reference in the AE descriptor, e.g.
Having a fully automatic resource resolution as in DKPro would be nice
but is not an immediate necessity.
We face a more complicated situation with some components that do not
resolve resources from the classpath, e.g. C++ and Python components
that need to reference the actual resource files. For those situations
we would need to unpack the resource when building the AE and bundle it
so it can be accessed with a file path. We currently manually copy the
resource file into a /resources folder that gets included in the PEAR
packages we generate (and thus unpacked when installing the PEAR).
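A minimal sketch of that unpacking step, assuming the resource is reachable through a class loader (for example, from a JAR pulled in by Maven). The class name `ResourceUnpacker`, the method `unpack`, and the target directory are hypothetical names chosen for illustration. Only standard JDK calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies a classpath resource to a real file so that
// non-Java components (C++, Python) can open it by file path.
final class ResourceUnpacker {
    private ResourceUnpacker() {}

    // resourceName is a classpath-style name such as "models/spanish-pos-model.bin"
    // (an illustrative name, not an actual artifact layout).
    static Path unpack(ClassLoader loader, String resourceName, Path targetDir)
            throws IOException {
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourceName);
            }
            Files.createDirectories(targetDir);
            // Keep only the file name so classpath folders are not recreated in the target.
            Path target = targetDir.resolve(Paths.get(resourceName).getFileName().toString());
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target.toAbsolutePath();
        }
    }
}
```

The returned absolute path could then be handed to the external component as a configuration parameter; whether that happens at build time or at AE initialization is a design choice this sketch leaves open.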
This probably leads to a different problem, which is doing more advanced
custom tasks with Maven, such as unpacking archives, moving files around
when packaging, etc. I see that you actually use Ant to package your
resource Maven artifacts, probably due to the difficulty of doing such
tasks directly in Maven.
Would Gradle be a better option to have the dependency management from
Maven while being able to more easily define custom manipulations of
resources to help with packaging? Is it possible to generate PEAR
packages from Gradle? There are afaik plugins for Maven and Ant, so
would we then reference an Ant task from Gradle? (I'll split this part
off as a more general thread about Gradle, I think.)
Thanks,
Jens
On 05/22/2013 06:45 PM, Richard Eckart de Castilho wrote:
For some additional half-decent documentation on how and why we do the model
packaging as we do it, see:
http://code.google.com/p/dkpro-core-asl/wiki/ResourceProviderAPI
http://code.google.com/p/dkpro-core-asl/wiki/PackagingResources
-- Richard
On 22.05.2013 at 18:43, Richard Eckart de Castilho wrote:
Hi Jens,
for DKPro Core [1], we have packaged a large number of models as Maven
artifacts and host them in our public Maven repository [2]. We have had good
experiences with this approach. Please do feel free to make use of these
packages.
To package models, we use a set of Ant macros [3] in different Ant
scripts that download the original models from their original sites and wrap
them up in a standard layout and naming scheme, for example [4].
Cheers,
-- Richard
[1] http://code.google.com/p/dkpro-core-asl
[2] http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-model-releases-local
[3] http://code.google.com/p/dkpro-core-asl/source/browse/built-ant-macros/trunk/ant-macros.xml
[4] https://dkpro-core-gpl.googlecode.com/svn/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl/src/scripts/build.xml
On 22.05.2013 at 18:31, Jens Grivolla wrote:
Hi, while not strictly a UIMA issue, we have a problem that seems very relevant
in the context of UIMA analysis engines: how to manage large binary resources
such as trained models used by an AE, etc.
So far, we have managed to achieve a good separation between code development
and the actual AEs, using Maven (and git for version control). An AE thus
consists only of a POM referencing the code, the AE descriptor, and the
resources used for the AE. The AE POMs are configured to generate PEAR archives
that include all dependencies and resources.
At this point we have the code in git, and the AEs' POMs and descriptors also,
while we manually copy the resources to the directory before running `mvn
package` (and exclude those resources from git). We're missing a way to manage
those resources, including versioning etc.
I'm guessing that this is a rather typical problem, so what solutions do you
use? We're thinking of having all resources also in Maven (e.g. Artifactory) so
we can reference them with a unique identifier and version. This would also
help us when moving to more complex pipeline assemblies using uimaFIT instead
of generating individual PEARs for each component in order to create complete
packages.
Btw, we are just very few core developers, with most of the team made up of linguists, so
we want to make it easy for them to save versions of resources they create and assemble
AEs by just referencing the algorithm and resource (e.g. "create a new OpenNLP
POStagger using spanish-pos-model.bin, version 1.2.3").
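As a rough illustration of referencing a resource by identifier and version, the sketch below maps Maven coordinates to the path a standard Maven repository would use for the artifact. The class `ModelCoordinates` and the group ID `org.example.models` are assumptions made up for this example; only the repository layout itself (group ID with dots replaced by slashes, then artifact ID, version, and `artifactId-version.jar`) is the standard Maven convention.

```java
// Hypothetical value class describing a versioned model artifact.
final class ModelCoordinates {
    private final String groupId;
    private final String artifactId;
    private final String version;

    ModelCoordinates(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    // Short "group:artifact:version" form, as commonly written in build tools.
    String toGav() {
        return groupId + ":" + artifactId + ":" + version;
    }

    // Relative path of the JAR in a repository using the standard Maven layout.
    String toRepositoryPath() {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }
}
```

With this, a linguist's request for "spanish-pos-model, version 1.2.3" would correspond to `new ModelCoordinates("org.example.models", "spanish-pos-model", "1.2.3")`, which an AE POM could then declare as an ordinary dependency.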
Thanks for sharing your experiences with this...
Jens