Re: DocumentAnnotation and type-merging

Marshall Schor Mon, 18 Dec 2006 16:26:42 -0800

Adam Lally wrote:

On 12/18/06, Marshall Schor <[EMAIL PROTECTED]> wrote:

It seems to me that developers choose to put data into a CAS because
they envision "sharing" that data with other (independently-developed)
components.  If they're planning to do that, then the "definitions" of
the types they're sharing in some sense naturally belong to multiple
components (the components that are "sharing" that type definition).


Thus, it seems a little bit illogical to envision the standard packaging
of types as going with a particular "component"; a better practice might
be to think of types more as 1st-class parts in themselves.


Hmmm.. good point, although I would say that the type definitions don't
belong to the components at all.  Type systems should be 1st-class
objects that are maintained as separate entities and have an owner who
decides what the official definition of the type is.

I agree.  The best practice would be to keep the generated JCas sources
and class files in a jar.

It seems to me that if we have a jar containing JCas cover classes and
this jar is needed by multiple annotators, then we have issues similar
to those that come up for any shared library.  For example, if annotator
A bundles one version of Xerces and annotator B bundles another version,
there may be problems if you try to deploy both of these into the same
application.  Usually, the library will be backwards compatible, so the
only trick is to make sure the newer version is in use.  If it's not
backwards compatible, then it gets uglier.
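To make the "newer version wins" rule concrete, here is a minimal Java
sketch that picks the newest of several bundled versions of one library.
The class and method names (NewestVersion, compareVersions, pickNewest)
and the version strings are hypothetical, not part of UIMA or any build
tool.  It assumes versions are plain dotted numbers and that the newer
version really is backwards compatible; real version schemes and
classloading are messier.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: when several annotators bundle different versions
 * of the same library, choose the newest one. Assumes dotted numeric
 * versions and that newer versions are backwards compatible.
 */
public class NewestVersion {

    /** Compares two dotted numeric versions, e.g. "2.6.2" vs "2.10.0". */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // A missing component counts as 0, so "2.6" equals "2.6.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** Returns the newest of the bundled versions (list must be non-empty). */
    static String pickNewest(List<String> bundled) {
        String newest = bundled.get(0);
        for (String v : bundled) {
            if (compareVersions(v, newest) > 0) {
                newest = v;
            }
        }
        return newest;
    }

    public static void main(String[] args) {
        // Example versions: annotator A bundles one, annotator B another.
        List<String> bundled = Arrays.asList("2.6.2", "2.10.0");
        System.out.println(pickNewest(bundled)); // prints 2.10.0
    }
}
```

Comparing numerically per component (rather than as strings) is what
makes "2.10.0" rank above "2.6.2".  If backwards compatibility does not
hold, picking the highest number is not enough.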

I think what we have is this: annotator A might have its own version of
the types (T/A), and annotator B might have its own version (T/B).  The
"assembly" of A and B has a process whereby T/A and T/B are "merged",
and a new T/A&B is created.  This seems different from the Xerces
example.  For UIMA assemblies, it's not necessarily a "newer" versus
"older" thing.
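A minimal Java sketch of the merge step described above, under a
deliberately simplified model: each type is just a supertype name plus a
map from feature name to range type.  The names TypeMergeSketch, merge
and example.Token are hypothetical, and this is not UIMA's actual merge
code, which handles more cases.  Types found in only one of T/A and T/B
are copied over, features of a type defined in both are unioned, and
clashing supertypes or feature ranges are collected as conflicts that
would need a "manual" merge.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of merging T/A and T/B into T/A&B.
 * Not UIMA's real merge logic.
 */
public class TypeMergeSketch {

    static final class TypeDef {
        final String supertype;
        // Feature name -> range type name.
        final Map<String, String> features = new LinkedHashMap<>();

        TypeDef(String supertype) {
            this.supertype = supertype;
        }
    }

    static final class MergeResult {
        final Map<String, TypeDef> merged = new LinkedHashMap<>();
        final List<String> conflicts = new ArrayList<>();
    }

    static TypeDef copyOf(TypeDef t) {
        TypeDef c = new TypeDef(t.supertype);
        c.features.putAll(t.features);
        return c;
    }

    static MergeResult merge(Map<String, TypeDef> a, Map<String, TypeDef> b) {
        MergeResult r = new MergeResult();
        // Start from a copy of T/A so the inputs are not modified.
        for (Map.Entry<String, TypeDef> e : a.entrySet()) {
            r.merged.put(e.getKey(), copyOf(e.getValue()));
        }
        for (Map.Entry<String, TypeDef> e : b.entrySet()) {
            String name = e.getKey();
            TypeDef fromB = e.getValue();
            TypeDef existing = r.merged.get(name);
            if (existing == null) {
                // Type defined only in T/B: take it as-is.
                r.merged.put(name, copyOf(fromB));
                continue;
            }
            if (!existing.supertype.equals(fromB.supertype)) {
                r.conflicts.add(name + ": supertype " + existing.supertype
                        + " vs " + fromB.supertype);
                continue;
            }
            // Same type in both: union the features, flagging range clashes.
            for (Map.Entry<String, String> f : fromB.features.entrySet()) {
                String prev = existing.features.putIfAbsent(f.getKey(), f.getValue());
                if (prev != null && !prev.equals(f.getValue())) {
                    r.conflicts.add(name + "." + f.getKey() + ": range "
                            + prev + " vs " + f.getValue());
                }
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Map<String, TypeDef> ta = new LinkedHashMap<>();
        TypeDef tokA = new TypeDef("uima.tcas.Annotation");
        tokA.features.put("pos", "uima.cas.String");
        ta.put("example.Token", tokA);

        Map<String, TypeDef> tb = new LinkedHashMap<>();
        TypeDef tokB = new TypeDef("uima.tcas.Annotation");
        tokB.features.put("lemma", "uima.cas.String");
        tb.put("example.Token", tokB);

        MergeResult r = merge(ta, tb);
        System.out.println(r.merged.get("example.Token").features.keySet()); // prints [pos, lemma]
        System.out.println(r.conflicts); // prints []
    }
}
```

Reporting conflicts instead of failing on the first one lets an
assembler see every spot where the automatic merge gave up.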

If I assemble several parts together, I imagine I would regenerate the
JCas classes for this aggregate, and package these as 2 separate
things.  This would allow future users of my part (the aggregate) to
combine it with other parts, re-run JCasGen on that new amalgam, etc.

This concept seems a fundamental principle of how components are hooked
up together in UIMA.  Our approach differs from WSDL in that it strives
to avoid translating data formats / representations, by instead having
a pre-figured-out "merged" design for shared data at the start of a
"run".

I think what is a fundamental principle of UIMA (or should be, anyway)
is that components interoperate without manual intervention.  So if I
want to run a pipeline of any 5 UIMA components, I can just grab them
off the shelf and run.

I agree this is a good goal.  It seems achievable with some "automation"
introduced into the assembly step.

This seems particularly important for applications that host arbitrary
UIMA analytics.  End users want to grab the latest, greatest annotator
and drop it in.  This should work smoothly, or UIMA isn't meeting one
of its most important goals.

I agree.  Perhaps we should figure out what (if anything) is inhibiting
this, and see if it can be addressed.  One concept might be to require
JCas source/class files to be packaged in a particular way, and to
improve the "merge" logic to cover more cases (and to report on the
cases where it fails and a "manual" merge step might be needed).  I
think in most pragmatic cases it will work fine "automatically".

-Marshall
