Re: DocumentAnnotation and type-merging

Marshall Schor Mon, 18 Dec 2006 09:45:31 -0800

I think I'm leaning toward keeping this capability in JCasGen. Here's what I'm thinking:


The concerns Adam raises arise when the classes that JCasGen generates
are packaged in the same JAR file as the component they go with. But I
think they are mitigated if we keep type systems packaged separately from
components.  (I'm imagining there's a way to do this, for now).

It seems to me that developers choose to put data into a CAS because
they envision "sharing" that data with other (independently-developed)
components. If they're planning to do that, then the "definitions" of
the types they're sharing in some sense naturally belong to multiple
components (the components that are "sharing" that type definition).

Thus, it seems a little bit illogical to envision the standard packaging
of types going with a particular "component" - a better practice might be
thinking of types more as 1st class parts in themselves. If I assemble
several parts together, I imagine I would regenerate the JCas classes for
this aggregate, and package these as 2 separate things. This would allow
future users of my part (the aggregate) to combine it with other parts,
and re-run JCasGen on that new amalgam, etc.

This concept seems a fundamental principle of how components are hooked
up together in UIMA. Our approach differs from WSDL, in that it strives
to avoid translating data formats / representations, by instead having a
pre-figured-out "merged" design for shared data at the start of a "run".

The JCas approach does have a deficiency in that if the user has
hand-customized the generated code in more than one component, the
merging of the customized code has to be done by hand. (It does work if
only one of the parts sharing a type has the hand-customized code, or if
more than one of the parts has it and the customized copies are
identical.)
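To make the "merged" design concrete, here is a rough sketch of what
merging amounts to: the feature sets of same-named types from several
parts are unioned, and JCasGen would then be re-run on the result. This
is plain Java over maps, not the UIMA API; the class name, the map
layout, and the "sourceUri" feature are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TypeMergeSketch {

    // Hypothetical model of a type system:
    //   type name -> (feature name -> range type name)
    static Map<String, Map<String, String>> merge(
            List<Map<String, Map<String, String>>> typeSystems) {
        Map<String, Map<String, String>> merged = new TreeMap<>();
        for (Map<String, Map<String, String>> ts : typeSystems) {
            for (Map.Entry<String, Map<String, String>> type : ts.entrySet()) {
                // Same-named types from different parts share one merged entry.
                Map<String, String> features =
                        merged.computeIfAbsent(type.getKey(), k -> new TreeMap<>());
                for (Map.Entry<String, String> f : type.getValue().entrySet()) {
                    String previous = features.putIfAbsent(f.getKey(), f.getValue());
                    // The same feature with two different ranges cannot be merged.
                    if (previous != null && !previous.equals(f.getValue())) {
                        throw new IllegalStateException(
                                "Incompatible ranges for " + type.getKey()
                                + ":" + f.getKey());
                    }
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Part A defines DocumentAnnotation with one feature.
        Map<String, Map<String, String>> partA = Map.of(
                "uima.tcas.DocumentAnnotation",
                Map.of("language", "uima.cas.String"));
        // Part B adds a (made-up) extra feature to the same type.
        Map<String, Map<String, String>> partB = Map.of(
                "uima.tcas.DocumentAnnotation",
                Map.of("language", "uima.cas.String",
                       "sourceUri", "uima.cas.String"));
        System.out.println(merge(List.of(partA, partB)));
    }
}
```

The merged map is what the aggregate's JCas classes would be generated
from, which is why packaging them apart from any single part seems
natural.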

Adam Lally wrote:

After thinking about this some more, here's my proposed plan of action:

1) We explicitly document that "feature extension" (meaning the
practice of defining a type in two different places, with different
features, causing those feature sets to be merged) is incompatible
with JCAS.  The "owner" of a type (whoever defined it first) should
get to choose the features and generate the one and only JCas class.
If a user of this type wants to add their own features, UIMA supports
this only through the "plain" CAS API, not the JCAS.  JCasGen and CDE
should give warnings when they encounter multiple non-identical
definitions of the same type.

I wonder if this might be too constraining for part-combining.

2) The UIMA Framework "owns" the definition of
uima.tcas.DocumentAnnotation and its corresponding JCAS class.  So
users MAY NOT have their own JCAS cover classes for
DocumentAnnotation.  JCasGen should stop generating DocumentAnnotation
classes.

3) We will provide a convenient way for users to define their own global
metadata types and easily access them in the CAS.

+1 to this

4) Migration path:  Users must delete any DocumentAnnotation JCas
cover class that they have.  If they have added custom features, then
they either have to change their code to not use JCAS (for accessing
these particular features), or they have to define their own global
metadata type and change their code to use that rather than the
DocumentAnnotation.

I don't think there's a good way to make JCAS and feature extension
coexist happily, so let's acknowledge that they don't.  It's never
ideal to break users' code but I think this is worth it.  I'm guessing
most users haven't used feature extension anyway, so the effect is
minor.

The way described at the top might be a reasonably good way to make JCas
and feature extension co-exist happily: keep JCas generated
sources/classes in separate Jars, "live-with" the limit of only one
"hand-done customization".
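The "only one hand-done customization" limit can be stated as a small
check. This is only a sketch under assumptions: each part contributes
either the text of its hand-customized JCas class or null if it uses the
unmodified generated code, and the class and method names are invented
here, not part of JCasGen.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CustomizationCheckSketch {

    // One entry per part sharing the type: the hand-customized source of
    // its JCas class, or null when the part has not customized it.
    static boolean mergesWithoutHandWork(List<String> customizedSources) {
        Set<String> distinct = new HashSet<>();
        for (String source : customizedSources) {
            if (source != null) {
                distinct.add(source);
            }
        }
        // Zero or one distinct customization can be carried over as-is;
        // two or more differing ones need a manual merge.
        return distinct.size() <= 1;
    }

    public static void main(String[] args) {
        // Only one part customized the class: no manual merge needed.
        System.out.println(mergesWithoutHandWork(Arrays.asList(null, "custom A")));
        // Two parts customized it differently: manual merge needed.
        System.out.println(mergesWithoutHandWork(Arrays.asList("custom A", "custom B")));
    }
}
```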


Thoughts?

-Adam
