I think I'm leaning toward keeping this capability in JCasGen. Here's
what I'm thinking:
The concerns Adam raises arise when the classes that JCasGen generates
are packaged in the same JAR file as the component they go with. I
think they are mitigated if we keep type systems packaged separately
from components. (For now, I'm assuming there's a way to do this.)
It seems to me that developers choose to put data into a CAS because
they envision "sharing" that data with other (independently developed)
components. If they're planning to do that, then the "definitions" of
the types they're sharing in some sense naturally belong to multiple
components (the components that are "sharing" that type definition).
Thus, it seems a little illogical for the standard packaging of types
to go with one particular "component"; a better practice might be to
think of types as first-class parts in themselves. If I assemble
several parts together, I imagine I would regenerate the JCas classes
for this aggregate, and package the aggregate and its JCas classes as
two separate things. This would allow future users of my part (the
aggregate) to combine it with other parts, and re-run JCasGen on that
new amalgam, etc.
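To make the "combine parts, then re-run JCasGen on the amalgam" idea a bit more concrete, here is a rough sketch of the merge step in plain Java. It is not UIMA code: the class `TypeMergeSketch`, the method `mergeFeatures`, and the `sourceUri` feature are names invented for illustration, and I'm assuming a type can be modeled as a map from feature name to range type name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of UIMA: models merging the feature sets
// that several parts declare for the same type name.
class TypeMergeSketch {

    // Each map is one part's definition of the type: feature name -> range type name.
    static Map<String, String> mergeFeatures(List<Map<String, String>> definitions) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> definition : definitions) {
            for (Map.Entry<String, String> feature : definition.entrySet()) {
                String previous = merged.putIfAbsent(feature.getKey(), feature.getValue());
                // Same feature name with a different range cannot be merged (assumed rule).
                if (previous != null && !previous.equals(feature.getValue())) {
                    throw new IllegalStateException("Feature " + feature.getKey()
                            + " declared with ranges " + previous + " and " + feature.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> partA = new LinkedHashMap<>();
        partA.put("language", "uima.cas.String");

        Map<String, String> partB = new LinkedHashMap<>();
        partB.put("language", "uima.cas.String");
        partB.put("sourceUri", "uima.cas.String"); // made-up extra feature

        // The merged definition is what the regenerated cover class would be built from.
        System.out.println(mergeFeatures(Arrays.asList(partA, partB)));
    }
}
```

The point of the sketch is only that the cover class for the aggregate would be generated from the merged feature set, which is why it would be packaged apart from either part.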
This concept seems to be a fundamental principle of how components are
hooked up together in UIMA. Our approach differs from WSDL in that it
strives to avoid translating data formats / representations, by instead
having a pre-figured-out "merged" design for shared data at the start
of a "run".
The JCas approach does have a deficiency: if the user has
hand-customized the generated code in more than one component, the
customized code has to be merged by hand. (It does work if only one of
the parts sharing a type has hand-customized code, or if more than one
part has it and the customizations are identical.)
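The rule in that last paragraph can be written down as a small decision function. Again this is only a sketch under my own assumptions, not anything JCasGen does today: `CustomizationCheckSketch`, `Outcome`, and `resolve` are invented names, and a part's hand-customized cover class is modeled as a plain string (`null` when the part has not customized the class).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not part of JCasGen: decides what to do with the
// cover class of one shared type when several parts are combined.
class CustomizationCheckSketch {

    enum Outcome { REGENERATE, REUSE_CUSTOMIZED, MANUAL_MERGE }

    // One entry per part: the hand-customized source, or null if not customized.
    static Outcome resolve(List<String> customizedSources) {
        Set<String> distinct = new LinkedHashSet<>();
        for (String source : customizedSources) {
            if (source != null) {
                distinct.add(source);
            }
        }
        if (distinct.isEmpty()) {
            return Outcome.REGENERATE;        // nobody customized: just re-run the generator
        }
        if (distinct.size() == 1) {
            return Outcome.REUSE_CUSTOMIZED;  // one customization, or several identical ones
        }
        return Outcome.MANUAL_MERGE;          // differing customizations: merge by hand
    }

    public static void main(String[] args) {
        System.out.println(resolve(Arrays.asList(null, "custom body A", null)));
        System.out.println(resolve(Arrays.asList("custom body A", "custom body B")));
    }
}
```

Only the last case is the deficiency described above; the first two can be handled without any hand work.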
Adam Lally wrote:
After thinking about this some more, here's my proposed plan of action:
1) We explicitly document that "feature extension" (meaning the
practice of defining a type in two different places, with different
features, causing those feature sets to be merged) is incompatible
with JCAS. The "owner" of a type (whoever defined it first) should
get to choose the features and generate the one and only JCas class.
If a user of this type wants to add their own features, UIMA supports
this only through the "plain" CAS API, not the JCAS. JCasGen and CDE
should give warnings when they encounter multiple non-identical
definitions of the same type.
I wonder if this might be too constraining for combining parts.
2) The UIMA Framework "owns" the definition of
uima.tcas.DocumentAnnotation and its corresponding JCAS class. So
users MAY NOT have their own JCAS cover classes for
DocumentAnnotation. JCasGen should stop generating DocumentAnnotation
classes.
3) We will provide a convenient way for users to define their own global
metadata types and easily access them in the CAS.
+1 to this
4) Migration path: Users must delete any DocumentAnnotation JCas
cover class that they have. If they have added custom features, then
they either have to change their code to not use JCAS (for accessing
these particular features), or they have to define their own global
metadata type and change their code to use that rather than the
DocumentAnnotation.
I don't think there's a good way to make JCAS and feature extension
coexist happily, so let's acknowledge that they don't. It's never
ideal to break users' code but I think this is worth it. I'm guessing
most users haven't used feature extension anyway, so the effect is
minor.
The way described at the top might be a reasonably good way to make JCas
and feature extension coexist happily: keep JCas-generated
sources/classes in separate JARs, and "live with" the limit of only one
"hand-done customization".
Thoughts?
-Adam