On 12/15/06, Marshall Schor <[EMAIL PROTECTED]> wrote:
> Questions:
> 1) Do we continue to support adding features to DocumentAnnotation?
> It breaks a lot of user code if we stop supporting this.  We might
> consider doing this as part of the change-over to Apache, but if we
> did, I think we would need to implement the ability for users to more
> easily store and find data in the CAS that they were using the
> Document Annotation for, before.

I think we should provide that feature anyway.  IMO we need a better
way to index and retrieve arbitrary FS without having to declare a
custom index in your descriptor.

I do share your concern that this breaks user code in a way for which
there's no quick fix.  Maybe we can hope that very few users have
actually added their own features to DocumentAnnotation, though.


> 2) If so, should we delete the JCAS DocumentAnnotation class from the
> framework code entirely, to avoid the problem that required the
> uima_jcas_builtin_types.jar workaround?
> If it is deleted, it might break old code generated before JCasGen
> was modified to create this class.  In that case, there would be no
> JCas cover class for it at all.  That's unlikely to be a problem,
> unless the user was looking for the only default built-in feature
> (language).  And there's an easy work-around - run JCasGen.


Asking the user to run JCasGen in this case seems OK to me.  JCasGen
has generated DocumentAnnotation since UIMA v1.1, I believe, so only
code older than that would be affected.

> I don't recall other reasons for having it - does anyone else?

There are a couple of uses of it in the framework itself.

a) The JCas has a deprecated method: DocumentAnnotation
getDocumentAnnotation().  This was deprecated because it is not safe
to use if the annotator is loaded under a different classloader than
the framework, and the annotator has its own definition of
DocumentAnnotation.  If we remove the DocumentAnnotation class from
the framework we'd have to remove this method, and users would have to
change their code to use the alternative method
getDocumentAnnotationFs(), which returns type TOP and requires the user
to cast to DocumentAnnotation.

b) The example FileSystemCollectionReader uses the DocumentAnnotation
class to set the language feature (to a value specified in a config
parameter).  Either we remove this feature, change it to not use JCAS
(kind of an ugly workaround to have to expose in example code), or we
could move the DocumentAnnotation class from uima-core.jar to
uima-examples.jar (which is a slight improvement, but in practice
unlikely to help much since I think most users just include all the
uima jars in their classpath anyway).
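
As an illustration of point (a), here is a minimal sketch of the pattern
user code would move to: fetch the document annotation as TOP, check its
type, then cast.  The nested classes TOP, DocumentAnnotation and
JCasStandIn are hypothetical stand-ins written only so the example
compiles on its own; they are not the real UIMA classes, and languageOf
is an invented helper name.

```java
public class DocAnnotationCastSketch {

    // Stand-in for the root feature-structure type (hypothetical, not the UIMA class).
    static class TOP {}

    // Stand-in for the JCas cover class; models only the built-in language feature.
    static class DocumentAnnotation extends TOP {
        private String language;
        String getLanguage() { return language; }
        void setLanguage(String language) { this.language = language; }
    }

    // Stand-in for the JCas; only the accessor that returns TOP is modelled.
    static class JCasStandIn {
        private final TOP documentAnnotation;
        JCasStandIn(TOP documentAnnotation) { this.documentAnnotation = documentAnnotation; }
        TOP getDocumentAnnotationFs() { return documentAnnotation; }
    }

    // Hypothetical helper: read the language via the TOP-returning accessor.
    // Returns null when the object is not an instance of the DocumentAnnotation
    // class visible to the caller, instead of failing with a ClassCastException.
    static String languageOf(JCasStandIn jcas) {
        TOP fs = jcas.getDocumentAnnotationFs();
        if (fs instanceof DocumentAnnotation) {
            return ((DocumentAnnotation) fs).getLanguage();
        }
        return null;
    }

    public static void main(String[] args) {
        DocumentAnnotation da = new DocumentAnnotation();
        da.setLanguage("en");
        System.out.println(languageOf(new JCasStandIn(da)));
    }
}
```

The instanceof check is a choice made for this sketch: it keeps the cast
on the caller's side, so a mismatch shows up as a null result in the
caller's code rather than as an exception.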


However, there's still a big problem with having JCasGen generate the
class for DocumentAnnotation:  it is unsafe with respect to
composability.  If two annotators bundle differing versions of
DocumentAnnotation they cannot be used together without rerunning
JCasGen and then dealing with all the issues that arise because you
now have three versions of DocumentAnnotation hanging around (so you
either need to get your classpath ordering right or actually go into
the annotators' jars and delete the versions of DocumentAnnotation
that they bundled).


No solution seems perfect, but if there were ever a time to remove
support for adding features to DocumentAnnotation, I think now is that
time.

-Adam
