On 12/15/06, Marshall Schor <[EMAIL PROTECTED]> wrote:
> Questions:
> 1) Do we continue to support adding features to DocumentAnnotation? It breaks a lot of user code if we stop supporting this. We might consider doing this as part of the change-over to Apache, but if we did, I think we would need to implement the ability for users to more easily store and find data in the CAS that they were using the Document Annotation for, before.
I think we should provide that feature anyway. IMO we need a better way to index and retrieve arbitrary FS without having to declare a custom index in your descriptor. I do share your concern that this breaks user code in a way for which there's no quick fix. Perhaps we can hope that very few users have actually added their own features to DocumentAnnotation, though.
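As a rough illustration of retrieving stored objects by type without declaring an index up front, here is a minimal sketch in plain Java. It is not a UIMA API and not a proposal from this thread: TypeKeyedStore, SourceInfo, add() and getAll() are all hypothetical names, and the store only matches exact classes (no subtype lookup).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in UIMA.
class TypeKeyedStore {
    // One bucket per concrete class, created on first use.
    private final Map<Class<?>, List<Object>> byType = new HashMap<>();

    // Store an object under its runtime class; no index declaration needed.
    <T> void add(T fs) {
        byType.computeIfAbsent(fs.getClass(), k -> new ArrayList<>()).add(fs);
    }

    // Return everything stored under exactly this class (subtypes are not searched).
    <T> List<T> getAll(Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object o : byType.getOrDefault(type, Collections.emptyList())) {
            result.add(type.cast(o));
        }
        return result;
    }

    public static void main(String[] args) {
        TypeKeyedStore store = new TypeKeyedStore();
        store.add(new SourceInfo("file:/docs/a.txt"));
        for (SourceInfo info : store.getAll(SourceInfo.class)) {
            System.out.println(info.uri);
        }
    }
}

// Hypothetical per-document metadata that a user might otherwise have
// added as extra features on DocumentAnnotation.
class SourceInfo {
    final String uri;

    SourceInfo(String uri) {
        this.uri = uri;
    }
}
```

The point of the sketch is only the usage pattern: the user stores per-document data in a type of their own and looks it up by that type, rather than extending DocumentAnnotation.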
> 2) If so, should we delete the JCAS DocumentAnnotation class from the framework code entirely, to avoid the problem that required the uima_jcas_builtin_types.jar workaround?
> If it is deleted, it might break old code generated before JCasGen was modified to create this class. In that case, there would be no JCas cover class for it at all. That's unlikely to be a problem, unless the user was looking for the only default built-in feature (language). And there's an easy work-around - run JCasGen.
Asking the user to run JCasGen in this case seems OK to me. JCasGen has generated DocumentAnnotation since UIMA v1.1, I believe, so only code older than that would be affected.
> I don't recall other reasons for having it - does anyone else?
There are a couple of uses of it in the framework itself.

a) The JCas has a deprecated method: DocumentAnnotation getDocumentAnnotation(). This was deprecated because it is not safe to use if the annotator is loaded under a different classloader than the framework and the annotator has its own definition of DocumentAnnotation. If we remove the DocumentAnnotation class from the framework, we'd have to remove this method, and users would have to change their code to use the alternative method getDocumentAnnotationFs(), which returns type TOP and requires the user to cast to DocumentAnnotation.

b) The example FileSystemCollectionReader uses the DocumentAnnotation class to set the language feature (to a value specified in a config parameter). Either we remove this feature, change it to not use JCAS (kind of an ugly workaround to have to expose in example code), or move the DocumentAnnotation class from uima-core.jar to uima-examples.jar (which is a slight improvement, but in practice unlikely to help much, since I think most users just include all the uima jars in their classpath anyway).

However, there's still a big problem with having JCasGen generate the class for DocumentAnnotation: it is unsafe with respect to composability. If two annotators bundle differing versions of DocumentAnnotation, they cannot be used together without rerunning JCasGen. You then have to deal with all the issues that arise because you now have three versions of DocumentAnnotation hanging around (the two bundled ones plus the regenerated one), so you either need to get your classpath ordering right or actually go into the annotators' jars and delete the versions of DocumentAnnotation that they bundled.

No solution seems perfect, but if there were ever a time to remove support for adding features to DocumentAnnotation, I think now is that time.

-Adam
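
For readers who want to see the cast pattern described in (a), here is a minimal sketch. It uses small stand-in classes so that it compiles without the UIMA jars: TOP, DocumentAnnotation and StubJCas below are simplified placeholders, not the real framework classes, and languageOf() is a hypothetical helper.

```java
// Stand-in types for illustration only; these are NOT the real UIMA classes.
class DocAnnotationCastDemo {
    // Hypothetical helper: fetch the document annotation as TOP, then cast.
    // Returns null if the object is not the DocumentAnnotation class visible here.
    static String languageOf(StubJCas jcas) {
        TOP fs = jcas.getDocumentAnnotationFs();
        if (fs instanceof DocumentAnnotation) {
            return ((DocumentAnnotation) fs).getLanguage();
        }
        return null;
    }

    public static void main(String[] args) {
        StubJCas jcas = new StubJCas(new DocumentAnnotation("en"));
        System.out.println(languageOf(jcas));
    }
}

// Placeholder for the root feature structure type.
class TOP {
}

// Placeholder cover class carrying only the built-in language feature.
class DocumentAnnotation extends TOP {
    private final String language;

    DocumentAnnotation(String language) {
        this.language = language;
    }

    String getLanguage() {
        return language;
    }
}

// Placeholder for the JCas; the accessor is declared to return TOP,
// so the caller has to do the cast.
class StubJCas {
    private final TOP documentAnnotation;

    StubJCas(TOP documentAnnotation) {
        this.documentAnnotation = documentAnnotation;
    }

    TOP getDocumentAnnotationFs() {
        return documentAnnotation;
    }
}
```

Because the accessor is declared to return TOP, the framework itself does not need to reference a DocumentAnnotation class; the cast happens in user code, against whichever DocumentAnnotation class that code can see.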
