On 6/22/11 7:38 PM, Hannes Korte wrote:
On 06/22/2011 06:50 PM, Olivier Grisel wrote:
I am ok with switching to UIMA CAS. We might need additional metadata
outside of the CAS annotations, though. For instance, if an annotator
fixes a typo in the Sofa itself, we might need to be able to record
that Sofa1 should be replaced by Sofa2 according to annotator A1.
Do we have one CAS per sentence or one CAS per document? If the former
is the case, then we will need some more metadata around the CAS
documents to be able to show the context of a given sentence (if that is
needed at all). If the latter is the case, then this will lead to many
different Sofas, which only differ in a few characters, right?
I was thinking about a system where we have one CAS per document,
but our tooling should still collect annotations at the sentence level.
So a user needs to annotate at least one sentence to add something
useful to the CAS. The training code should then take care of training
on a document which only contains a few annotated sentences.
Jörn
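
A minimal sketch of the one-CAS-per-document idea discussed above, in
plain Python rather than the UIMA API. All names here (Sentence,
Document, the "annotated" flag, training_sentences) are hypothetical
and only illustrate how training code might skip sentences that nobody
has annotated yet; none of this is taken from the thread or from UIMA.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (begin, end, type) with character offsets into the document text
Span = Tuple[int, int, str]


@dataclass
class Sentence:
    begin: int               # start offset in the document text (inclusive)
    end: int                 # end offset in the document text (exclusive)
    annotated: bool = False  # assumed flag: set once a user has annotated this sentence
    entities: List[Span] = field(default_factory=list)


@dataclass
class Document:
    text: str                # plays the role of the document-level Sofa
    sentences: List[Sentence] = field(default_factory=list)


def training_sentences(doc: Document) -> List[Tuple[str, List[Span]]]:
    """Return (sentence text, entities) pairs for annotated sentences only.

    Entity offsets are rebased so they are relative to the sentence start.
    """
    samples = []
    for s in doc.sentences:
        if not s.annotated:
            continue  # unannotated sentences contribute nothing to training
        rebased = [(b - s.begin, e - s.begin, t) for b, e, t in s.entities]
        samples.append((doc.text[s.begin:s.end], rebased))
    return samples


# Usage: a two-sentence document in which only the second sentence is annotated.
doc = Document(
    text="Paris is nice. Berlin is big.",
    sentences=[
        Sentence(0, 14),
        Sentence(15, 29, annotated=True, entities=[(15, 21, "location")]),
    ],
)
samples = training_sentences(doc)
```

Keeping the offsets document-relative in storage and rebasing them only
when training samples are built means the surrounding context of a
sentence stays available for display without extra metadata.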