On 06/22/2011 07:53 PM, Olivier Grisel wrote:
> 2011/6/22 Jörn Kottmann <[email protected]>:
>> On 6/22/11 6:50 PM, Olivier Grisel wrote:
>>>
>>> I am ok with switching to UIMA CAS. We might need additional metadata
>>> outside of the CAS annotations though. For instance if the annotators
>>> fixes a typo in the Sofa it-self, we might need to be able to tell
>>> that Sofa1 is subject to being replaced by Sofa2 according to
>>> annotator A1 for instance.
>>>
>>
>> I am not sure if we should fix such mistakes, the system will also encounter
>> them in real data it needs to process. Fixing typos, or correcting things in
>> the text is always difficult when there are already existing annotations.
>>
>> Do you feel fixing mistakes in the text is important?
>
> We can leave that issue as a low priority discussion for later and
> just ignore it for now.
>
>> We can also fix by having an option to delete "garbage" texts from the
>> corpus.
>
> Yes, discarding a whole CAS. But if the CAS is document level instead
> of sentence level, that might be an issue.

Let's say we have a CAS type Sentence, which is never changed, and
another type AnnotatedSentence. Each time a user annotates a sentence, a
new AnnotatedSentence annotation is created over the same span, holding
information about the user and the state of the sentence (e.g. correct,
unsure, or discarded). This way we can store all of that without needing
to change the Sofa. Alternatively, each Sentence could have a List of
something like AnnotationMetadata.

> ...
>> I believe the Corpus server should be independent of the other components
>> and define some kind of remote API for data interchange.
>
> Is there a JSON version of XMI? Hannes, what is your opinion on this?

A separate corpus server sounds good to me. But this server can simply
deliver the default XMI representation of the CASes. I think the
documents have to be preprocessed for annotation on the server side of
the WebGUI anyway. The JS client should not call the corpus server
directly.
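
To make the Sentence / AnnotatedSentence idea above a bit more concrete, here is a rough sketch in plain Python. It does not use the UIMA API at all: the `Cas` class is only a toy stand-in, and the names `State`, `annotate`, `judgements_for` and `covered_text` are made up for illustration. The point is just that user judgements pile up as extra annotations over the same span while the Sofa and the Sentence annotations stay untouched.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class State(Enum):
    """Possible states of a sentence, as listed in the mail (hypothetical names)."""
    CORRECT = "correct"
    UNSURE = "unsure"
    DISCARDED = "discarded"


@dataclass(frozen=True)
class Sentence:
    """Span over the Sofa text; frozen, so it cannot be changed later."""
    begin: int
    end: int


@dataclass(frozen=True)
class AnnotatedSentence:
    """One user's judgement, covering the same span as a Sentence."""
    begin: int
    end: int
    user: str
    state: State


class Cas:
    """Toy stand-in for a CAS: a fixed Sofa string plus annotation lists."""

    def __init__(self, sofa: str, sentences: List[Sentence]):
        self.sofa = sofa
        self.sentences = list(sentences)
        self.judgements: List[AnnotatedSentence] = []

    def annotate(self, sentence: Sentence, user: str, state: State) -> AnnotatedSentence:
        # Add a new annotation over the sentence's span; the Sofa is not touched.
        record = AnnotatedSentence(sentence.begin, sentence.end, user, state)
        self.judgements.append(record)
        return record

    def judgements_for(self, sentence: Sentence) -> List[AnnotatedSentence]:
        # All user judgements whose span matches the given sentence exactly.
        span = (sentence.begin, sentence.end)
        return [j for j in self.judgements if (j.begin, j.end) == span]

    def covered_text(self, annotation) -> str:
        # Text of the Sofa covered by any object with begin/end offsets.
        return self.sofa[annotation.begin:annotation.end]


# Example: the second sentence has a typo, so a user discards it
# instead of anyone editing the Sofa.
first, second = Sentence(0, 14), Sentence(15, 27)
cas = Cas("Paris is nice. Teh cat sat.", [first, second])
cas.annotate(second, "alice", State.DISCARDED)
```

With this layout, discarding a "garbage" sentence is just one more annotation, so it would also work if a CAS holds a whole document rather than a single sentence. The alternative mentioned above (a List of AnnotationMetadata on each Sentence) would hold the same information, only attached to the Sentence instead of stored as separate annotations.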
