On 7/18/2012 5:07 PM, Sebastian Sprenger wrote:
> I am writing on a state of the art analysis of frameworks for filtering and
> analysing information streams.
> I don't understand why annotators (or any pre-processing components) can only
> process Objects that specify JCas.

This is probably off-topic, and may be a detail, but the fundamental object container in UIMA that flows from annotator to annotator is a "CAS". There is an interface to it called the JCas, which stands for Java interface to the CAS. So I presume that in the above sentence you meant a "CAS".

> Why is it not possible to process arbitrary
> objects?

A purpose of UIMA is to enable collaboration among independently developed unstructured information analysis components. In general, these components can be written in a variety of languages. UIMA, in particular, supports annotators written in Java and C/C++, plus some others (Python, etc.). These languages have different capabilities for expressing "objects". For UIMA we chose an approach that puts the shared objects (feature structures) into the CAS.

When writing a particular annotator, in a particular language, you are free to use whatever objects you desire within that annotator. When you get data from or put data into the CAS, you are "sharing" that data with other components, potentially developed independently, by others, in other languages.

==========

You may be asking a different question, however. You may be saying: I have a JPEG image, or an audio file encoded in MP3, etc., something that's not "text". The examples in UIMA often appear to assume that the unstructured input, the "Subject of Analysis" (or SofA, as our documentation calls it), is a text document. However, UIMA *does* allow arbitrary kinds of unstructured information for the SofA, if that was your concern. For more details, see
http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.aas.sofa_data_formats

> Best regards, Sebastian
>

Hope this helps. -Marshall