Hello UIMA users,

I know annotators are not *supposed* to change the sofa data once it has
been set, but I really need to do that in my setup.

First of all, my premise is that I want to integrate an annotator that is
not really sofa-aware, so it uses cas.getDocumentText() to do its job.

The problem with this is that an annotator might know how to process RSS
feeds, CSV data, etc., but the current CAS sofa data string is in another
(text) format (say NewsML). Remember: the annotator does not know about
multiple views. It just wants the default sofa data string.

This requires adding some sort of converter, which should be reusable. The
converter would be implemented as an analysis engine and routed by a Flow
controller just before the annotator, so that the annotator finds the text
in the right format in the CAS.

The problem with this approach shows up once the Flow controller has
determined that an annotator needs a converter and has returned the
converter's step, so that the converter can prepare the CAS for the
annotator. The converter then fails, because UIMA does not allow the sofa
data string to be modified after it has been set, so the conversion cannot
happen.

For reference, the exception is:
org.apache.uima.cas.CASRuntimeException: Data for Sofa feature
setLocalSofaData() has already been set.
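
To make the restriction concrete, here is a toy model in plain Java. It
does not use any UIMA classes: ToyCas, convertInto and the "plainText" view
name are hypothetical stand-ins, and only the "_InitialView" name is
borrowed from UIMA's default view. The sketch first shows a set-once check
rejecting an overwrite. It then shows a different technique from the one I
describe in this message: a converter that writes its output into a second
named view, so the original text is left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class ConverterRoutingSketch {

    /** Hypothetical stand-in for a CAS: the text of each named view can be set only once. */
    static class ToyCas {
        static final String DEFAULT_VIEW = "_InitialView";
        private final Map<String, String> views = new HashMap<>();

        void setText(String view, String text) {
            // Mimics the set-once restriction; this is NOT UIMA's real check.
            if (views.containsKey(view)) {
                throw new IllegalStateException(
                        "Data for view '" + view + "' has already been set.");
            }
            views.put(view, text);
        }

        String getText(String view) {
            return views.get(view);
        }
    }

    /** Runs a converter by writing into a new view instead of overwriting the default one. */
    static String convertInto(ToyCas cas, String targetView, UnaryOperator<String> converter) {
        String original = cas.getText(ToyCas.DEFAULT_VIEW);
        cas.setText(targetView, converter.apply(original));
        return cas.getText(targetView);
    }

    public static void main(String[] args) {
        ToyCas cas = new ToyCas();
        cas.setText(ToyCas.DEFAULT_VIEW, "<newsML>headline</newsML>");

        // Overwriting the default view is rejected, like the exception above.
        try {
            cas.setText(ToyCas.DEFAULT_VIEW, "headline");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // A crude tag-stripping "converter" writes into a separate view.
        String converted = convertInto(cas, "plainText", s -> s.replaceAll("<[^>]+>", ""));
        System.out.println(converted);                         // headline
        System.out.println(cas.getText(ToyCas.DEFAULT_VIEW));  // original is unchanged
    }
}
```

In this toy the caller asks for the "plainText" view explicitly. An annotator
that only ever reads the default sofa data string cannot do that, which is
the part I have not solved.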

I understand that this is a common-sense restriction imposed by default by
UIMA, but I would like to disable it from the Flow controller just for the
converter, then re-enable it for the annotator. The Flow controller would
cache the original content and restore it after each annotator has finished
its job and before routing to the next annotator/converter pair.

I tried this:
// CASImpl is org.apache.uima.cas.impl.CASImpl
((CASImpl) cas).enableReset(true);
cas.reset();
cas.setDocumentText("test");

But that, obviously, removes all annotations from the CAS's indexes, and I
don't want that. So I tried to restore the CAS by doing:
cas.addFsToIndexes(somePreviousAnnotation);

The result is an NPE that seems to be caused by an invalid state of the CAS
that I have just reset.

Here's the stack trace:
Caused by: java.lang.NullPointerException
    at org.apache.uima.cas.impl.FSIndexRepositoryImpl.ll_addFS(FSIndexRepositoryImpl.java:1344)
    at org.apache.uima.cas.impl.FSIndexRepositoryImpl.addFS(FSIndexRepositoryImpl.java:812)
    at org.apache.uima.cas.impl.FSIndexRepositoryImpl.addFS(FSIndexRepositoryImpl.java:1258)
    at org.apache.uima.cas.impl.CASImpl.addFsToIndexes(CASImpl.java:3787)
    at ws.scribo.MediaTypeFlowController$MediaTypeFlow.next(MediaTypeFlowController.java:184)

I would really appreciate it if someone could help me set the CAS sofa data
from an annotator after it has already been set, as I explained above.

If my approach is fundamentally flawed, I would highly appreciate
suggestions on how I might achieve the result I originally wanted,
ideally while still respecting the constraint that the annotators are not
sofa-aware.

Thanks for your time!
