Here's one suggestion.

I imagine that if someone edited the text of a paragraph, even a simple edit
could end up changing quite a few of the annotations.  So, a good approach would
be to redo the annotations of the newly changed text from the ground up.

This means that you would create a new CAS with its subject-of-analysis being
the changed document, and run it through the pipeline again.

I will give a poor example (I'm not a great linguist...).  If the original text
was:

  The wet bank was close to the bridge.  It was full of people in bathing suits.

and there were annotations linking "It" and "bank", and "bank" was identified as
the side of a river,

and then you changed it to:

  The central bank was close to the bridge.  It was full of people in bathing suits.

and there were now annotations linking "It" and "bridge", and "bank" was
identified as a financial institution, then you can see that smallish changes
could have long-distance and complex consequences.


I suppose the reason you don't want to take the straightforward approach of
creating a new CAS for every change is that you think it would be too
inefficient.

There are a couple of ways that could be addressed.  The "document" (or whatever
you want to call the thing being worked on) could be split into smallish units
(for example, paragraphs), so the thing being re-processed would be smaller.  Of
course, this means that inter-paragraph effects would be lost.
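
As a rough illustration of the paragraph-splitting idea, here is a minimal
sketch in plain Java.  It is not UIMA API: the class name ParagraphReanalyzer
and the analyzer function are hypothetical stand-ins, where the analyzer
represents "create a CAS for this paragraph and run it through the pipeline",
and the String result is a placeholder for whatever you keep from that run.
Paragraphs are assumed to be separated by blank lines.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: re-analyzes only paragraphs whose text changed. */
class ParagraphReanalyzer {
    // Analysis results keyed by the exact paragraph text (unbounded cache).
    private final Map<String, String> cache = new HashMap<>();
    // Stand-in for "run one paragraph through the pipeline" (an assumption).
    private final Function<String, String> analyzer;
    private int analyzerCalls = 0;

    ParagraphReanalyzer(Function<String, String> analyzer) {
        this.analyzer = analyzer;
    }

    /** Splits on blank lines and analyzes each paragraph, reusing the cache. */
    List<String> analyze(String document) {
        List<String> results = new ArrayList<>();
        for (String paragraph : document.split("\\n\\s*\\n")) {
            String result = cache.get(paragraph);
            if (result == null) {
                // New or edited paragraph: this is the only expensive step.
                result = analyzer.apply(paragraph);
                analyzerCalls++;
                cache.put(paragraph, result);
            }
            results.add(result);
        }
        return results;
    }

    /** Number of times the (expensive) analyzer was actually invoked. */
    int analyzerCalls() {
        return analyzerCalls;
    }

    public static void main(String[] args) {
        ParagraphReanalyzer r =
            new ParagraphReanalyzer(p -> "analyzed " + p.length() + " chars");
        r.analyze("The wet bank was close to the bridge.\n\nIt was sunny.");
        // Only the first paragraph differs, so only it is re-analyzed.
        r.analyze("The central bank was close to the bridge.\n\nIt was sunny.");
        System.out.println("analyzer calls: " + r.analyzerCalls());
    }
}
```

Keying the cache on the paragraph text keeps the sketch simple, but it has the
limitation already mentioned: anything that depends on neighboring paragraphs
is not recomputed when only one paragraph changes.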

Another thing you could do is to use the capability of the CAS to support
multiple views.  Each view has its own subject of analysis; see
http://uima.apache.org/downloads/releaseDocs/2.3.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.mvs

You could then try to write some kind of "fast path" that, for an updated text,
would attempt to map as many of the previous annotations as possible from the
original text onto it.  I think this could be a difficult problem to solve in
general, but in specific cases, some fast path situations may exist.
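
One such specific case might be a single contiguous edit.  The sketch below, in
plain Java, finds the edited region by comparing the common prefix and suffix
of the old and new text, keeps annotations strictly before the edit, shifts
annotations strictly after it, and drops the rest so they can be re-analyzed.
The names AnnotationRemapper and Span are hypothetical and are not UIMA types.
Note that this is purely positional: it would happily carry over a stale
word sense for "bank" in the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical fast path: carry annotations across one contiguous edit. */
class AnnotationRemapper {
    /** Minimal stand-in for an offset-based annotation (not a UIMA type). */
    static final class Span {
        final int begin;
        final int end;
        final String label;

        Span(int begin, int end, String label) {
            this.begin = begin;
            this.end = end;
            this.label = label;
        }
    }

    /** Returns the spans that survive the edit, with offsets into newText. */
    static List<Span> remap(String oldText, String newText, List<Span> spans) {
        int max = Math.min(oldText.length(), newText.length());
        // Length of the common prefix of both texts.
        int prefix = 0;
        while (prefix < max
                && oldText.charAt(prefix) == newText.charAt(prefix)) {
            prefix++;
        }
        // Length of the common suffix, not overlapping the prefix.
        int suffix = 0;
        while (suffix < max - prefix
                && oldText.charAt(oldText.length() - 1 - suffix)
                    == newText.charAt(newText.length() - 1 - suffix)) {
            suffix++;
        }
        int editEndOld = oldText.length() - suffix; // edit end in old text
        int delta = newText.length() - oldText.length();

        List<Span> kept = new ArrayList<>();
        for (Span s : spans) {
            if (s.end < prefix) {
                kept.add(s); // strictly before the edit: offsets unchanged
            } else if (s.begin > editEndOld) {
                // strictly after the edit: shift by the length difference
                kept.add(new Span(s.begin + delta, s.end + delta, s.label));
            }
            // Spans overlapping or touching the edit are dropped, since the
            // text they covered may have changed.
        }
        return kept;
    }

    public static void main(String[] args) {
        String oldText = "The wet bank was close to the bridge.";
        String newText = "The central bank was close to the bridge.";
        List<Span> spans = Arrays.asList(
            new Span(0, 3, "The"), new Span(4, 7, "wet"),
            new Span(8, 12, "bank"));
        for (Span s : remap(oldText, newText, spans)) {
            System.out.println(s.label + " -> "
                + newText.substring(s.begin, s.end));
        }
    }
}
```

Dropping annotations that merely touch the edit is deliberately conservative:
typing extra characters at the end of a token changes that token even though
its old offsets still point at unchanged text.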

HTH.  -Marshall Schor


On 9/10/2010 2:22 AM, Jim wrote:
> We are looking to build an editor for (human) translators that would display
> many layers of offset based annotations while allowing real time edits of both
> text and possibly the annotations themselves.
>
> So far this project (link below) is the best example we have seen. We were
> wondering if UIMA had something similar or could offer us some insights. We
> are certainly interested in applying UIMA annotators. But it's the real-time
> editing part we are finding challenging.
>
> http://code.google.com/p/wave-robot-java-client/
>
> Jim
>
>
> On 9/9/2010 1:30 AM, Thilo Götz wrote:
>> Hi,
>>
>> On 9/9/2010 01:00, Jim Hargrave wrote:
>>> I apologize if my terminology doesn't match with normal UIMA usage - but
>>> hopefully the general idea will be understandable.
>>>
>>> Is it always assumed that UIMA's document text is immutable?
>>
>> yes.
>>
>>> Let's say you have some text with several position-based annotations.
>>> The text changes, and now all of your annotation positions are incorrect.
>>> Are there APIs that allow you to change your text, but still preserve the
>>> offsets in your annotations?
>>
>> There is no built-in support for this sort of thing in UIMA.
>> It would be easy to do after UIMA analysis has finished, but
>> I imagine you want to modify the text during analysis.  That
>> is not possible because UIMA subjects of analysis are
>> immutable.
>>
>> If you give us more details, we may have some ideas about
>> different approaches to the issue.
>>
>> --Thilo
>>
>>>
>>> Jim
>>>
>>>
>>>   NOTICE: This email message is for the sole use of the intended
>>> recipient(s) and may contain confidential and privileged information. Any
>>> unauthorized review, use, disclosure or distribution is prohibited. If you
>>> are not the intended recipient, please contact the sender by reply email and
>>> destroy all copies of the original message.
>>>
>>>
>>>
>>
>
>
>
>
