selectCovered() and friends expect Annotations (or AnnotationFS), yes. Anyway, 
I don't want to talk you out of your idea; frame offsets sound very reasonable. 
I'm just trying to discuss potential implications and sources of confusion 
(e.g. getCoveredText() not working). 
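Regarding the headroom question, a quick back-of-envelope check (plain Java, 
nothing UIMA-specific; the frame rates are the ones you mentioned):

```java
// Headroom check for frame offsets stored in Annotation's signed
// 32-bit begin/end fields, at 100 or 1000 frames/second.
public class FrameHeadroom {
    public static void main(String[] args) {
        int maxOffset = Integer.MAX_VALUE;              // 2,147,483,647
        long hoursAt1000fps = maxOffset / 1000L / 3600; // ~596 hours
        long hoursAt100fps  = maxOffset / 100L / 3600;  // ~5,965 hours
        System.out.println("1000 fps: " + hoursAt1000fps + " h of material");
        System.out.println(" 100 fps: " + hoursAt100fps + " h of material");
    }
}
```

So even at 1000 fps you get close to 600 hours per sofa before overflowing, 
which should indeed be plenty for any single document.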

>>> Also, can I have several indexes on the same annotations in order to work 
>>> with character offsets for text analysis, but then efficiently query for 
>>> overlapping annotations from other views based on frame offsets?

Afaik you cannot query across views, e.g. do a selectCovered(view2, 
view1Annotation, X.class), because (afaik) the UIMA FSIterator.moveTo() 
mechanism tries to locate view1Annotation in the indexes of view2, which will 
not work. For this reason, I'm actually thinking about removing these 
potentially problematic signatures from uimaFIT and keeping only 
selectCovered(view1Annotation, X.class). You should at least verify this if 
your approach assumes that cross-view queries work.
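For reference, a minimal sketch of the safe single-view pattern versus the 
problematic cross-view call (assuming uimaFIT's JCasUtil; Sentence and Token 
stand in for types from your own type system):

```java
import org.apache.uima.fit.util.JCasUtil;

// Safe: the covering annotation and the queried type live in the same view.
for (Sentence s : JCasUtil.select(textView, Sentence.class)) {
    List<Token> tokens = JCasUtil.selectCovered(Token.class, s);
}

// Problematic: passing a different view as the first argument, e.g.
//   JCasUtil.selectCovered(videoView, X.class, textAnnotation)
// relies on FSIterator.moveTo() locating textAnnotation in videoView's
// indexes, which (afaik) will not work.
```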

-- Richard

On 04.12.2013, at 12:47, Jens Grivolla <j+...@grivolla.net> wrote:

> True, but don't things like selectCovered() etc. expect Annotations (to match 
> on begin/end)? So using Annotation might make it easier in some cases to 
> select the annotations we're interested in.
> 
> -- Jens
> 
> On 04/12/13 15:35, Richard Eckart de Castilho wrote:
>> Why is it bad if you cannot inherit from Annotation? The getCoveredText() 
>> will not work anyway if you are working with audio/video data.
>> 
>> -- Richard
>> 
>> On 04.12.2013, at 12:31, Jens Grivolla <j+...@grivolla.net> wrote:
>> 
>>> Hi, we're now starting the EUMSSI project, which deals with integrating 
>>> annotation layers coming from audio, video and text analysis.
>>> 
>>> We're thinking of basing it all on UIMA, having different views with separate 
>>> audio, video, transcribed text, etc. sofas.  In order to align the 
>>> different views we need to have a common offset specification that allows 
>>> us to map e.g. character offsets to the corresponding timestamps.
>>> 
>>> In order to avoid float timestamps (which would mean we can't derive from 
>>> Annotation) I was thinking of using audio/video frames with e.g. 100 or 
>>> 1000 frames/second.  Annotation has begin and end defined as signed 32 bit 
>>> ints, leaving sufficient room for very long documents even at 1000 fps, so 
>>> I don't think we're going to run into any limits there.  Is there anything 
>>> that could become problematic when working with offsets that are probably 
>>> quite a bit larger than what is typically found with character offsets?
>>> 
>>> Also, can I have several indexes on the same annotations in order to work 
>>> with character offsets for text analysis, but then efficiently query for 
>>> overlapping annotations from other views based on frame offsets?
>>> 
>>> Btw, if you're interested in the project we have a writeup (condensed from 
>>> the project proposal) here: 
>>> https://dl.dropboxusercontent.com/u/4169273/UIMA_EUMSSI.pdf and there will 
>>> hopefully soon be some content on http://eumssi.eu/
>>> 
>>> Thanks,
>>> Jens
