Re: big offsets efficiency, and multiple offsets

Jens Grivolla Wed, 04 Dec 2013 06:48:43 -0800

True, but don't things like selectCovered() etc. expect Annotations (to match on begin/end)? So using Annotation might make it easier in some cases to select the annotations we're interested in.
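A rough illustration of the selectCovered() point: uimaFIT's covered selection picks annotations by comparing their begin/end offsets against a window, which is why it expects Annotation subtypes. The plain-Java sketch below mimics only that containment test on a hypothetical `Span` record, with offsets in frames. It is not the uimaFIT implementation and ignores UIMA index ordering and type priorities.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an annotation with begin/end offsets
// (measured in frames here, not characters).
record Span(String label, int begin, int end) {}

class CoveredSelection {
    // Returns the spans lying entirely within [begin, end], i.e. the
    // containment test that "covered" selection is based on.
    static List<Span> selectCovered(List<Span> spans, int begin, int end) {
        List<Span> result = new ArrayList<>();
        for (Span s : spans) {
            if (s.begin() >= begin && s.end() <= end) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> spans = List.of(
            new Span("speaker-turn", 0, 5000),
            new Span("word", 1200, 1450),
            new Span("face", 4800, 7300));
        // Only "word" lies inside the 1000..5000 frame window.
        System.out.println(selectCovered(spans, 1000, 5000));
    }
}
```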


-- Jens


On 04/12/13 15:35, Richard Eckart de Castilho wrote:

Why is it bad if you cannot inherit from Annotation? getCoveredText() will 
not work anyway if you are working with audio/video data.

-- Richard

On 04.12.2013, at 12:31, Jens Grivolla wrote:

Hi, we're now starting the EUMSSI project, which deals with integrating 
annotation layers coming from audio, video and text analysis.

We're thinking of basing it all on UIMA, with different views holding separate 
audio, video, transcribed text, etc. sofas.  In order to align the different 
views, we need a common offset specification that allows us to map e.g. 
character offsets to the corresponding timestamps.
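One possible shape for such a mapping, assuming the transcription step yields word-level timing (the post does not say what the ASR output contains), is an alignment table from character spans to frame spans. `CharToFrameAlignment` is a hypothetical class, not a UIMA API. It finds the word containing a character offset by binary search over word start offsets and maps a half-open character span to the frame span of the words it touches.

```java
import java.util.Arrays;

// Hypothetical alignment between the transcript view (character offsets)
// and the audio view (frame offsets), one entry per transcribed word.
// All spans are half-open: [begin, end).
class CharToFrameAlignment {
    private final int[] charBegins;  // sorted ascending, one per word
    private final int[] charEnds;
    private final int[] frameBegins;
    private final int[] frameEnds;

    CharToFrameAlignment(int[] charBegins, int[] charEnds,
                         int[] frameBegins, int[] frameEnds) {
        this.charBegins = charBegins;
        this.charEnds = charEnds;
        this.frameBegins = frameBegins;
        this.frameEnds = frameEnds;
    }

    // Index of the word whose character span contains charOffset,
    // or -1 if the offset falls between words or outside the text.
    int wordAt(int charOffset) {
        int i = Arrays.binarySearch(charBegins, charOffset);
        if (i < 0) {
            i = -i - 2;  // insertion point minus one: last word starting before the offset
        }
        if (i < 0 || charOffset >= charEnds[i]) {
            return -1;
        }
        return i;
    }

    // Maps a character span to {firstFrame, endFrame} of the words it touches.
    int[] toFrameSpan(int charBegin, int charEnd) {
        if (charEnd <= charBegin) {
            throw new IllegalArgumentException("empty or inverted span");
        }
        int first = wordAt(charBegin);
        int last = wordAt(charEnd - 1);
        if (first < 0 || last < 0) {
            throw new IllegalArgumentException(
                "span [" + charBegin + ", " + charEnd + ") does not start and end inside words");
        }
        return new int[] {frameBegins[first], frameEnds[last]};
    }

    public static void main(String[] args) {
        // "hello big world": word spans in characters and in frames (100 fps)
        CharToFrameAlignment a = new CharToFrameAlignment(
            new int[] {0, 6, 10}, new int[] {5, 9, 15},
            new int[] {120, 170, 200}, new int[] {160, 190, 260});
        System.out.println(Arrays.toString(a.toFrameSpan(6, 15)));  // "big world" -> [170, 260]
    }
}
```

Offsets that fall on whitespace between words are rejected here; a real pipeline would have to decide whether to snap them to the nearest word instead.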

In order to avoid float timestamps (which would mean we can't derive from 
Annotation), I was thinking of using audio/video frames with e.g. 100 or 1000 
frames/second.  Annotation has begin and end defined as signed 32-bit ints, 
leaving sufficient room for very long documents even at 1000 fps, so I don't 
think we're going to run into any limits there.  Is there anything that could 
become problematic when working with offsets that are probably quite a bit 
larger than what is typically found with character offsets?
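The headroom claim is easy to check. A signed 32-bit int tops out at `Integer.MAX_VALUE` = 2,147,483,647, which at 1000 frames/second is about 2.15 million seconds, roughly 596 hours (almost 25 days) of media. At 100 frames/second it is ten times that. The sketch below does that arithmetic and, as a safeguard, converts seconds to a frame offset with `Math.toIntExact`, which throws `ArithmeticException` instead of silently wrapping on overflow. `FrameBudget` and its methods are illustrative names only.

```java
// Hypothetical helper showing how much media an int frame offset can address.
class FrameBudget {
    // Longest addressable duration in hours at the given frame rate.
    static double maxHours(int framesPerSecond) {
        return (double) Integer.MAX_VALUE / framesPerSecond / 3600.0;
    }

    // Converts seconds to a frame offset, failing loudly instead of wrapping
    // if the result does not fit into a signed 32-bit int.
    static int secondsToFrame(double seconds, int framesPerSecond) {
        if (seconds < 0) {
            throw new IllegalArgumentException("negative timestamp");
        }
        return Math.toIntExact(Math.round(seconds * framesPerSecond));
    }

    public static void main(String[] args) {
        System.out.printf("1000 fps: %.1f hours%n", maxHours(1000));  // about 596.5
        System.out.printf(" 100 fps: %.1f hours%n", maxHours(100));   // about 5965.2
        System.out.println(secondsToFrame(3725.25, 1000));            // 3725250
    }
}
```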

Also, can I have several indexes on the same annotations in order to work with 
character offsets for text analysis, but then efficiently query for overlapping 
annotations from other views based on frame offsets?
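UIMA component descriptors can declare additional indexes (an fsIndexCollection), including sorted indexes keyed on arbitrary features, so a second sort order over the same annotations is possible in principle; how efficiently overlap queries run on such an index is the open question here. Independently of the UIMA API, the idea can be sketched in plain Java: keep the same annotation objects in two orderings, one by character offset and one by frame offset, and answer overlap queries against the frame-sorted copy. `MediaSpan` and `TwoOrderIndex` are hypothetical names, and the linear scan with early exit is a simplification; an interval tree would bound the work better when span lengths vary widely.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical annotation carrying both offset systems (half-open spans).
record MediaSpan(String label, int charBegin, int charEnd, int frameBegin, int frameEnd) {}

class TwoOrderIndex {
    final List<MediaSpan> byChar;   // same objects, ordered by character offset
    final List<MediaSpan> byFrame;  // same objects, ordered by frame offset

    TwoOrderIndex(List<MediaSpan> spans) {
        byChar = new ArrayList<>(spans);
        byChar.sort(Comparator.comparingInt(MediaSpan::charBegin)
                .thenComparingInt(MediaSpan::charEnd));
        byFrame = new ArrayList<>(spans);
        byFrame.sort(Comparator.comparingInt(MediaSpan::frameBegin)
                .thenComparingInt(MediaSpan::frameEnd));
    }

    // Spans whose frame range [frameBegin, frameEnd) overlaps [begin, end).
    List<MediaSpan> overlappingFrames(int begin, int end) {
        List<MediaSpan> result = new ArrayList<>();
        for (MediaSpan s : byFrame) {
            if (s.frameBegin() >= end) {
                break;  // sorted by frameBegin: no later span can overlap
            }
            if (s.frameEnd() > begin) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TwoOrderIndex idx = new TwoOrderIndex(List.of(
            new MediaSpan("hello", 0, 5, 120, 160),
            new MediaSpan("big", 6, 9, 170, 190),
            new MediaSpan("world", 10, 15, 200, 260)));
        // A face track covering frames [180, 210) overlaps "big" and "world".
        for (MediaSpan s : idx.overlappingFrames(180, 210)) {
            System.out.println(s.label());
        }
    }
}
```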

Btw, if you're interested in the project we have a writeup (condensed from the 
project proposal) here: 
https://dl.dropboxusercontent.com/u/4169273/UIMA_EUMSSI.pdf and there will 
hopefully soon be some content on http://eumssi.eu/

Thanks,
Jens
