Re: big offsets efficiency, and multiple offsets
On 05/12/13 10:04, Jens Grivolla wrote:
> I agree that it might make more sense to model our needs more directly
> instead of trying to squeeze it into the schema we normally use for text
> processing. But at the same time I would of course like to avoid having
> to reimplement many of the things that are already available when using
> AnnotationBase.
>
> For the cross-view indexing issue I was thinking of creating individual
> views for each modality and then a merged view that just contains a
> subset of annotations of each view, and on which we would do the
> cross-modal reasoning.
>
> I just looked again at the GaleMultiModalExample (not much there,
> unfortunately) and saw that e.g. AudioSpan derives from AnnotationBase
> but still has float values for begin/end. I would be really interested
> in learning more about what was done in GALE, but it's hard to find any
> relevant information...

The readme at
http://svn.apache.org/repos/asf/uima/sandbox/trunk/GaleMultiModalExample/README.txt
points to two papers with more details on the GALE multi-modal application. A portion of the view model was like this:

Audio view - sofa referencing the audio data, which was passed in parallel to multiple ASR annotators. Each ASR annotator put its transcription in the view, where annotations contained ASR engine IDs.

Transcription views - a text sofa with the transcription from one ASR output. Annotations for each word referenced the lexeme annotations in the audio view. Multiple MT annotators would receive each transcription view and add their translations in the view.

Translation views - a text sofa with one of the translations, based on a combination of ASR engine and MT engine. Annotations in a translation view referenced the annotations in a transcription view.

There were more views. The points here are that 1) views were designed to hold a particular SOFA to be processed by analytics appropriate for that modality, 2) each derived view had cross references to the annotations in the views it was derived from, and 3) at the end the GUI presenting the final translation could, for any word(s), show the particular piece of transcription it came from, and/or play the associated audio segment.

Eddie
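A minimal sketch of how such a view model can be set up with the CAS view API. The view names, sofa contents, and the one-view-per-ASR/MT-combination layout follow the description above, but the code is illustrative and not taken from the GALE example:

    import org.apache.uima.jcas.JCas;

    public class GaleStyleViews {

      // Hypothetical view names, modelled on the description above.
      public static void createViews(JCas cas) throws Exception {
        // Audio view: the sofa references the raw audio; ASR annotators add
        // lexeme/word annotations (carrying an ASR engine ID) to this view.
        JCas audioView = cas.createView("AudioView");
        audioView.setSofaDataURI("file:/data/broadcast-001.wav", "audio/wav");

        // One transcription view per ASR engine: a text sofa whose word
        // annotations hold FS references back to lexeme annotations in AudioView.
        JCas transcription = cas.createView("Transcription-ASR1");
        transcription.setSofaDataString("transcribed text from ASR engine 1 ...", "text/plain");

        // One translation view per ASR/MT combination: a text sofa whose
        // annotations reference the transcription-view annotations they came from.
        JCas translation = cas.createView("Translation-ASR1-MT1");
        translation.setSofaDataString("texto traducido ...", "text/plain");
      }
    }

With the cross references in place (word annotations in a transcription view pointing at lexeme annotations in the audio view, and so on), a GUI can walk from a translated word back to its transcription and from there to the corresponding audio segment.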
Re: big offsets efficiency, and multiple offsets
I forgot to say that the text analysis view(s) will necessarily have to use character offsets so that we can obtain the coveredText, which means that all resulting annotations will also use character offsets. The merged view will need to use time-based offsets, which means that we have to recreate the annotations there with mapped offsets rather than just indexing the same annotations in a different view.

I think that basically means that we won't do much cross-view querying, but rather have one component (AE) that reads from all views and creates a new one with new independent annotations after mapping the offsets.

-- Jens
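A sketch of the kind of mapping component (AE) described here. The view names and the charToFrame() alignment are illustrative assumptions; a real implementation would presumably use a project-specific annotation type that also keeps a reference back to the source annotation:

    import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
    import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
    import org.apache.uima.cas.FSIterator;
    import org.apache.uima.jcas.JCas;
    import org.apache.uima.jcas.tcas.Annotation;

    public class OffsetMappingAnnotator extends JCasAnnotator_ImplBase {

      @Override
      public void process(JCas cas) throws AnalysisEngineProcessException {
        try {
          JCas textView = cas.getView("TranscribedText"); // hypothetical view names
          JCas mergedView = cas.getView("Merged");

          FSIterator<Annotation> it = textView.getAnnotationIndex().iterator();
          while (it.hasNext()) {
            Annotation source = it.next();
            // Recreate the annotation in the merged view with frame-based offsets
            // instead of character offsets; it is a new, independent annotation.
            Annotation mapped = new Annotation(mergedView,
                charToFrame(source.getBegin()), charToFrame(source.getEnd()));
            mapped.addToIndexes(); // legal: created in mergedView, indexed in mergedView
          }
        } catch (Exception e) {
          throw new AnalysisEngineProcessException(e);
        }
      }

      // Placeholder: a real mapping would come from the ASR word/time alignment.
      private int charToFrame(int charOffset) {
        return charOffset * 10;
      }
    }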
Re: big offsets efficiency, and multiple offsets
I agree that it might make more sense to model our needs more directly instead of trying to squeeze it into the schema we normally use for text processing. But at the same time I would of course like to avoid having to reimplement many of the things that are already available when using AnnotationBase.

For the cross-view indexing issue I was thinking of creating individual views for each modality and then a merged view that just contains a subset of annotations of each view, and on which we would do the cross-modal reasoning.

I just looked again at the GaleMultiModalExample (not much there, unfortunately) and saw that e.g. AudioSpan derives from AnnotationBase but still has float values for begin/end. I would be really interested in learning more about what was done in GALE, but it's hard to find any relevant information...

Thanks,
Jens
Re: big offsets efficiency, and multiple offsets
:)

Btw, the indexing system in UIMA didn't appear extensible to me last time I checked. Consider somebody introducing an x/y coordinate scheme for image data. This would call for some spatial index, e.g. a k-d tree. While it is possible to define different indexes of the bag, set, and sorted kind, it is not possible to add a new kind of index. I think this would be quite a useful feature, also for linguistic data, e.g. an index to efficiently navigate the dominance relations in syntactic tree structures.

At the UIMA@GSCL 2013 workshop, Nicolas Hernandez [1] provided a nice summary of the kinds of navigation that would be nice to have in UIMA, but are currently not supported. His work, alas, focuses on text. I imagine that the processing of audio and video data whips up a whole new batch of desirable types of navigation and indexing.

Although in UIMA anchoring and annotations have been conflated into the same thing (e.g. Annotation), it is not uncommon to consider anchoring an entirely different aspect from annotation (cf. [2-4]). This recognizes that there are specific considerations for each kind of anchoring (different kinds of discrete/continuous x-dimensional spaces, identifiable segments, alignments, etc.), in particular related to navigation and relations.

Cheers,

-- Richard

[1] http://ceur-ws.org/Vol-1038/paper_11.pdf
[2] http://dl.acm.org/citation.cfm?id=1273097
[3] http://www.doaj.org/doaj?func=fulltext&aId=812876
[4] http://dl.acm.org/citation.cfm?id=1642059.1642060
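To make the limitation concrete: since only bag, set, and sorted indexes can be declared, anything like a spatial index has to be maintained outside the CAS. A rough sketch, assuming a hypothetical ImageRegion type with an integer x feature, and assuming its instances have been added to the view's indexes:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeMap;

    import org.apache.uima.cas.CAS;
    import org.apache.uima.cas.FSIterator;
    import org.apache.uima.cas.Feature;
    import org.apache.uima.cas.FeatureStructure;
    import org.apache.uima.cas.Type;

    public class ExternalRegionIndex {

      // Regions keyed by their x coordinate; a real spatial index (k-d tree,
      // R-tree) would replace the TreeMap, but since UIMA itself only offers
      // bag/set/sorted indexes, the structure has to live outside the CAS.
      private final TreeMap<Integer, List<FeatureStructure>> byX = new TreeMap<>();

      public void build(CAS cas) {
        Type regionType = cas.getTypeSystem().getType("org.example.ImageRegion"); // hypothetical type
        Feature xFeat = regionType.getFeatureByBaseName("x");                     // hypothetical feature

        FSIterator<FeatureStructure> it = cas.getIndexRepository().getAllIndexedFS(regionType);
        while (it.hasNext()) {
          FeatureStructure region = it.next();
          byX.computeIfAbsent(region.getIntValue(xFeat), k -> new ArrayList<>()).add(region);
        }
      }

      // All regions whose x coordinate falls into [xMin, xMax].
      public List<FeatureStructure> regionsInXRange(int xMin, int xMax) {
        List<FeatureStructure> result = new ArrayList<>();
        byX.subMap(xMin, true, xMax, true).values().forEach(result::addAll);
        return result;
      }
    }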
Re: big offsets efficiency, and multiple offsets
Echoing Richard,

1) It would perhaps make more sense to be more direct about each of the different types of data. UIMA "built in" only the most "popular" things - and Annotation was one of them. Annotation derives from AnnotationBase, which just defines an associated Sofa / view.

So it would make more sense to define different kinds of highest-level abstractions for your project, related to the different kinds of views/sofas. Audio might entail a begin / end style of offsets; images might entail a pair of x-y coordinates, to describe a (rectangular) subset of an image. Video might do something like audio, or something more complex...

UIMA's use of the AnnotationBase includes ensuring that when you add-to-indexes (an operation that implicitly takes a "view" - and adds a FS to that view), if the FS is a subtype of AnnotationBase, then the FS must be indexed in the associated view to which that FS "belongs"; if you try to add-to-index in a view other than the one the FS was created in, you get this kind of error:

  Error - the Annotation "{0}" is over view "{1}" and cannot be added to indexes associated with the different view "{2}".

The logic behind this restriction is: an Annotation (or, more generally, an object having a supertype of AnnotationBase) is (by definition) associated with a particular Sofa/View, and it is more likely that it is an error if that annotation is indexed with a sofa it doesn't belong with. Of course, Feature Structures which are not Annotations (or more generally, not derived from AnnotationBase) can be indexed in multiple views.

2) By keeping separate notions for pointers-into-the-Sofa, you can define algorithmic mappings for these that make the best sense for your project, including notions of fuzziness and time-shift (imagine the audio is out of sync with the video, like lots of YouTube things seem to be), etc.

-Marshall
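A sketch of what such project-specific top-level abstractions might look like when declared programmatically. The type and feature names (AudioSpan, ImageRegion, frame-based offsets) are illustrative assumptions, not existing UIMA or GALE types; the same thing could equally be declared in a type system XML descriptor and run through JCasGen:

    import org.apache.uima.UIMAFramework;
    import org.apache.uima.resource.metadata.TypeDescription;
    import org.apache.uima.resource.metadata.TypeSystemDescription;

    public class MultiModalTypes {

      public static TypeSystemDescription create() {
        TypeSystemDescription tsd =
            UIMAFramework.getResourceSpecifierFactory().createTypeSystemDescription();

        // Audio: begin/end offsets in frames (e.g. 1/1000 s), not characters.
        // Deriving from AnnotationBase ties each instance to its audio view.
        TypeDescription audio = tsd.addType("org.example.AudioSpan",
            "A span of an audio sofa, addressed in frames", "uima.cas.AnnotationBase");
        audio.addFeature("beginFrame", "start frame", "uima.cas.Integer");
        audio.addFeature("endFrame", "end frame (exclusive)", "uima.cas.Integer");

        // Image: a rectangular region given by its top-left corner and size.
        TypeDescription image = tsd.addType("org.example.ImageRegion",
            "A rectangular region of an image sofa", "uima.cas.AnnotationBase");
        image.addFeature("x", "left edge in pixels", "uima.cas.Integer");
        image.addFeature("y", "top edge in pixels", "uima.cas.Integer");
        image.addFeature("width", "width in pixels", "uima.cas.Integer");
        image.addFeature("height", "height in pixels", "uima.cas.Integer");

        return tsd;
      }
    }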
Re: big offsets efficiency, and multiple offsets
selectCovered() and friends expect annotations (or AnnotationFS), yes.

Anyway, I don't want to convince you to deviate from your idea. Frame offsets sound very reasonable. I am just trying to discuss potential implications and confusions (e.g. getCoveredText() not working).

>>> Also, can I have several indexes on the same annotations in order to work
>>> with character offsets for text analysis, but then efficiently query for
>>> overlapping annotations from other views based on frame offsets?

Afaik you cannot query across views, e.g. do a selectCovered(view2, view1Annotation, X.class), because afaik the UIMA FSIterator.moveTo() mechanism tries to locate view1Annotation in the indexes of view2, which will not work. For this reason, I'm actually thinking about removing these potentially problematic signatures from uimaFIT and just keeping selectCovered(view1Annotation, X.class). You should at least check this, if it is part of your assumption.

-- Richard

On 04.12.2013, at 12:47, Jens Grivolla wrote:
> True, but don't things like selectCovered() etc. expect Annotations (to match
> on begin/end)? So using Annotation might make it easier in some cases to
> select the annotations we're interested in.
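A sketch of one workaround under that constraint: instead of passing a view-1 annotation into a covering query on view 2, iterate view 2's own annotation index and compare offsets directly. This assumes both views use the same offset space (e.g. frame offsets in a merged view); the view handling is illustrative, not an existing uimaFIT API:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.uima.cas.FSIterator;
    import org.apache.uima.jcas.JCas;
    import org.apache.uima.jcas.tcas.Annotation;

    public class CrossViewLookup {

      // Collect all annotations in otherView whose span overlaps [begin, end).
      public static List<Annotation> overlapping(JCas otherView, int begin, int end) {
        List<Annotation> result = new ArrayList<>();
        FSIterator<Annotation> it = otherView.getAnnotationIndex().iterator();
        while (it.hasNext()) {
          Annotation candidate = it.next();
          if (candidate.getBegin() < end && candidate.getEnd() > begin) {
            result.add(candidate);
          }
        }
        return result;
      }
    }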
Re: big offsets efficiency, and multiple offsets
True, but don't things like selectCovered() etc. expect Annotations (to match on begin/end)? So using Annotation might make it easier in some cases to select the annotations we're interested in.

-- Jens
Re: big offsets efficiency, and multiple offsets
Why is it bad if you cannot inherit from Annotation? The getCoveredText() will not work anyway if you are working with audio/video data.

-- Richard

On 04.12.2013, at 12:31, Jens Grivolla wrote:

> Hi, we're now starting the EUMSSI project, which deals with integrating
> annotation layers coming from audio, video and text analysis.
>
> We're thinking to base it all on UIMA, having different views with separate
> audio, video, transcribed text, etc. sofas. In order to align the different
> views we need to have a common offset specification that allows us to map
> e.g. character offsets to the corresponding timestamps.
>
> In order to avoid float timestamps (which would mean we can't derive from
> Annotation) I was thinking of using audio/video frames with e.g. 100 or 1000
> frames/second. Annotation has begin and end defined as signed 32 bit ints,
> leaving sufficient room for very long documents even at 1000 fps, so I don't
> think we're going to run into any limits there. Is there anything that could
> become problematic when working with offsets that are probably quite a bit
> larger than what is typically found with character offsets?
>
> Also, can I have several indexes on the same annotations in order to work
> with character offsets for text analysis, but then efficiently query for
> overlapping annotations from other views based on frame offsets?
>
> Btw, if you're interested in the project we have a writeup (condensed from
> the project proposal) here:
> https://dl.dropboxusercontent.com/u/4169273/UIMA_EUMSSI.pdf and there will
> hopefully soon be some content on http://eumssi.eu/
>
> Thanks,
> Jens
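As a quick check of the headroom claim in the message quoted above, the arithmetic below (plain Java, not a UIMA API) shows how much media a signed 32-bit frame offset can address at 1000 frames/second:

    public class FrameOffsetHeadroom {
      public static void main(String[] args) {
        int fps = 1000;                     // frames per second, as proposed above
        long maxFrames = Integer.MAX_VALUE; // 2,147,483,647: range of an int begin/end offset
        long seconds = maxFrames / fps;
        System.out.printf("max addressable duration at %d fps: %.1f hours (~%.0f days)%n",
            fps, seconds / 3600.0, seconds / 86400.0);
        // Prints roughly 596.5 hours, i.e. about 25 days of continuous media,
        // so character-sized int offsets are not the limiting factor here.
      }
    }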