[EMAIL PROTECTED] wrote:
> Hello,
> 
> I intend to use UIMA for analyzing electronic ink. This consists of a set of 
> stroke objects, which, in turn, each contain a set of samples (x,y 
> coordinates). Since I'm new to UIMA and I only found examples dealing with 
> text analysis, I wonder how ink data can be represented best within UIMA. 
> 
> I would prefer to use this pre-structured data as the base data of one SOFA. A 
> second SOFA would consist of the text produced by a handwriting recognition 
> engine. 
> 
> Yet, as I understand it, SOFA base data must be "flat". For ink data, this would 
> mean storing all samples in an array of integers, which is then annotated 
> with metadata for the structure (samples and strokes). Is this the only way 
> to model this data?
> 
> Thanks in advance for your help!
> 
> Jürgen
> 

Sounds like a cool project.  To answer your question: you don't strictly
need any sofa data if it's not useful.  You can create *only* structured
data (feature structures) if you like.  They don't need to be anchored in
some (text-like) artifact if that doesn't fit your problem domain.

So you could create stroke objects as containers, each of which
references its corresponding set of samples.  The samples could be simple
objects themselves.  You would probably want to put the stroke objects in
a custom index, sorted by whatever properties make sense for these kinds
of objects.
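
To make the shape of that concrete, here is a rough sketch in plain Java.
It deliberately uses no UIMA classes: Sample, Stroke and StrokeIndex are
hypothetical names, and startTime is just an assumed sort property (your
ink data may offer something better, such as position on the page).  In
UIMA itself you would declare the two types in the type system and define
a sorted index on the stroke type instead; this only illustrates the
container/sample relationship and the sort order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical plain-Java model; these are NOT UIMA classes.
// A sample is a simple object holding one (x, y) coordinate pair.
final class Sample {
    final int x;
    final int y;

    Sample(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

// A stroke is a container that references its samples.
final class Stroke {
    final int id;          // assumed unique id, used as a tie-breaker when sorting
    final long startTime;  // assumed sort property (hypothetical)
    final List<Sample> samples = new ArrayList<>();

    Stroke(int id, long startTime) {
        this.id = id;
        this.startTime = startTime;
    }

    // Appends a sample and returns this stroke so calls can be chained.
    Stroke add(int x, int y) {
        samples.add(new Sample(x, y));
        return this;
    }
}

// Stand-in for a custom sorted index: strokes ordered by startTime, then id.
final class StrokeIndex {
    private final TreeSet<Stroke> sorted = new TreeSet<>(
            Comparator.comparingLong((Stroke s) -> s.startTime)
                      .thenComparingInt(s -> s.id));

    void add(Stroke stroke) {
        sorted.add(stroke);
    }

    // Returns the strokes in index order as a fresh list.
    List<Stroke> inOrder() {
        return new ArrayList<>(sorted);
    }
}

class InkModelSketch {
    public static void main(String[] args) {
        StrokeIndex index = new StrokeIndex();
        // Strokes are added out of order; the index keeps them sorted.
        index.add(new Stroke(2, 150L).add(10, 10).add(12, 14));
        index.add(new Stroke(1, 100L).add(0, 0).add(1, 2).add(3, 5));

        for (Stroke s : index.inOrder()) {
            System.out.println("stroke " + s.id + ": " + s.samples.size() + " samples");
        }
    }
}
```

Note that nothing here is "flat": each stroke keeps its own list of
samples, and the ordering lives in the index rather than in one big array.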

Hope this helps.  Let us know how it goes.

--Thilo
