Hi,

Here are some thoughts.
* You have main text, images, and margin notes, plus "position on the page" information for the latter two. Put the main text into a sofa, as you say. The images and margin notes can go either into additional sofas or into feature structures in the main sofa.

  Which one to choose depends on what kind of analysis you plan to do with the images and margin notes. Put them in sofas if you plan to run unstructured-analytics annotators over them, for example image recognition or classification. If you only need to keep them as artifacts, with no analytics of their own, put them in additional feature structures in the main sofa.

Re: can UIMA handle sofas with different kinds of data: yes, it can. Each sofa can be a text string or a byte array (local or remote); see:
http://uima.apache.org/d/uimaj-current/tutorials_and_users_guides.html#ugr.tug.aas.sofa

Re: can annotations refer to feature structures in other sofas: yes, they can. See:
http://uima.apache.org/d/uimaj-current/tutorials_and_users_guides.html#ugr.tug.mvs.sample_application

-Marshall

On 3/22/2017 10:32 AM, Markus Krug wrote:
> Dear UIMA-users,
>
> we are currently facing the issue that the documents we are processing
> using UIMA have more than just "linear text".
>
> On top of the text we have images and marginal notes that should be encoded
> at the correct positions (output of OCR and image segmentation).
>
> So far I do not know whether UIMA is capable of handling sofas with different
> types of material (e.g. text and images).
>
> We came up with a concept like this (please comment if this is stupid or
> if better ways to handle this have been found already):
>
> 1. Store the main text in the primary sofa
>
> 2. For each image/marginal note, use a different sofa and store the
> content in there
>
> 3. In the main text, refer to annotations in different sofas (is this
> possible? I never needed this before) at the corresponding position
>
> If there are any best practices for this kind of problem, I would be
> glad if you would let me know.
>
> Thanks in advance
>
> Markus Krug
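
For illustration, the three-step layout from the quoted message (main text in one view, each image or margin note in its own view, references from the main text at the matching offsets) can be modelled roughly as below. This is a library-independent sketch in plain Java and deliberately does not use the UIMA API: `MultiViewDocument`, `View`, `Anchor`, and all method names are hypothetical, made up for this example. In UIMA itself the views would be sofas and the anchors would be annotations in the main sofa, as described in the two guide sections linked above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, library-independent model of the layout; NOT the UIMA API. */
public class MultiViewDocument {

    /** One sofa-like view: raw bytes plus a MIME type (text is stored as UTF-8). */
    public static final class View {
        public final String mimeType;
        public final byte[] data;

        View(String mimeType, byte[] data) {
            this.mimeType = mimeType;
            this.data = data;
        }

        public String asText() {
            return new String(data, StandardCharsets.UTF_8);
        }
    }

    /** A span in the main text that points at another view and records its page position. */
    public static final class Anchor {
        public final int begin;          // start offset in the main text (inclusive)
        public final int end;            // end offset in the main text
        public final String targetView;  // name of the referenced view
        public final int pageX;          // position on the page, e.g. from OCR
        public final int pageY;

        Anchor(int begin, int end, String targetView, int pageX, int pageY) {
            this.begin = begin;
            this.end = end;
            this.targetView = targetView;
            this.pageX = pageX;
            this.pageY = pageY;
        }
    }

    private final String mainText;
    private final Map<String, View> views = new LinkedHashMap<>();
    private final List<Anchor> anchors = new ArrayList<>();

    public MultiViewDocument(String mainText) {
        this.mainText = mainText;
    }

    /** Adds a text view, e.g. the content of one margin note. */
    public void addTextView(String name, String text) {
        views.put(name, new View("text/plain", text.getBytes(StandardCharsets.UTF_8)));
    }

    /** Adds a binary view, e.g. the bytes of one image. */
    public void addBinaryView(String name, String mimeType, byte[] bytes) {
        views.put(name, new View(mimeType, bytes.clone()));
    }

    /** Links the main-text span [begin, end] to an existing view. */
    public void anchor(int begin, int end, String targetView, int pageX, int pageY) {
        if (begin < 0 || end < begin || end > mainText.length()) {
            throw new IllegalArgumentException("offsets outside main text");
        }
        if (!views.containsKey(targetView)) {
            throw new IllegalArgumentException("unknown view: " + targetView);
        }
        anchors.add(new Anchor(begin, end, targetView, pageX, pageY));
    }

    /** Returns every view anchored at the given main-text offset. */
    public List<View> viewsAt(int offset) {
        List<View> hits = new ArrayList<>();
        for (Anchor a : anchors) {
            if (a.begin <= offset && offset <= a.end) {
                hits.add(views.get(a.targetView));
            }
        }
        return hits;
    }
}
```

If the images and margin notes are only kept as artifacts and never analyzed on their own, the separate `View` objects could be dropped and the payload stored directly on the anchor, which corresponds to the "feature structures in the main sofa" option.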
