Hello Armin,

UIMA itself does not define a built-in place in the CAS for this piece of 
information. You can use SourceDocumentInformation, and it also works with a 
plain CAS, but you then have to go through the generic CAS API instead of the 
generated JCas class, e.g. something like this:

  // Look up the type by name, since no JCas wrapper class is used here
  Type type = cas.getTypeSystem().getType(
      "org.apache.uima.examples.SourceDocumentInformation");
  // Let the annotation span the whole document text
  AnnotationFS anno = cas.createAnnotation(type, 0,
      cas.getDocumentText().length());
  anno.setStringValue(type.getFeatureByBaseName("uri"),
      "file:/path/to/file.txt");
  cas.addFsToIndexes(anno);
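
For completeness, here is a rough, self-contained sketch of both sides: a 
helper a collection reader could call to store the URI, and a helper a writer 
could call to read it back, using only the generic CAS API. The class name 
SourceUriSketch, the helper names, and createDemoCas() are made up for 
illustration. The demo type system is only a stand-in that declares the single 
"uri" feature; in a real pipeline the type would come from your type system 
descriptor. It assumes uimaj-core is on the classpath.

```java
import org.apache.uima.UIMAFramework;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.resource.metadata.TypeDescription;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.util.CasCreationUtils;

class SourceUriSketch {
    static final String SDI =
        "org.apache.uima.examples.SourceDocumentInformation";

    // Reader side: attach the source URI to the CAS (document text must be set).
    static void setSourceUri(CAS cas, String uri) {
        Type type = cas.getTypeSystem().getType(SDI);
        AnnotationFS anno =
            cas.createAnnotation(type, 0, cas.getDocumentText().length());
        anno.setStringValue(type.getFeatureByBaseName("uri"), uri);
        cas.addFsToIndexes(anno);
    }

    // Writer side: return the URI of the first such annotation, or null if none.
    static String getSourceUri(CAS cas) {
        Type type = cas.getTypeSystem().getType(SDI);
        Feature uriFeature = type.getFeatureByBaseName("uri");
        FSIterator<AnnotationFS> it = cas.getAnnotationIndex(type).iterator();
        return it.isValid() ? it.get().getStringValue(uriFeature) : null;
    }

    // Demo only: build a stand-in type system in code instead of loading
    // the real descriptor. It declares just the "uri" feature.
    static CAS createDemoCas() throws Exception {
        TypeSystemDescription tsd = UIMAFramework
            .getResourceSpecifierFactory().createTypeSystemDescription();
        TypeDescription td = tsd.addType(SDI, "", CAS.TYPE_NAME_ANNOTATION);
        td.addFeature("uri", "", CAS.TYPE_NAME_STRING);
        return CasCreationUtils.createCas(tsd, null, null);
    }

    public static void main(String[] args) throws Exception {
        CAS cas = createDemoCas();
        cas.setDocumentText("some text");
        setSourceUri(cas, "file:/path/to/file.txt");
        System.out.println(getSourceUri(cas));
    }
}
```

Looking the type and feature up by name on every call keeps the sketch short; 
in a real component you would probably cache them once per type system.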

In DKPro Core we define a DocumentMetaData type, which is our replacement for 
SourceDocumentInformation and is used by our readers and writers. It provides 
the following fields:

  documentTitle
  documentBaseUri
  documentUri
  collectionId
  documentId
  isLastSegment

We currently do not have the fields "offsetInSource" and "documentSize". I 
think I should add these.

Anyway, you can define your own metadata annotation type inheriting from 
DocumentAnnotation and use that. You should add it to the CAS before setting 
the document language or text, though. Otherwise UIMA will automatically create 
a default DocumentAnnotation in the CAS and you will end up with two metadata 
annotations. If you add yours first, UIMA will use it and it will be accessible 
via CAS.getDocumentAnnotation() as well.

Best,

-- Richard

On 23.11.2011 at 09:16, [email protected] wrote:

> Hi!
> 
> I need to know the names of the source documents when writing the
> resulting CASes from a pipeline which starts by reading source documents
> with a collection reader. I thought that
> org.apache.uima.examples.SourceDocumentInformation is the correct means
> to do it. But it is just an example and it works with JCas only. Is
> there no SourceDocumentInformation for CAS? Is this really the way to do
> it or are there other means as well? Is it my responsibility to fill in
> the values in a collection reader or is it done automatically?
> 
> Regards,
> 
> Armin

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
[email protected] 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
------------------------------------------------------------------- 



