Hello Armin,
UIMA has no built-in place for this piece of information in the CAS. You can use
SourceDocumentInformation, and it also works with a plain CAS if you want, but
you have to access it through the generic type system API instead of the JCas
cover class, e.g. something like this:
// The type must be declared in the type system of your pipeline.
Type type = cas.getTypeSystem().getType(
    "org.apache.uima.examples.SourceDocumentInformation");
AnnotationFS anno = cas.createAnnotation(type, 0,
    cas.getDocumentText().length());
anno.setStringValue(type.getFeatureByBaseName("uri"),
    "file:/path/to/file.txt");
cas.addFsToIndexes(anno);
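Once the "uri" value is available in your writer (read back as a plain string
from the annotation), you still need to turn it into an output file name. A
minimal sketch of that step, using only the JDK and no UIMA classes; the class
name SourceUriNames, the method outputNameFor and the ".xmi" extension are my
assumptions for illustration, not part of UIMA:

```java
import java.net.URI;

class SourceUriNames {
    // Hypothetical helper: derive an output file name from the source URI
    // by taking the last path segment and swapping its extension.
    static String outputNameFor(String sourceUri, String extension) {
        String path = URI.create(sourceUri).getPath();
        if (path == null) {
            // Opaque URIs have no path; fall back to the raw string.
            path = sourceUri;
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        // Keep names such as ".hidden" intact; only strip a real extension.
        String base = dot > 0 ? name.substring(0, dot) : name;
        return base + extension;
    }

    public static void main(String[] args) {
        // prints "file.xmi"
        System.out.println(outputNameFor("file:/path/to/file.txt", ".xmi"));
    }
}
```

A URI whose path ends in "/" yields an empty base name here, so you would
want to handle that case yourself if your reader can produce such URIs.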
In DKPro Core we define a DocumentMetaData type, which is our replacement for
SourceDocumentInformation and is used by our readers and writers. It provides
the fields:
documentTitle
documentBaseUri
documentUri
collectionId
documentId
isLastSegment
We currently do not have the fields "offsetInSource" and "documentSize". I
think I should add these.
Anyway, you can define your own metadata annotation type inheriting from
DocumentAnnotation and use that. You should add it to the CAS before setting
any language or text, though, because otherwise UIMA will automatically create
a default DocumentAnnotation in the CAS and you will end up with two metadata
annotations. If you add yours first, UIMA will use it, and it will be
accessible via CAS.getDocumentAnnotation() as well.
Best,
-- Richard
On 23.11.2011 at 09:16, [email protected] wrote:
> Hi!
>
> I need to know the name of the source documents when writing the
> resulting CASes from a pipeline which starts by reading source documents
> with a collection reader. I thought that
> org.apache.uima.examples.SourceDocumentInformation is the correct means
> to do it. But it is just an example and it works with JCas only. Is
> there no SourceDocumentInformation for CAS? Is this really the way to do
> it or are there other means as well? Is it my responsibility to fill in
> the values in a collection reader or is it done automatically?
>
> Regards,
>
> Armin
--
-------------------------------------------------------------------
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
[email protected]
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
-------------------------------------------------------------------