Hi Richard, thank you. I did it the hard way using CAS. JCas works fine as well. In both cases SourceDocumentInformation.xml has to be included as a type system.
For the latter case, I derived a JCas from a CAS with getJCas(), local to the process method, as in org.apache.uima.examples.cpe.FileSystemCollectionReader.java, and used the SourceDocumentInformation class to fill in that annotation's attributes.

What happens if I make a JCas from a CAS that way? Is it just another frontend for the same data, or is the whole CAS data copied/duplicated to a new JCas instance?

Regards,
Armin

-----Original Message-----
From: Richard Eckart de Castilho [mailto:[email protected]]
Sent: Wednesday, 23 November 2011 10:00
To: [email protected]
Subject: Re: SourceDocumentInformation

Hello Armin,

UIMA does not provide for this piece of information in the CAS. You can use the SourceDocumentInformation, and you can also use it with CAS if you want, but you will have to access it the complicated way, e.g. something like this:

    Type type = cas.getTypeSystem().getType("org.apache.uima.examples.SourceDocumentInformation");
    AnnotationFS anno = cas.createAnnotation(type, 0, cas.getDocumentText().length());
    anno.setStringValue(type.getFeatureByBaseName("uri"), "file:/path/to/file.txt");
    cas.addFsToIndexes(anno);

In DKPro Core we define a DocumentMetaData type, which is our replacement for SourceDocumentInformation and is used by our readers and writers. It provides the fields:

    documentTitle
    documentBaseUri
    documentUri
    collectionId
    documentId
    isLastSegment

We currently do not have the fields "offsetInSource" and "documentSize". I think I should add these.

Anyway, you can define your own metadata annotation type inheriting from DocumentAnnotation and use that. You should add it to the CAS before setting any language or text, though, because otherwise UIMA will automatically create a default DocumentAnnotation in the CAS and you will end up with two metadata annotations. If you add yours first, UIMA will use it, and it will be accessible via CAS.getDocumentAnnotation() as well.

Best,

-- Richard

On 23.11.2011 at 09:16, [email protected] wrote:

> Hi!
>
> I need to know the name of the source documents when writing the
> resulting CASes from a pipeline which starts by reading source
> documents with a collection reader. I thought that
> org.apache.uima.examples.SourceDocumentInformation is the correct
> means to do it. But it is just an example and it works with JCas only.
> Is there no SourceDocumentInformation for CAS? Is this really the way
> to do it or are there other means as well? Is it my responsibility to
> fill in the values in a collection reader or is it done automatically?
>
> Regards,
>
> Armin

--
-------------------------------------------------------------------
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
[email protected]
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC)
www.werc.tu-darmstadt.de
-------------------------------------------------------------------
