This approach extracts the document name from a custom feature structure that cTAKES defines; it is not part of core UIMA.

So I don't think this will work unless you're running the cTAKES pipeline.

-------------

UIMA has no built-in concept of a document. If you need document metadata, have your Collection Reader (or whatever other code initializes the CAS with the original document) add a feature structure of a type you define, holding whatever information you need about the source document. Because this user-defined feature structure is stored in the CAS, you can extract it later in any Annotator.

The examples that come with UIMA include a collection reader: org.apache.uima.examples.cpe.FileSystemCollectionReader. This example creates a user-defined feature structure called SourceDocumentInformation to hold the path to the source file.

You can see this example in the UIMA 2.4.0 tutorials and users' guides at
http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html
(search the page for "sourcedocument").
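
As a rough sketch of the retrieval side: FileSystemCollectionReader stores the file location in SourceDocumentInformation as a URI string, so getting a plain file name takes one more step. The class and method names below (SourceFileName, fileNameFromUri) are hypothetical, and the sample URI is made up for illustration. The CAS lookup is shown only in comments, since it needs the UIMA example classes on the classpath; the runnable part assumes the URI uses '/' separators and does no percent-decoding.

```java
public class SourceFileName {

    // In an annotator or CAS consumer, the URI string would be read from the
    // CAS roughly like this (requires the UIMA core and examples jars):
    //
    //   FSIterator it = jcas.getAnnotationIndex(SourceDocumentInformation.type).iterator();
    //   if (it.hasNext()) {
    //       SourceDocumentInformation info = (SourceDocumentInformation) it.next();
    //       String uri = info.getUri();
    //       String name = SourceFileName.fileNameFromUri(uri);
    //   }

    // Returns everything after the last '/' in the URI string,
    // or the whole string if it contains no '/'.
    static String fileNameFromUri(String uri) {
        int slash = uri.lastIndexOf('/');
        return slash >= 0 ? uri.substring(slash + 1) : uri;
    }

    public static void main(String[] args) {
        // Hypothetical URI of the kind a file-based collection reader might store.
        String uri = "file:/data/docs/document.txt";
        System.out.println(fileNameFromUri(uri)); // prints "document.txt"
    }
}
```

If the iterator is empty, no SourceDocumentInformation was added to that CAS, which would be the case if your collection reader does not create one.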

-Marshall

On 5/1/2012 7:46 PM, Halgrim, Scott wrote:
In my CAS Consumer I use:

     void processView(JCas view) throws Exception {
         String docName = DocumentIDAnnotationUtil.getDocumentID(view);
         ...
     }

DocumentIDAnnotationUtil is a cTAKES class available here: 
https://ohnlp.svn.sourceforge.net/svnroot/ohnlp/trunk/cTAKES/core/src/edu/mayo/bmi/uima/core/util/

Not sure if that's the best way, but hope it helps.

Scott

-----Original Message-----
From: michelangelo [mailto:[email protected]]
Sent: Tuesday, May 01, 2012 2:33 PM
To: [email protected]
Subject: how to get the original filename of the input document?

Hello

I built my first aggregate AE with several Annotators. All works fine, but
now I need the original filename (or filepath) of the input document. I made
several attempts with getSofaDataURI(), but it is null, and with other
annotations in the JCas, but without success, although I can successfully
obtain the mime-type, language, etc.
I did an XML serialization of the JCas and I can see the filepath in a
<string>...document.txt</string> tag. How can I access it?

many thanks
Michelangelo
