This approach reads the document name from a user-defined feature structure
that cTAKES defines; it is not part of core UIMA. So I don't think it will work
unless you're running the cTAKES pipeline.
-------------
UIMA has no built-in concept of a document. If you want, your Collection Reader
(or whatever other code sets up the CAS with the original document) can add a
special feature structure, which you define, holding any information you need
about the document used to initialize the CAS. Because this user-defined
feature structure is stored in the CAS, any Annotator can extract it later.
The examples that come with UIMA include a collection reader:
org.apache.uima.examples.cpe.FileSystemCollectionReader. This example creates a
user-defined feature structure called SourceDocumentInformation to hold the path
to the source file.
You can find this example in
http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html;
search for "sourcedocument".
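
As a rough illustration of the last step only: SourceDocumentInformation stores
the source location as a URI string, so once an Annotator has that string it
still has to turn it into a file name or path. The sketch below uses only the
JDK. The class name SourceUriNames and its methods are made up for this
example, and the UIMA calls mentioned in the comments are written from memory
of the UIMA example code, so check them against your UIMA version.

```java
import java.net.URI;

/** Hypothetical helper (not part of UIMA): derives file names from a source URI string. */
public final class SourceUriNames {
    private SourceUriNames() {}

    // In an Annotator, the URI string would come from the feature structure the
    // Collection Reader added. Assumption, based on the UIMA examples:
    //   FSIterator<?> it = jcas.getAnnotationIndex(SourceDocumentInformation.type).iterator();
    //   String uri = ((SourceDocumentInformation) it.next()).getUri();

    /** Returns the decoded path part of the URI, or "" if the URI has no path. */
    public static String pathFromUri(String uri) {
        // URI.create throws IllegalArgumentException if the string is not a
        // syntactically valid URI (for example, if it contains raw spaces).
        String path = URI.create(uri).getPath();
        return path == null ? "" : path;
    }

    /** Returns the last path segment, i.e. the bare file name. */
    public static String fileNameFromUri(String uri) {
        String path = pathFromUri(uri);
        // lastIndexOf returns -1 when there is no '/', so substring(0) keeps the whole path.
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

For example, SourceUriNames.fileNameFromUri("file:/data/in/document.txt")
returns "document.txt".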
-Marshall
On 5/1/2012 7:46 PM, Halgrim, Scott wrote:
In my CAS Consumer I use:
    void processView(JCas view) throws Exception {
        String docName = DocumentIDAnnotationUtil.getDocumentID(view);
        ...
    }
DocumentIDAnnotationUtil is a cTAKES class available here:
https://ohnlp.svn.sourceforge.net/svnroot/ohnlp/trunk/cTAKES/core/src/edu/mayo/bmi/uima/core/util/
Not sure if that's the best way, but hope it helps.
Scott
-----Original Message-----
From: michelangelo [mailto:[email protected]]
Sent: Tuesday, May 01, 2012 2:33 PM
To: [email protected]
Subject: how to get the original filename of the input document?
Hello
I built my first aggregate AE with several Annotators. Everything works fine,
but now I need the original filename (or filepath) of the input document. I
made several attempts with getSofaDataURI(), but it returns null, and with
other annotations in the JCas, without success, although I can successfully
obtain the mime-type, language, etc.
I serialized the JCas to XML and can see the filepath in a
<string>...document.txt</string> tag. How can I access it?
many thanks
Michelangelo