OK - found the problem.

The Document analyzer uses a component "FileSystemCollectionReader" to read the
files. This component inserts into the CAS the name of the file being read,
using the code:

      // Also store location of source document in CAS. This information is critical
      // if CAS Consumers will need to know where the original document contents are located.
      // For example, the Semantic Search CAS Indexer writes this information into the
      // search index that it creates, which allows applications that use the search index to
      // locate the documents that satisfy their semantic queries.
      SourceDocumentInformation srcDocInfo = new SourceDocumentInformation(jcas);
      srcDocInfo.setUri(file.getAbsoluteFile().toURL().toString());
      srcDocInfo.setUri(file.getAbsoluteFile().toURL().toString());

This last line gets the source file name, in your case a file under

C:\Watson\UIMA sdk\apache-uima\examples\data

and the toURL converts the "blank" to "%20".

That in turn causes the serialization code to fail when it attempts to create
the output file name, and as a result the default file name (doc1.xmi,
doc2.xmi, etc.) is used.
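
To illustrate the kind of mismatch described above, here is a minimal sketch
in plain Java. It is not the UIMA code: the class and method names are
hypothetical, and it uses File.toURI() (which is documented to escape
characters that are illegal in URIs) as a stand-in for whatever produces the
"%20" form in the reader. It shows that taking the raw path out of such a
URL leaves "%20" in place, while constructing the File from the URI decodes
it back to a blank.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class BlankPathDemo {

  // Hypothetical helper: string form of a file's location, with blanks
  // escaped as %20 by File.toURI().
  public static String toUriString(File file) {
    return file.getAbsoluteFile().toURI().toString();
  }

  // Naive recovery: URL.getPath() does not decode escapes, so the
  // resulting File still has "%20" in its path.
  public static File naiveFileFromUri(String uri) {
    try {
      URL url = URI.create(uri).toURL();
      return new File(url.getPath());
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException("Not a valid URL: " + uri, e);
    }
  }

  // Decoding recovery: new File(URI) turns "%20" back into a blank.
  public static File decodedFileFromUri(String uri) {
    return new File(URI.create(uri));
  }

  public static void main(String[] args) {
    // Assumed example path with a blank in a directory name.
    File source = new File("UIMA sdk", "doc.txt").getAbsoluteFile();
    String uri = toUriString(source);

    System.out.println(uri);
    System.out.println(naiveFileFromUri(uri).getParentFile().getName());
    System.out.println(decodedFileFromUri(uri).getParentFile().getName());
  }
}
```

The second line printed should name the parent directory with "%20" in it
and the third with the blank restored, which is why code that consumes the
stored URI needs to decode it before building a file name from it.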

I could reproduce this by making the source directory have a blank in it.

You can avoid this issue by pointing the Document Analyzer at a source
directory whose path name contains no blanks.

Cheers. -Marshall


On 3/22/2011 1:09 PM, Marshall Schor wrote:
>
> On 3/22/2011 12:25 PM, Marshall Schor wrote:
>> Here's an idea:
>>
>> The suffix doc1.xmi doc2.xmi, etc are produced when the XMI Cas Serializer is
>> called with a null file name:
>>
>> uimaj-examples/src/main/java/org/apache/uima/examples/xmi/XmiWriterCasConsumer.java
>>
>> line 108-110:
>>     if (outFile == null) {
>>       outFile = new File(mOutputDir, "doc" + mDocNum++ + ".xmi");    
>>     }
>>
>> The code above that has a try block that might be getting tripped up by the
>> fact that your install point is in a path with a blank in it.
>>
>> Can you try installing into a path without a blank?
> I tried this, and it also worked (with blanks in the file path) - so that's
> not it...
>
> I'll contact you off-list to debug this mystery. -Marshall
>> -Marshall
>>
>> On 3/22/2011 8:48 AM, Bob Sizemore wrote:
>>> Anybody have any ideas for me to try to get the doc analyzer showing the
>>> right document names?
>>>
>>>
>>>
>
