Now posted as a Jira issue: https://issues.apache.org/jira/browse/UIMA-2097

-Marshall

On 3/22/2011 1:54 PM, Marshall Schor wrote:
> OK, found the problem.
>
> The Document Analyzer uses a component, "FileSystemCollectionReader", to read
> the files. This component inserts into the CAS the name of the file being
> read, using the code:
>
>       // Also store location of source document in CAS. This information is critical
>       // if CAS Consumers will need to know where the original document contents are located.
>       // For example, the Semantic Search CAS Indexer writes this information into the
>       // search index that it creates, which allows applications that use the search index to
>       // locate the documents that satisfy their semantic queries.
>       SourceDocumentInformation srcDocInfo = new SourceDocumentInformation(jcas);
>       srcDocInfo.setUri(file.getAbsoluteFile().toURL().toString());
>
> This last line stores the location of the source file, which in your case is in
>
> C:\Watson\UIMA sdk\apache-uima\examples\data
>
> and toURL() converts the blank in that path to "%20".
>
> That escape then causes the serialization code to fail when it attempts to
> create the output file name, and as a result, the default file name is used.
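A minimal, self-contained Java sketch (plain JDK, not UIMA code; the class and method names are hypothetical) of how a blank can survive as "%20" in a file URI string. It builds the URI with File.toURI(), which percent-encodes the blank, and then shows that taking URL.getPath() keeps the escape, while new File(URI) decodes it back to a blank. It illustrates the encoding behavior only and does not claim to reproduce the exact failure inside the UIMA serialization code.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical demo class: shows how a blank in a path round-trips through a file URI.
public class UriBlankDemo {

  // File.toURI() percent-encodes illegal URI characters, so a blank becomes %20.
  static String toUriString(File f) {
    return f.getAbsoluteFile().toURI().toString();
  }

  // URL.getPath() does not decode escapes, so the name keeps "%20".
  static String namePreservingEscapes(String uri) {
    try {
      return new File(new URL(uri).getPath()).getName();
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(e);
    }
  }

  // new File(URI) decodes the path, turning "%20" back into a blank.
  static String nameDecoded(String uri) {
    return new File(URI.create(uri)).getName();
  }

  public static void main(String[] args) {
    String uri = toUriString(new File("UIMA sdk", "my doc.txt"));
    System.out.println(uri);
    System.out.println(namePreservingEscapes(uri));
    System.out.println(nameDecoded(uri));
  }
}
```

Running main prints the URI (its prefix depends on the working directory), then "my%20doc.txt", then "my doc.txt".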
>
> I was able to reproduce this by using a source directory with a blank in its path.
>
> You can avoid this issue by pointing the Document Analyzer at a source
> directory whose path contains no blanks.
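To catch this condition before running the Document Analyzer, a small check like the following could flag an input directory whose absolute path contains a blank. BlankPathCheck and hasBlank are hypothetical helpers for illustration, not part of the UIMA SDK.

```java
import java.io.File;

// Hypothetical helper: detects the "blank in the source path" condition described above.
public class BlankPathCheck {

  // Returns true if the absolute form of dir contains a space character.
  static boolean hasBlank(File dir) {
    return dir.getAbsolutePath().indexOf(' ') >= 0;
  }

  public static void main(String[] args) {
    File dir = new File(args.length > 0 ? args[0] : ".");
    if (hasBlank(dir)) {
      System.out.println("Warning: path contains a blank: " + dir.getAbsolutePath());
    } else {
      System.out.println("OK: " + dir.getAbsolutePath());
    }
  }
}
```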
>
> Cheers. -Marshall
>
>
> On 3/22/2011 1:09 PM, Marshall Schor wrote:
>> On 3/22/2011 12:25 PM, Marshall Schor wrote:
>>> Here's an idea:
>>>
>>> The file names doc1.xmi, doc2.xmi, etc. are produced when the XMI CAS
>>> serializer is called with a null file name:
>>>
>>> uimaj-examples/src/main/java/org/apache/uima/examples/xmi/XmiWriterCasConsumer.java
>>>
>>> lines 108-110:
>>>     if (outFile == null) {
>>>       outFile = new File(mOutputDir, "doc" + mDocNum++ + ".xmi");    
>>>     }
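A hedged sketch of the naming logic described here: derive "<input name>.xmi" from the source-document URI, and fall back to a numbered "docN.xmi" name only when no usable URI is available. XmiOutputNaming, outputDir, docNum and outputFileFor are hypothetical stand-ins for mOutputDir, mDocNum and the surrounding code in XmiWriterCasConsumer; decoding the URI with new File(URI) is this sketch's own choice, not necessarily what the UIMA example does, and starting the counter at 1 is an assumption.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for the output-naming part of an XMI writer.
public class XmiOutputNaming {

  private final File outputDir;
  private int docNum = 1; // assumption: first fallback name is doc1.xmi

  XmiOutputNaming(File outputDir) {
    this.outputDir = outputDir;
  }

  // Use the input file's name when the URI parses; otherwise fall back to docN.xmi.
  File outputFileFor(String sourceUri) {
    File outFile = null;
    if (sourceUri != null) {
      try {
        // new File(URI) decodes escapes such as %20 back into blanks
        File inFile = new File(new URI(sourceUri));
        outFile = new File(outputDir, inFile.getName() + ".xmi");
      } catch (URISyntaxException | IllegalArgumentException e) {
        // unparseable or non-file URI: leave outFile null so the default name is used
      }
    }
    if (outFile == null) {
      outFile = new File(outputDir, "doc" + docNum++ + ".xmi");
    }
    return outFile;
  }
}
```

With this approach, a URI such as "file:/data/my%20doc.txt" yields "my doc.txt.xmi" instead of a numbered fallback name.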
>>>
>>> The code above that has a try block that might be getting tripped up by the
>>> fact that your installation directory is in a path with a blank in it.
>>>
>>> Can you try installing into a path without a blank?
>> I tried this, and it worked even with blanks in the file path, so that's
>> not it...
>>
>> I'll contact you off-list to debug this mystery. -Marshall
>>> -Marshall
>>>
>>> On 3/22/2011 8:48 AM, Bob Sizemore wrote:
>>>> Does anybody have any ideas I could try to get the Document Analyzer to
>>>> show the right document names?
>>>>
>>>>
>>>>
>
