Now posted as a Jira issue: https://issues.apache.org/jira/browse/UIMA-2097
-Marshall
On 3/22/2011 1:54 PM, Marshall Schor wrote:
> OK - found the problem.
>
> The Document Analyzer uses a component, "FileSystemCollectionReader", to read
> the files. This component inserts the name of the file being read into the CAS,
> using this code:
>
> // Also store location of source document in CAS. This information is critical
> // if CAS Consumers will need to know where the original document contents are located.
> // For example, the Semantic Search CAS Indexer writes this information into the
> // search index that it creates, which allows applications that use the search index to
> // locate the documents that satisfy their semantic queries.
> SourceDocumentInformation srcDocInfo = new SourceDocumentInformation(jcas);
> srcDocInfo.setUri(file.getAbsoluteFile().toURL().toString());
>
> This last line gets the full path of the source file, which in your case is in
>
> C:\Watson\UIMA sdk\apache-uima\examples\data
>
> and the toURL() call converts the blank in "UIMA sdk" to "%20".
>
> This then causes the serialization code to fail when it tries to create the
> output file name, and as a result, the default file name is used.
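To illustrate how a blank behaves once a path is stored as a URI string, here is a small standalone Java sketch. It is not UIMA code; the class name BlankInPathDemo and the sample path are hypothetical. It uses File.toURI() (not the toURL() call quoted from the reader), which percent-encodes a blank as "%20". It then shows that converting the URI back with new File(URI) restores the blank, while taking the last segment of the raw, still-encoded path keeps "%20" in the name.

```java
import java.io.File;
import java.net.URI;

// Hypothetical demo class, not part of UIMA: shows how a blank in a path
// looks once the path is stored as a URI string, and how to recover the name.
public class BlankInPathDemo {

    // File.toURI() percent-encodes the blank, e.g. "UIMA sdk" -> "UIMA%20sdk".
    static String toUriString(File file) {
        return file.getAbsoluteFile().toURI().toString();
    }

    // new File(URI) decodes the escapes, so the file name keeps its blank.
    static String fileNameFromUri(String uriString) {
        return new File(URI.create(uriString)).getName();
    }

    // Taking the last segment of the raw (still encoded) path keeps "%20".
    static String rawLastSegment(String uriString) {
        String rawPath = URI.create(uriString).getRawPath();
        return rawPath.substring(rawPath.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String uri = toUriString(new File("/tmp/UIMA sdk/data/my doc.txt"));
        System.out.println(uri);
        System.out.println(fileNameFromUri(uri));
        System.out.println(rawLastSegment(uri));
    }
}
```

On a Unix-like system this prints file:/tmp/UIMA%20sdk/data/my%20doc.txt, then my doc.txt, then my%20doc.txt. Any code that derives a file name from the stored URI has to decode it, or it will see a different name than the one on disk.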
>
> I was able to reproduce this by giving the source directory a name with a blank in it.
>
> You can avoid this issue by pointing the Document Analyzer at a source
> directory whose path contains no blanks.
>
> Cheers. -Marshall
>
>
> On 3/22/2011 1:09 PM, Marshall Schor wrote:
>> On 3/22/2011 12:25 PM, Marshall Schor wrote:
>>> Here's an idea:
>>>
>>> The names doc1.xmi, doc2.xmi, etc. are produced when the XMI CAS serializer
>>> is called with a null file name:
>>>
>>> uimaj-examples/src/main/java/org/apache/uima/examples/xmi/XmiWriterCasConsumer.java
>>>
>>> lines 108-110:
>>> if (outFile == null) {
>>>   outFile = new File(mOutputDir, "doc" + mDocNum++ + ".xmi");
>>> }
>>>
>>> The code above that has a try block that might be getting tripped up by the
>>> fact that your install location is in a path with a blank in it.
>>>
>>> Can you try installing into a path without a blank?
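The "derive a name, otherwise fall back to docN.xmi" pattern described above can be sketched as follows. This is a hypothetical illustration, not the UIMA source: the class XmiOutputNamer and its method outputFileFor are invented names, and it assumes the writer parses the stored URI string with java.net.URI and appends ".xmi" to the source file name. Under that assumption, a URI string containing an unescaped blank is rejected with URISyntaxException, so the default name is used.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch, not the UIMA source: derive an output name from the
// source-document URI string and fall back to "docN.xmi" when that fails,
// mirroring the fallback quoted from XmiWriterCasConsumer.java.
public class XmiOutputNamer {
    private final File outputDir;
    private int docNum = 0;

    XmiOutputNamer(File outputDir) {
        this.outputDir = outputDir;
    }

    File outputFileFor(String sourceUri) {
        File outFile = null;
        if (sourceUri != null) {
            try {
                // Assumption: the name comes from parsing the URI. java.net.URI
                // rejects an unescaped blank with URISyntaxException, and
                // new File(URI) rejects non-file URIs with IllegalArgumentException.
                File inFile = new File(new URI(sourceUri));
                outFile = new File(outputDir, inFile.getName() + ".xmi");
            } catch (URISyntaxException | IllegalArgumentException e) {
                // Leave outFile null so the default name below is used.
            }
        }
        if (outFile == null) {
            outFile = new File(outputDir, "doc" + docNum++ + ".xmi");
        }
        return outFile;
    }

    public static void main(String[] args) {
        XmiOutputNamer namer = new XmiOutputNamer(new File("out"));
        System.out.println(namer.outputFileFor("file:/data/UIMA%20sdk/a.txt").getName());
        System.out.println(namer.outputFileFor("file:/data/UIMA sdk/b.txt").getName());
    }
}
```

This prints a.txt.xmi for the escaped URI and doc0.xmi for the one with a raw blank. The silent fallback is why the symptom shows up as generic document names rather than as an error.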
>> I tried this, and it also worked (with blanks in the file path), so that's
>> not it...
>>
>> I'll contact you off-list to debug this mystery. -Marshall
>>> -Marshall
>>>
>>> On 3/22/2011 8:48 AM, Bob Sizemore wrote:
>>>> Does anybody have any ideas I could try to get the Document Analyzer to show
>>>> the right document names?