[
https://issues.apache.org/jira/browse/UIMA-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608048#action_12608048
]
Adam Lally commented on UIMA-1080:
----------------------------------
This doesn't seem to handle spaces in the file path. For example, if you run
the Document Analyzer with this input directory:
C:\Program Files\apache-uima\examples\data
Then the output files are produced with the generic names doc0, doc1, etc.,
indicating that the filename wasn't extracted from the URI. As I recall,
java.net.URI is much stricter than java.net.URL about spaces: URI rejects an
unescaped space in the path, while URL accepts it.
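A minimal sketch of that difference, using only the JDK. The class and method names are hypothetical, and the file path is a made-up example of a URI string containing a literal space. java.net.URL does little syntax checking, so it accepts the string; java.net.URI parses according to RFC 2396 and throws URISyntaxException.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class SpaceInUriDemo {

    // True if java.net.URL accepts the string.
    // (The URL(String) constructor is deprecated since Java 20 but still available.)
    static boolean acceptedByUrl(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // True if java.net.URI accepts the string; URI enforces RFC 2396 syntax.
    static boolean acceptedByUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical URI string with an unescaped space in the path.
        String withSpace = "file:/C:/Program Files/apache-uima/examples/data/doc.txt";
        System.out.println(acceptedByUrl(withSpace)); // true
        System.out.println(acceptedByUri(withSpace)); // false
    }
}
```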
The missing filename might be considered a problem with the
FileSystemCollectionReader, which populates the SourceDocumentInformation.uri
field. Perhaps it should not be putting unescaped spaces there. However, I am
somewhat nervous about changing it to URL-encode the uri, since I think it is
likely that some user code out there relies on the current behavior.
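If the reader were changed to encode the uri, one standard-library way to produce such a string is java.io.File.toURI(), which escapes a space as %20. The sketch below (hypothetical class name and path; not the FileSystemCollectionReader's actual code) also shows one way such a change could affect callers: URI.getPath() decodes the escape, but URL.getPath() does not.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class EncodedFileUriDemo {

    // File.toURI() percent-encodes characters that are illegal in a URI, e.g. ' ' becomes %20.
    static String encodedUri(File f) {
        return f.toURI().toString();
    }

    // URI.getPath() returns the decoded path component.
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    // URL.getPath() returns the path exactly as written, so escapes such as %20 remain.
    static String rawUrlPath(String uri) {
        try {
            return new URL(uri).getPath();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical path; the file does not need to exist.
        File f = new File("/tmp/Program Files/doc.txt");
        String s = encodedUri(f);
        System.out.println(s);              // the space appears as %20
        System.out.println(decodedPath(s)); // the space is restored
        System.out.println(rawUrlPath(s));  // %20 is still present
    }
}
```

A caller that takes the path from URL.getPath() and opens it as a java.io.File would then look for a directory literally named Program%20Files, which is one concrete way existing user code could break.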
Also, whatever change is applied to XmiWriterCasConsumer probably should also
be applied to XCasWriterCasConsumer. And there are also example versions of
these classes in the uimaj-examples project.
> [Patch] Wrong usage of URL in XmiWriterCasConsumer
> --------------------------------------------------
>
> Key: UIMA-1080
> URL: https://issues.apache.org/jira/browse/UIMA-1080
> Project: UIMA
> Issue Type: Improvement
> Components: InternalTools
> Affects Versions: 2.2.2
> Reporter: Richard Eckart
> Priority: Minor
> Attachments: UIMA-1080.patch
>
>
> The XmiWriterCasConsumer wraps the value of
> SourceDocumentInformation.getUri() in a URL to extract the path. This only
> works if the value returned by getUri() is actually a URL starting with
> http, ftp, or some other known protocol. It does not work if a framework user
> puts self-defined URIs in there, such as annolab://default/myfile.
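A minimal sketch of the behavior described in the issue, assuming java.net.URI is used in place of java.net.URL. URL needs a protocol handler for the scheme, so annolab://default/myfile fails with MalformedURLException, while URI only checks syntax. The hypothetical fileName helper tries URI first and falls back to URL for strings URI rejects (such as the unescaped spaces discussed in the comment); it is one possible approach, not taken from UIMA-1080.patch.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class CustomSchemeDemo {

    // Path via java.net.URL: fails for schemes without a protocol handler.
    static String pathViaUrl(String s) {
        try {
            return new URL(s).getPath();
        } catch (MalformedURLException e) {
            return null; // e.g. "unknown protocol: annolab"
        }
    }

    // Path via java.net.URI: any syntactically valid scheme is accepted.
    static String pathViaUri(String s) {
        try {
            return new URI(s).getPath();
        } catch (URISyntaxException e) {
            return null; // e.g. an unescaped space in the path
        }
    }

    // Hypothetical helper: last path segment, or null if no path can be found.
    static String fileName(String s) {
        String path = pathViaUri(s);
        if (path == null) {
            path = pathViaUrl(s); // fallback for strings URI rejects
        }
        if (path == null || path.isEmpty()) {
            return null; // caller would use a generic name such as doc0
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String custom = "annolab://default/myfile";
        System.out.println(pathViaUrl(custom)); // null
        System.out.println(pathViaUri(custom)); // /myfile
        System.out.println(fileName(custom));   // myfile
    }
}
```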