Hi Tim, sorry, I'm no longer sure what I was planning to fix :-). I've
looked at the source again, and it is not a case of an InputStream being
returned directly from the method...
A try/catch will most likely work better, though maybe it would hide some
issue with some of the parsers closing the stream early somewhere...
Thanks, Sergey
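One possible reading of the try/catch suggestion above, as a plain-Java sketch that uses neither Tika nor CXF. The deferred writer still catches and reports I/O failures, but it no longer closes the upload stream. Closing is left to the container that owns the stream. `DeferredWriter` (a stand-in for a JAX-RS `StreamingOutput`), `UploadStream`, `textWriter` and `handleRequest` are hypothetical names. It is also an assumption that the container reads the stream again after the body has been written. The sketch does not address the caveat about parsers closing the stream early.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class TryCatchWriterDemo {

    // Hypothetical stand-in for JAX-RS StreamingOutput: the body is written later.
    interface DeferredWriter {
        void write(OutputStream out) throws IOException;
    }

    // In-memory upload that, like a real network or file stream, rejects reads after close().
    static class UploadStream extends InputStream {
        private final InputStream data;
        private boolean closed;

        UploadStream(String content) {
            this.data = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public int read() throws IOException {
            if (closed) {
                throw new IOException("Stream Closed");
            }
            return data.read();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // try/catch instead of try-with-resources: errors are still reported with
    // context, but the upload is left open for its owner to close.
    static DeferredWriter textWriter(InputStream upload) {
        return out -> {
            try {
                copy(upload, out); // stands in for "parse and write plain text"
            } catch (IOException e) {
                throw new IOException("Text extraction failed", e);
            }
        };
    }

    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    // Simulated container: it owns the upload, touches it after the body is
    // written (assumption), and closes it last.
    static String handleRequest(String content) {
        UploadStream upload = new UploadStream(content);
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try {
            textWriter(upload).write(body);
            upload.read(); // returns -1 here because the stream is still open
        } catch (IOException e) {
            return "ERROR: " + e.getMessage();
        } finally {
            upload.close();
        }
        return new String(body.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("extracted text")); // extracted text
    }
}
```

In this arrangement the resource is closed in exactly one place, by whoever opened it. That is the property try-with-resources inside the writer would break.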
On 02/06/17 12:13, Allison, Timothy B. wrote:
Thank you for sharing this with us.
Oddly, I’m able to reproduce this with our 2pic.docx test file, but not
with our “test_recursive_embedded.docx”.
Please open a ticket on our JIRA.
*From:* Haris Osmanagic [mailto:[email protected]]
*Sent:* Friday, June 2, 2017 6:28 AM
*To:* [email protected]
*Subject:* "Stream closed" error when extracting text using Tika Server
Hi everyone!
I am using Tika Server, and I have run into something strange when extracting
text and requesting a plain-text response. Tests can be found here:
https://github.com/hariso/tika/commit/2a0dc37a4427070360c7ebe147712d9c873a4e7b
*Version used*: 1.15
*File used*: Any I tried (MS Word, DOCX, PDF)
*Method used*: Multipart upload, using Accept: text/plain
*Expected result*: extracted text
*Actual result*: extracted text PLUS an error saying
<ns1:XMLFault xmlns:ns1="http://cxf.apache.org/bindings/xformat">
  <ns1:faultstring xmlns:ns1="http://cxf.apache.org/bindings/xformat">java.io.IOException: Stream Closed</ns1:faultstring>
</ns1:XMLFault>
Looking at the code, it seems that the method producing plain text uses
try-with-resources
<https://github.com/hariso/tika/blob/2a0dc37a4427070360c7ebe147712d9c873a4e7b/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java#L408-L411>,
and the input stream it uses has already been closed. The method producing
XML doesn't do this
<https://github.com/hariso/tika/blob/2a0dc37a4427070360c7ebe147712d9c873a4e7b/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java#L476>.
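The failure mode described above can be reproduced in a minimal, self-contained Java sketch that uses neither Tika nor CXF. The sketch makes two assumptions. First, the plain-text method returns a deferred writer (a JAX-RS `StreamingOutput` in the real server, the hypothetical `DeferredWriter` here) whose body closes the upload stream via try-with-resources. Second, the container touches that same stream again after the body has been written. `UploadStream`, `textWriter` and `handleRequest` are illustrative names, not Tika or CXF APIs. Under these assumptions the text is written successfully and the error appears only afterwards, which matches the "extracted text PLUS an error" symptom.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamClosedDemo {

    // Hypothetical stand-in for JAX-RS StreamingOutput: the response body is
    // produced later, when the container calls write().
    interface DeferredWriter {
        void write(OutputStream out) throws IOException;
    }

    // In-memory upload that, like a real network or file stream, rejects reads after close().
    static class UploadStream extends InputStream {
        private final InputStream data;
        private boolean closed;

        UploadStream(String content) {
            this.data = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public int read() throws IOException {
            if (closed) {
                throw new IOException("Stream Closed");
            }
            return data.read();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Result of one simulated request: what reached the client, plus any error.
    static final class Outcome {
        final String body;
        final String error;

        Outcome(String body, String error) {
            this.body = body;
            this.error = error;
        }
    }

    // Problematic pattern (simplified): try-with-resources inside the writer
    // closes a stream that the container, not this code, owns.
    static DeferredWriter textWriter(InputStream upload) {
        return out -> {
            try (InputStream in = upload) {
                copy(in, out); // stands in for "parse and write plain text"
            }
        };
    }

    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    // Simulated container: write the body, then touch the upload again
    // (assumption: e.g. to drain what is left of the multipart part).
    static Outcome handleRequest(String content) {
        UploadStream upload = new UploadStream(content);
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        String error = null;
        try {
            textWriter(upload).write(body);
            upload.read(); // fails: the writer already closed the stream
        } catch (IOException e) {
            error = e.getMessage();
        }
        return new Outcome(new String(body.toByteArray(), StandardCharsets.UTF_8), error);
    }

    public static void main(String[] args) {
        Outcome o = handleRequest("extracted text");
        System.out.println("Body:  " + o.body);  // Body:  extracted text
        System.out.println("Error: " + o.error); // Error: Stream Closed
    }
}
```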
In my use case, the parsed text is processed in an additional step where
XML/HTML is not really desired, so I cannot use the XML output as a
workaround (at least not for now).
Any help or comments are appreciated!
Haris
--
Sergey Beryozkin
Talend Community Coders
http://coders.talend.com/