Thank you for sharing this with us.
Oddly, I’m able to reproduce this with our 2pic.docx test file, but not with our “test_recursive_embedded.docx”. Please open a ticket on our JIRA.

From: Haris Osmanagic [mailto:[email protected]]
Sent: Friday, June 2, 2017 6:28 AM
To: [email protected]
Subject: "Stream closed" error when extracting text using Tika Server

Hi everyone!

I am using Tika Server, and I have run into something odd when extracting text and requesting a plain-text response. Tests can be found here: https://github.com/hariso/tika/commit/2a0dc37a4427070360c7ebe147712d9c873a4e7b

Version used: 1.15
File used: Any I tried (MS Word, DOCX, PDF)
Method used: Multipart upload, using Accept: text/plain
Expected result: extracted text
Actual result: extracted text PLUS an error saying

    <ns1:XMLFault xmlns:ns1="http://cxf.apache.org/bindings/xformat"><ns1:faultstring xmlns:ns1="http://cxf.apache.org/bindings/xformat">java.io.IOException: Stream Closed</ns1:faultstring></ns1:XMLFault>

Looking at the code, it seems that the method used for producing text wraps the input stream in try-with-resources<https://github.com/hariso/tika/blob/2a0dc37a4427070360c7ebe147712d9c873a4e7b/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java#L408-L411>, so the input stream has already been closed by the time it is used. The method used for producing XML doesn't do that<https://github.com/hariso/tika/blob/2a0dc37a4427070360c7ebe147712d9c873a4e7b/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java#L476>.

In my use case, the parsed text is processed in an additional step, where XML/HTML is not really desired, so I cannot use the XML output as a workaround (at least not now).

Any help or comments are appreciated!

Haris
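
For anyone following the thread, here is a minimal sketch of the kind of lifetime problem Haris suspects. It is not the TikaResource code. `DeferredBody` is a hypothetical stand-in for a response body that is written after the resource method returns (as a JAX-RS `StreamingOutput` would be), and `StrictInputStream`, `closesTooEarly`, `closesInsideCallback`, and `run` are names made up for illustration. It also does not reproduce the exact symptom reported above (text followed by the fault); it only shows how try-with-resources around a lazily consumed stream can lead to "Stream Closed".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class DeferredStreamSketch {

    /** Hypothetical stand-in for a body that is written after the method returns. */
    interface DeferredBody {
        void write(OutputStream out) throws IOException;
    }

    /** Refuses reads after close(), the way file and socket streams do. */
    static class StrictInputStream extends FilterInputStream {
        private boolean closed;

        StrictInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            if (closed) throw new IOException("Stream Closed");
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (closed) throw new IOException("Stream Closed");
            return super.read(b, off, len);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Problem pattern: the stream is closed on return, before the body is written. */
    static DeferredBody closesTooEarly(InputStream source) throws IOException {
        try (InputStream in = source) {
            return out -> copy(in, out);
        }
    }

    /** One possible fix: close the stream inside the deferred callback instead. */
    static DeferredBody closesInsideCallback(InputStream source) {
        return out -> {
            try (InputStream in = source) {
                copy(in, out);
            }
        };
    }

    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    /** Builds a body from the given text and writes it later, as a server would. */
    static String run(boolean closeEarly, String text) {
        InputStream source = new StrictInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            DeferredBody body = closeEarly
                    ? closesTooEarly(source)
                    : closesInsideCallback(source);
            body.write(out); // happens after the "resource method" has returned
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            return "ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(true, "hello"));
        System.out.println(run(false, "hello"));
    }
}
```

In this sketch the first call fails because the stream is closed as soon as `closesTooEarly` returns, while the second succeeds because the stream stays open until the deferred write has finished. Whether the same change is the right fix for the text/plain path in Tika Server would need to be confirmed against the actual code in the JIRA ticket.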
