Thank you for sharing this with us.

Oddly, I’m able to reproduce this with our 2pic.docx test file, but not with 
our “test_recursive_embedded.docx”.



Please open a ticket on our JIRA.


From: Haris Osmanagic [mailto:[email protected]]
Sent: Friday, June 2, 2017 6:28 AM
To: [email protected]
Subject: "Stream closed" error when extracting text using Tika Server

Hi everyone!

I am using Tika Server, and I have run into odd behavior when extracting text 
and requesting a plain-text response. Tests can be found here: 
https://github.com/hariso/tika/commit/2a0dc37a4427070360c7ebe147712d9c873a4e7b

Version used: 1.15
File used: Any I tried (MS Word, DOCX, PDF)
Method used: Multipart upload, using Accept: text/plain

Expected result: extracted text
Actual result: the extracted text PLUS an error saying:

<ns1:XMLFault xmlns:ns1="http://cxf.apache.org/bindings/xformat">
<ns1:faultstring xmlns:ns1="http://cxf.apache.org/bindings/xformat">java.io.IOException: Stream Closed</ns1:faultstring>
</ns1:XMLFault>

Looking at the code, it seems that the method used for producing text uses 
try-with-resources 
<https://github.com/hariso/tika/blob/2a0dc37a4427070360c7ebe147712d9c873a4e7b/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java#L408-L411>, 
and the input stream it uses has already been closed. The method used for 
producing XML does not use try-with-resources 
<https://github.com/hariso/tika/blob/2a0dc37a4427070360c7ebe147712d9c873a4e7b/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java#L476>.
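
To illustrate the general hazard I suspect here, below is a minimal, generic 
Java sketch. It is not Tika Server's code and may not match the actual cause: 
DeferredStreamDemo, DeferredOutput, StrictInputStream and the method names are 
hypothetical, and DeferredOutput only stands in for a deferred response writer 
such as JAX-RS StreamingOutput. It shows that a stream closed by 
try-with-resources before the deferred write runs fails with "Stream Closed", 
while closing it inside the write does not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class DeferredStreamDemo {

    /** Hypothetical stand-in for a deferred response writer. */
    interface DeferredOutput {
        void write(OutputStream out) throws IOException;
    }

    /** Input stream that rejects reads after close(), as file streams do. */
    static class StrictInputStream extends FilterInputStream {
        private boolean closed;

        StrictInputStream(byte[] data) {
            super(new ByteArrayInputStream(data));
        }

        @Override
        public int read() throws IOException {
            ensureOpen();
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            ensureOpen();
            return super.read(b, off, len);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }

        private void ensureOpen() throws IOException {
            if (closed) {
                throw new IOException("Stream Closed");
            }
        }
    }

    static InputStream streamOf(String text) {
        return new StrictInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    /** Problematic: the stream is closed on return, before write() ever runs. */
    static DeferredOutput closeTooEarly(InputStream in) {
        try (InputStream is = in) {
            return out -> copy(is, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Safer: the stream is closed only after the deferred write has read it. */
    static DeferredOutput closeInsideWrite(InputStream in) {
        return out -> {
            try (InputStream is = in) {
                copy(is, out);
            }
        };
    }

    /** Runs the deferred write and returns the text, or the error message. */
    static String render(DeferredOutput output) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            output.write(sink);
            return new String(sink.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(render(closeTooEarly(streamOf("hello"))));    // error: Stream Closed
        System.out.println(render(closeInsideWrite(streamOf("hello")))); // hello
    }
}
```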

In my use case, the parsed text is processed in an additional step, where 
XML/HTML is not really desired, so I cannot use the XML output as a workaround 
(at least not now).

Any help or comments are appreciated!

Haris

