Haris is correct.
The static parse() method closes the InputStream itself, so we shouldn't wrap
the call to parse() in an auto-closing try like this:
try (InputStream is = xyz) {
    TikaResource.parse(...);
}
Once I remove the autoclosing try, the test passes.
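For illustration, here is a minimal sketch of that double-close pattern. It is
not Tika's code: DoubleCloseDemo, StrictStream, parse(), wrapped() and
unwrapped() are hypothetical stand-ins, and the sketch assumes a stream whose
close() throws when called a second time (many JDK streams tolerate a repeated
close, so this is an assumption about the stream involved).

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class DoubleCloseDemo {

    /** Hypothetical stream that rejects a second close() (an assumption, see above). */
    static class StrictStream extends FilterInputStream {
        private boolean closed;

        StrictStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            if (closed) {
                throw new IOException("Stream Closed");
            }
            closed = true;
            super.close();
        }
    }

    /** Stand-in for a parse method that owns its input and closes it when done. */
    static int parse(InputStream in) throws IOException {
        try {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } finally {
            in.close(); // first close, done by parse() itself
        }
    }

    /** Caller also auto-closes: the implicit close() is the second one. */
    static String wrapped(byte[] data) {
        try (InputStream is = new StrictStream(new ByteArrayInputStream(data))) {
            return "parsed " + parse(is);
        } catch (IOException e) {
            // Raised by the implicit close(), after parse() has already finished.
            return "error: " + e.getMessage();
        }
    }

    /** Caller leaves closing to parse(): the stream is closed exactly once. */
    static String unwrapped(byte[] data) {
        try {
            return "parsed " + parse(new StrictStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        System.out.println(wrapped(data));
        System.out.println(unwrapped(data));
    }
}
```

Under those assumptions, wrapped() fails only after the parse has completed,
which would be consistent with the extracted text being followed by a fault,
while unwrapped() finishes cleanly.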
-----Original Message-----
From: Sergey Beryozkin [mailto:[email protected]]
Sent: Friday, June 2, 2017 7:20 AM
To: [email protected]
Subject: Re: "Stream closed" error when extracting text using Tika Server
Hi Tim, sorry, I'm not sure now what I was planning to fix :-), I've looked at
the source again and it is not a case of InputStream returned directly from the
method...
try/catch will most likely work better, though maybe it would hide some issue
to do with some of the parsers closing the stream early somewhere...
Thanks, Sergey
On 02/06/17 12:13, Allison, Timothy B. wrote:
> Thank you for sharing this with us.
>
> Oddly, I’m able to reproduce this with our 2pic.docx test file, but
> not with our “test_recursive_embedded.docx”.
>
> Please open a ticket on our JIRA.
>
> *From:*Haris Osmanagic [mailto:[email protected]]
> *Sent:* Friday, June 2, 2017 6:28 AM
> *To:* [email protected]
> *Subject:* "Stream closed" error when extracting text using Tika
> Server
>
> Hi everyone!
>
> I am using Tika Server, and I have faced a weird thing when extracting
> text and requiring a plain text response. Tests can be found here:
> https://github.com/hariso/tika/commit/2a0dc37a4427070360c7ebe147712d9c873a4e7b
>
> *Version used*: 1.15
>
> *File used*: Any I tried (MS Word, DOCX, PDF)
>
> *Method used*: Multipart upload, using Accept: text/plain
>
> *Expected result*: extracted text
>
> *Actual result*: extracted text PLUS an error saying
>
> <ns1:XMLFault
> xmlns:ns1="http://cxf.apache.org/bindings/xformat"><ns1:faultstring
> xmlns:ns1="http://cxf.apache.org/bindings/xformat">java.io.IOException:
> Stream Closed</ns1:faultstring></ns1:XMLFault>
>
> Looking at the code, it seems like the method used for producing text
> is using try-with-resources
> <https://github.com/hariso/tika/blob/2a0dc37a4427070360c7ebe147712d9c873a4e7b/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java#L408-L411>,
> and the used input stream has already been closed. The method used for
> producing XML doesn't do it
> <https://github.com/hariso/tika/blob/2a0dc37a4427070360c7ebe147712d9c873a4e7b/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java#L476>.
>
> In my use case, the parsed text is processed in an additional step, where
> using XML/HTML is not really desired, hence I cannot use it as a
> workaround (at least not now).
>
> Any help or comments are appreciated!
>
> Haris
>
--
Sergey Beryozkin
Talend Community Coders
http://coders.talend.com/