Haris is correct.

The static parse() already closes the InputStream, so we shouldn't wrap the 
call to parse() in an auto-closing try-with-resources block:

try(InputStream is = xyz) {
        TikaResource.parse(...)
}

Once I remove the autoclosing try, the test passes.
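
For anyone following along, here is a minimal sketch of the double close. It is 
an illustration under assumptions, not the server code: StrictStream is a 
hypothetical stream that rejects a second close(), and parse() below is a 
simplified stand-in for the static TikaResource.parse(), keeping only the 
"always close the stream when done" behaviour described above.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class DoubleCloseDemo {

    /** Hypothetical stream that throws if close() is called twice. */
    static final class StrictStream extends FilterInputStream {
        private boolean closed;

        StrictStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            if (closed) {
                throw new IOException("Stream Closed");
            }
            closed = true;
            super.close();
        }
    }

    /** Stand-in for a static parse() that always closes the stream it is given. */
    static int parse(InputStream is) throws IOException {
        try {
            int count = 0;
            while (is.read() != -1) {
                count++;
            }
            return count;
        } finally {
            is.close(); // first (and intended) close
        }
    }

    /** The problematic pattern: try-with-resources closes the stream again. */
    static int parseWithAutoClose(InputStream source) throws IOException {
        try (InputStream is = source) {
            return parse(is);
        } // implicit is.close() here is the second close
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);

        // Calling parse() directly closes the stream exactly once.
        System.out.println("bytes read: " + parse(new StrictStream(new ByteArrayInputStream(data))));

        // Wrapping the same call in try-with-resources triggers the second close.
        try {
            parseWithAutoClose(new StrictStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            System.out.println("wrapped call failed: " + e.getMessage());
        }
    }
}
```

The Closeable contract says that closing an already-closed stream has no 
effect, so a failure like this would only show up with a stream that does not 
follow that contract.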


-----Original Message-----
From: Sergey Beryozkin [mailto:[email protected]] 
Sent: Friday, June 2, 2017 7:20 AM
To: [email protected]
Subject: Re: "Stream closed" error when extracting text using Tika Server

Hi Tim, sorry, I'm not sure now what I was planning to fix :-). I've looked at 
the source again and it is not a case of an InputStream being returned directly 
from the method...
try/catch will most likely work better, though maybe it would hide some issue 
to do with some of the parsers closing the stream early somewhere...

Thanks, Sergey
On 02/06/17 12:13, Allison, Timothy B. wrote:
> Thank you for sharing this with us.
> 
> Oddly, I’m able to reproduce this with our 2pic.docx test file, but 
> not with our “test_recursive_embedded.docx”.
> 
> Please open a ticket on our JIRA.
> 
> *From:*Haris Osmanagic [mailto:[email protected]]
> *Sent:* Friday, June 2, 2017 6:28 AM
> *To:* [email protected]
> *Subject:* "Stream closed" error when extracting text using Tika 
> Server
> 
> Hi everyone!
> 
> I am using Tika Server, and I have faced a weird thing when extracting 
> text and requiring a plain text response. Tests can be found here:
> https://github.com/hariso/tika/commit/2a0dc37a4427070360c7ebe147712d9c873a4e7b
> 
> *Version used*: 1.15
> 
> *File used*: Any I tried (MS Word, DOCX, PDF)
> 
> *Method used*: Multipart upload, using Accept: text/plain
> 
> *Expected result*: extracted text
> 
> *Actual result*: extracted text PLUS an error saying
> 
> <ns1:XMLFault
> xmlns:ns1="http://cxf.apache.org/bindings/xformat"><ns1:faultstring
> xmlns:ns1="http://cxf.apache.org/bindings/xformat">java.io.IOException: 
> Stream Closed</ns1:faultstring></ns1:XMLFault>
> 
> Looking at the code, it seems like the method used for producing text 
> is using try-with-resources 
> <https://github.com/hariso/tika/blob/2a0dc37a4427070360c7ebe147712d9c873a4e7b/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java#L408-L411>, 
> and the input stream it uses has already been closed. The method used 
> for producing XML doesn't do this 
> <https://github.com/hariso/tika/blob/2a0dc37a4427070360c7ebe147712d9c873a4e7b/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java#L476>.
> 
> In my use case, the parsed text is processed in an additional step, 
> where using XML/HTML is not really desired, hence I cannot use it as a 
> workaround (at least not now).
> 
> Any help or comments are appreciated!
> 
> Haris
> 


--
Sergey Beryozkin

Talend Community Coders
http://coders.talend.com/
