Thanks, everyone, for the feedback! I'm not able to sign up for Apache's JIRA, so I couldn't open the ticket myself; sorry about that. Is there some way I can help from here?
On Fri, Jun 2, 2017 at 3:18 PM Allison, Timothy B. <[email protected]> wrote:

> I opened TIKA-2384 for this. Let’s move the discussion there.
>
> *From:* Luís Filipe Nassif [mailto:[email protected]]
> *Sent:* Friday, June 2, 2017 9:00 AM
> *To:* [email protected]
> *Subject:* RE: "Stream closed" error when extracting text using Tika
> Server
>
> I think resources should be closed where they are opened, like the
> parser.parse() API contract, no?
>
> Luis
>
> On Jun 2, 2017 8:27 AM, "Allison, Timothy B." <[email protected]>
> wrote:
>
> Haris is correct.
>
> The static "parse()" closes the InputStream, so we shouldn't wrap the call
> to parse in an autoclose:
>
> try(InputStream is = xyz) {
>     TikaResource.parse(...)
> }
>
> Once I remove the autoclosing try, the test passes.
>
>
> -----Original Message-----
> From: Sergey Beryozkin [mailto:[email protected]]
> Sent: Friday, June 2, 2017 7:20 AM
> To: [email protected]
> Subject: Re: "Stream closed" error when extracting text using Tika Server
>
> Hi Tim, sorry, I'm not sure now what I was planning to fix :-), I've
> looked at the source again and it is not a case of InputStream returned
> directly from the method...
> try/catch will most likely work better, though maybe it would hide some
> issue to do with some of the parsers closing the stream early somewhere...
>
> Thanks, Sergey
> On 02/06/17 12:13, Allison, Timothy B. wrote:
> > Thank you for sharing this with us.
> >
> > Oddly, I’m able to reproduce this with our 2pic.docx test file, but
> > not with our “test_recursive_embedded.docx”.
> >
> > Please open a ticket on our JIRA.
> >
> > *From:* Haris Osmanagic [mailto:[email protected]]
> > *Sent:* Friday, June 2, 2017 6:28 AM
> > *To:* [email protected]
> > *Subject:* "Stream closed" error when extracting text using Tika
> > Server
> >
> > Hi everyone!
> >
> > I am using Tika Server, and I have run into a weird issue when extracting
> > text and requesting a plain text response.
> >
> > Tests can be found here:
> > https://github.com/hariso/tika/commit/2a0dc37a4427070360c7ebe147712d9c873a4e7b
> >
> > *Version used*: 1.15
> >
> > *File used*: Any I tried (MS Word, DOCX, PDF)
> >
> > *Method used*: Multipart upload, using Accept: text/plain
> >
> > *Expected result*: extracted text
> >
> > *Actual result*: extracted text PLUS an error saying
> >
> > <ns1:XMLFault
> > xmlns:ns1="http://cxf.apache.org/bindings/xformat"><ns1:faultstring
> > xmlns:ns1="http://cxf.apache.org/bindings/xformat">java.io.IOException:
> > Stream Closed</ns1:faultstring></ns1:XMLFault>
> >
> > Looking at the code, it seems like the method used for producing text
> > is using try-with-resources
> > <https://github.com/hariso/tika/blob/2a0dc37a4427070360c7ebe147712d9c873a4e7b/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java#L408-L411>,
> > and the input stream it uses has already been closed. The method used
> > for producing XML doesn't do this
> > <https://github.com/hariso/tika/blob/2a0dc37a4427070360c7ebe147712d9c873a4e7b/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java#L476>.
> >
> > In my use case, the parsed text is processed in an additional, where
> > using XML/HTML is not really desired, hence I cannot use it as a
> > workaround (at least not now).
> >
> > Any help or comments are appreciated!
> >
> > Haris
>
>
> --
> Sergey Beryozkin
>
> Talend Community Coders
> http://coders.talend.com/
>
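To make the stream-ownership question in this thread concrete, here is a minimal Java sketch of the conventions being discussed. It assumes, for illustration only, a stream whose close() throws when called a second time; `CloseOwnershipDemo`, `StrictStream` and the `consume*` helpers are hypothetical stand-ins, not Tika or CXF classes, and this does not establish which real stream raised the error. `consumeAndClose` mimics a helper that closes its input in a finally block (as Tim describes the static parse()), while `consume` follows the convention Luís points to, where the code that opens a stream closes it (Tika's Parser.parse() Javadoc says the stream is consumed but not closed by that method).

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class CloseOwnershipDemo {
    // Hypothetical stream that throws if close() is called twice,
    // used only to make a double close visible.
    static class StrictStream extends FilterInputStream {
        private boolean closed = false;

        StrictStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() throws IOException {
            if (closed) {
                throw new IOException("Stream Closed");
            }
            closed = true;
            super.close();
        }
    }

    // Reads the stream to the end without closing it (the caller owns it).
    static int consume(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    // Mimics a helper that closes the stream it was given in a finally block.
    static int consumeAndClose(InputStream in) throws IOException {
        try {
            return consume(in);
        } finally {
            in.close();
        }
    }

    // Broken: the helper closes the stream, then try-with-resources closes it again.
    static int callerAlsoCloses(byte[] data) throws IOException {
        try (InputStream in = new StrictStream(new ByteArrayInputStream(data))) {
            return consumeAndClose(in);
        }
    }

    // Works: the helper owns closing, so the caller does not wrap the call.
    static int calleeOwnsClose(byte[] data) throws IOException {
        InputStream in = new StrictStream(new ByteArrayInputStream(data));
        return consumeAndClose(in);
    }

    // Works: whoever opens the stream closes it; the helper never does.
    static int callerOwnsClose(byte[] data) throws IOException {
        try (InputStream in = new StrictStream(new ByteArrayInputStream(data))) {
            return consume(in);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(calleeOwnsClose(data)); // prints 5
        System.out.println(callerOwnsClose(data)); // prints 5
        try {
            callerAlsoCloses(data);
        } catch (IOException e) {
            System.out.println("double close: " + e.getMessage()); // prints "double close: Stream Closed"
        }
    }
}
```

Most JDK streams tolerate a second close (java.io.Closeable specifies that closing an already closed stream has no effect), which is why the sketch uses a deliberately strict stream: the bug only surfaces when the underlying stream is not idempotent. Picking one owner for close(), either the helper as in the fix Tim describes or the opener as Luís suggests, avoids the problem either way.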
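For anyone trying to reproduce the report outside the test suite, a minimal Java 11 client performing the multipart upload with Accept: text/plain might look like this. The URL http://localhost:9998/tika/form (Tika Server's default port plus a multipart form path), the POST method and the form field name `upload` are assumptions to check against your deployment; `MultipartTextRequest` and `buildBody` are hypothetical names, and the body builder handles only a single file part.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class MultipartTextRequest {
    // Builds a multipart/form-data body containing a single file part.
    static byte[] buildBody(String boundary, String fieldName, String fileName, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String header = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n";
        out.writeBytes(header.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: MultipartTextRequest <file>");
            return;
        }
        Path file = Path.of(args[0]);
        String boundary = "tika-boundary-1234";
        byte[] body = buildBody(boundary, "upload", file.getFileName().toString(),
                Files.readAllBytes(file));

        // Assumed endpoint: local Tika Server, default port, multipart form path.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:9998/tika/form"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .header("Accept", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        // With the bug described above, the text may be followed by the XMLFault.
        System.out.println(response.body());
    }
}
```

Building the body by hand keeps the sketch dependency-free; a real client would more likely use a multipart helper library, and the fixed boundary string is fine only because the uploaded bytes are not expected to contain it.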
