[ 
https://issues.apache.org/jira/browse/TIKA-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531981
 ] 

Rida Benjelloun commented on TIKA-35:
-------------------------------------

Keith,
I noticed this problem. I will force the entire stream to be read before
calling the rewind() method.
Thanks for the suggestion.
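
A minimal sketch of that idea, assuming the whole document fits in memory. The
class and method names below are hypothetical and are not taken from the
attached RereadableInputStream.java; the sketch only shows why draining the
source completely makes a later second pass safe:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the attached RereadableInputStream.
public class RewindSketch {

    // Drain the source completely so that no bytes are lost when a
    // consumer stops reading early and a second pass is needed.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream(
                "header+body".getBytes(StandardCharsets.UTF_8));
        byte[] all = readFully(source);

        // First pass (e.g. property extraction) may read only part of the data.
        InputStream first = new ByteArrayInputStream(all);
        first.read(new byte[6]);

        // Second pass (e.g. full-text extraction) starts again at byte 0.
        InputStream second = new ByteArrayInputStream(all);
        System.out.println(new String(readFully(second), StandardCharsets.UTF_8));
    }
}
```

Each pass gets its own stream over the same buffered bytes, so a partial read
in the first pass cannot leave the second pass short of data. For large files,
a temporary file would be a more reasonable backing store than a byte array.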


> Extract MsOffice properties
> ---------------------------
>
>                 Key: TIKA-35
>                 URL: https://issues.apache.org/jira/browse/TIKA-35
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 0.1-incubator
>            Reporter: Rida Benjelloun
>            Assignee: Rida Benjelloun
>             Fix For: 0.1-incubator
>
>         Attachments: RereadableInputStream.java, 
> RereadableInputStreamTest.java, tika35.patch, tika35.patch
>
>
> Hi,
> I have developed a patch that allows extraction of MsOffice properties. I was
> not able to extract both the MsOffice properties and the full text from a
> single input stream; I always get this error:
> java.io.IOException: Unable to read entire header; -1 bytes read;
> expected 512 bytes.
> I don't know how this is made to work in Nutch (any ideas?).
> To get it working, I added a "filePath" variable to the parser class, and I
> populate it from the ParseUtils class. I then create an InputStream from the
> filePath or URL and use it to extract the properties, and I use the default
> input stream to extract the full text.
> I haven't committed this modification; I would like to hear your opinions first.
> Regards.

