[jira] Updated: (TIKA-262) ParsingReader does not parse metadata for larger MS Office documents

Daan de Wit (JIRA) Fri, 17 Jul 2009 06:47:38 -0700

     [ https://issues.apache.org/jira/browse/TIKA-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Daan de Wit updated TIKA-262:
-----------------------------

    Attachment: OfficeParser.java.patch

It seems that Word reorders the entries so that, for larger documents, the 
content entry comes before the summary information entry. Attached is a naive 
fix to OfficeParser.java that handles this.
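
For readers without the attachment, here is a minimal sketch of the general
idea of not depending on entry order. It is not the attached patch, and it
does not use the Tika or POI APIs. The class EntryOrderSketch and the method
metadataFirst are hypothetical names made up for illustration. The only
assumptions taken from the file format are the entry names
"\005SummaryInformation" and "WordDocument".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: entries are modelled as plain names,
// not as real POIFS directory entries.
public class EntryOrderSketch {
    // Name of the summary information entry in an OLE2 compound document.
    static final String SUMMARY_ENTRY = "\005SummaryInformation";

    /** Returns the entry names in handling order: summary entry first. */
    static List<String> metadataFirst(List<String> entriesInFileOrder) {
        List<String> ordered = new ArrayList<>();
        // Pass 1: pick out the summary entry wherever it sits in the file.
        for (String name : entriesInFileOrder) {
            if (SUMMARY_ENTRY.equals(name)) {
                ordered.add(name);
            }
        }
        // Pass 2: everything else, keeping the original relative order.
        for (String name : entriesInFileOrder) {
            if (!SUMMARY_ENTRY.equals(name)) {
                ordered.add(name);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Assumed layout of a larger document: content entry comes first.
        List<String> large = Arrays.asList("WordDocument", "1Table", SUMMARY_ENTRY);
        for (String name : metadataFirst(large)) {
            // Strip the leading control character for readable output.
            System.out.println(name.replace("\005", ""));
        }
    }
}
```

A real parser would apply the same two passes to the directory entries
themselves, so that the metadata is populated before any content is emitted.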

> ParsingReader does not parse metadata for larger MS Office documents
> --------------------------------------------------------------------
>
>                 Key: TIKA-262
>                 URL: https://issues.apache.org/jira/browse/TIKA-262
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.3
>            Reporter: Daan de Wit
>         Attachments: lipsum.doc, OfficeParser.java.patch, 
> tika-0.3_large-ms-office-metadata.patch
>
>
> The ParsingReader should cause the metadata to be extracted before anything 
> is read from the reader. This is not done for certain MS Office files; it 
> seems to be related to the size of the document.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
