Avoid multiple passes over the input stream in Microsoft parsers
----------------------------------------------------------------

                 Key: TIKA-63
                 URL: https://issues.apache.org/jira/browse/TIKA-63
             Project: Tika
          Issue Type: Improvement
          Components: general
            Reporter: Jukka Zitting
            Assignee: Jukka Zitting
             Fix For: 0.1-incubator


The current Excel, Word, and PowerPoint parsers make multiple passes over the
given input stream: first to read document metadata, and then to extract text
content. An input stream can in general be read only once, so this duplicate
consumption should be avoided. The parser classes can instead use a single
POIFSFileSystem instance as the source of both the metadata and the text
content. This fits well because these Office documents are parsed into memory
in any case.
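To illustrate why a single pass matters, here is a minimal, stdlib-only Java
sketch. It does not use Apache POI. The ParsedDocument class is a hypothetical
stand-in for POIFSFileSystem, and readMetadata, extractText, and drain are
illustrative names, not Tika or POI APIs. The sketch shows that a second read
of an already consumed stream returns nothing, while a document buffered once
in memory can serve both the metadata step and the text step.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class SinglePassParser {

    /**
     * Hypothetical stand-in for POIFSFileSystem: the input stream is
     * consumed exactly once, and the bytes are kept in memory so that
     * metadata and text extraction can both work from the same copy.
     */
    static final class ParsedDocument {
        private final byte[] content;

        ParsedDocument(InputStream stream) throws IOException {
            this.content = stream.readAllBytes(); // the only read of the stream
        }

        byte[] content() {
            return content.clone(); // defensive copy for each consumer
        }
    }

    // Hypothetical metadata step: reports the document size in bytes.
    static int readMetadata(ParsedDocument doc) {
        return doc.content().length;
    }

    // Hypothetical text step: decodes the buffered bytes as UTF-8.
    static String extractText(ParsedDocument doc) {
        return new String(doc.content(), StandardCharsets.UTF_8);
    }

    // Reads whatever is left in a raw stream, to show it cannot be re-read.
    static String drain(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "Hello, Tika".getBytes(StandardCharsets.UTF_8);

        // Two passes over one raw stream: the second pass sees nothing.
        InputStream raw = new ByteArrayInputStream(bytes);
        System.out.println("first pass:  \"" + drain(raw) + "\"");
        System.out.println("second pass: \"" + drain(raw) + "\"");

        // One pass into memory, then any number of consumers.
        ParsedDocument doc = new ParsedDocument(new ByteArrayInputStream(bytes));
        System.out.println("size: " + readMetadata(doc));
        System.out.println("text: " + extractText(doc));
    }
}
```

In the real parsers the in-memory representation would be the POIFSFileSystem
itself rather than a raw byte array, but the design choice is the same: pay
for one full read of the stream, then let every consumer work from memory.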
