Nick Burch created TIKA-2346:
--------------------------------

             Summary: Allow Office format parsers to exclude parsing shapes
                 Key: TIKA-2346
                 URL: https://issues.apache.org/jira/browse/TIKA-2346
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.14
            Reporter: Nick Burch
             Fix For: 1.15


The Office format parsers support including or excluding deleted text and 
moved text. It would be good to support a similar option for shape-based 
text, though probably not for PPT / PPTX, where almost all text is shape-based!
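
As a rough illustration of the kind of switch being requested, here is a
minimal, self-contained Java sketch. It does not use Tika; every name in it
(ShapeTextSketch, Config, Run, Kind, includeShapeBasedContent and the other
flags, and the default values) is hypothetical and is not taken from Tika's
API. It only shows a shape-text flag sitting alongside the deleted-text and
moved-text flags and being consulted while text is collected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these types or flags are Tika classes.
class ShapeTextSketch {

    // Where a piece of text came from in the document.
    enum Kind { BODY, DELETED, MOVED_FROM, SHAPE }

    // A piece of extracted text tagged with its origin.
    static final class Run {
        final Kind kind;
        final String text;

        Run(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Assumed config object: one boolean per optional kind of text.
    // The default values here are illustrative assumptions.
    static final class Config {
        boolean includeDeletedContent = false;
        boolean includeMoveFromContent = false;
        boolean includeShapeBasedContent = true; // the proposed new option
    }

    // Decide whether a run should be emitted under the given config.
    static boolean wanted(Run run, Config config) {
        switch (run.kind) {
            case DELETED:
                return config.includeDeletedContent;
            case MOVED_FROM:
                return config.includeMoveFromContent;
            case SHAPE:
                return config.includeShapeBasedContent;
            default:
                return true; // body text is always kept
        }
    }

    // Collect the text of all wanted runs, separated by single spaces.
    static String extract(List<Run> runs, Config config) {
        List<String> kept = new ArrayList<>();
        for (Run run : runs) {
            if (wanted(run, config)) {
                kept.add(run.text);
            }
        }
        return String.join(" ", kept);
    }

    // Small made-up document with one run of each kind.
    static List<Run> sampleRuns() {
        return Arrays.asList(
                new Run(Kind.BODY, "Body text"),
                new Run(Kind.DELETED, "old wording"),
                new Run(Kind.MOVED_FROM, "relocated sentence"),
                new Run(Kind.SHAPE, "Text in a shape"));
    }

    public static void main(String[] args) {
        Config config = new Config();
        System.out.println(extract(sampleRuns(), config));

        // Turn shape-based text off, as the issue proposes allowing.
        config.includeShapeBasedContent = false;
        System.out.println(extract(sampleRuns(), config));
    }
}
```

With the assumed defaults, the first call keeps the body and shape text; after
the flag is cleared, only the body text remains. A presentation parser could
simply ignore such a flag, in line with the PPT / PPTX caveat above.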

(This has been done hackily in the Alfresco fork of Tika at
https://github.com/Alfresco/tika/commit/32aca3fd96816ad49b869a82c9ba0f02265f8744
but it would be good to do properly)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
