Nick Burch created TIKA-2346:
--------------------------------

             Summary: Allow Office format parsers to exclude parsing shapes
                 Key: TIKA-2346
                 URL: https://issues.apache.org/jira/browse/TIKA-2346
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.14
            Reporter: Nick Burch
             Fix For: 1.15

The Office format parsers support including or excluding deleted text and moved text. It would be good to support something similar for shape-based text, though probably not for PPT / PPTX, as that's almost all shape-based!

(This has been done hackily in the Alfresco fork of Tika at https://github.com/Alfresco/tika/commit/32aca3fd96816ad49b869a82c9ba0f02265f8744 but it would be good to do it properly.)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)