[ https://issues.apache.org/jira/browse/TIKA-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-4219.
-------------------------------
    Fix Version/s: 2.9.2
       Resolution: Fixed

> Figure out what to do with epubs with encrypted non-core content
> ----------------------------------------------------------------
>
>                 Key: TIKA-4219
>                 URL: https://issues.apache.org/jira/browse/TIKA-4219
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>             Fix For: 2.9.2
>
>
> On TIKA-4218, we noticed several epubs that were now being identified as
> encrypted, which is good. We did this work on TIKA-4176.
> On the other hand, we found several epubs that were now identified as
> encrypted but which had content before we were doing the encryption detection.
> The issue in at least one file that I reviewed is that non-core content is
> encrypted -- the fonts. So, from a text+metadata extraction, we could still
> get all the content and then throw an Encrypted Exception or maybe flag
> something as encrypted.
> I'm not sure what the best thing to do is in this case.
> An example file is here:
> http://corpora.tika.apache.org/base/docs/commoncrawl3/47/47WOSBEUHE6CRMVDFBOOHUD36FEQAZ6T

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
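
The "only the fonts are encrypted" case described in the issue can be illustrated with a minimal sketch. This is not Tika's code (Tika is written in Java) and not necessarily how the issue was resolved in 2.9.2. It assumes the epub declares its encrypted resources in META-INF/encryption.xml using XML Encryption CipherReference elements, as the EPUB container format specifies. The function names and the list of font extensions are hypothetical, chosen only for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the W3C XML Encryption elements used in META-INF/encryption.xml.
XMLENC_NS = "http://www.w3.org/2001/04/xmlenc#"

# Assumption for this sketch: a resource counts as "non-core" if it looks like a font.
FONT_EXTENSIONS = (".otf", ".ttf", ".woff", ".woff2")


def encrypted_resources(epub_bytes):
    """Return the URIs listed as encrypted in META-INF/encryption.xml, if any."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        if "META-INF/encryption.xml" not in zf.namelist():
            return []
        root = ET.fromstring(zf.read("META-INF/encryption.xml"))
    tag = "{%s}CipherReference" % XMLENC_NS
    return [ref.get("URI") for ref in root.iter(tag) if ref.get("URI")]


def only_fonts_encrypted(epub_bytes):
    """True when something is encrypted and every encrypted resource is a font."""
    uris = encrypted_resources(epub_bytes)
    return bool(uris) and all(u.lower().endswith(FONT_EXTENSIONS) for u in uris)
```

A caller could use a check like only_fonts_encrypted to decide between extracting the text and flagging the file as partially encrypted, versus treating the whole file as encrypted. Matching on file extension is a simplification; the manifest media types in the package document would be a more reliable signal.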