[ https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martijn van Groningen updated TIKA-402:
---------------------------------------

    Attachment: iwork.patch
                testKeynote.key

I couldn't find a Java library that parses Keynote presentations, so I have made an initial patch that does. It is a work in progress, and I was hoping to get some feedback.

The attached presentation is a Keynote version 5 presentation (but it uses Keynote format version 2.x).

The patch works; I have tested it via the Tika CLI. Two tests are also included in the patch: one for parsing and one for auto-detection. I have attached the test file separately, because binary files can't be included in a patch. The Keynote file should be placed in the test-documents package in the parsers module's resource directory.

Older Keynote format versions (1.x) are not supported yet, because the format is different. Also, if I remember correctly, a Keynote file in that older format is a directory and not a compressed file.

Support for Pages is not yet included.

> Support for Keynote and Pages documents
> ---------------------------------------
>
>                 Key: TIKA-402
>                 URL: https://issues.apache.org/jira/browse/TIKA-402
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>         Attachments: iwork.patch, testKeynote.key
>
>
> It would be nice to have support for documents created by Apple's Keynote and
> Pages applications. Both file formats are described in
> http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html.
> I'm not sure if there already are open source parser libraries for these
> formats or if we'd need to directly process the XML content.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.