Hello,
I only recently discovered Tika while playing around with fscrawler to index
my file shares, and I have come across a problem that I can't fix. Tika has
been able to parse Apple iWork files for quite some time, but since Apple split
the iWork suite into three separate apps, each of those apps' files now has its
own media type.
From looking at the code of the class IWorkPackageParser, I have learned that
it declares these media types for iWork files:
    /**
     * This parser handles all iWorks formats.
     */
    private final static Set<MediaType> supportedTypes =
            Collections.unmodifiableSet(new HashSet<MediaType>(Arrays.asList(
                    MediaType.application("vnd.apple.iwork"),
                    IWORKDocumentType.KEYNOTE.getType(),
                    IWORKDocumentType.NUMBERS.getType(),
                    IWORKDocumentType.PAGES.getType()
            )));
However, fscrawler passes the media type application/vnd.apple.keynote to
Tika, which of course doesn't match any parser.
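To make the mismatch concrete, here is a minimal, hypothetical sketch in plain
Java (no Tika dependency) of choosing a parser by exact media-type match. The
class MediaTypeMatch and its methods are made up for illustration and are not
Tika's API, and the declared set is an example value, not the real contents of
IWorkPackageParser. The only point is that a file whose detected type is not in
a parser's declared set never reaches that parser.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical, stdlib-only sketch of selecting a parser by media type.
 * This is NOT Tika's code; the type strings are plain examples.
 */
public class MediaTypeMatch {

    /** Strips parameters such as "; charset=UTF-8" and lower-cases the base type. */
    static String baseType(String mediaType) {
        int semicolon = mediaType.indexOf(';');
        String base = semicolon >= 0 ? mediaType.substring(0, semicolon) : mediaType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** A parser is only considered if it declares the exact base type. */
    static boolean supports(Set<String> declaredTypes, String detectedType) {
        return declaredTypes.contains(baseType(detectedType));
    }

    public static void main(String[] args) {
        // Example declared set containing only the generic iWork type.
        Set<String> declared = Collections.unmodifiableSet(
                new HashSet<>(Arrays.asList("application/vnd.apple.iwork")));

        // The type reported for a Keynote file is not in the set.
        System.out.println(supports(declared, "application/vnd.apple.keynote")); // false

        // Declaring the type routes the file to the parser.
        Set<String> extended = new HashSet<>(declared);
        extended.add("application/vnd.apple.keynote");
        System.out.println(supports(extended, "application/vnd.apple.keynote")); // true
    }
}
```

Note that declaring a type only decides which parser receives the file; whether
that parser can actually read the newer Keynote file format is a separate
question.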
Could the iWork parser be updated to handle Keynote files, or could someone at
least give it a try? Unfortunately, I am not a developer, so I lack the skills
to pull that off myself, but I'd be happy to test a new parser and provide
feedback.
Regards,
Stephan
--
Krebs's 3 Basic Rules for Online Safety
1st - "If you didn't go looking for it, don't install it!"
2nd - "If you installed it, update it."
3rd - "If you no longer need it, remove it."
http://krebsonsecurity.com/2011/05/krebss-3-basic-rules-for-online-safety
Stephan Budach
Head of IT
Jung von Matt AG
Glashüttenstraße 79
D-20357 Hamburg
Tel: +49 40-4321-1353
Fax: +49 40-4321-1114
Internet: http://www.jvm.com
WebEx: https://jvm.webex.com/meet/stephan.budach
Vorstand: Dr. Peter Figge
Vorsitzender des Aufsichtsrates: Dr. Jochen Gutbrod
AG HH HRB 72893
