Hello, 

I recently discovered Tika while experimenting with fscrawler to index my
file shares, and I have run into a problem that I can't fix. Tika has been
able to parse Apple iWork files for quite some time, but since Apple split the
iWork suite into three separate apps, the media type has changed for each of
those now separate file types.


From looking at the code of the class IWorkPackageParser, I have learned that
it defines these supported media types for iWork files:



/**
 * This parser handles all iWorks formats.
 */
private final static Set<MediaType> supportedTypes =
    Collections.unmodifiableSet(new HashSet<MediaType>(Arrays.asList(
        MediaType.application("vnd.apple.iwork"),
        IWORKDocumentType.KEYNOTE.getType(),
        IWORKDocumentType.NUMBERS.getType(),
        IWORKDocumentType.PAGES.getType()
    )));


However, fscrawler sends the following media type to Tika, which triggers no
parser: application/vnd.apple.keynote
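
To illustrate the kind of mismatch described above, here is a minimal Java
sketch of an exact-match lookup of a media type in a set of supported types.
It is not Tika's code: the class MediaTypeLookup and its methods are
hypothetical, and the set holds only the one literal type visible in the
snippet above. The values returned by the three IWORKDocumentType entries are
not shown in this post, so they are left out; if one of them is
application/vnd.apple.keynote, the lookup would succeed and the problem would
lie elsewhere.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Tika or fscrawler.
class MediaTypeLookup {

    // Assumption: only the literal type from the quoted snippet is listed.
    // The IWORKDocumentType.*.getType() values are unknown here and omitted.
    static final Set<String> KNOWN_SUPPORTED = Set.of("application/vnd.apple.iwork");

    // Drop any parameters (text after ';') and lower-case the type,
    // since media type names are case-insensitive.
    static String normalize(String mediaType) {
        int semicolon = mediaType.indexOf(';');
        String base = semicolon >= 0 ? mediaType.substring(0, semicolon) : mediaType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Exact-match check: a type is supported only if its normalized form
    // is a member of the given set.
    static boolean isSupported(Set<String> supported, String mediaType) {
        return supported.contains(normalize(mediaType));
    }

    public static void main(String[] args) {
        String reported = "application/vnd.apple.keynote";
        System.out.println(reported + " supported: "
                + isSupported(KNOWN_SUPPORTED, reported));
    }
}
```

With only the literal entry in the set, the reported Keynote type is not
found, which matches the behaviour described above.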


Can the iWork parser be updated to handle Keynote files, or at least attempt
to parse them? Unfortunately, I am not a developer, so I lack the skills to do
this myself, but I would be happy to try a new parser and provide feedback.


Regards, 
Stephan 
-- 



Stephan Budach 
Head of IT 
Jung von Matt AG 
Glashüttenstraße 79 
D-20357 Hamburg 


