Hello :-)

FYI, here is a list of apparent Tika 0.8 conversion failures when run from Xapian's omindex on a Debian 6 Squeeze 64-bit system with 4 GB memory:

doc files:  tried: 10268, failed: 345  3.35%
docx files: tried: 248,   failed: 0
odp files:  tried: 7,     failed: 0
ods files:  tried: 71,    failed: 0
odt files:  tried: 136,   failed: 0
pdf files:  tried: 3888,  failed: 150  3.85%
pps files:  tried: 29,    failed: 3    10.34%
ppsx files: tried: 12,    failed: 0
ppt files:  tried: 331,   failed: 0
pptx files: tried: 24,    failed: 0
rtf files:  tried: 698,   failed: 1    0.14%
xls files:  tried: 3339,  failed: 2    0.05%
xlsx files: tried: 63,    failed: 0

The statistics were generated by searching omindex output for .$ext" failed where $ext was each of the listed extensions in turn.

More information can be supplied on request.

Best
Charles
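
P.S. For anyone wanting to reproduce the counts, here is a minimal Python sketch of the search described above. It is an illustration and not the script actually used: the function names are made up for this example, the "tried" totals are assumed to be obtained separately, and the match is case-sensitive, so files named .DOC would not be counted. The percentages in the list appear to be truncated to two decimals rather than rounded, so the sketch truncates as well.

```python
def count_failures(log_lines, extensions):
    """Count lines containing '.<ext>" failed' for each extension.

    The closing quote in the pattern keeps 'doc' from matching 'docx'.
    """
    counts = {ext: 0 for ext in extensions}
    for line in log_lines:
        for ext in extensions:
            if f'.{ext}" failed' in line:
                counts[ext] += 1
    return counts


def failure_percent(tried, failed):
    """Failure rate as a percentage, truncated (not rounded) to two decimals."""
    if tried == 0:
        return 0.0
    return (failed * 10000 // tried) / 100
```

With the figures above, failure_percent(10268, 345) gives 3.35 and failure_percent(3339, 2) gives 0.05. To run it over a saved log, pass the open file object as log_lines.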
