On 13/08/11 13:29, Jukka Zitting wrote:
> Hi,
>
> Sorry about that. The snapshot builds I pointed to are automatically
> built binaries from the latest source tree, so you can find the latest
> tika-app.jar from there.
>
> Jukka Zitting

Hello :)

That was a lot more convenient. Is there some benefit in building from sources?

Here's the output for essentially the same document collation (a few are added each day) using tika-app-1.0-20110802.120407-81.jar:

doc files: tried: 10202, failed: 152 1.48%
docx files: tried: 252, failed: 0
odp files: tried: 6, failed: 0
ods files: tried: 71, failed: 0
odt files: tried: 135, failed: 0
pdf files: tried: 3859, failed: 112 2.90%
pps files: tried: 30, failed: 3 10.00%
ppsx files: tried: 12, failed: 0
ppt files: tried: 329, failed: 3 .91%
pptx files: tried: 24, failed: 0
rtf files: tried: 691, failed: 1 .14%
xls files: tried: 3313, failed: 35 1.05%
xlsx files: tried: 63, failed: 0

For comparison here is the Tika 0.8 equivalent as posted on 9aug11:

doc files: tried: 10268, failed: 345 3.35%
docx files: tried: 248, failed: 0
odp files: tried: 7, failed: 0
ods files: tried: 71, failed: 0
odt files: tried: 136, failed: 0
pdf files: tried: 3888, failed: 150 3.85%
pps files: tried: 29, failed: 3 10.34%
ppsx files: tried: 12, failed: 0
ppt files: tried: 331, failed: 0
pptx files: tried: 24, failed: 0
rtf files: tried: 698, failed: 1 .14%
xls files: tried: 3339, failed: 2 .05%
xlsx files: tried: 63, failed: 0

Best

Charles
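
For anyone wanting to check the percentages: the figures in both lists appear to be truncated to two decimals rather than rounded (152 of 10202 is about 1.4899%, listed as 1.48%). Below is a minimal Python sketch of that calculation. The function name `failure_pct` is hypothetical; it is not part of Tika, nor of whatever script produced the output above, and it prints a leading zero where the original output shows none.

```python
def failure_pct(tried: int, failed: int) -> str:
    """Return failed/tried as a percentage, truncated to two decimals."""
    if tried == 0:
        return "0.00"
    # Work in hundredths of a percent with integer division so the
    # result is truncated, not rounded.
    hundredths = failed * 10000 // tried
    return f"{hundredths // 100}.{hundredths % 100:02d}"


# A few (tried, failed) pairs copied from the snapshot-build run above.
snapshot_run = {
    "doc": (10202, 152),
    "pdf": (3859, 112),
    "xls": (3313, 35),
}

for ext, (tried, failed) in snapshot_run.items():
    print(f"{ext} files: tried: {tried}, failed: {failed} {failure_pct(tried, failed)}%")
```

Integer arithmetic is used here only to make the truncation explicit; ordinary rounding would give 1.49% for doc and 1.06% for xls instead of the listed values.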
