On 13/08/11 13:29, Jukka Zitting wrote:
> Hi,
> 

> Sorry about that. The snapshot builds I pointed to are automatically
> built binaries from the latest source tree, so you can find the latest
> tika-app.jar from there.

> Jukka Zitting

Hello :)

That was a lot more convenient.  Is there some benefit in building from
sources?

Here's the output for essentially the same document collection (a few are
added each day) using tika-app-1.0-20110802.120407-81.jar:

 doc files: tried: 10202, failed: 152  1.48%
docx files: tried:   252, failed:   0
 odp files: tried:     6, failed:   0
 ods files: tried:    71, failed:   0
 odt files: tried:   135, failed:   0
 pdf files: tried:  3859, failed: 112  2.90%
 pps files: tried:    30, failed:   3 10.00%
ppsx files: tried:    12, failed:   0
 ppt files: tried:   329, failed:   3   .91%
pptx files: tried:    24, failed:   0
 rtf files: tried:   691, failed:   1   .14%
 xls files: tried:  3313, failed:  35  1.05%
xlsx files: tried:    63, failed:   0
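The thread doesn't say how this summary was produced. Below is a hypothetical Python sketch (the names `failure_pct` and `summary_line` are illustrative, not from any tool mentioned here) that reproduces the line layout. The percentage column appears to be truncated rather than rounded to two decimals: 152/10202 is 1.4899...%, shown as 1.48%. It is printed only when at least one file failed, with the leading zero dropped.

```python
def failure_pct(tried: int, failed: int) -> str:
    """Failure rate as a percentage, truncated (not rounded) to two decimals,
    with the leading zero dropped, e.g. 3/329 -> '.91'."""
    basis_points = failed * 10000 // tried  # integer math avoids float rounding
    whole, frac = divmod(basis_points, 100)
    return f"{whole}.{frac:02d}" if whole else f".{frac:02d}"


def summary_line(ext: str, tried: int, failed: int) -> str:
    """One summary line in the layout used above; the percentage field is
    appended only when at least one file failed."""
    line = f"{ext:>4} files: tried: {tried:5d}, failed: {failed:3d}"
    if failed:
        line += f" {failure_pct(tried, failed):>5}%"
    return line


if __name__ == "__main__":
    # Counts copied from the 1.0 snapshot run above.
    for ext, tried, failed in [("doc", 10202, 152), ("pps", 30, 3),
                               ("ppt", 329, 3), ("docx", 252, 0)]:
        print(summary_line(ext, tried, failed))
```

Integer arithmetic is used so that the truncation matches the posted figures exactly instead of depending on floating-point rounding.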

For comparison, here is the Tika 0.8 equivalent as posted on 9 Aug 2011:

 doc files: tried: 10268, failed: 345  3.35%
docx files: tried:   248, failed:   0
 odp files: tried:     7, failed:   0
 ods files: tried:    71, failed:   0
 odt files: tried:   136, failed:   0
 pdf files: tried:  3888, failed: 150  3.85%
 pps files: tried:    29, failed:   3 10.34%
ppsx files: tried:    12, failed:   0
 ppt files: tried:   331, failed:   0
pptx files: tried:    24, failed:   0
 rtf files: tried:   698, failed:   1   .14%
 xls files: tried:  3339, failed:   2   .05%
xlsx files: tried:    63, failed:   0
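Because the file counts drift slightly between the two runs, failure rates compare better than raw failure counts. The following is a small illustrative Python sketch, not part of the original thread, that computes the percentage-point change per type using only the numbers posted above. On these figures the doc and pdf rates fall, while the xls and ppt rates rise.

```python
# (tried, failed) per file type, copied from the two summaries above.
TIKA_08 = {"doc": (10268, 345), "pdf": (3888, 150), "pps": (29, 3),
           "ppt": (331, 0), "rtf": (698, 1), "xls": (3339, 2)}
SNAPSHOT = {"doc": (10202, 152), "pdf": (3859, 112), "pps": (30, 3),
            "ppt": (329, 3), "rtf": (691, 1), "xls": (3313, 35)}


def rate(tried: int, failed: int) -> float:
    """Failure rate in percent."""
    return 100.0 * failed / tried


def changes(old: dict, new: dict) -> dict:
    """Percentage-point change in failure rate for each type in both runs
    (positive means more failures in the newer run)."""
    return {ext: rate(*new[ext]) - rate(*old[ext])
            for ext in old.keys() & new.keys()}


if __name__ == "__main__":
    for ext, delta in sorted(changes(TIKA_08, SNAPSHOT).items()):
        print(f"{ext:>4}: {delta:+.2f} points")
```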

Best

Charles

