[ANNOUNCE] Apache Tika 1.14 release

2016-11-09 Thread Chris Mattmann
Hi, The Apache Tika project is pleased to announce the release of Apache Tika 1.14. The release contents have been pushed out to the main Apache release site and to the Central sync, so the releases should be available as soon as the mirrors get the syncs. Apache Tika is a toolkit for detecting

Re: Mime type matching: tika-mimetypes.xml

2016-11-09 Thread Nick Burch
On Wed, 9 Nov 2016, Chris Bamford wrote:
> ... Does offset="0:8192" mean match 'Message-ID:' anywhere in the first 8192 bytes?

Yup, that's it. If that is found, and nothing with a priority higher than 50 also matches, it'll return that type. If a higher
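To make the rule described above concrete, here is a minimal Python sketch of offset-range matching with priorities. It is not Tika's implementation: `MagicRule` and `detect` are hypothetical names, and the sketch assumes that the range bounds the position where the pattern may start (inclusive at both ends) and that the highest-priority matching rule wins.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MagicRule:
    """Hypothetical stand-in for one magic entry; not Tika's actual class."""
    mime_type: str
    pattern: bytes
    start: int          # first offset at which the pattern may begin
    end: int            # last offset at which the pattern may begin (assumed inclusive)
    priority: int = 50  # 50 is the priority mentioned in the thread

    def matches(self, data: bytes) -> bool:
        # bytes.find(sub, lo, hi) only reports hits lying wholly inside data[lo:hi],
        # so extend hi by the pattern length to allow a match that starts at `end`.
        hi = self.end + len(self.pattern)
        return data.find(self.pattern, self.start, hi) != -1


def detect(data: bytes, rules: List[MagicRule]) -> Optional[str]:
    """Return the type of the highest-priority matching rule, or None."""
    hits = [rule for rule in rules if rule.matches(data)]
    if not hits:
        return None
    return max(hits, key=lambda rule: rule.priority).mime_type


if __name__ == "__main__":
    eml = MagicRule("message/rfc822", b"Message-ID:", 0, 8192)
    sample = b"Received: by example\r\n" * 10 + b"Message-ID: <1@example>\r\n"
    print(detect(sample, [eml]))
```

Under these assumptions, a file whose `Message-ID:` header begins after offset 8192 would not match the rule at all, which is one thing to check when a message with a long run of leading headers is not detected.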

Mime type matching: tika-mimetypes.xml

2016-11-09 Thread Chris Bamford
Hi, I was wondering exactly what this syntax means in tika-mimetypes.xml ... Does offset="0:8192" mean match 'Message-ID:' anywhere in the first 8192 bytes? If so, I'm not sure it is working properly, as I have some eml files with this string near the