On Jan 26, 2009, at 12:05 PM, Jukka Zitting wrote:


Tika's mime type detection routinely fails on fairly common files. For instance, for every GIF I've tried, Tika returns application/octet-stream rather
than image/gif.

Hmm, you're right. As you noted, the proper configuration for GIF is
missing. I'll fix that in TIKA-192.
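For context on what the missing entry has to recognize: GIF files begin with the 6-byte ASCII signature GIF87a or GIF89a, so a magic match at offset 0 is enough to separate them from generic binary data. The Java sketch below shows that check in isolation. GifMagic, isGif and detect are hypothetical names, not Tika's API, and the fallback to application/octet-stream only mirrors the result reported above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not Tika's API: checks the 6-byte GIF signature
// ("GIF87a" or "GIF89a") that a magic rule for image/gif would match at offset 0.
final class GifMagic {
    private static final byte[] GIF87A = "GIF87a".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] GIF89A = "GIF89a".getBytes(StandardCharsets.US_ASCII);

    static boolean isGif(byte[] header) {
        if (header == null || header.length < 6) {
            return false; // too short to hold the signature
        }
        byte[] prefix = Arrays.copyOf(header, 6);
        return Arrays.equals(prefix, GIF87A) || Arrays.equals(prefix, GIF89A);
    }

    // Falls back to the generic binary type when no magic matches.
    static String detect(byte[] header) {
        return isGif(header) ? "image/gif" : "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] gif = "GIF89a\u0001\u0000".getBytes(StandardCharsets.ISO_8859_1);
        byte[] other = {0x00, 0x01, 0x02};
        System.out.println(detect(gif));   // image/gif
        System.out.println(detect(other)); // application/octet-stream
    }
}
```

A real detector would also consult globs such as *.gif; the sketch covers only the magic-byte half of the job.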

I was just about to go through and add a bunch of the missing magic from the libmagic files for my own application, so I can just go ahead and mail out mine when it's done. (Don't worry, I'm not going to load it up with things like Xenix core files and Apple ][ NuFiles.)

I was also going to hack some of the parsers to get better-quality metadata out of them. For instance, the MP3 parser doesn't handle ID3v2. If/when I do that, I'll submit a patch.
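For reference on the ID3v2 gap: an ID3v2 tag starts with a 10-byte header made of the ASCII bytes "ID3", a major version byte, a revision byte, a flags byte, and a 4-byte "syncsafe" size in which only the low 7 bits of each byte carry data. The minimal Java sketch below reads just that header. Id3v2Header and parse are hypothetical names, not part of Tika's MP3 parser, and decoding the frames that follow is left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Tika's MP3 parser: decodes the fixed 10-byte
// header that starts every ID3v2 tag.
final class Id3v2Header {
    final int majorVersion; // e.g. 3 for ID3v2.3, 4 for ID3v2.4
    final int revision;
    final int flags;
    final int tagSize;      // tag size excluding the 10-byte header (and footer, if any)

    private Id3v2Header(int majorVersion, int revision, int flags, int tagSize) {
        this.majorVersion = majorVersion;
        this.revision = revision;
        this.flags = flags;
        this.tagSize = tagSize;
    }

    // Returns null when the buffer does not start with a plausible ID3v2 header.
    static Id3v2Header parse(byte[] b) {
        if (b == null || b.length < 10) {
            return null;
        }
        if (b[0] != 'I' || b[1] != 'D' || b[2] != '3') {
            return null;
        }
        int major = b[3] & 0xFF;
        int minor = b[4] & 0xFF;
        if (major == 0xFF || minor == 0xFF) {
            return null; // version bytes are never 0xFF
        }
        int size = 0;
        for (int i = 6; i < 10; i++) {
            int v = b[i] & 0xFF;
            if ((v & 0x80) != 0) {
                return null; // syncsafe bytes always have bit 7 clear
            }
            size = (size << 7) | v; // 7 payload bits per byte
        }
        return new Id3v2Header(major, minor, b[5] & 0xFF, size);
    }

    public static void main(String[] args) {
        byte[] tagged = {'I', 'D', '3', 3, 0, 0, 0x00, 0x00, 0x02, 0x01};
        Id3v2Header h = parse(tagged);
        System.out.println("ID3v2." + h.majorVersion + ", tag size " + h.tagSize); // ID3v2.3, tag size 257
        byte[] plain = "no tag here".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parse(plain)); // null
    }
}
```

The syncsafe encoding exists so the size bytes can never look like an MPEG frame sync, which is why a byte with bit 7 set is treated as "not a header" here.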

but apparently the config file is silently failing to be loaded, or being ignored, or AutoDetectParser's mime detector isn't correctly checking the globs or something, none of which makes any sense. Something should either
write to stderr or throw an exception if this was the case.

Hmm, I'll look into that.

I had stuck an x at the very beginning of the file when I wrote it out with xemacs.
Four days for a single one-byte error.   I suck.  :(
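On failing loudly: a strict XML parser does reject a stray byte in front of the XML declaration. The sketch below uses the JDK's javax.xml.parsers DOM API with an error handler that rethrows, so a file with one extra leading character produces a SAXParseException instead of being silently ignored. It only illustrates the kind of behavior asked for earlier in the thread, not how Tika actually loads tika-mimetypes.xml; StrictConfigLoader, load and isWellFormed are hypothetical, and the <mime-info/> document is a stand-in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical loader, not Tika's: parses an XML config and refuses to
// continue when the document is malformed.
final class StrictConfigLoader {

    static Document load(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                System.err.println("XML warning: " + e.getMessage()); // report, but keep going
            }

            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // abort on recoverable errors too
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // e.g. text before the XML declaration
            }
        });
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns false (after reporting to stderr) when the document is not well-formed.
    static boolean isWellFormed(String xml) {
        try {
            load(xml);
            return true;
        } catch (SAXException e) {
            System.err.println("Rejected config: " + e.getMessage());
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><mime-info/>";
        String bad = "x" + good; // one stray byte ahead of the declaration
        System.out.println(isWellFormed(good)); // true
        System.out.println(isWellFormed(bad));  // false
    }
}
```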

I've even tried creating a new tika-config.xml with the full path to my
tika-mimetypes.xml in it

[...]
but that just causes NullPointerExceptions to be thrown.

Where's the NPE coming from? Can you file a bug for that?

Sorry. I tried replicating it just now, and I can't get it to happen anymore. It might have been related to having a bogus XML file.

--
Jonathan Koren
jonat...@soe.ucsc.edu
http://www.soe.ucsc.edu/~jonathan/

