On Jan 26, 2009, at 2:15 PM, Jukka Zitting wrote:
On Mon, Jan 26, 2009 at 11:03 PM, Jonathan Koren <jonat...@soe.ucsc.edu> wrote:

I was also going to hack on some of the parsers to get better-quality metadata out of them. For instance, the MP3 parser doesn't handle ID3v2.
So if/when I do that, I'll submit a patch.

Cool! However, see http://markmail.org/message/rgesbchrufeauxnw for a
discussion of how complex a parser implementation within Tika can
become before it would be better to look for (or create) an external
parser library for that format.


I particularly liked the part where the example given as a good enough parser was the very parser I singled out. :)

So the takeaway is "Don't be PDFBox" and "Don't be afraid to add yet another dependency, if reimplementing is easy"?

I can't imagine that ID3v2 would be that hard to implement, and is v1 even used anymore?
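
For a sense of scale, here is a minimal sketch of just the ID3v2 header check. It is not Tika's MP3 parser, and the class and method names are hypothetical. It assumes the standard 10-byte ID3v2 header: the bytes "ID3", two version bytes, one flags byte, and a 4-byte "synchsafe" size (7 significant bits per byte). Frame parsing and text decoding, which is where the real work would be, are left out.

```java
// Hypothetical helper, not part of Tika: recognizes an ID3v2 header
// at the start of a buffer and decodes the declared tag size.
final class Id3v2HeaderSketch {

    /** Length of the fixed ID3v2 header in bytes. */
    static final int HEADER_LENGTH = 10;

    /**
     * Returns the size of the tag body in bytes (excluding the
     * 10-byte header), or -1 if the buffer does not start with a
     * well-formed ID3v2 header.
     */
    static int tagSize(byte[] data) {
        if (data == null || data.length < HEADER_LENGTH) {
            return -1;
        }
        // Magic bytes "ID3".
        if (data[0] != 'I' || data[1] != 'D' || data[2] != '3') {
            return -1;
        }
        // The major version and revision bytes are never 0xFF.
        if ((data[3] & 0xFF) == 0xFF || (data[4] & 0xFF) == 0xFF) {
            return -1;
        }
        // data[5] holds the flags; ignored in this sketch.
        // The size is "synchsafe": four bytes, high bit always clear,
        // so each byte contributes 7 bits.
        int size = 0;
        for (int i = 6; i < HEADER_LENGTH; i++) {
            int b = data[i] & 0xFF;
            if (b >= 0x80) {
                return -1; // high bit set: not a valid synchsafe byte
            }
            size = (size << 7) | b;
        }
        return size;
    }

    /** Returns the major version (e.g. 3 for ID3v2.3); call only after tagSize() succeeds. */
    static int majorVersion(byte[] data) {
        return data[3] & 0xFF;
    }
}
```

Calling Id3v2HeaderSketch.tagSize() on the first 10 bytes of a file would say whether a v2 tag is present and how many bytes to read (or skip) before the audio frames start.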

--
Jonathan Koren
jonat...@soe.ucsc.edu
http://www.soe.ucsc.edu/~jonathan/

