On Jan 26, 2009, at 2:15 PM, Jukka Zitting wrote:
On Mon, Jan 26, 2009 at 11:03 PM, Jonathan Koren <jonat...@soe.ucsc.edu> wrote:
I was also going to hack some of the parsers to get better-quality
metadata out of them. For instance, the MP3 parser doesn't handle ID3v2.
So if/when I do that, I'll submit a patch.
Cool! However, see http://markmail.org/message/rgesbchrufeauxnw for a
discussion of how complex a parser implementation within Tika can
become before it would be better to look for (or create) an external
parser library for that format.
I particularly liked the part where the example given as a good enough
parser was the very parser I singled out. :)
So the takeaway is "Don't be PDFBox" and "Don't be afraid to add yet
another dependency, if reimplementing is easy"?
I can't imagine that ID3v2 would be that hard to implement, and is v1
even used anymore?
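For a rough sense of where an ID3v2 parser starts, here is a minimal Java sketch that only decodes the fixed 10-byte ID3v2 tag header: the "ID3" magic, major version, revision, flags, and the tag size stored as a 28-bit "synchsafe" integer (four bytes, seven significant bits each). The class and method names are hypothetical and not part of Tika's API. A real parser would still have to walk the frames, handle unsynchronisation and extended headers, and cope with the differences between v2.2, v2.3 and v2.4, which is where most of the actual work lies.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: reads only the 10-byte ID3v2 tag header (not Tika's API). */
public class Id3v2HeaderSketch {

    /** Fields decoded from the fixed 10-byte ID3v2 header. */
    public static final class Header {
        public final int majorVersion; // e.g. 3 for ID3v2.3, 4 for ID3v2.4
        public final int revision;
        public final int flags;        // unsynchronisation, extended header, etc.
        public final int tagSize;      // bytes of tag data following the 10-byte header

        Header(int majorVersion, int revision, int flags, int tagSize) {
            this.majorVersion = majorVersion;
            this.revision = revision;
            this.flags = flags;
            this.tagSize = tagSize;
        }
    }

    /** Decodes a 28-bit synchsafe integer: 4 bytes, only the low 7 bits of each are used. */
    static int synchsafeToInt(byte[] b, int offset) {
        return ((b[offset] & 0x7F) << 21)
             | ((b[offset + 1] & 0x7F) << 14)
             | ((b[offset + 2] & 0x7F) << 7)
             |  (b[offset + 3] & 0x7F);
    }

    /** Returns the parsed header, or null if the data does not begin with an ID3v2 tag. */
    public static Header parse(byte[] data) {
        if (data == null || data.length < 10) {
            return null;
        }
        String magic = new String(data, 0, 3, StandardCharsets.ISO_8859_1);
        if (!"ID3".equals(magic)) {
            return null;
        }
        return new Header(
                data[3] & 0xFF,            // major version
                data[4] & 0xFF,            // revision
                data[5] & 0xFF,            // flags
                synchsafeToInt(data, 6));  // tag size, synchsafe-encoded
    }

    public static void main(String[] args) {
        // Synthetic header: "ID3", v2.3.0, no flags, size bytes 00 00 02 01.
        byte[] sample = {'I', 'D', '3', 3, 0, 0, 0x00, 0x00, 0x02, 0x01};
        Header h = parse(sample);
        System.out.println("ID3v2." + h.majorVersion + ", tag size " + h.tagSize);
    }
}
```

With the synthetic header in main, the size bytes decode to (2 << 7) | 1 = 257, so it prints "ID3v2.3, tag size 257". By contrast, ID3v1 is just a fixed 128-byte block at the end of the file starting with "TAG", which is part of why v1 support is comparatively trivial.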
--
Jonathan Koren
jonat...@soe.ucsc.edu
http://www.soe.ucsc.edu/~jonathan/