Hello,

I'm trying to index MP3 files using the Tika parser.

By adding
LOG.trace(tikaMDName + ": " + tikamd.get(tikaMDName));
to TikaParser.java,

I was able to see that Tika correctly identifies everything I want, such as
artist, album and genre, under the xmpDM names:

2010-08-12 18:59:00,640 DEBUG parse.ParseUtil - Parsing [
http://www.joshwoodward.com/mp3/JoshWoodward-Stickybee.mp3] with
[org.apache.nutch.parse.tika.tikapar...@21ce9f9d]
2010-08-12 18:59:00,640 DEBUG tika.TikaParser - Using Tika parser
org.apache.tika.parser.mp3.Mp3Parser for mime-type audio/mpeg
2010-08-12 18:59:00,656 TRACE tika.TikaParser - Meta tags for
http://www.joshwoodward.com/mp3/JoshWoodward-Stickybee.mp3: base=null,
noCache=false, noFollow=false, noIndex=false, refresh=false,
refreshHref=null
 * general tags:
 * http-equiv tags:

2010-08-12 18:59:00,656 TRACE tika.TikaParser - Getting text...
2010-08-12 18:59:00,656 TRACE tika.TikaParser - Getting title...
2010-08-12 18:59:00,656 TRACE tika.TikaParser - Getting links...
2010-08-12 18:59:00,656 TRACE tika.TikaParser - found 0 outlinks in
http://www.joshwoodward.com/mp3/JoshWoodward-Stickybee.mp3
2010-08-12 18:59:00,656 TRACE tika.TikaParser - xmpDM:releaseDate: 2007
2010-08-12 18:59:00,656 TRACE tika.TikaParser - title: Stickybee
2010-08-12 18:59:00,656 TRACE tika.TikaParser - samplerate: 44100
2010-08-12 18:59:00,656 TRACE tika.TikaParser - xmpDM:album: Dirty Wings
2010-08-12 18:59:00,656 TRACE tika.TikaParser - xmpDM:artist: Josh Woodward
2010-08-12 18:59:00,656 TRACE tika.TikaParser - Author: Josh Woodward
2010-08-12 18:59:00,656 TRACE tika.TikaParser - channels: 2
2010-08-12 18:59:00,656 TRACE tika.TikaParser - xmpDM:genre: Rock
2010-08-12 18:59:00,657 TRACE tika.TikaParser - xmpDM:audioSampleRate: 44100
2010-08-12 18:59:00,657 TRACE tika.TikaParser - xmpDM:logComment:
XXXCommentshttp://www.joshwoodward.com/
2010-08-12 18:59:00,657 TRACE tika.TikaParser - Content-Type: audio/mpeg
2010-08-12 18:59:00,657 TRACE tika.TikaParser - version: MPEG 3 Layer III
Version 1


How can I index these fields in the same way the Creative Commons parser does?
Shouldn't "nutchMetadata.add(tikaMDName, tikamd.get(tikaMDName));" do just
that?
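
To make the question concrete, here is a minimal sketch of what I understand
that line to be doing. Plain Java maps stand in for the Tika and Nutch
metadata objects, and MetadataCopySketch and copyAll are made-up names for
illustration, not Nutch or Tika API. It only shows the copy step; whether the
copied values then end up in the index is exactly the part I am unsure about.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataCopySketch {

    // Hypothetical stand-in for the copy loop: every name/value pair found
    // on the Tika side is added to the Nutch-side metadata under the same name.
    public static Map<String, String> copyAll(Map<String, String> tikaMd) {
        Map<String, String> nutchMd = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : tikaMd.entrySet()) {
            nutchMd.put(entry.getKey(), entry.getValue());
        }
        return nutchMd;
    }

    public static void main(String[] args) {
        // Values taken from the TRACE output above.
        Map<String, String> tikaMd = new LinkedHashMap<>();
        tikaMd.put("xmpDM:artist", "Josh Woodward");
        tikaMd.put("xmpDM:album", "Dirty Wings");
        tikaMd.put("xmpDM:genre", "Rock");

        Map<String, String> nutchMd = copyAll(tikaMd);
        for (Map.Entry<String, String> entry : nutchMd.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```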


Thank you for any insight,
André Ricardo
