Hi Andre,

> I was able to see that Tika nicely identified everything I want, like artist,
> album, genre (using xmpDM), etc.
>
>
> 2010-08-12 18:59:00,656 TRACE tika.TikaParser - Getting text...
> 2010-08-12 18:59:00,656 TRACE tika.TikaParser - Getting title...
> 2010-08-12 18:59:00,656 TRACE tika.TikaParser - Getting links...
> 2010-08-12 18:59:00,656 TRACE tika.TikaParser - found 0 outlinks in http://www.joshwoodward.com/mp3/JoshWoodward-Stickybee.mp3
> 2010-08-12 18:59:00,656 TRACE tika.TikaParser - xmpDM:releaseDate: 2007
> 2010-08-12 18:59:00,656 TRACE tika.TikaParser - title: Stickybee
> 2010-08-12 18:59:00,656 TRACE tika.TikaParser - samplerate: 44100
> 2010-08-12 18:59:00,656 TRACE tika.TikaParser - xmpDM:album: Dirty Wings
> 2010-08-12 18:59:00,656 TRACE tika.TikaParser - xmpDM:artist: Josh Woodward
> 2010-08-12 18:59:00,656 TRACE tika.TikaParser - Author: Josh Woodward
> 2010-08-12 18:59:00,656 TRACE tika.TikaParser - channels: 2
> 2010-08-12 18:59:00,656 TRACE tika.TikaParser - xmpDM:genre: Rock
> 2010-08-12 18:59:00,657 TRACE tika.TikaParser - xmpDM:audioSampleRate: 44100
> 2010-08-12 18:59:00,657 TRACE tika.TikaParser - xmpDM:logComment: XXXCommentshttp://www.joshwoodward.com/
> 2010-08-12 18:59:00,657 TRACE tika.TikaParser - Content-Type: audio/mpeg
> 2010-08-12 18:59:00,657 TRACE tika.TikaParser - version: MPEG 3 Layer III Version 1
>
>
> How can I index these fields in the same way the Creative Commons parser does?
> Shouldn't "nutchMetadata.add(tikaMDName, tikamd.get(tikaMDName));" do just
> that?
>

The TikaParser stores the metadata returned by Tika in the ParseMetadata.
It's not up to the parser to decide what should be indexed; that is the job
of the IndexingFilters. What you need to do is create a new plugin containing
an implementation of IndexingFilter that inspects the parse metadata and
generates the fields accordingly. Have a look at the index-* or
creativecommons plugins for examples of IndexingFilters.
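The core of such a filter is a mapping from parse-metadata keys to index fields. Below is a minimal, self-contained sketch of that mapping only, not a complete Nutch plugin. The class name AudioFieldMapper and the target field names "artist", "album", "genre" and "year" are hypothetical choices for illustration. In a real Nutch 1.x plugin, the input map would come from parse.getData().getParseMeta() and each resulting entry would be added with doc.add(field, value) inside your IndexingFilter's filter() method. That wiring is assumed here and not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of the metadata-to-field mapping an IndexingFilter could perform.
 * Hypothetical: the class name and the target field names are illustrative,
 * not fields that Nutch defines.
 */
class AudioFieldMapper {

  // Parse-metadata keys as they appear in the TikaParser trace log,
  // mapped to hypothetical index field names (insertion order preserved).
  static final Map<String, String> KEY_TO_FIELD = new LinkedHashMap<>();
  static {
    KEY_TO_FIELD.put("xmpDM:artist", "artist");
    KEY_TO_FIELD.put("xmpDM:album", "album");
    KEY_TO_FIELD.put("xmpDM:genre", "genre");
    KEY_TO_FIELD.put("xmpDM:releaseDate", "year");
  }

  /**
   * Copies the recognised keys from the parse metadata into index fields.
   * Keys that are not listed in KEY_TO_FIELD (e.g. "samplerate") are ignored.
   */
  static Map<String, String> toIndexFields(Map<String, String> parseMeta) {
    Map<String, String> fields = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : KEY_TO_FIELD.entrySet()) {
      String value = parseMeta.get(e.getKey());
      // Skip keys Tika did not emit or left empty (e.g. files without tags)
      if (value != null && !value.trim().isEmpty()) {
        fields.put(e.getValue(), value.trim());
      }
    }
    return fields;
  }
}
```

Fed the values from your log (xmpDM:artist, xmpDM:album, xmpDM:genre, xmpDM:releaseDate, plus unrelated keys such as samplerate), this yields artist, album, genre and year fields and drops the rest. Keeping the key list in one table makes it easy to add more xmpDM keys later. Remember that the plugin itself must also be declared in its plugin.xml and enabled through the plugin.includes property.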

HTH

Julien Nioche
-- 
DigitalPebble Ltd

Open Source Solutions for Text Engineering
http://www.digitalpebble.com
