I think you asked the same question a few weeks ago and got an answer on how this could be done:
http://lucene.472066.n3.nabble.com/Images-videos-and-audio-tp2920535p2922882.html

On 23 May 2011 00:18, Felipe Barriga Richards <[email protected]> wrote:
> Hi everyone!
>
> Currently I'm submitting crawled pages and files to solr using "nutch
> solrindex http://localhost:8983/solr/ ..." and it works. The problem is
> that I need to extract metadata from PDF and MP3 files. To do this I can
> submit the documents _manually_ using curl to solr
> (http://localhost:8983/solr/update/extract).
> Anyone knows how to configure nutch to do this ?
> Maybe with chaining update processors on solr (solrconfig.xml) ?
>
>
> Thanks,
>
> --

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

