You can start from http://wiki.apache.org/solr/ExtractingRequestHandler
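As a minimal sketch of the manual route that page describes, the Python 3 snippet below posts one file's raw bytes to the extract handler and sets the unique key with the `literal.id` parameter. It assumes Solr at http://localhost:8983/solr/ as in the question below. The helper names (`build_extract_url`, `post_to_solr`, etc.) are hypothetical, and this covers only the manual submission, not the Nutch integration.

```python
import mimetypes
import urllib.parse
import urllib.request

# Assumed endpoint, taken from the question below; adjust for your install.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(base_url, doc_id, commit=True):
    """Hypothetical helper: add literal.id (the unique key) and optionally commit=true."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def guess_content_type(path):
    """Guess a Content-Type from the file extension, e.g. application/pdf."""
    ctype, _ = mimetypes.guess_type(path)
    return ctype or "application/octet-stream"


def build_extract_request(path, doc_id, base_url=SOLR_EXTRACT_URL):
    """Build a POST whose body is the raw file bytes, as in curl --data-binary."""
    with open(path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        build_extract_url(base_url, doc_id),
        data=body,
        headers={"Content-Type": guess_content_type(path)},
        method="POST",
    )


def post_to_solr(path, doc_id, base_url=SOLR_EXTRACT_URL):
    """Send the request to a running Solr and return (HTTP status, response body)."""
    req = build_extract_request(path, doc_id, base_url)
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read()
```

For example, `post_to_solr("song.mp3", "song1")` would ask Solr to extract the text and metadata from that file and index it under id `song1`. Committing on every request keeps the sketch simple; for bulk loads you would normally commit once at the end.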
On Mon, May 23, 2011 at 4:48 AM, Felipe Barriga Richards wrote:
> Hi everyone!
>
> Currently I'm submitting crawled pages and files to solr using "nutch
> solrindex http://localhost:8983/solr/ ..." and it works. The problem is
> that I need to extract metadata from PDF and MP3 files. To do this I can
> submit the documents _manually_ using curl to solr
> (http://localhost:8983/solr/update/extract).
> Does anyone know how to configure nutch to do this?
> Maybe with chaining update processors on solr (solrconfig.xml)?
>
> Thanks,

