A good starting point is the ExtractingRequestHandler wiki page:
http://wiki.apache.org/solr/ExtractingRequestHandler
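
For reference, here is a rough sketch of what the manual submission to /update/extract could look like as a script instead of curl. It is only an illustration, not a Nutch configuration: the function names, the "attr_" prefix and the document id are made up for the example, and it assumes the handler is enabled at the URL from your message and accepts the raw file as the request body. The parameters literal.id, uprefix and commit are the ones described on the wiki page.

```python
import mimetypes
import urllib.parse
import urllib.request

# Endpoint taken from the original message; adjust for your setup.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(doc_id, base_url=SOLR_EXTRACT_URL, commit=True):
    """Build the /update/extract URL for one document (hypothetical helper).

    literal.id sets the document's id field; uprefix=attr_ maps any
    extracted metadata fields that are not in the schema to attr_*.
    """
    params = {"literal.id": doc_id, "uprefix": "attr_"}
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def post_file(path, doc_id):
    """POST a local file (e.g. a PDF or MP3) as the raw request body."""
    # Guess the MIME type from the file name; fall back to a generic type.
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        build_extract_url(doc_id),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status  # HTTP status code returned by Solr


# Example call (needs a running Solr and an existing file):
#   post_file("example.pdf", "doc1")
```

Whether the extracted metadata ends up in usable fields depends on your schema and on the handler defaults in solrconfig.xml, so treat the parameters above as a starting point only.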


On Mon, May 23, 2011 at 4:48 AM, Felipe Barriga Richards
<[email protected]> wrote:
> Hi everyone!
>
> Currently I'm submitting crawled pages and files to Solr using "nutch
> solrindex http://localhost:8983/solr/ ..." and it works. The problem is
> that I need to extract metadata from PDF and MP3 files. To do this I can
> submit the documents _manually_ to Solr using curl
> (http://localhost:8983/solr/update/extract).
> Does anyone know how to configure Nutch to do this?
> Maybe by chaining update processors in Solr (solrconfig.xml)?
>
>
> Thanks,
>
>
