SolrIndex command

marora Thu, 09 Aug 2012 14:41:22 -0700

Hi there,
I am a new Nutch user. I am using Nutch to crawl and then send the crawl
data to Solr. I have a question about the bin/nutch solrindex command:
which Tika libraries are used for indexing? Are they the Tika libraries
in Nutch, or does Nutch let Solr do the indexing, so that Solr's Tika
libraries are used? I think I read somewhere that Nutch focuses on
crawling and parsing and leaves the indexing to Solr, in which case
Solr's libraries should be the ones used.


Specifically, I am having problems extracting tags, e.g. <H1>, from
PDF files using the Nutch/Solr combination. I expected the extract-contrib
module defined in schema.xml to be used.

Thanks in advance,
Madhvi
