I'm having problems integrating SOLR and NUTCH. I have done the following:

1 - Installed/configured NUTCH, SOLR, and HBase.

2 - The crawl script did not work for me, so I'm using the step-by-step commands.

3 - I ran inject, generate, fetch, and parse and all ran successfully. I'm able 
to see the table in HBase and see the fetch and parse flags set for the entries.

4 - I copied the /conf/schema.xml from the Nutch directory into the SOLR config 
directory and verified that it's using the right schema.xml file. 

5 - I made sure that I updated schema.xml to set the indexed and stored 
properties to true:
<field name="content" type="text" stored="true" indexed="true"/> 

6 - Finally, I started SOLR and tried running bin/nutch solrindex …

SOLR runs without errors (I checked solr.log). However, nothing is loaded into 
SOLR. It reports that the number of documents loaded is 0, and the query *:* 
returns nothing. 
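
For anyone who wants to reproduce the check, here is a minimal sketch of counting the documents in the index from a script. It assumes SOLR is reachable at http://localhost:8983/solr with the default /select handler and JSON output enabled; the base URL and the helper names (select_url, num_found, count_documents) are illustrative only and not part of Nutch or SOLR, so adjust them for your setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL; change host, port, or core name to match your install.
SOLR_BASE = "http://localhost:8983/solr"


def select_url(base=SOLR_BASE, query="*:*"):
    """Build a /select URL that asks only for the match count (rows=0)."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{base}/select?{params}"


def num_found(response_text):
    """Extract numFound from the body of a SOLR JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def count_documents(base=SOLR_BASE):
    """Query SOLR over HTTP and return how many documents match *:*."""
    with urlopen(select_url(base)) as resp:
        return num_found(resp.read().decode("utf-8"))
```

Calling count_documents() after the solrindex step should return a number greater than 0 if anything was indexed; in my case it corresponds to the 0 described above.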

What could be the problem? Any ideas will be appreciated. 

Thanks

Mariam 
