I'm switching to more recent versions of Nutch and Solr, after years of using 
Nutch 1.4 and Solr 3.3.0. I get no results when I index into Solr, and I can't 
tell where this breaks down.

I use these commands:
cd /opt/apache-nutch-1.12/runtime/local
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.121.x86_64
export NUTCH_CONF_DIR=/opt/apache-nutch-1.12/runtime/local/conf/phfaws
bin/crawl urls/phfaws crawl/phfaws 1
bin/nutch solrindex http://localhost:8983/solr/phfaws/ crawl/phfaws/crawldb \
  -linkdb crawl/phfaws/linkdb crawl/phfaws/segments/*

I believe that Nutch is crawling properly, although the crawl folders end up 
only about 25% as large as the ones I produced with Nutch 1.4. I suspect that 
the problem is with the Nutch/Solr integration. My Solr core didn't create a 
schema.xml and instead uses a managed schema. I've copied the schema.xml from 
my Nutch local conf into Solr, but I haven't found any indication that I need 
to do anything more with it.
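
One way to narrow down where this breaks would be to check the crawl side and 
the Solr side separately. The following is only a sketch, assuming the paths 
and core name from the commands above. The function names (crawldb_stats, 
solr_count_url, solr_doc_count) are made up for illustration, and nothing runs 
until a function is called.

```shell
# Assumed locations, taken from the commands earlier in this message.
NUTCH_HOME=/opt/apache-nutch-1.12/runtime/local
SOLR_CORE_URL=http://localhost:8983/solr/phfaws

# Print crawldb statistics (fetched vs. unfetched URL counts) so the
# crawl itself can be judged independently of Solr.
crawldb_stats() {
  (cd "$NUTCH_HOME" && bin/nutch readdb "$1" -stats)
}

# Build a Solr query URL that matches every document but returns no rows,
# so only the total count (numFound) comes back.
solr_count_url() {
  printf '%s/select?q=*:*&rows=0&wt=json\n' "$1"
}

# Ask the core how many documents it currently holds.
solr_doc_count() {
  curl -s "$(solr_count_url "$1")"
}
```

Calling crawldb_stats crawl/phfaws/crawldb and then 
solr_doc_count "$SOLR_CORE_URL" should show whether pages were fetched at all 
and whether any of them reached the core. A numFound of 0 alongside a healthy 
fetched count would point at the indexing step or the schema rather than the 
crawl.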


Chip Calhoun
Digital Archivist
Niels Bohr Library & Archives
American Institute of Physics
One Physics Ellipse
College Park, MD  20740
301-209-3180
https://www.aip.org/history-programs/niels-bohr-library
