I believe I have Nutch set up correctly (Nutch 1.13 with Solr 6.6.0), but
when I crawl a site and index it into Solr, no content seems to reach the
Solr core.  Here are the commands I'm using to fetch web content and push
it to Solr:

mkdir -p /opt/nutch/urls
echo 'http://www.with-impact.com' > /opt/nutch/urls/seed.txt
vi /opt/nutch/conf/regex-urlfilter.txt
# +.
export JAVA_HOME='/etc/alternatives/jre_1.8.0'
/opt/solr/bin/solr create -c nutch_solr_data_core
/opt/nutch/bin/nutch inject crawl/crawldb urls/seed.txt
cd /opt/nutch
/opt/nutch/bin/nutch generate crawl/crawldb crawl/segments
s1=`ls -d /opt/nutch/crawl/segments/2* | tail -1`
/opt/nutch/bin/nutch fetch $s1
/opt/nutch/bin/nutch parse $s1
/opt/nutch/bin/nutch updatedb crawl/crawldb $s1
/opt/nutch/bin/nutch invertlinks crawl/linkdb -dir crawl/segments
/opt/nutch/bin/nutch solrindex http://localhost:8983/solr/nutch_solr_data_core \
    crawl/crawldb/ -linkdb crawl/linkdb/ $s1
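
As an aside, the `s1=` line above depends on segment directories having timestamp names, so that the last name in sorted order is the newest segment. Below is a minimal sketch of that step as a shell function that fails loudly when no segment exists, instead of leaving `$s1` empty for the later commands. The name `latest_segment` is hypothetical and not part of Nutch, and the timestamp naming is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print the newest segment directory.
# Assumes segment names are timestamps (e.g. 20170801123456), so that
# lexicographic order matches chronological order.
latest_segment() {
    segments_dir=$1
    # List directories whose names start with "2", sort them, keep the last.
    newest=$(ls -d "$segments_dir"/2* 2>/dev/null | sort | tail -n 1)
    if [ -z "$newest" ]; then
        echo "no segments found under $segments_dir" >&2
        return 1
    fi
    printf '%s\n' "$newest"
}
```

It could replace the backtick line with something like `s1=$(latest_segment /opt/nutch/crawl/segments) || exit 1`, so that fetch, parse, and the indexing step never run against an empty path.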


Am I missing a step?

I wouldn't mind using Nutch 2 instead, but I haven't found a good tutorial
for Nutch 2/Solr 6 integration.  Can anyone point me to one?

Thanks!
