Hi, I think I managed to address this issue.
What I did was also add
+^http://([a-z0-9]*\.)*apache.org/
to regex-urlfilter.txt in $NUTCH_HOME/conf.
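To illustrate what that line does: the comments in Nutch's default regex-urlfilter.txt describe each rule as a '+' (include) or '-' (exclude) followed by a regex, where the first matching rule decides and a URL that matches no rule is ignored. The Python below is only a hypothetical emulation of those semantics for testing patterns offline, not Nutch's actual RegexURLFilter code. The helper names `parse_rules`/`accepts` and the sample `-^(file|ftp|mailto):` rule are assumptions for the example.

```python
import re

# Hypothetical emulation of regex-urlfilter.txt semantics (not Nutch code):
# each non-comment line is '+' or '-' followed by a regex.
def parse_rules(text):
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    # The first rule whose regex matches anywhere in the URL decides.
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False  # no rule matched: the URL is ignored

RULES = parse_rules(r"""
# sample exclusion rule (assumed for this example)
-^(file|ftp|mailto):
# the line added in this post
+^http://([a-z0-9]*\.)*apache.org/
""")

print(accepts(RULES, "http://nutch.apache.org/"))  # True
print(accepts(RULES, "http://www.example.com/"))   # False
```

Note that the unescaped `.` in `apache.org` matches any character, so `apache\.org` would be a stricter pattern.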
I guess both files, regex-urlfilter.txt AND nutch-site.xml, need to be
updated together in both locations, i.e.
$NUTCH_HOME/conf & $NUTCH_HOME/runtime/local/conf.
Is that correct?
In any case, this was the only change I made, and the crawl worked.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exception-in-thread-main-java-io-IOException-Job-failed-tp3766765p3821757.html
Sent from the Nutch - User mailing list archive at Nabble.com.
