Hi, I think I managed to resolve this issue. What I did was also add +^http://([a-z0-9]*\.)*apache.org/ to regex-urlfilter.txt in $NUTCH_HOME/conf. I guess both files, regex-urlfilter.txt and nutch-site.xml, need to be kept in sync in both locations, i.e. $NUTCH_HOME/conf and $NUTCH_HOME/conf/runtime/local/conf. Is that correct? In any case, this was the only modification I made, and the crawl worked.
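For anyone wanting to sanity-check which URLs that rule lets through, here is a rough Python sketch. It is not Nutch code: the helper name accepted() is made up for illustration, and it assumes the filter treats a leading + as "accept" and looks for the pattern anywhere in the URL (a find-style match), which is my understanding of how the regex URL filter behaves.

```python
import re

# The rule exactly as added to regex-urlfilter.txt; the leading '+' means "accept".
RULE = r"+^http://([a-z0-9]*\.)*apache.org/"

# Split the sign from the regular expression itself.
sign, pattern = RULE[0], RULE[1:]
regex = re.compile(pattern)


def accepted(url: str) -> bool:
    """Hypothetical helper: True if this single '+' rule would accept the URL."""
    # re.search is used on the assumption that the filter does a find-style match.
    return sign == "+" and regex.search(url) is not None


# Try a few sample URLs against the rule.
for url in ("http://nutch.apache.org/", "http://apache.org/", "http://example.com/"):
    print(url, accepted(url))
```

Note that the rule only covers http:// URLs, so an https:// seed would not match it, and the dot in apache.org is unescaped, so it matches any single character there.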

