My *nutch default* contains:

<property>
  <name>db.ignore.external.links</name>
  <value>false</value>
  <description>If true, outlinks leading from a page to external hosts
  will be ignored. This is an effective way to limit the crawl to include
  only initially injected hosts, without creating complex URLFilters.
  </description>
</property>

*seed*
http://feeds.bbci.co.uk/news/business/rss.xml

*regex url filter*
+^http://([a-z0-9]*\.)*feeds.bbci.co.uk/news/business/rss.xml
+^http://([a-z0-9]*\.)*www.bbc.co.uk/news/

*Crawl*
$ bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 2 -topN 70

The crawl does not fetch any www.bbc.co.uk/news pages, even though all the links in http://feeds.bbci.co.uk/news/business/rss.xml point to www.bbc.co.uk/news.

Please let me know where I am going wrong.

Thanks in advance.
Shameema
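
P.S. To rule out the filter itself, the two rules can be sanity-checked outside Nutch. The following is only a rough sketch in Python, not Nutch code. It assumes the filter tries rules in order, that the first rule whose pattern is found in the URL decides, and that a URL matching no rule is rejected. The `accepts` helper and the sample article and sport URLs are made up for illustration.

```python
import re

# Rules copied from the regex url filter above; "+" means accept.
RULES = [
    ("+", r"^http://([a-z0-9]*\.)*feeds.bbci.co.uk/news/business/rss.xml"),
    ("+", r"^http://([a-z0-9]*\.)*www.bbc.co.uk/news/"),
]


def accepts(url):
    """Return True if the first rule found in the URL is an accept rule.

    Assumption: first matching rule wins, and no match means rejected.
    """
    for sign, pattern in RULES:
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    samples = [
        "http://feeds.bbci.co.uk/news/business/rss.xml",  # the seed
        "http://www.bbc.co.uk/news/business-12345678",    # hypothetical article URL
        "http://www.bbc.co.uk/sport/",                    # hypothetical non-news URL
    ]
    for url in samples:
        print(url, "accepted" if accepts(url) else "rejected")
```

If the seed and a typical article link both come out as accepted here, the two rules themselves are probably not what blocks the www.bbc.co.uk/news pages, although this check says nothing about other rules in the filter file or about how the feed is parsed.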

