Hi all,

I have been trying to run a crawl on a couple of different domains using
Nutch:

bin/nutch crawl urls -dir crawled -depth 3

Every time, I get the response:

Stopping at depth=x - no more URLs to fetch.

Sometimes a page or two at the first level get crawled; in most other
cases, nothing gets crawled at all. I don't know whether I have made a
mistake in the crawl-urlfilter.txt file. Here is the relevant part of mine:

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*blogspot.com/

(all other sections of the file still have their default values)

My urllist.txt file contains only one URL:
http://gmailblog.blogspot.com
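
For reference, here is a quick standalone check of the filter pattern
against my seed URL, outside of Nutch. It is only a minimal sketch: it
assumes Nutch's RegexURLFilter applies each accept pattern with
Matcher.find(), which is my understanding but may not be exact.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilterCheck {
    public static void main(String[] args) {
        // The accept pattern from crawl-urlfilter.txt, minus the leading '+'
        Pattern accept = Pattern.compile("^http://([a-z0-9]*\\.)*blogspot.com/");
        String seed = "http://gmailblog.blogspot.com";

        Matcher m = accept.matcher(seed);
        // The pattern requires a trailing '/' after blogspot.com, which the
        // seed URL as listed above does not have, so find() prints false here.
        System.out.println(seed + " matches: " + m.find());
    }
}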

The only site where the crawl seems to work properly is
http://lucene.apache.org

Any suggestions are appreciated.


