Nutch says "No URLs to fetch - check your seed list and URL filters" when
trying to index fmforums.com.
I am using this command:
bin/nutch crawl urls -dir crawl -depth 3 -topN 50
- urls directory contains urls.txt which contains http://www.fmforums.com/
- crawl-urlfilter.txt contains:
# [...] times, to break loops
-.*(/[^/]+)/[^/]+\1/[^/]+\1/
+^http://([a-z0-9]*\.)*fmforums.com/
# skip everything else
-.
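A quick way to sanity-check the filter lines above against the seed URL is to replay them outside Nutch. The sketch below is only an approximation: it assumes the rules are tried top to bottom, that the first rule whose regex is found anywhere in the URL decides (+ accepts, - rejects), and that a URL matching no rule is rejected. It also assumes Python's re module treats these particular patterns the same way Java's regex engine does. The names RULES and check_url are made up for this example and are not part of Nutch.

```python
import re

# Rules copied from the crawl-urlfilter.txt lines quoted above.
# The leading character is the sign (+ accept, - reject); the rest is the regex.
RULES = [
    r"-.*(/[^/]+)/[^/]+\1/[^/]+\1/",
    r"+^http://([a-z0-9]*\.)*fmforums.com/",
    r"-.",
]


def check_url(url, rules=RULES):
    """Return (accepted, deciding_rule) for url.

    Assumption: first matching rule wins, and a URL that matches
    no rule at all is rejected (deciding_rule is then None).
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+", rule
    return False, None


if __name__ == "__main__":
    for url in (
        "http://www.fmforums.com/",   # the seed from urls.txt
        "http://fmforums.com/",       # same host without "www."
        "http://example.org/",        # unrelated host
    ):
        accepted, rule = check_url(url)
        print(url, "accepted" if accepted else "rejected", "by", rule)
```

Under these assumptions the seed URL is accepted by the second rule, so the three rules shown here would not by themselves explain the "No URLs to fetch" message. It may be worth checking any other URL filter file the crawl actually reads.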
arkadi.kosmy...@csiro.au wrote on 2010-04-20 4:49 PM:
What is in your regex-urlfilter.txt?
-----Original Message-----
From: joshua paul [mailto:jos...@neocodesoftware.com]
Sent: Wednesday, 21 April 2010 9:44 AM
To: nutch-user@lucene.apache.org
Subject: nutch says No URLs to fetch - check your seed list and URL
filters when trying to index fmforums.com
nutch says No URL