YES - I forgot to include that... robots.txt is fine. It is wide open:
###
#
# sample robots.txt file for this website
#
# addresses all robots by using wild card *
User-agent: *
#
# list folders robots are not allowed to index
#Disallow: /tutorials/404redirect/
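With the only Disallow commented out there are no active restrictions, so it is equivalent to the minimal wide-open form:
User-agent: *
Disallow: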
What is in your regex-urlfilter.txt?
-----Original Message-----
From: joshua paul [mailto:jos...@neocodesoftware.com]
Sent: Wednesday, 21 April 2010 9:44 AM
To: nutch-user@lucene.apache.org
Subject: nutch says No URLs to fetch - check your seed list and URL filters when trying to index
after getting this email, I tried commenting out this line in
regex-urlfilter.txt =
#-[...@=]
but it didn't help... I still get the same message - no URLs to fetch
regex-urlfilter.txt =
# skip URLs containing certain characters as probable queries, etc.
-[...@=]
# skip URLs with slash-delimited
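Note that the list archive has mangled the character class above (the @ makes it look like an email address, so it got obfuscated) and cut the last comment short. For comparison, and assuming the stock Nutch 1.x defaults, the relevant part of conf/regex-urlfilter.txt reads:
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/[^/]+)/[^/]+\1/[^/]+\1/
# accept anything else
+.
The final +. accept rule is the one to watch: the regex filter drops any URL that matches no + rule at all, so if it is missing or commented out, every seed gets filtered and the crawl stops with exactly this "No URLs to fetch" message. Commenting out a - rule cannot fix that. Also worth checking: the one-step bin/nutch crawl command on 1.0 reads conf/crawl-urlfilter.txt instead, whose default +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/ line has to be edited to match your site.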
Did you check robots.txt?
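If robots.txt is clean, the next thing I would try is running the seed list through the configured filters directly. Assuming a Nutch 1.x build (and a seed file at urls/seed.txt - adjust the path to yours), something like:
bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined < urls/seed.txt
should print each URL back prefixed with + (accepted) or - (rejected); any seed that comes back with - will never be selected for fetching.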
On Wed, Apr 21, 2010 at 7:57 AM, joshua paul jos...@neocodesoftware.com wrote:
after getting this email, I tried commenting out this line in
regex-urlfilter.txt =
#-[...@=]
but it didn't help... I still get the same message - no URLs to fetch