Re: nutch says No URLs to fetch - check your seed list and URL filters when trying to index fmforums.com

2010-04-21 Thread joshua paul
YES - I forgot to include that... robots.txt is fine. it is wide open: ### # # sample robots.txt file for this website # # addresses all robots by using wild card * User-agent: * # # list folders robots are not allowed to index #Disallow: /tutorials/404redirect/

RE: nutch says No URLs to fetch - check your seed list and URL filters when trying to index fmforums.com

2010-04-20 Thread Arkadi.Kosmynin
What is in your regex-urlfilter.txt? -Original Message- From: joshua paul [mailto:jos...@neocodesoftware.com] Sent: Wednesday, 21 April 2010 9:44 AM To: nutch-user@lucene.apache.org Subject: nutch says No URLs to fetch - check your seed list and URL filters when trying to index

Re: nutch says No URLs to fetch - check your seed list and URL filters when trying to index fmforums.com

2010-04-20 Thread joshua paul
after getting this email, I tried commenting out this line in regex-urlfilter.txt = #-[...@=] but it didn't help... i still get same message - no urls to feth regex-urlfilter.txt = # skip URLs containing certain characters as probable queries, etc. -[...@=] # skip URLs with slash-delimited

Re: nutch says No URLs to fetch - check your seed list and URL filters when trying to index fmforums.com

2010-04-20 Thread Harry Nutch
Did you check robots.txt On Wed, Apr 21, 2010 at 7:57 AM, joshua paul jos...@neocodesoftware.comwrote: after getting this email, I tried commenting out this line in regex-urlfilter.txt = #-[...@=] but it didn't help... i still get same message - no urls to feth regex-urlfilter.txt = #