Thanks for your reply! I hadn't seen any weird exceptions before, when I was using v1.2. With this version I am able to fetch the first page from an HTTPS HTML page, but then it doesn't find any outlinks. I tried the ParserChecker and got the same result.
So the crawl stops after the first round. I have tried changing my filters to allow everything (just to make sure that wasn't the issue), with no change. Another strange thing is that it seems to think I have already fetched the page: I get the "-shouldFetch rejected" message in the logs for the seed URL. I am not sure how it is determining this, since I am using a new directory for each test crawl. I even deleted the temporary Hadoop folders just to be sure, and I got the same result.

--
View this message in context: http://lucene.472066.n3.nabble.com/protocol-httpclient-tp3216821p3218662.html
Sent from the Nutch - User mailing list archive at Nabble.com.