Thanks for your reply!

I had not seen any weird exceptions before, when using it in v. 1.2.  With
this version I am able to fetch the first page from an https HTML page, but
then it doesn't find any outlinks.  I tried the ParserChecker and got the
same results.

So the crawl stops after this first round.  I have tried changing my filters
to allow everything (just to make sure that wasn't the issue), and nothing
changed.

Another strange thing is that it seems to think that I have already fetched
it: I get the "-shouldFetch rejected" message in the logs for the seed url.
I am not sure how it is determining this, since I am using a new directory
for each test crawl.  I even deleted the temporary hadoop folders just to be
sure and I got the same result. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/protocol-httpclient-tp3216821p3218662.html
Sent from the Nutch - User mailing list archive at Nabble.com.
