We are still getting HTTP 407 errors when authenticating with the proxy. I asked the sysadmin to monitor the proxy logs, and they show the following when Nutch tries to crawl:

Aug 26 18:36:30 blrwcg01 content_gateway[10059]: NOTE: [4112998] winauth EVENT_NTLM_LOGON_DENIED ip:10.212.51.13, reason:(NTLM) NA NT_STATUS_WRONG_PASSWORD, log:Got user=[502047] domain=[] workstation=[] len1=24 len2=0#012Login for user []\[502047]@[] failed due to [Wrong Password]

It seems as though the password is not reaching the proxy server correctly. I have set all required proxy parameters correctly in nutch-site.xml. Any clues?

Suresh.

-----Original Message-----
From: Lewis John Mcgibbney [mailto:[email protected]]
Sent: Wednesday, June 05, 2013 11:28 AM
To: [email protected]
Subject: Nutch not crawling fully

Hi,

It is clear that, for the configuration you are running, NTLM is not authenticating properly. I would run the Http class with TRACE logging activated; this will show the credentials you are after.

You should also note the documentation in nutch-default.xml, which explicitly states "NOTE: For NTLM authentication, do not prefix the username with the domain, i.e. 'susam' is correct whereas 'DOMAIN\susam' is incorrect."... looking at your log this does not seem to be the case.

http://svn.apache.org/repos/asf/nutch/trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java

--
*Lewis*
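
For reference, the proxy settings discussed in this thread might look roughly like the fragment below in nutch-site.xml. This is only a sketch: the property names are the ones documented in nutch-default.xml, but every value (proxy.example.com, 8080, susam, the password, EXAMPLEDOMAIN) is a placeholder and not taken from the poster's setup. It also assumes that protocol-httpclient is listed in plugin.includes, since the proxy credentials are only used by that plugin. Per the nutch-default.xml notes, the NTLM domain goes in http.proxy.realm rather than in the username, which may be worth checking given that the proxy log above shows an empty domain.

```xml
<!-- Sketch of proxy-related entries for conf/nutch-site.xml. All values are placeholders. -->
<configuration>
  <property>
    <name>http.proxy.host</name>
    <value>proxy.example.com</value>
  </property>
  <property>
    <name>http.proxy.port</name>
    <value>8080</value>
  </property>
  <property>
    <!-- Bare username only: 'susam', not 'DOMAIN\susam' -->
    <name>http.proxy.username</name>
    <value>susam</value>
  </property>
  <property>
    <name>http.proxy.password</name>
    <value>PLACEHOLDER_PASSWORD</value>
  </property>
  <property>
    <!-- For NTLM, the domain name is given here instead of in the username -->
    <name>http.proxy.realm</name>
    <value>EXAMPLEDOMAIN</value>
  </property>
</configuration>
```

If the password contains characters that are special in XML (such as &amp; or &lt;), they need to be escaped in the value, otherwise the proxy will receive something other than what was intended.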

