I have replaced our actual intranet host name, something like blah.intranet.org, with the string <iis75.intranet>. I use the <> convention when I am obscuring actual data.
What might the log4j.properties entry for httpclient.Http be? I see it is only logging at INFO level, but I do not know the proper object path to set it up.

Thanks,
Bob

>Hi Bob,
>
>Do you write host as <iis75.intranet> or iis75.intranet ?
>
>Kind Regards,
>Furkan KAMACI

-----Original Message-----
From: Bell, Bob
Sent: Wednesday, November 02, 2016 12:17 PM
To: '[email protected]' <[email protected]>
Cc: Bell, Bob <[email protected]>
Subject: Nutch 1.12 NTLM authentication IIS 7.5 Intranet

I have been trying for more than a year to get NTLM to work with IIS 7.5, without success. I was happy to see the recent 1.12 release and thought I would give it a shot again. I am almost at the point where I believe that it does not work with NTLM, or that it does not know how to handle the multiple 401s that are returned, or that I have some fundamental problem somewhere.

I have tried everything I could think of and am at a loss as to how to solve this mystery.

My Nutch server is CentOS 7 in a VirtualBox VM. I am using the httpclient plugin as indicated in the docs, but with no love. I can fetch with anonymous access, but I need NTLM to work.

I am using plugin.includes = protocol-httpclient

nutch-site.xml:

<property>
  <name>http.auth.file</name>
  <value>httpclient-auth.xml</value>
  <description>Authentication configuration file for 'protocol-httpclient' plugin.
  </description>
</property>

httpclient-auth.xml for local user:

<auth-configuration>
  <credentials username="nutch" password="<somepassword>">
    <default scheme="basic" port="80"/>
  </credentials>
</auth-configuration>

Here is the output with a local user account on the server. One thing I notice is that I cannot force authentication to be anything other than NTLM, even though I support NTLM, basic, and digest. Notice the scheme was basic, but it goes through NTLM regardless.

[root@localhost nutch]# nutch parsechecker http://<iis75.intranet>
fetching: http://<iis75.intranet>
Whitelisted hosts: [<iis75.intranet>]
http.proxy.host = null
http.proxy.port = 8080
http.proxy.exception.list = false
http.timeout = 36000
http.content.limit = 65536
http.agent = APL-Nutch-Spider/Nutch-1.12
http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Credentials - username: nutch; set as default for realm: ; scheme: basic
Pre-configured credentials with scope - host: <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet>
Authorization required
Supported authentication schemes in the order of preference: [ntlm, digest, basic]
ntlm authentication scheme selected
Using authentication scheme: ntlm
Authorization challenge processed
Authentication scope: NTLM <any realm>@<iis75.intranet>:80
Credentials required
Credentials provider not available
No credentials available for NTLM <any realm>@<iis75.intranet>:80
url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
401 Authentication Required
Fetch failed with protocol status: access_denied(17), lastModified=0: Authentication required: http://<iis75.intranet>
[root@localhost nutch]#

httpclient-auth.xml for domain user:

<auth-configuration>
  <credentials username="<domainuser>" password="<domainpassword>">
    <default host="<iis75.intranet>" scheme="ntlm" port="80" realm="<domain>"/>
  </credentials>
</auth-configuration>

Note: it doesn’t matter what I put in the host attribute; it doesn’t seem to change anything.

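
One possible explanation for the host attribute having no effect, offered as an assumption rather than a confirmed diagnosis: as far as I recall from the httpclient-auth.xml.template shipped with Nutch, the <default> element only takes realm and scheme, while host and port belong on a separate <authscope> element. That would be consistent with the "Pre-configured credentials with scope ... not found" line in the output. A minimal sketch of that layout, using the same <> placeholders (they must be replaced with real values, since a literal < is not valid inside an XML attribute):

```xml
<auth-configuration>
  <credentials username="<domainuser>" password="<domainpassword>">
    <!-- Assumption: host and port are read from <authscope>, not from <default>. -->
    <!-- Assumption: for scheme="ntlm" the realm value is used as the Windows domain. -->
    <authscope host="<iis75.intranet>" port="80" realm="<domain>" scheme="ntlm"/>
  </credentials>
</auth-configuration>
```

If this is right, the parsechecker output should report that pre-configured credentials were found for the URL instead of "not found". Whether it also gets past the repeated 401 responses is a separate question that this sketch does not answer.
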
[root@localhost nutch]# nutch parsechecker http://<iis75.intranet>
fetching: http://<iis75.intranet>
Whitelisted hosts: [<iis75.intranet>]
http.proxy.host = null
http.proxy.port = 8080
http.proxy.exception.list = false
http.timeout = 36000
http.content.limit = 65536
http.agent = APL-Nutch-Spider/Nutch-1.12
http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Credentials - username: <domainuser>"; set as default for realm: =<domain>; scheme: ntlm
Pre-configured credentials with scope - host: <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet>
Authorization required
Supported authentication schemes in the order of preference: [ntlm, digest, basic]
ntlm authentication scheme selected
Using authentication scheme: ntlm
Authorization challenge processed
Authentication scope: NTLM <any realm>@<iis75.intranet>:80
Retry authentication
Authenticating with NTLM <any realm>@<iis75.intranet>:80
enter NTLMScheme.authenticate(Credentials, HttpMethod)
Authorization required
Using authentication scheme: ntlm
Authorization challenge processed
Authentication scope: NTLM <any realm>@<iis75.intranet>:80
Retry authentication
Authenticating with NTLM <any realm>@<iis75.intranet>:80
enter NTLMScheme.authenticate(Credentials, HttpMethod)
Authorization required
Using authentication scheme: ntlm
Authorization challenge processed
Authentication scope: NTLM <any realm>@<iis75.intranet>:80
Credentials required
Credentials provider not available
Failure authenticating with NTLM <any realm>@<iis75.intranet>:80
url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
401 Authentication Required
Fetch failed with protocol status: access_denied(17), lastModified=0: Authentication required: http://<iis75.intranet>

Last entry in hadoop.log:

2016-11-02 12:08:49,568 INFO parse.ParserChecker - fetching: http://<iis75.intranet>
2016-11-02 12:08:50,040 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
2016-11-02 12:08:50,119 INFO protocol.RobotRulesParser - Whitelisted hosts: [<iis75.intranet>]
2016-11-02 12:08:50,119 INFO httpclient.Http - http.proxy.host = null
2016-11-02 12:08:50,119 INFO httpclient.Http - http.proxy.port = 8080
2016-11-02 12:08:50,119 INFO httpclient.Http - http.proxy.exception.list = false
2016-11-02 12:08:50,119 INFO httpclient.Http - http.timeout = 36000
2016-11-02 12:08:50,119 INFO httpclient.Http - http.content.limit = 65536
2016-11-02 12:08:50,119 INFO httpclient.Http - http.agent = APL-Nutch-Spider/Nutch-1.12 ([email protected])
2016-11-02 12:08:50,120 INFO httpclient.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2016-11-02 12:08:50,120 INFO httpclient.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2016-11-02 12:08:50,133 TRACE httpclient.Http - Credentials - username: <domainuser>; set as default for realm: <domain>; scheme: ntlm
2016-11-02 12:08:50,134 TRACE httpclient.Http - Pre-configured credentials with scope - host: <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet>
2016-11-02 12:08:50,313 DEBUG httpclient.HttpMethodDirector - Authorization required
2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Supported authentication schemes in the order of preference: [ntlm, digest, basic]
2016-11-02 12:08:50,320 INFO auth.AuthChallengeProcessor - ntlm authentication scheme selected
2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
2016-11-02 12:08:50,320 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@<iis75.intranet>:80
2016-11-02 12:08:50,320 DEBUG httpclient.HttpMethodDirector - Retry authentication
2016-11-02 12:08:50,321 DEBUG httpclient.HttpMethodDirector - Authenticating with NTLM <any realm>@<iis75.intranet>:80
2016-11-02 12:08:50,321 TRACE auth.NTLMScheme - enter NTLMScheme.authenticate(Credentials, HttpMethod)
2016-11-02 12:08:50,351 DEBUG httpclient.HttpMethodDirector - Authorization required
2016-11-02 12:08:50,352 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
2016-11-02 12:08:50,352 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@<iis75.intranet>:80
2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Retry authentication
2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Authenticating with NTLM <any realm>@<iis75.intranet>:80
2016-11-02 12:08:50,352 TRACE auth.NTLMScheme - enter NTLMScheme.authenticate(Credentials, HttpMethod)
2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Authorization required
2016-11-02 12:08:50,393 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
2016-11-02 12:08:50,393 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@<iis75.intranet>:80
2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Credentials required
2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Credentials provider not available
2016-11-02 12:08:50,393 INFO httpclient.HttpMethodDirector - Failure authenticating with NTLM <any realm>@<iis75.intranet>:80
2016-11-02 12:08:50,395 TRACE httpclient.Http - url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
2016-11-02 12:08:50,681 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
2016-11-02 12:08:50,804 TRACE httpclient.Http - 401 Authentication Required

Any help is appreciated, as I am about to move on to another spider for Solr.

Thanks,
Bob
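
On the logging question at the top of this thread: the short names in hadoop.log (httpclient.Http, httpclient.HttpMethodDirector, auth.NTLMScheme) look like the last two components of the full logger names. The fragment below is a sketch for conf/log4j.properties under that reading. The package names are inferred from the Nutch protocol-httpclient plugin and Commons HttpClient 3.x, not stated in this thread, so verify them against the jars in your install.

```properties
# Assumed full name behind "httpclient.Http": the Nutch protocol-httpclient plugin.
log4j.logger.org.apache.nutch.protocol.httpclient=TRACE

# Assumed full name behind "httpclient.HttpMethodDirector" and "auth.*":
# the Commons HttpClient 3.x library.
log4j.logger.org.apache.commons.httpclient=DEBUG

# Commons HttpClient 3.x wire log, headers only. This should show the
# WWW-Authenticate and Authorization headers exchanged on each 401 round.
log4j.logger.httpclient.wire.header=DEBUG
```

No appender is named on these lines, so they inherit whatever the root logger writes to. The header wire log will include the encoded NTLM messages, so treat the resulting log file as sensitive.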

