Hi Bob,

The server may require the domain as part of the username, for example "domain\\user". Could you check that?
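For reference, a down-level Windows logon name has the form DOMAIN\user, while an NTLM authenticate message carries the domain and the user name as separate fields, so a client has to separate them at some point. The Java sketch below only illustrates that split. The class DownLevelLogon and the EXAMPLE domain are hypothetical, and this thread does not show how protocol-httpclient actually maps the username attribute onto NTLM fields. Note that the doubled backslash is Java string escaping; backslash is not special in XML, so a value in httpclient-auth.xml would be written with a single one.

```java
import java.util.Objects;

// Hypothetical helper (not part of Nutch or Commons HttpClient): splits a
// Windows down-level logon name such as DOMAIN\user into the separate
// domain and user-name fields that an NTLM authenticate message carries.
public final class DownLevelLogon {

    /** Domain and user name; domain is "" when the input has no prefix. */
    public static final class Parts {
        public final String domain;
        public final String user;

        Parts(String domain, String user) {
            this.domain = domain;
            this.user = user;
        }
    }

    private DownLevelLogon() {}

    public static Parts split(String logonName) {
        Objects.requireNonNull(logonName, "logonName");
        int sep = logonName.indexOf('\\'); // a single backslash character
        if (sep < 0) {
            // No "DOMAIN\" prefix: report an empty domain.
            return new Parts("", logonName);
        }
        String domain = logonName.substring(0, sep);
        String user = logonName.substring(sep + 1);
        if (domain.isEmpty() || user.isEmpty()) {
            throw new IllegalArgumentException("expected DOMAIN\\user but got: " + logonName);
        }
        return new Parts(domain, user);
    }

    public static void main(String[] args) {
        // In Java source the backslash must be escaped, hence "\\".
        Parts p = split("EXAMPLE\\bob");
        System.out.println(p.domain + " / " + p.user); // prints "EXAMPLE / bob"
    }
}
```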
Kind Regards,
Furkan KAMACI

On Wed, Nov 2, 2016 at 9:11 PM, Bell, Bob <[email protected]> wrote:

> I have replaced the real host name: <iis75.intranet> is just a placeholder
> for our actual intranet name, something like blah.intranet.org. I use the
> <> convention when obscuring actual data.
>
> What would the log4j.properties entry for httpclient.Http be? I see it is
> only logging at INFO level, but I do not know the proper object path to
> set it up.
>
> Thanks,
> Bob
>
> >Hi Bob,
> >
> >Do you write the host as <iis75.intranet> or iis75.intranet?
> >
> >Kind Regards,
> >Furkan KAMACI
>
> -----Original Message-----
> From: Bell, Bob
> Sent: Wednesday, November 02, 2016 12:17 PM
> To: '[email protected]' <[email protected]>
> Cc: Bell, Bob <[email protected]>
> Subject: Nutch 1.12 NTLM authentication IIS 7.5 Intranet
>
> I have been trying for more than a year to get NTLM to work with IIS 7.5
> without success. I was happy to see the recent 1.12 release and thought,
> OK, I will give it a shot again. I am almost to the point where I believe
> that it does not work with NTLM, that it does not know how to handle the
> multiple 401s that are returned, or that I have some fundamental problem
> somewhere. I have tried everything I could think of and am at a loss on
> how to solve this mystery. My Nutch server is a CentOS 7 VM in
> VirtualBox. I am using protocol-httpclient as indicated in the docs, but
> with no love. I can fetch anonymously, but I need NTLM to work.
>
> I am using plugin.includes = protocol-httpclient
>
> nutch-site.xml:
> <property>
>   <name>http.auth.file</name>
>   <value>httpclient-auth.xml</value>
>   <description>Authentication configuration file for 'protocol-httpclient'
>   plugin.
>   </description>
> </property>
>
> httpclient-auth.xml for local user:
> <auth-configuration>
>   <credentials username="nutch" password="<somepassword>">
>     <default scheme="basic" port="80"/>
>   </credentials>
> </auth-configuration>
>
> Here is the output with a local user account on the server. One thing I
> notice is that I cannot force authentication to be anything other than
> NTLM, even though I support NTLM, Basic, and Digest. Notice the scheme was
> basic, but it goes through NTLM regardless.
>
> [root@localhost nutch]# nutch parsechecker http://<iis75.intranet>
> fetching: http://<iis75.intranet>
> Whitelisted hosts: [<iis75.intranet>]
> http.proxy.host = null
> http.proxy.port = 8080
> http.proxy.exception.list = false
> http.timeout = 36000
> http.content.limit = 65536
> http.agent = APL-Nutch-Spider/Nutch-1.12
> http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
> http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Credentials - username: nutch; set as default for realm: ; scheme: basic
> Pre-configured credentials with scope - host: <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet>
> Authorization required
> Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> ntlm authentication scheme selected
> Using authentication scheme: ntlm
> Authorization challenge processed
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> Credentials required
> Credentials provider not available
> No credentials available for NTLM <any realm>@<iis75.intranet>:80
> url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
> 401 Authentication Required
> Fetch failed with protocol status: access_denied(17), lastModified=0: Authentication required: http://<iis75.intranet>
> [root@localhost nutch]#
>
> httpclient-auth.xml for domain user:
> <auth-configuration>
>   <credentials username="<domainuser>" password="<domainpassword>
>     <default host="<iis75.intranet>"
>       scheme="ntlm" port="80" realm="<domain>"/>
>   </credentials>
> </auth-configuration>
>
> Note: it doesn't matter what I put in the host; it doesn't seem to change
> anything.
>
> [root@localhost nutch]# nutch parsechecker http://<iis75.intranet>
> fetching: http://<iis75.intranet>
> Whitelisted hosts: [<iis75.intranet>]
> http.proxy.host = null
> http.proxy.port = 8080
> http.proxy.exception.list = false
> http.timeout = 36000
> http.content.limit = 65536
> http.agent = APL-Nutch-Spider/Nutch-1.12
> http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
> http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Credentials - username: <domainuser>"; set as default for realm: =<domain>; scheme: ntlm
> Pre-configured credentials with scope - host: <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet>
> Authorization required
> Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> ntlm authentication scheme selected
> Using authentication scheme: ntlm
> Authorization challenge processed
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> Retry authentication
> Authenticating with NTLM <any realm>@<iis75.intranet>:80
> enter NTLMScheme.authenticate(Credentials, HttpMethod)
> Authorization required
> Using authentication scheme: ntlm
> Authorization challenge processed
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> Retry authentication
> Authenticating with NTLM <any realm>@<iis75.intranet>:80
> enter NTLMScheme.authenticate(Credentials, HttpMethod)
> Authorization required
> Using authentication scheme: ntlm
> Authorization challenge processed
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> Credentials required
> Credentials provider not available
> Failure authenticating with NTLM <any realm>@<iis75.intranet>:80
> url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
> 401 Authentication Required
> Fetch failed with protocol status: access_denied(17), lastModified=0: Authentication required: http://<iis75.intranet>
>
> Last entries in hadoop.log:
>
> 2016-11-02 12:08:49,568 INFO parse.ParserChecker - fetching: http://<iis75.intranet>
> 2016-11-02 12:08:50,040 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
> 2016-11-02 12:08:50,119 INFO protocol.RobotRulesParser - Whitelisted hosts: [<iis75.intranet>]
> 2016-11-02 12:08:50,119 INFO httpclient.Http - http.proxy.host = null
> 2016-11-02 12:08:50,119 INFO httpclient.Http - http.proxy.port = 8080
> 2016-11-02 12:08:50,119 INFO httpclient.Http - http.proxy.exception.list = false
> 2016-11-02 12:08:50,119 INFO httpclient.Http - http.timeout = 36000
> 2016-11-02 12:08:50,119 INFO httpclient.Http - http.content.limit = 65536
> 2016-11-02 12:08:50,119 INFO httpclient.Http - http.agent = APL-Nutch-Spider/Nutch-1.12 ([email protected])
> 2016-11-02 12:08:50,120 INFO httpclient.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
> 2016-11-02 12:08:50,120 INFO httpclient.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> 2016-11-02 12:08:50,133 TRACE httpclient.Http - Credentials - username: <domainuser>; set as default for realm: <domain>; scheme: ntlm
> 2016-11-02 12:08:50,134 TRACE httpclient.Http - Pre-configured credentials with scope - host: <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet>
> 2016-11-02 12:08:50,313 DEBUG httpclient.HttpMethodDirector - Authorization required
> 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> 2016-11-02 12:08:50,320 INFO auth.AuthChallengeProcessor - ntlm authentication scheme selected
> 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
> 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
> 2016-11-02 12:08:50,320 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,320 DEBUG httpclient.HttpMethodDirector - Retry authentication
> 2016-11-02 12:08:50,321 DEBUG httpclient.HttpMethodDirector - Authenticating with NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,321 TRACE auth.NTLMScheme - enter NTLMScheme.authenticate(Credentials, HttpMethod)
> 2016-11-02 12:08:50,351 DEBUG httpclient.HttpMethodDirector - Authorization required
> 2016-11-02 12:08:50,352 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
> 2016-11-02 12:08:50,352 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
> 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Retry authentication
> 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Authenticating with NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,352 TRACE auth.NTLMScheme - enter NTLMScheme.authenticate(Credentials, HttpMethod)
> 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Authorization required
> 2016-11-02 12:08:50,393 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
> 2016-11-02 12:08:50,393 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
> 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Credentials required
> 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Credentials provider not available
> 2016-11-02 12:08:50,393 INFO httpclient.HttpMethodDirector - Failure authenticating with NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,395 TRACE httpclient.Http - url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
> 2016-11-02 12:08:50,681 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
> 2016-11-02 12:08:50,804 TRACE httpclient.Http - 401 Authentication Required
>
> Any help is appreciated, as I am about to move on to another spider for
> Solr.
>
> Thanks,
> Bob
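Bob's observation that scheme="basic" did not force Basic is consistent with the log lines "Supported authentication schemes in the order of preference: [ntlm, digest, basic]" followed by "ntlm authentication scheme selected": the scheme appears to be picked from the server's challenges in the client's preference order, not from the credentials entry. Below is a simplified Java sketch of that kind of preference-ordered selection. The class SchemeSelection and the example headers are hypothetical, it assumes one challenge per WWW-Authenticate header, and it is not Commons HttpClient's source code.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Simplified, hypothetical sketch of preference-ordered scheme selection:
// the client walks its own preference list and picks the first scheme the
// server offered. Assumes one challenge per WWW-Authenticate header value.
public final class SchemeSelection {

    // Same order as the log line: [ntlm, digest, basic].
    static final List<String> CLIENT_PREFERENCE = List.of("ntlm", "digest", "basic");

    private SchemeSelection() {}

    /** Scheme token (first word, lower-cased) of each challenge header. */
    static Set<String> offeredSchemes(List<String> wwwAuthenticateHeaders) {
        Set<String> schemes = new LinkedHashSet<>();
        for (String header : wwwAuthenticateHeaders) {
            String trimmed = header.trim();
            int space = trimmed.indexOf(' ');
            String scheme = space < 0 ? trimmed : trimmed.substring(0, space);
            schemes.add(scheme.toLowerCase(Locale.ROOT));
        }
        return schemes;
    }

    /** First scheme in preference order that the server offered, if any. */
    static Optional<String> select(List<String> preference, List<String> headers) {
        Set<String> offered = offeredSchemes(headers);
        for (String scheme : preference) {
            if (offered.contains(scheme)) {
                return Optional.of(scheme);
            }
        }
        return Optional.empty(); // e.g. only "Negotiate" was offered
    }

    public static void main(String[] args) {
        // Hypothetical challenges from a server with Windows and Basic auth enabled.
        List<String> headers = List.of("Negotiate", "NTLM", "Basic realm=\"intranet\"");
        System.out.println(select(CLIENT_PREFERENCE, headers).orElse("none")); // prints "ntlm"
    }
}
```

With this ordering, a server that offers NTLM alongside Basic always gets NTLM; Basic is chosen only when neither NTLM nor Digest is offered.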

