Hi Bob,

You may be right. Could you share the sniffed data? Do not forget to check the headers when you analyse it.

Kind Regards,
Furkan KAMACI

On Wed, Nov 2, 2016 at 10:23 PM, Bell, Bob <[email protected]> wrote:

> This is my httpclient-auth.xml, where domainuser is my AD account,
> domainpassword is my password, and the realm is populated with our AD
> domain. I have used host="ip of nutch box" and host="website trying to
> crawl". I am not sure what you are trying to say. I have used every
> combination I can think of, and it just doesn't seem to work.
>
> <auth-configuration>
>   <credentials username="domainuser" password="domainpassword">
>     <default host="ip of nutch box" scheme="ntlm" port="80" realm="domain"/>
>   </credentials>
> </auth-configuration>
>
> Is this not the proper way to set up the NTLM credentials above?
>
> I have tried it with local accounts on the server being crawled, no luck
> there either. I have used fully qualified domain names for the AD domain,
> like domain.org etc. Are the error logs pointing to anything at all?
>
> 2016-11-02 14:41:12,059 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.processResponseHeaders(HttpState, HttpConnection)
> 2016-11-02 14:41:12,059 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.processCookieHeaders(Header[], HttpState, HttpConnection)
> 2016-11-02 14:41:12,059 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponseBody(HttpState, HttpConnection)
> 2016-11-02 14:41:12,059 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponseBody(HttpConnection)
> 2016-11-02 14:41:12,059 TRACE httpclient.HttpConnection - enter HttpConnection.getResponseInputStream()
> 2016-11-02 14:41:12,059 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.canResponseHaveBody(int)
>
> 2016-11-02 14:41:12,059 DEBUG httpclient.HttpMethodDirector - Authorization required
> 2016-11-02 14:41:12,059 TRACE httpclient.HttpMethodDirector - enter HttpMethodBase.processAuthenticationResponse(HttpState, HttpConnection)
> 2016-11-02 14:41:12,059 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
> 2016-11-02 14:41:12,060 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
> 2016-11-02 14:41:12,060 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@iis75.intranet.org:80
> 2016-11-02 14:41:12,060 DEBUG httpclient.HttpMethodDirector - Credentials required
> 2016-11-02 14:41:12,060 DEBUG httpclient.HttpMethodDirector - Credentials provider not available
> 2016-11-02 14:41:12,060 INFO httpclient.HttpMethodDirector - Failure authenticating with NTLM <any realm>@iis75.intranet.org:80
>
> These previous entries bother me the most. What does "Credentials
> provider not available" mean? Then I see "Failure authenticating with
> NTLM", which, if it were a real failure, would cause issues on my IIS75
> and lock the accounts out. Baffling. This is why I say I do not think it
> really is failing; it just thinks it failed, based on its interpretation
> of what it is getting back from the IIS server. I'm stumped.
>
> 2016-11-02 14:41:12,062 DEBUG httpclient.HttpMethodBase - Resorting to protocol version default close connection policy
> 2016-11-02 14:41:12,062 DEBUG httpclient.HttpMethodBase - Should NOT close connection, using HTTP/1.1
> 2016-11-02 14:41:12,062 TRACE httpclient.HttpConnection - enter HttpConnection.isResponseAvailable()
> 2016-11-02 14:41:12,062 TRACE httpclient.HttpConnection - enter HttpConnection.releaseConnection()
> 2016-11-02 14:41:12,062 DEBUG httpclient.HttpConnection - Releasing connection back to connection manager.
> 2016-11-02 14:41:12,062 TRACE httpclient.MultiThreadedHttpConnectionManager - enter HttpConnectionManager.releaseConnection(HttpConnection)
> 2016-11-02 14:41:12,062 DEBUG httpclient.MultiThreadedHttpConnectionManager - Freeing connection, hostConfig=HostConfiguration[host=http://iis75.intranet.org]
> 2016-11-02 14:41:12,062 TRACE httpclient.MultiThreadedHttpConnectionManager - enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration)
> 2016-11-02 14:41:12,062 DEBUG util.IdleConnectionHandler - Adding connection at: 1478115672062
> 2016-11-02 14:41:12,062 DEBUG httpclient.MultiThreadedHttpConnectionManager - Notifying no-one, there are no waiting threads
> 2016-11-02 14:41:12,062 TRACE httpclient.Http - url: http://iis75.intranet.org; status code: 401; bytes received: 0; Content-Length: 0
> 2016-11-02 14:41:12,367 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
> 2016-11-02 14:41:12,493 TRACE httpclient.Http - 401 Authentication Required
>
> I see this same issue with other Nutch users when I google it, and people
> point back to the limited documented settings. I know NTLM on the server
> works, as 300+ users authenticate to it every day. I am not seeing any
> security logs on the iis75 server indicating bad login attempts, and I
> would get locked out if there were. I must assume it is not really
> attempting to authenticate. I am about to put a sniffer on it and look at
> the actual traffic.
>
> Thanks,
> Bob
>
> -----Original Message-----
> From: Furkan KAMACI [mailto:[email protected]]
> Sent: Wednesday, November 02, 2016 3:00 PM
> To: [email protected]
> Subject: Re: Nutch 1.12 NTLM authentication IIS 7.5 Intranet
>
> Hi Bob,
>
> It's explained as:
>
> "Example:-
>   <credentials username="susam" password="masus">
>     <default realm="sso"/>
>     <authscope host="192.168.101.33" port="80" realm="login"/>
>     <authscope host="example" port="8080" realm="blogs"/>
>     <authscope host="example" port="8080" realm="wiki"/>
>     <authscope host="example" port="80" realm="quiz" scheme="NTLM"/>
>   </credentials>
>   <credentials username="admin" password="nimda">
>     <authscope host="example" port="8080"/>
>   </credentials>
>
> In the above example, 'example:8080' server has pages with multiple
> authentication realms. The first set of credentials would be used for
> 'blogs' and 'wiki' authentication realms. The second set of
> credentials would be used for all other realms. For 'login' realm of
> '192.168.101.33', the first set of credentials would be used. For any
> other realm of '192.168.101.33' authentication would not be done. For
> the NTLM authentication required by 'example:80', the first set of
> credentials would be used. For 'sso' realms of all other servers, the
> first set of credentials would be used, since it is configured as
> 'default'.
>
> NTLM does not use the notion of realms. The domain name may be
> specified as the value for 'realm' attribute in case of NTLM."
>
> So, do you set realm?
>
> Kind Regards,
> Furkan KAMACI
>
> On Wed, Nov 2, 2016 at 9:49 PM, Bell, Bob <[email protected]> wrote:
>
> > Furkan,
> >
> > Same results. I tried domain\\user and domain\user. Do I need to put
> > a trace on the traffic and see what packets are being sent by Nutch?
> >
> > Thanks,
> > Bob
> >
> > -----Original Message-----
> > From: Bell, Bob [mailto:[email protected]]
> > Sent: Wednesday, November 02, 2016 2:31 PM
> > To: [email protected]
> > Subject: RE: Nutch 1.12 NTLM authentication IIS 7.5 Intranet
> >
> > Yes, I will check that. I cranked up the logging and ran again, to see
> > if you might spot something odd.
> >
> > -----Original Message-----
> > From: Furkan KAMACI [mailto:[email protected]]
> > Sent: Wednesday, November 02, 2016 2:20 PM
> > To: [email protected]
> > Cc: Bell, Bob <[email protected]>
> > Subject: Re: Nutch 1.12 NTLM authentication IIS 7.5 Intranet
> >
> > Hi Bob,
> >
> > The server may require the domain as part of the username, for
> > example "domain\\user". Could you check that?
> >
> > Kind Regards,
> > Furkan KAMACI
> >
> > On Wed, Nov 2, 2016 at 9:11 PM, Bell, Bob <[email protected]> wrote:
> >
> > > <iis75.intranet> is just a string replacement for our actual
> > > intranet name, something like blah.intranet.org; I use the <>
> > > convention when I am obscuring actual data.
> > >
> > > What might the log4j.properties entry for httpclient.Http be? I see
> > > it is only at INFO level logging, but I do not know the proper
> > > object path to set it up.
> > >
> > > Thanks,
> > > Bob
> > >
> > > >Hi Bob,
> > > >
> > > >Do you write host as <iis75.intranet> or iis75.intranet ?
> > > >
> > > >Kind Regards,
> > > >Furkan KAMACI
> > >
> > > -----Original Message-----
> > > From: Bell, Bob
> > > Sent: Wednesday, November 02, 2016 12:17 PM
> > > To: '[email protected]' <[email protected]>
> > > Cc: Bell, Bob <[email protected]>
> > > Subject: Nutch 1.12 NTLM authentication IIS 7.5 Intranet
> > >
> > > I have been trying for more than a year to get NTLM to work with
> > > IIS 7.5 without success. I was happy to see the recent 1.12 release,
> > > and thought OK, I will give it a shot again. I am almost to the
> > > point where I do not believe it works with NTLM, or it does not know
> > > how to handle the multiple 401s that are returned, or I have some
> > > fundamental problem somewhere. I have tried everything I could think
> > > of, and am at a loss on how to solve this mystery. My Nutch server
> > > is a CentOS 7 machine in VirtualBox. I am using the httpclient as
> > > indicated in the docs, but with no love. I can fetch with anonymous,
> > > but I need NTLM to work.
> > >
> > > I am using plugin.includes = >protocol-httpclient
> > >
> > > nutch-site.xml:
> > > <property>
> > >   <name>http.auth.file</name>
> > >   <value>httpclient-auth.xml</value>
> > >   <description>Authentication configuration file for 'protocol-httpclient' plugin.
> > >   </description>
> > > </property>
> > >
> > > httpclient-auth.xml for local user:
> > > <auth-configuration>
> > >   <credentials username="nutch" password="<somepassword>">
> > >     <default scheme="basic" port="80"/>
> > >   </credentials>
> > > </auth-configuration>
> > >
> > > Here is the output with a local user account on the server. One
> > > thing I notice is that I cannot force authentication to be anything
> > > other than NTLM, even though I support NTLM, basic, and digest.
> > > Notice the scheme was basic, but it goes through NTLM regardless.
> > >
> > > [root@localhost nutch]# nutch parsechecker http://<iis75.intranet>
> > > fetching: http://<iis75.intranet>
> > > Whitelisted hosts: [<iis75.intranet>]
> > > http.proxy.host = null
> > > http.proxy.port = 8080
> > > http.proxy.exception.list = false
> > > http.timeout = 36000
> > > http.content.limit = 65536
> > > http.agent = APL-Nutch-Spider/Nutch-1.12
> > > http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
> > > http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> > > Credentials - username: nutch; set as default for realm: ; scheme: basic
> > > Pre-configured credentials with scope - host: <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet>
> > > Authorization required
> > > Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> > > ntlm authentication scheme selected
> > > Using authentication scheme: ntlm
> > > Authorization challenge processed
> > > Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> > > Credentials required
> > > Credentials provider not available
> > > No credentials available for NTLM <any realm>@<iis75.intranet>:80
> > > url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
> > > 401 Authentication Required
> > > Fetch failed with protocol status: access_denied(17), lastModified=0: Authentication required: http://<iis75.intranet>
> > > [root@localhost nutch]#
> > >
> > > httpclient-auth.xml for domain user:
> > > <auth-configuration>
> > >   <credentials username="<domainuser>" password="<domainpassword>
> > >     <default host="<iis75.intranet>" scheme="ntlm" port="80" realm="<domain>"/>
> > >   </credentials>
> > > </auth-configuration>
> > >
> > > note: doesn’t matter what I put in the host, doesn’t seem to change
> > > anything.
> > >
> > > [root@localhost nutch]# nutch parsechecker http://<iis75.intranet>
> > > fetching: http://<iis75.intranet>
> > > Whitelisted hosts: [<iis75.intranet>]
> > > http.proxy.host = null
> > > http.proxy.port = 8080
> > > http.proxy.exception.list = false
> > > http.timeout = 36000
> > > http.content.limit = 65536
> > > http.agent = APL-Nutch-Spider/Nutch-1.12
> > > http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
> > > http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> > > Credentials - username: <domainuser>"; set as default for realm: =<domain>; scheme: ntlm
> > > Pre-configured credentials with scope - host: <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet>
> > > Authorization required
> > > Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> > > ntlm authentication scheme selected
> > > Using authentication scheme: ntlm
> > > Authorization challenge processed
> > > Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> > > Retry authentication
> > > Authenticating with NTLM <any realm>@<iis75.intranet>:80
> > > enter NTLMScheme.authenticate(Credentials, HttpMethod)
> > > Authorization required
> > > Using authentication scheme: ntlm
> > > Authorization challenge processed
> > > Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> > > Retry authentication
> > > Authenticating with NTLM <any realm>@<iis75.intranet>:80
> > > enter NTLMScheme.authenticate(Credentials, HttpMethod)
> > > Authorization required
> > > Using authentication scheme: ntlm
> > > Authorization challenge processed
> > > Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> > > Credentials required
> > > Credentials provider not available
> > > Failure authenticating with NTLM <any realm>@<iis75.intranet>:80
> > > url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
> > > 401 Authentication Required
> > > Fetch failed with protocol status: access_denied(17), lastModified=0: Authentication required: http://<iis75.intranet>
> > >
> > > Last entry in Hadoop.log:
> > >
> > > 2016-11-02 12:08:49,568 INFO parse.ParserChecker - fetching: http:// <iis75.intranet>
> > > 2016-11-02 12:08:50,040 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
> > > 2016-11-02 12:08:50,119 INFO protocol.RobotRulesParser - Whitelisted hosts: [<iis75.intranet>]
> > > 2016-11-02 12:08:50,119 INFO httpclient.Http - http.proxy.host = null
> > > 2016-11-02 12:08:50,119 INFO httpclient.Http - http.proxy.port = 8080
> > > 2016-11-02 12:08:50,119 INFO httpclient.Http - http.proxy.exception.list = false
> > > 2016-11-02 12:08:50,119 INFO httpclient.Http - http.timeout = 36000
> > > 2016-11-02 12:08:50,119 INFO httpclient.Http - http.content.limit = 65536
> > > 2016-11-02 12:08:50,119 INFO httpclient.Http - http.agent = APL-Nutch-Spider/Nutch-1.12 ([email protected])
> > > 2016-11-02 12:08:50,120 INFO httpclient.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
> > > 2016-11-02 12:08:50,120 INFO httpclient.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> > > 2016-11-02 12:08:50,133 TRACE httpclient.Http - Credentials - username: <domainuser>; set as default for realm: <domain>; scheme: ntlm
> > > 2016-11-02 12:08:50,134 TRACE httpclient.Http - Pre-configured credentials with scope - host: <iis75.intranet>; port: 80; not found for url: http:// <iis75.intranet>
> > > 2016-11-02 12:08:50,313 DEBUG httpclient.HttpMethodDirector - Authorization required
> > > 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> > > 2016-11-02 12:08:50,320 INFO auth.AuthChallengeProcessor - ntlm authentication scheme selected
> > > 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
> > > 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
> > > 2016-11-02 12:08:50,320 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> > > 2016-11-02 12:08:50,320 DEBUG httpclient.HttpMethodDirector - Retry authentication
> > > 2016-11-02 12:08:50,321 DEBUG httpclient.HttpMethodDirector - Authenticating with NTLM <any realm>@<iis75.intranet>:80
> > > 2016-11-02 12:08:50,321 TRACE auth.NTLMScheme - enter NTLMScheme.authenticate(Credentials, HttpMethod)
> > > 2016-11-02 12:08:50,351 DEBUG httpclient.HttpMethodDirector - Authorization required
> > > 2016-11-02 12:08:50,352 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
> > > 2016-11-02 12:08:50,352 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
> > > 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> > > 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Retry authentication
> > > 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Authenticating with NTLM <any realm>@<iis75.intranet>:80
> > > 2016-11-02 12:08:50,352 TRACE auth.NTLMScheme - enter NTLMScheme.authenticate(Credentials, HttpMethod)
> > > 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Authorization required
> > > 2016-11-02 12:08:50,393 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
> > > 2016-11-02 12:08:50,393 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
> > > 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> > > 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Credentials required
> > > 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Credentials provider not available
> > > 2016-11-02 12:08:50,393 INFO httpclient.HttpMethodDirector - Failure authenticating with NTLM <any realm>@<iis75.intranet>:80
> > > 2016-11-02 12:08:50,395 TRACE httpclient.Http - url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
> > > 2016-11-02 12:08:50,681 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
> > > 2016-11-02 12:08:50,804 TRACE httpclient.Http - 401 Authentication Required
> > >
> > > Any help is appreciated, as I am about to move on to another spider
> > > for Solr.
> > >
> > > Thanks,
> > > Bob
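
One possible way to act on the suggestion to check the headers in the sniffed traffic: in NTLM over HTTP, each handshake message travels base64-encoded in an "Authorization: NTLM ..." (client) or "WWW-Authenticate: NTLM ..." (server) header, and the decoded message starts with the signature "NTLMSSP\0" followed by a 32-bit little-endian message type (1 = negotiate, 2 = challenge, 3 = authenticate). The Python sketch below is only an illustration and not part of Nutch or HttpClient; the function name describe_auth_header is made up for this example, and it assumes the header values have been copied out of the capture by hand. Seeing a Type 1 from the crawler but never a Type 3 would suggest the handshake is being abandoned part-way rather than the credentials being rejected.

```python
import base64
import struct

# Every NTLM message starts with this 8-byte signature,
# followed by a 32-bit little-endian message type.
NTLM_SIGNATURE = b"NTLMSSP\x00"
MESSAGE_NAMES = {
    1: "Type 1 (negotiate)",
    2: "Type 2 (challenge)",
    3: "Type 3 (authenticate)",
}


def describe_auth_header(value):
    """Describe one Authorization / WWW-Authenticate header value.

    Hypothetical helper for reading a packet capture; 'value' is the
    text after the header name, e.g. 'NTLM TlRMTVNTUAABAAAA'.
    """
    parts = value.strip().split(None, 1)
    if not parts:
        return "empty header"
    scheme = parts[0]
    if scheme.upper() != "NTLM":
        return "non-NTLM scheme: " + scheme
    if len(parts) == 1:
        # A bare 'NTLM' from the server only advertises the scheme.
        return "NTLM offered, no message attached"
    try:
        raw = base64.b64decode(parts[1].strip(), validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return "NTLM token is not valid base64"
    if len(raw) < 12 or not raw.startswith(NTLM_SIGNATURE):
        return "NTLM token without a valid NTLMSSP signature"
    (msg_type,) = struct.unpack_from("<I", raw, 8)
    return "NTLM " + MESSAGE_NAMES.get(msg_type, "unknown message type %d" % msg_type)


if __name__ == "__main__":
    # Sample values; replace with the headers seen in the capture.
    for header in ["NTLM", "NTLM TlRMTVNTUAABAAAA", 'Basic realm="example"']:
        print(header, "->", describe_auth_header(header))
```

Running each captured header value through the function in request order gives a quick picture of how far the handshake progressed.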

