Furkan,

Same results. I tried both domain\\user and domain\user. Do I need
to put a trace on the traffic to see what packets Nutch is
sending?

Thanks,
Bob

-----Original Message-----
From: Bell, Bob [mailto:[email protected]] 
Sent: Wednesday, November 02, 2016 2:31 PM
To: [email protected]
Subject: RE: Nutch 1.12 NTLM authentication IIS 7.5 Intranet

Yes, I will check that. I cranked up the logging and ran it again, in case
you might spot something odd.


-----Original Message-----
From: Furkan KAMACI [mailto:[email protected]] 
Sent: Wednesday, November 02, 2016 2:20 PM
To: [email protected]
Cc: Bell, Bob <[email protected]>
Subject: Re: Nutch 1.12 NTLM authentication IIS 7.5 Intranet

Hi Bob,

The server may require the domain as part of the username, for example
"domain\\user". Could you check that?

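As a rough sketch of what I mean (untested against IIS 7.5; the domain, user,
password, and host below are made-up placeholders, and I am assuming the
<authscope> element described on the Nutch HttpAuthenticationSchemes wiki page
applies to your setup), httpclient-auth.xml could look something like this:

```xml
<?xml version="1.0"?>
<!-- Sketch only: EXAMPLEDOMAIN, crawluser, changeme, and
     iis75.intranet.example are hypothetical placeholders,
     not values taken from this thread. -->
<auth-configuration>
  <!-- Domain written as part of the username. Inside an XML attribute a
       backslash is a literal character, so it is written once here. -->
  <credentials username="EXAMPLEDOMAIN\crawluser" password="changeme">
    <!-- Tie these credentials to the IIS host, port 80, and the NTLM scheme. -->
    <authscope host="iis75.intranet.example" port="80" scheme="ntlm"/>
  </credentials>
</auth-configuration>
```

If the scope matches, the "Pre-configured credentials with scope ... not found"
line in your output should presumably no longer appear for that host.
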
Kind Regards,
Furkan KAMACI

On Wed, Nov 2, 2016 at 9:11 PM, Bell, Bob <[email protected]> wrote:

> <iis75.intranet> is just a string replacement for our actual intranet
> name, something like blah.intranet.org. I use the <> convention when I
> am obscuring actual data.
>
> What would the log4j.properties entry for httpclient.Http be? I see it
> is only at INFO level logging, but I do not know the proper object
> path to set it up.
>
> Thanks,
> Bob
>
> >Hi Bob,
> >
> >Do you write the host as <iis75.intranet> or iis75.intranet?
> >
> >Kind Regards,
> >Furkan KAMACI
>
> -----Original Message-----
> From: Bell, Bob
> Sent: Wednesday, November 02, 2016 12:17 PM
> To: '[email protected]' <[email protected]>
> Cc: Bell, Bob <[email protected]>
> Subject: Nutch 1.12 NTLM authentication IIS 7.5 Intranet
>
> I have been trying for more than a year to get NTLM to work with IIS 7.5
> without success. I was happy to see the recent 1.12 release and thought,
> OK, I will give it a shot again. I am almost at the point where I believe
> that it does not work with NTLM, or that it does not know how to handle
> the multiple 401s that are returned, or that I have some fundamental
> problem somewhere. I have tried everything I could think of and am at a
> loss as to how to solve this mystery. My Nutch server is CentOS 7 in
> VirtualBox. I am using the httpclient plugin as indicated in the docs,
> but with no love. I can fetch anonymously, but I need NTLM to work.
>
> I am using plugin.includes = protocol-httpclient
>
> nutch-site.xml:
> <property>
> <name>http.auth.file</name>
> <value>httpclient-auth.xml</value>
> <description>Authentication configuration file for 'protocol-httpclient'
> plugin.
> </description>
> </property>
>
> httpclient-auth.xml for local user:
> <auth-configuration>
>     <credentials username="nutch" password="<somepassword>">
>         <default  scheme="basic" port="80"/>
>     </credentials>
> </auth-configuration>
>
> Here is the output with a local user account on the server. One thing I
> notice is that I cannot force authentication to be anything other than
> NTLM, even though I support NTLM, basic, and digest. Notice that the
> configured scheme was basic, but it goes through NTLM regardless.
>
> [root@localhost nutch]# nutch parsechecker http://<iis75.intranet>
> fetching: http://<iis75.intranet>
> Whitelisted hosts: [<iis75.intranet>]
> http.proxy.host = null
> http.proxy.port = 8080
> http.proxy.exception.list = false
> http.timeout = 36000
> http.content.limit = 65536
> http.agent = APL-Nutch-Spider/Nutch-1.12
> http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
> http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Credentials - username: nutch; set as default for realm: ; scheme: basic
> Pre-configured credentials with scope -  host: <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet>
> Authorization required
> Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> ntlm authentication scheme selected
> Using authentication scheme: ntlm
> Authorization challenge processed
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> Credentials required
> Credentials provider not available
> No credentials available for NTLM <any realm>@<iis75.intranet>:80
> url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
> 401 Authentication Required
> Fetch failed with protocol status: access_denied(17), lastModified=0: Authentication required: http://<iis75.intranet>
> [root@localhost nutch]#
>
>
> httpclient-auth.xml for domain  user:
> <auth-configuration>
>     <credentials username="<domainuser>" password="<domainpassword>">
>         <default host="<iis75.intranet>" scheme="ntlm" port="80" realm="<domain>"/>
>     </credentials>
> </auth-configuration>
>
> Note: it doesn't matter what I put in the host attribute; it doesn't
> seem to change anything.
>
> [root@localhost nutch]# nutch parsechecker http://<iis75.intranet>
> fetching: http://<iis75.intranet>
> Whitelisted hosts: [<iis75.intranet>]
> http.proxy.host = null
> http.proxy.port = 8080
> http.proxy.exception.list = false
> http.timeout = 36000
> http.content.limit = 65536
> http.agent = APL-Nutch-Spider/Nutch-1.12
> http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
> http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Credentials - username: <domainuser>; set as default for realm: <domain>; scheme: ntlm
> Pre-configured credentials with scope -  host: <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet>
> Authorization required
> Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> ntlm authentication scheme selected
> Using authentication scheme: ntlm
> Authorization challenge processed
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> Retry authentication
> Authenticating with NTLM <any realm>@<iis75.intranet>:80
> enter NTLMScheme.authenticate(Credentials, HttpMethod)
> Authorization required
> Using authentication scheme: ntlm
> Authorization challenge processed
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> Retry authentication
> Authenticating with NTLM <any realm>@<iis75.intranet>:80
> enter NTLMScheme.authenticate(Credentials, HttpMethod)
> Authorization required
> Using authentication scheme: ntlm
> Authorization challenge processed
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> Credentials required
> Credentials provider not available
> Failure authenticating with NTLM <any realm>@<iis75.intranet>:80
> url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
> 401 Authentication Required
> Fetch failed with protocol status: access_denied(17), lastModified=0: Authentication required: http://<iis75.intranet>
>
> Last entries in hadoop.log:
>
> 2016-11-02 12:08:49,568 INFO  parse.ParserChecker - fetching:
> http://<iis75.intranet>
> 2016-11-02 12:08:50,040 DEBUG util.ObjectCache - No object cache found 
> for
> conf=Configuration: core-default.xml, core-site.xml, 
> nutch-default.xml, nutch-site.xml, instantiating a new object cache
> 2016-11-02 12:08:50,119 INFO  protocol.RobotRulesParser - Whitelisted
> hosts: [<iis75.intranet>]
> 2016-11-02 12:08:50,119 INFO  httpclient.Http - http.proxy.host = null
> 2016-11-02 12:08:50,119 INFO  httpclient.Http - http.proxy.port = 8080
> 2016-11-02 12:08:50,119 INFO  httpclient.Http - 
> http.proxy.exception.list = false
> 2016-11-02 12:08:50,119 INFO  httpclient.Http - http.timeout = 36000
> 2016-11-02 12:08:50,119 INFO  httpclient.Http - http.content.limit = 
> 65536
> 2016-11-02 12:08:50,119 INFO  httpclient.Http - http.agent =
> APL-Nutch-Spider/Nutch-1.12 ([email protected])
> 2016-11-02 12:08:50,120 INFO  httpclient.Http - http.accept.language =
> en-us,en-gb,en;q=0.7,*;q=0.3
> 2016-11-02 12:08:50,120 INFO  httpclient.Http - http.accept =
> text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> 2016-11-02 12:08:50,133 TRACE httpclient.Http - Credentials - username:
> <domainuser>; set as default for realm: <domain>; scheme: ntlm
> 2016-11-02 12:08:50,134 TRACE httpclient.Http - Pre-configured 
> credentials with scope -  host: <iis75.intranet>; port: 80; not found 
> for url: http://<iis75.intranet>
> 2016-11-02 12:08:50,313 DEBUG httpclient.HttpMethodDirector - 
> Authorization required
> 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Supported 
> authentication schemes in the order of preference: [ntlm, digest, 
> basic]
> 2016-11-02 12:08:50,320 INFO  auth.AuthChallengeProcessor - ntlm 
> authentication scheme selected
> 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Using 
> authentication scheme: ntlm
> 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - 
> Authorization challenge processed
> 2016-11-02 12:08:50,320 DEBUG httpclient.HttpMethodDirector - 
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,320 DEBUG httpclient.HttpMethodDirector - Retry 
> authentication
> 2016-11-02 12:08:50,321 DEBUG httpclient.HttpMethodDirector - 
> Authenticating with NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,321 TRACE auth.NTLMScheme - enter 
> NTLMScheme.authenticate(Credentials, HttpMethod)
> 2016-11-02 12:08:50,351 DEBUG httpclient.HttpMethodDirector - 
> Authorization required
> 2016-11-02 12:08:50,352 DEBUG auth.AuthChallengeProcessor - Using 
> authentication scheme: ntlm
> 2016-11-02 12:08:50,352 DEBUG auth.AuthChallengeProcessor - 
> Authorization challenge processed
> 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - 
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Retry 
> authentication
> 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - 
> Authenticating with NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,352 TRACE auth.NTLMScheme - enter 
> NTLMScheme.authenticate(Credentials, HttpMethod)
> 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - 
> Authorization required
> 2016-11-02 12:08:50,393 DEBUG auth.AuthChallengeProcessor - Using 
> authentication scheme: ntlm
> 2016-11-02 12:08:50,393 DEBUG auth.AuthChallengeProcessor - 
> Authorization challenge processed
> 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - 
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - 
> Credentials required
> 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - 
> Credentials provider not available
> 2016-11-02 12:08:50,393 INFO  httpclient.HttpMethodDirector - Failure 
> authenticating with NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,395 TRACE httpclient.Http - url: 
> http://<iis75.intranet>; status code: 401; bytes received: 0; 
> Content-Length: 0
> 2016-11-02 12:08:50,681 DEBUG util.ObjectCache - No object cache found 
> for
> conf=Configuration: core-default.xml, core-site.xml, 
> nutch-default.xml, nutch-site.xml, instantiating a new object cache
> 2016-11-02 12:08:50,804 TRACE httpclient.Http - 401 Authentication 
> Required
>
> Any help is appreciated, as I am about to move on to another spider
> for Solr.
>
> Thanks,
> Bob
>
>
