Hi Bob,

The server may require the domain as part of the username, for example
"domain\\user". Could you check that?
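
A rough sketch of what I mean, assuming the stock httpclient-auth.xml
layout. MYDOMAIN, crawler, secret and iis75.intranet are placeholders, and I
have not verified that the plugin accepts a domain-qualified username, so
treat this as something to try rather than a known fix:

```xml
<auth-configuration>
  <!-- Hypothetical domain account; replace with your own values -->
  <credentials username="MYDOMAIN\crawler" password="secret">
    <!-- Host and port belong on an authscope element, not on default
         (assumption based on the plugin's sample configuration) -->
    <authscope host="iis75.intranet" port="80" scheme="ntlm" realm="MYDOMAIN"/>
    <!-- Fallback for hosts without a matching authscope; for NTLM the
         realm is assumed to carry the Windows domain -->
    <default scheme="ntlm" realm="MYDOMAIN"/>
  </credentials>
</auth-configuration>
```

If the log still reports "Pre-configured credentials with scope ... not
found", the authscope entry is probably not matching the host and port of
the URL being fetched.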

Kind Regards,
Furkan KAMACI

On Wed, Nov 2, 2016 at 9:11 PM, Bell, Bob <[email protected]> wrote:

> <iis75.intranet> is just a string replacement for our actual intranet
> name, something like blah.intranet.org. I use the <> convention when I am
> obscuring actual data.
>
> What might the log4j.properties entry for httpclient.Http be? I see
> it is only at INFO level logging, but I do not know the proper
> object path to set it up.
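
On the logging question, a minimal sketch for conf/log4j.properties. It
assumes log4j 1.x syntax and that the short names in your log
(httpclient.Http, httpclient.HttpMethodDirector, auth.AuthChallengeProcessor)
map to the packages below, so please check the package names against your
install:

```properties
# Nutch protocol-httpclient plugin (assumed package behind "httpclient.Http")
log4j.logger.org.apache.nutch.protocol.httpclient=TRACE
# Apache Commons HttpClient 3.x (assumed package behind HttpMethodDirector
# and the auth.* classes)
log4j.logger.org.apache.commons.httpclient=TRACE
```

With the default root logger these messages should end up in the same log
file as the rest of the Nutch output.
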
>
> Thanks,
> Bob
>
> >Hi Bob,
> >
> >Do you write the host as <iis75.intranet> or iis75.intranet?
> >
> >Kind Regards,
> >Furkan KAMACI
>
> -----Original Message-----
> From: Bell, Bob
> Sent: Wednesday, November 02, 2016 12:17 PM
> To: '[email protected]' <[email protected]>
> Cc: Bell, Bob <[email protected]>
> Subject: Nutch 1.12 NTLM authentication IIS 7.5 Intranet
>
> I have been trying for more than a year to get NTLM to work with IIS 7.5
> without success. I was happy to see the recent 1.12 release and thought I
> would give it a shot again. I am almost at the point where I believe that
> it does not work with NTLM, that it does not know how to handle the
> multiple 401s that are returned, or that I have some fundamental problem
> somewhere. I have tried everything I could think of and am at a loss as to
> how to solve this mystery. My Nutch server is CentOS 7 in a VirtualBox VM.
> I am using the httpclient plugin as indicated in the docs, but with no
> love. I can fetch anonymously, but I need NTLM to work.
>
> I am using plugin.includes = protocol-httpclient
>
> nutch-site.xml:
> <property>
> <name>http.auth.file</name>
> <value>httpclient-auth.xml</value>
> <description>Authentication configuration file for 'protocol-httpclient'
> plugin.
> </description>
> </property>
>
> httpclient-auth.xml for local user:
> <auth-configuration>
>     <credentials username="nutch" password="<somepassword>">
>         <default  scheme="basic" port="80"/>
>     </credentials>
> </auth-configuration>
>
> Here is the output with a local user account on the server. One thing I
> notice is that I cannot force authentication to be anything other than
> ntlm, even though I support ntlm, basic, and digest. Notice that the
> scheme was basic, but it goes through ntlm regardless.
>
> [root@localhost nutch]# nutch parsechecker http://<iis75.intranet>
> fetching: http://<iis75.intranet>
> Whitelisted hosts: [<iis75.intranet>]
> http.proxy.host = null
> http.proxy.port = 8080
> http.proxy.exception.list = false
> http.timeout = 36000
> http.content.limit = 65536
> http.agent = APL-Nutch-Spider/Nutch-1.12
> http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
> http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Credentials - username: nutch; set as default for realm: ; scheme: basic
> Pre-configured credentials with scope -  host: <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet>
> Authorization required
> Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> ntlm authentication scheme selected
> Using authentication scheme: ntlm
> Authorization challenge processed
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> Credentials required
> Credentials provider not available
> No credentials available for NTLM <any realm>@<iis75.intranet>:80
> url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
> 401 Authentication Required
> Fetch failed with protocol status: access_denied(17), lastModified=0: Authentication required: http://<iis75.intranet>
> [root@localhost nutch]#
>
>
> httpclient-auth.xml for domain user:
> <auth-configuration>
>     <credentials username="<domainuser>" password="<domainpassword>">
>         <default host="<iis75.intranet>" scheme="ntlm" port="80" realm="<domain>"/>
>     </credentials>
> </auth-configuration>
>
> Note: it doesn’t matter what I put in the host attribute; it doesn’t seem
> to change anything.
>
> [root@localhost nutch]# nutch parsechecker http://<iis75.intranet>
> fetching: http://<iis75.intranet>
> Whitelisted hosts: [<iis75.intranet>]
> http.proxy.host = null
> http.proxy.port = 8080
> http.proxy.exception.list = false
> http.timeout = 36000
> http.content.limit = 65536
> http.agent = APL-Nutch-Spider/Nutch-1.12
> http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
> http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Credentials - username: <domainuser>"; set as default for realm: =<domain>; scheme: ntlm
> Pre-configured credentials with scope -  host: <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet>
> Authorization required
> Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> ntlm authentication scheme selected
> Using authentication scheme: ntlm
> Authorization challenge processed
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> Retry authentication
> Authenticating with NTLM <any realm>@<iis75.intranet>:80
> enter NTLMScheme.authenticate(Credentials, HttpMethod)
> Authorization required
> Using authentication scheme: ntlm
> Authorization challenge processed
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> Retry authentication
> Authenticating with NTLM <any realm>@<iis75.intranet>:80
> enter NTLMScheme.authenticate(Credentials, HttpMethod)
> Authorization required
> Using authentication scheme: ntlm
> Authorization challenge processed
> Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> Credentials required
> Credentials provider not available
> Failure authenticating with NTLM <any realm>@<iis75.intranet>:80
> url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
> 401 Authentication Required
> Fetch failed with protocol status: access_denied(17), lastModified=0: Authentication required: http://<iis75.intranet>
>
> Last entries in Hadoop.log:
>
> 2016-11-02 12:08:49,568 INFO  parse.ParserChecker - fetching: http://<iis75.intranet>
> 2016-11-02 12:08:50,040 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
> 2016-11-02 12:08:50,119 INFO  protocol.RobotRulesParser - Whitelisted hosts: [<iis75.intranet>]
> 2016-11-02 12:08:50,119 INFO  httpclient.Http - http.proxy.host = null
> 2016-11-02 12:08:50,119 INFO  httpclient.Http - http.proxy.port = 8080
> 2016-11-02 12:08:50,119 INFO  httpclient.Http - http.proxy.exception.list = false
> 2016-11-02 12:08:50,119 INFO  httpclient.Http - http.timeout = 36000
> 2016-11-02 12:08:50,119 INFO  httpclient.Http - http.content.limit = 65536
> 2016-11-02 12:08:50,119 INFO  httpclient.Http - http.agent = APL-Nutch-Spider/Nutch-1.12 ([email protected])
> 2016-11-02 12:08:50,120 INFO  httpclient.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
> 2016-11-02 12:08:50,120 INFO  httpclient.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> 2016-11-02 12:08:50,133 TRACE httpclient.Http - Credentials - username: <domainuser>; set as default for realm: <domain>; scheme: ntlm
> 2016-11-02 12:08:50,134 TRACE httpclient.Http - Pre-configured credentials with scope -  host: <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet>
> 2016-11-02 12:08:50,313 DEBUG httpclient.HttpMethodDirector - Authorization required
> 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> 2016-11-02 12:08:50,320 INFO  auth.AuthChallengeProcessor - ntlm authentication scheme selected
> 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
> 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
> 2016-11-02 12:08:50,320 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,320 DEBUG httpclient.HttpMethodDirector - Retry authentication
> 2016-11-02 12:08:50,321 DEBUG httpclient.HttpMethodDirector - Authenticating with NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,321 TRACE auth.NTLMScheme - enter NTLMScheme.authenticate(Credentials, HttpMethod)
> 2016-11-02 12:08:50,351 DEBUG httpclient.HttpMethodDirector - Authorization required
> 2016-11-02 12:08:50,352 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
> 2016-11-02 12:08:50,352 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
> 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Retry authentication
> 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Authenticating with NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,352 TRACE auth.NTLMScheme - enter NTLMScheme.authenticate(Credentials, HttpMethod)
> 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Authorization required
> 2016-11-02 12:08:50,393 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
> 2016-11-02 12:08:50,393 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
> 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Credentials required
> 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - Credentials provider not available
> 2016-11-02 12:08:50,393 INFO  httpclient.HttpMethodDirector - Failure authenticating with NTLM <any realm>@<iis75.intranet>:80
> 2016-11-02 12:08:50,395 TRACE httpclient.Http - url: http://<iis75.intranet>; status code: 401; bytes received: 0; Content-Length: 0
> 2016-11-02 12:08:50,681 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
> 2016-11-02 12:08:50,804 TRACE httpclient.Http - 401 Authentication Required
>
> Any help is appreciated, as I am about to move on to another spider for
> Solr.
>
> Thanks,
> Bob
>
>
