I found this here:
http://hc.apache.org/httpclient-3.x/authentication.html

NT Lan Manager (NTLM) authentication is a proprietary, closed
challenge/response authentication protocol for Microsoft Windows. Only
some details about NTLM protocol are available through reverse
engineering. HttpClient provides limited support for what is known as
NTLMv1, the early version of the NTLM protocol. HttpClient does not
support NTLMv2 at all.

So I assume SiteMinder uses the later version (NTLMv2) and we're out of
luck. Has anyone found a different way to authenticate with SiteMinder
using Nutch?

-----Original Message-----
From: Campbell, John [mailto:[email protected]] 
Sent: Tuesday, September 21, 2010 4:22 PM
To: [email protected]
Subject: Httpclient Authentication Failure authenticating with NTLM

We are running Nutch 1.1 and are attempting to crawl pages that are
behind SiteMinder (NTLM). However, we're getting an error that we can't
seem to get around. Here is our setup:

httpclient-auth.xml:
<auth-configuration>
        <credentials username="user" password="pass">
                <default />
        </credentials>
</auth-configuration>
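
A more tightly scoped variant of this file could look like the sketch
below. It assumes the authscope element described in the comments of the
stock conf/httpclient-auth.xml. The host, port and domain values are
hypothetical placeholders, and treating realm as the NTLM domain is an
assumption that should be checked against the Nutch version in use.

<auth-configuration>
        <credentials username="user" password="pass">
                <!-- hypothetical host, port and domain; replace with real values -->
                <authscope host="siteminder.example.com" port="80"
                           realm="EXAMPLEDOMAIN" scheme="NTLM" />
        </credentials>
</auth-configuration>

Scoping the credentials this way would, if it works as assumed, keep
them from being offered to every host the crawler visits.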

The plugin is enabled, and http.agent.host is set to our server IP.
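
As a rough sketch, the corresponding nutch-site.xml entries might look
like the following. The plugin list is shortened and hypothetical (a
real one carries more plugins), and the IP address is a placeholder.

<configuration>
        <!-- shortened, hypothetical list; protocol-httpclient takes the place of protocol-http -->
        <property>
                <name>plugin.includes</name>
                <value>protocol-httpclient|urlfilter-regex|parse-(text|html)|index-basic</value>
        </property>
        <!-- placeholder IP of the machine running the crawler -->
        <property>
                <name>http.agent.host</name>
                <value>192.0.2.10</value>
        </property>
</configuration>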

Here is some relevant log info:

2010-09-21 16:03:22,954 INFO  httpclient.Http - http.proxy.host = null
2010-09-21 16:03:22,955 INFO  httpclient.Http - http.proxy.port = 8080
2010-09-21 16:03:22,955 INFO  httpclient.Http - http.timeout = 20000
2010-09-21 16:03:22,955 INFO  httpclient.Http - http.content.limit = 65536
2010-09-21 16:03:22,955 INFO  httpclient.Http - http.agent = nutch-solr-integration/Nutch-1.1
2010-09-21 16:03:22,955 INFO  httpclient.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2010-09-21 16:03:22,956 INFO  httpclient.Http - protocol.plugin.check.blocking = false
2010-09-21 16:03:22,956 INFO  httpclient.Http - protocol.plugin.check.robots = false
2010-09-21 16:03:24,450 DEBUG auth.AuthChallengeProcessor - Supported authentication schemes in the order of preference: [ntlm, digest, basic]
2010-09-21 16:03:24,451 INFO  auth.AuthChallengeProcessor - ntlm authentication scheme selected
2010-09-21 16:03:24,451 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
2010-09-21 16:03:24,452 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
2010-09-21 16:03:24,579 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
2010-09-21 16:03:24,579 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
2010-09-21 16:03:25,006 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
2010-09-21 16:03:25,007 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
2010-09-21 16:03:25,007 INFO  httpclient.HttpMethodDirector - Failure authenticating with NTLM <any realm>@oursiteminderip:port

I noticed that our log doesn't contain any "Credentials - username
someuser; set .." line, which makes me think it's not reading the
credentials correctly from httpclient-auth.xml. However, SiteMinder
locks out our username after a number of failed attempts, and we have
been getting locked out, so it does seem to be trying to authenticate.

Thanks for any help
