Hi,

According to the link below, IIS returns an HTTP 500 response when the server
expects NTLMv2 but receives an older NTLM version, which is probably what is
happening here. I would guess that the HttpClient in Nutch doesn't support
NTLMv2.

I would also guess that it worked for Arkadi because the server he was
crawling doesn't use NTLMv2.
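One way to test the NTLMv2 guess is to look at the Authorization header Nutch actually sends. This is a hypothetical sketch (the class and method names are mine). It assumes you can capture the header value, for example from HttpClient's wire log. It decodes the NTLM AUTHENTICATE (Type 3) message and checks the length of the NT response. In the NTLM message layout, NTLMv1 (and the NTLM2 session response) produce a 24-byte NT response, while NTLMv2 responses are longer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

/**
 * Hypothetical helper: inspects the NTLM AUTHENTICATE (Type 3) message carried in
 * an "NTLM <base64>" Authorization header value and guesses which response
 * variant the client used, based on the NtChallengeResponse length.
 */
public class NtlmVersionCheck {

    private static final byte[] SIGNATURE =
            "NTLMSSP\0".getBytes(StandardCharsets.US_ASCII);

    public static String classify(String authorizationValue) {
        String prefix = "NTLM ";
        if (!authorizationValue.startsWith(prefix)) {
            return "not an NTLM header";
        }
        byte[] msg = Base64.getDecoder()
                .decode(authorizationValue.substring(prefix.length()).trim());
        // Every NTLM message starts with the 8-byte "NTLMSSP\0" signature.
        if (msg.length < 28 || !Arrays.equals(Arrays.copyOfRange(msg, 0, 8), SIGNATURE)) {
            return "malformed NTLM message";
        }
        // NTLM fields are little-endian; the message type is a uint32 at offset 8.
        ByteBuffer buf = ByteBuffer.wrap(msg).order(ByteOrder.LITTLE_ENDIAN);
        int type = buf.getInt(8);
        if (type != 3) {
            return "NTLM Type " + type + " message (not an AUTHENTICATE message)";
        }
        // NtChallengeResponseFields start at offset 20; the first uint16 is the length.
        int ntLen = buf.getShort(20) & 0xFFFF;
        if (ntLen == 24) {
            return "NTLMv1 (or NTLM2 session response)";
        }
        if (ntLen > 24) {
            return "NTLMv2";
        }
        return "empty or unexpected NT response length: " + ntLen;
    }

    public static void main(String[] args) {
        // Pass the captured header value, e.g. "NTLM <base64 token>".
        if (args.length > 0) {
            System.out.println(classify(args[0]));
        }
    }
}
```

If it reports NTLMv1 while the server requires NTLMv2, that would be consistent with the 500 errors, though it would not prove the cause on its own.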

Again according to the reference, Sun JRE 5 and later fully support NTLMv2.
I wonder why Nutch doesn't use it.

reference: http://oaklandsoftware.com/papers/ntlm.html
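If the JRE's own NTLM support is worth trying, a minimal sketch of using it outside Nutch, as a quick check against the same server, might look like this. It relies only on the standard `java.net.Authenticator` and `HttpURLConnection`. The class name, domain, user name, password and URL are placeholders. Passing the domain as `DOMAIN\user` in the user name is an assumption about how the server expects the credentials. On Windows the JRE may also try the logged-in user's credentials first.

```java
import java.io.IOException;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.PasswordAuthentication;
import java.net.URI;

/**
 * Hypothetical sketch: fetch a URL with the JRE's built-in HTTP client, which
 * performs the NTLM handshake itself when the server asks for it.
 */
public class JdkNtlmFetch {

    /** Registers an Authenticator that returns the same credentials for every request. */
    public static void installCredentials(final String domain, final String user,
                                          final char[] password) {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Assumption: the domain is passed in DOMAIN\user form.
                return new PasswordAuthentication(domain + "\\" + user, password);
            }
        });
    }

    /** Sends a GET request and returns the final HTTP status code. */
    public static int fetchStatus(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholders: replace with real values; args[0] is the page to fetch.
        installCredentials("DOMAIN", "USER", "PASSWORD".toCharArray());
        System.out.println("HTTP " + fetchStatus(args[0]));
    }
}
```

If this returns 200 while Nutch gets 500 for the same URL, that would point at the HTTP client rather than the server configuration.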

On Wednesday, November 30, 2011, remi tassing <[email protected]> wrote:
> Thanks for the tips, Susam!
> Unfortunately I don't have much support on the server side...
> A friend tipped me off that the server may be deliberately blocking
> crawlers.
> So how can I make Nutch impersonate a browser?
> I tried the tip in the following link, but it didn't work:
> http://osdir.com/ml/nutch-user.lucene.apache.org/2009-06/msg00022.html
> Remi
> On Sun, Nov 27, 2011 at 9:17 PM, Susam Pal <[email protected]> wrote:
>>
>> On Sun, Nov 27, 2011 at 4:41 PM, remi tassing <[email protected]> wrote:
>> > Hello guys,
>> > Following your advice, I tried tweaking the config files over the weekend
>> > and ran into a problem I couldn't solve (I'm running Nutch 1.2; I couldn't
>> > get Nutch 1.3 to run under Cygwin).
>> > A sample of my log file is below. I have two concerns:
>> >   - How do I know whether the NTLM login worked?
>> >   - How do I debug the HTTP 500 error code? I suspect it might be due to
>> > cookies...
>> > Thanks in advance for your help
>> > ...
>> > 2011-11-27 18:54:02,298 DEBUG auth.AuthChallengeProcessor - Supported authentication schemes in the order of preference: [ntlm, digest, basic]
>> > 2011-11-27 18:54:02,300 INFO  auth.AuthChallengeProcessor - ntlm authentication scheme selected
>> > DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
>> > DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
>> > INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> > INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> > INFO  fetcher.Fetcher - fetch of https://URL failed with: Http code=500, url=https://URL
>> > INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
>> > INFO  fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
>> > INFO  fetcher.Fetcher - -activeThreads=0
>> > ...
>>
>> From the logs, Nutch did attempt NTLM authentication, but the server
>> returned HTTP 500. That says nothing about whether the NTLM
>> authentication succeeded or failed; it only indicates that the request
>> failed, and it suggests that an internal error happened in SharePoint.
>>
>> This can happen for a variety of reasons. I don't know much
>> about troubleshooting this on the SharePoint side. Perhaps you
>> should look into the IIS logs, the Event Viewer, etc. to figure out why
>> SharePoint didn't accept your credentials.
>>
>> Most likely it is some kind of configuration problem in either
>> SharePoint or IIS that is causing the NTLM authentication to
>> fail. Even though it is outside the scope of Nutch, from my
>> very limited experience working with SharePoint, I can say that it
>> might be a good idea to get Microsoft technical support involved
>> while troubleshooting this.
>>
>> Regards,
>> Susam Pal
>> http://susam.in/
>
> --
> Remi Tassing
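Following up on Susam's suggestion to check the IIS logs: a rough sketch that lists the requests IIS answered with status 500, along with the sub-status and Win32 status codes, which often narrow down the cause. It assumes the logs use the W3C extended format with the `sc-status`, `sc-substatus` and `sc-win32-status` fields enabled. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: filter IIS W3C log lines down to HTTP 500 responses. */
public class Iis500Filter {

    public static List<String> find500s(List<String> lines) {
        List<String> hits = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("#Fields:")) {
                // The #Fields directive names the space-separated columns that follow.
                fields = Arrays.asList(line.substring("#Fields:".length()).trim().split("\\s+"));
                continue;
            }
            if (line.startsWith("#") || line.trim().isEmpty()) {
                continue; // other directives and blank lines
            }
            String[] values = line.trim().split("\\s+");
            int status = fields.indexOf("sc-status");
            if (status < 0 || status >= values.length || !values[status].equals("500")) {
                continue;
            }
            hits.add(pick(fields, values, "cs-uri-stem")
                    + " status=500." + pick(fields, values, "sc-substatus")
                    + " win32=" + pick(fields, values, "sc-win32-status"));
        }
        return hits;
    }

    /** Returns the value of the named column, or "?" if that field isn't logged. */
    private static String pick(List<String> fields, String[] values, String name) {
        int i = fields.indexOf(name);
        return (i >= 0 && i < values.length) ? values[i] : "?";
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 never fails to decode, so non-UTF-8 log files still load.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        for (String hit : find500s(lines)) {
            System.out.println(hit);
        }
    }
}
```

Run it as `java Iis500Filter <path-to-log>`. The sub-status and Win32 codes it prints can then be looked up in the IIS documentation.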
