How can I make Nutch use HttpURLConnection instead of HttpClient in the
most painless way? It's been 8 years since I wrote any Java code :-/
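For reference, this is roughly what the JDK route looks like outside Nutch. It is a minimal standalone sketch, not a Nutch protocol plugin. `java.net.HttpURLConnection` asks the JVM-wide `java.net.Authenticator` for credentials when the server answers with an authentication challenge, and the JDK accepts the NT domain inside the username as `DOMAIN\user` (the `http.auth.ntlm.domain` system property is the documented alternative). The class name `NtlmFetchSketch`, the domain `CORP`, the account and the password are hypothetical placeholders. Whether the JDK's NTLM implementation satisfies this particular IIS server has not been tested here.

```java
import java.io.IOException;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.PasswordAuthentication;
import java.net.URI;

public class NtlmFetchSketch {

    // Hypothetical helper: builds the "DOMAIN\user" username form that the
    // JDK's NTLM support accepts in place of the http.auth.ntlm.domain property.
    public static String ntlmUserName(String domain, String user) {
        return domain + "\\" + user;
    }

    // Installs a JVM-wide Authenticator. HttpURLConnection asks it for
    // credentials when the server replies with an authentication challenge.
    public static void installCredentials(String domain, String user, char[] password) {
        final String userName = ntlmUserName(domain, user);
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(userName, password);
            }
        });
    }

    // Sends a GET request and returns the final HTTP status code.
    public static int fetchStatus(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java NtlmFetchSketch <url>");
            return;
        }
        // Placeholder credentials: replace with a real domain, account and password.
        installCredentials("CORP", "remi", "secret".toCharArray());
        System.out.println("HTTP " + fetchStatus(args[0]));
    }
}
```

Run it as `java NtlmFetchSketch https://host/path`. A 200 would suggest that the JDK's NTLM handshake works where HttpClient's does not. `Authenticator.setDefault` is process-wide, which is convenient for a quick test but would need care inside a multi-threaded fetcher.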

On Saturday, December 17, 2011, remi tassing wrote:
> Hi,
>
> According to the link below, IIS returns HTTP 500 when the server
> expects NTLMv2 but receives an older version, which is probably what is
> happening here. I would guess that the HttpClient used by Nutch doesn't
> support NTLMv2.
>
> I would also guess that it worked for Arkadi because his server doesn't
> use NTLMv2.
>
> Again according to the reference, Sun JRE 5 or higher fully supports
> NTLMv2. I wonder why it wasn't used for Nutch.
>
> reference: http://oaklandsoftware.com/papers/ntlm.html
>
> On Wednesday, November 30, 2011, remi tassing wrote:
>> Thanks for the tips, Susam!
>> Unfortunately I don't have much support on the server side...
>> A friend tipped me off that the server might be deliberately blocking
>> crawlers.
>> So how can I make Nutch impersonate a browser?
>> I tried the tip in the following link, but it didn't work:
>> http://osdir.com/ml/nutch-user.lucene.apache.org/2009-06/msg00022.html
>> Remi
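Whether the server actually blocks crawlers is only a guess at this point in the thread. One cheap way to test it is to send the same request with a browser-like `User-Agent` header and compare the responses. The sketch below does this with plain `HttpURLConnection`, outside Nutch; the class name `BrowserAgentSketch` and the Firefox-style agent string are illustrative placeholders. Inside Nutch the agent string is assembled from the `http.agent.*` properties in `nutch-site.xml` rather than set in code.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class BrowserAgentSketch {

    // Illustrative browser-like agent string (placeholder, not required by any server).
    public static final String BROWSER_AGENT =
        "Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0";

    // Opens a connection with the User-Agent header overridden.
    // openConnection() does not contact the server; the request is only sent
    // when the response is first read, e.g. by getResponseCode().
    public static HttpURLConnection openAsBrowser(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", BROWSER_AGENT);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java BrowserAgentSketch <url>");
            return;
        }
        HttpURLConnection conn = openAsBrowser(args[0]);
        try {
            System.out.println("HTTP " + conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }
}
```

If the browser-like request succeeds where the default one fails, that points at User-Agent filtering; if it also returns 500, the cause more likely lies in the authentication path.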
>> On Sun, Nov 27, 2011 at 9:17 PM, Susam Pal wrote:
>>>
>>> On Sun, Nov 27, 2011 at 4:41 PM, remi tassing wrote:
>>> > Hello guys,
>>> > With your advice, I tried tweaking the config files over the weekend
>>> > and ran into a problem I couldn't solve (I'm running Nutch 1.2; I
>>> > couldn't get Nutch 1.3 to run under Cygwin).
>>> > A sample of my log file is below. I have two questions:
>>> >   - How do I know whether the NTLM login worked?
>>> >   - How do I debug the HTTP 500 error code? I suspect it might be due
>>> >     to cookies...
>>> > Thanks in advance for your help.
>>> > ...
>>> > 2011-11-27 18:54:02,298 DEBUG auth.AuthChallengeProcessor - Supported authentication schemes in the order of preference: [ntlm, digest, basic]
>>> > 2011-11-27 18:54:02,300 INFO  auth.AuthChallengeProcessor - ntlm authentication scheme selected
>>> > DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
>>> > DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
>>> > INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>>> > INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>>> > INFO  fetcher.Fetcher - fetch of https://URL failed with: Http code=500, url=https://URL
>>> > INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
>>> > INFO  fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
>>> > INFO  fetcher.Fetcher - -activeThreads=0
>>> > ...
>>>
>>> From the logs, Nutch did attempt NTLM authentication, but the server
>>> returned HTTP 500. The log says nothing about whether the NTLM
>>> authentication itself succeeded or failed; it only shows that the
>>> request failed. A 500 status suggests that an internal error occurred
>>> in SharePoint.
>>>
>>> This can happen for a variety of reasons. I don't know much about
>>> troubleshooting this on the SharePoint side. Perhaps you should look
>>> at the IIS logs, Event Viewer, etc. to figure out why SharePoint
>>> didn't accept your credentials.
>>>
>>> Most likely some configuration problem in either SharePoint or IIS
>>> is interfering with NTLM authentication. Even though it is outside
>>> the scope of Nutch, from my very limited experience working with
>>> SharePoint, I can say that it might be a good idea to involve
>>> Microsoft technical support while troubleshooting this.
>>>
>>> Regards,
>>> Susam Pal
>>> http://susam.in/
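Remi's two questions (did the NTLM login work, and where does the 500 come from) can be narrowed down a little from the client side by looking at the final status code and at the error page the server sends with it. A minimal sketch with `HttpURLConnection` (Java 9 or later for `readAllBytes`), assuming credentials are supplied separately, for example through a `java.net.Authenticator`. The class name `FetchDiagnosticsSketch` and the status descriptions are illustrative heuristics, not something Nutch, IIS or SharePoint defines. Roughly: a final 401 means the authentication challenge was never satisfied, while a 500 points at a server-side failure whose details are often in the response body or in the IIS logs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class FetchDiagnosticsSketch {

    // Rough, illustrative reading of a final status code.
    public static String describe(int status) {
        if (status >= 200 && status < 300) return "ok";
        if (status == 401) return "authentication challenge not satisfied";
        if (status == 403) return "authenticated but forbidden, or request blocked";
        if (status >= 500) return "server-side error; check the body and the IIS logs";
        return "unexpected status " + status;
    }

    // Reads a whole stream as text (assumes UTF-8). getErrorStream() may
    // return null when the server sent no body, so null maps to "".
    public static String readAll(InputStream in) throws IOException {
        if (in == null) return "";
        try (InputStream s = in) {
            return new String(s.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java FetchDiagnosticsSketch <url>");
            return;
        }
        HttpURLConnection conn = (HttpURLConnection) URI.create(args[0]).toURL().openConnection();
        try {
            int status = conn.getResponseCode();
            System.out.println("HTTP " + status + ": " + describe(status));
            // For 4xx/5xx responses the body is exposed via getErrorStream();
            // getInputStream() would throw an IOException instead.
            if (status >= 400) {
                System.out.println(readAll(conn.getErrorStream()));
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

SharePoint error pages sometimes include a correlation ID or a short error message, which can make it easier to match the failing request with the entries in the IIS logs or Event Viewer.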
>>
>>
>>
>> --
>> Remi Tassing
>>
>>
