Hi,

I tried the code snippet from the link below [1] and it worked! I just need
to figure out how to integrate it into Nutch. Any help?

[1]
http://developer-resource.blogspot.com/2008/06/ntlm-authentication-from-java.html
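
For anyone following along, the approach boils down to roughly the sketch
below. This is my own minimal reconstruction, not the code from [1] and not
anything that exists in Nutch: the class name NtlmFetch and its two helper
methods are made up for illustration. It relies on the JRE's built-in
HttpURLConnection, which asks the default java.net.Authenticator for
credentials when the server sends an authentication challenge. I am assuming
the "DOMAIN\user" user name form here, which is what the Sun/Oracle JRE
accepts for NTLM.

```java
import java.io.InputStream;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.PasswordAuthentication;
import java.net.URI;

// Hypothetical helper class, for illustration only (not part of Nutch).
public class NtlmFetch {

    // Registers a JVM-wide Authenticator. HttpURLConnection consults it
    // when the server replies with an authentication challenge.
    // Assumption: the user name is passed as DOMAIN\\user.
    public static void installCredentials(final String domain,
                                          final String user,
                                          final String password) {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(domain + "\\" + user,
                                                  password.toCharArray());
            }
        });
    }

    // Fetches the URL, drains the body on success, and returns the final
    // HTTP status code seen after any authentication handshake.
    public static int fetchStatus(String address) throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(address).toURL().openConnection();
        try {
            int status = conn.getResponseCode();
            if (status < 400) {
                try (InputStream in = conn.getInputStream()) {
                    byte[] buf = new byte[8192];
                    while (in.read(buf) != -1) {
                        // discard the body; only the status matters here
                    }
                }
            }
            return status;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.err.println("usage: NtlmFetch <url> <domain> <user> <password>");
            return;
        }
        installCredentials(args[1], args[2], args[3]);
        System.out.println("HTTP " + fetchStatus(args[0]));
    }
}
```

Note that Authenticator.setDefault is global to the JVM, so a real
integration would probably need to pick credentials per host inside
getPasswordAuthentication rather than returning a single fixed pair.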

On Sat, Dec 17, 2011 at 3:07 PM, remi tassing <[email protected]> wrote:

> How can I make Nutch use HttpURLConnection instead of HttpClient in a
> painless way? It's been 8 years since I wrote any Java code :-/
>
>
> On Saturday, December 17, 2011, remi tassing <[email protected]>
> wrote:
> > Hi,
> >
> > According to the link below, IIS gives an HTTP 500 response when the
> server expects NTLM V2 but receives an older version. I would guess that
> the HttpClient in Nutch doesn't support NTLM V2.
> >
> > I would also guess that it worked for Arkadi because his server doesn't
> use NTLM V2.
> >
> > Again according to the reference, Sun JRE 5 or higher fully supports
> NTLM V2. I wonder why it wasn't used for Nutch.
> >
> > reference: http://oaklandsoftware.com/papers/ntlm.html
> >
> > On Wednesday, November 30, 2011, remi tassing <[email protected]>
> wrote:
> >> Thanks for the tips, Susam!
> >> Unfortunately I don't have much support on the server side...
> >> I have been tipped off by a friend mentioning the possibility of
> crawlers being purposely blocked by the server.
> >> So how can I make Nutch impersonate a browser?
> >> I tried the tip in the following link but it didn't work:
> http://osdir.com/ml/nutch-user.lucene.apache.org/2009-06/msg00022.html
> >> Remi
> >> On Sun, Nov 27, 2011 at 9:17 PM, Susam Pal <[email protected]> wrote:
> >>>
> >>> On Sun, Nov 27, 2011 at 4:41 PM, remi tassing <[email protected]>
> wrote:
> >>> > Hello guys,
> >>> > With your advice, I tried tweaking config files over the weekend and
> >>> > ran into a problem I couldn't solve (I'm running nutch-1.2; I
> >>> > couldn't get nutch-1.3 to run under Cygwin).
> >>> > A sample of my log file can be found below. I have two concerns:
> >>> >   -How do I know if NTLM login worked?
> >>> >   -How do I debug the http 500 error code? I suspect it might be due
> to
> >>> > cookies...
> >>> > Thanks in advance for your help
> >>> > ...
> >>> > 2011-11-27 18:54:02,298 DEBUG auth.AuthChallengeProcessor - Supported
> >>> > authentication schemes in the order of preference: [ntlm, digest,
> basic]
> >>> > 2011-11-27 18:54:02,300 INFO  auth.AuthChallengeProcessor - ntlm
> >>> > authentication scheme selected
> >>> > DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
> >>> > DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
> >>> > INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0,
> >>> > fetchQueues.totalSize=0
> >>> > INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0,
> >>> > fetchQueues.totalSize=0
> >>> > INFO  fetcher.Fetcher - fetch of https://URL failed with: Http
> code=500,
> >>> > url=https://URL
> >>> > INFO  fetcher.Fetcher - -finishing thread FetcherThread,
> activeThreads=0
> >>> > INFO  fetcher.Fetcher - -activeThreads=0, spinWaiting=0,
> >>> > fetchQueues.totalSize=0
> >>> > INFO  fetcher.Fetcher - -activeThreads=0
> >>> > ...
> >>>
> >>> From the logs, Nutch did attempt an NTLM authentication but the server
> >>> returned HTTP 500. It says nothing about whether the NTLM
> >>> authentication succeeded or failed. It only indicates that the
> >>> request failed, which suggests that an internal error happened in
> >>> SharePoint.
> >>>
> >>> Now, this can happen for a variety of reasons. I don't know much
> >>> about how to troubleshoot this on the SharePoint side. Perhaps you
> >>> should be looking into IIS logs, event viewer, etc. to figure out why
> >>> SharePoint didn't accept your credentials.
> >>>
> >>> Most likely it is some kind of configuration problem in either
> >>> SharePoint or IIS due to which the NTLM authentication is causing
> >>> some trouble. Even though it is outside the scope of Nutch, from my
> >>> very limited experience working with SharePoint, I can say that it
> >>> might be a good idea to get the Microsoft technical support involved
> >>> while trying to troubleshoot this.
> >>>
> >>> Regards,
> >>> Susam Pal
> >>> http://susam.in/
> >>
> >>
> >>
> >> --
> >> Remi Tassing
> >>
> >>
>



-- 
Remi Tassing
