Thanks for the tips, Susam!

Unfortunately, I don't have much support on the server side...

A friend tipped me off that the server might be deliberately blocking
crawlers.

So how can I make Nutch impersonate a browser?

I tried the tip in the following link but it didn't work:
http://osdir.com/ml/nutch-user.lucene.apache.org/2009-06/msg00022.html
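
A quick way to test the friend's theory outside Nutch is to request the same page twice and compare the status codes. The first request uses a browser-like User-Agent and the second a crawler-like one. Below is a minimal sketch in Python. The function names and both agent strings are placeholders chosen for illustration; the crawler string is not necessarily what Nutch 1.2 sends. The sketch also assumes the page can be requested without authentication. Python's standard library has no NTLM support, so if the server challenges every request, both probes will most likely come back 401.

```python
import sys
import urllib.error
import urllib.request

# Placeholder agent strings (assumptions): a generic desktop-browser UA and a
# crawler-style UA. Neither is taken from the thread or from Nutch's defaults.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 "
              "(KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2")
CRAWLER_UA = "ExampleCrawler/1.0"


def build_request(url, user_agent):
    """Build a plain GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def probe(url, user_agent, timeout=10.0):
    """Return the HTTP status code the server answers with for this agent."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent),
                                    timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError; the status code is
        # still exactly what we want to compare.
        return err.code


def compare_agents(url):
    """Map each placeholder agent string to the status it received."""
    return {ua: probe(url, ua) for ua in (BROWSER_UA, CRAWLER_UA)}


if __name__ == "__main__":
    # Only hit the network when a URL is given on the command line.
    if len(sys.argv) > 1:
        for agent, status in compare_agents(sys.argv[1]).items():
            print(status, agent)
```

Run it with the page URL as the only argument. If the browser-like agent gets a different status from the crawler-like one, User-Agent filtering is a likely culprit. On the Nutch side, the agent string comes from the `http.agent.*` properties in nutch-site.xml, such as `http.agent.name` and `http.agent.version`.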

Remi
On Sun, Nov 27, 2011 at 9:17 PM, Susam Pal <[email protected]> wrote:

> On Sun, Nov 27, 2011 at 4:41 PM, remi tassing <[email protected]> wrote:
> > Hello guys,
> > With your advice, I tried tweaking the config files over the weekend and
> > ran into a problem I couldn't solve (I'm running Nutch 1.2; I couldn't get
> > Nutch 1.3 to run under Cygwin).
> > A sample of my log file is below. I have two questions:
> >   - How do I know whether the NTLM login worked?
> >   - How do I debug the HTTP 500 error code? I suspect it might be due to
> >     cookies...
> > Thanks in advance for your help
> > ...
> > 2011-11-27 18:54:02,298 DEBUG auth.AuthChallengeProcessor - Supported authentication schemes in the order of preference: [ntlm, digest, basic]
> > 2011-11-27 18:54:02,300 INFO  auth.AuthChallengeProcessor - ntlm authentication scheme selected
> > DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
> > DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
> > INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > INFO  fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > INFO  fetcher.Fetcher - fetch of https://URL failed with: Http code=500, url=https://URL
> > INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
> > INFO  fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > INFO  fetcher.Fetcher - -activeThreads=0
> > ...
>
> From the logs, Nutch did attempt NTLM authentication, but the server
> returned HTTP 500. The log says nothing about whether the NTLM
> authentication itself succeeded or failed; it only indicates that the
> request failed. The 500 status suggests that an internal error occurred
> in SharePoint.
>
> This can happen for a variety of reasons. I don't know much about
> troubleshooting this on the SharePoint side. Perhaps you should look
> into the IIS logs, the Event Viewer, etc. to figure out why SharePoint
> is failing on these requests.
>
> Most likely it is some kind of configuration problem in either
> SharePoint or IIS that is causing the NTLM authentication to misbehave.
> Even though it is outside the scope of Nutch, from my very limited
> experience working with SharePoint, I can say that it might be a good
> idea to get Microsoft technical support involved while troubleshooting
> this.
>
> Regards,
> Susam Pal
> http://susam.in/
>
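
This follows up on the IIS-log suggestion above. If someone with server access can pull the IIS logs, the entries for the crawler's requests should help with both questions: whether NTLM worked and where the 500 comes from. Below is a minimal Python sketch. It assumes the logs are in IIS's W3C extended format and include the cs-username, sc-status, sc-substatus and sc-win32-status fields. The logged fields are configurable, so check the #Fields: line. The function names are illustrative only.

```python
import sys


def parse_w3c_log(lines):
    """Yield one dict per request line, keyed by the latest '#Fields:' header."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            # The directive lists the column names for the lines that follow.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            # W3C logs are space-separated; IIS writes spaces inside values as '+'.
            yield dict(zip(fields, line.split()))


def failed_requests(lines, statuses=("401", "500")):
    """Keep only entries whose sc-status is one of the given codes."""
    for entry in parse_w3c_log(lines):
        if entry.get("sc-status") in statuses:
            yield entry


if __name__ == "__main__":
    # Pass the path to an IIS log file; 'utf-8-sig' tolerates a leading BOM.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8-sig", errors="replace") as log:
            for e in failed_requests(log):
                print(e.get("date"), e.get("time"), e.get("sc-status"),
                      e.get("sc-substatus"), e.get("sc-win32-status"),
                      e.get("cs-username"), e.get("cs-uri-stem"))
```

A 401 with an empty cs-username ("-") is normally an intermediate step of the NTLM handshake or a rejected login. A 500 logged with a user name filled in would suggest that IIS accepted the NTLM credentials and that the error came from SharePoint itself.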



-- 
Remi Tassing
