Thank you, Lewis.

I tested with the parsechecker tool and I'm still having the same problem. The
output from parsechecker is the following (I replaced the real URL and some
other data):
========================================================================
fetching: http://not-real-host.org/
http.proxy.host = null
http.proxy.port = 8080
http.timeout = 10000
http.content.limit = -1
http.agent = nutch/Nutch-1.5
http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
http.accept =
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Supported authentication schemes in the order of preference: [ntlm, digest,
basic]
Challenge for ntlm authentication scheme not available
Challenge for digest authentication scheme not available
basic authentication scheme selected
Using authentication scheme: basic
Authorization challenge processed
parsing: http://not-real-host.org/
contentType: text/html
signature: 23c79e5c98acfd6090aa8efb2aa51839
---------
Url
---------------
http://not-real-host.org/
---------
ParseData
---------
Version: 5
Status: success(1,0)
Title: 401 Authorization Required
Outlinks: 0
Content Metadata: X-Varnish=416519822 Age=0 WWW-Authenticate=Basic
realm="not-real-realm-name" Date=Thu, 20 Sep 2012 08:57:51 GMT
Vary=Accept-Encoding Content-Length=341 Content-Encoding=gzip Via=1.1
varnish Connection=close Content-Type=text/html; charset=iso-8859-1
X-Cache=MISS Server=Apache/2.2.3 (CentOS)
Parse Metadata: CharEncodingForConversion=windows-1252
OriginalCharEncoding=windows-1252
========================================================================



The log file has this:
========================================================================
2012-09-20 10:57:47,687 DEBUG auth.AuthChallengeProcessor - Supported
authentication schemes in the order of preference: [ntlm, digest, basic]
2012-09-20 10:57:47,687 DEBUG auth.AuthChallengeProcessor - Challenge for
ntlm authentication scheme not available
2012-09-20 10:57:47,687 DEBUG auth.AuthChallengeProcessor - Challenge for
digest authentication scheme not available
2012-09-20 10:57:47,687 INFO  auth.AuthChallengeProcessor - basic
authentication scheme selected
2012-09-20 10:57:47,687 DEBUG auth.AuthChallengeProcessor - Using
authentication scheme: basic
2012-09-20 10:57:47,687 DEBUG auth.AuthChallengeProcessor - Authorization
challenge processed
2012-09-20 10:57:47,687 INFO  httpclient.HttpMethodDirector - No credentials
available for BASIC 'not-real-realm-name'@not-real-host.org:80
========================================================================



So I get a "401 Authorization Required" page, and the log reports that no
credentials are available for the realm. However, I have the necessary
credentials in my httpclient-auth.xml file, and the "protocol-httpclient"
plugin is enabled so that HTTP authentication works in Nutch.

The Nutch documentation states that if I specify default credentials (which I
did), they will be used for all websites that require authentication. It
doesn't work like that for me. I suspect I'm missing something obvious, but
can't identify it...
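
For reference, the default-credentials setup I am describing follows the
pattern below. This is only a minimal sketch, assuming the httpclient-auth.xml
format described in the Nutch documentation; the username and password are
placeholders, not my real values.

```xml
<?xml version="1.0"?>
<!-- Sketch of httpclient-auth.xml; "my-user" and "my-password" are placeholders. -->
<auth-configuration>
  <credentials username="my-user" password="my-password">
    <!-- No authscope is listed, so these credentials are meant to be the
         default for any host and realm that asks for authentication. -->
    <default/>
  </credentials>
</auth-configuration>
```

With a file of this shape I would expect the BASIC challenge for
'not-real-realm-name' to be answered with the default credentials, which is
not what the log above shows.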



Thanks in advance,
Max


-----Original Message-----
From: Lewis John Mcgibbney [mailto:[email protected]] 
Sent: den 19 september 2012 21:42
To: [email protected]; Max Dzyuba
Subject: Re: HTTP Authentication (basic) in Nutch 1.5

The best tool to use is the parsechecker; it is a quick, neat way to see whether
your protocol/fetch/authentication is working and then whether your parser is
extracting the text and metadata you require.

On Wed, Sep 19, 2012 at 8:30 PM, Max Dzyuba <[email protected]>
wrote:
> Hi Lewis,
>
> I used that website as an example. I didn't specify the exact website that
> I was using. I'm 100% sure that my website requires authentication and the
> credentials I provide are verified too. So there is something I'm missing in
> trying to make it work.
>
> Please help.
>
>
>
>
> Best regards,
> Max
>
> Lewis John Mcgibbney <[email protected]> wrote:
> Hi,
>
> On Wed, Sep 19, 2012 at 3:37 PM, Max Dzyuba <[email protected]> wrote:
>
>>
>> 2012-09-19 16:26:16,106 INFO  httpclient.HttpMethodDirector - No 
>> credentials available for BASIC 'realm'@host.org:80
>>
>>
>>
>> I don't understand why Nutch complains about "No credentials 
>> available for BASIC 'realm'@host.org:80" since I've set up the 
>> default credentials which should be used for any page that asks for
>> authentication.
>>
>
> If I follow the above link I get a popup box saying that the site does 
> not require authentication credentials and that it is trying to trick 
> me.
>
> Are you sure it's not just this site and that another solution is required?
>
> Lewis



--
Lewis
