Thank you, Lewis.

I tested with the parsechecker tool and I'm still having the same problem. The output from parsechecker is the following (I replaced the real URL and some other data):

========================================================================
fetching: http://not-real-host.org/
http.proxy.host = null
http.proxy.port = 8080
http.timeout = 10000
http.content.limit = -1
http.agent = nutch/Nutch-1.5
http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Supported authentication schemes in the order of preference: [ntlm, digest, basic]
Challenge for ntlm authentication scheme not available
Challenge for digest authentication scheme not available
basic authentication scheme selected
Using authentication scheme: basic
Authorization challenge processed
parsing: http://not-real-host.org/
contentType: text/html
signature: 23c79e5c98acfd6090aa8efb2aa51839
--------- Url ---------------
http://not-real-host.org/
--------- ParseData ---------
Version: 5
Status: success(1,0)
Title: 401 Authorization Required
Outlinks: 0
Content Metadata: X-Varnish=416519822 Age=0 WWW-Authenticate=Basic realm="not-real-realm-name" Date=Thu, 20 Sep 2012 08:57:51 GMT Vary=Accept-Encoding Content-Length=341 Content-Encoding=gzip Via=1.1 varnish Connection=close Content-Type=text/html; charset=iso-8859-1 X-Cache=MISS Server=Apache/2.2.3 (CentOS)
Parse Metadata: CharEncodingForConversion=windows-1252 OriginalCharEncoding=windows-1252
========================================================================

The log file has this:

========================================================================
2012-09-20 10:57:47,687 DEBUG auth.AuthChallengeProcessor - Supported authentication schemes in the order of preference: [ntlm, digest, basic]
2012-09-20 10:57:47,687 DEBUG auth.AuthChallengeProcessor - Challenge for ntlm authentication scheme not available
2012-09-20 10:57:47,687 DEBUG auth.AuthChallengeProcessor - Challenge for digest authentication scheme not available
2012-09-20 10:57:47,687 INFO auth.AuthChallengeProcessor - basic authentication scheme selected
2012-09-20 10:57:47,687 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: basic
2012-09-20 10:57:47,687 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
2012-09-20 10:57:47,687 INFO httpclient.HttpMethodDirector - No credentials available for BASIC 'not-real-realm-name'@not-real-host.org:80
========================================================================

So I get a "401 Authorization Required" error. However, I have the necessary credentials in my httpclient-auth.xml file, and the "protocol-httpclient" plugin is enabled so that HTTP authentication works in Nutch. The Nutch documentation states that if I specify default credentials (which I did), they will be used for all websites that require authentication. It doesn't work like this for me. I suspect I'm missing something obvious, but I can't identify it...

Thanks in advance,
Max

-----Original Message-----
From: Lewis John Mcgibbney [mailto:[email protected]]
Sent: 19 September 2012 21:42
To: [email protected]; Max Dzyuba
Subject: Re: HTTP Authentication (basic) in Nutch 1.5

The best tool to use is the parsechecker. It is a quick, neat way to see whether your protocol/fetch/authentication is working, and then whether your parser is extracting the text and metadata you require.

On Wed, Sep 19, 2012 at 8:30 PM, Max Dzyuba <[email protected]> wrote:
> Hi Lewis,
>
> I used that website as an example.
> I didn't specify the exact website that I was using. I'm 100% sure that my website requires authentication and the credentials I provide are verified too. So there is something I'm missing in trying to make it work.
>
> Please help.
>
> Best regards,
> Max
>
> Lewis John Mcgibbney <[email protected]> wrote:
> Hi,
>
> On Wed, Sep 19, 2012 at 3:37 PM, Max Dzyuba <[email protected]> wrote:
>
>> 2012-09-19 16:26:16,106 INFO httpclient.HttpMethodDirector - No
>> credentials available for BASIC 'realm'@host.org:80
>>
>> I don't understand why Nutch complains about "No credentials
>> available for BASIC 'realm'@host.org:80" since I've set up the
>> default credentials which should be used for any page that asks for authentication.
>>
>
> If I follow the above link I get a popup box saying that the site does
> not require authentication credentials and that it is trying to trick
> me.
>
> Are you sure it's not just this site and that another solution is required?
>
> Lewis

-- 
Lewis
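
For readers following the thread, here is a minimal sketch of what a default-credentials entry in httpclient-auth.xml typically looks like, assuming the layout described in the Nutch HTTP authentication documentation. The username, password, and the commented-out authscope entry are placeholders for illustration only and are not values taken from this thread; the host and realm reuse the anonymized names from the parsechecker output above.

```xml
<?xml version="1.0"?>
<!-- Sketch of conf/httpclient-auth.xml; all credential values are placeholders. -->
<auth-configuration>
  <credentials username="placeholder-user" password="placeholder-pass">
    <!-- Default scope: intended to apply to any host, port and realm
         that is not matched by a more specific authscope. -->
    <default/>
    <!-- Optional narrower scope (assumption, shown only for comparison):
    <authscope host="not-real-host.org" port="80" realm="not-real-realm-name"/>
    -->
  </credentials>
</auth-configuration>
```

The file is only read when protocol-httpclient replaces protocol-http in the plugin.includes property of nutch-site.xml, so the "No credentials available" message may be worth checking against both files.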

