Looks like Nutch doesn't support form-based authentication out of the box.  You
may have to write your own HTTP client, or modify the existing one, to deal with
form-based authentication.  Form-based authentication requires dedicated input
parameters (j_username and j_password, POSTed to j_security_check in a standard
Java EE form login) to be sent in the initial login request to the server.  Once
authenticated, a session cookie named JSESSIONID is used to track the user session.
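
A minimal sketch of the login step described above, assuming the site uses
standard Java EE container-managed form login (a POST to j_security_check) and
Java 11+ with java.net.http.  This is plain JDK code, not a Nutch plugin or API;
the class name, method names and the login URL are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: sketch of a Java EE-style form login, not Nutch code.
class FormLoginSketch {

    // Encode j_username/j_password as an application/x-www-form-urlencoded body.
    static String loginBody(String user, String password) {
        return "j_username=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
                + "&j_password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
    }

    // Build (but do not send) the POST to the container's j_security_check action.
    static HttpRequest loginRequest(URI securityCheck, String user, String password) {
        return HttpRequest.newBuilder(securityCheck)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(loginBody(user, password)))
                .build();
    }

    // Pull the JSESSIONID value out of the response's Set-Cookie headers, if any.
    static Optional<String> sessionId(List<String> setCookieHeaders) {
        for (String header : setCookieHeaders) {
            String pair = header.split(";", 2)[0].trim(); // "NAME=VALUE" before attributes
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).trim().equalsIgnoreCase("JSESSIONID")) {
                return Optional.of(pair.substring(eq + 1).trim());
            }
        }
        return Optional.empty();
    }
}
```

Sending loginRequest(...) with an HttpClient and passing
response.headers().allValues("Set-Cookie") to sessionId(...) would recover the
session id; every later request of the crawl then has to carry that cookie.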

Nick


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tue 5/6/2008 4:54 PM
To: [email protected]
Subject: Re: How to authenticate with cookies?
 
Yoav - I'm not 100% certain about this, as I haven't had to deal with
Nutch and cookies, but I did see some logging that made me think "ah, this thing
handles cookies like a browser".  Yes, that's likely something that comes with
httpclient, so just enable protocol-httpclient and disable protocol-http (in
plugin.includes).  Want to try it and report back?
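
To illustrate what "handles cookies like a browser" means, here is a sketch
using the JDK's java.net.CookieManager as an in-memory cookie jar (Java 11+).
This is plain JDK code, not taken from protocol-httpclient; the class name, the
URLs and the cookie are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.List;
import java.util.Map;

// Hypothetical helper: shows browser-like cookie storage with the JDK cookie jar.
class CookieJarSketch {

    // Record the Set-Cookie headers of one response, then ask the jar which
    // Cookie header values it would attach to a later request for nextUri.
    static List<String> cookiesForNextRequest(URI responseUri, List<String> setCookieHeaders, URI nextUri) {
        CookieManager jar = new CookieManager(null, CookiePolicy.ACCEPT_ALL); // null = in-memory store
        try {
            jar.put(responseUri, Map.of("Set-Cookie", setCookieHeaders));
            Map<String, List<String>> requestHeaders = jar.get(nextUri, Map.of());
            return requestHeaders.getOrDefault("Cookie", List.of());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The jar keeps the cookie from the first response and offers it back only to
requests whose host and path match, which is the behaviour a crawler needs after
logging in.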

Found this: 
http://wiki.apache.org/nutch/HttpPostAuthentication?highlight=%28cookies%29

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Yoav Shapira <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Monday, May 5, 2008 8:49:50 PM
> Subject: How to authenticate with cookies?
> 
> Hi,
> 
> I'm using Nutch to crawl an intranet site that is behind form
> authentication.  I know Nutch doesn't support form authentication yet
> (right?), but I think this site would also work with cookies.  I have
> the right set of cookie names and values, at least for testing, but I
> don't know how to have Nutch use these cookies with every HTTP
> request during its crawl.
> 
> I saw a reference to a "protocol-httpclient" plugin.  Is that true / relevant?
> 
> Any help on configuring Nutch to use cookies for authentication would
> be appreciated.
> 
> -- 
> Thanks,
> 
> Yoav
> 
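
For the case in the original question, where the cookie names and values are
already known, a minimal sketch (plain Java 11+ java.net.http, outside Nutch) is
to join them into one Cookie header and attach it to every request.  The class
and method names are hypothetical, and this does not show how to make Nutch
itself send the header.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: attach a fixed, pre-obtained set of cookies to each request.
class FixedCookies {

    // Join name/value pairs into a single Cookie header value: "a=1; b=2".
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner joined = new StringJoiner("; ");
        cookies.forEach((name, value) -> joined.add(name + "=" + value));
        return joined.toString();
    }

    // Build a GET request carrying the cookies; the same header goes on every request.
    static HttpRequest get(URI uri, Map<String, String> cookies) {
        return HttpRequest.newBuilder(uri)
                .header("Cookie", cookieHeader(cookies))
                .GET()
                .build();
    }
}
```

Passing a LinkedHashMap keeps the cookies in insertion order; the server does not
normally care about the order, but it makes the header predictable.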


