Heh, I think this is another good use case for HostDB, which doesn't exist yet.
If it existed, we could store a cookie for each host in HostDB and include it
in the CrawlDatum entries used by Fetcher(2).  You'd have to dig down to
o.a.n.protocol.httpclient.Http and add the cookie to each request there, I
believe.
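
To make the "add cookies to the request" part concrete, here is a minimal
sketch in plain Java, outside of Nutch. It is not Nutch code and does not
touch o.a.n.protocol.httpclient.Http; the class name, the method names, and
the file format (one "name=value" pair on the first non-blank line) are all
assumptions made for illustration. It only shows the mechanism: read the
cookie from a text file and send it as a Cookie header.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of Nutch.
public class CookieRequestSketch {

    // Returns the first non-blank line, which is assumed to hold "name=value".
    static String firstCookie(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (trimmed.indexOf('=') <= 0) {
                throw new IllegalArgumentException("expected name=value, got: " + trimmed);
            }
            return trimmed;
        }
        throw new IllegalArgumentException("no cookie line found");
    }

    // Opens a connection that will send the given cookie with the request.
    static HttpURLConnection openWithCookie(String url, String cookie) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Cookie", cookie);
        return conn;
    }

    // Usage: java CookieRequestSketch <cookie-file> <url>
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        HttpURLConnection conn = openWithCookie(args[1], firstCookie(lines));
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}
```

Inside Nutch the same header would have to be set wherever the HTTP plugin
builds its request, which is the part that needs the digging mentioned above.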

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Yoav Shapira <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, May 7, 2008 9:37:01 AM
> Subject: Re: How to authenticate with cookies?
> 
> On Tue, May 6, 2008 at 10:47 PM, Duan, Niu wrote:
> > Looks like Nutch doesn't support form-based authentication out of the box.
> > You may have to create your own httpclient or modify it for dealing with
> > form-based authentication.  Form-based authentication requires dedicated
> > input parameters (j_username, j_password) to be placed in the initial
> > request message sent to the server.  Once authenticated, a cookie named
> > jsessionid is going to be used to track the user session.
> 
> Thank you Nick.
> 
> What I'm actually looking for is a little different.  My server uses a
> custom cookie name and value to indicate an authenticated user.  I
> have this cookie (a valid version thereof, and let's assume for now
> I've gotten past expiration issues) in a text file.
> 
> How do I tell Nutch's crawler to include a cookie name and value with
> each HTTP request?
> 
> Yoav
