Heh, I think this is another good use case for HostDB, which doesn't exist yet. If it existed, we could store a cookie for each host in HostDB and include it in the CrawlDatum entries used in Fetcher(2). I believe you'd have to dig down to o.a.n.protocol.httpclient.Http and add the cookies to the request there.
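
To make the idea concrete, here is a rough sketch of a per-host cookie lookup that attaches a Cookie header before the request goes out. It uses only plain java.net, not Nutch or HttpClient code. The HostCookies class and its methods are hypothetical names I made up for illustration; nothing like this exists in Nutch today, and the real change would have to live in the protocol plugin.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a per-host cookie store; not part of Nutch. */
public class HostCookies {
    // Maps a lower-cased host name to a ready-made "name=value" cookie string.
    private final Map<String, String> cookiesByHost = new HashMap<>();

    /** Remembers one cookie for a host; host names are matched case-insensitively. */
    public void put(String host, String name, String value) {
        cookiesByHost.put(host.toLowerCase(Locale.ROOT), name + "=" + value);
    }

    /** Returns the Cookie header value for this URI's host, or null if none is stored. */
    public String cookieHeaderFor(URI uri) {
        String host = uri.getHost();
        return host == null ? null : cookiesByHost.get(host.toLowerCase(Locale.ROOT));
    }

    /** Opens a connection and sets the stored cookie, if any, before anything is sent. */
    public HttpURLConnection open(URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        String cookie = cookieHeaderFor(uri);
        if (cookie != null) {
            conn.setRequestProperty("Cookie", cookie);
        }
        return conn;
    }
}
```

The name and value would come from wherever you keep them (the text file you mentioned, for example), loaded once with put() before the fetch starts.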

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Yoav Shapira <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, May 7, 2008 9:37:01 AM
> Subject: Re: How to authenticate with cookies?
>
> On Tue, May 6, 2008 at 10:47 PM, Duan, Niu wrote:
> > Looks like Nutch doesn't support form-based authentication out of the box.
> > You may have to create your own httpclient or modify it for dealing with
> > form-based authentication. Form-based authentication requires dedicated
> > input parameters (j_username, j_password) to be placed in the initial
> > request message sent to the server. Once authenticated, a cookie named
> > jsessionid is going to be used to track the user session.
>
> Thank you Nick.
>
> What I'm actually looking for is a little different. My server uses a
> custom cookie name and value to indicate an authenticated user. I
> have this cookie (a valid version thereof, and let's assume for now
> I've gotten past expiration issues) in a text file.
>
> How do I tell Nutch's crawler to include a cookie name and value with
> each HTTP request?
>
> Yoav
