Yoav,

You are right. With the help of the "protocol-httpclient" plugin you will be able to use cookies when crawling. There is one thing you need to watch out for, though (quoting Susam Pal): "protocol-httpclient does this for a single fetch cycle".
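For reference, here is a minimal sketch of turning the plugin on. It assumes a Nutch 0.9-era setup: you override plugin.includes in conf/nutch-site.xml so that protocol-httpclient replaces protocol-http. The plugin list below is only an example. Copy the default value from your own conf/nutch-default.xml and change only the protocol entry.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Example value only: the stock plugin list with protocol-http swapped
       for protocol-httpclient. Start from the default in your own
       nutch-default.xml, since the list differs between Nutch versions. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```

Properties in nutch-site.xml override the same properties in nutch-default.xml, so the default file itself can stay untouched.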
To be honest, I don't know exactly how to define a "fetch cycle". In my experience, it seems that every time the fetcher goes one level deeper into a web site it starts a new cycle. Or, if it doesn't, I lose the cookie anyway. It might be caused by something else, but I don't think so. If anybody has the answer, please let Yoav and me know.

Thanks,
David

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yoav Shapira
Sent: Tuesday, 6 May 2008 02:50
To: [email protected]
Subject: How to authenticate with cookies?

Hi,

I'm using Nutch to crawl an intranet site that is behind form authentication. I know Nutch doesn't support form authentication yet (right?), but I think this site would also work with cookies. I have the right set of cookie names and values, at least for testing, but I don't know how to make Nutch send these cookies with every HTTP request during its crawl.

I saw a reference to a "protocol-httpclient" plugin. Is that true / relevant?

Any help on configuring Nutch to use cookies for authentication would be appreciated.

--
Thanks,
Yoav
