Yoav,

You are right. With the help of the "protocol-httpclient" plugin you
will be able to use cookies when crawling. There is one thing that you
need to watch out for, though (quoting Susam Pal): "protocol-httpclient
does this for a single fetch cycle".
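
For reference, the plugin is normally enabled by swapping "protocol-http"
for "protocol-httpclient" in the plugin.includes property of
conf/nutch-site.xml. A minimal sketch, assuming the rest of the list looks
like the one below (the other plugin names are only an example, so keep
whatever your own configuration already lists):

```xml
<!-- conf/nutch-site.xml: sketch only; every plugin besides
     protocol-httpclient is an assumed example, not a requirement -->
<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
</property>
```

This only switches the protocol implementation; it does not by itself tell
Nutch which cookie names and values to send.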

To be honest, I don't know exactly how to define a "fetch cycle". In my
experience, every time the fetcher goes one level deeper into a web site
it seems to start a new cycle; at any rate, that is when I lose the
cookie. Something else might be causing it, but I don't think so.

If anybody has the answer to that, please let Yoav and me know.

Thanks,

David


 


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Yoav Shapira
Sent: Tuesday, May 6, 2008 02:50
To: [email protected]
Subject: How to authenticate with cookies?

Hi,

I'm using Nutch to crawl an intranet site that is behind form
authentication.  I know Nutch doesn't support form authentication yet
(right?), but I think this site would also work with cookies.  I have
the right set of cookie names and values, at least for testing, but I
don't know how to have Nutch use these cookies with every HTTP
request during its crawl.

I saw a reference to a "protocol-httpclient" plugin.  Is that true /
relevant?

Any help on configuring Nutch to use cookies for authentication would
be appreciated.

-- 
Thanks,

Yoav
