Re: How to authenticate with cookies?

Andrzej Bialecki Thu, 08 May 2008 08:15:30 -0700

POIRIER David wrote:

Yoav,


You are right. With the help of the "protocol-httpclient" plugin you
will be able to use cookies when crawling. There is one thing that you
need to watch out for, though (quoting Susam Pal): "protocol-httpclient does
this for a single fetch cycle".

To be honest, I don't know exactly how to define a "fetch cycle". Based
on my experience, it seems that every time the fetcher goes one level
deeper into a web site it starts a new cycle... or if it doesn't, I lose
the cookie. It might be because of something else, but I don't think so.

If anybody has the answer to that, please let Yoav and me know.

This is correct. It comes from the fact that Nutch doesn't store cookies persistently (that's yet another potential use for the planned HostDB functionality). This means that in order to accept and use cookies:

* you have to use protocol-httpclient. There is no support for cookies in protocol-http.

* your fetchlist needs to have more than one URL from the same host - the first request will presumably set the cookies, if you are lucky. ;)

* cookies are accumulated and kept in memory for the duration of the current crawl task.
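To make the three points above concrete, here is a toy Python model of per-task, in-memory cookie handling. It is not Nutch code: `InMemoryCookieStore`, `run_fetch_task` and `toy_server` are hypothetical names, the "server" is a stand-in that sets a session cookie on `/login`, and the fetcher is modeled as one sequential loop rather than Nutch's multi-threaded fetcher. It only shows why a URL gets the cookie when an earlier URL from the same host in the same task set it, and why the next task starts empty.

```python
from collections import defaultdict


class InMemoryCookieStore:
    """Hypothetical per-task cookie store: cookies live only as long as this object."""

    def __init__(self):
        self._by_host = defaultdict(dict)

    def cookies_for(self, host):
        # Return a copy so callers cannot mutate the stored cookies.
        return dict(self._by_host[host])

    def accept(self, host, set_cookies):
        # Accumulate cookies set by the host during this task.
        self._by_host[host].update(set_cookies)


def toy_server(host, path, cookies):
    """Hypothetical remote site: /login sets a session cookie, other pages need it."""
    if path == "/login":
        return 200, {"session": "abc123"}
    if "session" in cookies:
        return 200, {}
    return 403, {}


def run_fetch_task(fetchlist, server):
    """Fetch (host, path) pairs in order; the cookie store is created per task."""
    store = InMemoryCookieStore()  # fresh store: nothing survives from earlier tasks
    results = []
    for host, path in fetchlist:
        status, set_cookies = server(host, path, store.cookies_for(host))
        store.accept(host, set_cookies)
        results.append((host, path, status))
    return results


# Task 1: the login URL comes first, so the second URL is sent with the cookie.
print(run_fetch_task([("example.com", "/login"), ("example.com", "/members")], toy_server))
# Task 2: a new task with a single URL from the host starts without cookies.
print(run_fetch_task([("example.com", "/members/page2")], toy_server))
```

In this model the order of the fetchlist matters too: if `/members` came before `/login` in the same task, it would be fetched without the cookie, which is the "if you are lucky" part.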



--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
