Hi,
On 6/7/07, Emmanuel JOKE <[EMAIL PROTECTED]> wrote:
> Hi Guys,
> I have several websites that set a session cookie and then allow the user to
> browse the site.
> I would like to crawl those sites, but I don't know whether Nutch knows how to
> manage session cookies.
> Could you confirm?
AFAIK, there is no support for cookies.
> I'm completely lost with the different plugins that are used to crawl with the
> HTTP protocol.
> Is it lib-http, protocol-http or protocol-httpclient?
> What is the difference between them?
lib-http is the base of both protocol plugins. It handles things like
parsing robots.txt and making sure that the fetcher is polite, but it
doesn't fetch pages itself. It delegates fetching to one of the
protocol-(http|httpclient) plugins. Since lib-http is a dependency of
both plugins, it gets loaded when either of them gets loaded.
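
To make that split a bit more concrete, here is a rough Java sketch of the
pattern: a shared base class does the robots-style check and then hands the
actual download to a subclass. All class and method names here
(HttpBaseSketch, doFetch, SimpleHttpSketch, HttpClientSketch) are made up for
illustration and are not Nutch's actual API; politeness delays and real
network I/O are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the shared logic that lib-http provides.
abstract class HttpBaseSketch {
    // Path prefixes that a robots.txt-like rule set forbids (assumed format).
    private final List<String> disallowedPrefixes = new ArrayList<>();

    void disallow(String pathPrefix) {
        disallowedPrefixes.add(pathPrefix);
    }

    // Shared step: check the rules first, then delegate the fetch itself.
    final String fetch(String host, String path) {
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return "BLOCKED " + path;
            }
        }
        return doFetch(host, path);
    }

    // Each protocol plugin supplies its own way of getting the page.
    protected abstract String doFetch(String host, String path);
}

// Hypothetical stand-in for protocol-http.
class SimpleHttpSketch extends HttpBaseSketch {
    @Override
    protected String doFetch(String host, String path) {
        return "protocol-http fetched http://" + host + path;
    }
}

// Hypothetical stand-in for protocol-httpclient.
class HttpClientSketch extends HttpBaseSketch {
    @Override
    protected String doFetch(String host, String path) {
        return "protocol-httpclient fetched http://" + host + path;
    }
}

class LibHttpSketchDemo {
    public static void main(String[] args) {
        HttpBaseSketch protocol = new SimpleHttpSketch();
        protocol.disallow("/private");
        System.out.println(protocol.fetch("example.org", "/index.html"));
        System.out.println(protocol.fetch("example.org", "/private/a.html"));
    }
}
```

In a layout like this, cookie handling would most naturally live in whichever
subclass performs the request, since the base class never touches the
connection. That is only an observation about the sketch, not a statement
about where it belongs in Nutch.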
> I would appreciate your view; it will help me implement cookie
> management in Nutch.
> Thanks
--
Doğacan Güney