Hi, I would like to use Nutch to crawl and index an intranet website for internal use. The site requires authentication and stores the credentials in a cookie. I have a valid login and have already saved the cookie, so that part is no problem. How do I tell Nutch to use it?
I did some research online before asking, but unfortunately I couldn't find a step-by-step answer for a newbie like me. I see there's an http-client plugin that supports some forms of authentication. Is that what I should use for cookies? If so, how do I configure it? Or is there something else I should be doing? If the documentation or an answer already exists, sorry for the hassle; please just point me to it ;)

Thanks,
Yoav
