It looks like Nutch doesn't support form-based authentication out of the box. You may have to write your own HTTP client or modify the existing one (the protocol-httpclient plugin mentioned below) to handle it. Form-based authentication requires the credentials to be sent as dedicated form parameters (j_username, j_password) in the login request to the server; on Java servlet containers this is a POST to j_security_check. Once you are authenticated, the server tracks your session with a cookie, normally named JSESSIONID, which has to be sent back with every later request.
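To make the two steps concrete, here is a rough sketch in Python (not Nutch code, and not something Nutch ships). It only shows how the login form body is encoded and how the session cookie from the response would be turned into a Cookie header for later requests. The function names, the credentials, and the sample Set-Cookie value are all made up for illustration.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlencode


def build_login_body(username: str, password: str) -> bytes:
    """URL-encode the credentials as the form fields the login action expects."""
    fields = {"j_username": username, "j_password": password}
    return urlencode(fields).encode("ascii")


def session_cookie_header(set_cookie_value: str, name: str = "JSESSIONID") -> str:
    """Turn a Set-Cookie response header into the Cookie header for later requests."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    if name not in jar:
        raise KeyError(f"no {name} cookie in response")
    return f"{name}={jar[name].value}"


# Hypothetical values, for illustration only.
body = build_login_body("crawler", "s3cret&more")
cookie = session_cookie_header("JSESSIONID=ABC123; Path=/app; HttpOnly")
print(body)
print(cookie)
```

A crawler that handles form login would POST the body once with Content-Type application/x-www-form-urlencoded, then attach the resulting Cookie header to each page fetch until the session expires.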
Nick

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Tue 5/6/2008 4:54 PM
To: [email protected]
Subject: Re: How to authenticate with cookies?

Yoav - I'm not 100% certain about this, as I haven't had to deal with Nutch+cookies, but I did see some logging that made me think "ah, this thing handles cookies like a browser". Yes, that's likely something that comes with httpclient, so just enable protocol-httpclient and disable protocol-http. Want to try and report back?

Found this: http://wiki.apache.org/nutch/HttpPostAuthentication?highlight=%28cookies%29

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Yoav Shapira <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Monday, May 5, 2008 8:49:50 PM
> Subject: How to authenticate with cookies?
>
> Hi,
>
> I'm using Nutch to crawl an intranet site that is behind form
> authentication. I know Nutch doesn't support form authentication yet
> (right?), but I think this site would also work with cookies. I have
> the right set of cookie names and values, at least for testing, but I
> don't know how to have Nutch use these cookies with every HTTP
> requests during its crawl.
>
> I saw a reference to a "protocol-httpclient" plugin. Is that true / relevant?
>
> Any help on configuring Nutch to use cookies for authentication would
> be appreciated.
>
> --
> Thanks,
>
> Yoav
>
