Andrzej,

Thanks for the reply:

- Using 0.6 - probably not supported?
- Awesome, thanks.

Cory Wilkerson

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 09, 2005 11:01 AM
To: [email protected]
Subject: Re: Cookies, etc.

Wilkerson, Cory wrote:
> Good morning everyone,
>
> I've been spending a bit of time with Nutch lately - it's looking like a
> really solid product - but I've a couple of questions that I need to
> resolve before I can really say whether or not Nutch will work for my
> particular situation.
>
> <Note - the rest of this email presumes some server-side knowledge.>
>
> I've pointed Nutch at a relatively small J2EE-based intranet and am
> currently performing an intranet crawl, per the Nutch tutorial. As per
> the J2EE spec, the presentation tier utilizes the jsessionid token to
> maintain client state. Right now, I'm seeing my pages perform
> accordingly to non-cookied clients (Nutch) and serialize the jsessionid
                 ^^^^^^^^^^^^^^^^^^^

Recent development versions of Nutch use the protocol-httpclient plugin to
handle HTTP, and this plugin supports cookies. Which version are you using?

> onto the generated link (foo.jsp;jsessionid=XXXXXXXXX), and while this
> works, the URLs that Nutch stores in the index contain the jsessionid
> token (yes, it works, but it's a bit confusing and unnecessary).

This can be removed through a regular expression in conf/regex-normalizer.xml

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
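For illustration, here is a minimal sketch of the kind of substitution such a normalizer rule would perform: match ";jsessionid=" plus its value and replace it with nothing, keeping any query string or fragment. The pattern and the class name JsessionidStripper are my own hypothetical example, not a rule that ships with Nutch. The exact file name and schema of the normalizer config depend on your release (and it may not be present in 0.6 at all), so check what your version provides before copying the pattern over.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: not part of Nutch.
public class JsessionidStripper {
    // Matches ";jsessionid=" (case-insensitive) and its value, stopping
    // before a query string ("?"), fragment ("#"), another path
    // parameter (";") or a path separator ("/").
    private static final Pattern JSESSIONID =
        Pattern.compile("(?i);jsessionid=[^?#;/]*");

    // Returns the URL with the session token removed; everything else is kept.
    public static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://intranet/foo.jsp;jsessionid=XXXXXXXXX",
            "http://intranet/foo.jsp;jsessionid=XXXXXXXXX?page=2",
            "http://intranet/foo.jsp?page=2"
        };
        for (String u : urls) {
            System.out.println(u + " -> " + strip(u));
        }
    }
}
```

With this pattern, foo.jsp;jsessionid=XXXXXXXXX collapses to plain foo.jsp, so pages reached under different sessions map to the same URL in the index, while URLs without a session token pass through unchanged.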
