Andrzej,

Thanks for the reply:

- I'm using 0.6 - cookies are probably not supported there?
- Awesome, thanks.

Cory Wilkerson


-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 09, 2005 11:01 AM
To: [email protected]
Subject: Re: Cookies, etc.

Wilkerson, Cory wrote:
> Good morning everyone,
> 
> I've been spending a bit of time with Nutch lately - it's looking like a
> really solid product - but I've a couple of questions that I need to
> resolve before I can really say whether or not Nutch will work for my
> particular situation.
> 
> <Note - the rest of this email presumes some server-side
> knowledge.>
> 
> I've pointed Nutch at a relatively small J2EE-based intranet and am
> currently performing an intranet crawl, per the Nutch tutorial.  As per
> the J2EE spec, the presentation tier utilizes the jsessionid token to
> maintain client state.  Right now, I'm seeing my pages perform
> accordingly to non-cookied clients (Nutch) and serialize the jsessionid
                 ^^^^^^^^^^^^^^^^^^^
Recent development versions of Nutch use the protocol-httpclient plugin
to handle HTTP, and this plugin supports cookies. Which version are you
using?

> onto the generated link (foo.jsp;jsessionid=XXXXXXXXX), and while this
> works, the urls that Nutch stores in the index contain the jsessionid
> token (yes, it works, but it's a bit confusing and unnecessary).  

The jsessionid token can be removed from the URLs through a regular
expression in conf/regex-normalizer.xml.
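
As a rough illustration of the kind of expression meant here, the following is a minimal sketch in Python rather than the normalizer's own configuration syntax. The pattern, the function name strip_jsessionid, and the sample host are assumptions made for the example; the pattern assumes the token runs from ";jsessionid=" up to the next "?" or "#", or to the end of the URL.

```python
import re

# Assumed pattern (not taken from the Nutch distribution): match
# ";jsessionid=<token>" up to the next "?" or "#", or the end of the URL.
JSESSIONID = re.compile(r";jsessionid=[^?#]*", re.IGNORECASE)


def strip_jsessionid(url: str) -> str:
    """Return the URL with any ;jsessionid=... path parameter removed."""
    return JSESSIONID.sub("", url)


if __name__ == "__main__":
    # Hypothetical intranet URLs, used only to exercise the pattern.
    for url in (
        "http://intranet.example/foo.jsp;jsessionid=XXXXXXXXX",
        "http://intranet.example/foo.jsp;jsessionid=XXXXXXXXX?page=2",
        "http://intranet.example/bar.jsp",
    ):
        print(strip_jsessionid(url))
```

The same expression, with an empty substitution, is the sort of rule that would go into the normalizer configuration; any query string after the token is left in place.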

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
