Hi Shri, if you look at regex-normalize.xml you will notice that jsession ids are not removed by default.
Your session id seems to be quite long. If you are sure of the format, you could try inserting the following rule:

<regex>
  <pattern>(;)jsessionid=[a-zA-Z0-9!-]{64}(\?)</pattern>
  <substitution></substitution>
</regex>

Are all links of the same style, or do you have the ? or ; at other places in your URLs?

Kind regards,
Olaf

On Sat, 26 Feb 2005 23:18:02 -0500, Chirag Chaman <[EMAIL PROTECTED]> wrote:
> Use the normalizer.
>
> If you see regex-normalize.xml (conf directory) it should already have a
> rule to remove jsession IDs.
> This one is a bit unique as it has a "-". So we may need to write another
> one -- should be simple.
>
> CC-
>
> ________________________________
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of LocalSearch.HK
> Sent: Saturday, February 26, 2005 9:05 PM
> To: [EMAIL PROTECTED]
> Subject: [Nutch-dev] Re: [Nutch-general] Question about normalizing urls
>
> Hi,
>
> Can anyone help? I know it is a bit of a newbie question, but I'm stuck with
> it.
>
> Shri
>
> ----- Original Message -----
> From: LocalSearch.HK
> To: [EMAIL PROTECTED]
> Sent: Thursday, February 24, 2005 11:27 PM
> Subject: [Nutch-general] Question about normalizing urls
>
> Hi,
>
> Can someone help me with this URL? How would I remove the session ids?
>
> ;jsessionid=Cdpd452kLqgrnjrvrCJjVjQdmLwWJjnG4JQ4KhPJq2ThQL4XbFzS!-1364780454?
>
> Regards,
> Shri

--
<SimpleHuman gender="male">
  <Physical name="Olaf Thiele" />
  <Virtual adress="http://www.olafthiele.de" />
</SimpleHuman>

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
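
P.S. If you want to check the pattern before editing the config, a small script along these lines might help. It is only a sketch under a few assumptions: Python's re module stands in for the regex engine Nutch uses, the trailing question mark is escaped so that it matches a literal "?", the id is taken to be exactly 64 characters long, and the host and path are made up for illustration.

```python
import re

# Pattern from the suggested rule, with the trailing "?" escaped.
# Assumption: the id is always exactly 64 characters from [a-zA-Z0-9!-].
JSESSIONID = re.compile(r"(;)jsessionid=[a-zA-Z0-9!-]{64}(\?)")


def strip_jsessionid(url: str) -> str:
    """Delete the whole match, which mimics an empty <substitution>."""
    return JSESSIONID.sub("", url)


# Session id copied from the original question.
session = (
    ";jsessionid="
    "Cdpd452kLqgrnjrvrCJjVjQdmLwWJjnG4JQ4KhPJq2ThQL4XbFzS!-1364780454?"
)

# Hypothetical host and path, used only for this check.
url = "http://www.example.com/shop/index.jsp" + session

print(strip_jsessionid(url))  # prints http://www.example.com/shop/index.jsp
```

Because the whole match is deleted, the ";" and the "?" go away together with the id. If a query string follows the "?", it ends up glued to the path, so the rule as written only suits URLs where nothing comes after the session id.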