Hi Shri,
If you look at regex-normalize.xml, you will notice
that jsessionids are not removed by default.

Your session ID is quite long (64 characters). If you are
sure of the format, you could try adding the following
rule to regex-normalize.xml:

<regex>
  <pattern>;jsessionid=[a-zA-Z0-9!-]{64}</pattern>
  <substitution></substitution>
</regex>
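You can check what the rule matches before editing the config. The minimal Java sketch below applies its core: `;jsessionid=` followed by exactly 64 characters from `[a-zA-Z0-9!-]`. It uses `java.util.regex` and an empty replacement. It assumes the normalizer simply replaces every match with the `<substitution>` text. Nutch's own normalizer may use a different regex engine. The host, path and query string in the sample URL are made up; only the session ID is from your mail.

```java
import java.util.regex.Pattern;

public class JsessionidRule {
    // Core of the proposed <pattern>: ";jsessionid=" plus exactly 64
    // letters, digits, '!' or '-' characters.
    private static final Pattern SESSION_ID =
            Pattern.compile(";jsessionid=[a-zA-Z0-9!-]{64}");

    // Assumption: the normalizer replaces each match with the <substitution>
    // text, which is empty here, so the session ID is simply dropped.
    public static String normalize(String url) {
        return SESSION_ID.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical host, path and query; the session ID is Shri's.
        String url = "http://www.example.com/list.jsp"
                + ";jsessionid=Cdpd452kLqgrnjrvrCJjVjQdmLwWJjnG4JQ4KhPJq2ThQL4XbFzS!-1364780454"
                + "?cat=12";
        // Prints http://www.example.com/list.jsp?cat=12
        System.out.println(normalize(url));
    }
}
```

Note that `{64}` only fits IDs of exactly that length. A shorter ID is left untouched. A longer one would be cut after 64 characters, leaving its tail in the URL.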

Are all links in the same style, or do ? and ; also
appear at other places in your URLs?
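If the IDs vary in length, one option is to drop everything from `;jsessionid=` up to the next `?`, `#` or `;`. This is my own assumption, not a rule that ships with Nutch. The small Java sketch below again uses `java.util.regex` rather than whatever engine the normalizer itself uses. The corresponding `<pattern>` would presumably be `;jsessionid=[^?#;]*`, with an empty `<substitution>`.

```java
import java.util.regex.Pattern;

public class JsessionidRuleAnyLength {
    // ";jsessionid=" followed by any run of characters that are not
    // '?', '#' or ';', i.e. everything up to the query string, the
    // fragment or the next path parameter.
    private static final Pattern SESSION_ID =
            Pattern.compile(";jsessionid=[^?#;]*");

    public static String normalize(String url) {
        return SESSION_ID.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical URLs; a ';' inside the query string is left alone
        // because only ";jsessionid=" starts a match.
        String[] urls = {
            "http://www.example.com/a.jsp;jsessionid=Short!-42?q=1;2",
            "http://www.example.com/b.jsp;jsessionid=abc#top",
        };
        for (String url : urls) {
            // Prints the URL with the session ID removed.
            System.out.println(normalize(url));
        }
    }
}
```

The match is case-sensitive, so a `JSESSIONID=` variant would not be caught by this sketch.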

Kind regards,
Olaf


On Sat, 26 Feb 2005 23:18:02 -0500, Chirag Chaman <[EMAIL PROTECTED]> wrote:
>  
> Use the normalizer. 
>   
> If you look at regex-normalize.xml (conf directory), it should already have a
> rule to remove jsessionids.
> This one is a bit unusual as it contains a "-", so we may need to write another
> one. That should be simple.
>   
> CC- 
>  
>  
>  ________________________________
>  From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> LocalSearch.HK
> Sent: Saturday, February 26, 2005 9:05 PM
> To: [EMAIL PROTECTED]
> Subject: [Nutch-dev] Re: [Nutch-general] Question about normalizing urls
> 
>  
>  
> Hi, 
>   
> Can anyone help? I know it is a bit of a newbie question, but I'm stuck with
> it. 
>   
> Shri 
>   
>  
> ----- Original Message ----- 
> From: LocalSearch.HK 
> To: [EMAIL PROTECTED] 
> Sent: Thursday, February 24, 2005 11:27 PM 
> Subject: [Nutch-general] Question about normalizing urls 
> 
>  
> Hi, 
>   
> Can someone help me with this URL? How would I remove the session ids? 
>   
> ;jsessionid=Cdpd452kLqgrnjrvrCJjVjQdmLwWJjnG4JQ4KhPJq2ThQL4XbFzS!-1364780454?
>   
> Regards, 
> Shri 
>   


-- 

<SimpleHuman gender="male">
   <Physical name="Olaf Thiele" />
   <Virtual adress="http://www.olafthiele.de" />
</SimpleHuman>


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
