Re: URL Stemmer

2005-07-27 Thread Otis Gospodnetic
Hm, not sure why you're emailing [EMAIL PROTECTED]; [EMAIL PROTECTED] may be better. Here are two ancient classes from 2003 that I once used to normalize URLs, to help me identify duplicate URLs. This may get stripped on its way to the list.

Otis

--- Chris Fraschetti <[EMAIL PROTECTED]> wrote:

URL Stemmer

2005-07-27 Thread Chris Fraschetti
Writing simple code to trim down a URL is trivial, but actually trimming it down to its most meaningful state is very hard. In some cases the URL parameters actually define the page; in others they are useless babble. I'd like to use the hash of a page's URL as well as a hash of the content data to h
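
For illustration only, here is a minimal Python sketch of the idea discussed in this thread: normalize a URL, then hash both the normalized URL and the page content. It is not the code attached to the reply above, and every name in it (`normalize_url`, `page_key`, `IGNORED_PARAMS`) is hypothetical. The list of ignored parameters in particular is an assumption; which parameters define a page and which are babble is site-specific, which is exactly the hard part described above.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of query parameters treated as noise. In practice this
# has to be tuned per site; there is no universally correct list.
IGNORED_PARAMS = {"sessionid", "sid", "phpsessid"}

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Return a canonical form of url for duplicate detection (a sketch)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # note: any user:password@ is dropped

    # Keep the port only when it differs from the scheme's default.
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"

    # An empty path and "/" refer to the same resource.
    path = parts.path or "/"

    # Drop ignored parameters and sort the rest so ordering does not matter.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    query = urlencode(sorted(pairs))

    # The fragment is never sent to the server, so it is discarded.
    return urlunsplit((scheme, netloc, path, query, ""))


def page_key(url, content):
    """Return (url_hash, content_hash) as hex SHA-1 digests."""
    url_hash = hashlib.sha1(normalize_url(url).encode("utf-8")).hexdigest()
    content_hash = hashlib.sha1(content).hexdigest()
    return url_hash, content_hash
```

With this sketch, `normalize_url("HTTP://Example.COM:80/a?b=2&a=1&sid=xyz#frag")` and `normalize_url("http://example.com/a?a=1&b=2")` yield the same string, so both URLs get the same URL hash. Two pages with different URL hashes but identical content hashes would still be candidates for duplicates.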