On Jun 2, 2011, at 8:53 AM, Massimo Di Pierro wrote:
> 
> Use IS_SLUG.urlify(...)
> 
> I will remove the latter. Was never intended to be there.

There are a couple of things that could be fixed in urlify.

>     def urlify(value, maxlen=80):
>         s = value.decode('utf-8').lower()    # to lowercase utf-8
>         s = unicodedata.normalize('NFKD', s) # normalize eg è => e, ñ => n
>         s = s.encode('ASCII', 'ignore')      # encode as ASCII
>         s = re.sub('&\w+?;', '', s)          # strip html entities

The '?' is redundant here: \w can never match ';', so the greedy and lazy
forms match exactly the same text.

>         s = re.sub('[^a-z0-9\-\s]', '', s)   # strip all but 
> alphanumeric/hyphen/space

The comment says 'space', but the pattern uses \s and so retains all
whitespace (tabs, newlines, etc.), not just spaces. See next line.

>         s = s.replace(' ', '-')              # spaces to hyphens

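If the previous line stays \s, then this should handle \s as well:
s.replace(' ', '-') only converts literal spaces, so tabs and newlines
pass through unchanged. Something like re.sub('\s+', '-', s) would cover
them.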

>         s = re.sub('--+', '-', s)            # collapse strings of hyphens
>         s = s.strip('-')                     # remove leading and traling 
> hyphens

Typo: 'trailing'.

>         return s[:maxlen].strip('-')         # enforce maximum length
> 

should strip first, then enforce maxlen
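
Putting the comments above together, here is a rough sketch of what I have
in mind. It is not a patch against the actual source: it assumes Python 3
and a text (unicode) input, so the initial decode('utf-8') is dropped, and
the re-decode after the ASCII step is my addition to keep everything as
text. The final strip is kept because truncating can itself leave a
trailing hyphen.

```python
import re
import unicodedata


def urlify(value, maxlen=80):
    """Sketch of urlify with the review comments applied.

    Assumption: `value` is already a text string (Python 3 str).
    """
    s = unicodedata.normalize('NFKD', value.lower())  # eg è => e, ñ => n
    s = s.encode('ascii', 'ignore').decode('ascii')   # drop non-ASCII, back to text
    s = re.sub(r'&\w+;', '', s)                       # strip html entities (no '?')
    s = re.sub(r'[^a-z0-9\-\s]', '', s)               # keep alphanumeric/hyphen/whitespace
    s = re.sub(r'\s+', '-', s)                        # any whitespace run to one hyphen
    s = re.sub(r'-{2,}', '-', s)                      # collapse strings of hyphens
    s = s.strip('-')                                  # remove leading and trailing hyphens
    return s[:maxlen].strip('-')                      # enforce maximum length


if __name__ == '__main__':
    print(urlify('  Año\tnuevo &amp; más -- '))
    print(urlify('abc def', maxlen=4))
```

Note that this version still strips underscores, same as the original.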


I wonder whether it's such a great idea to strip underscores.
