[web2py] urlify

Jonathan Lundell Sat, 23 Jan 2010 17:04:06 -0800

urlify needs a comment to say explicitly what its intention is. That's partly 
because it suppresses quite a few characters that are normally legal in URLs, 
which is confusing.
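
To make that concrete, here is a small check, offered only as an illustration. It runs the function's character filter over a handful of characters that RFC 3986 permits unescaped in a path segment. The sample string and the variable names are my own, not part of web2py.

```python
import re

# Characters RFC 3986 allows unescaped in a path segment:
# letters, digits, the unreserved marks "-._~", some sub-delims, ':' and '@'.
legal = "abcXYZ019-._~!$'()*+,;=:@"

# Same filter as the quoted urlify: keep only a-z, 0-9, dashes and whitespace.
kept = re.sub(r'[^a-z0-9\-\s]', '', legal.lower())

print(kept)  # abcxyz019-
```

Everything after the dash is dropped, which is reasonable for a slug but surprising for something named urlify unless the comment says so.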


Also,

> def urlify(s, max_length=80):
>     s = s.lower()
>     # string normalization, eg è => e, ñ => n
>     s = unicodedata.normalize('NFKD', s.decode('utf-8')).encode('ASCII', 'ignore')
>     # strip entities
>     s = re.sub('&\w+;', '', s)

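Making this non-greedy ('&\w+?;') is not necessary: \w matches neither '&' nor 
';', so each match already stops at the first ';'. A string like 
'&amp;whatever&amp;' is reduced to 'whatever', not eliminated. Greediness would 
only matter for a looser pattern such as '&.+;'.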
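
A quick way to check how the entity pattern behaves on that string is sketched below. It suggests that the greedy and non-greedy forms give the same result here, because \w cannot cross a ';' or '&'. The looser '&.+;' pattern is included only for contrast and does not appear in the quoted code.

```python
import re

sample = '&amp;whatever&amp;'

greedy = re.sub(r'&\w+;', '', sample)       # pattern from the quoted code
lazy = re.sub(r'&\w+?;', '', sample)        # non-greedy variant
loose_greedy = re.sub(r'&.+;', '', sample)  # '.' can cross ';', so this over-matches
loose_lazy = re.sub(r'&.+?;', '', sample)   # non-greedy fixes the looser pattern

print(repr(greedy), repr(lazy), repr(loose_greedy), repr(loose_lazy))
```

Only the '.+' version swallows the whole string, so that is the case where the non-greedy qualifier actually makes a difference.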

>     # strip everything but letters, numbers, dashes and spaces
>     s = re.sub('[^a-z0-9\-\s]', '', s)
>     # replace spaces with dashes
>     s = s.replace(' ', '-')
>     # strip multiple contiguous dashes
>     s = re.sub('-{2,}', '-', s)
>     # strip dashes at the beginning and end of the string
>     s = s.strip('-')
>     # ensure the maximum length
>     s = s[:max_length-1]
>     return s

(Stylistically, I think it'd be more readable if the comments were appended to 
the relevant code lines.)
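
For illustration, here is roughly what I mean: a sketch with a docstring stating the intent and the comments moved onto the code lines. It is a Python 3 port, so it takes a str and the decode/encode step is adjusted accordingly. The docstring wording is my guess at the intent, not something taken from web2py, and the steps otherwise follow the quoted code, including the max_length-1 truncation.

```python
import re
import unicodedata


def urlify(s, max_length=80):
    """Reduce arbitrary text to a lowercase ASCII slug of a-z, 0-9 and '-'.

    Assumed intent: produce a short, readable URL component. This is
    deliberately stricter than URL syntax, so many legal URL characters
    are dropped.
    """
    s = s.lower()
    s = unicodedata.normalize('NFKD', s)             # decompose accents, e.g. è => e + mark
    s = s.encode('ascii', 'ignore').decode('ascii')  # drop everything non-ASCII
    s = re.sub(r'&\w+;', '', s)                      # strip entities
    s = re.sub(r'[^a-z0-9\-\s]', '', s)              # keep letters, digits, dashes, whitespace
    s = s.replace(' ', '-')                          # replace spaces with dashes
    s = re.sub(r'-{2,}', '-', s)                     # collapse runs of dashes
    s = s.strip('-')                                 # trim dashes at both ends
    s = s[:max_length - 1]                           # truncate, as in the quoted code
    return s


print(urlify('Hello, Wörld &amp; Friends!'))  # hello-world-friends
```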
