urlify needs a comment stating explicitly what it is intended to do. That's
partly because it strips quite a few characters that are normally legal in
URLs, which is confusing.
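To illustrate the point about legal characters, here is a rough Python 3 sketch that applies only the character filter from the function quoted further down. The helper name strip_like_urlify is made up for this example and is not part of web2py.

```python
import re

# The same character class urlify uses: anything that is not a lowercase
# letter, a digit, a dash or whitespace gets removed.
STRIP_PATTERN = re.compile(r'[^a-z0-9\-\s]')


def strip_like_urlify(s):
    """Hypothetical helper: lowercase s and apply only urlify's filter."""
    return STRIP_PATTERN.sub('', s.lower())


# '.', '_' and '~' are unreserved characters in RFC 3986, so they are
# always legal in a URL, yet the filter drops all three.
print(strip_like_urlify('release_notes-v1.2~draft'))  # releasenotes-v12draft
```

A one-line comment or docstring saying that the result is meant to be a slug, not just a valid URL fragment, would probably be enough to remove the confusion.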
Also,
> def urlify(s, max_length=80):
>     s = s.lower()
>     # string normalization, eg è => e, ñ => n
>     s = unicodedata.normalize('NFKD', s.decode('utf-8')).encode('ASCII',
>         'ignore')
>     # strip entities
>     s = re.sub('&\w+;', '', s)
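this could be written '&\w+?;' (that is, non-greedy), although it makes no
practical difference here: \w cannot match '&' or ';', so both forms match
exactly the same text. A string like '&whatever&' is left untouched by this
step either way, since it contains no ';'.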
>     # strip everything but letters, numbers, dashes and spaces
>     s = re.sub('[^a-z0-9\-\s]', '', s)
>     # replace spaces with dashes
>     s = s.replace(' ', '-')
>     # strip multiple contiguous dashes
>     s = re.sub('-{2,}', '-', s)
>     # strip dashes at the beginning and end of the string
>     s = s.strip('-')
>     # ensure the maximum length
>     s = s[:max_length-1]
>     return s
(Stylistically, I think it'd be more readable if the comments were appended to
the relevant code lines.)
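
For what it's worth, this is roughly the layout I have in mind. It is only a sketch, and it assumes Python 3 with str input, so the decode('utf-8') call from the quoted version is replaced by an encode/decode round trip. The steps and their order are otherwise unchanged.

```python
import re
import unicodedata


def urlify(s, max_length=80):
    s = s.lower()
    s = unicodedata.normalize('NFKD', s)             # è => e + combining accent
    s = s.encode('ascii', 'ignore').decode('ascii')  # drop non-ASCII leftovers
    s = re.sub(r'&\w+;', '', s)                      # strip entities
    s = re.sub(r'[^a-z0-9\-\s]', '', s)              # keep a-z, 0-9, '-', space
    s = s.replace(' ', '-')                          # spaces => dashes
    s = re.sub(r'-{2,}', '-', s)                     # collapse runs of dashes
    s = s.strip('-')                                 # trim dashes at both ends
    s = s[:max_length - 1]                           # truncate as in the quote
    return s


print(urlify('Héllo Wörld &amp; Friends'))  # hello-world-friends
```

Note that the last step keeps at most max_length - 1 characters, exactly as the quoted code does. I have not tried to change that here.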