On Jun 2, 2011, at 8:53 AM, Massimo Di Pierro wrote:
>
> Use IS_SLUG.urlify(...)
>
> I will remove the latter. Was never intended to be there.
There are a couple of things that could be fixed in urlify.
> def urlify(value, maxlen=80):
> s = value.decode('utf-8').lower() # to lowercase utf-8
> s = unicodedata.normalize('NFKD', s) # normalize eg è => e, ñ => n
> s = s.encode('ASCII', 'ignore') # encode as ASCII
> s = re.sub('&\w+?;', '', s) # strip html entities
the '?' is redundant here: \w never matches ';', so the greedy and non-greedy forms match exactly the same text
> s = re.sub('[^a-z0-9\-\s]', '', s) # strip all but alphanumeric/hyphen/space
the comment says 'space', but \s also matches tabs, newlines and other
whitespace, so the pattern retains those too. See next line.
> s = s.replace(' ', '-') # spaces to hyphens
if the previous line keeps \s, then this should replace \s as well (via re.sub), not just a literal ' '
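
A quick illustration of the mismatch (a Python 3 sketch; the sample string is made up): the tab survives the strip step because of \s, and the literal-space replace then leaves it in the result.

```python
import re

s = 'hello\tworld again'                   # made-up input containing a tab
s = re.sub(r'[^a-z0-9\-\s]', '', s)        # \s keeps the tab as well as the space

literal = s.replace(' ', '-')              # only the literal space becomes a hyphen
pattern = re.sub(r'\s', '-', s)            # every whitespace character becomes a hyphen

print(repr(literal))                       # 'hello\tworld-again'
print(repr(pattern))                       # 'hello-world-again'
```
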
> s = re.sub('--+', '-', s) # collapse strings of hyphens
> s = s.strip('-') # remove leading and traling hyphens
typo: 'trailing'
> return s[:maxlen].strip('-') # enforce maximum length
>
should strip first, then enforce maxlen
I wonder whether it's such a great idea to strip underscores.
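
For what it's worth, here is a rough sketch of urlify with the points above folded in. It is not the web2py code. Assumptions on my part: it is written for Python 3 and takes a str rather than a UTF-8 byte string, keep_underscore is a hypothetical parameter I added only to show the underscore question, and I read the strip/maxlen point as "strip, truncate, then strip once more in case the cut lands on a hyphen".

```python
import re
import unicodedata

def urlify(value, maxlen=80, keep_underscore=False):
    """Sketch of a slug function; keep_underscore is a hypothetical option."""
    s = value.lower()                                  # assumes value is already a str
    s = unicodedata.normalize('NFKD', s)               # decompose, e.g. ñ => n + combining tilde
    s = s.encode('ascii', 'ignore').decode('ascii')    # drop everything non-ASCII
    s = re.sub(r'&\w+;', '', s)                        # strip html entities (no '?' needed)
    if keep_underscore:
        s = re.sub(r'[^a-z0-9\-\s_]', '', s)           # keep alphanumeric/hyphen/whitespace/underscore
    else:
        s = re.sub(r'[^a-z0-9\-\s]', '', s)            # keep alphanumeric/hyphen/whitespace
    s = re.sub(r'\s+', '-', s)                         # any run of whitespace to one hyphen
    s = re.sub(r'--+', '-', s)                         # collapse strings of hyphens
    s = s.strip('-')                                   # remove leading and trailing hyphens
    return s[:maxlen].strip('-')                       # enforce maximum length, re-strip after the cut

print(urlify('España & más'))                          # espana-mas
print(urlify('foo_bar'))                               # foobar
print(urlify('foo_bar', keep_underscore=True))         # foo_bar
print(urlify('ab-cd', maxlen=3))                       # ab
```
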