You should probably copy Twitter's own twitter-text library. Their regex
is a bit more complicated, and it is written in Ruby; I am not proficient
enough in Python to push a real patch:

LATIN_ACCENTS = [(0xc0..0xd6).to_a, (0xd8..0xf6).to_a,
                 (0xf8..0xff).to_a].flatten.pack('U*').freeze
HASHTAG_CHARACTERS = /[a-z0-9_#{LATIN_ACCENTS}]/io

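However, the accented characters in HASHTAG_CHARACTERS are only allowed
from the second character of the hashtag onward; the tag must first
contain at least one ASCII letter or underscore: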

REGEXEN[:auto_link_hashtags] =
/(^|[^0-9A-Z&\/]+)(#|＃)([0-9A-Z_]*[A-Z_]+#{HASHTAG_CHARACTERS}*)/io
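For whoever picks this up on the Python side, here is a rough, untested
against-the-app sketch of how the two Ruby expressions might translate.
It assumes Python 3 (str is Unicode) and that the second alternative
after the hash is the fullwidth number sign U+FF03, as I believe it is
in twitter-text. The names AUTO_LINK_HASHTAGS and find_hashtags are
mine, not from either code base:

```python
import re

# Latin-1 accented letters, mirroring the Ruby ranges
# U+00C0-U+00D6, U+00D8-U+00F6 and U+00F8-U+00FF
# (the gaps skip the multiplication and division signs).
LATIN_ACCENTS = "".join(
    chr(cp)
    for start, end in ((0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0xFF))
    for cp in range(start, end + 1)
)

# Character class source, kept as a string so it can be
# interpolated like Ruby's #{HASHTAG_CHARACTERS}.
HASHTAG_CHARACTERS = "[a-z0-9_" + LATIN_ACCENTS + "]"

# Group 1: text before the tag, group 2: the hash sign,
# group 3: the tag itself. "＃" is U+FF03 (assumption, see above).
AUTO_LINK_HASHTAGS = re.compile(
    r"(^|[^0-9A-Z&/]+)(#|＃)([0-9A-Z_]*[A-Z_]+" + HASHTAG_CHARACTERS + "*)",
    re.IGNORECASE,
)


def find_hashtags(text):
    """Return the tag text (without the hash) of every hashtag in text."""
    return [m.group(3) for m in AUTO_LINK_HASHTAGS.finditer(text)]
```

With this, find_hashtags("I love #café and #2009") gives ['café']: the
accented letter is accepted after the ASCII prefix, and the all-digit
tag is rejected because no letter or underscore precedes the end of it.
A tag that starts with an accented letter, such as "#école", is not
matched at all, which is the position restriction described above.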

-- 
non ASCII chars in hashtags dont work
https://bugs.launchpad.net/bugs/372164