On 7/17/09 11:19 PM, Walter Leibbrandt wrote:
Bertrand Kintanar wrote:
On 7/17/09 9:19 PM, saeed wrote:
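s1 = 'Guz&#xe1;n'
s2 = ''
n = len(s1)
i = 0
while i < n:
    # an entity of the form &#xHH; is exactly 6 characters long
    if i <= n - 6:
        if s1[i:i+3] == '&#x' and s1[i+5] == ';':
            # decode the two hex digits and append the character as UTF-8
            s2 += unichr(int(s1[i+3:i+5], 16)).encode('utf-8')
            i += 6
            continue
    s2 += s1[i]
    i += 1
print s2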
Now this fixes it all. Thanks a lot. I hope there is some sexier way
to do this though, but this will work. Thanks again.
import re
htmluni = re.compile(r'&#x([\dA-Fa-f]+);')
data = 'Guz&#xe1;n Guz&#xe1;n'
match = htmluni.search(data)
while match:
    # splice the decoded character in place of the matched entity
    data = (data[:match.start()] + unichr(int(match.group(1), 16)) +
            data[match.end():])
    match = htmluni.search(data)
Thanks for this, Walter. I'm also using regex for my search, but never
thought to use it the way you have here.
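
For what it's worth, the search-and-splice loop can probably be collapsed
into a single substitution with a callback. This is only a minimal sketch,
assuming Python 3 (where chr() takes the place of unichr()); the names
HTMLUNI and decode_hex_entities are made up for illustration and are not
from the code posted earlier in the thread.

```python
import re

# Matches hexadecimal character references such as &#xe1;
HTMLUNI = re.compile(r'&#x([0-9A-Fa-f]+);')

def decode_hex_entities(text):
    """Replace each hexadecimal &#x...; reference with its character."""
    # Assumption: Python 3, where chr() accepts any Unicode code point.
    # re.sub calls the lambda once per match and inserts its return value.
    return HTMLUNI.sub(lambda m: chr(int(m.group(1), 16)), text)

result = decode_hex_entities('Guz&#xe1;n Guz&#xe1;n')
# result == 'Guz\u00e1n Guz\u00e1n', i.e. 'Guzán Guzán'
```

Letting re.sub walk the string avoids re-scanning from the start after every
replacement. On Python 3.4 and later, html.unescape() from the standard
library also decodes these references, along with named entities, so it may
be the simpler choice if that is available.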
_______________________________________________
pygtk mailing list
http://www.daa.com.au/mailman/listinfo/pygtk
Read the PyGTK FAQ: http://faq.pygtk.org/