On 7/17/09 11:19 PM, Walter Leibbrandt wrote:
Bertrand Kintanar wrote:
On 7/17/09 9:19 PM, saeed wrote:
s1 = 'Guz&#xE1;n'
s2 = ''
n = len(s1)
i = 0
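while i < n:
    # A reference such as &#xE1; is 6 characters long, so it can start at n-6 at the latest
    if i <= n - 6 and s1[i:i+3] == '&#x' and s1[i+5] == ';':
        # Only two-digit hex references are handled here
        s2 += unichr(int(s1[i+3:i+5], 16)).encode('utf-8')
        i += 6
        continue
    s2 += s1[i]
    i += 1
print s2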
Now this fixes it all. Thanks a lot. I hope there is some sexier way to do this, though, but this will work. Thanks again.
import re
htmluni = re.compile(r'&#x([\dA-Fa-f]+);')
data = 'Guz&#xE1;n   Guz&#xE1;n'

match = htmluni.search(data)
while match:
    data = data[:match.start()] + unichr(int(match.group(1), 16)) + data[match.end():]
    match = htmluni.search(data)

Thanks for this, Walter. I'm also using regex for my search, but I never thought to use it the way you have here.
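
For anyone reading this thread later: the search-and-splice loop can probably be collapsed into a single re.sub() call with a callback. This is only a rough sketch, and it assumes Python 3, where chr() takes the place of unichr(); the names HEX_REF and decode_hex_refs are made up for illustration and do not come from the posts above.

```python
import html
import re

# Matches hexadecimal numeric character references such as &#xE1;
HEX_REF = re.compile(r'&#x([0-9A-Fa-f]+);')


def decode_hex_refs(text):
    """Replace every &#x...; reference in text with the character it names."""
    # re.sub() calls the lambda once per match and splices in its return value,
    # so no manual slicing or repeated searching is needed.
    return HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)


data = 'Guz&#xE1;n   Guz&#xE1;n'
decoded = decode_hex_refs(data)
print(decoded)

# The standard library's html.unescape() decodes these references as well,
# along with decimal (&#225;) and named (&aacute;) forms.
assert decoded == html.unescape(data)
```

If the input may also contain decimal or named entities, html.unescape() alone is likely the simpler choice; the regex version is mainly useful when only the hexadecimal form should be touched.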
_______________________________________________
pygtk mailing list   [email protected]
http://www.daa.com.au/mailman/listinfo/pygtk
Read the PyGTK FAQ: http://faq.pygtk.org/
