On Jan 26, 1:11 pm, globophobe <[EMAIL PROTECTED]> wrote:
> This is likely an easy problem; however, I couldn't think of
> appropriate keywords for google:
>
> Basically, I have some raw data that needs to be preprocessed before
> it is saved to the database e.g.
>
> In [1]: unicode_html = u'\u3055\u3080\u3044\uff0f\r\n\u3064\u3081\u305f\u3044\r\n'
>
> I need to turn this into an elementtree, but some of the data is
> japanese whereas the rest is html. This string contains a <br />.

>>> import unicodedata as ucd
>>> s = u'\u3055\u3080\u3044\uff0f\r\n\u3064\u3081\u305f\u3044\r\n'
>>> [ucd.name(c) if ord(c) >= 128 else c for c in s]
['HIRAGANA LETTER SA', 'HIRAGANA LETTER MU', 'HIRAGANA LETTER I',
 'FULLWIDTH SOLIDUS', u'\r', u'\n', 'HIRAGANA LETTER TU',
 'HIRAGANA LETTER ME', 'HIRAGANA LETTER TA', 'HIRAGANA LETTER I',
 u'\r', u'\n']
>>>

Where in there is the <br /> ??
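
For anyone repeating the check above on Python 3, here is a minimal sketch of the same inspection. The helper name `describe` is made up for illustration and is not from the original post. It labels each non-ASCII character with its Unicode name, then tests directly whether the string contains any `<br` markup.

```python
import unicodedata


def describe(text):
    """Label each character: ASCII is kept as-is, anything else gets its Unicode name.

    `describe` is a hypothetical helper name used only for this sketch.
    Characters without a Unicode name fall back to a U+XXXX label.
    """
    return [
        c if ord(c) < 128 else unicodedata.name(c, 'U+%04X' % ord(c))
        for c in text
    ]


# The exact string from the quoted message.
s = u'\u3055\u3080\u3044\uff0f\r\n\u3064\u3081\u305f\u3044\r\n'

labels = describe(s)
has_br = '<br' in s  # direct substring test for the tag

print(labels)
print(has_br)
```

With the string as posted, the substring test comes out false: the only ASCII characters present are the two `\r\n` pairs, so no tag is in the data at this point.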