Re: [web2py] Hebrew encoding

Martin Weissenboeck Thu, 31 May 2012 22:46:20 -0700

I found this at http://wiki.python.org/moin/EscapingXml:


import xml.parsers.expat

def unescape(s):
    want_unicode = False
    if isinstance(s, unicode):
        s = s.encode("utf-8")
        want_unicode = True

    # the rest of this assumes that `s` is UTF-8
    chunks = []  # collects the decoded text pieces (avoids shadowing the built-in `list`)

    # create and initialize a parser object
    p = xml.parsers.expat.ParserCreate("utf-8")
    p.buffer_text = True
    p.returns_unicode = want_unicode
    p.CharacterDataHandler = chunks.append

    # parse the data wrapped in a dummy element
    # (needed so the "document" is well-formed)
    p.Parse("<e>", 0)
    p.Parse(s, 0)
    p.Parse("</e>", 1)

    # join the extracted strings and return the same type that was passed in
    es = ""
    if want_unicode:
        es = u""
    return es.join(chunks)

With

t="""&#x5DE;&#x5E4;&#x5EA;&#x5D7;&#x5D9;&#x5DD;
&#x5E8;&#x5D1;&#x5D9;&#x5DD; &#x5DE;&#x5D1;&#x5E7;&#x5E9;&#x5D9;&#x5DD;
&#x5D0;&#x5EA; &#x5E2;&#x5D6;&#x5E8;&#x5EA;&#x5D9;
&#x5D1;&#x5E4;&#x5EA;&#x5E8;&#x5D5;&#x5DF;
&#x5D1;&#x5E2;&#x5D9;&#x5D5;&#x5EA; &#x5E9;&#x5DC;
&#x5D1;&#x5D9;&#x5E6;&#x5D5;&#x5E2;&#x5D9; Visual Studio.
\n&#x5D1;&#x5D3;&#x201D;&#x5DB; &#x5D0;&#x5EA; &#x5E8;&#x5D5;&#x5D1;
&#x5D4;&#x5D1;&#x5E2;&#x5D9;&#x5D5;&#x5EA; &#x5E0;&#x5D9;&#x5EA;&#x5DF;
&#x5DC;&#x5E4;&#x5EA;&#x5D5;&#x5E8; &#x5D9;&#x5D7;&#x5E1;&#x5D9;&#x5EA;
&#x5D1;&#x5E7;&#x5DC;&#x5D5;&#x5EA;, \n&#x5D5;&#x5DB;&#x5DB;&#x5DC;
&#x5E9;&#x5E2;&#x5D5;&#x5D1;&#x5E8; &#x5D4;&#x5D6;&#x5DE;&#x5DF;
&#x5D0;&#x5E0;&#x5D9; &#x5DE;&#x5D5;&#x5E6;&#x5D0; &#x5D0;&#x5EA;
&#x5E2;&#x5E6;&#x5DE;&#x5D9; &#x5DE;&#x5E1;&#x5E4;&#x5E7;
&#x5E4;&#x5D7;&#x5D5;&#x5EA; &#x5D0;&#x5D5; &#x5D9;&#x5D5;&#x5EA;&#x5E8;
&#x5D0;&#x5EA; &#x5D0;&#x5D5;&#x5EA;&#x5DF;
&#x5D4;&#x5EA;&#x5E9;&#x5D5;&#x5D1;&#x5D5;&#x5EA;, \n&#x5DE;&#x5D4;
&#x5E9;&#x5D2;&#x5E8;&#x5DD; &#x5DC;&#x5D9;
&#x5DC;&#x5D7;&#x5E9;&#x5D5;&#x5D1;
&#x5E9;&#x5DB;&#x5E0;&#x5E8;&#x5D0;&#x5D4; &#x5D4;&#x5D2;&#x5D9;&#x5E2;
&#x5D4;&#x5D6;&#x5DE;&#x5DF; &#x5DC;&#x5D4;&#x5E2;&#x5DC;&#x5D5;&#x5EA;
&#x5D0;&#x5D5;&#x5EA;&#x5DF; &#x5D1;&#x5E6;&#x5D5;&#x5E8;&#x5D4;
&#x5DE;&#x5E1;&#x5D5;&#x5D3;&#x5E8;&#x5EA;
&#x5DC;&#x5E4;&#x5D5;&#x5E1;&#x5D8;."""
print unescape(t)

the result is:

מפתחים רבים מבקשים את עזרתי בפתרון בעיות של ביצועי Visual Studio.
בד”כ את רוב הבעיות ניתן לפתור יחסית בקלות,
וככל שעובר הזמן אני מוצא את עצמי מספק פחות או יותר את אותן התשובות,
מה שגרם לי לחשוב שכנראה הגיע הזמן להעלות אותן בצורה מסודרת לפוסט.
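
On Python 3 (an assumption, since the code above is Python 2), the standard
library function html.unescape should give the same result without going
through an XML parser. A minimal sketch; unescape_refs and sample are names
made up for illustration:

```python
import html


def unescape_refs(text):
    """Decode character references such as '&#x5DE;' into real characters."""
    # html.unescape handles hexadecimal, decimal, and named references.
    return html.unescape(text)


sample = "&#x5DE;&#x5E4;&#x5EA;&#x5D7;&#x5D9;&#x5DD; Visual Studio."
decoded = unescape_refs(sample)

# Show the code points of the first two decoded characters.
print([hex(ord(c)) for c in decoded[:2]])

# Encode only when bytes are needed, e.g. for storage or an HTTP response.
utf8_bytes = decoded.encode("utf-8")
```

Unlike the expat version, this does not require the input to be well-formed
XML, so a stray "<" or "&" in the user text should not raise a parse error.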

I hope it helps.
Regards Martin

2012/6/1 Udi Milo <[email protected]>

> Part of my product receives user text, saves it, and shows it later.
>
> One of my users added the Hebrew text attached below, and I do not know how
> to translate it into letters instead of hex.
> A simple text.encode('UTF-8') doesn't work, and I am far from being an
> expert in the subject. Can someone help me out?
>
> see attached text:
>
> &#x5DE;&#x5E4;&#x5EA;&#x5D7;&#x5D9;&#x5DD; &#x5E8;&#x5D1;&#x5D9;&#x5DD;
> &#x5DE;&#x5D1;&#x5E7;&#x5E9;&#x5D9;&#x5DD; &#x5D0;&#x5EA;
> &#x5E2;&#x5D6;&#x5E8;&#x5EA;&#x5D9;
> &#x5D1;&#x5E4;&#x5EA;&#x5E8;&#x5D5;&#x5DF;
> &#x5D1;&#x5E2;&#x5D9;&#x5D5;&#x5EA; &#x5E9;&#x5DC;
> &#x5D1;&#x5D9;&#x5E6;&#x5D5;&#x5E2;&#x5D9; Visual Studio.
> &#x5D1;&#x5D3;&#x201D;&#x5DB; &#x5D0;&#x5EA; &#x5E8;&#x5D5;&#x5D1;
> &#x5D4;&#x5D1;&#x5E2;&#x5D9;&#x5D5;&#x5EA; &#x5E0;&#x5D9;&#x5EA;&#x5DF;
> &#x5DC;&#x5E4;&#x5EA;&#x5D5;&#x5E8; &#x5D9;&#x5D7;&#x5E1;&#x5D9;&#x5EA;
> &#x5D1;&#x5E7;&#x5DC;&#x5D5;&#x5EA;,
> &#x5D5;&#x5DB;&#x5DB;&#x5DC; &#x5E9;&#x5E2;&#x5D5;&#x5D1;&#x5E8;
> &#x5D4;&#x5D6;&#x5DE;&#x5DF; &#x5D0;&#x5E0;&#x5D9;
> &#x5DE;&#x5D5;&#x5E6;&#x5D0; &#x5D0;&#x5EA; &#x5E2;&#x5E6;&#x5DE;&#x5D9;
> &#x5DE;&#x5E1;&#x5E4;&#x5E7; &#x5E4;&#x5D7;&#x5D5;&#x5EA; &#x5D0;&#x5D5;
> &#x5D9;&#x5D5;&#x5EA;&#x5E8; &#x5D0;&#x5EA; &#x5D0;&#x5D5;&#x5EA;&#x5DF;
> &#x5D4;&#x5EA;&#x5E9;&#x5D5;&#x5D1;&#x5D5;&#x5EA;,
> &#x5DE;&#x5D4; &#x5E9;&#x5D2;&#x5E8;&#x5DD; &#x5DC;&#x5D9;
> &#x5DC;&#x5D7;&#x5E9;&#x5D5;&#x5D1;
> &#x5E9;&#x5DB;&#x5E0;&#x5E8;&#x5D0;&#x5D4; &#x5D4;&#x5D2;&#x5D9;&#x5E2;
> &#x5D4;&#x5D6;&#x5DE;&#x5DF; &#x5DC;&#x5D4;&#x5E2;&#x5DC;&#x5D5;&#x5EA;
> &#x5D0;&#x5D5;&#x5EA;&#x5DF; &#x5D1;&#x5E6;&#x5D5;&#x5E8;&#x5D4;
> &#x5DE;&#x5E1;&#x5D5;&#x5D3;&#x5E8;&#x5EA;
> &#x5DC;&#x5E4;&#x5D5;&#x5E1;&#x5D8;.
>
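
As for why text.encode('UTF-8') alone has no visible effect: the stored text
contains only ASCII characters (ampersand, hash, "x", hex digits, semicolon),
so encoding it yields the same characters as bytes. The references have to be
turned into code points first. A rough Python 3 sketch of that step, handling
only hexadecimal references like the ones in this message; HEX_REF and
decode_hex_refs are hypothetical names, not part of web2py:

```python
import re

raw = "&#x5DE;&#x5E4;"

# Every character in `raw` is ASCII, so encoding changes nothing visible.
assert raw.encode("utf-8") == b"&#x5DE;&#x5E4;"

# Matches hexadecimal character references such as "&#x5DE;".
HEX_REF = re.compile(r"&#x([0-9A-Fa-f]+);")


def decode_hex_refs(text):
    """Replace each hexadecimal reference with the character it names."""
    # int(..., 16) parses the hex digits; chr() maps the code point to a character.
    return HEX_REF.sub(lambda m: chr(int(m.group(1), 16)), text)


decoded = decode_hex_refs(raw)
```

This is only meant to show the mechanism. It ignores decimal and named
references, so a library routine is the safer choice for real user input.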
