[web2py] Re: Issues with TAG() encoding and XML().flatten()

Massimo Di Pierro Tue, 13 Sep 2011 11:47:36 -0700

TAG['h1'](u'öäß'.encode('utf8')).xml()
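
For reference, a minimal sketch of the idea behind the line above. In Python 2, calling .decode() on a unicode object first encodes it implicitly with the ascii codec, which is where the UnicodeEncodeError in the traceback quoted below comes from. Passing an already encoded UTF-8 byte string avoids that implicit step. The helper name to_utf8_bytes is made up for illustration and is not part of web2py.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not a web2py API): normalise text to UTF-8 bytes
# before handing it to code that expects an encoded byte string.

def to_utf8_bytes(value, encoding='utf-8'):
    """Return value as a UTF-8 encoded byte string."""
    if isinstance(value, bytes):
        # Already a byte string: decoding it checks that it is valid UTF-8
        # (raises UnicodeDecodeError otherwise), then it is returned as is.
        value.decode(encoding)
        return value
    # Unicode text: encode explicitly instead of relying on the implicit
    # ascii encoding that Python 2 applies to unicode.decode().
    return value.encode(encoding)


encoded = to_utf8_bytes(u'öäß')
print(repr(encoded))
```

The result could then be passed to TAG(...) in place of the unicode object, which is the same idea as the one-liner above. Whether that resolves every case in this thread is an assumption, not something verified here.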


On Sep 13, 11:13 am, "jot.be" <[email protected]> wrote:
> I searched the ML and found a thread that mentions a similar issue with
> TAG() and Unicode:
>
> http://groups.google.com/group/web2py/browse_thread/thread/a716d6d77b...
>
> I cannot reproduce the described issues with TAG[tagname](input), but I
> still have the problem with passing a Unicode string to TAG(input):
>
> >>> TAG['h1'](u'öäß').xml()
> '<h1>\xc3\xb6\xc3\xa4\xc3\x9f</h1>'
> >>> print TAG['h1'](u'öäß').xml()
> <h1>öäß</h1>
> >>> print TAG[u'hö'](u'<öäß').xml()
> <hö>&lt;öäß</hö>
> >>> print TAG(u'<h1>öäß</h1>').xml()
> Traceback (most recent call last):
>   File "<console>", line 1, in <module>
>   File "/Users/jan/hg/web2py/gluon/html.py", line 1054, in __call__
>     return web2pyHTMLParser(decoder.decoder(html)).tree
>   File "/Users/jan/hg/web2py/gluon/decoder.py", line 74, in decoder
>     return buffer.decode(encoding).encode('utf8')
>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 4-6:
> ordinal not in range(128)
>
> Really strange. Hints are still welcome. ;)
>
> Jan
>
> On Mon, Sep 12, 2011 at 10:17 AM, jot.be <[email protected]> wrote:
> > Hi Massimo,
>
> > thanks for your answer!
>
> > On Mon, Sep 12, 2011 at 2:19 AM, Massimo Di Pierro <
> > [email protected]> wrote:
>
> >> Are you sure your input is UTF8? The web2py markmin_serializer is in
> >> gluon/html.py and it is relatively straightforward. Nothing can really
> >> go bad there. I suspect your input has not been parsed at all into the
> >> web2py object representation.
>
> > Not sure - I am quite new to Python and am not so skilled in dealing with
> > Unicode issues.
>
> > I just appended my latest attempts:
> > https://gist.github.com/caec7bd5b41624d50b01#gistcomment-50227
>
> > As far as I understand, the input is not what TAG(input) expects. When
> > using classic HTML entities ("&auml;") it works.
>
> > Any hint what I could try next? :)
>
> >> The parsing is done by TAG(input) (not by XML(input)) and it is based
> >> on the Python built-in XML parser, which chokes on non-UTF-8 chars. It
> >> may not be parsing the XML at all and returning the XML as a single
> >> string.
>
> > OK, this is clear.
>
> > Jan
>
> >> Massimo
>
> >> On Sep 11, 2:01 pm, jotbe <[email protected]> wrote:
> >> > Hi List,
>
> >> > I just started my first Web2Py sample project (the Wiki from the book)
> >> > and even managed to integrate the HTML5 editor Aloha:
> >> http://aloha-editor.org/
>
> >> > My pages should use Markmin instead of HTML and therefore I am
> >> > converting the HTML to Markmin using TAG().flatten() and
> >> > markmin_serializer. In general it is working and the content is stored
> >> > as Markmin code, but when using e.g. German umlauts like 'öä', TAG()
> >> > seems to get confused and doesn't handle the encoding properly.
>
> >> > On the other hand, when trying to use
> >> > XML().flatten(render=markmin_serializer) instead of
> >> > TAG().flatten(render=markmin_serializer), nothing changes at all.
> >> > XML().flatten(render=markmin_serializer) will return the input HTML
> >> > string as is, instead of converting it to Markmin.
>
> >> > I have been trying to solve this issue for two days now. I have read
> >> > lots of posts about handling UTF-8 in Python and tried lots of
> >> > third-party modules to work around this issue, but had no luck so far.
> >> > I would really appreciate your help/tips. :)
>
> >> > Various sample code using the Web2Py Shell:
> >> https://gist.github.com/caec7bd5b41624d50b01
>
> >> > Thanks in advance!
