AG['h1'](u'öäß'.encode('utf8')).xml()

On Sep 13, 11:13 am, "jot.be" <[email protected]> wrote:
> I searched the ML and found a thread that mentions a similar issue with
> TAG() and Unicode:
>
> http://groups.google.com/group/web2py/browse_thread/thread/a716d6d77b...
>
> I cannot reproduce the described issues with TAG[tagname](input), but I
> still have the problem with passing a Unicode string to TAG(input):
>
> >>> TAG['h1'](u'öäß').xml()
> '<h1>\xc3\xb6\xc3\xa4\xc3\x9f</h1>'
> >>> print TAG['h1'](u'öäß').xml()
> <h1>öäß</h1>
> >>> print TAG[u'hö'](u'<öäß').xml()
> <hö><öäß</hö>
> >>> print TAG(u'<h1>öäß</h1>').xml()
> Traceback (most recent call last):
>   File "<console>", line 1, in <module>
>   File "/Users/jan/hg/web2py/gluon/html.py", line 1054, in __call__
>     return web2pyHTMLParser(decoder.decoder(html)).tree
>   File "/Users/jan/hg/web2py/gluon/decoder.py", line 74, in decoder
>     return buffer.decode(encoding).encode('utf8')
>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 4-6: ordinal not in range(128)
>
> Really strange. Hints are still welcome. ;)
>
> Jan
>
> On Mon, Sep 12, 2011 at 10:17 AM, jot.be <[email protected]> wrote:
> > Hi Massimo,
>
> > thanks for your answer!
>
> > On Mon, Sep 12, 2011 at 2:19 AM, Massimo Di Pierro <
> > [email protected]> wrote:
>
> >> Are you sure your input is UTF8? The web2py markmin_serializer is in
> >> gluon/html.py and it is relatively straightforward. Nothing can really
> >> go bad there. I suspect your input has not been parsed at all into the
> >> web2py object representation.
>
> > Not sure - I am quite new to Python and am not so skilled in dealing with
> > Unicode issues.
>
> > I just appended my last approaches:
> > https://gist.github.com/caec7bd5b41624d50b01#gistcomment-50227
>
> > As I understand it, this is not the input that TAG(input) expects. When
> > using classic html entities ("ä") it works.
>
> > Any hint what I could try next? :)
>
> >> the parsing is done by TAG(input) (not by XML(input)) and it is based
> >> on the python built-in XML parser which chokes on non-utf8 chars. It
> >> may not be parsing the XML at all and returning the XML as a single
> >> string.
>
> > OK, this is clear.
>
> > Jan
>
> >> Massimo
>
> >> On Sep 11, 2:01 pm, jotbe <[email protected]> wrote:
> >> > Hi List,
>
> >> > I just started my first Web2Py sample project (the Wiki from the book)
> >> > and even managed to integrate the HTML5 editor Aloha:
> >> > http://aloha-editor.org/
>
> >> > My pages should use Markmin instead of HTML and therefore I am
> >> > converting the HTML to Markmin using TAG().flatten() and
> >> > markmin_serializer. In general it is working and the content is stored
> >> > as Markmin code, but when using e.g. German umlauts like 'öä', TAG()
> >> > seems to get confused and doesn't handle the encoding properly.
>
> >> > On the other hand, when trying to use
> >> > XML().flatten(render=markmin_serializer) instead of
> >> > TAG().flatten(render=markmin_serializer), nothing changes at all.
> >> > XML().flatten(render=markmin_serializer) will return the input HTML
> >> > string as is, instead of converting it to Markmin.
>
> >> > I have been trying to solve this issue for two days now and have read
> >> > lots of posts regarding handling of UTF-8 in Python, tried lots of
> >> > third party modules to work around this issue, but had no luck so far.
> >> > I really appreciate your help/tips. :)
>
> >> > Various sample code using the Web2Py Shell:
> >> > https://gist.github.com/caec7bd5b41624d50b01
>
> >> > Thanks in advance!
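
For reference, a minimal sketch of the workaround hinted at by the traceback above. The traceback shows gluon/decoder.py calling buffer.decode(encoding), which suggests the parser expects a byte string; on Python 2, calling .decode() on a unicode object first encodes it implicitly with the ASCII codec, which would explain the UnicodeEncodeError. The helper name to_utf8_bytes is hypothetical (it is not part of web2py), and the commented TAG() call at the end is an untested assumption about how it would be used.

```python
# -*- coding: utf-8 -*-
import sys

# Text type: `unicode` on Python 2, `str` on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_utf8_bytes(value):
    """Hypothetical helper: return UTF-8 encoded bytes for text input.

    Byte strings are passed through unchanged, on the assumption that
    they are already UTF-8 encoded.
    """
    if isinstance(value, text_type):
        return value.encode('utf8')
    return value


html = u'<h1>\xf6\xe4\xdf</h1>'  # same text as u'<h1>öäß</h1>'
encoded = to_utf8_bytes(html)

# Assumption: with web2py importable, the encoded value would then be
# passed to the parser instead of the unicode object, e.g.
#     TAG(encoded).flatten(render=markmin_serializer)
```

The encoded value carries the same bytes that TAG['h1'](u'öäß').xml() returned in the session above, so the parser would receive the form it already produces itself.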

