On Monday, December 16, 2013 7:15:12 AM UTC-8, Jonathan Lundell wrote:
>
> On 16 Dec 2013, at 2:18 AM, peter wrote:
>
> > I am impressed at how helpful you have been on this, Jonathan.
> >
> > It does say in the mht file that it is windows-1252 encoded.
> >
> > It turns out that
> > s.decode('cp1252').encode('utf-8')
> > is working correctly. I mistakenly thought it was not, because I got
> > this error:
> > UnicodeEncodeError: 'charmap' codec can't encode character u'\u2018' in
> > position 193: character maps to <undefined>
> > This was from a print statement. It turns out you get this error when
> > trying to print the left single quotation mark, even though it is
> > correctly coded in Unicode. So this is why I was having such problems.
> > The error is presumably because the print statement is working in DOS
> > mode, and this character is not in the DOS character set. So using
> > print to check what is going on in Python is not a good idea when
> > using Unicode.
> >
> > Obvious with hindsight, not so obvious without hindsight.
> >
> > Thanks again, Jonathan, for all your support on this.
>
> You're welcome.
>
> I'm sympathetic, having struggled to understand character encoding myself.
>
> FWIW, I've been leaning more and more toward the Python 3 approach to
> text: decoding all my strings to unicode on input, encoding as needed on
> output. (Python 3's standardization on Unicode almost persuades me that
> Python 3 was a good idea.)
>
And which Unicode is that? Over in Mercurial-land, there is still gnashing of teeth because Windows chose UTF-16, and "everyone else" chose UTF-8 (that is, most Linux distros).

/dps
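For anyone landing on this thread later, here is a minimal sketch of the points discussed above. It is only an illustration under some assumptions: the sample bytes and the variable names are made up, and cp437 is assumed as a stand-in for the "DOS" console code page (the thread does not say which code page was actually in use).

```python
# Hypothetical sample: bytes as they might appear in a windows-1252 file.
# In cp1252, 0x91 and 0x92 are the left/right single quotation marks.
raw = b"\x91quoted\x92"

# Decode on input: bytes -> unicode text (U+2018 ... U+2019).
text = raw.decode("cp1252")

# Encode on output: unicode text -> UTF-8 bytes.
utf8 = text.encode("utf-8")

# Printing to a console implicitly encodes with the console's code page.
# Assumption: cp437 stands in for the DOS code page; it has no U+2018,
# so the encode step fails just like the print statement in the thread.
try:
    text.encode("cp437")
    console_ok = True
except UnicodeEncodeError:
    console_ok = False

# A safer way to inspect the text: escape anything that is not ASCII.
safe = text.encode("ascii", "backslashreplace")
print(safe)

# "Which Unicode?": the same text has different byte forms per encoding.
utf16 = text.encode("utf-16-le")
print(len(utf8), len(utf16))
```

The conversion itself succeeds; only the implicit encode to the console code page fails, which is why the print output was misleading. The last two lines show that UTF-8 and UTF-16 are different byte serializations of the same Unicode text, so decoding once on input and choosing the encoding explicitly on output keeps the two concerns separate.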

