On 17 Dec 2013, at 3:32 PM, Dave S <[email protected]> wrote:
> On Monday, December 16, 2013 7:15:12 AM UTC-8, Jonathan Lundell wrote:
> On 16 Dec 2013, at 2:18 AM, peter <[email protected]> wrote:
>> I am impressed at how helpful you have been on this, Jonathan.
>> 
>> It does say in the mht file that it is windows-1252 encoded.
>> 
>> It turns out that 
>>     s.decode('cp1252').encode('utf-8')
>> 
>> is working correctly. I mistakenly thought it was not
>> 
>> because I got this error 
>> UnicodeEncodeError: 'charmap' codec can't encode character u'\u2018' in 
>> position 193: character maps to <undefined>
>> This was from a print statement. It turns out you get this error when trying 
>> to print the left single quotation mark that is correctly coded in unicode. 
>> So this is why I was having such problems. This error is presumably because 
>> the print statement is working in DOS mode, and this character is not in the 
>> DOS character set. So using print to check out what is going on in Python is 
>> not a good idea when using unicode.
>> 
>> Obvious with hindsight, not so obvious without hindsight.
>>  
>> Thanks again, Jonathan, for all your support on this.
>> 
> 
> You're welcome.
> 
> I'm sympathetic, having struggled to understand character encoding myself. 
> 
> FWIW, I've been leaning more and more toward the Python 3 approach to text, 
> decoding all my strings to unicode on input, encoding as needed on output. 
> (Python 3's standardization on Unicode almost persuades me that Python 3 was 
> a good idea.)
> 
> And which Unicode is that?  Over in Mercurial-land, there is still gnashing 
> of teeth because Windows chose UTF-16, and "everyone else" chose UTF-8 (that 
> is, most Linux distros).
> 

Unicode in Python is a built-in type, neither utf-8 nor utf-16. This is true in 
both Python 2 & 3; it's just that in Python 3 it's the *only* string (text) 
type. You can encode a unicode string into a utf-8 or utf-16 byte string, but 
that's an encoded byte string, not unicode.
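
To make the distinction concrete, here is a minimal sketch. The sample string 
is my own invention, and the u'' literals are there only so the same lines run 
on Python 2 and on Python 3.3+. It encodes one unicode string two ways and 
shows that the byte strings differ while the decoded text is identical:

```python
# A unicode (text) string: curly single quotes around the word "quoted".
text = u'\u2018quoted\u2019'

# Encoding produces byte strings; the bytes depend on the encoding chosen.
utf8_bytes = text.encode('utf-8')
utf16_bytes = text.encode('utf-16-le')

# Same text, different bytes, different lengths.
same_bytes = (utf8_bytes == utf16_bytes)

# Decoding each byte string with its own encoding gives back the same text.
round_trip_ok = (utf8_bytes.decode('utf-8') == text and
                 utf16_bytes.decode('utf-16-le') == text)
```

The unicode object itself has no encoding in this sense; utf-8 and utf-16 only 
come into play at the point where text is turned into bytes or back.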

There's some detail on Python's internal handling of Unicode here: 
http://docs.python.org/2/c-api/unicode.html
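
As for the print error quoted above, the following sketch reproduces it 
without a console. It rests on assumptions: I'm guessing the console code page 
is cp437 (yours may differ), the sample bytes are made up, and console_safe is 
a hypothetical helper, not something from web2py or the standard library. The 
idea is decode on input, encode on output, and escape what the console can't 
represent:

```python
# Input: cp1252 bytes, as found in the .mht file (sample data, made up).
raw = b'\x91quoted\x92'

# Decode once on input, then re-encode as needed on output.
text = raw.decode('cp1252')          # u'\u2018quoted\u2019'
utf8 = text.encode('utf-8')

# Assumed DOS code page; cp437 has no slot for U+2018, so a strict
# encode fails in the same way the print statement did.
try:
    text.encode('cp437')
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True


def console_safe(value, encoding='cp437'):
    """Hypothetical helper: escape characters the console can't show."""
    escaped = value.encode(encoding, 'backslashreplace')
    return escaped.decode(encoding)


safe = console_safe(text)            # plain ASCII with \uXXXX escapes
```

Printing the safe value (or repr() of the string on Python 2) lets you inspect 
the text without tripping over the console's character set.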
