Re: How do I display unicode value stored in a string variable using ord()

Terry Reedy Sun, 19 Aug 2012 15:03:39 -0700

On 8/19/2012 2:11 PM, wxjmfa...@gmail.com wrote:

Well, it seems some software producers know what they
are doing.

>>> '€'.encode('cp1252')
b'\x80'
>>> '€'.encode('mac-roman')
b'\xdb'
>>> '€'.encode('iso-8859-1')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 0: ordinal not in range(256)

Yes, Python lets you choose your byte encoding from those and a hundred others. I believe all the codecs are now tested in both directions. It was not an easy task.
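A minimal sketch of what "tested in both directions" means for a single character, assuming Python 3 and only the standard codecs: encode the euro sign with several codecs, decode the bytes back, and report which codecs cannot represent it at all. The codec list and the helper name euro_round_trip are illustrative choices; iso-8859-15 (Latin-9, the later ISO 8859 part that does include the euro sign) is added for comparison.

```python
# Sample of standard Python codec names; any registered codec name would do.
CODECS = ['cp1252', 'mac-roman', 'iso-8859-15', 'iso-8859-1', 'utf-8']


def euro_round_trip(codec: str) -> str:
    """Encode the euro sign with `codec`, decode it back, and report the result."""
    try:
        data = '€'.encode(codec)
    except UnicodeEncodeError:
        return f'{codec}: no euro sign'
    # A correct codec must give back exactly the character we started with.
    if data.decode(codec) != '€':
        return f'{codec}: round trip failed'
    return f'{codec}: {data!r}'


for name in CODECS:
    print(euro_round_trip(name))
```

With CPython 3 the loop should report b'\x80' for cp1252, b'\xdb' for mac-roman, b'\xa4' for iso-8859-15, "no euro sign" for iso-8859-1, and b'\xe2\x82\xac' for utf-8.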

As to the examples: Latin-1 dates to 1985 and before, and the 1988 version was published as a standard in 1992.

https://en.wikipedia.org/wiki/Latin-1
"The name euro was officially adopted on 16 December 1995."
https://en.wikipedia.org/wiki/Euro

No wonder Latin-1 does not contain the euro sign. The standards of international standards organizations are relatively fixed. (The Unicode Consortium will not even correct misspelled character names.) Instead, new standards with new numbers are adopted.

For better or worse, private mappings are more flexible. In its Mac mapping, Apple "replaced the generic currency sign ¤ with the euro sign €". (See the Latin-1 reference.) Great if you use euros, not so great if you were using the previous sign for something else.
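To make that consequence concrete, a small sketch assuming Python 3's mac-roman codec, which (as the quoted session shows) puts the euro sign at 0xDB: bytes written when 0xDB still meant the generic currency sign now decode as a euro sign. The sample bytes are hypothetical.

```python
# Hypothetical bytes from an older Mac program that used 0xDB for the generic
# currency sign; Python's mac-roman codec follows Apple's newer mapping.
old_bytes = b'Price: \xdb5'
print(old_bytes.decode('mac-roman'))   # Price: €5  (the original ¤ is gone)

# Latin-1 still has the generic currency sign, at 0xA4.
print(b'\xa4'.decode('latin-1'))       # ¤
```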


Microsoft reassigned an unused code (0x80) to the euro sign in Windows cp1252.
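A sketch of the difference, assuming Python 3: the helper printable_in_c1_range (a hypothetical name) checks which bytes in the 0x80-0x9F range each codec decodes to printable characters. Latin-1 maps all of them to invisible C1 control characters, while cp1252 puts the euro sign at 0x80 and other printable characters at most of the remaining positions. Python's cp1252 codec leaves five bytes in that range undefined, so they are skipped.

```python
def printable_in_c1_range(codec: str) -> list:
    """Return the bytes 0x80-0x9F that `codec` decodes to printable characters."""
    found = []
    for b in range(0x80, 0xA0):
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte is undefined in this codec
        if ch.isprintable():
            found.append(b)
    return found


print(repr(b'\x80'.decode('cp1252')))        # '€'
print(repr(b'\x80'.decode('latin-1')))       # '\x80', a C1 control character
print(len(printable_in_c1_range('cp1252')))  # 27
print(printable_in_c1_range('latin-1'))      # []
```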
https://en.wikipedia.org/wiki/Windows-1252

"It is very common to mislabel Windows-1252 text with the charset labelISO-8859-1. A common result was that all the quotes and apostrophes(produced by "smart quotes" in Microsoft software) were replaced withquestion marks or boxes on non-Windows operating systems, making textdifficult to read. Most modern web browsers and e-mail clients treat theMIME charset ISO-8859-1 as Windows-1252 in order to accommodate suchmislabeling. This is now standard behavior in the draft HTML 5specification, which requires that documents advertised as ISO-8859-1actually be parsed with the Windows-1252 encoding.[1]"

Lots of fun. Too bad Microsoft won't push UTF-8 so we can all communicate text with much less chance of ambiguity.
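A small sketch of why UTF-8 reduces the ambiguity, assuming Python 3: every character, including the euro sign and the smart quotes above, round-trips through UTF-8, and single-byte cp1252 text is usually not valid UTF-8, so a wrong guess tends to fail loudly instead of silently producing the wrong characters. The sample strings and the helper utf8_or_none are illustrative.

```python
samples = ['€', '¤', '“quoted”', 'naïve café']

# Every string survives an encode/decode round trip through UTF-8.
for text in samples:
    assert text.encode('utf-8').decode('utf-8') == text


def utf8_or_none(data: bytes):
    """Return the text if `data` is valid UTF-8, otherwise None."""
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        return None


print('€'.encode('utf-8'))                  # b'\xe2\x82\xac'
print(utf8_or_none(b'\xe2\x82\xac'))        # €
print(utf8_or_none('€'.encode('cp1252')))   # None: b'\x80' alone is not valid UTF-8
```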


--
Terry Jan Reedy


--
http://mail.python.org/mailman/listinfo/python-list
