On 23/07/13 04:33, Marc Tompkins wrote:
On Mon, Jul 22, 2013 at 11:30 AM, Jim Mooney <cybervigila...@gmail.com> wrote:
On 22 July 2013 11:26, Marc Tompkins <marc.tompk...@gmail.com> wrote:
If you haven't already read it, may I suggest Joel's intro to Unicode?
http://www.joelonsoftware.com/articles/Unicode.html
I had a bad feeling I'd end up learning Unicode ;')
It's not as painful as you might think! Try it - you'll like it!
Actually, once you start getting used to working in Unicode by default,
having to deal with programs that are non-Unicode-aware feels extremely
irritating.
What he said!
Unicode brings order out of chaos. The old code page technology is horrible and
needs to die. It was just barely acceptable back in ancient days when files
were hardly ever transferred from machine to machine, and even then mostly
transferred between machines using the same language. Even so, it didn't work
very well -- ask Russians, who had three mutually incompatible code pages.
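To make the code page problem concrete, here is a rough Python 3 sketch. The three codecs (KOI8-R, CP866, Windows-1251) are my own pick of Cyrillic code pages for illustration, and the sample word is arbitrary; the point is only that identical text turns into different bytes, and that reading bytes with the wrong code page quietly produces garbage rather than an error.

```python
# The same Russian word under three Cyrillic code pages.
# (The choice of these three codecs is an assumption for illustration.)
word = "Привет"

koi8 = word.encode("koi8_r")
cp866 = word.encode("cp866")
cp1251 = word.encode("cp1251")

# Identical text, three different byte sequences.
for name, data in (("koi8_r", koi8), ("cp866", cp866), ("cp1251", cp1251)):
    print(name, data.hex())

# Reading KOI8-R bytes as if they were Windows-1251 raises no error;
# it just yields the wrong characters.
garbled = koi8.decode("cp1251")
print(garbled)

# Decoding with the code page that was actually used recovers the text.
print(koi8.decode("koi8_r"))
```

Nothing in the bytes themselves says which code page produced them, which is why files moved between machines so often arrived unreadable.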
The basics of Unicode are very simple:
- text strings contain characters;
- what is written to disk contains bytes;
- you need to convert characters to and from bytes, regardless of whether you
are using ASCII or Unicode or something else;
- the conversion uses a mapping of characters to byte(s), and vice versa, called
an encoding;
- ASCII is an encoding too, e.g. byte 80 <=> "P";
- use the encode method to go from text to bytes, and decode to go the other
way;
- if you don't know what encoding is used, you cannot tell what the bytes
actually mean;
- although sometimes you can guess, with a variable level of success.
Remember those rules, and you are three quarters of the way to being an expert.
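The rules above can be tried out at the interactive prompt. What follows is a minimal sketch for Python 3, not a recipe: the sample strings are arbitrary, and guess_decode is a made-up helper name (it is not part of the standard library) that simply tries a few candidate encodings in order.

```python
# Text is characters; what goes to disk is bytes.
text = "P"
data = text.encode("ascii")       # characters -> bytes
print(data[0])                    # the byte value: 80
print(data.decode("ascii"))       # bytes -> characters

# The same characters become different bytes under different encodings.
word = "naïve café"
as_utf8 = word.encode("utf-8")
as_latin1 = word.encode("latin-1")
print(len(word), len(as_utf8), len(as_latin1))

# Decoding with the wrong encoding either gives garbage or fails outright.
print(as_utf8.decode("latin-1"))
try:
    as_latin1.decode("utf-8")
except UnicodeDecodeError as err:
    print("cannot decode:", err)


def guess_decode(data, candidates=("ascii", "utf-8", "latin-1")):
    """Hypothetical helper: try each candidate encoding in order.

    Returns (text, encoding) for the first candidate that decodes without
    error. Decoding without error does NOT prove the guess is right.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")


print(guess_decode(as_utf8))      # guessed correctly
print(guess_decode(as_latin1))    # guessed correctly

# A wrong guess: these bytes are really Windows-1251, which is not among
# the candidates, yet latin-1 accepts them and returns nonsense.
russian = "Привет".encode("cp1251")
print(guess_decode(russian))
```

latin-1 sits last in the candidate list on purpose: every possible byte is valid latin-1, so it never fails and would otherwise mask every other guess. That is also why guessing only works some of the time.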
--
Steven