On 23/07/13 04:33, Marc Tompkins wrote:
On Mon, Jul 22, 2013 at 11:30 AM, Jim Mooney <cybervigila...@gmail.com> wrote:

On 22 July 2013 11:26, Marc Tompkins <marc.tompk...@gmail.com> wrote:




If you haven't already read it, may I suggest Joel's intro to Unicode?
http://www.joelonsoftware.com/articles/Unicode.html


I had a bad feeling I'd end up learning Unicode ;')


It's not as painful as you might think!  Try it - you'll like it!
Actually, once you start getting used to working in Unicode by default,
having to deal with programs that are non-Unicode-aware feels extremely
irritating.


What he said!

Unicode brings order out of chaos. The old code page technology is horrible and 
needs to die. It was just barely acceptable back in ancient days when files 
were hardly ever transferred from machine to machine, and even then mostly 
transferred between machines using the same language. Even so, it didn't work 
very well -- ask Russians, who had three mutually incompatible code pages.


The basics of Unicode are very simple:

- text strings contain characters;

- what is written to disk contains bytes;

- you need to convert characters to and from bytes, regardless of whether you 
are using ASCII or Unicode or something else;

- the conversion uses a mapping of character to byte(s), and vice versa, called 
an encoding;

- ASCII is an encoding too, e.g. byte 80 (decimal, 0x50) <=> "P";

- use the encode method to go from text to bytes, and decode to go the other 
way;

- if you don't know what encoding is used, you cannot tell what the bytes 
actually mean;

- although sometimes you can guess, with a variable level of success.
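
Here is a minimal sketch of those rules, assuming Python 3, where str holds characters and bytes holds bytes. The sample word and the choice of the koi8_r and cp1251 codecs are only illustrations, not anything from the earlier messages:

```python
# Text strings contain characters; what goes to disk contains bytes.
text = "Привет"

# encode: characters -> bytes, using a named encoding.
utf8_bytes = text.encode("utf-8")
koi8_bytes = text.encode("koi8_r")

# The same characters become different bytes under different encodings.
print(len(text), len(utf8_bytes), len(koi8_bytes))

# decode: bytes -> characters, using the SAME encoding that produced them.
assert utf8_bytes.decode("utf-8") == text
assert koi8_bytes.decode("koi8_r") == text

# ASCII is an encoding too: byte 80 <=> "P".
assert "P".encode("ascii") == bytes([80])
assert bytes([80]).decode("ascii") == "P"

# Decoding with the wrong encoding may not raise an error at all;
# it can quietly give you different characters instead.
wrong = koi8_bytes.decode("cp1251")
print(repr(wrong))
assert wrong != text
```

The last step is the one that bites in practice: the bytes alone do not say which encoding they are in, so a wrong guess can look like success.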



Remember those rules, and you are three quarters of the way to being an expert.



--
Steven