Re: Unicode questions

2010-10-26 Thread Steve Holden
On 10/26/2010 12:32 PM, John Nagle wrote: > On 10/19/2010 12:02 PM, Tobiah wrote: >> I've been reading about the Unicode today. >> I'm only vaguely understanding what it is >> and how it works. >> >> Please correct my understanding where it is lacking. > > http://justfuckinggoogleit.com/ Neit

Re: Unicode questions

2010-10-26 Thread John Nagle
On 10/19/2010 12:02 PM, Tobiah wrote: I've been reading about the Unicode today. I'm only vaguely understanding what it is and how it works. Please correct my understanding where it is lacking. http://justfuckinggoogleit.com/ -- http://mail.python.org/mailman/listinfo/python-list

Re: Unicode questions

2010-10-25 Thread Steve Holden
On 10/25/2010 2:36 PM, Terry Reedy wrote: > On 10/25/2010 2:33 AM, Steve Holden wrote: >> On 10/25/2010 1:42 AM, Lawrence D'Oliveiro wrote: >>> In message, Petite >>> Abeille wrote: >>> Characters vs. Bytes >>> >>> And why do certain people insist on referring to bytes as “octets”? >> >> Becau

Re: Unicode questions

2010-10-25 Thread Terry Reedy
On 10/25/2010 2:33 AM, Steve Holden wrote: On 10/25/2010 1:42 AM, Lawrence D'Oliveiro wrote: In message, Petite Abeille wrote: Characters vs. Bytes And why do certain people insist on referring to bytes as “octets”? Because back in the old days bytes were of varying sizes on different arch

Re: Unicode questions

2010-10-25 Thread Seebs
On 2010-10-25, Lawrence D'Oliveiro wrote: > In message , Petite > Abeille wrote: >> Characters vs. Bytes > And why do certain people insist on referring to bytes as “octets”? One common reason is that there have been machines on which "bytes" were not 8 bits. In particular, the usage of "b

Re: Unicode questions

2010-10-24 Thread Steve Holden
On 10/25/2010 1:42 AM, Lawrence D'Oliveiro wrote: > In message , Petite > Abeille wrote: > >> Characters vs. Bytes > > And why do certain people insist on referring to bytes as “octets”? Because back in the old days bytes were of varying sizes on different architectures - indeed the DECSystem-1

Re: Unicode questions

2010-10-24 Thread Chris Rebert
On Sun, Oct 24, 2010 at 10:43 PM, Lawrence D'Oliveiro wrote: > In message , Chris Rebert > wrote: > >> There is no such thing as "plain Unicode representation". > > UCS-4 or UTF-16 probably come the closest. How do you figure that? Cheers, Chris

Re: Unicode questions

2010-10-24 Thread Lawrence D'Oliveiro
In message , Chris Rebert wrote: > There is no such thing as "plain Unicode representation". UCS-4 or UTF-16 probably come the closest.

Re: Unicode questions

2010-10-24 Thread Lawrence D'Oliveiro
In message , Petite Abeille wrote: > Characters vs. Bytes And why do certain people insist on referring to bytes as “octets”?

Re: Unicode questions

2010-10-21 Thread OdarR
On Oct 19, 9:02 pm, Tobiah wrote: > I've been reading about the Unicode today. > I'm only vaguely understanding what it is > and how it works. > ... > Thanks, > > Tobiah Hi, Some good advice: read this presentation, http://farmdev.com/talks/unicode/ It has explanations and advice for coding. Olivier

Re: Unicode questions

2010-10-20 Thread M.-A. Lemburg
Tobiah wrote: > I've been reading about the Unicode today. > I'm only vaguely understanding what it is > and how it works. > > Please correct my understanding where it is lacking. > Unicode is really just a database of character information > such as the name, unicode section, possible > numeric

Re: Unicode questions

2010-10-19 Thread Terry Reedy
On 10/19/2010 4:31 PM, Tobiah wrote: There is no such thing as "plain Unicode representation". The closest thing would be an abstract sequence of Unicode codepoints (ala Python's `unicode` type), but this is way too abstract to be used for sharing/interchange, because storing anything in a file o
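
To make the point about abstract code points versus stored bytes concrete, here is a minimal sketch. It assumes Python 3, where `str` plays the role that the `unicode` type mentioned above played in Python 2. The helper name `round_trip` is made up for illustration and is not from the thread.

```python
def round_trip(text, encoding="utf-8"):
    """Encode a str to bytes, then decode those bytes back to a str."""
    data = text.encode(encoding)      # concrete bytes, suitable for a file or socket
    restored = data.decode(encoding)  # abstract sequence of code points again
    return data, restored


data, restored = round_trip("caf\u00e9")
print(type(data).__name__, len(data))          # the stored form is bytes
print(type(restored).__name__, len(restored))  # the in-memory form is str

# Decoding with a different codec than the one used for encoding
# does not fail here; it silently produces the wrong characters.
print(data.decode("latin-1"))
```

The string only exists as code points inside the program; anything written out has to go through some encoding, and the reader has to know which one was used.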

Re: Unicode questions

2010-10-19 Thread Chris Rebert
On Tue, Oct 19, 2010 at 1:31 PM, Tobiah wrote: >> There is no such thing as "plain Unicode representation". The closest >> thing would be an abstract sequence of Unicode codepoints (ala Python's >> `unicode` type), but this is way too abstract to be used for >> sharing/interchange, because storing

Re: Unicode questions

2010-10-19 Thread Petite Abeille
On Oct 19, 2010, at 10:31 PM, Tobiah wrote: > So why so many encoding schemes? http://en.wikipedia.org/wiki/Space-time_tradeoff
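
One way to see the space side of that tradeoff is to count the bytes that the same text occupies under different encodings. This is only a sketch, assuming Python 3; the function name `encoded_sizes` and the sample strings are illustrative choices, not something from the thread.

```python
def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "utf-32-le")):
    """Return the number of bytes `text` occupies in each encoding."""
    return {enc: len(text.encode(enc)) for enc in encodings}


# ASCII-only text is smallest in UTF-8.
print(encoded_sizes("hello"))

# Japanese text is smaller in UTF-16 than in UTF-8.
print(encoded_sizes("\u65e5\u672c\u8a9e"))
```

The `-le` variants are used so that no byte order mark is added to the counts. Which encoding is the most compact depends on the text, which is one reason several of them remain in use.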

Re: Unicode questions

2010-10-19 Thread Tobiah
> There is no such thing as "plain Unicode representation". The closest > thing would be an abstract sequence of Unicode codepoints (ala Python's > `unicode` type), but this is way too abstract to be used for > sharing/interchange, because storing anything in a file or sending it > over a network u

Re: Unicode questions

2010-10-19 Thread Chris Rebert
On Tue, Oct 19, 2010 at 12:02 PM, Tobiah wrote: > I've been reading about the Unicode today. > I'm only vaguely understanding what it is > and how it works. Petite Abeille already pointed to Joel's excellent primer on the subject; I can only second their endorsement of his article. > Please corr

Re: Unicode questions

2010-10-19 Thread Hrvoje Niksic
Tobiah writes: > would be shared? Why can't we just say "unicode is unicode" > and just share files the way ASCII users do. Just have a huge > ASCII style table that everyone sticks to. I'm not sure that I understand you correctly, but UCS-2 and UCS-4 encodings are that kind of thing. Many pe
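
A rough illustration of the "one fixed-size table entry per character" idea, assuming Python 3. Python ships no codec named UCS-2 or UCS-4, so this sketch uses `utf-32-be` as a stand-in for UCS-4 and `utf-16-be` for the 16-bit case (UTF-16 matches UCS-2 for characters in the Basic Multilingual Plane). The helper names are hypothetical.

```python
import struct


def utf32_units(text):
    """Return the 32-bit code units of `text`: one per code point."""
    data = text.encode("utf-32-be")
    return list(struct.unpack(">%dI" % (len(data) // 4), data))


def utf16_units(text):
    """Return the 16-bit code units of `text`."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


sample = "A\U0001D11E"  # LATIN CAPITAL LETTER A, MUSICAL SYMBOL G CLEF
print([hex(u) for u in utf32_units(sample)])
print([hex(u) for u in utf16_units(sample)])
```

With 32-bit units every character is exactly one table index. With 16-bit units a character outside the Basic Multilingual Plane no longer fits in one unit, so UTF-16 spends two units (a surrogate pair) on it, and plain UCS-2 cannot represent it at all.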

Re: Unicode questions

2010-10-19 Thread Petite Abeille
On Oct 19, 2010, at 9:02 PM, Tobiah wrote: > Please enlighten my vague and probably ill-formed conception of this whole > thing. Hmmm... is there a question hidden somewhere in there or is it more open ended in nature? :) In the meantime... The Absolute Minimum Every Software Developer Absol

Unicode questions

2010-10-19 Thread Tobiah
I've been reading about the Unicode today. I'm only vaguely understanding what it is and how it works. Please correct my understanding where it is lacking. Unicode is really just a database of character information such as the name, unicode section, possible numeric value etc. These points of i
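
The "database of character information" described above can be poked at directly from Python through the standard `unicodedata` module. A small sketch, assuming Python 3; the `describe` helper is a made-up name for illustration.

```python
import unicodedata


def describe(ch):
    """Look up a few Unicode Character Database properties for one character."""
    return {
        "code_point": "U+%04X" % ord(ch),
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "numeric": unicodedata.numeric(ch, None),  # None if no numeric value
    }


for ch in ("A", "\u00e9", "\u00bd"):
    print(describe(ch))
```

Each record is keyed by the character's code point, which is a plain integer. How that integer is turned into bytes is a separate question, left to the encodings.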