I'll preface my response by saying that I know/understand fairly little about it, but since I've recently been smacked by this same issue when converting stuff to Python3, I'll see if I can explain it in a way that makes sense.
On Wed, 18 Jul 2012, Jordan wrote:
OK, so I have been trying for a couple of days now and I am throwing in the towel, Python 3 wins this one. I want to convert a string to binary and back again like in this question: Stack Overflow: Convert Binary to ASCII and vice versa (Python) <http://stackoverflow.com/questions/7396849/convert-binary-to-ascii-and-vice-versa-python> But in Python 3 I consistently get some sort of error relating to the fact that nothing but bytes and bytearrays support the buffer interface, or I get an overflow error because something is too large to be converted to bytes. Please help me and then explain what I am not getting that is new in Python 3. I would like to point out I realize that binary, hex, and encodings are all a very complex subject and so I do not expect to master it, but I do hope that I can gain a deeper insight. Thank you all.
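For the conversion itself, here is a minimal sketch of one way to do it in Python 3. The function names text_to_bits and bits_to_text are made up for illustration, and UTF-8 is an assumed default encoding; this is not necessarily how the Stack Overflow answer goes about it.

```python
def text_to_bits(text, encoding="utf-8"):
    """Encode text to bytes, then write each byte as 8 binary digits."""
    data = text.encode(encoding)  # str -> bytes
    return "".join(format(byte, "08b") for byte in data)


def bits_to_text(bits, encoding="utf-8"):
    """Cut the digits into 8-bit chunks, rebuild the bytes, then decode."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode(encoding)  # bytes -> str


bits = text_to_bits("hello")
print(bits)
print(bits_to_text(bits))
```

As far as I can tell, the "buffer interface" error shows up when a str is handed to something that wants bytes (binascii.hexlify, for example), which is why the encode() step comes first here. The overflow error looks like what int.to_bytes() raises when the length it is given is too small for the number.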
The way I've read it - stop thinking about text and bytes as if they are the same thing. The biggest reason that all this has changed is that Python has grown up and entered the world where Unicode actually matters. To us poor shmucks in the English-speaking countries of the world it's all very confusing because it's nothing we have to deal with. 26 letters is perfectly fine for us - and if we want uppercase we'll just throw in another 26. Add a few dozen punctuation marks and 256 is a perfectly fine number of characters.

To make a slightly relevant side trip, when you were a kid did you ever send "secret" messages to a friend with a code like this?

    A = 1
    B = 2
    .
    .
    .
    Z = 26

Well, that's basically what is going on when it comes to bytes/text/whatever. When you type some text, Python 3 holds it as a str, which is a sequence of Unicode characters; it only becomes bytes once it is encoded with a particular encoding such as UTF-8. The nice thing for us 26-letter folks is that the ASCII alphabet we're so used to just so happens to map quite well onto the Unicode encodings - so 'A' in ASCII is the same number as 'A' in UTF-8.

Now, here's the part that I had to (and still need to) wrap my mind around - if what you have is "just bytes" then it doesn't really matter what those bytes are supposed to represent. It could be Latin-1. Or UTF-8, -16, or some other weird encoding. And all the operations that are supposed to modify these strings of bytes (e.g. removing spaces, splitting on a certain "character", etc.) still work. Because if I have this string of bytes:

    9 45 12 9 13 19 18 9 12 99 102

and I tell you to split on the 9's, it doesn't matter if that's some weird ASCII character, or some equally weird UTF character, or something else entirely. And I don't have to worry about things getting munged up when I try to stick text and bytes together - because Python 3 refuses to mix them until one side has been explicitly encoded or decoded.

So the question is, of course, if it's all bytes, then why does it look like text when I print it out?
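Before getting to that, here is a small sketch of the points so far, using the same numbers as the example. The variable names are only for illustration.

```python
# The numbers from the example, as an actual bytes object.
data = bytes([9, 45, 12, 9, 13, 19, 18, 9, 12, 99, 102])

# Split on byte value 9; no encoding is involved at any point.
parts = data.split(bytes([9]))
print([list(p) for p in parts])

# 'A' is the same number in ASCII and in UTF-8.
print(list("A".encode("ascii")), list("A".encode("utf-8")))

# Python 3 won't join str and bytes directly; one side has to be
# converted explicitly with encode() or decode().
try:
    "abc" + b"def"
except TypeError as exc:
    print("TypeError:", exc)

print("abc" + b"def".decode("ascii"))
```

The first piece of the split is empty because the data starts with a 9.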
Well, that's because what you usually print is a str, and in Python 3 a str is already Unicode text; print just encodes it for your terminal on the way out. Print an actual bytes object and you get something like b'abc' instead. Python 3 keeps the two apart: most of the functions that used to operate on Python 2 strings now exist for both str and bytes, but each version only accepts its own type. So some of them operate on byte streams. Except for the ones that operate on text ;)

Well, I hope that's of some use and isn't too much of a lie - like I said, I'm still trying to wrap my head around things, and I've found that explaining (or trying to explain) to someone else is often the best way to work out the idea in your own head. If I've gone too far astray I'm sure the other helpful folks here will correct me :)

HTH,
Wayne