>>>>> "Josiah" == Josiah Carlson <[EMAIL PROTECTED]> writes:
    Josiah> I try to internalize it by not thinking of strings as
    Josiah> encoded data, but as binary data, and unicode as text. I
    Josiah> then remind myself that unicode isn't native on-disk or
    Josiah> cross-network (which stores and transports bytes, not
    Josiah> characters), so one needs to encode it as binary data.
    Josiah> It's a subtle difference, but it has worked so far for me.

Seems like a lot of work for something that for monolingual usage
should "Just Work" almost all of the time.

    Josiah> I notice that you seem to be in Japan, so teaching unicode
    Josiah> is a must.

Yes. Japan is more complicated than that, but in Python unicode is a
must.

    Josiah> If you are using the "unicode is text" and "strings are
    Josiah> data", and they aren't getting it; then I don't know.

Well, I can tell you that they don't get it. One problem is PEP 263.
It makes it very easy to write programs that do line-oriented I/O with
input() and print, and the students come to think it should always be
that easy. Since Japan has at least 6 common encodings that students
encounter on a daily basis while browsing the web, plus a couple more
that live inside of MSFT Word and Java, they're used to huge amounts
of magic. The normal response of novice programmers is to mandate
that users of their programs use the encoding of choice and put it in
ordinary strings so that it just works.

Ie, the average student just "eats" the F on the codecs assignment,
and writes the rest of her programs without them.

    >> simple, and the exceptions for using a "nonexistent" method
    >> mean I don't have to reinforce---the students will be able to
    >> teach each other. The exceptions also directly help reinforce
    >> the notion that text == Unicode.

    Josiah> Are you sure that they would help? If .encode() and
    Josiah> .decode() drop from strings and unicode (respectively),
    Josiah> they get an AttributeError. That's almost useless.

Well, I'm not _sure_, but this is the kind of thing that you can learn
by rote.
And it will happen on a sufficiently regular basis that a large
fraction of students will experience it. They'll ask each other, and
usually they'll find a classmate who knows what happened. I haven't
tried this with codecs, but that's been my experience with statistical
packages where some routines understand non-linear equations but
others insist on linear equations.[1] The error messages ("Equation
is non-linear! Aaugh!") are not much more specific than
AttributeError.

    Josiah> Raising a better exception (with more information) would
    Josiah> be better in that case, but losing the functionality that
    Josiah> either would offer seems unnecessary;

Well, the point is that for the "usual suspects" (ie, Unicode codecs)
there is no functionality that would be lost. As MAL pointed out, for
these codecs the "original" text is always Unicode; that's the role
Unicode is designed for, and by and large it fits the bill very well.
With few exceptions (such as rot13) the "derived" text will be bytes
that peripherals such as keyboards and terminals can generate and
display.

    Josiah> "You are trying to encode/decode to/from incompatible
    Josiah> types. expected: a->b got: x->y" is better. Some of those
    Josiah> can be done *very soon*, given the capabilities of the
    Josiah> encodings module,

That's probably the way to go. If we can have a derived "Unicode
codec" class that does this, that would pretty much entirely serve the
need I perceive. Beginning students could learn to write iconv.py,
more advanced students could learn to create codec stacks to generate
MIME bodies, which could include base64 or quoted-printable bytes ->
bytes codecs.

Footnotes:
[1] If you're not familiar with regression analysis, the problem is
that the equation "z = a*log(x) + b*log(y)" where a and b are to be
estimated is _linear_ in the sense that x, y, and z are data series,
and X = log(x) and Y = log(y) can be precomputed so that the equation
actually computed is "z = a*X + b*Y".
On the other hand "z = a*(x + b*y)" is _nonlinear_ because of the
coefficient on y being a*b. Students find this hard to grasp in the
classroom, but they learn quickly in the lab.

I believe the parameter/variable inversion that my students have
trouble with in statistics is similar to the "original"/"derived"
inversion that happens with "text you can see" (derived, string) and
"abstract text inside the program" (original, Unicode).

--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.
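
For concreteness, here is a rough sketch of the iconv.py exercise and
of a two-layer codec stack as described in the message. It assumes a
Python 3 interpreter, where text is str and binary data is bytes,
rather than the unicode/str pair under discussion here. The function
names (transcode, to_wire, from_wire, wrong_direction_error) are
hypothetical, chosen only for illustration; the codec names are the
ones shipped in the standard library.

```python
import codecs


def transcode(data, src, dst):
    """iconv-style conversion: bytes in one charset to bytes in another.

    The "original" text only ever exists as Unicode in the middle step.
    """
    text = data.decode(src)      # bytes -> text (Unicode codec, decoding)
    return text.encode(dst)      # text -> bytes (Unicode codec, encoding)


def to_wire(text, charset="utf-8"):
    """Two-layer codec stack: a Unicode codec, then a bytes -> bytes codec."""
    raw = text.encode(charset)                   # text -> bytes
    return codecs.encode(raw, "base64_codec")    # bytes -> bytes


def from_wire(payload, charset="utf-8"):
    """Undo to_wire: strip the base64 layer, then decode to text."""
    raw = codecs.decode(payload, "base64_codec")  # bytes -> bytes
    return raw.decode(charset)                    # bytes -> text


def wrong_direction_error():
    """Return the exception raised by calling .encode() on binary data."""
    try:
        b"abc".encode("utf-8")   # bytes has no .encode() in Python 3
    except AttributeError as exc:
        return exc
    return None


sample = "\u65e5\u672c\u8a9e"    # three kanji, the word for "Japanese"
payload = to_wire(sample, "euc_jp")
assert from_wire(payload, "euc_jp") == sample
```

Calling an encoding method on data that is already bytes fails with a
bare AttributeError, which is the rote-learnable signal argued for
in this message; a dedicated exception naming the expected and actual
types would carry more information.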