On Sep 13, 2007, at 12:22 PM, Marcin 'Qrczak' Kowalczyk wrote:
> What should happen when a command line argument or an environment
> variable is not decodable using the system encoding (on Unix where
> from the OS point of view it is an array of bytes)?

Here's a suggestion I made on the SBCL dev list a while back, in
response to the same issues. I am responding to myself here, where my
first suggestion was to keep all the environmental gunk in byte-arrays
rather than strings. That is still a very nice and simple possibility.
My second inclination was to use a variant of utf8 which can handle
all bytestrings, instead of utf8 itself: utf-8b. This obviously works
best when the system encoding is actually utf8.

> On Aug 2, 2007, at 4:55 PM, James Y Knight wrote:
>
>> Yeah -- it's pretty clear the environment isn't _actually_ in the
>> default encoding. It's just binary junk which often but not always
>> contains some text encoded in some arbitrary superset of ASCII. Just
>> like command line arguments (and filenames on linux).
>>
>> The hard part is that users expect command line arguments, filenames,
>> and environment values to be strings (because they normally do
>> contain text-like things), when strictly they cannot be because there
>> is no reliable encoding.
>>
>
> A good alternative to this is for SBCL to use the UTF8b encoding to
> decode unix environment gunk (filenames, env vars, command line
> args) which are *probably* in utf8, but might not be. utf8b has the
> nice property that any arbitrary bytestring can be decoded into
> unicode, and then round-tripped back to the same bytes. Valid utf8
> sequences turn into the same unicode characters as with the utf8
> codec. Invalid utf8 sequences turn into invalid surrogate pair
> sequences in the unicode string.
>
> Thus, SBCL can return strings, and never throw an error. If you
> actually wanted the random binary, you can losslessly convert the
> unicode string back to binary. Win win.
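
To make the round-trip property concrete, here is a minimal sketch in
Python. It is not the libutf8b code referenced in this mail. It assumes
the "surrogateescape" error handler that later Python 3 releases provide
(PEP 383), which behaves much like utf-8b by mapping each undecodable
byte to a lone surrogate in the range U+DC80..U+DCFF. The helper names
decode_utf8b and encode_utf8b are made up for illustration.

```python
# Sketch only: uses Python 3's "surrogateescape" error handler (PEP 383)
# as a stand-in for utf-8b. The function names are hypothetical.

def decode_utf8b(raw: bytes) -> str:
    """Decode any byte string; bytes that are not valid utf8 become lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_utf8b(text: str) -> bytes:
    """Inverse of decode_utf8b: escaped surrogates turn back into the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# Valid utf8 decodes to the same characters as the plain utf8 codec.
valid = "caf\u00e9".encode("utf-8")
assert decode_utf8b(valid) == valid.decode("utf-8")

# A Latin-1 encoded filename is not valid utf8, but decoding does not fail:
# the stray byte 0xE9 becomes the lone surrogate U+DCE9.
invalid = b"caf\xe9.txt"
decoded = decode_utf8b(invalid)
assert decoded == "caf\udce9.txt"

# The original bytes come back unchanged.
assert encode_utf8b(decoded) == invalid
```

One consequence worth noting: the decoded string is not well-formed
unicode, so encoding it with the strict utf8 codec raises an error. Code
that writes such strings out has to use the same escaping codec.
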
>
> Some references:
> Original mail:
> http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html
>
> Blog entry:
> http://bsittler.livejournal.com/10381.html
>
> Python implementation: http://hyperreal.org/~est/libutf8b/

James
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
