2008/9/30 Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]>: > I've experimentally implemented (not for Python) a different escaping > scheme with a similar goal as UTF-8b: undecodable bytes are prefixed > with U+0000 instead of being converted to unpaired surrogates, and > '\x00' decodes as U+0000 U+0000.
This was not my idea: mono did that first. http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding "In short, it's a Glorious Hack. Rejoice. Or something." Note that there are many people, including the Unicode list, who consider this evil because they view this as a non-standard modification of UTF-8. I am undecided on how evil it is. (My implementation differs from mono by the strictness of what Unicode sequences can be encoded: mono encodes all and mine does not, OTOH mine is a bijection and mono is not. Both implementations decode all byte sequences of course.) -- Marcin Kowalczyk [EMAIL PROTECTED] http://qrnik.knm.org.pl/~qrczak/ _______________________________________________ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com