2008/9/30 Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]>:

> I've experimentally implemented (not for Python) a different escaping
> scheme with a similar goal as UTF-8b: undecodable bytes are prefixed
> with U+0000 instead of being converted to unpaired surrogates, and
> '\x00' decodes as U+0000 U+0000.

This was not my idea: mono did that first.
http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding
"In short, it's a Glorious Hack. Rejoice. Or something."

Note that there are many people, including the Unicode list, who
consider this evil because they view this as a non-standard
modification of UTF-8. I am undecided on how evil it is.

(My implementation differs from mono by the strictness of what Unicode
sequences can be encoded: mono encodes all and mine does not, OTOH
mine is a bijection and mono is not. Both implementations decode all
byte sequences of course.)

-- 
Marcin Kowalczyk
[EMAIL PROTECTED]
http://qrnik.knm.org.pl/~qrczak/
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: 
http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com

Reply via email to