BTW, Windows will cheerfully let you create and access files with
"garbage surrogates" in it.
Try it yourself:
open(u"\ud8fd", 'w').close()
os.listdir(u'.')
IMO that pretty much blows out of the water any suggestion encoding
invalid UTF-8 sequences into lone surrogates is an evil and broken
thing to do.
So, I'm back to favoring the lone surrogate plan over the U+0000 plan.
But either one seems better than the alternatives.
James
On Sep 29, 2008, at 11:11 PM, Stephen J. Turnbull wrote:
James Y Knight writes:
On Sep 29, 2008, at 3:32 AM, Adam Olsen wrote:
UTF-8b doesn't work as intended. It produces an invalid unicode
object (garbage surrogates) that cannot be used with external APIs
or
libraries that require unicode.
I'd be interested to hear more detail on what you expect the
practical
ramifications of this to be. It doesn't sound likely to be a problem
to me.
That's because you have a specific use case in mind. Adam clearly has
in mind passing the filename on to a library which might proceed to
signal an error (to him, unexpected) on garbage surrogates. He
doesn't want to be surprised by that.
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe:
http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com