On Sep 30, 2008, at 12:57 PM, Guido van Rossum wrote:
And again: if utf-8b isn't acceptable, because it does break things in
some unknown-to-me way, I really can't imagine anything working but just
going back to byte-string access as the only API. It's really not okay
for the "obvious" APIs to be totally broken by unexpected input. Think
os.getcwd(), sys.argv, os.environ. You can't just ignore bad files and
call it done.
Actually that is what you *have* to do with the
filesystem-as-a-black-box model. Filesystems reserve the right to fail
occasionally and there's nothing you can do to prevent it -- it would
be unacceptable if the entire disk would stop working because it had
one bad block (unless the bad block is in some kind of master table)
so you just have to deal with it, and you can't wish the problems away
by insisting on a perfect abstraction.
What I meant is that ignoring certain files is not nearly good enough to
solve the problem.
python3 -c "import sys; print(sys.argv)" "$(echo -e 'filename\x90\x90')"
-> python3 fails to start.

cd "$(echo -e 'dir\x90')"  # Assume said dir exists
python3
-> python3 fails to start.

PATH="$PATH:$(echo -e '/home/user/dir\x90')"
python3 -c "import os; print(os.environ['PATH'])"
-> nope, no PATH.
Those aren't good behaviors, and can't be solved simply by pretending
certain files don't exist.
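
To make the underlying issue concrete, here is a minimal sketch, not code
from this thread. It assumes a later Python 3 that ships the
"surrogateescape" error handler, which follows the utf-8b idea of mapping
each undecodable byte to a lone surrogate. The helper names strict_decode
and roundtrip_decode are made up for illustration.

```python
# The argument from the first example above: 0x90 is not valid UTF-8 here.
raw = b"filename\x90\x90"


def strict_decode(data):
    """Decode as strict UTF-8; return None if the bytes are not valid."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def roundtrip_decode(data):
    """Decode utf-8b style: undecodable bytes become lone surrogates."""
    return data.decode("utf-8", errors="surrogateescape")


name = roundtrip_decode(raw)

# Strict decoding has no answer for this input at all.
print(strict_decode(raw))
# ascii() is used because a str holding lone surrogates may not be
# printable on a strict UTF-8 terminal.
print(ascii(name))
# Encoding with the same handler gives back the original bytes.
print(name.encode("utf-8", errors="surrogateescape") == raw)
```

The point of the sketch is only that a str-based API can carry such a
name and hand the exact bytes back to the OS, rather than failing or
dropping the entry.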
But please see the U+0000-escape alternative proposed by Marcin. It,
unlike utf-8b, doesn't depend upon non-standard Unicode, so maybe there
won't be as much opposition to it.
James