Antoine Pitrou writes:
> Stephen J. Turnbull <turnbull <at> sk.tsukuba.ac.jp> writes:
> >
> > It's usually not "hypothetical"; often, the user knows what it is.
> > Why not ask her? That's what web browsers do, in effect, by providing
> > View as Charset commands.
>
> The average user does not even /know/ what a charset is.

Where I live they do -- there's a reason why "mojibake" is one of the
few Japanese words to be borrowed into English rather than vice versa.

> > The problem with the strategies that are being proposed is that this
> > is an application-level problem, not a Python-level problem.
>
> I don't understand why you think that. If a filename can't be
> exactly represented with a valid Unicode sequence, all applications
> wanting to access that file are impacted in the same way, and it is
> likely that the same solution or workaround can be applied to all
> applications.

That is not my experience in 10+ years of developing XEmacs/MULE.
There are many solutions/workarounds, but all of them are vulnerable
to the fundamental mismatch between the POSIX definition of a filename
(or string, for that matter) as a slightly restricted sequence of
octets, and the human being's insistence on interpreting that sequence
of octets as the encoded representation of a textual string.

True, some solutions are better than others, but there seems to be
none that dominates across the board. Rather, each of the better ones
is appropriate for some subset of users and applications.

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com