On Fri, Oct 10, 2008 at 1:31 AM, Glenn Linderman <[EMAIL PROTECTED]> wrote:
> On approximately 10/9/2008 11:55 PM, came the following characters from the
> keyboard of Stephen J. Turnbull:
>> The problem that all the proposals face is that they assume that we
>> know where the cleaning up will be done, and that we're in control of
>> the code that will have to do it.
>
>
> I think this is your expression of "Applications that do XXX may need
> modification to handle all files" :)
>
> The object wrapper gives us the right control, but likely forces more
> changes to applications than the other schemes.  BDFL has chosen scheme 2,
> it seems, unless he changes his mind.  It has the advantages that few or no
> code changes are necessary to handle files that have Unicode names, and
> applications that want to handle files with non-Unicode names can, but have
> to work harder.  If Python had come with a file path manipulation object
> from the beginning, (3) might be a better scheme, but, as much as I like and
> wish for scheme (3), scheme (2) has a better migration story, and scheme (1)
> basically only solves some of the problems some of the times, and can cause
> other problems due to data puns (although the chances of doing so are
> somewhat low, and approach zero in my environment, and likely in many
> environments... but then in my environment, and likely in many environments,
> they also don't actually solve any problems either, so I'd be just as well
> off without it).

There's a spectrum of choices, depending on how soon you want the API to fail:

* bytes/unicode distinct APIs.  The unicode API never fails, but it does
  skip names it cannot decode.
* bytes/unicode automatic.  Returns bytes for invalid names; fails when
  those are concatenated to unicode strings.
* invalid unicode.  Works internally, but fails when exposed to external
  APIs.
* FilePath object.  I can't see a difference from invalid unicode?
* transformed unicode.  Works internally, can be round-tripped through
  external APIs, but fails if those external APIs touch the filesystem.
  Also breaks valid file names.

Since none of the options eliminates failure (and none can, short of
universally redefining UTF-8 or making the filesystem validate the
encoding), we instead pick the lesser evil.  Although the first option
does skip file names, it turns out to be the least surprising and least
magical.  Indeed, it's the only option that never fails while listing
directory contents!

-- 
Adam Olsen, aka Rhamphoryncus
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
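
To make the first two options in the list above concrete, here is a minimal sketch.  It does not touch the filesystem: the byte strings stand in for raw directory entries, and split_names is a hypothetical helper invented for illustration, not anything in the os module.  It assumes the filesystem encoding is UTF-8.

```python
def split_names(raw_names, encoding="utf-8"):
    """Partition raw byte file names into (decoded, skipped).

    Hypothetical helper: mimics a unicode listing API that silently
    skips names it cannot decode, as in the first option.
    """
    decoded, skipped = [], []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))  # strict decoding
        except UnicodeDecodeError:
            skipped.append(raw)  # name is not valid in this encoding
    return decoded, skipped


# Stand-ins for directory entries; the last one is Latin-1 encoded,
# so it is not valid UTF-8.
raw_names = [b"notes.txt", "caf\u00e9.txt".encode("utf-8"), b"caf\xe9.txt"]

names, skipped = split_names(raw_names)

# Second option: invalid names come back as bytes, and the failure is
# deferred until someone mixes them with a unicode string.
try:
    "prefix-" + skipped[0]
    mixing_failed = False
except TypeError:
    mixing_failed = True
```

Listing itself never raises here; the cost is that the Latin-1 name is missing from names.  Under the second option the same name would be returned, and the TypeError shows up later in whatever code tries to combine it with text.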