2009/4/24 Stephen J. Turnbull <step...@xemacs.org>:
> Paul Moore writes:
>
>  > The pros for Martin's proposal are a uniform cross-platform interface,
>  > and a user-friendly API for the common case.
>
> A more accurate phrasing would be "... a user-friendly API for those
> who feel very lucky today." Which is the common case, of course, but
> spins a little differently.

Sorry, but I think you're misrepresenting things. I'd have probably
let you off if you'd missed out the "very" - but I do think that it's
the common case. Consider:

- Windows systems where broken Unicode (lone surrogates or whatever)
  isn't involved
- Unix systems where the user's stated filesystem encoding is correct

Can you honestly say that this isn't the vast majority of real-world
environments? (IIRC, you are based in Japan, so it may well be true
that the likelihood of problems is a lot higher where you are than
where I am - the UK - but I suspect that averaging out, things are
generally as above).

>  > [1] Actually, all the PEP says is "With this PEP, a uniform
>  > treatment of these data as characters becomes possible." An
>  > argument as to why this is a good thing would be a useful addition
>  > to the PEP. At the moment it's more or less treated as self-evident
>  > - which I agree with, but which clearly the Unix people here are
>  > not as certain of.
>
> Well, the problem is that both parts are false.

I can't work out which "parts" you are referring to here.

> If you didn't start
> with a valid string in a known encoding, you shouldn't treat it as
> characters because it's not.

Again, that's the purist argument. If you have a string (of bytes, I
guess) and a 99% certain guess as to the correct encoding, then I'd
argue that, as long as (a) it's not mission-critical (lives or backups
depend on it) and (b) you have a means of failing relatively
gracefully, you have every reason to make the assumption about
encoding. After all, what's the alternative?

Ultimately, you have a byte string and no encoding. You make some
assumption, or you can do hardly anything. What use is "Processing
file \x66\x6f\x6f" as a progress indicator for a program that scans a
directory? (That was "foo" for people who can't read latin-1 written
in hex :-))

> Hand it to a careful API, and you'll get
> an Exception raised in your face.
> And that's precisely why it's not
> obviously a good thing. Careful clients will have to treat it as
> "transcoded bytes", and so the people who develop those clients get no
> benefit. OTOH, at least some of those who feel lucky and use it
> naively are going to turn out to be wrong.

But 99% of the time, "it" is a perfectly acceptable string.
(Percentage invented out of thin air, admittedly :-))

Remember, a technically invalid string would be generated only when
the system encounters an undecodable byte sequence - and as far as I
can tell, the main case when that would happen is on Unix, if the user
specifies UTF-8 as the encoding, and the actual filesystem uses
something else, *and* there's a file with a name whose byte sequence
is invalid UTF-8. I'm *really* struggling to see that as a common
scenario.

Admittedly, there are other, possibly more common, cases where the
string translation is valid, but semantically not what the user
expects - user says CP1252, but filesystem is CP850, say. As a UK
Windows user, I'm used to seeing CP850 vs CP1252 confusions like this -
"£" replaced with ú is the common case. It happens occasionally, and
occasionally causes code to behave unexpectedly. But it doesn't
reformat my hard drive, and the alternative (having to be extra-careful
to tell every program precisely which encoding I'm using in every
situation) would make programs effectively unusable.

> That said, I'm +0 on the PEP as is.

So I'm largely preaching to the converted here. After all, lukewarm
acceptance from someone with experience of Asian encoding issues is
pretty much the equivalent of resounding support from someone who only
ever works in English! :-)

Paul.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
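
For readers who want to see the two failure modes discussed above side by side, here is a minimal sketch. It assumes the proposal under discussion behaves like the `surrogateescape` error handler found in later Python 3 releases; the file name is invented for illustration. Byte 0xA3 is "£" in CP1252 and "ú" in CP850, which is the pound-sign confusion mentioned in the message.

```python
# Hypothetical file name, stored on disk by a program that writes CP1252.
raw = "£ sign.txt".encode("cp1252")

# Failure mode 1: the wrong single-byte code page decodes without error,
# but the result is semantically wrong (pound sign becomes u-acute).
wrong = raw.decode("cp850")
print(wrong)

# Failure mode 2: the same bytes are not valid UTF-8. With the
# surrogateescape handler (assumed here to match the PEP's behaviour),
# the undecodable byte 0xA3 is carried along as the lone surrogate U+DCA3.
name = raw.decode("utf-8", errors="surrogateescape")
print(ascii(name))

# Encoding with the same handler restores the original bytes exactly.
round_trip = name.encode("utf-8", errors="surrogateescape")

# A "careful API" that encodes strictly raises instead of passing it through.
try:
    name.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print("strict encode raised:", strict_failed)
```

The first case is the quiet one: nothing raises, and the user just sees an odd character. The second is the case the quoted objection is about: the string round-trips losslessly, but any strict consumer rejects it.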