On Aug 12, 5:32 am, Steven D'Aprano <ste...@remove.this.cybersource.com.au> wrote:
> That problem basically boils down to a deep-seated
> philosophical disagreement over which philosophy a
> language should follow in regard to backslash escapes:
>
>     "Anything not explicitly permitted is forbidden"
>
> versus
>
>     "Anything not explicitly forbidden is permitted"

No, it doesn't. It boils down to whether a language should:

(1) Try its best to detect errors as early as possible, especially when the cost of doing so is low.

(2) Make code as readable as possible, in part by making code as self-evident as possible by mere inspection and by reducing the amount of stuff that you have to memorize. Perl fails miserably in this regard, for instance.

(3) To quote Einstein, make everything as simple as possible, and no simpler.

(4) Take innately ambiguous things and not force them to be unambiguous by mere fiat.

Allowing a programmer to program using a completely arbitrary resolution of "unrecognized escape sequences" violates all of the above principles. The fact that the meanings of unrecognized escape sequences are ambiguous is proved by the fact that every language seems to treat them somewhat differently, demonstrating that there is no natural, intuitive meaning for them.

Furthermore, allowing programmers to use "unrecognized escape sequences" without raising an error violates:

(1) Explicit is better than implicit: Python provides a way to explicitly specify that you want a backslash. Every programmer should be encouraged to use Python's explicit mechanism here.

(2) Simple is better than complex: Python currently has two classes of ambiguously interpretable escape sequences: "unrecognized" ones and "illegal" ones. Making a single class (i.e., just illegal ones) is simpler. Also, not having to memorize escape sequences that you rarely need to use is simpler.

(3) Readability counts: See the comments on readability above.

(4) Errors should never pass silently: Even the Python Reference Manual indicates that unrecognized escape sequences are a source of bugs.
(See more comments on this below.)

(5) In the face of ambiguity, refuse the temptation to guess: Every language other than C++ is taking a guess at what the programmer would find to be the most useful expansion for unrecognized escape sequences, and each of the languages is guessing differently. This temptation should be refused! You can argue that once it is in the Reference Manual it is no longer a guess, but that is patently specious, as Perl proves. For instance, the fact that Perl will quietly convert an array into a scalar for you, if you assign the array to a scalar variable, is certainly a "guess" of the sort that this Python koan is referring to. Likewise for an arbitrary interpretation of unrecognized escape sequences.

(6) There should be one-- and preferably only one --obvious way to do it: What is the one obvious way to express "\\y"? Is it "\\y" or "\y"? Python can easily make one of these the "one obvious way" by making the other raise an error.

(7) Namespaces are one honking great idea -- let's do more of those! Allowing "\y" to self-expand is intruding into the namespace of special characters that require an escape sequence.

> C++ apparently forbids all escape sequences, with
> unspecified behaviour if you use a forbidden sequence,
> except for a handful of explicitly permitted sequences.
>
> That's not better, it's merely different.

It *is* better, as it catches errors early on at little cost, and for all the other reasons listed above.

> Actually, that's not true -- that the C++ standard forbids
> a thing, but leaves the consequences of doing that thing
> unspecified, is clearly a Bad Thing.

Indeed. But C++ has backward compatibility issues that make any that Python has to deal with pale in comparison. The recommended behavior for a C++ compiler, however, is to flag the problem as an error or as a warning.

> So on at least one machine in the world, C++ simply strips
> out backslashes that it doesn't recognize, leaving the
> suffix.
> Unfortunately, we can't rely on that, because C++
> is underspecified.

No, *fortunately* you can't rely on it, forcing you to go fix your code.

> Fortunately this is not a problem with Python, which does
> completely specify the behaviour of escape sequences so
> there are no surprises.

It's not a surprise when the C++ compiler issues a warning to you. If you ignore the warning, then you have no one to blame but yourself.

> Implicit has an actual meaning. You shouldn't use it as a
> mere term of opprobrium for anything you don't like.

Pardon me, but I'm using "implicit" to mean "implicit", and nothing more. Python's behavior here is "implicit" in the very same way that Perl implicitly converts an array into a scalar for you. (Though that particular Perl behavior is a far bigger wart than Python's behavior is here!)

> > Because you've stated that "\y" is a legal escape
> > sequence, while the Python Reference Manual explicitly
> > states that it is an "unrecognized escape sequence", and
> > that such "unrecognized escape sequences" are sources of
> > bugs.
>
> There's that reading comprehension problem again.
>
> Unrecognised != illegal.

This is reasoning that only a lawyer could love. The right thing for a programming language to do, when handed something that is syntactically "unrecognized", is to raise an error.

> It seems to me that the behaviour the Python designers
> were looking to avoid was the case where the coder
> accidentally inserted a backslash in the wrong place, and
> the language stripped the backslash out, e.g.:
>
> Wanted "a\bcd" but accidentally typed "ab\cd" instead, and
> got "abcd".

The moral of the story is that *any* arbitrary interpretation of unrecognized escape sequences is a potential source of bugs. In Python, you just end up with the converse issue, where one might understandably assume that "foo\bar" has a backslash in it, because "foo\yar" and *most* other similar strings do. But then it doesn't.
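That converse trap is easy to demonstrate at the interpreter. A minimal sketch (newer Pythons additionally emit a warning for the unrecognized escape, but the resulting values are the same):

```python
# A recognized escape silently changes the string, while an
# unrecognized one silently keeps its backslash.
recognized = "foo\bar"    # \b is a recognized escape (backspace)
unrecognized = "foo\yar"  # \y is unrecognized: backslash is kept

print(len(recognized))    # 6 -- 'f','o','o','\x08','a','r'
print(len(unrecognized))  # 7 -- the backslash and the 'y' both survive

# And the ambiguity from point (6): the two spellings are identical.
print("\yar" == "\\yar")  # True
```

So "foo\bar" loses its backslash while "foo\yar" keeps it, and nothing in the source warns the reader which case they are looking at.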
> >> This is *exactly* like C++, except that in Python the
> >> semantics of \y and \\y are identical. Python doesn't
> >> guess what you mean, it *imposes* a meaning on the
> >> escape sequence. You just don't like that meaning.
>
> > That's because I don't like things that are
> > ill-conceived.
>
> And yet you like C++... go figure *wink*

Now that's a bold assertion! I think that "tolerate C++" is more like it. But C++ does have its moments.

|>ouglas
--
http://mail.python.org/mailman/listinfo/python-list