On Tue, Jun 30, 2020 at 09:04:15PM +0300, Mikhail V wrote:
> > Counter-proposal: hex escapes allow optional curly brackets, similar to
> > unicode name escapes. You could even allow spaces within the braces, for
> > grouping:
> >
> > # Proposed enhancement:
> > "\x{2b}2c" # '+2c'
> > "\x{2b2c}" # '+,'
> > "\x{DEAD BEEF}" # "\xDE\xAD\xBE\xEF"
>
> Nice. But I am not sure about the data type and interpretation depending
> on string type. E.g. the second example:
>
> "\x{2b2c}" # '+,'
>
> In my example I was showing hex codepoints, e.g. U+2b2c is ⬬ (Black
> Horizontal Ellipse)

Your example used the `\x` escape, which takes exactly two hex digits
encoding a value between 0 and 255 inclusive (`\x00` to `\xFF`) and
produces a single unicode character between `\u0000` and `\u00FF`. You
cannot use x escapes to build up higher unicode code points in a string:

    '\x2b\x2c' != '\u2b2c'

So I assumed that you wanted a way to include multiple such escapes in a
sequence. If you want the horizontal ellipse, don't use an `\x` escape,
it is the wrong one! Use `\u2b2c`.

I have no interest in making `\x{2b2c}` an alternative way of writing
`\u2b2c`. Just use the u (or U) escape instead of x.
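
To make the difference concrete, here is a quick check that runs in today's Python. Nothing in it depends on the proposal; the variable names are only for illustration.

```python
# Current behaviour: each \x escape yields exactly one code point in the
# range U+0000..U+00FF, so two of them never merge into a single character.
two_chars = '\x2b\x2c'   # plus sign followed by comma
one_char = '\u2b2c'      # a single code point, U+2B2C

print(len(two_chars), len(one_char))
print(two_chars == '+,')
print(one_char == '\N{BLACK HORIZONTAL ELLIPSE}')
print(two_chars != one_char)
```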

I have no objection to adding the same braces to unicode u and U
escapes. Inside the braces, spaces and underscores can be just ignored
(they are there for visual grouping).

(1) Byte strings support optional braces, spaces and underscores for
grouping in hex escapes:

    b'\x{2b 2c_2a}' == b'\x2b\x2c\x2a' == b'+,*'

The spaces/underscores can appear anywhere within the braces, in any
order. The "consenting adults" principle applies:

    # Valid, but don't do this.
    b'\x{ 2 ___ _ ___ b }'

Style guides and linters can warn against writing ugly strings :-)

(2) Unicode strings support the same, with the equivalent semantics:

    '\x{2b 2c_2a}' == '\x2b\x2c\x2a' == '+,*'

(3) Similarly, Unicode strings support optional braces and grouping for u
and U escapes:

    '\u{2b 2c}' == '\u2b2c' == '\N{BLACK HORIZONTAL ELLIPSE}'
    '\U{0000 2b2c}' == '\U00002b2c' == '\N{BLACK HORIZONTAL ELLIPSE}'

Likewise, any combination of spaces and underscores, in any order, is
valid. We can write hideous strings if we want :-)

    # Valid, but don't do this.
    '\U{ __ 0 __0__ 0 0 2_b 2 ___c___ }'

Unlike x escapes, I don't think we should support multiple code points
within the u and U braces:

    # Not part of the proposal
    '\u{221a221e}' == '\N{SQUARE ROOT}\N{INFINITY}'

My reasoning for this is that the leading `\x` is proportionally very
"heavy" for hex escapes: 50% of the escape code is made up by the
leading `\x`, versus just 33% for u escapes and 20% for U escapes.
So there is much less benefit to grouping multiple u and U escapes in a
single set of braces.
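
The percentages above can be checked with a few lines of arithmetic. This is only a sketch; the sample escapes and the names `escapes` and `overhead` are chosen for illustration.

```python
# Share of each escape taken up by its two-character prefix
# (the backslash plus the letter x, u or U). Raw strings keep the
# escapes as literal text so that len() counts the source characters.
escapes = {'x': r'\x2b', 'u': r'\u2b2c', 'U': r'\U00002b2c'}
overhead = {kind: round(100 * 2 / len(text)) for kind, text in escapes.items()}

print(overhead)
```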

The other reason why grouping u and U escapes is less useful is that
often we can just include the literal unicode characters in the string:

    '√∞'

whereas you cannot do so for control characters. So my argument is to
make the conservative change and only allow multiple escape codes inside
braces for x escapes.

(We can relax the restriction later if there is demand for it, but we
cannot tighten it if we change our mind.)

Likewise, I would prefer the conservative approach of still requiring
leading zeroes in u and U escapes.

(4) Lastly, f-strings support the same rules as unicode strings.
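
As a rough illustration of rules (1) to (3), the sketch below expands the text found between the braces. It is not how the tokenizer would implement this, and everything in it is hypothetical: the function name `expand_braced_escape`, the `as_bytes` flag, the choice to reject an odd number of digits in an x escape, and the choice to reject u and U escapes for byte strings are all assumptions that the proposal does not spell out.

```python
import string

def expand_braced_escape(kind, body, as_bytes=False):
    """Expand the text between the braces of a proposed x, u or U escape.

    Hypothetical helper for illustration only; not an existing Python API.
    """
    # Spaces and underscores are visual grouping only, so drop them.
    digits = body.replace(' ', '').replace('_', '')
    if not digits or not set(digits) <= set(string.hexdigits):
        raise ValueError('invalid hex digits: %r' % body)
    if kind == 'x':
        # Assumption: an odd number of digits is an error.
        if len(digits) % 2:
            raise ValueError('x escapes need pairs of hex digits: %r' % body)
        # Several two-digit escapes may share one set of braces.
        values = [int(digits[i:i + 2], 16) for i in range(0, len(digits), 2)]
        return bytes(values) if as_bytes else ''.join(map(chr, values))
    if kind not in ('u', 'U') or as_bytes:
        # Assumption: u and U escapes stay unavailable in byte strings.
        raise ValueError('unsupported escape: %r' % kind)
    # Exactly one code point, with leading zeroes still required.
    if len(digits) != (4 if kind == 'u' else 8):
        raise ValueError('wrong number of hex digits: %r' % body)
    return chr(int(digits, 16))


print(expand_braced_escape('x', '2b 2c_2a', as_bytes=True))
print(expand_braced_escape('u', '2b 2c') == '\u2b2c')
```

With these assumptions, `expand_braced_escape('u', '221a221e')` raises `ValueError`, matching the "not part of the proposal" example for multiple code points in u braces.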
--
Steven