Who would benefit from changing this? Let's not change things just because we can, or because Perl 6 does it.
On Thu, Nov 16, 2017 at 9:21 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:

> On 2017-11-16 10:23, Serhiy Storchaka wrote:
>
>> Currently the re module ignores only 6 ASCII whitespace characters in
>> re.VERBOSE mode:
>>
>> U+0009 CHARACTER TABULATION
>> U+000A LINE FEED
>> U+000B LINE TABULATION
>> U+000C FORM FEED
>> U+000D CARRIAGE RETURN
>> U+0020 SPACE
>>
>> Perl ignores characters that Unicode calls "Pattern White Space" in
>> /x mode. It ignores 5 additional non-ASCII characters:
>>
>> U+0085 NEXT LINE
>> U+200E LEFT-TO-RIGHT MARK
>> U+200F RIGHT-TO-LEFT MARK
>> U+2028 LINE SEPARATOR
>> U+2029 PARAGRAPH SEPARATOR
>>
>> The regex module simply ignores characters for which str.isspace()
>> returns True. It ignores 20 additional characters, including
>> U+001C..U+001F, whose classification as whitespace is questionable,
>> but it doesn't ignore LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK:
>>
>> U+001C [FILE SEPARATOR]
>> U+001D [GROUP SEPARATOR]
>> U+001E [RECORD SEPARATOR]
>> U+001F [UNIT SEPARATOR]
>> U+00A0 NO-BREAK SPACE
>> U+1680 OGHAM SPACE MARK
>> U+2000 EN QUAD
>> U+2001 EM QUAD
>> U+2002 EN SPACE
>> U+2003 EM SPACE
>> U+2004 THREE-PER-EM SPACE
>> U+2005 FOUR-PER-EM SPACE
>> U+2006 SIX-PER-EM SPACE
>> U+2007 FIGURE SPACE
>> U+2008 PUNCTUATION SPACE
>> U+2009 THIN SPACE
>> U+200A HAIR SPACE
>> U+202F NARROW NO-BREAK SPACE
>> U+205F MEDIUM MATHEMATICAL SPACE
>> U+3000 IDEOGRAPHIC SPACE
>>
> str.isspace appears to be Unicode "Whitespace" plus those 4
> "questionable" codepoints.
>
>> Is it worth extending the set of ignored whitespace characters to
>> "Pattern White Space"? Would it add any benefit? Or add confusion?
>> Should this depend on re.ASCII mode? Should the byte b'\x85' be
>> ignorable in verbose bytes patterns?
>>
>> And there is a similar question about the Python parser. If Python
>> uses the Unicode definition for identifiers, shouldn't it also accept
>> non-ASCII "Pattern White Space" characters as whitespace? Supporting
>> this would raise technical problems, but are there any benefits?
>>
>> https://perldoc.perl.org/perlre.html
>> https://www.unicode.org/reports/tr31/tr31-4.html#Pattern_Syntax
>> https://unicode.org/L2/L2005/05012r-pattern.html
>>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

-- 
--Guido van Rossum (python.org/~guido)
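For anyone who wants to check the three character sets compared in the quoted message, here is a rough Python sketch, assuming CPython 3 with a reasonably recent Unicode database. Pattern_White_Space is hard-coded from UAX #31 because the stdlib unicodedata module has no lookup for that property, and the helper name codes() is purely illustrative. The last lines show what re.VERBOSE does today: an ASCII space is skipped, while NO-BREAK SPACE (and the byte b'\x85' in a bytes pattern) is matched literally.

```python
import re
import sys

# The 6 ASCII characters that re.VERBOSE skips, as listed in the message above.
RE_VERBOSE_WS = frozenset("\t\n\x0b\x0c\r ")

# Unicode Pattern_White_Space, hard-coded from UAX #31 (assumption: the
# stdlib unicodedata module does not expose this property directly).
PATTERN_WHITE_SPACE = RE_VERBOSE_WS | frozenset("\x85\u200e\u200f\u2028\u2029")

# Every code point for which str.isspace() is true; the exact result depends
# on the Unicode database bundled with the running Python.
ISSPACE = frozenset(c for c in map(chr, range(sys.maxunicode + 1)) if c.isspace())


def codes(chars):
    """Hypothetical helper: format characters as sorted 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in sorted(chars)]


print("Pattern_White_Space, not skipped by re:", codes(PATTERN_WHITE_SPACE - RE_VERBOSE_WS))
print("isspace(), not Pattern_White_Space:", codes(ISSPACE - PATTERN_WHITE_SPACE))
print("Pattern_White_Space, not isspace():", codes(PATTERN_WHITE_SPACE - ISSPACE))

# re.VERBOSE skips an ASCII space but treats NO-BREAK SPACE as a literal.
print(re.fullmatch("a b", "ab", re.VERBOSE) is not None)       # True
print(re.fullmatch("a\u00a0b", "ab", re.VERBOSE) is not None)  # False

# In a verbose bytes pattern, b"\x85" is likewise matched literally today.
print(re.fullmatch(b"a\x85b", b"ab", re.VERBOSE) is not None)  # False
```

The set differences should reproduce the lists in the message: 5 extra Pattern_White_Space characters, 20 extra isspace() characters, and the two directional marks that isspace() does not cover.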