[issue25743] Clarify exactly what \w matches in UNICODE mode

2016-01-03 Thread Ezio Melotti
Changes by Ezio Melotti:

components: +Regular Expressions
nosy: +ezio.melotti, mrabarnett
stage: -> needs patch
type: -> enhancement
versions: -Python 3.2, Python 3.3, Python 3.4

[issue25743] Clarify exactly what \w matches in UNICODE mode

2015-11-27 Thread Zack Weinberg
New submission from Zack Weinberg:

The `re` module documentation does not do a good job of explaining exactly what `\w` matches. Quoting https://docs.python.org/3.5/library/re.html :

> \w
> For Unicode (str) patterns:
> Matches Unicode word characters; this includes most characters
> that can

[issue25743] Clarify exactly what \w matches in UNICODE mode

2015-11-27 Thread Andi McClure
Andi McClure added the comment:

I would also like to request that a clear explanation be given for the documentation in the 2.7 branch. From https://docs.python.org/2.7/library/re.html :

"\w ... If UNICODE is set, this will match the characters [0-9_] plus whatever is classified as alphanumeric in

[issue25743] Clarify exactly what \w matches in UNICODE mode

2015-11-27 Thread Zack Weinberg
Zack Weinberg added the comment: FWIW, the actual behavior of \w matching "everything in Unicode general categories L* and N*, plus U+005F (underscore)" is consistent across all versions I can conveniently test (2.7, 3.4, 3.5). In 2.7, there are four characters in general category Nl that \w
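
The "L*, N*, plus underscore" description above can be checked empirically. The following is a minimal sketch, assuming Python 3, where `str` patterns use Unicode matching by default. The helper names (`matches_w`, `predicted_by_category`, `find_mismatches`) are hypothetical and not part of the `re` module. The outcome may differ between Python versions and Unicode database versions, so the sketch reports mismatches rather than assuming there are none.

```python
import re
import sys
import unicodedata

# A str pattern, so \w uses Unicode matching in Python 3.
WORD = re.compile(r"\w")


def matches_w(ch):
    """Return True if the single character ch is matched by \\w."""
    return WORD.fullmatch(ch) is not None


def predicted_by_category(ch):
    """Return True if ch is '_' or its general category starts with L or N.

    This encodes the rule described in the thread; it is an assumption
    under test here, not a documented guarantee.
    """
    return ch == "_" or unicodedata.category(ch)[0] in ("L", "N")


def find_mismatches(limit=sys.maxunicode + 1):
    """List code points below limit where \\w and the category rule disagree."""
    return [
        cp
        for cp in range(limit)
        if matches_w(chr(cp)) != predicted_by_category(chr(cp))
    ]


if __name__ == "__main__":
    # Scanning every code point takes a moment; pass a smaller limit to sample.
    mismatches = find_mismatches()
    print(len(mismatches), "code points disagree with the category rule")
    for cp in mismatches[:20]:
        print("U+%04X" % cp, unicodedata.category(chr(cp)))
```

Running the scan on each interpreter of interest shows whether the rule holds there and, if it does not, which code points are the exceptions.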