[issue34156] Nail down and document the behavior of range expressions in RE character classes

2018-07-19 Thread Zack Weinberg


Zack Weinberg added the comment:

Also, whether or not the current behavior is the intended behavior, I think 
programmers would appreciate an explicit statement of whether or not it might 
change in the future.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34156] Nail down and document the behavior of range expressions in RE character classes

2018-07-19 Thread Zack Weinberg

New submission from Zack Weinberg:

The documentation of the semantics of range expressions in regular expression 
character classes is not precise enough.  All it says is:

Ranges of characters can be indicated by giving two characters and 
separating them by a '-', for example [a-z] will match any lowercase ASCII 
letter [... more examples, none involving non-ASCII characters]

In testing, it seems that the behavior is simply to expand the range into a set of 
characters by numeric code point; e.g., '[ᄀ-ፚ]' will match any single character 
whose ord() is between ord('ᄀ') and ord('ፚ') (inclusive).  If that is the 
intended behavior, I would like the documentation to say so explicitly.  If 
that is _not_ the intended behavior, I would like to know what the intended 
behavior actually is, and for both the code and the documentation to be changed 
to reflect the intent.
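A minimal sketch of the kind of test described above, assuming CPython's re module and a pattern compiled with no flags. It builds the class from the report and checks, for every code point in the range plus a small margin on either side, that re's verdict agrees with the ord()-based prediction. The helper names (matches_by_code_point, check) and the margin size are illustrative only, not part of any API.

```python
import re

# Endpoints of the range from the report, copied verbatim; their code
# points are computed with ord() rather than hard-coded.
LOW, HIGH = "ᄀ", "ፚ"
pattern = re.compile(f"[{LOW}-{HIGH}]")


def matches_by_code_point(ch: str) -> bool:
    """Hypothetical helper: the prediction if ranges expand by code point."""
    return ord(LOW) <= ord(ch) <= ord(HIGH)


def check(window: int = 16) -> bool:
    """Hypothetical helper: compare re against the code-point prediction.

    Scans every code point from ord(LOW) - window to ord(HIGH) + window
    and returns False at the first character where the two disagree.
    """
    for cp in range(ord(LOW) - window, ord(HIGH) + window + 1):
        ch = chr(cp)
        if bool(pattern.fullmatch(ch)) != matches_by_code_point(ch):
            return False
    return True


print(check())
```

If re expands the range purely by numeric code point, check() returns True; a False result would point at some other rule (collation, case folding, etc.) being applied.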

(I think expansion by numeric code point makes sense and is probably what most 
existing programs want, but this is a contested issue in the context of POSIX 
regular expressions; e.g., some C libraries try (not always successfully) to 
make [0-9] match all of the characters that Python's \d matches, so it's not 
"obvious".)
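To illustrate the contrast drawn above, a short sketch assuming Python 3 str patterns. It uses U+0663 ARABIC-INDIC DIGIT THREE, a Unicode decimal digit (category Nd) that lies outside the code-point range '0'..'9': in Python, \d matches it, the range [0-9] does not, and the re.ASCII flag narrows \d back to [0-9].

```python
import re
import unicodedata

# A decimal digit outside U+0030..U+0039.
arabic_three = "\u0663"
print(unicodedata.name(arabic_three), unicodedata.category(arabic_three))
# ARABIC-INDIC DIGIT THREE Nd

# In a str pattern, \d matches any Unicode decimal digit...
print(bool(re.fullmatch(r"\d", arabic_three)))            # True
# ...but the range [0-9] covers only the code points '0'..'9'.
print(bool(re.fullmatch(r"[0-9]", arabic_three)))         # False
# With re.ASCII, \d is restricted to [0-9].
print(bool(re.fullmatch(r"\d", arabic_three, re.ASCII)))  # False
```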

--
assignee: docs@python
components: Documentation, Regular Expressions
messages: 321963
nosy: docs@python, ezio.melotti, mrabarnett, zwol
priority: normal
severity: normal
status: open
title: Nail down and document the behavior of range expressions in RE character classes
type: behavior
