On 26.01.18 10:42, INADA Naoki wrote:
Currently, int(), str.isdigit(), str.isalnum(), etc. accept
non-ASCII strings.
    >>> s = "１２３"
    >>> s
    '１２３'
    >>> s.isdigit()
    True
    >>> print(ascii(s))
    '\uff11\uff12\uff13'
    >>> int(s)
    123
But sometimes we want to accept only ASCII strings. For example, the
ipaddress module uses:
    _DECIMAL_DIGITS = frozenset('0123456789')
    ...
    if _DECIMAL_DIGITS.issuperset(str):
ref:
https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e805de/Lib/ipaddress.py#L491-L494
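A minimal sketch of why an explicit digit set is used instead of isdigit(): the set rejects fullwidth digits that isdigit() accepts. The names ASCII_DIGITS and looks_like_ascii_number are hypothetical, chosen only for illustration.

```python
# Hypothetical names; this mirrors the _DECIMAL_DIGITS idea from ipaddress.
ASCII_DIGITS = frozenset('0123456789')


def looks_like_ascii_number(s: str) -> bool:
    """Return True if s is non-empty and contains only ASCII digits 0-9."""
    # frozenset.issuperset('') is True, so reject the empty string explicitly.
    return bool(s) and ASCII_DIGITS.issuperset(s)


fullwidth = "\uff11\uff12\uff13"  # FULLWIDTH DIGIT ONE, TWO, THREE

print(fullwidth.isdigit())                 # True: isdigit() is Unicode-aware
print(looks_like_ascii_number(fullwidth))  # False: not ASCII
print(looks_like_ascii_number("123"))      # True
```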
If str had an isascii() method, this could be simpler:
`if s.isascii() and s.isdigit():`
I want to add it in Python 3.7 if there are no objections.
There were discussions about this. See for example
https://bugs.python.org/issue18814.
In short, there are two considerations that prevented adding this feature:
1. This function can run in constant time in CPython (it just checks a
single bit), but other implementations may only be able to provide linear
time.
2. In many cases, right after getting the answer to this question, we
encode the string to bytes (or decode bytes to a string). Thus the most
natural way of determining whether a string is ASCII-only is to try
encoding it to ASCII.
And adding a new method to a basic type has a high bar.
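Both considerations can be illustrated with a small sketch that uses only existing str methods. The helper names are hypothetical. is_ascii() shows the portable linear-time scan that an implementation without a per-string ASCII flag would have to perform, and ascii_bytes_or_none() shows the encode-based check from point 2, which produces the bytes at the same time.

```python
from typing import Optional


def is_ascii(s: str) -> bool:
    """Linear-time check: True if every code point in s is below 128.

    An empty string counts as ASCII.
    """
    return all(ord(ch) < 128 for ch in s)


def is_ascii_digits(s: str) -> bool:
    """Same result as the proposed `s.isascii() and s.isdigit()`."""
    return is_ascii(s) and s.isdigit()


def ascii_bytes_or_none(s: str) -> Optional[bytes]:
    """Encode s to ASCII, or return None if it has non-ASCII characters."""
    try:
        return s.encode('ascii')
    except UnicodeEncodeError:
        return None


print(is_ascii_digits("123"))                 # True
print(is_ascii_digits("\uff11\uff12\uff13"))  # False
print(ascii_bytes_or_none("caf\u00e9"))       # None
```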
The code in ipaddress:
    if not _BaseV4._DECIMAL_DIGITS.issuperset(prefixlen_str):
        cls._report_invalid_netmask(prefixlen_str)
    try:
        prefixlen = int(prefixlen_str)
    except ValueError:
        cls._report_invalid_netmask(prefixlen_str)
    if not (0 <= prefixlen <= cls._max_prefixlen):
        cls._report_invalid_netmask(prefixlen_str)
    return prefixlen
can be rewritten as:
    if not prefixlen_str.isdigit():
        cls._report_invalid_netmask(prefixlen_str)
    try:
        prefixlen = int(prefixlen_str.encode('ascii'))
    except UnicodeEncodeError:
        cls._report_invalid_netmask(prefixlen_str)
    except ValueError:
        cls._report_invalid_netmask(prefixlen_str)
    if not (0 <= prefixlen <= cls._max_prefixlen):
        cls._report_invalid_netmask(prefixlen_str)
    return prefixlen
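To try the encode-based logic outside the class, here is a standalone sketch. The function name parse_prefixlen, the max_prefixlen default of 32 (the IPv4 maximum) and raising ValueError directly are assumptions standing in for cls._max_prefixlen and cls._report_invalid_netmask(). Because UnicodeEncodeError is a subclass of ValueError, a single except clause covers both failure cases.

```python
def parse_prefixlen(prefixlen_str: str, max_prefixlen: int = 32) -> int:
    """Parse an ASCII decimal prefix length (hypothetical standalone helper).

    Raises ValueError in place of cls._report_invalid_netmask().
    """
    if not prefixlen_str.isdigit():
        raise ValueError(f'{prefixlen_str!r} is not a valid netmask')
    try:
        # encode('ascii') rejects non-ASCII digits such as fullwidth ones;
        # int() accepts ASCII bytes like b'24' directly.
        prefixlen = int(prefixlen_str.encode('ascii'))
    except ValueError:  # also catches UnicodeEncodeError, a subclass
        raise ValueError(f'{prefixlen_str!r} is not a valid netmask') from None
    if not (0 <= prefixlen <= max_prefixlen):
        raise ValueError(f'{prefixlen_str!r} is not a valid netmask')
    return prefixlen


print(parse_prefixlen("24"))  # 24
```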
Another possibility is adding a boolean argument to str.isdigit() and
similar predicates that switches them to ASCII-only mode. Such an option
would be very useful for the str.strip(), str.split() and str.splitlines()
methods. Currently they use all Unicode whitespace characters and line
separators, but there is a need to split only on ASCII whitespace and the
line separators CR, LF and CRLF. In the case of str.strip() and str.split()
you can just pass a string of whitespace characters, but there is no such
option for str.splitlines().
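A sketch of ASCII-only variants built from existing tools; the helper names are hypothetical. One caveat: str.strip(chars) treats its argument as a set of characters, but str.split(sep) treats sep as one literal separator string, so this sketch uses re.split() with a character class for splitting. For line splitting it breaks only on CRLF, CR and LF, so for example a U+2028 LINE SEPARATOR stays inside a line, whereas str.splitlines() would break there.

```python
import re
from typing import List

# The ASCII whitespace characters recognized by bytes.isspace():
# space, \t, \n, \r, \v (\x0b) and \f (\x0c).
ASCII_WHITESPACE = ' \t\n\r\x0b\x0c'

_ASCII_WS_RUN = re.compile(r'[ \t\n\r\x0b\x0c]+')  # one or more ASCII spaces
_ASCII_NEWLINE = re.compile(r'\r\n|\r|\n')          # CRLF tried before CR


def ascii_strip(s: str) -> str:
    """Strip only ASCII whitespace from both ends of s."""
    return s.strip(ASCII_WHITESPACE)


def ascii_split(s: str) -> List[str]:
    """Split on runs of ASCII whitespace, like str.split() with no argument."""
    stripped = ascii_strip(s)
    return _ASCII_WS_RUN.split(stripped) if stripped else []


def ascii_splitlines(s: str) -> List[str]:
    """Split on CR, LF and CRLF only; line endings are dropped."""
    lines = _ASCII_NEWLINE.split(s)
    # Like str.splitlines(), a trailing line break adds no empty item.
    if lines and lines[-1] == '':
        lines.pop()
    return lines
```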
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/