26.01.18 10:42, INADA Naoki wrote:
Currently, int(), str.isdigit(), str.isalnum(), etc. accept
non-ASCII strings.

>>> s = "１２３"
>>> s
'１２３'
>>> s.isdigit()
True
>>> print(ascii(s))
'\uff11\uff12\uff13'
>>> int(s)
123

But sometimes we want to accept only ASCII strings.  For example,
the ipaddress module uses:

_DECIMAL_DIGITS = frozenset('0123456789')
...
if _DECIMAL_DIGITS.issuperset(str):

ref: 
https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e805de/Lib/ipaddress.py#L491-L494

If str had an isascii() method, this could be simpler:

`if s.isascii() and s.isdigit():`

I want to add it in Python 3.7 if there are no objections.

There were discussions about this. See for example https://bugs.python.org/issue18814.

In short, there are two considerations that prevented adding this feature:

1. This function can have constant computational complexity in CPython (it just checks a single bit), but other implementations may provide only linear complexity.

2. In many cases, right after getting the answer to this question, we encode the string to bytes (or decode bytes to a string). Thus the most natural way to determine whether a string is ASCII-only is to try to encode it to ASCII.
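
As a rough illustration of point 2, the ASCII check can fall out of the encoding step itself. This is only a sketch; the helper name `to_ascii_bytes` is hypothetical and not part of the standard library.

```python
def to_ascii_bytes(s):
    """Return s encoded as ASCII bytes, or None if s has non-ASCII characters.

    Hypothetical helper, for illustration only.
    """
    try:
        # str.encode('ascii') raises UnicodeEncodeError on any non-ASCII character.
        return s.encode('ascii')
    except UnicodeEncodeError:
        return None


print(to_ascii_bytes('123'))                 # b'123'
print(to_ascii_bytes('\uff11\uff12\uff13'))  # None (fullwidth digits)
```

The caller gets the encoded bytes and the answer to "is it ASCII?" in a single pass, which is the situation point 2 describes.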

And adding a new method to a basic type has a high bar.

The code in ipaddress

        if not _BaseV4._DECIMAL_DIGITS.issuperset(prefixlen_str):
            cls._report_invalid_netmask(prefixlen_str)
        try:
            prefixlen = int(prefixlen_str)
        except ValueError:
            cls._report_invalid_netmask(prefixlen_str)
        if not (0 <= prefixlen <= cls._max_prefixlen):
            cls._report_invalid_netmask(prefixlen_str)
        return prefixlen

can be rewritten as:

        if not prefixlen_str.isdigit():
            cls._report_invalid_netmask(prefixlen_str)
        try:
            prefixlen = int(prefixlen_str.encode('ascii'))
        except UnicodeEncodeError:
            cls._report_invalid_netmask(prefixlen_str)
        except ValueError:
            cls._report_invalid_netmask(prefixlen_str)
        if not (0 <= prefixlen <= cls._max_prefixlen):
            cls._report_invalid_netmask(prefixlen_str)
        return prefixlen
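
For experimenting outside the ipaddress class, the rewritten check might be sketched as a standalone function. The name `parse_prefixlen`, the `max_prefixlen` default of 32, and the raised ValueError are assumptions made for this sketch; the real code reports errors through cls._report_invalid_netmask().

```python
def parse_prefixlen(prefixlen_str, max_prefixlen=32):
    """Parse an ASCII-only decimal prefix length (hypothetical standalone sketch)."""
    # isdigit() rejects empty strings, signs and whitespace, but still
    # accepts non-ASCII digits such as fullwidth '\uff11'.
    if not prefixlen_str.isdigit():
        raise ValueError('invalid prefix length: %r' % prefixlen_str)
    try:
        # Encoding to ASCII rejects the remaining non-ASCII digits.
        # int() accepts bytes as well as str.
        prefixlen = int(prefixlen_str.encode('ascii'))
    except ValueError:
        # UnicodeEncodeError is a subclass of ValueError, so this single
        # clause covers both failure modes.
        raise ValueError('invalid prefix length: %r' % prefixlen_str) from None
    if not (0 <= prefixlen <= max_prefixlen):
        raise ValueError('prefix length out of range: %r' % prefixlen_str)
    return prefixlen


print(parse_prefixlen('24'))  # 24
```

Passing '\uff12\uff14' (fullwidth "24") gets past isdigit() but fails in encode('ascii'), so it is rejected just as it would be by the frozenset check.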

Another possibility is adding support for a boolean argument to str.isdigit() and similar predicates that switches them to an ASCII-only mode. Such an option would also be very useful for the str.strip(), str.split() and str.splitlines() methods. Currently they split on all Unicode whitespace and line separators, but there is a need to split only on ASCII whitespace and on the line separators CR, LF and CRLF. In the case of str.strip() and str.split() you can just pass the string of whitespace characters, but there is no such option for str.splitlines().
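
Until such an option exists, ASCII-only behaviour can be approximated by hand. The sketch below uses hypothetical helper names (`ascii_strip`, `ascii_split`, `ascii_splitlines`) and takes string.whitespace as the definition of ASCII whitespace. Note that str.split() treats its argument as one literal separator rather than as a set of characters, so the sketch uses a regular expression to split on runs of ASCII whitespace.

```python
import re
import string

# string.whitespace is ' \t\n\r\x0b\x0c': the ASCII whitespace characters.
_ASCII_WS_RUN = re.compile('[' + re.escape(string.whitespace) + ']+')
_ASCII_LINE_BREAK = re.compile(r'\r\n|\r|\n')


def ascii_strip(s):
    """Strip only ASCII whitespace from both ends."""
    return s.strip(string.whitespace)


def ascii_split(s):
    """Split on runs of ASCII whitespace, like s.split() without arguments."""
    stripped = ascii_strip(s)
    return _ASCII_WS_RUN.split(stripped) if stripped else []


def ascii_splitlines(s):
    """Split on CR, LF and CRLF only, like s.splitlines() without keepends."""
    lines = _ASCII_LINE_BREAK.split(s)
    # str.splitlines() yields no trailing empty string after a final line break.
    if lines and lines[-1] == '':
        lines.pop()
    return lines


text = 'a\x85b\r\nc\n'
print(text.splitlines())       # ['a', 'b', 'c']
print(ascii_splitlines(text))  # ['a\x85b', 'c']
```

Here U+0085 (NEL) is a line boundary for str.splitlines() but is left alone by the ASCII-only variant.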

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
