Re: [Python-ideas] Adding str.isascii() ?

Serhiy Storchaka Wed, 31 Jan 2018 02:50:48 -0800

26.01.18 10:42, INADA Naoki wrote:

Currently, int(), str.isdigit(), str.isalnum(), etc. accept
non-ASCII strings.

>>> s = "１２３"
>>> s
'１２３'
>>> s.isdigit()
True
>>> print(ascii(s))
'\uff11\uff12\uff13'
>>> int(s)
123

But sometimes we want to accept only ASCII strings.  For example,
the ipaddress module uses:

_DECIMAL_DIGITS = frozenset('0123456789')
...
if _DECIMAL_DIGITS.issuperset(str):

ref: 
https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e805de/Lib/ipaddress.py#L491-L494

If str had a str.isascii() method, this could be simpler:

`if s.isascii() and s.isdigit():`

I want to add it in Python 3.7 if there are no objections.

There were discussions about this. See for example
https://bugs.python.org/issue18814.


In short, there are two considerations that prevented adding this feature:

1. This function can have constant computational complexity in CPython
(it just checks a single bit), but other implementations may provide
only linear complexity.

2. In many cases, right after getting the answer to this question, we
encode the string to bytes (or decode bytes to a string). Thus the most
natural way to determine whether a string is ASCII-only is to try to
encode it to ASCII.


Also, adding a new method to a basic type has a high bar.
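
As a minimal sketch of consideration 2, the ASCII-only check can be
expressed as an attempt to encode the string. The helper name
is_ascii_only is hypothetical and is used here only for illustration:

```python
def is_ascii_only(text):
    """Return True if text contains only ASCII code points.

    Hypothetical helper: it answers the question by trying to encode,
    which takes time linear in the length of the string.
    """
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(is_ascii_only("123"))                  # True
print(is_ascii_only("\uff11\uff12\uff13"))   # False (fullwidth digits)
```

When the bytes are needed anyway, the result of encode() can be kept
instead of being thrown away, so the check costs nothing extra.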

The code in ipaddress

        if not _BaseV4._DECIMAL_DIGITS.issuperset(prefixlen_str):
            cls._report_invalid_netmask(prefixlen_str)
        try:
            prefixlen = int(prefixlen_str)
        except ValueError:
            cls._report_invalid_netmask(prefixlen_str)
        if not (0 <= prefixlen <= cls._max_prefixlen):
            cls._report_invalid_netmask(prefixlen_str)
        return prefixlen

can be rewritten as:

        if not prefixlen_str.isdigit():
            cls._report_invalid_netmask(prefixlen_str)
        try:
            prefixlen = int(prefixlen_str.encode('ascii'))
        except UnicodeEncodeError:
            cls._report_invalid_netmask(prefixlen_str)
        except ValueError:
            cls._report_invalid_netmask(prefixlen_str)
        if not (0 <= prefixlen <= cls._max_prefixlen):
            cls._report_invalid_netmask(prefixlen_str)
        return prefixlen
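
The rewritten check can be tried outside of ipaddress with a standalone
sketch. The function name parse_prefixlen, the max_prefixlen default of
32, and raising ValueError in place of cls._report_invalid_netmask() are
assumptions made for this example only:

```python
def parse_prefixlen(prefixlen_str, max_prefixlen=32):
    """Hypothetical standalone version of the rewritten check."""
    # str.isdigit() also accepts non-ASCII digits such as fullwidth digits.
    if not prefixlen_str.isdigit():
        raise ValueError(f"invalid prefix length: {prefixlen_str!r}")
    try:
        # Encoding to ASCII rejects the non-ASCII digits that isdigit()
        # let through; int() accepts the resulting bytes object directly.
        prefixlen = int(prefixlen_str.encode("ascii"))
    except ValueError:
        # UnicodeEncodeError is a subclass of ValueError, so this one
        # clause covers both failure modes.
        raise ValueError(f"invalid prefix length: {prefixlen_str!r}") from None
    if not (0 <= prefixlen <= max_prefixlen):
        raise ValueError(f"prefix length out of range: {prefixlen_str!r}")
    return prefixlen


print(parse_prefixlen("24"))  # 24
```

Because UnicodeEncodeError derives from ValueError, the two except
clauses in the rewrite could also be merged into one.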

Another possibility is adding a boolean argument to str.isdigit() and
similar predicates that switches them to ASCII-only mode. Such an option
would be very useful for the str.strip(), str.split() and
str.splitlines() methods. Currently they split on all Unicode whitespace
and line separators, but there is a need to split only on ASCII
whitespace and the line separators CR, LF and CRLF. In the case of
str.strip() and str.split() you can just pass the string of whitespace
characters, but there is no such option for str.splitlines().
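
Until such an option exists, ASCII-only behaviour can be approximated
with the re module. The sketch below is one possible workaround, not a
proposed API; all three helper names are hypothetical. Note that
str.split(sep) treats sep as a single delimiter string rather than a set
of characters, so the split helper uses a regular expression too:

```python
import re

# The six ASCII whitespace characters: space, TAB, LF, CR, VT, FF.
ASCII_WHITESPACE = " \t\n\r\x0b\x0c"
_ASCII_WS_RUN = re.compile(r"[ \t\n\r\x0b\x0c]+")
_ASCII_NEWLINE = re.compile(r"\r\n|\r|\n")


def ascii_strip(text):
    """Strip only ASCII whitespace (str.strip() takes a set of characters)."""
    return text.strip(ASCII_WHITESPACE)


def ascii_split(text):
    """Split on runs of ASCII whitespace, dropping empty fields."""
    return [part for part in _ASCII_WS_RUN.split(text) if part]


def ascii_splitlines(text):
    """Split only on CR, LF and CRLF, like splitlines() without keepends."""
    lines = _ASCII_NEWLINE.split(text)
    # str.splitlines() does not report an empty line after a final newline.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


text = "a\u2028b\r\nc\n"
print(text.splitlines())        # ['a', 'b', 'c']
print(ascii_splitlines(text))   # ['a\u2028b', 'c']
```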

