Hm, it seems I was in too much of a hurry to implement it...

>
> There were discussions about this. See for example
> https://bugs.python.org/issue18814.
>
> In short, there are two considerations that prevented adding this feature:
>
> 1. This function can have constant computational complexity in CPython
> (just check a single bit), but other implementations may provide only
> linear computational complexity.
>
Yes.  There is no O(1) guarantee about .isascii().
But I expect that the UTF-8 based string implementation PyPy will have
can achieve O(1); just test len(s) == __internal_utf8_len(s).

I think if *some* implementations can achieve O(1), it's beneficial
to implement.

> 2. In many cases, just after getting the answer to this question we encode the
> string to bytes (or decode bytes to string). Thus the most natural way to
> determine whether the string is ASCII-only is trying to encode it to ASCII.
>

Yes.  But ASCII is so special.  Someone may want to check for ASCII before
passing a string to int(), float(), decimal.Decimal(), etc...
But I don't think there is a real use case for encodings other than ASCII.

> And adding a new method to the basic type has a high bar.
>

Agreed.

> The code in ipaddress
>
>     if not _BaseV4._DECIMAL_DIGITS.issuperset(prefixlen_str):
>         cls._report_invalid_netmask(prefixlen_str)
>     try:
>         prefixlen = int(prefixlen_str)
>     except ValueError:
>         cls._report_invalid_netmask(prefixlen_str)
>     if not (0 <= prefixlen <= cls._max_prefixlen):
>         cls._report_invalid_netmask(prefixlen_str)
>     return prefixlen
>
> can be rewritten as:
>
>     if not prefixlen_str.isdigit():
>         cls._report_invalid_netmask(prefixlen_str)
>     try:
>         prefixlen = int(prefixlen_str.encode('ascii'))
>     except UnicodeEncodeError:
>         cls._report_invalid_netmask(prefixlen_str)
>     except ValueError:
>         cls._report_invalid_netmask(prefixlen_str)
>     if not (0 <= prefixlen <= cls._max_prefixlen):
>         cls._report_invalid_netmask(prefixlen_str)
>     return prefixlen
>

Yes.  But .isascii() will be much faster than try ... .encode('ascii') ...
except UnicodeEncodeError on most Python implementations.

> Another possibility -- adding support for a boolean argument to str.isdigit()
> and similar predicates that switches them to ASCII-only mode. Such an option
> would be very useful for the str.strip(), str.split() and str.splitlines()
> methods.
> Currently they split on all Unicode whitespace and line
> separators, but there is a need to split only on ASCII whitespace and the line
> separators CR, LF and CRLF. In the case of str.strip() and str.split() you can
> just pass the string of whitespace characters, but there is no such option
> for str.splitlines().
>

That sounds like a good idea.  Maybe a keyword-only argument `ascii=False`?

But if we revert the addition of str.isascii() in Python 3.7, the same
keyword-only argument should be added to int(), float(), decimal.Decimal(),
fractions.Fraction(), etc...  That's a bit hard.

So I think adding .isascii() is beneficial even if all str.is***() methods
have an `ascii=False` flag.
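To make the int() use case concrete, here is a minimal sketch, assuming
Python 3.7+ where str.isascii() is available.  The helper names
parse_ascii_int and is_ascii_by_encoding are hypothetical, chosen only for
illustration; they are not part of ipaddress or the standard library.

```python
def is_ascii_by_encoding(text: str) -> bool:
    """ASCII check without str.isascii(): try to encode and catch the error."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def parse_ascii_int(text: str) -> int:
    """Parse a decimal integer, rejecting anything but ASCII digits.

    str.isdigit() alone is not enough here: it is also true for
    non-ASCII digits such as ARABIC-INDIC DIGIT TWO (U+0662), and
    int() accepts those digits as well.
    """
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not an ASCII decimal integer: {text!r}")
    return int(text)


if __name__ == "__main__":
    arabic_indic = "\u0662\u0664"        # ARABIC-INDIC DIGITS TWO and FOUR
    print(arabic_indic.isdigit())        # True
    print(int(arabic_indic))             # 24
    print(arabic_indic.isascii())        # False
    print(is_ascii_by_encoding("24"))    # True
    print(parse_ascii_int("24"))         # 24
```

Both checks give the same answer; the difference discussed above is only in
how cheaply an implementation can compute it.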