I like the idea of str.isdigit(ascii=True): it would behave like str.isdigit() and str.isascii() combined. It's easy to implement and likely to be very efficient. I'm just not sure that it's so commonly required.
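
To make the proposed semantics concrete, here is a minimal sketch. The helper name `isdigit_ascii` is hypothetical and only stands in for the suggested `ascii=True` keyword, which str.isdigit() does not have; str.isascii() itself is assumed to be available (Python 3.7+).

```python
def isdigit_ascii(s: str) -> bool:
    """Hypothetical stand-in for the proposed str.isdigit(ascii=True)."""
    # Both checks must pass: every character is ASCII and every character
    # is a digit. An empty string is rejected because "".isdigit() is False.
    return s.isascii() and s.isdigit()


# Arabic-Indic digits one, two, three (U+0661..U+0663).
arabic_indic = "\u0661\u0662\u0663"

print(isdigit_ascii("2018"))        # True
print(arabic_indic.isdigit())       # True: str.isdigit() is Unicode aware
print(int(arabic_indic))            # 123: int() accepts these digits too
print(isdigit_ascii(arabic_indic))  # False: the ASCII check rejects them
```

The last two calls show the case that may surprise users: the plain predicate and int() both accept the non-ASCII digits, while the combined check does not.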
At least, I guess that some users may be surprised that str.isdigit() is
"Unicode aware" and accepts non-ASCII digits, as int(str) does.

Victor

2018-01-31 12:18 GMT+01:00 INADA Naoki <songofaca...@gmail.com>:
> Hm, it seems I was in too much of a hurry to implement it...
>
>>
>> There were discussions about this. See for example
>> https://bugs.python.org/issue18814.
>>
>> In short, there are two considerations that prevented adding this feature:
>>
>> 1. This function can have constant computational complexity in CPython
>> (just check a single bit), but other implementations may provide only
>> linear computational complexity.
>>
>
> Yes. There is no O(1) guarantee for .isascii().
> But I expect that the UTF-8 based string implementation PyPy will have can
> achieve O(1); just test len(s) == __internal_utf8_len(s)
>
> I think if *some* implementations can achieve O(1), it's beneficial
> to implement.
>
>
>> 2. In many cases, just after getting the answer to this question we encode the
>> string to bytes (or decode bytes to string). Thus the most natural way of
>> determining whether the string is ASCII-only is trying to encode it to ASCII.
>>
>
> Yes. But ASCII is so special.
> Someone may want to check for ASCII before passing a string to int(),
> float(), decimal.Decimal(), etc...
> But I don't think there is a real use case for encodings other than ASCII.
>
>> And adding a new method to the basic type has a high bar.
>>
>
> Agree.
>
>> The code in ipaddress
>>
>>     if not _BaseV4._DECIMAL_DIGITS.issuperset(prefixlen_str):
>>         cls._report_invalid_netmask(prefixlen_str)
>>     try:
>>         prefixlen = int(prefixlen_str)
>>     except ValueError:
>>         cls._report_invalid_netmask(prefixlen_str)
>>     if not (0 <= prefixlen <= cls._max_prefixlen):
>>         cls._report_invalid_netmask(prefixlen_str)
>>     return prefixlen
>>
>> can be rewritten as:
>>
>>     if not prefixlen_str.isdigit():
>>         cls._report_invalid_netmask(prefixlen_str)
>>     try:
>>         prefixlen = int(prefixlen_str.encode('ascii'))
>>     except UnicodeEncodeError:
>>         cls._report_invalid_netmask(prefixlen_str)
>>     except ValueError:
>>         cls._report_invalid_netmask(prefixlen_str)
>>     if not (0 <= prefixlen <= cls._max_prefixlen):
>>         cls._report_invalid_netmask(prefixlen_str)
>>     return prefixlen
>>
>
> Yes. But .isascii() will be much faster than try ...
> .encode('ascii') ... except UnicodeEncodeError
> on most Python implementations.
>
>
>> Another possibility -- adding support for a boolean argument in str.isdigit()
>> and similar predicates that switches them to an ASCII-only mode. Such an option
>> would be very useful for the str.strip(), str.split() and str.splitlines()
>> methods. Currently they split using all Unicode whitespace and line
>> separators, but there is a need to split only on ASCII whitespace and the line
>> separators CR, LF and CRLF. In the case of str.strip() and str.split() you can
>> just pass the string of whitespace characters, but there is no such option
>> for str.splitlines().
>>
>
> It sounds like a good idea. Maybe a keyword-only argument `ascii=False`?
>
> But if adding str.isascii() is reverted from Python 3.7, the same
> keyword-only argument should be
> added to int(), float(), decimal.Decimal(), fractions.Fraction(),
> etc... It's a bit hard.
>
> So I think adding .isascii() is beneficial even if all str.is***()
> methods have an `ascii=False` flag.
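
For comparison with the two quoted variants, a rough sketch of how the same check might look with str.isascii(). This is not the actual ipaddress code: `parse_prefixlen` is a hypothetical standalone function, `max_prefixlen=32` is an assumed default, and a plain ValueError stands in for whatever cls._report_invalid_netmask() raises.

```python
def parse_prefixlen(prefixlen_str: str, max_prefixlen: int = 32) -> int:
    """Hypothetical standalone version of the quoted prefix-length check."""
    # Accept only a non-empty run of ASCII decimal digits, so int() never
    # sees non-ASCII digits, signs, whitespace or underscores.
    if not (prefixlen_str.isascii() and prefixlen_str.isdigit()):
        raise ValueError(f"invalid netmask: {prefixlen_str!r}")
    prefixlen = int(prefixlen_str)
    # Range check, as in the quoted code.
    if not (0 <= prefixlen <= max_prefixlen):
        raise ValueError(f"invalid netmask: {prefixlen_str!r}")
    return prefixlen


print(parse_prefixlen("24"))  # 24

try:
    parse_prefixlen("\u0662\u0664")  # Arabic-Indic "24"
except ValueError as exc:
    print("rejected:", exc)
```

No encode step and no UnicodeEncodeError handler are needed in this form, which is the simplification the str.isascii() argument is about.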
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/