Re: [Python-ideas] Adding str.isascii() ?

2018-01-31 Thread Guido van Rossum
For the record, I encouraged this. I see no reason to amend my decision. Only .isascii(), for str, bytes, bytearray. No ascii= flags to other functions. On Wed, Jan 31, 2018 at 4:17 AM, Serhiy Storchaka wrote: > 31.01.18 13:18, INADA Naoki wrote: > >> Yes. But .isascii()
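
A minimal sketch of what the decision above amounts to, assuming Python 3.7 or later (where isascii() is available on str, bytes and bytearray). The sample values are illustrative only and do not come from the thread.

```python
# Sketch only: assumes Python 3.7+, where isascii() exists on all three types.
samples = [
    "abc",                       # str, ASCII only
    b"abc",                      # bytes, ASCII only
    bytearray(b"abc"),           # bytearray, ASCII only
    "caf\u00e9",                 # str containing U+00E9
    "caf\u00e9".encode("utf-8"), # bytes containing values >= 0x80
    "",                          # the empty string counts as ASCII
]
results = [obj.isascii() for obj in samples]
print(results)
```

Other predicates such as isdigit() keep their existing signatures; combining them with isascii() is left to the caller.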

Re: [Python-ideas] Adding str.isascii() ?

2018-01-31 Thread Chris Angelico
On Thu, Feb 1, 2018 at 12:44 AM, Victor Stinner wrote: > I like the idea of str.isdigit(ascii=True): would behave as > str.isdigit() and str.isascii(). It's easy to implement and likely to > be very efficient. I'm just not sure that it's so commonly required? > > At

Re: [Python-ideas] Adding str.isascii() ?

2018-01-31 Thread Victor Stinner
I like the idea of str.isdigit(ascii=True): would behave as str.isdigit() and str.isascii(). It's easy to implement and likely to be very efficient. I'm just not sure that it's so commonly required? At least, I guess that some users can be surprised that str.isdigit() is "Unicode aware", accept
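
The ascii=True flag discussed above was not adopted, but the same check can be written as a small helper. This is a sketch assuming Python 3.7 or later; the name isdigit_ascii is hypothetical and is not part of the standard library.

```python
def isdigit_ascii(s: str) -> bool:
    """Hypothetical helper: True only for non-empty strings of ASCII digits."""
    # isascii() rejects anything outside U+0000..U+007F before isdigit()
    # gets a chance to accept Unicode digits such as fullwidth forms.
    return s.isascii() and s.isdigit()


print(isdigit_ascii("123"))
print(isdigit_ascii("\uff11\uff12\uff13"))  # fullwidth digits
```

Keeping this as a caller-side helper matches the outcome of the thread: one new method, no new keyword arguments on existing ones.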

Re: [Python-ideas] Adding str.isascii() ?

2018-01-31 Thread Serhiy Storchaka
31.01.18 13:18, INADA Naoki wrote: Yes. But .isascii() will be much faster than try ... .encode('ascii') ... except UnicodeEncodeError on most Python implementations. In this case this doesn't matter since this is an exceptional case, and in any case an exception is raised for non-ascii
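
The two approaches compared above can be sketched side by side. This assumes Python 3.7 or later; the function name is_ascii_via_encode is hypothetical, and no timing claim is made here beyond what the message itself says.

```python
def is_ascii_via_encode(s: str) -> bool:
    """Hypothetical helper: the try/encode/except pattern mentioned above."""
    try:
        # Allocates a throwaway bytes object just to learn a yes/no answer.
        s.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


samples = ["", "abc", "caf\u00e9", "\uff11\uff12\uff13", "\U0001F600"]
# Both checks should agree on every sample.
agreement = [is_ascii_via_encode(s) == s.isascii() for s in samples]
print(agreement)
```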

Re: [Python-ideas] Adding str.isascii() ?

2018-01-31 Thread INADA Naoki
Hm, it seems I was in too much of a hurry to implement it... > > There were discussions about this. See for example > https://bugs.python.org/issue18814. > > In short, there are two considerations that prevented adding this feature: > > 1. This function can have the constant computation complexity in CPython

Re: [Python-ideas] Adding str.isascii() ?

2018-01-31 Thread Serhiy Storchaka
26.01.18 10:42, INADA Naoki wrote: Currently, int(), str.isdigit(), str.isalnum(), etc... accept non-ASCII strings. >>> s = "１２３" >>> s '１２３' >>> s.isdigit() True >>> print(ascii(s)) '\uff11\uff12\uff13' >>> int(s) 123 But sometimes, we want to accept only ASCII strings. For example, ipaddress module
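
The behaviour shown in the quoted session can be reproduced as a script. This is a sketch assuming Python 3.7 or later; the string is written with escapes (U+FF11 to U+FF13, fullwidth digits) so that it survives any encoding damage.

```python
# Fullwidth digits one, two, three, written as escapes.
s = "\uff11\uff12\uff13"

accepted_by_isdigit = s.isdigit()  # Unicode-aware: fullwidth digits count
parsed = int(s)                    # int() accepts them as well
escaped = ascii(s)                 # repr with non-ASCII characters escaped
is_ascii = s.isascii()             # the check the thread proposes to add

print(accepted_by_isdigit, parsed, escaped, is_ascii)
```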

Re: [Python-ideas] Adding str.isascii() ?

2018-01-30 Thread Chris Barker
On Tue, Jan 30, 2018 at 12:00 AM, Steven D'Aprano wrote: > > But it's also a readability question: "is_ascii()" and > > "is_UCS2()/is_BMP()" just require knowing what 7-bit ASCII and UCS-2 > > (or the basic multilingual plane) *are*, whereas the current ways of > > checking

Re: [Python-ideas] Adding str.isascii() ?

2018-01-30 Thread Steven D'Aprano
On Mon, Jan 29, 2018 at 12:54:41PM -0800, Chris Barker wrote: > I'm confused -- isn't the way to do this to encode your text into the > encoding the other application accepts ? It's more about warning the user of *my* application that the data they're exporting could generate mojibake, or even

Re: [Python-ideas] Adding str.isascii() ?

2018-01-30 Thread Steven D'Aprano
On Tue, Jan 30, 2018 at 03:12:52PM +1000, Nick Coghlan wrote: [...] > So this is partly an optimisation question: > > - folks want to avoid allocating a bytes object just to throw it away > - folks want to avoid running the equivalent of "max(map(ord, text))" > - folks know that CPython (at
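
The "max(map(ord, text))" equivalent mentioned in the quoted list can be sketched as follows. This assumes Python 3.7 or later for the comparison against str.isascii(); the name is_ascii_by_scan is hypothetical.

```python
def is_ascii_by_scan(text: str) -> bool:
    """Hypothetical helper: scan every code point, as in max(map(ord, text))."""
    # default=0 makes the empty string count as ASCII, matching str.isascii().
    return max(map(ord, text), default=0) < 128


samples = ["", "abc", "caf\u00e9", "\U0001F600"]
scan_results = [is_ascii_by_scan(s) for s in samples]
print(scan_results)
```

This version always walks the whole string in Python code, which is the cost the quoted optimisation points are about.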

Re: [Python-ideas] Adding str.isascii() ?

2018-01-29 Thread Nick Coghlan
On 30 January 2018 at 06:54, Chris Barker wrote: > On Fri, Jan 26, 2018 at 5:27 PM, Steven D'Aprano > wrote: >> >> tcl/tk and Javascript only support UCS-2 (16 bit) Unicode strings. >> Dealing with the Supplementary Unicode Planes has the same

Re: [Python-ideas] Adding str.isascii() ?

2018-01-29 Thread Chris Barker
On Fri, Jan 26, 2018 at 5:27 PM, Steven D'Aprano wrote: > tcl/tk and Javascript only support UCS-2 (16 bit) Unicode strings. > Dealing with the Supplementary Unicode Planes has the same problems > that older "narrow" builds of Python suffered from: single code points >

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Terry Reedy
On 1/27/2018 2:01 AM, Guido van Rossum wrote: On Fri, Jan 26, 2018 at 8:22 PM, Terry Reedy > wrote: On 1/26/2018 8:27 PM, Steven D'Aprano wrote: On Fri, Jan 26, 2018 at 02:37:14PM +0100, Victor Stinner wrote: Really? I

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Guido van Rossum
On Fri, Jan 26, 2018 at 8:22 PM, Terry Reedy wrote: > On 1/26/2018 8:27 PM, Steven D'Aprano wrote: > >> On Fri, Jan 26, 2018 at 02:37:14PM +0100, Victor Stinner wrote: >> > > Really? I never required such check in practice. Would you mind to >>> elaborate your use case? >>> >>

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Terry Reedy
On 1/26/2018 8:27 PM, Steven D'Aprano wrote: On Fri, Jan 26, 2018 at 02:37:14PM +0100, Victor Stinner wrote: Really? I never required such check in practice. Would you mind to elaborate your use case? tcl/tk and Javascript only support UCS-2 (16 bit) Unicode strings. Since IDLE is a

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Guido van Rossum
IMO the special status for isascii() matches the special status of ASCII as encoding (yeah, I know, it's not the default encoding anywhere, but it still comes up regularly in standards and as common subset of other encodings). Should you wish to check for compatibility with other ranges IMO some

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Steven D'Aprano
On Fri, Jan 26, 2018 at 02:37:14PM +0100, Victor Stinner wrote: > 2018-01-26 13:39 GMT+01:00 Steven D'Aprano : > > I have no objection to isascii, but I don't think it goes far enough. > > Sometimes I want to know whether a string is compatible with Latin-1 or > > UCS-2 as
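
The wider compatibility checks asked for above (Latin-1, UCS-2) can be approximated in pure Python. This is only a sketch: the names fits_latin1 and fits_ucs2 are hypothetical, and treating every code point up to U+FFFF as UCS-2 compatible is a simplification that ignores surrogates.

```python
def max_code_point(s: str) -> int:
    """Largest code point in s, or 0 for the empty string."""
    return max(map(ord, s), default=0)


def fits_latin1(s: str) -> bool:
    # Latin-1 covers U+0000..U+00FF.
    return max_code_point(s) <= 0xFF


def fits_ucs2(s: str) -> bool:
    # Assumption: "UCS-2 compatible" means no code point above the BMP.
    return max_code_point(s) <= 0xFFFF


checks = [
    (fits_latin1("caf\u00e9"), fits_ucs2("caf\u00e9")),    # U+00E9
    (fits_latin1("\u20ac"), fits_ucs2("\u20ac")),          # euro sign
    (fits_latin1("\U0001F600"), fits_ucs2("\U0001F600")),  # astral plane
]
print(checks)
```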

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread INADA Naoki
> > That's fine with me. Please also add it to bytes and bytearray objects. It's > okay if the implementation has to scan the string -- so do isdigit() etc. > > -- > --Guido van Rossum (python.org/~guido) Thanks for your pronouncement! I'll do it this weekend. Regards, -- INADA Naoki

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Guido van Rossum
On Fri, Jan 26, 2018 at 12:42 AM, INADA Naoki wrote: > Currently, int(), str.isdigit(), str.isalnum(), etc... accept > non-ASCII strings. > > >>> s = "１２３" > >>> s > '１２３' > >>> s.isdigit() > True > >>> print(ascii(s)) > '\uff11\uff12\uff13' > >>> int(s) > 123 > > But

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread INADA Naoki
We have _PyUnicodeWriter for such use cases. We may be able to expose it as a public API, but please start another thread for it. Unicode objects created with a wrong maxchar have not been supported since Python 3.3: == and hash() don't work properly for such objects. So str.isascii() doesn't have to support them either.

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread M.-A. Lemburg
On 26.01.2018 16:16, Random832 wrote: > On Fri, Jan 26, 2018, at 09:18, M.-A. Lemburg wrote: >> Is there a way to call an API which fixes the setting >> (a public version of unicode_adjust_maxchar()) ? >> >> Without this, how would an extension be able to provide a >> correct value upfront without

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Random832
On Fri, Jan 26, 2018, at 09:18, M.-A. Lemburg wrote: > Is there a way to call an API which fixes the setting > (a public version of unicode_adjust_maxchar()) ? > > Without this, how would an extension be able to provide a > correct value upfront without knowing the content ? It obviously has to

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread INADA Naoki
No. See this mail. https://mail.python.org/pipermail/python-ideas/2018-January/048748.html The point is whether we should support invalid Unicode created by the C API, and I assume not. 2018/01/26 11:58 PM "Antoine Pitrou" : On Fri, 26 Jan 2018 22:33:36 +0900 INADA Naoki

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread M.-A. Lemburg
On 26.01.2018 15:58, Antoine Pitrou wrote: > On Fri, 26 Jan 2018 22:33:36 +0900 > INADA Naoki > wrote: >>> >>> Can you create a simple test-case that proves this? >> >> Sure. > > I think the question assumed "without writing custom C or ctypes code > that deliberately

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Antoine Pitrou
On Fri, 26 Jan 2018 22:33:36 +0900 INADA Naoki wrote: > > > > Can you create a simple test-case that proves this? > > Sure. I think the question assumed "without writing custom C or ctypes code that deliberately builds a non-conformant unicode object" ;-) Regards

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread M.-A. Lemburg
On 26.01.2018 14:55, Victor Stinner wrote: > 2018-01-26 14:43 GMT+01:00 M.-A. Lemburg : >> If that's indeed being used as assumption, the docs must be >> fixed and PyUnicode_New() should verify this assumption as >> well - not only in debug builds using C asserts() :-) > > As

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Victor Stinner
2018-01-26 14:43 GMT+01:00 M.-A. Lemburg : > If that's indeed being used as assumption, the docs must be > fixed and PyUnicode_New() should verify this assumption as > well - not only in debug builds using C asserts() :-) As PyUnicode_FromStringAndSize(NULL, size),

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread M.-A. Lemburg
On 26.01.2018 14:31, Victor Stinner wrote: > 2018-01-26 12:17 GMT+01:00 INADA Naoki : >>> No, because you can pass in maxchar to PyUnicode_New() and >>> the implementation will take this as hint to the max code point >>> used in the string. There is no check done whether

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Victor Stinner
2018-01-26 13:39 GMT+01:00 Steven D'Aprano : > I have no objection to isascii, but I don't think it goes far enough. > Sometimes I want to know whether a string is compatible with Latin-1 or > UCS-2 as well as ASCII. For that, I used a function that exposes the > size of code

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread INADA Naoki
> > Can you create a simple test-case that proves this? Sure. $ git diff diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c index 2ad4322eca..475d5219e1 100644 --- a/Modules/_testcapimodule.c +++ b/Modules/_testcapimodule.c @@ -5307,6 +5307,12 @@ PyInit__testcapi(void)

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Victor Stinner
2018-01-26 12:17 GMT+01:00 INADA Naoki : >> No, because you can pass in maxchar to PyUnicode_New() and >> the implementation will take this as hint to the max code point >> used in the string. There is no check done whether maxchar >> is indeed the minimum upper bound to

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Chris Angelico
On Fri, Jan 26, 2018 at 10:17 PM, INADA Naoki wrote: >> No, because you can pass in maxchar to PyUnicode_New() and >> the implementation will take this as hint to the max code point >> used in the string. There is no check done whether maxchar >> is indeed the minimum

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread INADA Naoki
Do you mean we should fix *all* of CPython's unicode handling, not only str.isascii()? At least, the equality test doesn't care about a wrong kind. https://github.com/python/cpython/blob/master/Objects/stringlib/eq.h

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Steven D'Aprano
On Fri, Jan 26, 2018 at 05:42:31PM +0900, INADA Naoki wrote: > If str has str.isascii() method, it can be simpler: > > `if s.isascii() and s.isdigit():` > > I want to add it in Python 3.7 if there are no opposite opinions. I have no objection to isascii, but I don't think it goes far enough.

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread INADA Naoki
> No, because you can pass in maxchar to PyUnicode_New() and > the implementation will take this as hint to the max code point > used in the string. There is no check done whether maxchar > is indeed the minimum upper bound to the code point ordinals. API doc says: """ maxchar should be the true

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Victor Stinner
+1 The idea is not new and I like it. Naoki created https://bugs.python.org/issue32677 Victor 2018-01-26 11:22 GMT+01:00 Antoine Pitrou : > On Fri, 26 Jan 2018 17:42:31 +0900 > INADA Naoki > wrote: >> >> If str has str.isascii() method, it can be

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Antoine Pitrou
On Fri, 26 Jan 2018 17:42:31 +0900 INADA Naoki wrote: > > If str has str.isascii() method, it can be simpler: > > `if s.isascii() and s.isdigit():` > > I want to add it in Python 3.7 if there are no opposite opinions. +1 from me. Regards Antoine.

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread M.-A. Lemburg
On 26.01.2018 10:44, INADA Naoki wrote: >> +1 >> >> Just a note: checking the header in CPython will only give a hint, >> since strings created using higher order kinds can still be 100% >> ASCII. >> > > Oh, really? > I think checking header is enough for all ready unicode. No, because you can

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread INADA Naoki
> +1 > > Just a note: checking the header in CPython will only give a hint, > since strings created using higher order kinds can still be 100% > ASCII. > Oh, really? I think checking header is enough for all ready unicode. For example, this is _PyUnicode_EqualToASCIIString implementation: if

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread M.-A. Lemburg
On 26.01.2018 09:53, Chris Angelico wrote: > On Fri, Jan 26, 2018 at 7:42 PM, INADA Naoki wrote: >> Hi. >> >> Currently, int(), str.isdigit(), str.isalnum(), etc... accept >> non-ASCII strings. >> >> >>> s = "１２３" >> >>> s >> '１２３' >> >>> s.isdigit() >> True >

Re: [Python-ideas] Adding str.isascii() ?

2018-01-26 Thread Chris Angelico
On Fri, Jan 26, 2018 at 7:42 PM, INADA Naoki wrote: > Hi. > > Currently, int(), str.isdigit(), str.isalnum(), etc... accept > non-ASCII strings. > > >>> s = "１２３" > >>> s > '１２３' > >>> s.isdigit() > True > >>> print(ascii(s)) > '\uff11\uff12\uff13' > >>> int(s) > 123 > > But