For the record, I encouraged this. I see no reason to amend my decision.
Only .isascii(), for str, bytes, bytearray. No ascii= flags to other
functions.
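For reference, a minimal sketch of the API as accepted (available in Python 3.7+):

```python
# str.isascii() is True iff every code point is in U+0000..U+007F;
# the empty string counts as ASCII.
print("hello".isascii())              # True
print("１２３".isascii())             # False: fullwidth digits are non-ASCII
# The same method exists on bytes and bytearray (every byte < 0x80).
print(b"hello".isascii())             # True
print(bytearray(b"\xff").isascii())   # False
```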
On Wed, Jan 31, 2018 at 4:17 AM, Serhiy Storchaka wrote:
> 31.01.18 13:18, INADA Naoki wrote:
>
>> Yes. But .isascii() [...]
On Thu, Feb 1, 2018 at 12:44 AM, Victor Stinner wrote:
> I like the idea of str.isdigit(ascii=True): would behave as
> str.isdigit() and str.isascii(). It's easy to implement and likely to
> be very efficient. I'm just not sure that it's so commonly required?
>
> At
I like the idea of str.isdigit(ascii=True): would behave as
str.isdigit() and str.isascii(). It's easy to implement and likely to
be very efficient. I'm just not sure that it's so commonly required?
At least, I guess that some users may be surprised that str.isdigit()
is "Unicode aware" and accepts [...]
31.01.18 13:18, INADA Naoki wrote:
Yes. But .isascii() will be much faster than try ...
.encode('ascii') ... except UnicodeEncodeError
on most Python implementations.
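The comparison being made here can be sketched as follows (the helper name is illustrative, not from the thread):

```python
def is_ascii_via_encode(s: str) -> bool:
    # The pre-3.7 idiom: round-trip through the ascii codec and catch
    # the failure; this allocates a bytes object just to discard it.
    try:
        s.encode('ascii')
        return True
    except UnicodeEncodeError:
        return False

s = "１２３"  # fullwidth digits: s.isdigit() is True, but s is not ASCII
print(is_ascii_via_encode(s), s.isascii())  # False False
```

In CPython, str.isascii() can read the answer from the string representation without scanning or allocating, which is why it is expected to be much faster than the encode-based check.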
In this case this doesn't matter, since this is an exceptional case, and
in any case an exception is raised for non-ASCII [...]
Hm, it seems I was too hasty to implement it...
>
> There were discussions about this. See for example
> https://bugs.python.org/issue18814.
>
> In short, there are two considerations that prevented adding this feature:
>
> 1. This function can have constant computational complexity in CPython [...]
26.01.18 10:42, INADA Naoki wrote:
Currently, int(), str.isdigit(), str.isalnum(), etc... accepts
non-ASCII strings.
>>> s = "１２３"
>>> s
'１２３'
>>> s.isdigit()
True
>>> print(ascii(s))
'\uff11\uff12\uff13'
>>> int(s)
123
But sometimes we want to accept only ASCII strings. For example, the
ipaddress module [...]
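A sketch of that kind of validation (the function is hypothetical, not the actual ipaddress code):

```python
def parse_ascii_int(text: str) -> int:
    # int() alone would accept "１２３" (fullwidth digits) and return 123,
    # so reject anything outside the ASCII digit range first.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not an ASCII decimal string: {text!r}")
    return int(text)

print(parse_ascii_int("123"))  # 123
# parse_ascii_int("１２３") raises ValueError, even though int("１２３") == 123
```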
On Tue, Jan 30, 2018 at 12:00 AM, Steven D'Aprano wrote:
> > But it's also a readability question: "is_ascii()" and
> > "is_UCS2()/is_BMP()" just require knowing what 7-bit ASCII and UCS-2
> > (or the basic multilingual plane) *are*, whereas the current ways of
> > checking
On Mon, Jan 29, 2018 at 12:54:41PM -0800, Chris Barker wrote:
> I'm confused -- isn't the way to do this to encode your text into the
> encoding the other application accepts?
It's more about warning the user of *my* application that the data
they're exporting could generate mojibake, or even [...]
On Tue, Jan 30, 2018 at 03:12:52PM +1000, Nick Coghlan wrote:
[...]
> So this is partly an optimisation question:
>
> - folks want to avoid allocating a bytes object just to throw it away
> - folks want to avoid running the equivalent of "max(map(ord, text))"
> - folks know that CPython (at [...]
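The "max of ord" equivalent Nick mentions can be sketched in pure Python (the name is illustrative):

```python
def is_ascii_by_scan(text: str) -> bool:
    # O(n) scan over every code point; default=0 makes the empty
    # string count as ASCII, matching "".isascii().
    return max(map(ord, text), default=0) < 128

print(is_ascii_by_scan("hello"), is_ascii_by_scan("héllo"))  # True False
```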
On 30 January 2018 at 06:54, Chris Barker wrote:
> On Fri, Jan 26, 2018 at 5:27 PM, Steven D'Aprano
> wrote:
>>
>> tcl/tk and Javascript only support UCS-2 (16 bit) Unicode strings.
>> Dealing with the Supplementary Unicode Planes has the same [...]
On Fri, Jan 26, 2018 at 5:27 PM, Steven D'Aprano
wrote:
> tcl/tk and Javascript only support UCS-2 (16 bit) Unicode strings.
> Dealing with the Supplementary Unicode Planes has the same problems
> that older "narrow" builds of Python suffered from: single code points
>
On 1/27/2018 2:01 AM, Guido van Rossum wrote:
> On Fri, Jan 26, 2018 at 8:22 PM, Terry Reedy wrote:
>> On 1/26/2018 8:27 PM, Steven D'Aprano wrote:
>>> On Fri, Jan 26, 2018 at 02:37:14PM +0100, Victor Stinner wrote:
>>>> Really? I [...]
On Fri, Jan 26, 2018 at 8:22 PM, Terry Reedy wrote:
> On 1/26/2018 8:27 PM, Steven D'Aprano wrote:
>> On Fri, Jan 26, 2018 at 02:37:14PM +0100, Victor Stinner wrote:
>>> Really? I never required such a check in practice. Would you mind
>>> elaborating on your use case?
On 1/26/2018 8:27 PM, Steven D'Aprano wrote:
> On Fri, Jan 26, 2018 at 02:37:14PM +0100, Victor Stinner wrote:
>> Really? I never required such a check in practice. Would you mind
>> elaborating on your use case?
> tcl/tk and Javascript only support UCS-2 (16 bit) Unicode strings.
Since IDLE is a [...]
IMO the special status for isascii() matches the special status of ASCII as
an encoding (yeah, I know, it's not the default encoding anywhere, but it
still comes up regularly in standards and as a common subset of other
encodings).
Should you wish to check for compatibility with other ranges, IMO some [...]
On Fri, Jan 26, 2018 at 02:37:14PM +0100, Victor Stinner wrote:
> 2018-01-26 13:39 GMT+01:00 Steven D'Aprano :
> > I have no objection to isascii, but I don't think it goes far enough.
> > Sometimes I want to know whether a string is compatible with Latin-1 or
> > UCS-2 as
>
> That's fine with me. Please also add it to bytes and bytearray objects. It's
> okay if the implementation has to scan the string -- so do isdigit() etc.
>
> --
> --Guido van Rossum (python.org/~guido)
Thanks for your pronouncement!
I'll do it this weekend.
Regards,
--
INADA Naoki
On Fri, Jan 26, 2018 at 12:42 AM, INADA Naoki wrote:
> Currently, int(), str.isdigit(), str.isalnum(), etc... accepts
> non-ASCII strings.
>
> >>> s = "１２３"
> >>> s
> '１２３'
> >>> s.isdigit()
> True
> >>> print(ascii(s))
> '\uff11\uff12\uff13'
> >>> int(s)
> 123
>
> But [...]
We have _PyUnicodeWriter for such use cases.
We may be able to expose it as a public API, but please start another thread
for it.
Unicode objects created with a wrong maxchar have not been supported since Python 3.3.
== and hash() don't work properly for such unicode objects.
So str.isascii() doesn't have to support them either.
On 26.01.2018 16:16, Random832 wrote:
> On Fri, Jan 26, 2018, at 09:18, M.-A. Lemburg wrote:
>> Is there a way to call an API which fixes the setting
>> (a public version of unicode_adjust_maxchar()) ?
>>
>> Without this, how would an extension be able to provide a
>> correct value upfront without [...]
On Fri, Jan 26, 2018, at 09:18, M.-A. Lemburg wrote:
> Is there a way to call an API which fixes the setting
> (a public version of unicode_adjust_maxchar()) ?
>
> Without this, how would an extension be able to provide a
> correct value upfront without knowing the content?
It obviously has to [...]
No. See this mail.
https://mail.python.org/pipermail/python-ideas/2018-January/048748.html
The point is whether we should support invalid Unicode created via the C API,
and I assume no.
On 2018/01/26 at 11:58 PM, "Antoine Pitrou" wrote:
On Fri, 26 Jan 2018 22:33:36 +0900, INADA Naoki [...]
On 26.01.2018 15:58, Antoine Pitrou wrote:
> On Fri, 26 Jan 2018 22:33:36 +0900, INADA Naoki wrote:
>>>
>>> Can you create a simple test-case that proves this?
>>
>> Sure.
>
> I think the question assumed "without writing custom C or ctypes code
> that deliberately [...]
On Fri, 26 Jan 2018 22:33:36 +0900, INADA Naoki wrote:
> >
> > Can you create a simple test-case that proves this?
>
> Sure.
I think the question assumed "without writing custom C or ctypes code
that deliberately builds a non-conformant unicode object" ;-)
Regards
On 26.01.2018 14:55, Victor Stinner wrote:
> 2018-01-26 14:43 GMT+01:00 M.-A. Lemburg :
>> If that's indeed being used as assumption, the docs must be
>> fixed and PyUnicode_New() should verify this assumption as
>> well - not only in debug builds using C asserts() :-)
>
> As
2018-01-26 14:43 GMT+01:00 M.-A. Lemburg :
> If that's indeed being used as assumption, the docs must be
> fixed and PyUnicode_New() should verify this assumption as
> well - not only in debug builds using C asserts() :-)
As PyUnicode_FromStringAndSize(NULL, size),
On 26.01.2018 14:31, Victor Stinner wrote:
> 2018-01-26 12:17 GMT+01:00 INADA Naoki :
>>> No, because you can pass in maxchar to PyUnicode_New() and
>>> the implementation will take this as hint to the max code point
>>> used in the string. There is no check done whether
2018-01-26 13:39 GMT+01:00 Steven D'Aprano :
> I have no objection to isascii, but I don't think it goes far enough.
> Sometimes I want to know whether a string is compatible with Latin-1 or
> UCS-2 as well as ASCII. For that, I used a function that exposes the
> size of code [...]
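A pure-Python sketch of the wider range checks Steven describes (function names are illustrative; CPython's internal maxchar is not exposed as a public API):

```python
def max_code_point(text: str) -> int:
    # Widest code point in the string; 0 for the empty string.
    return max(map(ord, text), default=0)

def fits_in(text: str, limit: int) -> bool:
    # ASCII: limit 0x80, Latin-1: 0x100, UCS-2 / BMP: 0x10000
    return max_code_point(text) < limit

s = "Āscii-ish"  # U+0100 is the first code point beyond Latin-1
print(fits_in(s, 0x100), fits_in(s, 0x10000))  # False True
```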
>
> Can you create a simple test-case that proves this?
Sure.
$ git diff
diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index 2ad4322eca..475d5219e1 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -5307,6 +5307,12 @@ PyInit__testcapi(void)
2018-01-26 12:17 GMT+01:00 INADA Naoki :
>> No, because you can pass in maxchar to PyUnicode_New() and
>> the implementation will take this as hint to the max code point
>> used in the string. There is no check done whether maxchar
>> is indeed the minimum upper bound to
On Fri, Jan 26, 2018 at 10:17 PM, INADA Naoki wrote:
>> No, because you can pass in maxchar to PyUnicode_New() and
>> the implementation will take this as hint to the max code point
>> used in the string. There is no check done whether maxchar
>> is indeed the minimum
Do you mean we should fix *all* of CPython's unicode handling,
not only str.isascii()?
At least, the equality test doesn't care about a wrong kind:
https://github.com/python/cpython/blob/master/Objects/stringlib/eq.h
On Fri, Jan 26, 2018 at 05:42:31PM +0900, INADA Naoki wrote:
> If str has str.isascii() method, it can be simpler:
>
> `if s.isascii() and s.isdigit():`
>
> I want to add it in Python 3.7 if there are no opposite opinions.
I have no objection to isascii, but I don't think it goes far enough.
> No, because you can pass in maxchar to PyUnicode_New() and
> the implementation will take this as hint to the max code point
> used in the string. There is no check done whether maxchar
> is indeed the minimum upper bound to the code point ordinals.
API doc says:
"""
maxchar should be the true [...]
+1
The idea is not new and I like it.
Naoki created https://bugs.python.org/issue32677
Victor
2018-01-26 11:22 GMT+01:00 Antoine Pitrou :
> On Fri, 26 Jan 2018 17:42:31 +0900, INADA Naoki wrote:
>>
>> If str has str.isascii() method, it can be
On Fri, 26 Jan 2018 17:42:31 +0900, INADA Naoki wrote:
>
> If str has str.isascii() method, it can be simpler:
>
> `if s.isascii() and s.isdigit():`
>
> I want to add it in Python 3.7 if there are no opposite opinions.
+1 from me.
Regards
Antoine.
On 26.01.2018 10:44, INADA Naoki wrote:
>> +1
>>
>> Just a note: checking the header in CPython will only give a hint,
>> since strings created using higher order kinds can still be 100%
>> ASCII.
>>
>
> Oh, really?
> I think checking the header is enough for all ready unicode objects.
No, because you can [...]
> +1
>
> Just a note: checking the header in CPython will only give a hint,
> since strings created using higher order kinds can still be 100%
> ASCII.
>
Oh, really?
I think checking the header is enough for all ready unicode objects.
For example, this is the _PyUnicode_EqualToASCIIString implementation:
if [...]
On 26.01.2018 09:53, Chris Angelico wrote:
> On Fri, Jan 26, 2018 at 7:42 PM, INADA Naoki wrote:
>> Hi.
>>
>> Currently, int(), str.isdigit(), str.isalnum(), etc... accepts
>> non-ASCII strings.
>>
>> >>> s = "１２３"
>> >>> s
>> '１２３'
>> >>> s.isdigit()
>> True
>
On Fri, Jan 26, 2018 at 7:42 PM, INADA Naoki wrote:
> Hi.
>
> Currently, int(), str.isdigit(), str.isalnum(), etc... accepts
> non-ASCII strings.
>
> >>> s = "１２３"
> >>> s
> '１２３'
> >>> s.isdigit()
> True
> >>> print(ascii(s))
> '\uff11\uff12\uff13'
> >>> int(s)
> 123
>
> But [...]