Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)

2016-08-30 Thread Nick Coghlan
On 31 August 2016 at 05:06, Serhiy Storchaka  wrote:
> On 30.08.16 20:42, Nick Coghlan wrote:
>> Given Serhiy's clarification that this is primarily a thread safety
>> problem, I'm more supportive of the "PySlice_GetIndicesForObject"
>> approach (since that can call all the __index__ methods first, leaving
>> the final __len__ call as the only problematic case).
>
> This doesn't work with multidimensional slicing (like _testbuffer.ndarray or
> NumPy arrays).

Thanks, that makes sense.

>> However, given the observation that __len__ can also release the GIL,
>> I'm not clear on how 2A is supposed to work - a poorly timed thread
>> switch means there's always going to be a risk of len(obj) returning
>> outdated information if a container implemented in Python is being
>> mutated concurrently from different threads, so what can be done
>> differently in the calling functions that couldn't be done in a new
>> API that accepted the container reference?
>
> Current code doesn't use __len__. It uses something like
> PyUnicode_GET_LENGTH().

Oh, I see - it's the usual rule that C code can be made implicitly
atomic if it avoids calling hooks potentially written in Python, but
pure Python containers need explicit locks to allow concurrent access
from multiple threads.

> The solution was found easier than I afraid. See my patch to issue27867.

+1 from me. Would it make sense to make these public and new additions
to the stable ABI for 3.6+?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)

2016-08-30 Thread Serhiy Storchaka

On 30.08.16 20:42, Nick Coghlan wrote:

On 28 August 2016 at 08:25, Terry Reedy  wrote:

Slicing can be made to malfunction and even crash with an 'evil' __index__
method. https://bugs.python.org/issue27867

The crux of the problem is this: PySlice_GetIndicesEx
receives a slice object and a sequence length.  Calling __index__ on the
start, stop, and step components can mutate the sequence and invalidate the
length.  Adjusting the int values of start and stop according to an invalid
length (in particular, one that is too long) will result in invalid results
or a crash.

Possible actions -- very briefly.  For more see end of
https://bugs.python.org/issue27867?@ok_message=msg 273801
0. Do nothing.
1. Detect length change and raise.
2. Retrieve length after any possible changes and proceed as normal.

Possible implementation strategies for 1. and 2.
A. Change all functions that call PySlice_GetIndicesEx.
B. Add PySlice_GetIndicesEx2 (or ExEx?), which would receive *collection
instead of length, so the length could be retrieved after the __index__
calls.  Change calls. Deprecate PySlice_GetIndicesEx.


Given Serhiy's clarification that this is primarily a thread safety
problem, I'm more supportive of the "PySlice_GetIndicesForObject"
approach (since that can call all the __index__ methods first, leaving
the final __len__ call as the only problematic case).


This doesn't work with multidimensional slicing (like 
_testbuffer.ndarray or NumPy arrays).



However, given the observation that __len__ can also release the GIL,
I'm not clear on how 2A is supposed to work - a poorly timed thread
switch means there's always going to be a risk of len(obj) returning
outdated information if a container implemented in Python is being
mutated concurrently from different threads, so what can be done
differently in the calling functions that couldn't be done in a new
API that accepted the container reference?


Current code doesn't use __len__. It uses something like 
PyUnicode_GET_LENGTH().


The solution was found easier than I afraid. See my patch to issue27867.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)

2016-08-30 Thread Nick Coghlan
On 28 August 2016 at 08:25, Terry Reedy  wrote:
> Slicing can be made to malfunction and even crash with an 'evil' __index__
> method. https://bugs.python.org/issue27867
>
> The crux of the problem is this: PySlice_GetIndicesEx
> receives a slice object and a sequence length.  Calling __index__ on the
> start, stop, and step components can mutate the sequence and invalidate the
> length.  Adjusting the int values of start and stop according to an invalid
> length (in particular, one that is too long) will result in invalid results
> or a crash.
>
> Possible actions -- very briefly.  For more see end of
> https://bugs.python.org/issue27867?@ok_message=msg 273801
> 0. Do nothing.
> 1. Detect length change and raise.
> 2. Retrieve length after any possible changes and proceed as normal.
>
> Possible implementation strategies for 1. and 2.
> A. Change all functions that call PySlice_GetIndicesEx.
> B. Add PySlice_GetIndicesEx2 (or ExEx?), which would receive *collection
> instead of length, so the length could be retrieved after the __index__
> calls.  Change calls. Deprecate PySlice_GetIndicesEx.

Given Serhiy's clarification that this is primarily a thread safety
problem, I'm more supportive of the "PySlice_GetIndicesForObject"
approach (since that can call all the __index__ methods first, leaving
the final __len__ call as the only problematic case).

However, given the observation that __len__ can also release the GIL,
I'm not clear on how 2A is supposed to work - a poorly timed thread
switch means there's always going to be a risk of len(obj) returning
outdated information if a container implemented in Python is being
mutated concurrently from different threads, so what can be done
differently in the calling functions that couldn't be done in a new
API that accepted the container reference?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)

2016-08-30 Thread Serhiy Storchaka

On 30.08.16 15:31, Dima Tisnek wrote:

On 30 August 2016 at 14:13, Serhiy Storchaka  wrote:

1. Detect length change and raise.



It would be simpler solution. But I afraid that this can break third-party
code that "just works" now. For example slicing a list "just works" if step
is 1. It can return not what the author expected if a list grows, but it
never crashes, and existing code can depends on current behavior. This
solution is not applicable in maintained versions.


Serhiy,

If dictionary is iterated in thread1 while thread2 changes the
dictionary, thread1 currently raises RuntimeError.

Would cloning current dict behaviour to slice with overridden
__index__ make sense?


No, these are different things. The problem with dict iterating is 
unavoidable, but slicing can be defined consistently (as described by 
Terry in option 2). Changing a dict can change the order and invalidates 
iterators (except trivial cases of just created or finished iterators). 
But slicing can be atomic (and it is atomic de facto in many cases), we 
just should call all __index__-es before looking on a sequence.



I'd argue 3rd party code depends on slicing not to raise an exception,
is same as 3rd party code depending on dict iteration not to raise and
exception; If same container may be concurrently used in another
thread, then 3rd party code is actually buggy. It's OK to break such
code.


We shouldn't break third-party code in maintained releases. De facto 
slicing is atomic now in many cases, and it is nowhere documented that 
it is not atomic. The problem only with using broken by design 
PySlice_GetIndicesEx() in CPython. If CPython would implemented without 
PySlice_GetIndicesEx() (with more cumbersome code), it is likely this 
issue wouldn't be raised.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)

2016-08-30 Thread Stefan Krah
On Tue, Aug 30, 2016 at 03:11:25PM +0200, Maciej Fijalkowski wrote:
> It's more complicated - if the third party rely on the code working
> when one thread slices while the other thread modifies that gives
> implicit atomicity requirements. Those specific requirements are very
> hard to maintain across the python versions and python
> implementations. Replicating the exact CPython behavior (for each
> CPython version too!) is a major nightmare for such specific
> scenarios.
> 
> I propose the following:
> 
> * we raise an error if detected

+1


Stefan Krah

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)

2016-08-30 Thread Maciej Fijalkowski
On Tue, Aug 30, 2016 at 2:31 PM, Dima Tisnek  wrote:
> On 30 August 2016 at 14:13, Serhiy Storchaka  wrote:
>>> 1. Detect length change and raise.
>>
>>
>> It would be simpler solution. But I afraid that this can break third-party
>> code that "just works" now. For example slicing a list "just works" if step
>> is 1. It can return not what the author expected if a list grows, but it
>> never crashes, and existing code can depends on current behavior. This
>> solution is not applicable in maintained versions.
>
> Serhiy,
>
> If dictionary is iterated in thread1 while thread2 changes the
> dictionary, thread1 currently raises RuntimeError.
>
> Would cloning current dict behaviour to slice with overridden
> __index__ make sense?
>
>
> I'd argue 3rd party code depends on slicing not to raise an exception,
> is same as 3rd party code depending on dict iteration not to raise and
> exception; If same container may be concurrently used in another
> thread, then 3rd party code is actually buggy. It's OK to break such
> code.
>
>
> Just my 2c.
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/fijall%40gmail.com

I'm with Dima here.

It's more complicated - if the third party rely on the code working
when one thread slices while the other thread modifies that gives
implicit atomicity requirements. Those specific requirements are very
hard to maintain across the python versions and python
implementations. Replicating the exact CPython behavior (for each
CPython version too!) is a major nightmare for such specific
scenarios.

I propose the following:

* we raise an error if detected

-or-

* we define the exact behavior what it means to modify the collection
in one thread while the other is slicing it (what do you get? what are
the guarantees? does it also apply if the list is resized?)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)

2016-08-30 Thread Dima Tisnek
On 30 August 2016 at 14:13, Serhiy Storchaka  wrote:
>> 1. Detect length change and raise.
>
>
> It would be simpler solution. But I afraid that this can break third-party
> code that "just works" now. For example slicing a list "just works" if step
> is 1. It can return not what the author expected if a list grows, but it
> never crashes, and existing code can depends on current behavior. This
> solution is not applicable in maintained versions.

Serhiy,

If dictionary is iterated in thread1 while thread2 changes the
dictionary, thread1 currently raises RuntimeError.

Would cloning current dict behaviour to slice with overridden
__index__ make sense?


I'd argue 3rd party code depends on slicing not to raise an exception,
is same as 3rd party code depending on dict iteration not to raise and
exception; If same container may be concurrently used in another
thread, then 3rd party code is actually buggy. It's OK to break such
code.


Just my 2c.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)

2016-08-30 Thread Serhiy Storchaka

On 28.08.16 01:25, Terry Reedy wrote:

0. Do nothing.


The problem is not in pathological __index__. The problem is in 
executing Python code and releasing GIL. In multithread production code 
one thread can read a slice when other thread modifies a collection. In 
very very rare case it causes a crash (or worse, a corruption of data). 
We shouldn't left it as is.



1. Detect length change and raise.


It would be simpler solution. But I afraid that this can break 
third-party code that "just works" now. For example slicing a list "just 
works" if step is 1. It can return not what the author expected if a 
list grows, but it never crashes, and existing code can depends on 
current behavior. This solution is not applicable in maintained versions.



2. Retrieve length after any possible changes and proceed as normal.


This behavior looks the most expectable to me, but needs more work.


B. Add PySlice_GetIndicesEx2 (or ExEx?), which would receive *collection
instead of length, so the length could be retrieved after the __index__
calls.  Change calls. Deprecate PySlice_GetIndicesEx.


This is not always possible. The limit for a slice is not always the 
length of the collection (see multidimensional arrays). And how to 
determine the length? Call __len__? It can be overridden in Python, this 
causes releasing GIL, switching to other thread and modifying the 
collection. The original problem is returned.



And what versions should be patched?


Since this is heisenbug that can cause a crash, I think we should apply 
some solutions for all versions. But in develop version we a free to 
introduce small incompatibility.


I prefer 2A in maintained versions (may be including 3.3 and 3.4) and 2A 
or 1A in 3.6.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)

2016-08-29 Thread Ethan Furman

On 08/28/2016 09:26 AM, Nick Coghlan wrote:

On 28 August 2016 at 08:25, Terry Reedy  wrote:

Slicing can be made to malfunction and even crash with an 'evil' __index__
method. https://bugs.python.org/issue27867

The crux of the problem is this: PySlice_GetIndicesEx
receives a slice object and a sequence length.  Calling __index__ on the
start, stop, and step components can mutate the sequence and invalidate the
length.  Adjusting the int values of start and stop according to an invalid
length (in particular, one that is too long) will result in invalid results
or a crash.

Possible actions -- very briefly.  For more see end of
https://bugs.python.org/issue27867?@ok_message=msg 273801
0. Do nothing.
1. Detect length change and raise.


I suggest taking this path - it's the lowest impact, and akin to the
"dictionary changed size during iteration" runtime error.


+1.  Being able to do such strange things with list but not dict would be 
irritating and a nuisance (although maybe not attractive ;) .


__index__ having side effects is pathological code behaviour, so we
really just need to prevent the interpreter crash, rather than trying
to make it sense of it.


Agreed.

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)

2016-08-28 Thread Nick Coghlan
On 28 August 2016 at 08:25, Terry Reedy  wrote:
> Slicing can be made to malfunction and even crash with an 'evil' __index__
> method. https://bugs.python.org/issue27867
>
> The crux of the problem is this: PySlice_GetIndicesEx
> receives a slice object and a sequence length.  Calling __index__ on the
> start, stop, and step components can mutate the sequence and invalidate the
> length.  Adjusting the int values of start and stop according to an invalid
> length (in particular, one that is too long) will result in invalid results
> or a crash.
>
> Possible actions -- very briefly.  For more see end of
> https://bugs.python.org/issue27867?@ok_message=msg 273801
> 0. Do nothing.
> 1. Detect length change and raise.

I suggest taking this path - it's the lowest impact, and akin to the
"dictionary changed size during iteration" runtime error.

__index__ having side effects is pathological code behaviour, so we
really just need to prevent the interpreter crash, rather than trying
to make it sense of it.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)

2016-08-27 Thread Terry Reedy
Slicing can be made to malfunction and even crash with an 'evil' 
__index__ method. https://bugs.python.org/issue27867


The crux of the problem is this: PySlice_GetIndicesEx
receives a slice object and a sequence length.  Calling __index__ on the 
start, stop, and step components can mutate the sequence and invalidate 
the length.  Adjusting the int values of start and stop according to an 
invalid length (in particular, one that is too long) will result in 
invalid results or a crash.


Possible actions -- very briefly.  For more see end of
https://bugs.python.org/issue27867?@ok_message=msg 273801
0. Do nothing.
1. Detect length change and raise.
2. Retrieve length after any possible changes and proceed as normal.

Possible implementation strategies for 1. and 2.
A. Change all functions that call PySlice_GetIndicesEx.
B. Add PySlice_GetIndicesEx2 (or ExEx?), which would receive *collection 
instead of length, so the length could be retrieved after the __index__ 
calls.  Change calls. Deprecate PySlice_GetIndicesEx.


Which of the 4 possible patches, if any, would be best?
I personally prefer 2B.

And what versions should be patched?

--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com