Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)
On 31 August 2016 at 05:06, Serhiy Storchakawrote: > On 30.08.16 20:42, Nick Coghlan wrote: >> Given Serhiy's clarification that this is primarily a thread safety >> problem, I'm more supportive of the "PySlice_GetIndicesForObject" >> approach (since that can call all the __index__ methods first, leaving >> the final __len__ call as the only problematic case). > > This doesn't work with multidimensional slicing (like _testbuffer.ndarray or > NumPy arrays). Thanks, that makes sense. >> However, given the observation that __len__ can also release the GIL, >> I'm not clear on how 2A is supposed to work - a poorly timed thread >> switch means there's always going to be a risk of len(obj) returning >> outdated information if a container implemented in Python is being >> mutated concurrently from different threads, so what can be done >> differently in the calling functions that couldn't be done in a new >> API that accepted the container reference? > > Current code doesn't use __len__. It uses something like > PyUnicode_GET_LENGTH(). Oh, I see - it's the usual rule that C code can be made implicitly atomic if it avoids calling hooks potentially written in Python, but pure Python containers need explicit locks to allow concurrent access from multiple threads. > The solution was found easier than I afraid. See my patch to issue27867. +1 from me. Would it make sense to make these public and new additions to the stable ABI for 3.6+? Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)
On 30.08.16 20:42, Nick Coghlan wrote: On 28 August 2016 at 08:25, Terry Reedywrote: Slicing can be made to malfunction and even crash with an 'evil' __index__ method. https://bugs.python.org/issue27867 The crux of the problem is this: PySlice_GetIndicesEx receives a slice object and a sequence length. Calling __index__ on the start, stop, and step components can mutate the sequence and invalidate the length. Adjusting the int values of start and stop according to an invalid length (in particular, one that is too long) will result in invalid results or a crash. Possible actions -- very briefly. For more see end of https://bugs.python.org/issue27867?@ok_message=msg 273801 0. Do nothing. 1. Detect length change and raise. 2. Retrieve length after any possible changes and proceed as normal. Possible implementation strategies for 1. and 2. A. Change all functions that call PySlice_GetIndicesEx. B. Add PySlice_GetIndicesEx2 (or ExEx?), which would receive *collection instead of length, so the length could be retrieved after the __index__ calls. Change calls. Deprecate PySlice_GetIndicesEx. Given Serhiy's clarification that this is primarily a thread safety problem, I'm more supportive of the "PySlice_GetIndicesForObject" approach (since that can call all the __index__ methods first, leaving the final __len__ call as the only problematic case). This doesn't work with multidimensional slicing (like _testbuffer.ndarray or NumPy arrays). However, given the observation that __len__ can also release the GIL, I'm not clear on how 2A is supposed to work - a poorly timed thread switch means there's always going to be a risk of len(obj) returning outdated information if a container implemented in Python is being mutated concurrently from different threads, so what can be done differently in the calling functions that couldn't be done in a new API that accepted the container reference? Current code doesn't use __len__. It uses something like PyUnicode_GET_LENGTH(). The solution was found easier than I afraid. See my patch to issue27867. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)
On 28 August 2016 at 08:25, Terry Reedywrote: > Slicing can be made to malfunction and even crash with an 'evil' __index__ > method. https://bugs.python.org/issue27867 > > The crux of the problem is this: PySlice_GetIndicesEx > receives a slice object and a sequence length. Calling __index__ on the > start, stop, and step components can mutate the sequence and invalidate the > length. Adjusting the int values of start and stop according to an invalid > length (in particular, one that is too long) will result in invalid results > or a crash. > > Possible actions -- very briefly. For more see end of > https://bugs.python.org/issue27867?@ok_message=msg 273801 > 0. Do nothing. > 1. Detect length change and raise. > 2. Retrieve length after any possible changes and proceed as normal. > > Possible implementation strategies for 1. and 2. > A. Change all functions that call PySlice_GetIndicesEx. > B. Add PySlice_GetIndicesEx2 (or ExEx?), which would receive *collection > instead of length, so the length could be retrieved after the __index__ > calls. Change calls. Deprecate PySlice_GetIndicesEx. Given Serhiy's clarification that this is primarily a thread safety problem, I'm more supportive of the "PySlice_GetIndicesForObject" approach (since that can call all the __index__ methods first, leaving the final __len__ call as the only problematic case). However, given the observation that __len__ can also release the GIL, I'm not clear on how 2A is supposed to work - a poorly timed thread switch means there's always going to be a risk of len(obj) returning outdated information if a container implemented in Python is being mutated concurrently from different threads, so what can be done differently in the calling functions that couldn't be done in a new API that accepted the container reference? Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)
On 30.08.16 15:31, Dima Tisnek wrote: On 30 August 2016 at 14:13, Serhiy Storchakawrote: 1. Detect length change and raise. It would be simpler solution. But I afraid that this can break third-party code that "just works" now. For example slicing a list "just works" if step is 1. It can return not what the author expected if a list grows, but it never crashes, and existing code can depends on current behavior. This solution is not applicable in maintained versions. Serhiy, If dictionary is iterated in thread1 while thread2 changes the dictionary, thread1 currently raises RuntimeError. Would cloning current dict behaviour to slice with overridden __index__ make sense? No, these are different things. The problem with dict iterating is unavoidable, but slicing can be defined consistently (as described by Terry in option 2). Changing a dict can change the order and invalidates iterators (except trivial cases of just created or finished iterators). But slicing can be atomic (and it is atomic de facto in many cases), we just should call all __index__-es before looking on a sequence. I'd argue 3rd party code depends on slicing not to raise an exception, is same as 3rd party code depending on dict iteration not to raise and exception; If same container may be concurrently used in another thread, then 3rd party code is actually buggy. It's OK to break such code. We shouldn't break third-party code in maintained releases. De facto slicing is atomic now in many cases, and it is nowhere documented that it is not atomic. The problem only with using broken by design PySlice_GetIndicesEx() in CPython. If CPython would implemented without PySlice_GetIndicesEx() (with more cumbersome code), it is likely this issue wouldn't be raised. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)
On Tue, Aug 30, 2016 at 03:11:25PM +0200, Maciej Fijalkowski wrote: > It's more complicated - if the third party rely on the code working > when one thread slices while the other thread modifies that gives > implicit atomicity requirements. Those specific requirements are very > hard to maintain across the python versions and python > implementations. Replicating the exact CPython behavior (for each > CPython version too!) is a major nightmare for such specific > scenarios. > > I propose the following: > > * we raise an error if detected +1 Stefan Krah ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)
On Tue, Aug 30, 2016 at 2:31 PM, Dima Tisnekwrote: > On 30 August 2016 at 14:13, Serhiy Storchaka wrote: >>> 1. Detect length change and raise. >> >> >> It would be simpler solution. But I afraid that this can break third-party >> code that "just works" now. For example slicing a list "just works" if step >> is 1. It can return not what the author expected if a list grows, but it >> never crashes, and existing code can depends on current behavior. This >> solution is not applicable in maintained versions. > > Serhiy, > > If dictionary is iterated in thread1 while thread2 changes the > dictionary, thread1 currently raises RuntimeError. > > Would cloning current dict behaviour to slice with overridden > __index__ make sense? > > > I'd argue 3rd party code depends on slicing not to raise an exception, > is same as 3rd party code depending on dict iteration not to raise and > exception; If same container may be concurrently used in another > thread, then 3rd party code is actually buggy. It's OK to break such > code. > > > Just my 2c. > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/fijall%40gmail.com I'm with Dima here. It's more complicated - if the third party rely on the code working when one thread slices while the other thread modifies that gives implicit atomicity requirements. Those specific requirements are very hard to maintain across the python versions and python implementations. Replicating the exact CPython behavior (for each CPython version too!) is a major nightmare for such specific scenarios. I propose the following: * we raise an error if detected -or- * we define the exact behavior what it means to modify the collection in one thread while the other is slicing it (what do you get? what are the guarantees? does it also apply if the list is resized?) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)
On 30 August 2016 at 14:13, Serhiy Storchakawrote: >> 1. Detect length change and raise. > > > It would be simpler solution. But I afraid that this can break third-party > code that "just works" now. For example slicing a list "just works" if step > is 1. It can return not what the author expected if a list grows, but it > never crashes, and existing code can depends on current behavior. This > solution is not applicable in maintained versions. Serhiy, If dictionary is iterated in thread1 while thread2 changes the dictionary, thread1 currently raises RuntimeError. Would cloning current dict behaviour to slice with overridden __index__ make sense? I'd argue 3rd party code depends on slicing not to raise an exception, is same as 3rd party code depending on dict iteration not to raise and exception; If same container may be concurrently used in another thread, then 3rd party code is actually buggy. It's OK to break such code. Just my 2c. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)
On 28.08.16 01:25, Terry Reedy wrote: 0. Do nothing. The problem is not in pathological __index__. The problem is in executing Python code and releasing GIL. In multithread production code one thread can read a slice when other thread modifies a collection. In very very rare case it causes a crash (or worse, a corruption of data). We shouldn't left it as is. 1. Detect length change and raise. It would be simpler solution. But I afraid that this can break third-party code that "just works" now. For example slicing a list "just works" if step is 1. It can return not what the author expected if a list grows, but it never crashes, and existing code can depends on current behavior. This solution is not applicable in maintained versions. 2. Retrieve length after any possible changes and proceed as normal. This behavior looks the most expectable to me, but needs more work. B. Add PySlice_GetIndicesEx2 (or ExEx?), which would receive *collection instead of length, so the length could be retrieved after the __index__ calls. Change calls. Deprecate PySlice_GetIndicesEx. This is not always possible. The limit for a slice is not always the length of the collection (see multidimensional arrays). And how to determine the length? Call __len__? It can be overridden in Python, this causes releasing GIL, switching to other thread and modifying the collection. The original problem is returned. And what versions should be patched? Since this is heisenbug that can cause a crash, I think we should apply some solutions for all versions. But in develop version we a free to introduce small incompatibility. I prefer 2A in maintained versions (may be including 3.3 and 3.4) and 2A or 1A in 3.6. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)
On 08/28/2016 09:26 AM, Nick Coghlan wrote: On 28 August 2016 at 08:25, Terry Reedywrote: Slicing can be made to malfunction and even crash with an 'evil' __index__ method. https://bugs.python.org/issue27867 The crux of the problem is this: PySlice_GetIndicesEx receives a slice object and a sequence length. Calling __index__ on the start, stop, and step components can mutate the sequence and invalidate the length. Adjusting the int values of start and stop according to an invalid length (in particular, one that is too long) will result in invalid results or a crash. Possible actions -- very briefly. For more see end of https://bugs.python.org/issue27867?@ok_message=msg 273801 0. Do nothing. 1. Detect length change and raise. I suggest taking this path - it's the lowest impact, and akin to the "dictionary changed size during iteration" runtime error. +1. Being able to do such strange things with list but not dict would be irritating and a nuisance (although maybe not attractive ;) . __index__ having side effects is pathological code behaviour, so we really just need to prevent the interpreter crash, rather than trying to make it sense of it. Agreed. -- ~Ethan~ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)
On 28 August 2016 at 08:25, Terry Reedywrote: > Slicing can be made to malfunction and even crash with an 'evil' __index__ > method. https://bugs.python.org/issue27867 > > The crux of the problem is this: PySlice_GetIndicesEx > receives a slice object and a sequence length. Calling __index__ on the > start, stop, and step components can mutate the sequence and invalidate the > length. Adjusting the int values of start and stop according to an invalid > length (in particular, one that is too long) will result in invalid results > or a crash. > > Possible actions -- very briefly. For more see end of > https://bugs.python.org/issue27867?@ok_message=msg 273801 > 0. Do nothing. > 1. Detect length change and raise. I suggest taking this path - it's the lowest impact, and akin to the "dictionary changed size during iteration" runtime error. __index__ having side effects is pathological code behaviour, so we really just need to prevent the interpreter crash, rather than trying to make it sense of it. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] What do we do about bad slicing and possible crashes (issue 27867)
Slicing can be made to malfunction and even crash with an 'evil' __index__ method. https://bugs.python.org/issue27867 The crux of the problem is this: PySlice_GetIndicesEx receives a slice object and a sequence length. Calling __index__ on the start, stop, and step components can mutate the sequence and invalidate the length. Adjusting the int values of start and stop according to an invalid length (in particular, one that is too long) will result in invalid results or a crash. Possible actions -- very briefly. For more see end of https://bugs.python.org/issue27867?@ok_message=msg 273801 0. Do nothing. 1. Detect length change and raise. 2. Retrieve length after any possible changes and proceed as normal. Possible implementation strategies for 1. and 2. A. Change all functions that call PySlice_GetIndicesEx. B. Add PySlice_GetIndicesEx2 (or ExEx?), which would receive *collection instead of length, so the length could be retrieved after the __index__ calls. Change calls. Deprecate PySlice_GetIndicesEx. Which of the 4 possible patches, if any, would be best? I personally prefer 2B. And what versions should be patched? -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com