Re: Thread-safe way to add a key to a dict only if it isn't already there?
Chris Angelico : > On Mon, Jul 9, 2018 at 5:18 AM, Marko Rauhamaa wrote: >> Chris Angelico : >>> Are you assuming that Python's semantics are defined by the semantics >>> of one possible implementation language? >> >> What are Python's semantics defined by? I've been using these: >> >>https://docs.python.org/3/reference/> >> >>https://docs.python.org/3/library/> >> >> Unfortunately, neither spec says anything about the atomicity of >> dict.setdefault(). >> >> Therefore, the application programmer must assume it is not atomic. In >> fact, as brought up in this discussion, the consultation of >> object.__hash__() and object.__eq__() almost guarantee the >> *non*-atomicity of dict.setdefault(). > > If by "atomic" you mean that absolutely no context switch can occur > during setdefault, then it probably isn't. But the point of an atomic > query/update operation is that there are exactly two possibilities: > > 1) The key did not exist in the dictionary. It now does, with the > provided default, which was returned. > 2) The key did exist in the dictionary. The provided default is > ignored, and the previous value is returned. This is a classic test-and-set race condition: # Initially assert "A" not in d # Thread 1 d.setdefault("A", 1) # Thread 2 d["A"] = 2 If dict.setdefault() is atomic, the end result in all timings must be: d["A"] == 2 However, if dict.setdefault() is not atomic, the end result may also be: d["A"] == 1 > Neither object.__hash__ nor object.__eq__ gives any way for this to be > violated (unless you mess with the concept of "the key did exist in > the dictionary" by breaking the definition of equality, but that's > nothing to do with atomicity). Here's the definition of setdefault: > > setdefault(key, default=None, /) method of builtins.dict instance > Insert key with a value of default if key is not in the dictionary. > > Return the value for key if key is in the dictionary, else default. > > Simple semantics. Straight-forward. Now, maybe there's a bug in some > implementation whereby threads can violate this; but that would be a > bug to be fixed, and nothing more. You should be able to assume that > it will behave as stated. Since atomicity is not guaranteed by the definition of the method, the application programmer must be prepared for the end result to be 1 (or even something completely different because the Language Specification doesn't say anything about the outcome of data races). Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Mon, Jul 9, 2018 at 5:18 AM, Marko Rauhamaa wrote: > Chris Angelico : >> Are you assuming that Python's semantics are defined by the semantics >> of one possible implementation language? > > What are Python's semantics defined by? I've been using these: > >https://docs.python.org/3/reference/> > >https://docs.python.org/3/library/> > > Unfortunately, neither spec says anything about the atomicity of > dict.setdefault(). > > Therefore, the application programmer must assume it is not atomic. In > fact, as brought up in this discussion, the consultation of > object.__hash__() and object.__eq__() almost guarantee the > *non*-atomicity of dict.setdefault(). If by "atomic" you mean that absolutely no context switch can occur during setdefault, then it probably isn't. But the point of an atomic query/update operation is that there are exactly two possibilities: 1) The key did not exist in the dictionary. It now does, with the provided default, which was returned. 2) The key did exist in the dictionary. The provided default is ignored, and the previous value is returned. Neither object.__hash__ nor object.__eq__ gives any way for this to be violated (unless you mess with the concept of "the key did exist in the dictionary" by breaking the definition of equality, but that's nothing to do with atomicity). Here's the definition of setdefault: setdefault(key, default=None, /) method of builtins.dict instance Insert key with a value of default if key is not in the dictionary. Return the value for key if key is in the dictionary, else default. Simple semantics. Straight-forward. Now, maybe there's a bug in some implementation whereby threads can violate this; but that would be a bug to be fixed, and nothing more. You should be able to assume that it will behave as stated. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
Chris Angelico : > Are you assuming that Python's semantics are defined by the semantics > of one possible implementation language? What are Python's semantics defined by? I've been using these: https://docs.python.org/3/reference/> https://docs.python.org/3/library/> Unfortunately, neither spec says anything about the atomicity of dict.setdefault(). Therefore, the application programmer must assume it is not atomic. In fact, as brought up in this discussion, the consultation of object.__hash__() and object.__eq__() almost guarantee the *non*-atomicity of dict.setdefault(). Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Mon, Jul 9, 2018 at 4:57 AM, Marko Rauhamaa wrote: > Chris Angelico : > >> On Mon, Jul 9, 2018 at 2:11 AM, Marko Rauhamaa wrote: >>> MRAB : In C you'd declare 'quit' as 'volatile' to tell the compiler that it could change unexpectedly, so don't make that assumption. >>> >>> C is an even tougher case. Even if the compiler kept on checking a >>> volatile value, the CPU might never propagate the cache content to >>> the other core. You'd need a memory barrier. In Java, "volatile" >>> effectively creates a memory barrier, but in C (and C++) it does not. >>> In C you need something like a mutex to see the effects of other >>> threads running. >>> >>> (BTW, I think that's a terrible thing for the C standards committee to >>> specify.) >> >> None of this has any impact on Python whatsoever. > > [citation needed] > Why? You might as well say "in Blurple, all integers greater than 5 compare equal to each other, and it's possible to implement a Python interpreter in Blurple, therefore we can't trust integer comparisons in Python". It's ridiculous to consider. The languages are completely independent. Are you assuming that Python's semantics are defined by the semantics of one possible implementation language? ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
Chris Angelico : > On Mon, Jul 9, 2018 at 2:11 AM, Marko Rauhamaa wrote: >> MRAB : >>> In C you'd declare 'quit' as 'volatile' to tell the compiler that it >>> could change unexpectedly, so don't make that assumption. >> >> C is an even tougher case. Even if the compiler kept on checking a >> volatile value, the CPU might never propagate the cache content to >> the other core. You'd need a memory barrier. In Java, "volatile" >> effectively creates a memory barrier, but in C (and C++) it does not. >> In C you need something like a mutex to see the effects of other >> threads running. >> >> (BTW, I think that's a terrible thing for the C standards committee to >> specify.) > > None of this has any impact on Python whatsoever. [citation needed] Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sun, 08 Jul 2018 16:37:11 +0100, MRAB wrote: > On 2018-07-08 14:38, Steven D'Aprano wrote: >> On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote: >> > [snip] >>> More importantly, this loop may never finish: >>> >>> # Initially >>> quit = False >>> >>> # Thread 1 >>> global quit >>> while not quit: >>> time.sleep(1) >>> >>> # Thread 2 >>> global quit >>> quit = True >> >> Assuming that thread 2 actually runs *at some point*, I don't see how >> that can't terminate. Neither thread sets quit to False, so provided >> thread 2 runs at all, it has to terminate. >> > [snip] > > The compiler could look at the code for thread 1 and see that 'quit' is > never assigned to, meaning that it could be "optimised" to: > > global quit > if not quit: > while True: > time.sleep(1) I'm glad you put "optimized" in scare quotes there, because any optimizer that did that to code that runs in a thread is buggy. The re-write has changed the semantics of the code. Of course a compiler "could" do anything. If it re-wrote the code to instead perform: while True: print(math.sin(random.random()+1) instead, we'd have no trouble recognising that the compiler has changed the semantics of our code to something we didn't write, and in just about every language apart from C, we would rightly call it a compiler bug. If I write "Do X", and the compiler instead executes "Do Y", that's a compiler bug. The compiler has one job: to take my source code and convert it to something which can be executed, and if it cannot do that faithfully, it is buggy. C is the special case: C programmers routinely blame *themselves* when the compiler changes "Do X" to "Do Y", because the standard says it is permitted to do anything it likes in the event of undefined behaviour. Talk about victims coming to rationalise their own abuse and identify with their abuser. W.A. Wulf might have been talking about C compilers when he wrote: "More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason — including blind stupidity." > In C you'd declare 'quit' as 'volatile' to tell the compiler that it > could change unexpectedly, so don't make that assumption. You shouldn't need to tell the compiler to assume that the variable could change. In multi-threaded code, there's no justification for assuming the variable can't change unless you've done a whole-program analysis and can prove that *no* thread ever writes to that variable. Even in single-threaded code, I think that optimizations which change execution order are dubious. There's far too much opportunity for "it's not a bug, because the specification says we can deny it is a bug" faults. It's simply *bad engineering practice*. When engineers design a bridge or a road or a building, the builders have to faithfully follow the engineer's design, unless the design itself explicitly allows them to make substitutions. "The blueprints say these supporting pillars have to be filled with steel- reinforced concrete. How about we just fill them with empty cans instead, what could possibly go wrong?" http://shanghaiist.com/2016/02/08/taiwan_earthquake_building_collapse/ (Disclaimer: there seems to be some controversy over whether the cans were actually in supporting columns or not. But the point still stands.) -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sun, 08 Jul 2018 19:35:55 +0300, Marko Rauhamaa wrote: > Steven D'Aprano : >> On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote: >>> PS My example with "impossible" being the result of a racy integer >>> operation is of course unlikely but could be the outcome if the Python >>> runtime reorganized its object cache on the fly (in a hypothetical >>> implementation). >> >> That would be a cache bug :-) >> >> Not every interpreter bug should be considered the caller's fault :-) > > Whether it's a bug or not depends on the language specification. "It is true that if you turn the indicator on at the same time that you turn the windshield wipers on, the car will explode in an enormous fireball releasing sufficient poisonous gases to kill everyone in a fifteen block radius. But we never said that it was safe to turn the indicator and windshield wipers on at the same time, so it's not a bug, its operator error." Compiler writers and language designers need to stop hiding behind specifications. No other industry so routinely tries (and too often succeeds) in playing the "but we never said our product wouldn't set you on fire" card with so little justification. > Java > has a very clear definition for correct synchronization. It's not so > clear on what could happen in the event of a data race; it's not > entirely unspecified, but the language spec admonishes the application > to brace itself for surprising effects. Incorrect synchronisation is irrelevant in this example. If the object cache is *ever* in an invalid state, such that (as per your example) the literal 1 (an int) could be evaluated as "impossible" (a string), that is a bug in the object management. Threads or no threads. The only excuse would be if the caller directly messed with the cache in an unsupported fashion (say, with ctypes). Software needs "fit for purpose" laws and end-user warranties with teeth. If that means that the software industry spends the next thirty years fixing bugs instead of adding features, that's a good thing. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Mon, Jul 9, 2018 at 2:11 AM, Marko Rauhamaa wrote: > MRAB : >> On 2018-07-08 14:38, Steven D'Aprano wrote: >>> On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote: >>> >> [snip] More importantly, this loop may never finish: # Initially quit = False # Thread 1 global quit while not quit: time.sleep(1) # Thread 2 global quit quit = True >>> >>> Assuming that thread 2 actually runs *at some point*, I don't see how >>> that can't terminate. Neither thread sets quit to False, so provided >>> thread 2 runs at all, it has to terminate. >>> >> [snip] >> >> The compiler could look at the code for thread 1 and see that 'quit' is >> never assigned to, meaning that it could be "optimised" to: >> >> global quit >> if not quit: >> while True: >> time.sleep(1) >> >> In C you'd declare 'quit' as 'volatile' to tell the compiler that it >> could change unexpectedly, so don't make that assumption. > > C is an even tougher case. Even if the compiler kept on checking a > volatile value, the CPU might never propagate the cache content to the > other core. You'd need a memory barrier. In Java, "volatile" effectively > creates a memory barrier, but in C (and C++) it does not. In C you need > something like a mutex to see the effects of other threads running. > > (BTW, I think that's a terrible thing for the C standards committee to > specify.) None of this has any impact on Python whatsoever. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
Steven D'Aprano : > On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote: >> PS My example with "impossible" being the result of a racy integer >> operation is of course unlikely but could be the outcome if the Python >> runtime reorganized its object cache on the fly (in a hypothetical >> implementation). > > That would be a cache bug :-) > > Not every interpreter bug should be considered the caller's fault :-) Whether it's a bug or not depends on the language specification. Java has a very clear definition for correct synchronization. It's not so clear on what could happen in the event of a data race; it's not entirely unspecified, but the language spec admonishes the application to brace itself for surprising effects. If the Python Language Specification hasn't specified the effects of incorrect synchronization so we can rightly suppose that the results are unspecified. Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
MRAB : > On 2018-07-08 14:38, Steven D'Aprano wrote: >> On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote: >> > [snip] >>> More importantly, this loop may never finish: >>> >>> # Initially >>> quit = False >>> >>> # Thread 1 >>> global quit >>> while not quit: >>> time.sleep(1) >>> >>> # Thread 2 >>> global quit >>> quit = True >> >> Assuming that thread 2 actually runs *at some point*, I don't see how >> that can't terminate. Neither thread sets quit to False, so provided >> thread 2 runs at all, it has to terminate. >> > [snip] > > The compiler could look at the code for thread 1 and see that 'quit' is > never assigned to, meaning that it could be "optimised" to: > > global quit > if not quit: > while True: > time.sleep(1) > > In C you'd declare 'quit' as 'volatile' to tell the compiler that it > could change unexpectedly, so don't make that assumption. C is an even tougher case. Even if the compiler kept on checking a volatile value, the CPU might never propagate the cache content to the other core. You'd need a memory barrier. In Java, "volatile" effectively creates a memory barrier, but in C (and C++) it does not. In C you need something like a mutex to see the effects of other threads running. (BTW, I think that's a terrible thing for the C standards committee to specify.) Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On 2018-07-08 14:38, Steven D'Aprano wrote: On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote: [snip] More importantly, this loop may never finish: # Initially quit = False # Thread 1 global quit while not quit: time.sleep(1) # Thread 2 global quit quit = True Assuming that thread 2 actually runs *at some point*, I don't see how that can't terminate. Neither thread sets quit to False, so provided thread 2 runs at all, it has to terminate. [snip] The compiler could look at the code for thread 1 and see that 'quit' is never assigned to, meaning that it could be "optimised" to: global quit if not quit: while True: time.sleep(1) In C you'd declare 'quit' as 'volatile' to tell the compiler that it could change unexpectedly, so don't make that assumption. -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote: > Steven D'Aprano : >> Changing implementations from one which is thread safe to one which is >> not can break people's code, and shouldn't be done on a whim. >> Especially since such breakage could be subtle, hard to notice, harder >> to track down, and even harder still to fix. > > Java's HotSpot does it all the time, and it did result in code breakage > -- although the code was broken to begin with. I said "shouldn't be done", rather than claiming that was the situation right now with all compilers. But I'm willing to give a little bit of slack to aggressively optimizing compilers, provided they come with a warning. >> So there is no coherent way to get a result of "impossible" from just >> adding 1 to 1 in any coherent implementation of Python. > > Back to Java, there was a real case of 64-bit integer operations not > being atomic on 32-bit machines. Mixing up upper and lower halves > between threads could result in really weird evaluations. Oh don't get me wrong, I agree with you that threading can result in strange, unpredictable errors. That's why I try not to use threading. I have no illusions about my ability to debug those sorts of problems. > More importantly, this loop may never finish: > > # Initially > quit = False > > # Thread 1 > global quit > while not quit: > time.sleep(1) > > # Thread 2 > global quit > quit = True Assuming that thread 2 actually runs *at some point*, I don't see how that can't terminate. Neither thread sets quit to False, so provided thread 2 runs at all, it has to terminate. I suppose if the threading implementation *could* fall back to sequential code (thread 2 doesn't run until thread 1 finishes, which it never does...) that outcome is possible. But it would have to be a pretty poor implementation. Now if you said there were fifty threads (aside from the main thread, which is guaranteed to run) all reading quit, and only thread 50 ever assigns to it, I'd believe that perhaps thread 50 never gets a chance to run. But with just two threads? Explain please. > That's the reality in Java and C. I see no reason why that wouldn't be > the reality in Python as well -- unless the language specification said > otherwise. Because the Python core developers care more about correctness than speed. > Marko > > PS My example with "impossible" being the result of a racy integer > operation is of course unlikely but could be the outcome if the Python > runtime reorganized its object cache on the fly (in a hypothetical > implementation). That would be a cache bug :-) Not every interpreter bug should be considered the caller's fault :-) -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
Steven D'Aprano : > Changing implementations from one which is thread safe to one which is > not can break people's code, and shouldn't be done on a whim. > Especially since such breakage could be subtle, hard to notice, harder > to track down, and even harder still to fix. Java's HotSpot does it all the time, and it did result in code breakage -- although the code was broken to begin with. > So there is no coherent way to get a result of "impossible" from just > adding 1 to 1 in any coherent implementation of Python. Back to Java, there was a real case of 64-bit integer operations not being atomic on 32-bit machines. Mixing up upper and lower halves between threads could result in really weird evaluations. More importantly, this loop may never finish: # Initially quit = False # Thread 1 global quit while not quit: time.sleep(1) # Thread 2 global quit quit = True That's the reality in Java and C. I see no reason why that wouldn't be the reality in Python as well -- unless the language specification said otherwise. Marko PS My example with "impossible" being the result of a racy integer operation is of course unlikely but could be the outcome if the Python runtime reorganized its object cache on the fly (in a hypothetical implementation). -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sun, 08 Jul 2018 10:52:15 +0300, Marko Rauhamaa wrote: > You are on the right track, Chris, but you are still deducing behavior > from a particular implementation. For example Java's definition is > approximately the one I give above: > >Without correct synchronization, very strange, confusing and >counterintuitive behaviors are possible. Indeed, and that applies to all languages with threading, including Python. But notice that it doesn't say: Without correct synchronization, impossible things are possible. >https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.htm >l#jls-17.4.5> You seem to have read this as "Java threading can do impossible things", but I think the correct reading of the article is "Here are the possible things that Java threading can do to mess with your mind". That's a much smaller subset of things which can go wrong. (But still pretty big.) See, for example Example 17.4-1, which explicitly shows that the Java execution model is permitted to reorder statements: r1 = A; B = 1; may be reordered to B = 1; r1 = A; That's harmless in single-threaded code, but risky in multi-threaded code. The result is that in the example given, r1 could end up being set to 1 instead of the expected values -- but it cannot possibly be set to the string "impossible", since that goes against the Java type system. In Python, there's no possible order of operations of assigning 1 and adding 1 that could result in a string. No Python compiler or interpreter is permitted to evaluate `1 + 1` or `2 + 1` as the string "impossible", even if threads are involved. (That's not to say that threads might not assign a string to the variable we expected to contain 2, but it would have to do so by assignment, not by mucking about with the execution order of integer addition.) In the case of Example 17.4-1, starting with A = B = 0, two threads execute: Thread 1:r2 = A; B = 1; Thread 2:r1 = B; A = 2; Java compilers are permitted to inspect each thread *in isolation*, decide that the order of lines is invisible to the caller, and so re- order them. The result of that is that those four lines can be executed in any permutation, giving a total of 24 separate possibilities. py> from itertools import permutations py> instructions = ["r2 = A;", "B = 1;", "r1 = B;", "A = 2;"] py> len(list(permutations(instructions))) 24 But Python (at least for now) cannot do this. It has a more deterministic execution order which guarantees top-to-bottom execution of Python statements. That implies that there are only *six* possible execution orders of those two threads (which is still enough to add non-determinism to your code!) not twenty-four: r2 = A; B = 1; r1 = B; A = 2; r2 = A; r1 = B; B = 1; A = 2; r2 = A; r1 = B; A = 2; B = 1; r1 = B; A = 2; r2 = A; B = 1; r1 = B; r2 = A; A = 2; B = 1; r1 = B; r2 = A; A = 2; B = 1; and that remains true even if the underlying implementation is written in Java, or Malbolge for that matter. > And we know that Python has been implemented using Java... That's irrelevant. A legal Python does not inherit the quirks of the underlying implementation -- it must still follow the rules of the language, or it isn't Python. Java is statically typed, with machine ints, Python is dynamically typed, with no machine ints. Relevant: http://doc.pypy.org/en/latest/cpython_differences.html http://docs.micropython.org/en/latest/unix/genrst/index.html -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
Chris Angelico : > On Sun, Jul 8, 2018 at 11:04 AM, Steven D'Aprano > wrote: >>> The only thing Python should guarantee is that the data structures stay >>> "coherent" under race conditions. In other words, there cannot be a >>> segmentation fault. For example, if two threads executed this code in >>> parallel: >>> >>> global i >>> i = 1 >>> i += 1 >>> >>> a legal end result could be that i contains the string "impossible". >> >> That wouldn't be coherent. The only coherent results are that i could >> equal either 2 or 3: > > Python threads don't switch only between lines of code, so the actual > interaction is a bit more complicated than you say. [...] > > But you're absolutely right that there are only a small handful of > plausible results, even with threading involved. You are on the right track, Chris, but you are still deducing behavior from a particular implementation. For example Java's definition is approximately the one I give above: Without correct synchronization, very strange, confusing and counterintuitive behaviors are possible. https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.htm l#jls-17.4.5> And we know that Python has been implemented using Java... Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sun, Jul 8, 2018 at 1:58 PM, Steven D'Aprano wrote: > On Sun, 08 Jul 2018 12:23:41 +1000, Chris Angelico wrote: > >>> Some people, when confronted with a problem, say, "I know, I'll use >>> threads". Nothhtwo probw ey ave lems. >> >> Right. Now they have to deal with interleaving, but that's all. And >> honestly, MOST CODE wouldn't notice interleaving; it's only when you >> change (either by rebinding or by mutating) something that can be seen >> by multiple threads. Which basically means "mutable globals are a risk, >> pretty much everything else is safe". > > Its not just globals though. Any mutable object shared between two > threads is potentially a risk. Globals are just the most obvious example. > Right; by "basically" I mean "yes there are others but this is what matters". The technically-accurate part is in the previous sentence: "can be seen by multiple threads". ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sun, 08 Jul 2018 12:23:41 +1000, Chris Angelico wrote: >> Some people, when confronted with a problem, say, "I know, I'll use >> threads". Nothhtwo probw ey ave lems. > > Right. Now they have to deal with interleaving, but that's all. And > honestly, MOST CODE wouldn't notice interleaving; it's only when you > change (either by rebinding or by mutating) something that can be seen > by multiple threads. Which basically means "mutable globals are a risk, > pretty much everything else is safe". Its not just globals though. Any mutable object shared between two threads is potentially a risk. Globals are just the most obvious example. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sun, Jul 8, 2018 at 12:12 PM, Steven D'Aprano wrote: > On Sun, 08 Jul 2018 11:15:17 +1000, Chris Angelico wrote: > > [...] >> Python threads don't switch only between lines of code, > > As I understand it, there could be a switch between any two byte codes, > or maybe only between certain bytes codes. But certain more fine grained > than just between lines of code. > > >> so the actual >> interaction is a bit more complicated than you say. In CPython, the >> increment operation is: >> >> 3 0 LOAD_GLOBAL 0 (i) >> 2 LOAD_CONST 1 (1) >> 4 INPLACE_ADD >> 6 STORE_GLOBAL 0 (i) >> >> A context switch could happen between any pair of statements. > > If you actually mean *statements* as opposed to byte codes, then the only > place there could be a switch would be either before the LOAD_GLOBAL or > after the STORE_GLOBAL (given that i is a built-in int and cannot have a > custom __iadd__ method). > > Is that what you mean? I may be wrong, but I always assume that a context switch could happen between any two bytecode operations - or, if you're reading the disassembly, between any two lines *of disassembly*. So there could be a switch before LOAD_GLOBAL, a switch between that and LOAD_CONST, another switch before the ADD, another before the STORE, and another right at the end. Well, there won't be *all* of those, but there could be any of them. This might not be entirely correct - there might be pairs that are functionally atomic - but it's the safe assumption. >> For instance, if you replace "i >> += 1" with "i += i", to double the value, you'll get this: >> >> 3 0 LOAD_GLOBAL 0 (i) >> 2 LOAD_GLOBAL 0 (i) >> 4 INPLACE_ADD >> 6 STORE_GLOBAL 0 (i) >> >> and that could potentially have both of them load the initial value, >> then one of them runs to completion, and then the other loads the result >> - so it'll add 1 and 2 and have a result of 3, rather than 2 or 4. > > Some people, when confronted with a problem, say, "I know, I'll use > threads". Nothhtwo probw ey ave lems. Right. Now they have to deal with interleaving, but that's all. And honestly, MOST CODE wouldn't notice interleaving; it's only when you change (either by rebinding or by mutating) something that can be seen by multiple threads. Which basically means "mutable globals are a risk, pretty much everything else is safe". >> But you're absolutely right that there are only a small handful of >> plausible results, even with threading involved. > > Indeed. Even though threading is non-deterministic, it isn't *entirely* > unconstrained. > Yeah. Quite far from it, in fact. Python threading is well-defined and fairly easy to work with. Only in a handful of operations do you need to worry about atomicity - like the one that started this thread. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sun, 08 Jul 2018 11:15:17 +1000, Chris Angelico wrote: [...] > Python threads don't switch only between lines of code, As I understand it, there could be a switch between any two byte codes, or maybe only between certain bytes codes. But certain more fine grained than just between lines of code. > so the actual > interaction is a bit more complicated than you say. In CPython, the > increment operation is: > > 3 0 LOAD_GLOBAL 0 (i) > 2 LOAD_CONST 1 (1) > 4 INPLACE_ADD > 6 STORE_GLOBAL 0 (i) > > A context switch could happen between any pair of statements. If you actually mean *statements* as opposed to byte codes, then the only place there could be a switch would be either before the LOAD_GLOBAL or after the STORE_GLOBAL (given that i is a built-in int and cannot have a custom __iadd__ method). Is that what you mean? > In this > particular example, the end result doesn't change - coherent results are > 2 and 3, nothing else - but in other situations, there may be times when > the separate steps might be significant. Fortunately I wasn't talking about other code snippets, only the one shown :-) > For instance, if you replace "i > += 1" with "i += i", to double the value, you'll get this: > > 3 0 LOAD_GLOBAL 0 (i) > 2 LOAD_GLOBAL 0 (i) > 4 INPLACE_ADD > 6 STORE_GLOBAL 0 (i) > > and that could potentially have both of them load the initial value, > then one of them runs to completion, and then the other loads the result > - so it'll add 1 and 2 and have a result of 3, rather than 2 or 4. Some people, when confronted with a problem, say, "I know, I'll use threads". Nothhtwo probw ey ave lems. > But you're absolutely right that there are only a small handful of > plausible results, even with threading involved. Indeed. Even though threading is non-deterministic, it isn't *entirely* unconstrained. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sun, Jul 8, 2018 at 11:04 AM, Steven D'Aprano wrote: >> The only thing Python should guarantee is that the data structures stay >> "coherent" under race conditions. In other words, there cannot be a >> segmentation fault. For example, if two threads executed this code in >> parallel: >> >> global i >> i = 1 >> i += 1 >> >> a legal end result could be that i contains the string "impossible". > > That wouldn't be coherent. The only coherent results are that i could > equal either 2 or 3: > > Possibility 1: > > i = 1 # thread 1 > i += 1 # thread 1 > i = 1 # thread 2 > i += 1 # thread 2 > assert i == 2 > > Possibility 2: > > i = 1 # thread 1 > i = 1 # thread 2 > i += 1 # thread 1 > i += 1 # thread 2 > assert i == 3 > > > Executing statements out of order is impossible, even in threaded code. > Redefining the meaning of the integer literal 1 is impossible (ignoring > unsupported shenanigans with ctypes). Monkey-patching the int __iadd__ > method is impossible. So there is no coherent way to get a result of > "impossible" from just adding 1 to 1 in any coherent implementation of > Python. Python threads don't switch only between lines of code, so the actual interaction is a bit more complicated than you say. In CPython, the increment operation is: 3 0 LOAD_GLOBAL 0 (i) 2 LOAD_CONST 1 (1) 4 INPLACE_ADD 6 STORE_GLOBAL 0 (i) A context switch could happen between any pair of statements. In this particular example, the end result doesn't change - coherent results are 2 and 3, nothing else - but in other situations, there may be times when the separate steps might be significant. For instance, if you replace "i += 1" with "i += i", to double the value, you'll get this: 3 0 LOAD_GLOBAL 0 (i) 2 LOAD_GLOBAL 0 (i) 4 INPLACE_ADD 6 STORE_GLOBAL 0 (i) and that could potentially have both of them load the initial value, then one of them runs to completion, and then the other loads the result - so it'll add 1 and 2 and have a result of 3, rather than 2 or 4. But you're absolutely right that there are only a small handful of plausible results, even with threading involved. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sun, 08 Jul 2018 00:00:26 +0300, Marko Rauhamaa wrote: > Ian Kelly : >> the leaning of the devs seems to be to refrain from documenting it and >> instead document that *no* operations are guaranteed atomic. > > I believe that to be wise. Otherwise, Python would limit its future > implementation options. o_O By that logic, one should say we shouldn't document the semantics of operations, so we don't limit the implementation. "What does list.sort do?" "It currently sorts the list using a stable comparison sort, but you can't rely on that it, because we didn't want to limit the implementation." Sorting is guaranteed to be stable. Does that limit the implementation options? Yes, of course it does. So does the fact that it *sorts* -- we're limited to implementations which *sort*, and specifically comparison sorts (not, for example, radix sorts). Changing the implementation to a radix sort that only works on ints would break things. So would changing the implementation to something which empties the list of all elements. ("The empty list is always sorted.") Stable semantics are more important than freedom to change implementation. Whether or not an operation is thread-safe, or atomic, is part of the semantics of the operation. It's not merely a performance issue, and we ought to treat it will a bit more respect. I'm not saying that everything needs to be documented as thread-safe or not (for at least one release, Python's sorting actually was stable before it was documented as such) but the gung-ho attitude that safety is just an implementation detail should be resisted. Changing implementations from one which is thread safe to one which is not can break people's code, and shouldn't be done on a whim. Especially since such breakage could be subtle, hard to notice, harder to track down, and even harder still to fix. > The only thing Python should guarantee is that the data structures stay > "coherent" under race conditions. In other words, there cannot be a > segmentation fault. For example, if two threads executed this code in > parallel: > > global i > i = 1 > i += 1 > > a legal end result could be that i contains the string "impossible". That wouldn't be coherent. The only coherent results are that i could equal either 2 or 3: Possibility 1: i = 1 # thread 1 i += 1 # thread 1 i = 1 # thread 2 i += 1 # thread 2 assert i == 2 Possibility 2: i = 1 # thread 1 i = 1 # thread 2 i += 1 # thread 1 i += 1 # thread 2 assert i == 3 Executing statements out of order is impossible, even in threaded code. Redefining the meaning of the integer literal 1 is impossible (ignoring unsupported shenanigans with ctypes). Monkey-patching the int __iadd__ method is impossible. So there is no coherent way to get a result of "impossible" from just adding 1 to 1 in any coherent implementation of Python. -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
Ian Kelly : > the leaning of the devs seems to be to refrain from documenting it and > instead document that *no* operations are guaranteed atomic. I believe that to be wise. Otherwise, Python would limit its future implementation options. The only thing Python should guarantee is that the data structures stay "coherent" under race conditions. In other words, there cannot be a segmentation fault. For example, if two threads executed this code in parallel: global i i = 1 i += 1 a legal end result could be that i contains the string "impossible". Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On 07/06/2018 10:51 AM, INADA Naoki wrote: > On Sat, Jul 7, 2018 at 2:49 AM Steven D'Aprano wrote: >> I have a dict with string keys: >> >> --> D = {'a': None, 'b': None} >> How do I do a thread-safe insertion if, and only if, the key isn't >> already there? > D.setdefault('c', None) This is how Enum avoids race conditions when setting up the re.RegexType enumeration. -- ~Ethan~ -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Fri, Jul 6, 2018 at 11:56 AM INADA Naoki wrote: > > D.setdefault('c', None) Not guaranteed to be atomic. I think the only safe way to do it is with a Lock. -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sat, Jul 7, 2018 at 6:49 AM Marko Rauhamaa wrote: > Is that guaranteed to be thread-safe? The documentation ( s://docs.python.org/3/library/stdtypes.html#dict.setdefault>) makes no > such promise. It's guaranteed to be thread-safe because all of Python's core containers are thread safe (in as far as they document behaviors/invariants, which implicitly also hold in multithreaded code -- Python does not take the approach other languages do of "thread-compatible" containers that have undefined behavior if mutated from multiple threads simultaneously). It isn't guaranteed to be _atomic_ by the documentation, but I bet no Python implementation would make dict.setdefault non-atomic. There's no good description of the threading rules for Python data structures anywhere. ISTR there was a proposal to give Python some defined rules around thread-safety a couple of years ago (to help with things like GIL-less python projects), but I guess nothing ever came of it. -- Devin -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sat, Jul 7, 2018 at 8:03 AM Stefan Behnel wrote: > > Marko Rauhamaa schrieb am 07.07.2018 um 15:41: > > Steven D'Aprano : > >> On Sat, 07 Jul 2018 02:51:41 +0900, INADA Naoki wrote: > >>> D.setdefault('c', None) > >> > >> Oh that's clever! > > > > Is that guaranteed to be thread-safe? The documentation ( > s://docs.python.org/3/library/stdtypes.html#dict.setdefault>) makes no > > such promise. > > It's implemented in C and it's at least designed to avoid multiple lookups > and hash value calculations, which suggests that it's also thread-safe by > design (or by a side-effect of the design). Whether that's guaranteed, I > cannot say, but a change that makes it non-thread-safe would probably be > very controversial. It's only implemented in C if you're using CPython (and if it's the builtin dict type and not a subclass). If there's any chance that your code might run under any other interpreter than CPython, then you can't rely on the GIL for thread-safety. I would also point out https://bugs.python.org/issue25343. While some operations are known to be atomic (and therefore thread-safe), the leaning of the devs seems to be to refrain from documenting it and instead document that *no* operations are guaranteed atomic. > > At least __collectios_abc.py > > contains this method definition for MutableMapping: > > > > def setdefault(self, key, default=None): > > 'D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D' > > try: > > return self[key] > > except KeyError: > > self[key] = default > > return default > > > > There are more such non-thread-safe definitions. > > That's a different beast, because Python code can always be interrupted by > thread switches (between each byte code execution). C code cannot, unless > it starts executing byte code (e.g. for calculating a key's hash value) or > explicitly allows a thread switch at a given point. dict.setdefault does potentially call __hash__ and __eq__ on the key. Since this is part of the lookup I don't know whether it affects thread-safety as long as the key is properly hashable, but it does make it more difficult to reason about. I don't *think* that setdefault calls Py_DECREF, but if it did then that is another potential point of thread interruption. By contrast, using a mutex to guard accesses is definitely safe, and it's also self-documenting of the fact that thread-safety is a concern. -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
Marko Rauhamaa schrieb am 07.07.2018 um 15:41: > Steven D'Aprano : >> On Sat, 07 Jul 2018 02:51:41 +0900, INADA Naoki wrote: >>> D.setdefault('c', None) >> >> Oh that's clever! > > Is that guaranteed to be thread-safe? The documentation ( s://docs.python.org/3/library/stdtypes.html#dict.setdefault>) makes no > such promise. It's implemented in C and it's at least designed to avoid multiple lookups and hash value calculations, which suggests that it's also thread-safe by design (or by a side-effect of the design). Whether that's guaranteed, I cannot say, but a change that makes it non-thread-safe would probably be very controversial. > At least __collectios_abc.py > contains this method definition for MutableMapping: > > def setdefault(self, key, default=None): > 'D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D' > try: > return self[key] > except KeyError: > self[key] = default > return default > > There are more such non-thread-safe definitions. That's a different beast, because Python code can always be interrupted by thread switches (between each byte code execution). C code cannot, unless it starts executing byte code (e.g. for calculating a key's hash value) or explicitly allows a thread switch at a given point. Stefan -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
Steven D'Aprano : > On Sat, 07 Jul 2018 02:51:41 +0900, INADA Naoki wrote: >> D.setdefault('c', None) > > Oh that's clever! Is that guaranteed to be thread-safe? The documentation () makes no such promise. At least __collectios_abc.py contains this method definition for MutableMapping: def setdefault(self, key, default=None): 'D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D' try: return self[key] except KeyError: self[key] = default return default There are more such non-thread-safe definitions. Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
On Sat, 07 Jul 2018 02:51:41 +0900, INADA Naoki wrote: > D.setdefault('c', None) Oh that's clever! Thanks! -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list
Re: Thread-safe way to add a key to a dict only if it isn't already there?
D.setdefault('c', None) On Sat, Jul 7, 2018 at 2:49 AM Steven D'Aprano wrote: > > I have a dict with string keys: > > D = {'a': None, 'b': None} > > (the values don't matter for this question) and I want to add a key but > only if it isn't already there. If I do the obvious: > > if not 'c' in D: > D['c'] = None > > there's a Time Of Check to Time Of Use bug where some other thread could > conceivably insert 'c' into the dict between the check and the insertion. > > How do I do a thread-safe insertion if, and only if, the key isn't > already there? > > > > Thanks in advance, > > > > -- > Steven D'Aprano > "Ever since I learned about confirmation bias, I've been seeing > it everywhere." -- Jon Ronson > > -- > https://mail.python.org/mailman/listinfo/python-list -- INADA Naoki -- https://mail.python.org/mailman/listinfo/python-list
Thread-safe way to add a key to a dict only if it isn't already there?
I have a dict with string keys: D = {'a': None, 'b': None} (the values don't matter for this question) and I want to add a key but only if it isn't already there. If I do the obvious: if not 'c' in D: D['c'] = None there's a Time Of Check to Time Of Use bug where some other thread could conceivably insert 'c' into the dict between the check and the insertion. How do I do a thread-safe insertion if, and only if, the key isn't already there? Thanks in advance, -- Steven D'Aprano "Ever since I learned about confirmation bias, I've been seeing it everywhere." -- Jon Ronson -- https://mail.python.org/mailman/listinfo/python-list