Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Marko Rauhamaa
Chris Angelico :

> On Mon, Jul 9, 2018 at 5:18 AM, Marko Rauhamaa  wrote:
>> Chris Angelico :
>>> Are you assuming that Python's semantics are defined by the semantics
>>> of one possible implementation language?
>>
>> What are Python's semantics defined by? I've been using these:
>>
>>https://docs.python.org/3/reference/>
>>
>>https://docs.python.org/3/library/>
>>
>> Unfortunately, neither spec says anything about the atomicity of
>> dict.setdefault().
>>
>> Therefore, the application programmer must assume it is not atomic. In
>> fact, as brought up in this discussion, the consultation of
>> object.__hash__() and object.__eq__() almost guarantee the
>> *non*-atomicity of dict.setdefault().
>
> If by "atomic" you mean that absolutely no context switch can occur
> during setdefault, then it probably isn't. But the point of an atomic
> query/update operation is that there are exactly two possibilities:
>
> 1) The key did not exist in the dictionary. It now does, with the
> provided default, which was returned.
> 2) The key did exist in the dictionary. The provided default is
> ignored, and the previous value is returned.

This is a classic test-and-set race condition:

# Initially
assert "A" not in d

# Thread 1
d.setdefault("A", 1)

# Thread 2
d["A"] = 2

If dict.setdefault() is atomic, the end result in all timings must be:

   d["A"] == 2

However, if dict.setdefault() is not atomic, the end result may also be:

   d["A"] == 1

> Neither object.__hash__ nor object.__eq__ gives any way for this to be
> violated (unless you mess with the concept of "the key did exist in
> the dictionary" by breaking the definition of equality, but that's
> nothing to do with atomicity). Here's the definition of setdefault:
>
> setdefault(key, default=None, /) method of builtins.dict instance
> Insert key with a value of default if key is not in the dictionary.
>
> Return the value for key if key is in the dictionary, else default.
>
> Simple semantics. Straight-forward. Now, maybe there's a bug in some
> implementation whereby threads can violate this; but that would be a
> bug to be fixed, and nothing more. You should be able to assume that
> it will behave as stated.

Since atomicity is not guaranteed by the definition of the method, the
application programmer must be prepared for the end result to be 1 (or
even something completely different because the Language Specification
doesn't say anything about the outcome of data races).


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Chris Angelico
On Mon, Jul 9, 2018 at 5:18 AM, Marko Rauhamaa  wrote:
> Chris Angelico :
>> Are you assuming that Python's semantics are defined by the semantics
>> of one possible implementation language?
>
> What are Python's semantics defined by? I've been using these:
>
>https://docs.python.org/3/reference/>
>
>https://docs.python.org/3/library/>
>
> Unfortunately, neither spec says anything about the atomicity of
> dict.setdefault().
>
> Therefore, the application programmer must assume it is not atomic. In
> fact, as brought up in this discussion, the consultation of
> object.__hash__() and object.__eq__() almost guarantee the
> *non*-atomicity of dict.setdefault().

If by "atomic" you mean that absolutely no context switch can occur
during setdefault, then it probably isn't. But the point of an atomic
query/update operation is that there are exactly two possibilities:

1) The key did not exist in the dictionary. It now does, with the
provided default, which was returned.
2) The key did exist in the dictionary. The provided default is
ignored, and the previous value is returned.

Neither object.__hash__ nor object.__eq__ gives any way for this to be
violated (unless you mess with the concept of "the key did exist in
the dictionary" by breaking the definition of equality, but that's
nothing to do with atomicity). Here's the definition of setdefault:

setdefault(key, default=None, /) method of builtins.dict instance
Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.


Simple semantics. Straight-forward. Now, maybe there's a bug in some
implementation whereby threads can violate this; but that would be a
bug to be fixed, and nothing more. You should be able to assume that
it will behave as stated.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Marko Rauhamaa
Chris Angelico :
> Are you assuming that Python's semantics are defined by the semantics
> of one possible implementation language?

What are Python's semantics defined by? I've been using these:

   https://docs.python.org/3/reference/>

   https://docs.python.org/3/library/>

Unfortunately, neither spec says anything about the atomicity of
dict.setdefault().

Therefore, the application programmer must assume it is not atomic. In
fact, as brought up in this discussion, the consultation of
object.__hash__() and object.__eq__() almost guarantee the
*non*-atomicity of dict.setdefault().


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Chris Angelico
On Mon, Jul 9, 2018 at 4:57 AM, Marko Rauhamaa  wrote:
> Chris Angelico :
>
>> On Mon, Jul 9, 2018 at 2:11 AM, Marko Rauhamaa  wrote:
>>> MRAB :
 In C you'd declare 'quit' as 'volatile' to tell the compiler that it
 could change unexpectedly, so don't make that assumption.
>>>
>>> C is an even tougher case. Even if the compiler kept on checking a
>>> volatile value, the CPU might never propagate the cache content to
>>> the other core. You'd need a memory barrier. In Java, "volatile"
>>> effectively creates a memory barrier, but in C (and C++) it does not.
>>> In C you need something like a mutex to see the effects of other
>>> threads running.
>>>
>>> (BTW, I think that's a terrible thing for the C standards committee to
>>> specify.)
>>
>> None of this has any impact on Python whatsoever.
>
> [citation needed]
>

Why? You might as well say "in Blurple, all integers greater than 5
compare equal to each other, and it's possible to implement a Python
interpreter in Blurple, therefore we can't trust integer comparisons
in Python". It's ridiculous to consider. The languages are completely
independent.

Are you assuming that Python's semantics are defined by the semantics
of one possible implementation language?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Marko Rauhamaa
Chris Angelico :

> On Mon, Jul 9, 2018 at 2:11 AM, Marko Rauhamaa  wrote:
>> MRAB :
>>> In C you'd declare 'quit' as 'volatile' to tell the compiler that it
>>> could change unexpectedly, so don't make that assumption.
>>
>> C is an even tougher case. Even if the compiler kept on checking a
>> volatile value, the CPU might never propagate the cache content to
>> the other core. You'd need a memory barrier. In Java, "volatile"
>> effectively creates a memory barrier, but in C (and C++) it does not.
>> In C you need something like a mutex to see the effects of other
>> threads running.
>>
>> (BTW, I think that's a terrible thing for the C standards committee to
>> specify.)
>
> None of this has any impact on Python whatsoever.

[citation needed]


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Steven D'Aprano
On Sun, 08 Jul 2018 16:37:11 +0100, MRAB wrote:

> On 2018-07-08 14:38, Steven D'Aprano wrote:
>> On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote:
>> 
> [snip]
>>> More importantly, this loop may never finish:
>>> 
>>> # Initially
>>> quit = False
>>> 
>>> # Thread 1
>>> global quit
>>> while not quit:
>>> time.sleep(1)
>>> 
>>> # Thread 2
>>> global quit
>>> quit = True
>> 
>> Assuming that thread 2 actually runs *at some point*, I don't see how
>> that can't terminate. Neither thread sets quit to False, so provided
>> thread 2 runs at all, it has to terminate.
>> 
> [snip]
> 
> The compiler could look at the code for thread 1 and see that 'quit' is
> never assigned to, meaning that it could be "optimised" to:
> 
>  global quit
>  if not quit:
>  while True:
>  time.sleep(1)


I'm glad you put "optimized" in scare quotes there, because any optimizer 
that did that to code that runs in a thread is buggy. The re-write has 
changed the semantics of the code.

Of course a compiler "could" do anything. If it re-wrote the code to 
instead perform:

while True:
print(math.sin(random.random()+1)

instead, we'd have no trouble recognising that the compiler has changed 
the semantics of our code to something we didn't write, and in just about 
every language apart from C, we would rightly call it a compiler bug.

If I write "Do X", and the compiler instead executes "Do Y", that's a 
compiler bug. The compiler has one job: to take my source code and 
convert it to something which can be executed, and if it cannot do that 
faithfully, it is buggy.

C is the special case: C programmers routinely blame *themselves* when 
the compiler changes "Do X" to "Do Y", because the standard says it is 
permitted to do anything it likes in the event of undefined behaviour. 
Talk about victims coming to rationalise their own abuse and identify 
with their abuser.

W.A. Wulf might have been talking about C compilers when he wrote:

"More computing sins are committed in the name of efficiency
(without necessarily achieving it) than for any other single
reason — including blind stupidity."


> In C you'd declare 'quit' as 'volatile' to tell the compiler that it
> could change unexpectedly, so don't make that assumption.

You shouldn't need to tell the compiler to assume that the variable could 
change. In multi-threaded code, there's no justification for assuming the 
variable can't change unless you've done a whole-program analysis and can 
prove that *no* thread ever writes to that variable.

Even in single-threaded code, I think that optimizations which change 
execution order are dubious. There's far too much opportunity for "it's 
not a bug, because the specification says we can deny it is a bug" 
faults. It's simply *bad engineering practice*.

When engineers design a bridge or a road or a building, the builders have 
to faithfully follow the engineer's design, unless the design itself 
explicitly allows them to make substitutions.

"The blueprints say these supporting pillars have to be filled with steel-
reinforced concrete. How about we just fill them with empty cans instead, 
what could possibly go wrong?"

http://shanghaiist.com/2016/02/08/taiwan_earthquake_building_collapse/


(Disclaimer: there seems to be some controversy over whether the cans 
were actually in supporting columns or not. But the point still stands.)


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Steven D'Aprano
On Sun, 08 Jul 2018 19:35:55 +0300, Marko Rauhamaa wrote:

> Steven D'Aprano :
>> On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote:
>>> PS My example with "impossible" being the result of a racy integer
>>> operation is of course unlikely but could be the outcome if the Python
>>> runtime reorganized its object cache on the fly (in a hypothetical
>>> implementation).
>>
>> That would be a cache bug :-)
>>
>> Not every interpreter bug should be considered the caller's fault :-)
> 
> Whether it's a bug or not depends on the language specification.

"It is true that if you turn the indicator on at the same time that you 
turn the windshield wipers on, the car will explode in an enormous 
fireball releasing sufficient poisonous gases to kill everyone in a 
fifteen block radius. But we never said that it was safe to turn the 
indicator and windshield wipers on at the same time, so it's not a bug, 
its operator error."

Compiler writers and language designers need to stop hiding behind 
specifications. No other industry so routinely tries (and too often 
succeeds) in playing the "but we never said our product wouldn't set you 
on fire" card with so little justification.



> Java
> has a very clear definition for correct synchronization. It's not so
> clear on what could happen in the event of a data race; it's not
> entirely unspecified, but the language spec admonishes the application
> to brace itself for surprising effects.

Incorrect synchronisation is irrelevant in this example. If the object 
cache is *ever* in an invalid state, such that (as per your example) the 
literal 1 (an int) could be evaluated as "impossible" (a string), that is 
a bug in the object management. Threads or no threads.

The only excuse would be if the caller directly messed with the cache in 
an unsupported fashion (say, with ctypes). 


Software needs "fit for purpose" laws and end-user warranties with teeth. 
If that means that the software industry spends the next thirty years 
fixing bugs instead of adding features, that's a good thing.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Chris Angelico
On Mon, Jul 9, 2018 at 2:11 AM, Marko Rauhamaa  wrote:
> MRAB :
>> On 2018-07-08 14:38, Steven D'Aprano wrote:
>>> On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote:
>>>
>> [snip]
 More importantly, this loop may never finish:

 # Initially
 quit = False

 # Thread 1
 global quit
 while not quit:
 time.sleep(1)

 # Thread 2
 global quit
 quit = True
>>>
>>> Assuming that thread 2 actually runs *at some point*, I don't see how
>>> that can't terminate. Neither thread sets quit to False, so provided
>>> thread 2 runs at all, it has to terminate.
>>>
>> [snip]
>>
>> The compiler could look at the code for thread 1 and see that 'quit' is
>> never assigned to, meaning that it could be "optimised" to:
>>
>> global quit
>> if not quit:
>> while True:
>> time.sleep(1)
>>
>> In C you'd declare 'quit' as 'volatile' to tell the compiler that it
>> could change unexpectedly, so don't make that assumption.
>
> C is an even tougher case. Even if the compiler kept on checking a
> volatile value, the CPU might never propagate the cache content to the
> other core. You'd need a memory barrier. In Java, "volatile" effectively
> creates a memory barrier, but in C (and C++) it does not. In C you need
> something like a mutex to see the effects of other threads running.
>
> (BTW, I think that's a terrible thing for the C standards committee to
> specify.)

None of this has any impact on Python whatsoever.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Marko Rauhamaa
Steven D'Aprano :
> On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote:
>> PS My example with "impossible" being the result of a racy integer
>> operation is of course unlikely but could be the outcome if the Python
>> runtime reorganized its object cache on the fly (in a hypothetical
>> implementation).
>
> That would be a cache bug :-)
>
> Not every interpreter bug should be considered the caller's fault :-)

Whether it's a bug or not depends on the language specification. Java
has a very clear definition for correct synchronization. It's not so
clear on what could happen in the event of a data race; it's not
entirely unspecified, but the language spec admonishes the application
to brace itself for surprising effects.

If the Python Language Specification hasn't specified the effects of
incorrect synchronization so we can rightly suppose that the results are
unspecified.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Marko Rauhamaa
MRAB :
> On 2018-07-08 14:38, Steven D'Aprano wrote:
>> On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote:
>>
> [snip]
>>> More importantly, this loop may never finish:
>>>
>>> # Initially
>>> quit = False
>>>
>>> # Thread 1
>>> global quit
>>> while not quit:
>>> time.sleep(1)
>>>
>>> # Thread 2
>>> global quit
>>> quit = True
>>
>> Assuming that thread 2 actually runs *at some point*, I don't see how
>> that can't terminate. Neither thread sets quit to False, so provided
>> thread 2 runs at all, it has to terminate.
>>
> [snip]
>
> The compiler could look at the code for thread 1 and see that 'quit' is
> never assigned to, meaning that it could be "optimised" to:
>
> global quit
> if not quit:
> while True:
> time.sleep(1)
>
> In C you'd declare 'quit' as 'volatile' to tell the compiler that it
> could change unexpectedly, so don't make that assumption.

C is an even tougher case. Even if the compiler kept on checking a
volatile value, the CPU might never propagate the cache content to the
other core. You'd need a memory barrier. In Java, "volatile" effectively
creates a memory barrier, but in C (and C++) it does not. In C you need
something like a mutex to see the effects of other threads running.

(BTW, I think that's a terrible thing for the C standards committee to
specify.)


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread MRAB

On 2018-07-08 14:38, Steven D'Aprano wrote:

On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote:


[snip]

More importantly, this loop may never finish:

# Initially
quit = False

# Thread 1
global quit
while not quit:
time.sleep(1)

# Thread 2
global quit
quit = True


Assuming that thread 2 actually runs *at some point*, I don't see how
that can't terminate. Neither thread sets quit to False, so provided
thread 2 runs at all, it has to terminate.


[snip]

The compiler could look at the code for thread 1 and see that 'quit' is 
never assigned to, meaning that it could be "optimised" to:


global quit
if not quit:
while True:
time.sleep(1)

In C you'd declare 'quit' as 'volatile' to tell the compiler that it 
could change unexpectedly, so don't make that assumption.

--
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Steven D'Aprano
On Sun, 08 Jul 2018 14:11:58 +0300, Marko Rauhamaa wrote:

> Steven D'Aprano :
>> Changing implementations from one which is thread safe to one which is
>> not can break people's code, and shouldn't be done on a whim.
>> Especially since such breakage could be subtle, hard to notice, harder
>> to track down, and even harder still to fix.
> 
> Java's HotSpot does it all the time, and it did result in code breakage
> -- although the code was broken to begin with.

I said "shouldn't be done", rather than claiming that was the situation 
right now with all compilers.

But I'm willing to give a little bit of slack to aggressively optimizing 
compilers, provided they come with a warning.


>> So there is no coherent way to get a result of "impossible" from just
>> adding 1 to 1 in any coherent implementation of Python.
> 
> Back to Java, there was a real case of 64-bit integer operations not
> being atomic on 32-bit machines. Mixing up upper and lower halves
> between threads could result in really weird evaluations.

Oh don't get me wrong, I agree with you that threading can result in 
strange, unpredictable errors.

That's why I try not to use threading. I have no illusions about my 
ability to debug those sorts of problems.


> More importantly, this loop may never finish:
> 
> # Initially
> quit = False
> 
> # Thread 1
> global quit
> while not quit:
> time.sleep(1)
> 
> # Thread 2
> global quit
> quit = True

Assuming that thread 2 actually runs *at some point*, I don't see how 
that can't terminate. Neither thread sets quit to False, so provided 
thread 2 runs at all, it has to terminate.


I suppose if the threading implementation *could* fall back to sequential 
code (thread 2 doesn't run until thread 1 finishes, which it never 
does...) that outcome is possible. But it would have to be a pretty poor 
implementation.

Now if you said there were fifty threads (aside from the main thread, 
which is guaranteed to run) all reading quit, and only thread 50 ever 
assigns to it, I'd believe that perhaps thread 50 never gets a chance to 
run. But with just two threads? Explain please.


> That's the reality in Java and C. I see no reason why that wouldn't be
> the reality in Python as well -- unless the language specification said
> otherwise.

Because the Python core developers care more about correctness than speed.


> Marko
> 
> PS My example with "impossible" being the result of a racy integer
> operation is of course unlikely but could be the outcome if the Python
> runtime reorganized its object cache on the fly (in a hypothetical
> implementation).

That would be a cache bug :-)

Not every interpreter bug should be considered the caller's fault :-)


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Marko Rauhamaa
Steven D'Aprano :
> Changing implementations from one which is thread safe to one which is
> not can break people's code, and shouldn't be done on a whim.
> Especially since such breakage could be subtle, hard to notice, harder
> to track down, and even harder still to fix.

Java's HotSpot does it all the time, and it did result in code
breakage -- although the code was broken to begin with.

> So there is no coherent way to get a result of "impossible" from just
> adding 1 to 1 in any coherent implementation of Python.

Back to Java, there was a real case of 64-bit integer operations not
being atomic on 32-bit machines. Mixing up upper and lower halves
between threads could result in really weird evaluations.

More importantly, this loop may never finish:

# Initially
quit = False

# Thread 1
global quit
while not quit:
time.sleep(1)

# Thread 2
global quit
quit = True

That's the reality in Java and C. I see no reason why that wouldn't be
the reality in Python as well -- unless the language specification said
otherwise.


Marko

PS My example with "impossible" being the result of a racy integer
operation is of course unlikely but could be the outcome if the Python
runtime reorganized its object cache on the fly (in a hypothetical
implementation).
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Steven D'Aprano
On Sun, 08 Jul 2018 10:52:15 +0300, Marko Rauhamaa wrote:

> You are on the right track, Chris, but you are still deducing behavior
> from a particular implementation. For example Java's definition is
> approximately the one I give above:
> 
>Without correct synchronization, very strange, confusing and
>counterintuitive behaviors are possible.

Indeed, and that applies to all languages with threading, including 
Python. But notice that it doesn't say:

Without correct synchronization, impossible things are possible.


>https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.htm
>l#jls-17.4.5>

You seem to have read this as "Java threading can do impossible things", 
but I think the correct reading of the article is "Here are the possible 
things that Java threading can do to mess with your mind". That's a much 
smaller subset of things which can go wrong.

(But still pretty big.)

See, for example Example 17.4-1, which explicitly shows that the Java 
execution model is permitted to reorder statements:

r1 = A;  B = 1;

may be reordered to 

B = 1;  r1 = A;

That's harmless in single-threaded code, but risky in multi-threaded 
code. The result is that in the example given, r1 could end up being set 
to 1 instead of the expected values -- but it cannot possibly be set to 
the string "impossible", since that goes against the Java type system.

In Python, there's no possible order of operations of assigning 1 and 
adding 1 that could result in a string. No Python compiler or interpreter 
is permitted to evaluate `1 + 1` or `2 + 1` as the string "impossible", 
even if threads are involved.

(That's not to say that threads might not assign a string to the variable 
we expected to contain 2, but it would have to do so by assignment, not 
by mucking about with the execution order of integer addition.)


In the case of Example 17.4-1, starting with A = B = 0, two threads 
execute:

Thread 1:r2 = A; B = 1; 
Thread 2:r1 = B; A = 2;

Java compilers are permitted to inspect each thread *in isolation*, 
decide that the order of lines is invisible to the caller, and so re-
order them. The result of that is that those four lines can be executed 
in any permutation, giving a total of 24 separate possibilities.

py> from itertools import permutations
py> instructions = ["r2 = A;", "B = 1;", "r1 = B;", "A = 2;"]
py> len(list(permutations(instructions)))
24


But Python (at least for now) cannot do this. It has a more deterministic 
execution order which guarantees top-to-bottom execution of Python 
statements. That implies that there are only *six* possible execution 
orders of those two threads (which is still enough to add non-determinism 
to your code!) not twenty-four:

r2 = A;  B = 1;  r1 = B;  A = 2;
r2 = A;  r1 = B;  B = 1;  A = 2;
r2 = A;  r1 = B;  A = 2;  B = 1;
r1 = B;  A = 2;  r2 = A;  B = 1;
r1 = B;  r2 = A;  A = 2;  B = 1;
r1 = B;  r2 = A;  A = 2;  B = 1;

and that remains true even if the underlying implementation is written in 
Java, or Malbolge for that matter.


> And we know that Python has been implemented using Java...

That's irrelevant. A legal Python does not inherit the quirks of the 
underlying implementation -- it must still follow the rules of the 
language, or it isn't Python. Java is statically typed, with machine 
ints, Python is dynamically typed, with no machine ints.

Relevant: 

http://doc.pypy.org/en/latest/cpython_differences.html

http://docs.micropython.org/en/latest/unix/genrst/index.html


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-08 Thread Marko Rauhamaa
Chris Angelico :

> On Sun, Jul 8, 2018 at 11:04 AM, Steven D'Aprano
>  wrote:
>>> The only thing Python should guarantee is that the data structures stay
>>> "coherent" under race conditions. In other words, there cannot be a
>>> segmentation fault. For example, if two threads executed this code in
>>> parallel:
>>>
>>> global i
>>> i = 1
>>> i += 1
>>>
>>> a legal end result could be that i contains the string "impossible".
>>
>> That wouldn't be coherent. The only coherent results are that i could
>> equal either 2 or 3:
>
> Python threads don't switch only between lines of code, so the actual
> interaction is a bit more complicated than you say. [...]
>
> But you're absolutely right that there are only a small handful of
> plausible results, even with threading involved.

You are on the right track, Chris, but you are still deducing behavior
from a particular implementation. For example Java's definition is
approximately the one I give above:

   Without correct synchronization, very strange, confusing and
   counterintuitive behaviors are possible.

   https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.htm
   l#jls-17.4.5>

And we know that Python has been implemented using Java...


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Chris Angelico
On Sun, Jul 8, 2018 at 1:58 PM, Steven D'Aprano
 wrote:
> On Sun, 08 Jul 2018 12:23:41 +1000, Chris Angelico wrote:
>
>>> Some people, when confronted with a problem, say, "I know, I'll use
>>> threads". Nothhtwo probw ey ave lems.
>>
>> Right. Now they have to deal with interleaving, but that's all. And
>> honestly, MOST CODE wouldn't notice interleaving; it's only when you
>> change (either by rebinding or by mutating) something that can be seen
>> by multiple threads. Which basically means "mutable globals are a risk,
>> pretty much everything else is safe".
>
> Its not just globals though. Any mutable object shared between two
> threads is potentially a risk. Globals are just the most obvious example.
>

Right; by "basically" I mean "yes there are others but this is what
matters". The technically-accurate part is in the previous sentence:
"can be seen by multiple threads".

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Steven D'Aprano
On Sun, 08 Jul 2018 12:23:41 +1000, Chris Angelico wrote:

>> Some people, when confronted with a problem, say, "I know, I'll use
>> threads". Nothhtwo probw ey ave lems.
> 
> Right. Now they have to deal with interleaving, but that's all. And
> honestly, MOST CODE wouldn't notice interleaving; it's only when you
> change (either by rebinding or by mutating) something that can be seen
> by multiple threads. Which basically means "mutable globals are a risk,
> pretty much everything else is safe".

Its not just globals though. Any mutable object shared between two 
threads is potentially a risk. Globals are just the most obvious example.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Chris Angelico
On Sun, Jul 8, 2018 at 12:12 PM, Steven D'Aprano
 wrote:
> On Sun, 08 Jul 2018 11:15:17 +1000, Chris Angelico wrote:
>
> [...]
>> Python threads don't switch only between lines of code,
>
> As I understand it, there could be a switch between any two byte codes,
> or maybe only between certain bytes codes. But certain more fine grained
> than just between lines of code.
>
>
>> so the actual
>> interaction is a bit more complicated than you say. In CPython, the
>> increment operation is:
>>
>>   3   0 LOAD_GLOBAL  0 (i)
>>   2 LOAD_CONST   1 (1)
>>   4 INPLACE_ADD
>>   6 STORE_GLOBAL 0 (i)
>>
>> A context switch could happen between any pair of statements.
>
> If you actually mean *statements* as opposed to byte codes, then the only
> place there could be a switch would be either before the LOAD_GLOBAL or
> after the STORE_GLOBAL (given that i is a built-in int and cannot have a
> custom __iadd__ method).
>
> Is that what you mean?

I may be wrong, but I always assume that a context switch could happen
between any two bytecode operations - or, if you're reading the
disassembly, between any two lines *of disassembly*. So there could be
a switch before LOAD_GLOBAL, a switch between that and LOAD_CONST,
another switch before the ADD, another before the STORE, and another
right at the end. Well, there won't be *all* of those, but there could
be any of them.

This might not be entirely correct - there might be pairs that are
functionally atomic - but it's the safe assumption.

>> For instance, if you replace "i
>> += 1" with "i += i", to double the value, you'll get this:
>>
>>   3   0 LOAD_GLOBAL  0 (i)
>>   2 LOAD_GLOBAL  0 (i)
>>   4 INPLACE_ADD
>>   6 STORE_GLOBAL 0 (i)
>>
>> and that could potentially have both of them load the initial value,
>> then one of them runs to completion, and then the other loads the result
>> - so it'll add 1 and 2 and have a result of 3, rather than 2 or 4.
>
> Some people, when confronted with a problem, say, "I know, I'll use
> threads". Nothhtwo probw ey ave lems.

Right. Now they have to deal with interleaving, but that's all. And
honestly, MOST CODE wouldn't notice interleaving; it's only when you
change (either by rebinding or by mutating) something that can be seen
by multiple threads. Which basically means "mutable globals are a
risk, pretty much everything else is safe".

>> But you're absolutely right that there are only a small handful of
>> plausible results, even with threading involved.
>
> Indeed. Even though threading is non-deterministic, it isn't *entirely*
> unconstrained.
>

Yeah. Quite far from it, in fact. Python threading is well-defined and
fairly easy to work with. Only in a handful of operations do you need
to worry about atomicity - like the one that started this thread.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Steven D'Aprano
On Sun, 08 Jul 2018 11:15:17 +1000, Chris Angelico wrote:

[...]
> Python threads don't switch only between lines of code, 

As I understand it, there could be a switch between any two byte codes, 
or maybe only between certain bytes codes. But certain more fine grained 
than just between lines of code.


> so the actual
> interaction is a bit more complicated than you say. In CPython, the
> increment operation is:
> 
>   3   0 LOAD_GLOBAL  0 (i)
>   2 LOAD_CONST   1 (1)
>   4 INPLACE_ADD
>   6 STORE_GLOBAL 0 (i)
> 
> A context switch could happen between any pair of statements.

If you actually mean *statements* as opposed to byte codes, then the only 
place there could be a switch would be either before the LOAD_GLOBAL or 
after the STORE_GLOBAL (given that i is a built-in int and cannot have a 
custom __iadd__ method).

Is that what you mean?


> In this
> particular example, the end result doesn't change - coherent results are
> 2 and 3, nothing else - but in other situations, there may be times when
> the separate steps might be significant.

Fortunately I wasn't talking about other code snippets, only the one 
shown :-)


> For instance, if you replace "i
> += 1" with "i += i", to double the value, you'll get this:
> 
>   3   0 LOAD_GLOBAL  0 (i)
>   2 LOAD_GLOBAL  0 (i)
>   4 INPLACE_ADD
>   6 STORE_GLOBAL 0 (i)
> 
> and that could potentially have both of them load the initial value,
> then one of them runs to completion, and then the other loads the result
> - so it'll add 1 and 2 and have a result of 3, rather than 2 or 4.

Some people, when confronted with a problem, say, "I know, I'll use 
threads". Nothhtwo probw ey ave lems.



> But you're absolutely right that there are only a small handful of
> plausible results, even with threading involved.

Indeed. Even though threading is non-deterministic, it isn't *entirely* 
unconstrained.



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Chris Angelico
On Sun, Jul 8, 2018 at 11:04 AM, Steven D'Aprano
 wrote:
>> The only thing Python should guarantee is that the data structures stay
>> "coherent" under race conditions. In other words, there cannot be a
>> segmentation fault. For example, if two threads executed this code in
>> parallel:
>>
>> global i
>> i = 1
>> i += 1
>>
>> a legal end result could be that i contains the string "impossible".
>
> That wouldn't be coherent. The only coherent results are that i could
> equal either 2 or 3:
>
> Possibility 1:
>
> i = 1   # thread 1
> i += 1  # thread 1
> i = 1   # thread 2
> i += 1  # thread 2
> assert i == 2
>
> Possibility 2:
>
> i = 1   # thread 1
> i = 1   # thread 2
> i += 1  # thread 1
> i += 1  # thread 2
> assert i == 3
>
>
> Executing statements out of order is impossible, even in threaded code.
> Redefining the meaning of the integer literal 1 is impossible (ignoring
> unsupported shenanigans with ctypes). Monkey-patching the int __iadd__
> method is impossible. So there is no coherent way to get a result of
> "impossible" from just adding 1 to 1 in any coherent implementation of
> Python.

Python threads don't switch only between lines of code, so the actual
interaction is a bit more complicated than you say. In CPython, the
increment operation is:

  3   0 LOAD_GLOBAL  0 (i)
  2 LOAD_CONST   1 (1)
  4 INPLACE_ADD
  6 STORE_GLOBAL 0 (i)

A context switch could happen between any pair of statements. In this
particular example, the end result doesn't change - coherent results
are 2 and 3, nothing else - but in other situations, there may be
times when the separate steps might be significant. For instance, if
you replace "i += 1" with "i += i", to double the value, you'll get
this:

  3   0 LOAD_GLOBAL  0 (i)
  2 LOAD_GLOBAL  0 (i)
  4 INPLACE_ADD
  6 STORE_GLOBAL 0 (i)

and that could potentially have both of them load the initial value,
then one of them runs to completion, and then the other loads the
result - so it'll add 1 and 2 and have a result of 3, rather than 2 or
4.

But you're absolutely right that there are only a small handful of
plausible results, even with threading involved.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Steven D'Aprano
On Sun, 08 Jul 2018 00:00:26 +0300, Marko Rauhamaa wrote:

> Ian Kelly :
>> the leaning of the devs seems to be to refrain from documenting it and
>> instead document that *no* operations are guaranteed atomic.
> 
> I believe that to be wise. Otherwise, Python would limit its future
> implementation options.

o_O

By that logic, one should say we shouldn't document the semantics of 
operations, so we don't limit the implementation.

"What does list.sort do?"

"It currently sorts the list using a stable comparison sort, but you 
can't rely on that it, because we didn't want to limit the 
implementation."

Sorting is guaranteed to be stable. Does that limit the implementation 
options? Yes, of course it does. So does the fact that it *sorts* -- 
we're limited to implementations which *sort*, and specifically 
comparison sorts (not, for example, radix sorts).

Changing the implementation to a radix sort that only works on ints would 
break things. So would changing the implementation to something which 
empties the list of all elements. ("The empty list is always sorted.")

Stable semantics are more important than freedom to change implementation.

Whether or not an operation is thread-safe, or atomic, is part of the 
semantics of the operation. It's not merely a performance issue, and we 
ought to treat it will a bit more respect.

I'm not saying that everything needs to be documented as thread-safe or 
not (for at least one release, Python's sorting actually was stable 
before it was documented as such) but the gung-ho attitude that safety is 
just an implementation detail should be resisted.

Changing implementations from one which is thread safe to one which is 
not can break people's code, and shouldn't be done on a whim. Especially 
since such breakage could be subtle, hard to notice, harder to track 
down, and even harder still to fix.


> The only thing Python should guarantee is that the data structures stay
> "coherent" under race conditions. In other words, there cannot be a
> segmentation fault. For example, if two threads executed this code in
> parallel:
> 
> global i
> i = 1
> i += 1
> 
> a legal end result could be that i contains the string "impossible".

That wouldn't be coherent. The only coherent results are that i could 
equal either 2 or 3:

Possibility 1:

i = 1   # thread 1
i += 1  # thread 1
i = 1   # thread 2
i += 1  # thread 2
assert i == 2

Possibility 2:

i = 1   # thread 1
i = 1   # thread 2
i += 1  # thread 1
i += 1  # thread 2
assert i == 3


Executing statements out of order is impossible, even in threaded code. 
Redefining the meaning of the integer literal 1 is impossible (ignoring 
unsupported shenanigans with ctypes). Monkey-patching the int __iadd__ 
method is impossible. So there is no coherent way to get a result of 
"impossible" from just adding 1 to 1 in any coherent implementation of 
Python.


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Marko Rauhamaa
Ian Kelly :
> the leaning of the devs seems to be to refrain from documenting it and
> instead document that *no* operations are guaranteed atomic.

I believe that to be wise. Otherwise, Python would limit its future
implementation options.

The only thing Python should guarantee is that the data structures stay
"coherent" under race conditions. In other words, there cannot be a
segmentation fault. For example, if two threads executed this code in
parallel:

global i
i = 1
i += 1

a legal end result could be that i contains the string "impossible".


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Ethan Furman

On 07/06/2018 10:51 AM, INADA Naoki wrote:
> On Sat, Jul 7, 2018 at 2:49 AM Steven D'Aprano wrote:

>> I have a dict with string keys:
>>
>> --> D = {'a': None, 'b': None}

>> How do I do a thread-safe insertion if, and only if, the key isn't
>> already there?
>

D.setdefault('c', None)


This is how Enum avoids race conditions when setting up the re.RegexType 
enumeration.

--
~Ethan~
--
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Ian Kelly
On Fri, Jul 6, 2018 at 11:56 AM INADA Naoki  wrote:
>
> D.setdefault('c', None)

Not guaranteed to be atomic.

I think the only safe way to do it is with a Lock.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Devin Jeanpierre
On Sat, Jul 7, 2018 at 6:49 AM Marko Rauhamaa  wrote:
> Is that guaranteed to be thread-safe? The documentation ( s://docs.python.org/3/library/stdtypes.html#dict.setdefault>) makes no
> such promise.

It's guaranteed to be thread-safe because all of Python's core
containers are thread safe (in as far as they document
behaviors/invariants, which implicitly also hold in multithreaded code
-- Python does not take the approach other languages do of
"thread-compatible" containers that have undefined behavior if mutated
from multiple threads simultaneously). It isn't guaranteed to be
_atomic_ by the documentation, but I bet no Python implementation
would make dict.setdefault non-atomic.

There's no good description of the threading rules for Python data
structures anywhere. ISTR there was a proposal to give Python some
defined rules around thread-safety a couple of years ago (to help with
things like GIL-less python projects), but I guess nothing ever came
of it.

-- Devin
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Ian Kelly
On Sat, Jul 7, 2018 at 8:03 AM Stefan Behnel  wrote:
>
> Marko Rauhamaa schrieb am 07.07.2018 um 15:41:
> > Steven D'Aprano :
> >> On Sat, 07 Jul 2018 02:51:41 +0900, INADA Naoki wrote:
> >>> D.setdefault('c', None)
> >>
> >> Oh that's clever!
> >
> > Is that guaranteed to be thread-safe? The documentation ( > s://docs.python.org/3/library/stdtypes.html#dict.setdefault>) makes no
> > such promise.
>
> It's implemented in C and it's at least designed to avoid multiple lookups
> and hash value calculations, which suggests that it's also thread-safe by
> design (or by a side-effect of the design). Whether that's guaranteed, I
> cannot say, but a change that makes it non-thread-safe would probably be
> very controversial.

It's only implemented in C if you're using CPython (and if it's the
builtin dict type and not a subclass). If there's any chance that your
code might run under any other interpreter than CPython, then you
can't rely on the GIL for thread-safety. I would also point out
https://bugs.python.org/issue25343. While some operations are known to
be atomic (and therefore thread-safe), the leaning of the devs seems
to be to refrain from documenting it and instead document that *no*
operations are guaranteed atomic.

> > At least __collectios_abc.py
> > contains this method definition for MutableMapping:
> >
> > def setdefault(self, key, default=None):
> > 'D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D'
> > try:
> > return self[key]
> > except KeyError:
> > self[key] = default
> > return default
> >
> > There are more such non-thread-safe definitions.
>
> That's a different beast, because Python code can always be interrupted by
> thread switches (between each byte code execution). C code cannot, unless
> it starts executing byte code (e.g. for calculating a key's hash value) or
> explicitly allows a thread switch at a given point.

dict.setdefault does potentially call __hash__ and __eq__ on the key.
Since this is part of the lookup I don't know whether it affects
thread-safety as long as the key is properly hashable, but it does
make it more difficult to reason about. I don't *think* that
setdefault calls Py_DECREF, but if it did then that is another
potential point of thread interruption.

By contrast, using a mutex to guard accesses is definitely safe, and
it's also self-documenting of the fact that thread-safety is a
concern.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Stefan Behnel
Marko Rauhamaa schrieb am 07.07.2018 um 15:41:
> Steven D'Aprano :
>> On Sat, 07 Jul 2018 02:51:41 +0900, INADA Naoki wrote:
>>> D.setdefault('c', None)
>>
>> Oh that's clever!
> 
> Is that guaranteed to be thread-safe? The documentation ( s://docs.python.org/3/library/stdtypes.html#dict.setdefault>) makes no
> such promise.

It's implemented in C and it's at least designed to avoid multiple lookups
and hash value calculations, which suggests that it's also thread-safe by
design (or by a side-effect of the design). Whether that's guaranteed, I
cannot say, but a change that makes it non-thread-safe would probably be
very controversial.


> At least __collectios_abc.py
> contains this method definition for MutableMapping:
> 
> def setdefault(self, key, default=None):
> 'D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D'
> try:
> return self[key]
> except KeyError:
> self[key] = default
> return default
> 
> There are more such non-thread-safe definitions.

That's a different beast, because Python code can always be interrupted by
thread switches (between each byte code execution). C code cannot, unless
it starts executing byte code (e.g. for calculating a key's hash value) or
explicitly allows a thread switch at a given point.

Stefan

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-07 Thread Marko Rauhamaa
Steven D'Aprano :
> On Sat, 07 Jul 2018 02:51:41 +0900, INADA Naoki wrote:
>> D.setdefault('c', None)
>
> Oh that's clever!

Is that guaranteed to be thread-safe? The documentation () makes no
such promise.

At least __collectios_abc.py
contains this method definition for MutableMapping:

def setdefault(self, key, default=None):
'D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D'
try:
return self[key]
except KeyError:
self[key] = default
return default

There are more such non-thread-safe definitions.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-06 Thread Steven D'Aprano
On Sat, 07 Jul 2018 02:51:41 +0900, INADA Naoki wrote:

> D.setdefault('c', None)

Oh that's clever!

Thanks!



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-06 Thread INADA Naoki
D.setdefault('c', None)
On Sat, Jul 7, 2018 at 2:49 AM Steven D'Aprano
 wrote:
>
> I have a dict with string keys:
>
> D = {'a': None, 'b': None}
>
> (the values don't matter for this question) and I want to add a key but
> only if it isn't already there. If I do the obvious:
>
> if not 'c' in D:
> D['c'] = None
>
> there's a Time Of Check to Time Of Use bug where some other thread could
> conceivably insert 'c' into the dict between the check and the insertion.
>
> How do I do a thread-safe insertion if, and only if, the key isn't
> already there?
>
>
>
> Thanks in advance,
>
>
>
> --
> Steven D'Aprano
> "Ever since I learned about confirmation bias, I've been seeing
> it everywhere." -- Jon Ronson
>
> --
> https://mail.python.org/mailman/listinfo/python-list



-- 
INADA Naoki  
-- 
https://mail.python.org/mailman/listinfo/python-list


Thread-safe way to add a key to a dict only if it isn't already there?

2018-07-06 Thread Steven D'Aprano
I have a dict with string keys:

D = {'a': None, 'b': None}

(the values don't matter for this question) and I want to add a key but 
only if it isn't already there. If I do the obvious:

if not 'c' in D:
D['c'] = None

there's a Time Of Check to Time Of Use bug where some other thread could 
conceivably insert 'c' into the dict between the check and the insertion.

How do I do a thread-safe insertion if, and only if, the key isn't 
already there?



Thanks in advance,



-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list