TL;DR

If the `set.add` feature can return a flag stating whether or not the 
element was previously present or not cheaply and atomically, without 
slowing down the single-threaded case with locks, then that's great (as 
I already stated...) and this entire subthread about threading is 
irrelevant and you can stop reading now.

If not, then we should consider carefully the consequences of adding a 
feature that will either slow down the common single-threaded case, or 
potentially be an attractive nuisance in the multi-threaded case.


On Sat, Jun 27, 2020 at 06:46:47PM +1000, Chris Angelico wrote:

> I grew up with concurrency being a perfectly normal thing. To me,
> EVERYTHING is concurrent. (Back then, it was single-core CPUs, so
> technically most machine code instructions could be assumed to be
> atomic, but at any higher level, you could be preempted at any
> moment.)

Cool, but it ignores the point that when you access the file system, 
you cannot avoid concurrency because that's what the OS gives you whether 
you want it or not. That's not what happens with adding elements to a 
set, unless you are sharing it between threads.

(For the record, I too grew up with concurrency being a perfectly normal 
thing. Every desktop OS I've ever touched has had some form of multi- 
processing or another. Classic MacOS had cooperative multiprocessing and 
desk accessories, DOS had TSRs, and the first computer I used for 
serious coding was Unix.)


> > Unless you are in the [Quadrant of Hell](https://imgur.com/a/jjA52vI)
> > you don't even need to think about this for sets. There are no Time
> > Of Check to Time Of Use issues worth thinking about in the given
> > example, and frankly Chris I doubt that you surround ever set access
> > with locks in single-threaded code because TOCTOU is "stronger" (more
> > important?) than writing clear and simple code. And if you do, you're
> > probably wondering why Python is so slow *wink*
> 
> And that is a fundamental difference between our philosophies. You
> consider threads to be terrifyingly scary, 

Oh I do, sometimes I just start screaming at the very thought. I've even 
been known to wet myself.


> and shared mutable data to be "hell". I consider this to be the normal 
> state of things, and that whenever you care, you take care. It 
> actually doesn't come up all that often.

I'm sure you're very good at what you do, but deadlocks, resource 
starvation, non-deterministic execution order, race conditions, etc are 
genuine problems for people, and concurrent code is much harder to write 
correctly than sequential code, and even more difficult to debug. 
Stories like these are hardly rare:

http://thedailywtf.com/articles/Sprite_Threading

http://worsethanfailure.com/articles/Flossin_0x27__My_Threads

Other people have suggested that concurrent code is often riddled with 
race conditions:

"Because concurrency is so incredibly difficult, many code bases I have 
worked on have these sorts of bugsā€¦.everywhere"

https://medium.com/dm03514-tech-blog/golang-candidates-and-contexts-a-heuristic-approach-to-race-condition-detection-e2b230e70d08

and the consequences of race conditions have sometimes been 
catastrophic:

https://www.securityfocus.com/news/8412

https://en.wikipedia.org/wiki/Therac-25

which is why people much smarter than me have spent decades designing 
languages and execution models specifically to deal with concurrency and 
parallel computing safely:

https://en.wikipedia.org/wiki/List_of_concurrent_and_parallel_programming_languages

or simply recommending that, when possible, we ought to avoid mutable 
shared state (in the same way we avoid GOTOs and global variables and 
other anti-patterns):

https://jaxenter.com/akka-anti-patterns-shared-mutable-state-128827.html

https://exploringjs.com/deep-js/ch_shared-mutable-state.html

http://web.mit.edu/6.031/www/fa17/classes/20-thread-safety/

I'm not sure why this is so controversial. I'm making a moderate claim 
here, threading is much harder to get right than sequential code.


> > But I'll grant you that I wasn't thinking about threaded code. The
> > example given was single-threaded, but maybe a better example would
> > involve threading. But that supports my point: thinking concurrently
> > doesn't come naturally.
> 
> Nor does thinking algebraically, nor thinking like a database, nor
> thinking in terms of diamond inheritance and an MRO. These things have
> to be learned. Is it a problem to have to learn things?

Maybe. It depends on the things you have to learn and why you need to. 
Personally I don't feel that learning category theory and monads just to 
be able to print "Hello World" would be a good use of my time.

https://en.wikipedia.org/wiki/Monad_%28functional_programming%29

Please let's not over-generalise here, I'm not opposed to learning in 
general, I'm pointing out that writing concurrent-style code for 
something which is trivially thought of as "if the element has not been 
seen, add it" is not the most natural way of thinking about the task.


> > So `add` would have to be atomic at the C level. If it is already
> > atomic, great, that's one objection neutralised. But if not, then will
> > making it atomic have a performance impact?
> 
> Question: If it ISN'T atomic at the C level, what happens?

Sometimes it will return False when it should have returned True, or 
True when it should have returned False, and because of that wrong 
answer, people will have mysterious bugs, crashes, failures, and code 
that silently does the wrong thing.


[...]
> I would expect that set.add() is guaranteed
> to not add a duplicate, and that right there basically guarantees
> everything that I need.

Don't you also need the method to guarantee to return True only if 
the element was already in the set, and False only if the element was 
not in the set? That is the point of the proposed feature.

(Or maybe it was the other way around, I forget which way the flag is 
supposed to go.)

If you don't have that guarantee, then you either can't use this 
feature, or you have to use manual locks, in which case the benefit is 
significantly reduced.


[...]
> > > > This idiom works with sets, it works with lists, it works with trees, it
> > > > works with anything that supports membership testing and inserting into
> > > > a collection. It is the natural way to think about the task.
> > >
> > > So would "ensure this is here, and let me know if you created it".
> >
> > Are you proposing to add this to lists, deques, dicts, Mappings,
> > Sequences and all other collections? I think you'll need a PEP :-)
> 
> No, I'm not. I'm only talking about this as a concept in things that
> reject duplicates. So at best, it would be sets and dicts. Why on
> earth would you assume I would want this operation on a list?

I commented that the "check, then insert" idiom works with lists etc as 
well as sets, but the "insert and return a flag" idiom doesn't work 
with lists etc. You countered that by saying that the "insert and return 
flag" would work equally well. It can't work at all if it doesn't exist. 
So for it to work just as well for lists as it works for sets, it has to 
be added to lists.


> Are you THAT desperate for an argument?

Is that what we're having? I thought we were discussing the pros and 
cons of a proposed feature. That's what the purpose of Python-Ideas is, 
or at least was.

That's the second time you've made that comment. It would be more 
considerate, open and empathic to assume I'm having a good faith 
discussion instead of accusing me of trolling.


[...]
> > They have different semantics, and I can imagine cases where both would
> > be useful. Although I suppose option 1 is moot if sets don't over-write
> > the existing element.
> >
> 
> They don't have different semantics, so the distinction is irrelevant.

In one the add method is unconditionally called. In the other, the add 
method is only sometimes called. The difference in semantics is clear if 
you consider that this will affect not only builtin sets, but also 
subclasses which may have side-effects or perform other computations.


-- 
Steven
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JMUZEHMEXYNXP7MCQPIFZA3HFV6HG7SV/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to