> You’ve written an exact equivalent of the double-checked locking for
> singletons examples that broke Java 1.4 and C++03 and led to us having
> once functions in the first place.
> … but what about on Jython, or PyPy-STM, or a future GIL-less Python?
While I truly do appreciate your feedback on this idea, I’m really not clear on your line of reasoning here. What specifically do you propose would be the issue with the *Python* implementation? Are you proposing that under some Python implementations `cache = func()` could be… the result of half a function call? I could buy an issue with some implementations meaning that `cache` still appears as `sentinel` in specific situations, but that would constitute a pretty obvious bug in the implementation, one that would impact a _lot_ of other multithreaded code rather than being a glaring issue with this snippet. Both of the issues you’ve referenced are valid, but they are also rather specific to the languages they affect. I don’t believe they apply to Python.

> But you won’t be paying the overhead only on the first call, you’ll be
> paying it on all of the calls that come before the first one completed.

Sure, that was a typo. It should have read:

> Seems generally more correct, even in single-threaded cases, to pay the
> overhead only on the first call (or the first few calls if there is
> contention) if you want `call_once` semantics

I still think the point stands. With your two-separate-decorators approach you’re paying it on every call (see the sketch below for the contrast). As a general-purpose `call_once()` implementation I think the snippet works well, but obviously if you have some very specific use case where it’s not appropriate, then you are probably able to write a very specific and suitable decorator.
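To make the cost comparison concrete, here is a minimal sketch of what a “pay on every call” variant looks like, assuming it simply takes the lock unconditionally. The name `once_locked` and the exact shape are illustrative, not code from the thread:

```python
import functools
from threading import Lock

def once_locked(func):
    """Call-once decorator that acquires the lock on *every* call.

    A sketch for comparison only: there is no lock-free fast path, so it
    never relies on an unsynchronized read, but every call after the
    first still pays the cost of acquiring and releasing the lock.
    """
    sentinel = object()
    cache = sentinel
    lock = Lock()

    @functools.wraps(func)
    def _wrapper():
        nonlocal cache
        with lock:  # taken on every call, even long after initialization
            if cache is sentinel:
                cache = func()
            return cache

    return _wrapper
```

The double-checked snippet quoted below differs only in putting an unlocked `if cache is sentinel` check in front of the lock; the disagreement in this thread is about whether that lock-free fast path is guaranteed to be safe.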
> On 30 Apr 2020, at 03:09, Andrew Barnert <abarn...@yahoo.com> wrote:
>
> On Apr 29, 2020, at 11:15, Tom Forbes <t...@tomforb.es> wrote:
>>
>>> Thread 2 wakes up with the lock, calls the function, fills the cache,
>>> and releases the lock.
>>
>> What exactly would the issue be with this:
>>
>> ```
>> import functools
>> from threading import Lock
>>
>> def once(func):
>>     sentinel = object()
>>     cache = sentinel
>>     lock = Lock()
>>
>>     @functools.wraps(func)
>>     def _wrapper():
>>         nonlocal cache, lock, sentinel
>>         if cache is sentinel:
>>             with lock:
>>                 if cache is sentinel:
>>                     cache = func()
>>         return cache
>>
>>     return _wrapper
>> ```
>
> You’ve written an exact equivalent of the double-checked locking for
> singletons examples that broke Java 1.4 and C++03 and led to us having
> once functions in the first place.
>
> In both of those languages, and most others, there is no guarantee that
> the write to cache in thread 1 happens between the two reads from cache
> in thread 2. Which gives you the fun kind of bug where every few thousand
> runs you have corrupted data an hour later, or it works fine on your
> computer but crashes for one of your users because they have two CPUs
> that don’t share L2 cache while you have all your cores on the same die,
> or it works fine until you change some completely unrelated part of the
> code, etc.
>
> Java solved this by adding volatile variables in Java 5 (existing code
> was still broken, but just mark cache volatile and it’s fixed); C++11
> added a compiler-assisted call_once function (and added a memory model
> that allows them to specify exactly what happens and when, so that the
> desired behavior was actually guaranteeable). Newer languages learned
> from their experience and got it right the first time, rather than
> repeating the same mistake.
>
> Is there anything about Python’s memory model that guarantees it can’t
> happen in Python? I don’t think there _is_ a memory model. In CPython, or
> any GIL-based implementation, I _think_ it’s safe (the other thread can’t
> be running at the same time on a different core, so there can’t be a
> cache-coherency ordering issue between the cores, right?), but what about
> on Jython, or PyPy-STM, or a future GIL-less Python?
>
> And in both of those languages, double-checked locking is still nowhere
> near as efficient as using a local static.
>
>> Seems generally more correct, even in single-threaded cases, to pay the
>> overhead only on the first call if you want `call_once` semantics. Which
>> is why you would be using `call_once` in the first place?
>
> But you won’t be paying the overhead only on the first call, you’ll be
> paying it on all of the calls that come before the first one completed.
> That’s the whole point of the lock, after all (they have to wait until
> it’s ready, and they can’t possibly do that without the lock overhead).
> And for the next few afterward, because they’ll have gotten far enough to
> check even if they haven’t gotten far enough to get the lock, and there’s
> no way they can know they don’t need the lock. And for the next few after
> that, because unless the system only runs one thread at a time and
> synchronizes all of memory every time it switches threads, they may not
> see the write yet anyway.
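For anyone who wants to poke at this empirically, here is a small stress test, a sketch added here rather than something from the thread. It reproduces the double-checked `once` from the quoted snippet so it runs on its own. On CPython it should reliably report that `init()` ran once; the open question above is whether every implementation must give the same answer:

```python
import functools
import threading
import time

def once(func):
    # The double-checked snippet from the quoted message, reproduced
    # here so this sketch is self-contained.
    sentinel = object()
    cache = sentinel
    lock = threading.Lock()

    @functools.wraps(func)
    def _wrapper():
        nonlocal cache
        if cache is sentinel:          # unsynchronized fast-path read
            with lock:
                if cache is sentinel:  # re-check under the lock
                    cache = func()
        return cache

    return _wrapper

calls = 0

@once
def init():
    global calls
    calls += 1
    time.sleep(0.01)  # widen the race window a little
    return object()

threads = [threading.Thread(target=init) for _ in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("init() ran", calls, "time(s)")  # expected: 1 on CPython
```

A passing run only fails to find a bug, of course; the memory-ordering concern raised above is precisely the kind that passes every test on one machine and then misbehaves on another.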