[issue35780] Recheck logic in the C version of the lru_cache()

2019-02-01 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
pull_requests: +11630, 11631, 11632

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-02-01 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
pull_requests: +11630, 11631

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-02-01 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
pull_requests: +11630

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-31 Thread miss-islington


Change by miss-islington :


--
pull_requests: +11579, 11580

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-31 Thread miss-islington


Change by miss-islington :


--
pull_requests: +11579, 11580, 11581

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-31 Thread miss-islington


Change by miss-islington :


--
pull_requests: +11579

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-31 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
pull_requests: +11576, 11577

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-31 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
pull_requests: +11576

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-31 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
pull_requests: +11576, 11577, 11578

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-26 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-26 Thread Raymond Hettinger


Raymond Hettinger  added the comment:


New changeset b2b023c657ba8c3f4a24d0c847d10fe8e2a73d44 by Raymond Hettinger 
(Miss Islington (bot)) in branch '3.7':
bpo-35780: Fix errors in lru_cache() C code (GH-11623) (GH-11682)
https://github.com/python/cpython/commit/b2b023c657ba8c3f4a24d0c847d10fe8e2a73d44


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-26 Thread miss-islington


Change by miss-islington :


--
pull_requests: +11515, 11516

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-26 Thread miss-islington


Change by miss-islington :


--
pull_requests: +11515

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-26 Thread Raymond Hettinger


Raymond Hettinger  added the comment:


New changeset d8080c01195cc9a19af752bfa04d98824dd9fb15 by Raymond Hettinger in 
branch 'master':
bpo-35780: Fix errors in lru_cache() C code (GH-11623)
https://github.com/python/cpython/commit/d8080c01195cc9a19af752bfa04d98824dd9fb15


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-21 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

> I am not sure about an addition _PyDict_GetItem_KnownHash().

It is a necessary check.  The user call is allowed to update the cache so we no 
longer know without checking whether the new key/result pair has already been 
added.  That is in fact the main bug that is being fixed.

> Every of dict operations can cause executing an arbitrary code 
> and re-entering the execution of bounded_lru_cache_wrapper().

FWIW, the new _PyDict_GetItem_KnownHash() call is made at the top of the post 
user call code path, before any modifications of the cache have occurred.  This 
is the safest possible time to allow reentry.  There's really nothing this call 
can do that couldn't have happened in the user function call.

Also, only the normal case (where the key is not present) proceeds to modify 
the cache state.  The other two paths exit immediately.

> Could not the API that atomically checks and updates the dict
> (like getdefault()/setdefault()) be useful here?

That seems tempting but our goal is a "contains" test.  We're not storing a new 
entry at this point.  Instead, we're just checking to see if any further work 
is necessary.

At some point in the future (not in this bug fix), we could further sync the C 
code and pure python code to use the rotating root entry.   That means that in 
the "self->full" path, we never need to extract, append, or prepend links.  
Only the value fields and root pointer need to be updated.  That does less work 
and always leaves the structure in a consistent state.

> Currently the C implementation memoizes only one result, and 
> f(1.0) returns 1. With the Python implementation and with
> the proposed changes it will return 1.0.

When keyword argument sorting was dropped, we gained a speed improvement but 
came to treat f(a=1, b=2) and f(b=2, a=1) as distinct calls.  Here we cut 
memory overhead in half for common cases but will treat f(1) and f(1.0) as 
distinct calls.  I'm fine with that. It's also nice that the C and pure Python 
versions now match.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-21 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

For the test you can use "nonlocal".

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-21 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

It was the explanation of possible history, not the justification of bugs. Of 
course bugs should be fixed. Thank you for rechecking this code and for your 
fix.

As for the optimization in lru_cache_make_key(), consider the following example:

@lru_cache()
def f(x):
return x

print(f(1))
print(f(1.0))

Currently the C implementation memoizes only one result, and f(1.0) returns 1. 
With the Python implementation and with the proposed changes  it will return 
1.0. I do not say that any of answers is definitely wrong, but we should be 
aware of this, and it would be better if both implementation will be 
consistent. I am sure this example was already discussed, but I can not find it 
now.

I still did not analyze the new code for adding to the cache. The old code 
contains flaws, agree.

I am not sure about an addition _PyDict_GetItem_KnownHash(). Every of dict 
operations can cause executing an arbitrary code and re-entering the execution 
of bounded_lru_cache_wrapper(). Could not the API that atomically checks and 
updates the dict (like getdefault()/setdefault()) be useful here?

--
assignee: serhiy.storchaka -> 

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-20 Thread Ned Deily


Change by Ned Deily :


--
versions:  -Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-19 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
keywords: +patch, patch, patch
pull_requests: +11378, 11379, 11380
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-19 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
keywords: +patch
pull_requests: +11378
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-19 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
keywords: +patch, patch
pull_requests: +11378, 11379
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-19 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

> It was discussed before, and there is a closed issue.

That is a non-answer.  The above patch is correct and achieves an essential 
goal of the lru_cache to save space when possible (avoid an unnecessary extra 
tuple per entry).   Also, please apply the other patch to eliminate the 
unnecessary double lookup in lru_cache_extricate_link().   Please also fix the 
naming problem, "extricate" -> "extract".

> It may be unintentionally. In any case, this is a case that should be very 
> rare.

Please just fix the bug.  That code path incorrectly refreshes an old-key that 
has not been called recently.  It fails to add the new key that was just 
called.  It leaves an orphan link causing an unnecessary downstream check for 
root!=next to guard for the broken invariants.  It leaves the full variable set 
when the cache is not longer full (missing link).

FWIW, one principal use case for lru_cache() is to support dynamic programming 
in recursive functions.  So a "call within a call" should not be regarded as 
"rare".

> Seems the comment was placed at wrong place.

Not just that.  The relevant code was omitted.  It is important to check to see 
if the new key has already been added by the user_function().  The knowledge of 
whether that key is present is stale after the user function call.  Please 
follow the pure python code in this regard.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-19 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Operations with the linked list are atomic (guarded with the GIL), while 
operations with the cache dict are not. That is why links are removed first 
from the linked list and added back in case of error.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-19 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

> If so, then why is the link being moved to the front of the lru_cache -- it 
> should have remained at the oldest position.

It may be unintentionally. In any case, this is a case that should be very rare.

> The solution to this is only extract the link after a successful pop rather 
> than before.

Then there is a possibility to pop the same key (from the last link) twice. The 
GIL can be released in _PyDict_Pop_KnownHash() and other thread can went the 
same way.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-19 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

> After the check for popresult==Py_None, there is the comment that was mostly 
> copied from the Python version but doesn't match the actual code:

Seems the comment was placed at wrong place. And it should be updated since 
locks are not used, but the GIL.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-19 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

> would it be reasonable to emulate the pure python code and return a scalar 
> instead of a tuple when the tuple length is one and there are no keyword 
> arguments or typing requirements?

It was discussed before, and there is a closed issue. I am not sure about the 
optimization in the Python code. It may lead to bugs in corner cases too.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-19 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

-  Demonstration of one of the bugs -

# The currsize is initially equal to maxsize of 10
# Then we cause an orphan link in a full cache 
# The currsize drops to 9 and never recovers the full size of 10

from functools import lru_cache

once = True

@lru_cache(maxsize=10)
def f(x):
global once
rv = f'.{x}.'
if x == 20 and once:
once = False
print('Calling again', f(x))
return rv

for x in range(15):
f(x)

print(f.cache_info())
print(f(20))
print(f.cache_info())
print(f(21))
print(f.cache_info())


-- Output 
CacheInfo(hits=0, misses=15, maxsize=10, currsize=10)
Calling again .20.
.20.
CacheInfo(hits=0, misses=17, maxsize=10, currsize=9)
.21.
CacheInfo(hits=0, misses=18, maxsize=10, currsize=9)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-19 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Suggested code for the open question listed above:

--- a/Modules/_functoolsmodule.c
+++ b/Modules/_functoolsmodule.c
@@ -733,6 +733,15 @@ lru_cache_make_key(PyObject *args, PyObject *kwds, int 
typed)

 /* short path, key will match args anyway, which is a tuple */
 if (!typed && !kwds) {
+if (PyTuple_GET_SIZE(args) == 1) {
+key = PyTuple_GET_ITEM(args, 0);
+if (!PySequence_Check(key)) {
+/* For scalar keys, save space and
+   drop the enclosing args tuple  */
+Py_INCREF(key);
+return key;
+}
+}
 Py_INCREF(args);
 return args;
 }

--
versions:  -Python 2.7, Python 3.4, Python 3.5

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35780] Recheck logic in the C version of the lru_cache()

2019-01-18 Thread Raymond Hettinger


New submission from Raymond Hettinger :

After the check for popresult==Py_None, there is the comment that was mostly 
copied from the Python version but doesn't match the actual code:

 /* Getting here means that this same key was added to the
cache while the lock was released.  Since the link
update is already done, we need only return the
computed result and update the count of misses. */

The cache.pop uses the old key (the one being evicted), so at this point in the 
code we have an extracted link containing the old key but the pop failed to 
find the dictionary reference to that link.  It tells us nothing about whether 
the current new key has already been added to the cache or whether another 
thread added a different key. This code path doesn't add the new key.  Also, it 
leaves the self->full variable set to True even though we're now at least one 
link short of maxsize.

The next test is for popresult == NULL. If I'm understanding it correctly, it 
means that an error occurred during lookup (possible during the equality 
check).  If so, then why is the link being moved to the front of the lru_cache 
-- it should have remained at the oldest position.  The solution to this is 
only extract the link after a successful pop rather than before.

The final case runs code when the pop succeeded in finding the oldest link.  
The popresult is decreffed but not checked to make sure that it actually is the 
oldest link.  Afterwards, _PyDict_SetItem_KnownHash() is called with the new 
key.  Unlike the pure python code, it does not check to see if the new key has 
already been added by another thread.  This can result in an orphaned link (a 
link not referred to by the cache dict).  I think that is why popresult code 
can ever get to a state where it can return Py_None (it means that the cache 
structure is in an inconsistent state).

I think the fix is to make the code more closely follow the pure python code.  
Verify that the new key hasn't been added by another thread during the user 
function call.  Don't delete the old link until it has been successfully 
popped.  A Py_None return from the pop should be regarded as sign the structure 
is in an inconsistent state.  The self->full variable needed to be reset if 
there are any code paths that deletes links but don't add them back.  Better 
yet, the extraction of a link should be immediately followed by repopulating it 
will new values and moving it to the front of the cache.  That way, the cache 
structure will always remain in a consistent state and the number of links will 
be constant from start to finish.

The current code likely doesn't fail in any spectacular way.  Instead, it will 
occasionally have unreferenced orphan links, will occasionally be marked as 
full when it is short one or more links (and never regaining the lost links), 
will occasionally not put the result of the newest function call into the 
cache, and will occasionally mark the oldest link as being the newest even 
though there wasn't a user function call to the corresponding old key.

Minor nit:  The decrefs should be done at the bottom of each code path instead 
of the top.  This makes it a lot easier to verify that we aren't making 
arbitrary reentrant callbacks until the cache data structures have been put 
into a consistent state.

Minor nit: The test "self->root.next != >root" may no longer be necessary 
if the above issues are fixed.  We can only get to this wrapper when maxsize > 
0, so self->full being true implies that there is at least one link in the 
chain, so self->root.next cannot point back to itself.  Possibly the need for 
this test exists only because the cache is getting into an inconsistent state 
where it is marked as full but there aren't any extant links.

Minor nit:  "lru_cache_extricate_link" should be named 
"lru_cache_extract_link".  The word "extricate" applies only when solving an 
error case; whereas, "extract" applies equally well to normal cases and cases.  
The latter word more closely means "remove an object from a data structure" 
which is what was likely intended.

Another minor nit:  The code in lru_cache_append_link() is written in way where 
the compiler has to handle an impossible case where "link->prev->next = 
link->next" changes the value of "link->next".  The suspicion of aliased 
pointers causes the compiler to generate an unnecessary and redundant memory 
fetch.   The solution again is to more closely follow the pure python code:

diff --git a/Modules/_functoolsmodule.c b/Modules/_functoolsmodule.c
index 0fb4847af9..8cbd79ceaf 100644
--- a/Modules/_functoolsmodule.c
+++ b/Modules/_functoolsmodule.c
@@ -837,8 +837,10 @@ infinite_lru_cache_wrapper(lru_cache_object *self, 
PyObject *args, PyObject *kwd
 static void
 lru_cache_extricate_link(lru_list_elem *link)
 {
-link->prev->next = link->next;
-link->next->prev = link->prev;
+lru_list_elem *link_prev = link->prev;
+lru_list_elem *link_next = link->next;
+