[Python-ideas] Re: Additional LRU cache introspection facilities

2021-01-14 Thread Steven D'Aprano
Okay, thanks everyone for the feedback. I accept that there are more 
practical difficulties than I expected, and the work-arounds I have are 
not too onerous.


-- 
Steve


[Python-ideas] Re: Additional LRU cache introspection facilities

2021-01-12 Thread Raymond Hettinger
> I propose a method:
> ...
> returns a dictionary {arg: value} representing the cache. 
> It wouldn't be the cache itself, just a shallow copy 
> of the cache data

I recommend against going down this path.

It exposes (and potentially locks in) implementation details such as
how we distinguish positional arguments, keyword arguments, and
type information (something that has changed more than once).
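For illustration only: the key construction lives in the private helper
functools._make_key, and its behaviour is exactly the kind of detail that
has changed. A quick sketch of what it currently distinguishes (private
API, subject to change; do not rely on it):

from functools import _make_key

print(_make_key((1, 2), {}, typed=False))      # plain positional key
print(_make_key((1,), {'x': 2}, typed=False))  # a marker object separates keywords
print(_make_key((1,), {}, typed=True))         # typed=True also records argument types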

Also, a shallow copy still leaves plenty of room for meddling
with the contents of the keys, potentially breaking the integrity
of the cache.

Another concern is that we've worked hard to remove potential
deadlocks from the lru_cache.  Hanging on a lock while copying the
whole cache complicates our efforts and risks breaking it as users
exploit the new feature in unpredictable ways.

FWIW, OrderedDict provides methods that make it easy to roll
your own variants of the lru_cache().  It would be better to do that
than to complexify the base implementation in ways that I think
we would regret.
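For example, a bare-bones sketch of that approach (the decorator name and
the cache_export() method here are illustrative, not stdlib API):

from collections import OrderedDict
from functools import wraps

def inspectable_lru_cache(maxsize=128):
    # Hand-rolled LRU cache (positional arguments only).
    def decorator(func):
        cache = OrderedDict()
        @wraps(func)
        def wrapper(*args):
            if args in cache:
                cache.move_to_end(args)        # mark as most recently used
                return cache[args]
            result = cache[args] = func(*args)
            if len(cache) > maxsize:
                cache.popitem(last=False)      # evict the least recently used
            return result
        wrapper.cache_export = lambda: dict(cache)  # shallow snapshot
        return wrapper
    return decorator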

Raymond


[Python-ideas] Re: Additional LRU cache introspection facilities

2021-01-12 Thread Serhiy Storchaka
12.01.21 21:28, Sebastian Kreft wrote:
> class ForcePythonLruCache(importlib.abc.MetaPathFinder):
>     def find_spec(self, fullname, path, target=None):
>         if fullname == '_functools':
>             raise ImportError('_functools not available')

You can just set sys.modules['_functools'] = None.
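A sketch of that simpler approach (whether functools needs to be dropped
from sys.modules first depends on what has already been imported):

import sys

sys.modules['_functools'] = None      # makes "import _functools" raise ImportError
sys.modules.pop('functools', None)    # drop the cached module, if any

import functools                      # re-imports, falling back to pure Python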


[Python-ideas] Re: Additional LRU cache introspection facilities

2021-01-12 Thread Serhiy Storchaka
12.01.21 18:57, Steven D'Aprano wrote:
> Can you explain further why the cached function needs additional 
> synchronisation overhead?

The cache uses a doubly-linked list to track which item is the oldest,
and it changes the list every time you call the cached function (the
found or new item is moved to the front of the list). The code that
changes the links of the list is a critical section: if it is
interrupted, we can get a crash or an infinite loop in C. It is guarded
by the GIL, and no Python code is called while the links are being
changed.
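To illustrate, here is roughly the equivalent link update from the pure
Python lru_cache in functools.py (the field layout is an implementation
detail; the C version does the same work in C):

PREV, NEXT, KEY, RESULT = 0, 1, 2, 3   # fields of each list node

def move_to_front(root, link):
    # Unlink the node; a thread switch between these assignments
    # would leave the list inconsistent.
    link_prev, link_next = link[PREV], link[NEXT]
    link_prev[NEXT] = link_next
    link_next[PREV] = link_prev
    # Splice it back in just before root, the most-recent position.
    last = root[PREV]
    last[NEXT] = root[PREV] = link
    link[PREV] = last
    link[NEXT] = root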

Now, if we iterate the list and save its items, we can end up calling
Python code while saving them (especially if we save them into a dict),
and that code can change the list. In the best case this leads to
skipped or duplicated items and random runtime errors when the cached
function is used in another thread while the cache is being inspected;
that is not much worse than iterating a dict while modifying it. In the
worst case we can get crashes, perhaps in different places in the code.
Such code has to be written very carefully: there were three iterations,
by three authors, of writing the C implementation of lru_cache, and some
bugs were still being found several months after that.

One way to do it safely is to add explicit locks in addition to the GIL.
It is not the easiest, nor the safest, and certainly not the most
efficient way, but it is the most obvious one.

> - If you export the cache from one thread while another thread is 
>   reading the cache, I expect that would be safe.

Reading the cache always modifies it.

> I was having trouble with the function, and couldn't tell if the right 
> arguments were going into the cache. What I wanted to do was peek at 
> the cache and see which keys were ending up in the cache and compare 
> that to what I expected.

In your case it would perhaps be easier to write your own implementation
of the cache, or to disable the C implementation and use the Python
implementation of lru_cache(), which allows some introspection. In any
case, it was a one-time problem, and it is already solved. If it
occurred more than once, it would make sense to think about including
such a feature in the stdlib.

But its cost may be larger than you expect.


[Python-ideas] Re: Additional LRU cache introspection facilities

2021-01-12 Thread Paul Moore
On Tue, 12 Jan 2021 at 19:28, Sebastian Kreft  wrote:
>
> Actually you can get the cache data from the Python version; for that you
> need to force the use of the Python version of functools

"as private as you can get" != "private" :-)
I had a feeling there would be a way to do it, but couldn't be bothered
to work out exactly how. Thanks for the example.

Paul


[Python-ideas] Re: Additional LRU cache introspection facilities

2021-01-12 Thread Sebastian Kreft
Actually you can get the cache data from the Python version; for that
you need to force the use of the Python version of functools:

def get_cache(f):
    # The pure Python wrapper keeps its cache dict in a closure
    # variable named 'cache'; dig it out of the closure cells.
    freevars = f.__code__.co_freevars
    index = freevars.index('cache')
    return f.__closure__[index].cell_contents

import importlib.abc

class ForcePythonLruCache(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == '_functools':
            raise ImportError('_functools not available')

import sys
sys.meta_path.insert(0, ForcePythonLruCache())  # block the C accelerator
sys.modules.pop('functools', None)    # drop cached modules so the
sys.modules.pop('_functools', None)   # import below re-executes functools

import functools

@functools.lru_cache
def test(x):
    return x

test(1)
test(2)
test('a')
test('foo')

print(list(get_cache(test).keys()))

On Tue, Jan 12, 2021 at 4:09 PM Paul Moore  wrote:

> On Tue, 12 Jan 2021 at 17:16, Christopher Barker 
> wrote:
> >
> > Is the implementation of lru_cache too opaque to poke into it without an
> > existing method? Or write a quick monkey-patch?
> >
> > Sorry for not checking myself, but the ability to do that kind of thing
> > is one of the great things about a dynamic open source language.
>
> I've only looked at the Python implementation, but the cache is a
> local variable in the wrapper, unavailable from outside. The cache
> information function is a closure that references that local variable
> and is assigned as an attribute of the wrapped function.
>
> It's about as private as it's possible to get in Python (not
> particularly because it's *intended* to hide anything, as far as I can
> tell, more likely for performance or some other implementation
> reason).
>
> Paul


-- 
Sebastian Kreft


[Python-ideas] Re: Additional LRU cache introspection facilities

2021-01-12 Thread Paul Moore
On Tue, 12 Jan 2021 at 17:16, Christopher Barker  wrote:
>
> Is the implementation of lru_cache too opaque to poke into it without an 
> existing method? Or write a quick monkey-patch?
>
> Sorry for not checking myself, but the ability to do that kind of thing is 
> one of the great things about a dynamic open source language.

I've only looked at the Python implementation, but the cache is a
local variable in the wrapper, unavailable from outside. The cache
information function is a closure that references that local variable
and is assigned as an attribute of the wrapped function.

It's about as private as it's possible to get in Python (not
particularly because it's *intended* to hide anything, as far as I can
tell, more likely for performance or some other implementation
reason).

Paul


[Python-ideas] Re: Additional LRU cache introspection facilities

2021-01-12 Thread Christopher Barker
Is the implementation of lru_cache too opaque to poke into it without an
existing method? Or write a quick monkey-patch?

Sorry for not checking myself, but the ability to do that kind of thing is
one of the great things about a dynamic open source language.

-CHB

On Tue, Jan 12, 2021 at 9:04 AM Steven D'Aprano  wrote:

> On Tue, Jan 12, 2021 at 04:32:14PM +0200, Serhiy Storchaka wrote:
> > 12.01.21 12:02, Steven D'Aprano wrote:
>
> > > I propose a method:
> > >
> > > @functools.lru_cache()
> > > def function(arg):
> > > ...
> > >
> > > function.cache_export()
> > >
> > > that returns a dictionary {arg: value} representing the cache. It
> > > wouldn't be the cache itself, just a shallow copy of the cache data.
> >
> > What if the function supports multiple arguments (including passed by
> > keyword)? Note that the internal representation of the key is an
> > implementation detail, so you need to invent and specify some new
> > representation. For example, return a list of tuples (args, kwargs,
> > result).
>
> Sure. That's a detail that can be worked out once we agree that this is
> a useful feature.
>
>
> > Depending on the implementation, getting the list of all arguments can
> > have larger than linear complexity.
>
> I don't mind. Efficiency is not a priority for this. This is an
> introspection feature for development and debugging, not a feature for
> production. I don't expect it to be called in tight loops. I expect to
> use it from the REPL while I am debugging my code.
>
> I might have to rethink if it was exponentially slow, but O(n log n)
> like sorting would be fine; I'd even consider O(n**2) acceptable, with a
> documentation note that exporting large caches may be slow.
>
>
> > Other cache implementations can contain additional information: the
> > number of hits for every value, timestamps. Are you interested in
> > getting that information too, or in ignoring it?
>
> No.
>
>
> > Currently the cache is thread-safe in CPython, but getting all arguments
> > and values may not be (or we will need to add a synchronization overhead
> > for every call of the cached function).
>
> Can you explain further why the cached function needs additional
> synchronisation overhead?
>
> I am quite happy for exporting to be thread-unsafe, so long as it
> doesn't crash. Don't export the cache if it is being updated from
> another thread, or you might get inaccurate results.
>
> To be clear:
>
> - If you export the cache from one thread while another thread is
>   reading the cache, I expect that would be safe.
>
> - If you export the cache from one thread while another thread is
>   *modifying* the cache, I expect that the only promise we make is
>   that there shouldn't be a segfault.
>
>
>
> > And finally, what is your use case? Is it important enough to justify
> > the cost?
>
> I think so or I wouldn't have asked :-)
>
> There shouldn't be much (or any?) runtime cost on the cache except for
> the presence of an additional method. The exported data is just a
> snapshot, it doesn't have to be a view of the cache. Changing the
> exported snapshot will not change the cache.
>
> My use-case is debugging functions that are using an LRU cache,
> specifically complex recursive functions. I have some functions where:
>
> f(N)
>
> ends up calling itself many times, but not in any obvious pattern:
>
> f(N-1), f(N-2), f(N-5), f(N-7), f(N-12), f(N-15), f(N-22), ...
>
> for example. So each call to f() could make dozens of recursive calls,
> if N is big enough, and there are gaps in the calls.
>
> I was having trouble with the function, and couldn't tell if the right
> arguments were going into the cache. What I wanted to do was peek at
> the cache and see which keys were ending up in the cache and compare
> that to what I expected.
>
> I did end up getting the function working, but I think it would have been
> much easier if I could have seen what was inside the cache and how the
> cache was changing from one call to the next.
>
> So this is why I don't care about performance (within reason). My use
> case is interactive debugging.
>
>
> --
> Steve
-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

[Python-ideas] Re: Additional LRU cache introspection facilities

2021-01-12 Thread Paul Moore
On Tue, 12 Jan 2021 at 17:04, Steven D'Aprano  wrote:

> My use-case is debugging functions that are using an LRU cache,
> specifically complex recursive functions. I have some functions where:
>
> f(N)
>
> ends up calling itself many times, but not in any obvious pattern:
>
> f(N-1), f(N-2), f(N-5), f(N-7), f(N-12), f(N-15), f(N-22), ...
>
> for example. So each call to f() could make dozens of recursive calls,
> if N is big enough, and there are gaps in the calls.
>
> I was having trouble with the function, and couldn't tell if the right
> arguments were going into the cache. What I wanted to do was peek at
> the cache and see which keys were ending up in the cache and compare
> that to what I expected.
>
> I did end up getting the function working, but I think it would have been
> much easier if I could have seen what was inside the cache and how the
> cache was changing from one call to the next.
>
> So this is why I don't care about performance (within reason). My use
> case is interactive debugging.

For debugging, could you not temporarily write your own brain-dead simple cache?

CACHE = {}

def f(n):
    if n not in CACHE:
        result = ...  # calculate the real result here
        CACHE[n] = result
    return CACHE[n]

(make a decorator, if you feel like).
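For example (an untested sketch):

import functools

def dict_cache(func):
    # Simplest possible inspectable cache, for debugging only.
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    wrapper.cache = cache   # expose the dict so you can poke at it
    return wrapper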

You can then inspect and instrument as much as you like, and once
you've got things working, replace the hand-written cache with the
stdlib one.

Personally, I don't use lru_cache enough to really have an opinion on
this. My first reaction was "that would be neat", but a quick look at
the code (the Python version, I didn't go near the C version) and
Serhiy's comments made me think that "it's neat" isn't enough
justification for me, at least. In practice, I'd just do as I
described above, if I needed to.

Paul


[Python-ideas] Re: Additional LRU cache introspection facilities

2021-01-12 Thread Steven D'Aprano
On Tue, Jan 12, 2021 at 04:32:14PM +0200, Serhiy Storchaka wrote:
> 12.01.21 12:02, Steven D'Aprano wrote:

> > I propose a method:
> > 
> > @functools.lru_cache()
> > def function(arg):
> > ...
> > 
> > function.cache_export()
> > 
> > that returns a dictionary {arg: value} representing the cache. It 
> > wouldn't be the cache itself, just a shallow copy of the cache data.
> 
> What if the function supports multiple arguments (including passed by
> keyword)? Note that the internal representation of the key is an
> implementation detail, so you need to invent and specify some new
> representation. For example, return a list of tuples (args, kwargs, result).

Sure. That's a detail that can be worked out once we agree that this is 
a useful feature.


> Depending on the implementation, getting the list of all arguments can
> have larger than linear complexity.

I don't mind. Efficiency is not a priority for this. This is an 
introspection feature for development and debugging, not a feature for 
production. I don't expect it to be called in tight loops. I expect to 
use it from the REPL while I am debugging my code.

I might have to rethink if it was exponentially slow, but O(n log n) 
like sorting would be fine; I'd even consider O(n**2) acceptable, with a 
documentation note that exporting large caches may be slow.


> Other cache implementations can contain additional information: the
> number of hits for every value, timestamps. Are you interested in
> getting that information too, or in ignoring it?

No.


> Currently the cache is thread-safe in CPython, but getting all arguments
> and values may not be (or we will need to add a synchronization overhead
> for every call of the cached function).

Can you explain further why the cached function needs additional 
synchronisation overhead?

I am quite happy for exporting to be thread-unsafe, so long as it 
doesn't crash. Don't export the cache if it is being updated from 
another thread, or you might get inaccurate results.

To be clear:

- If you export the cache from one thread while another thread is 
  reading the cache, I expect that would be safe.

- If you export the cache from one thread while another thread is 
  *modifying* the cache, I expect that the only promise we make is
  that there shouldn't be a segfault.



> And finally, what is your use case? Is it important enough to justify
> the cost?

I think so or I wouldn't have asked :-)

There shouldn't be much (or any?) runtime cost on the cache except for 
the presence of an additional method. The exported data is just a 
snapshot, it doesn't have to be a view of the cache. Changing the 
exported snapshot will not change the cache.

My use-case is debugging functions that are using an LRU cache, 
specifically complex recursive functions. I have some functions where:

f(N)

ends up calling itself many times, but not in any obvious pattern:

f(N-1), f(N-2), f(N-5), f(N-7), f(N-12), f(N-15), f(N-22), ...

for example. So each call to f() could make dozens of recursive calls, 
if N is big enough, and there are gaps in the calls.

I was having trouble with the function, and couldn't tell if the right 
arguments were going into the cache. What I wanted to do was peek at 
the cache and see which keys were ending up in the cache and compare 
that to what I expected.

I did end up getting the function working, but I think it would have been 
much easier if I could have seen what was inside the cache and how the 
cache was changing from one call to the next.

So this is why I don't care about performance (within reason). My use 
case is interactive debugging.
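To make that concrete, here is a hypothetical session (cache_export()
does not exist today, and the recurrence is invented for the example):

import functools

@functools.lru_cache()
def f(n):
    if n < 2:
        return 1
    return f(n - 1) + f(n - 2)

f(10)
# Hypothetical: peek at which arguments actually landed in the cache.
# >>> sorted(f.cache_export().keys())
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]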


-- 
Steve


[Python-ideas] Re: Additional LRU cache introspection facilities

2021-01-12 Thread Serhiy Storchaka
12.01.21 12:02, Steven D'Aprano wrote:
> Is anyone else interested in additional introspection facilities for the 
> functools.lru_cache?
> 
> You can view the cache hit and miss statistics using the cache_info() 
> method, but you can't see what values actually are in the cache. That 
> would be useful to me.
> 
> I propose a method:
> 
> @functools.lru_cache()
> def function(arg):
> ...
> 
> function.cache_export()
> 
> that returns a dictionary {arg: value} representing the cache. It 
> wouldn't be the cache itself, just a shallow copy of the cache data.

What if the function supports multiple arguments (including passed by
keyword)? Note that the internal representation of the key is an
implementation detail, so you need to invent and specify some new
representation. For example, return a list of tuples (args, kwargs, result).

Depending on the implementation, getting the list of all arguments can
have larger than linear complexity.

Other cache implementations can contain additional information: the
number of hits for every value, timestamps. Are you interested in
getting that information too, or in ignoring it?

Currently the cache is thread-safe in CPython, but getting all arguments
and values may not be (or we will need to add a synchronization overhead
for every call of the cached function).

And finally, what is your use case? Is it important enough to justify
the cost?