[issue41220] add optional make_key argument to lru_cache

2020-07-17 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

I've left this open for a week to collect comments.  I concur with Felipe that 
this should be the responsibility of the caller.  And I concur with Jim that 
the use cases are likely too rare to warrant shoehorning in the extra 
functionality.

For my part, I'm concerned that this would be bug bait.  The result of the 
make_key() function would need to be uniquely associated with a particular 
output, independent of the contents of the mutable arguments.  I don't think it 
is good design for function calls to disregard the part of the input that was 
originally used to compute the output; that notion conflicts with the core idea 
of the lru_cache giving the same outputs for the same inputs.

Likewise, debugging and exception handling would be complicated by having two 
underlying user functions.  Exceptions could arise from three sources: the 
make_key() function, the cached function, and the lru cache internals.

Further, if the make_key() function turns out not to produce unique keys, the 
ensuing bugs would be hard to find.  In the OP's later example, if we get two 
distinct inputs which happen to have the same timestamp, the error would pass 
without warning and would be hard to reproduce.  Similarly, the make_key() 
function has to assume that the mutable portion of the data hasn't mutated, but 
a key feature of mutable data is that it can mutate.  That too would result in 
a hard-to-find bug.
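
To make that staleness hazard concrete, here is a minimal sketch that emulates 
the proposed behaviour with a plain dict keyed on a unique id (make_key itself 
is hypothetical, so the caching is simulated by hand):

cache = {}

def slow_double(uniq_id, data):
    # emulate caching on uniq_id alone, ignoring the mutable data
    if uniq_id not in cache:
        cache[uniq_id] = [x * 2 for x in data]
    return cache[uniq_id]

data = [1, 2, 3]
print(slow_double('a', data))   # [2, 4, 6] -- computed and cached
data.append(4)                  # a key feature of mutable data: it mutates
print(slow_double('a', data))   # still [2, 4, 6] -- silently stale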

Also, while the OP has a specific use case in mind, we can't know in advance 
how others will use make_key().  It could be misused to convert mutable data to 
immutable data, possibly resulting in a cached function that is slower than the 
original (caching mean(), for example).  Or the make_key() function could have 
side-effects.  The OP's original "my_list[0]" example showed yet another way 
that bugs could arise.  It is problematic that that particular bug might not 
have been caught by a test suite that didn't know to call the function twice 
with a common first argument but with differing subsequent arguments.
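
For example (a sketch; this make_key is hypothetical): the obvious way to make 
a list hashable is an O(n) conversion, so even a cache hit does work 
proportional to the input, the same asymptotic cost as computing the mean 
directly:

import statistics

def make_key(values):              # hypothetical key function
    return tuple(values)           # O(n) copy on every call, hit or miss

data = list(range(10_000))
statistics.mean(data)              # O(n): the computation being "cached"
make_key(data)                     # also O(n): paid even on cache hits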

Another possible bug trap is that it would make it easier to accidentally cache 
mutable function results without getting a warning.  For example:
  @lru_cache(128, make_key=lambda target, size, uniq_id, data: uniq_id)
  def search(target, size, uniq_id, data):
      i = data.index(target)
      return data[i : i+size]   # <== bug: the result is mutable whenever data is mutable
   
I think the suggestion was a neat idea (and thought provoking), but I think we 
should decline because 1) the use case is uncommon, 2) it makes the API harder 
to understand, 3) it is bug bait, and 4) the responsibility should probably 
rest with the caller.

Thank you for the suggestion.  It was inspired.  Don't be discouraged; many of 
my proposals don't get accepted either.  Please continue to submit suggestions.



P.S. make_key() shouldn't be called a key-function so that we don't make that 
term ambiguous.  Elsewhere, key-functions are all about comparison logic rather 
than hashing.  They can be created by functools.cmp_to_key(), they tend to be 
used only once per data element, and the same function tends to be 
interoperable with all the tools that accept key-functions.  So make_key() was 
likely the better terminology.  Perhaps get_unique_id() would have been even 
less ambiguous.

--
resolution:  -> rejected
stage: patch review -> resolved
status: open -> closed




[issue41220] add optional make_key argument to lru_cache

2020-07-11 Thread Jim Jewett


Jim Jewett  added the comment:

Going back to Raymond's analysis, this is useful when at least some of the 
parameters either do not change the result, or are not hashable.

At a minimum, you need to figure out which parameters those are, and whether to 
drop them or transform them.

Is this already sufficiently rare or tricky that a subclass is justified, 
instead of trying to shoehorn things into a single key method?

--
nosy: +Jim.Jewett




[issue41220] add optional make_key argument to lru_cache

2020-07-11 Thread Itay azolay


Itay azolay  added the comment:

Hey Felipe! Thanks for stepping in!
I do get your argument.
However, in my opinion, one could argue the same thing for max or sorted:
"if one wants to use `sorted`, they should make sure their arguments are 
comparable."
Yet that is not the case, since we do have the `key` argument for sorted and 
max.
Also, I don't believe caching equals hashing.
Maybe from the technical point of view it does, but in practice one can (and 
probably will) want to cache unhashable objects, whether we give the option to 
do so or not.
I think that by embedding the key argument in lru_cache, we allow the caller 
(the developer) to solve the caching issue in a way that is right for their 
implementation of the cached function and its arguments.

Unrelated: this is my first feature proposal for Python.  I want to thank you 
for taking the time to think and answer with some very good arguments and 
respect.  I truly enjoy this little debate we have here :)

--




[issue41220] add optional make_key argument to lru_cache

2020-07-10 Thread Felipe Rodrigues


Felipe Rodrigues  added the comment:

Correction: (...) on the _caller_ to solve the hashing* issue (...)

--




[issue41220] add optional make_key argument to lru_cache

2020-07-10 Thread Felipe Rodrigues


Felipe Rodrigues  added the comment:

Hello all,

I love discussions about caching!  While I do see the point of Itayazolay's 
proposal, I think that it should be on the _caller_ to solve the caching issue, 
and not on the _callee_ to provide a way to make it happen.

That is, I think that if one wants to use a cache, they should make sure their 
arguments are hashable, rather than expecting the called function to provide 
workarounds for unhashable arguments.

--
nosy: +fbidu




[issue41220] add optional make_key argument to lru_cache

2020-07-08 Thread Itay azolay


Itay azolay  added the comment:

Thanks, you have some very good points.
Let me try to address them:

* Cached functions really are expected to be cheap, but what they really need 
to be is *cheaper*.  If my computation is expensive enough, I might be okay 
with making a lesser, still somewhat expensive computation instead.  I believe 
it's for the developer to decide.

* key is usually used in sequence-of-elements contexts, but here it is 
expected to run multiple times.  I believe that this is what users would 
expect (what else could someone expect to happen?), and that any confusion is 
solvable through good docs (or a change to the parameter's name?).

* I believe that requiring the key function to match the cached function's 
signature is a good thing.  It will force the user to revisit the key function 
if he changes the behaviour of the cached function, and to rethink the cache; 
otherwise it will not work.

* I can't argue about API simplicity; you probably have much more experience 
there.  However, I believe that if we can agree that this is a useful feature, 
we can find a way to make the API clear and welcoming.
BTW, I agree about the problems with the typed argument; I never quite 
understood when it can be useful.

I'd like to compare the key argument suggested here to the key argument in 
other Python functions; let's take `sorted` as an example.
sorted supports key in order to be able to sort other types of data structures.
Even though I like your suggestion to use a dataclass, I believe that if it's 
applicable here, we could say the same thing for sorted.
We could require sorted to work the same way:
@total_ordering  # if I'm not mistaken
@dataclass
class MyData:
    ...
    # fields
    ...
    def __gt__(self, other):
        return self.field > other.field

sorted(my_data_instances)   # a list of MyData, instead of sorted(..., key=...)


I think we both see the reason why this wouldn't be optimal in some cases.
Without the key function, the sorted function wouldn't support a big part of 
Python objects.
I think the same applies to the LRU cache.  Right now, we just can't use it 
with all Python objects: we have to change the API, the way we move data 
around, the way we keep our objects, just so that lru_cache will work.
And after all that, lru_cache will just break if someone sends some data in a 
list instead of a tuple.  I think that causes a lot of developers to give up 
on the default stdlib lru_cache.

In my case, I have a few lists of lists; each inner list describes an event 
that happened, and each event carries a unique timestamp.
I have an object that holds a few different lists:

class Myobj:
    events_1: List[list]
    events_2: List[list]


I have a small, esoteric function that now looks like this:

def calc(list_of_events):
    # calculation
    pass

and it is being called from multiple places in the code, which takes a lot of 
time, like this:

calc(events_1)  # multiple times
calc(events_2)  # multiple times

I wanted to cache the function calc, but now I have to do something like this:

@lru_cache
def calc_events_1(myobj):
    return calc(myobj.events_1)

@lru_cache
def calc_events_2(myobj):
    return calc(myobj.events_2)

Right now I can't change the API of the lists, because they are being used in 
multiple places; some of these lists (I have multiple event lists) are 
converted to numpy, some aren't.
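
For what it's worth, something close to this can be built today on top of the 
existing lru_cache.  A rough sketch (the helper lru_cache_with_key is made up 
here, and every caveat in this thread about the key needing to be hashable and 
truly unique still applies):

from functools import lru_cache, wraps

def lru_cache_with_key(maxsize=128, *, key):
    # Sketch only: cache on key(*args, **kwargs) instead of the raw
    # arguments.  The real arguments are stashed so the wrapped
    # function can still receive them on a cache miss.
    def decorator(func):
        pending = {}

        @lru_cache(maxsize=maxsize)
        def cached(k):
            args, kwargs = pending[k]
            return func(*args, **kwargs)

        @wraps(func)
        def wrapper(*args, **kwargs):
            k = key(*args, **kwargs)    # must be hashable *and* unique
            pending[k] = (args, kwargs)
            try:
                return cached(k)
            finally:
                pending.pop(k, None)

        wrapper.cache_info = cached.cache_info
        wrapper.cache_clear = cached.cache_clear
        return wrapper
    return decorator

# assuming each event's first element is its unique timestamp:
@lru_cache_with_key(128, key=lambda events: tuple(e[0] for e in events))
def cached_calc(events):
    return calc(events)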

Regarding the API, we could make it simpler either by using keyword-only 
arguments, like

lru_cache(maxsize, typed, *, key=None)

or, as in the property setter/getter case:

@lru_cache
def function(*args):
    pass

@function.make_key  # or key, whatever name is good
def _(*args):
    return new_key

However, I like the second option less.

Thanks

--




[issue41220] add optional make_key argument to lru_cache

2020-07-07 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Thanks, I see what you're trying to do now:

1) Given a slow function 
2) that takes a complex argument 
   2a)  that includes a hashable unique identifier 
   2b)  and some unhashable data
3) Cache the function result using only the unique identifier

The lru_cache() currently can't be used directly because
all the function arguments must be hashable.
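
For example, with the current lru_cache:

>>> from functools import lru_cache
>>> @lru_cache(maxsize=128)
... def total(values):
...     return sum(values)
...
>>> total([1, 2, 3])
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'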

The proposed solution:
1) Write a helper function
   1a) that has the same signature as the original function
   1b) that returns only the hashable unique identifier
2) With a single @decorator application, connect
   2a) the original function
   2b) the helper function
   2c) and the lru_cache logic


A few areas of concern come to mind:

* People have come to expect cached calls to be very cheap, but it is easy to 
write input transformations that aren't cheap (i.e. looping over all the inputs 
as in your example or converting entire mutable structures to immutable 
structures).

* While key-functions are relatively well understood, when we use them 
elsewhere key-functions only get called once per element.  Here, the 
lru_cache() would call the key function every time even if the arguments are 
identical.  This will be surprising to some users.

* The helper function's signature needs to exactly match that of the wrapped 
function.  Changes would need to be made in both places.

* It would be hard to debug if the helper function return values ever stop 
being unique.  For example, if the timestamps start getting rounded to the 
nearest second, they will sporadically become non-unique.

* The lru_cache signature makes it awkward to add more arguments.  That is why 
your examples had to explicitly specify a maxsize of 128 even though 128 is the 
default. 

* API simplicity was an early design goal.  Already, I made a mistake by 
accepting the "typed" argument which is almost never used but regularly causes 
confusion and affects learnability.

* The use case is predicated on having a large unhashable dataset accompanied 
by a hashable identifier that is assumed to be unique.  This probably isn't 
common enough to warrant an API extension.  

Out of curiosity, what are you doing now without the proposed extension?  

As a first try, I would likely write a dataclass to be explicit about the types 
and about which fields are used in hashing and equality testing:

from dataclasses import dataclass, field

@dataclass(unsafe_hash=True)
class ItemsList:
    unique_id: float
    data: dict = field(hash=False, compare=False)

I expect that dataclasses like this will emerge as the standard solution 
whenever people need a mapping or dict to work with keys that have a mix of 
hashable and unhashable components.  This will work with the lru_cache(), 
dict(), defaultdict(), ChainMap(), set(), frozenset(), etc.
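
A sketch of how that combines with the lru_cache() (analyze() is just a 
made-up stand-in for the expensive function):

from functools import lru_cache

@lru_cache(maxsize=128)
def analyze(items: ItemsList) -> float:
    # hashing and equality use only items.unique_id, so the
    # unhashable dict rides along without breaking the cache
    return sum(items.data.values()) / len(items.data)

analyze(ItemsList(unique_id=1594166400.0, data={'a': 1.0, 'b': 2.0}))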

--




[issue41220] add optional make_key argument to lru_cache

2020-07-07 Thread Itay azolay


Itay azolay  added the comment:

Yes, you're right.
It's a bad example; I tried to simplify it and ended up oversimplifying it.
Real-life cases are of course more complicated.
What I wanted to accomplish is very similar to the `key` argument in 
sorted/min/max/etc.
Let me try to give you an example.

Assume we have a complex data type with a timestamp-signed attribute.
A single item will look as follows:
SingleItem = (unique_timestamp, <unhashable data>)
Now assume we have a function "extensive_computation":

def extensive_computation(items: List[SingleItem]):
    # very hard work
    sleep(60)

As the developer, I know that every item has a unique timestamp.
So for two lists of N timestamps, when the timestamps are the same, the result 
of the computation will be the same.

def item_cache_key(items: List[SingleItem]):
    # a hashable tuple of timestamps; a bare generator expression would
    # hash by identity and never produce a cache hit
    return tuple(timestamp for timestamp, data in items)

I would like to then create:

@lru_cache(128, key=item_cache_key)
def cache_extensive_computation(items):
    return extensive_computation(items)

Does that make more sense?

--




[issue41220] add optional make_key argument to lru_cache

2020-07-06 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
Removed message: https://bugs.python.org/msg373196




[issue41220] add optional make_key argument to lru_cache

2020-07-06 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

>>> from functools import lru_cache
>>> def my_make_key(my_list):
...   return my_list[0]
...
>>> @lru_cache(128, make_key=my_make_key)
... def cached_func(my_list):
...   return sum(my_list)
...
>>> cached_func([10, 20, 30])
60
>>> cached_func([10, 11, 12]) # <-- Why would we want this to return 60?
60

This seems unsafe.

--




[issue41220] add optional make_key argument to lru_cache

2020-07-06 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

What is the given example trying to accomplish?  It looks like every possible 
input list would be cached using just the first element of the list.  Would 
cached_func([10, 20, 30]) and cached_func([10, 11, 12]) return the same sum?  
If so, why would that be desirable?

--




[issue41220] add optional make_key argument to lru_cache

2020-07-06 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
Removed message: https://bugs.python.org/msg373194




[issue41220] add optional make_key argument to lru_cache

2020-07-06 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

What is the given example trying to accomplish?  It looks like every possible 
input list would be cached using just the first element of the list.  Would 
cached_func([10, 20, 30]) and cached_func([10, 11, 12]) return the same sum?  
If so, why would that be desirable?

--
assignee:  -> rhettinger




[issue41220] add optional make_key argument to lru_cache

2020-07-06 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +rhettinger




[issue41220] add optional make_key argument to lru_cache

2020-07-06 Thread Bar Harel


Bar Harel  added the comment:

May I suggest calling it key?  That would be consistent with other Python 
functions.

--
nosy: +bar.harel




[issue41220] add optional make_key argument to lru_cache

2020-07-06 Thread Itay azolay


Change by Itay azolay :


--
keywords: +patch
pull_requests: +20499
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/21353




[issue41220] add optional make_key argument to lru_cache

2020-07-06 Thread Itay azolay


New submission from Itay azolay :

I'd like to add an optional argument to lru_cache.
This argument is a user-given function that will replace the default behaviour 
of creating a key from the args/kwds of the function.

For example:

def my_make_key(my_list):
    return my_list[0]

@lru_cache(128, make_key=my_make_key)
def cached_func(my_list):
    return sum(my_list)

This will create a cached function that accepts mutable (unhashable) 
arguments.
Also, it will allow users to build custom key functions from knowledge about 
the expected function input, without the need to create custom classes and/or 
override __hash__.

--
components: Library (Lib)
messages: 373141
nosy: Itayazolay
priority: normal
severity: normal
status: open
title: add optional make_key argument to lru_cache
type: enhancement
versions: Python 3.10
