Raymond Hettinger <[email protected]> added the comment:
Thanks, I see what you're trying to do now:
1) Given a slow function
2) that takes a complex argument
2a) that includes a hashable unique identifier
2b) and some unhashable data
3) Cache the function result using only the unique identifier
The lru_cache() currently can't be used directly because
all the function arguments must be hashable.
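That is (a minimal illustration with made-up names, not code from the report):

from functools import lru_cache

@lru_cache(maxsize=128)
def slow_function(item_id, data):
    return sum(data.values())

slow_function(1.0, {'a': 1})    # TypeError: unhashable type: 'dict'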
The proposed solution:
1) Write a helper function
1a) that has the same signature as the original function
1b) that returns only the hashable unique identifier
2) With a single @decorator application, connect
2a) the original function
2b) the helper function
2c) and the lru_cache logic
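In usage, that would look something like this (the key parameter is hypothetical; lru_cache() has no such argument today):

from functools import lru_cache

def make_key(item_id, data):
    # Same signature as the wrapped function, but returns only the
    # hashable unique identifier.
    return item_id

@lru_cache(maxsize=128, key=make_key)    # hypothetical 'key' argument
def slow_function(item_id, data):
    return sum(data.values())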
A few areas of concern come to mind:
* People have come to expect cached calls to be very cheap, but it is easy to
write input transformations that aren't cheap (e.g. looping over all the inputs,
as in your example, or converting entire mutable structures to immutable
ones).
* While key functions are relatively well understood, everywhere else we use
them (sorted() for example) the key function is called only once per element.
Here, the lru_cache() would call the key function on every invocation, even
when the arguments are identical. This will surprise some users (see the
sketch after this list).
* The helper function's signature needs to exactly match that of the wrapped
function. Changes would need to be made in both places.
* It would be hard to debug if the helper function return values ever stop
being unique. For example, if the timestamps start getting rounded to the
nearest second, they will sporadically become non-unique.
* The lru_cache signature makes it awkward to add more arguments. That is why
your examples had to explicitly specify a maxsize of 128 even though 128 is the
default.
* API simplicity was an early design goal. Already, I made a mistake by
accepting the "typed" argument which is almost never used but regularly causes
confusion and affects learnability.
* The use case is predicated on having a large unhashable dataset accompanied
by a hashable identifier that is assumed to be unique. This probably isn't
common enough to warrant an API extension.
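To make the key-function point concrete, here is a minimal hand-rolled keyed
cache (illustrative only, not how a real implementation would be written):

from functools import wraps

def cached_by(key_func):
    # Toy keyed cache: look up results by key_func(*args, **kwargs).
    def decorator(func):
        cache = {}
        @wraps(func)
        def wrapper(*args, **kwargs):
            k = key_func(*args, **kwargs)   # runs on EVERY call, hits included
            if k not in cache:
                cache[k] = func(*args, **kwargs)
            return cache[k]
        return wrapper
    return decorator

calls = 0

def noisy_key(item_id, data):
    global calls
    calls += 1
    return item_id

@cached_by(noisy_key)
def slow_function(item_id, data):
    return sum(data.values())

slow_function(1, {'a': 1})
slow_function(1, {'a': 1})       # cache hit, but the key ran anyway
print(calls)                     # 2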
Out of curiosity, what are you doing now without the proposed extension?
As a first try, I would likely write a dataclass to be explicit about the types
and about which fields are used in hashing and equality testing:
from dataclasses import dataclass, field

@dataclass(unsafe_hash=True)
class ItemsList:
    unique_id: float
    data: dict = field(hash=False, compare=False)
I expect that dataclasses like this will emerge as the standard solution
whenever people need a mapping or dict to work with keys that have a mix of
hashable and unhashable components. This will work with the lru_cache(),
dict(), defaultdict(), ChainMap(), set(), frozenset(), etc.
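For instance, the dataclass above works with lru_cache() unchanged (a small
sketch):

from functools import lru_cache

@lru_cache(maxsize=128)
def slow_function(items):
    # Only items.unique_id participates in hashing and equality,
    # so the unhashable dict rides along for free.
    return sum(items.data.values())

slow_function(ItemsList(1594386565.0, {'a': 1, 'b': 2}))
slow_function(ItemsList(1594386565.0, {'a': 1, 'b': 2}))   # cache hit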
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue41220>
_______________________________________