On Sat, Dec 04, 2015 at 7:49 AM, Victor Stinner <victor.stin...@gmail.com> wrote:
> Versionned dictionary > ===================== > > In the previous milestone of FAT Python, the versionned dictionary was a > new type inherited from the builtin dict type which added a __version__ > read-only (global "version" of dictionary, incremented at each change), > a getversion(key) method (version of a one key) and it added support for > weak references. I was thinking (as an alternative to versioning dicts) about a dictionary which would be able to return name/value pairs, which would also be internally used by the dictionary. This would be way less sensitive to irrelevant changes in the scope dictionary, but cost an extra pointer to each key. Here's how it would work: pair = scope.item(name) scope[name] = newval assert pair.value is newval assert pair.key is name assert pair is scope.item(name) # Alternatively, to only create pair objects when `item` is called, have `==` compare the underlying pair. del scope[name] assert pair.key is None # name-dicts can't have `None` keys assert pair.value is None # Alternatively, pair.value is scope.NULL This dict will allow one to hold references to its entries (with the caller promising not to change them, enforced by exceptions). You won't have to keep looking up keys (unless the name is deleted), and functions are allowed to change. For inlining, you can detect whether the function has been redefined by testing the saved pair.value against the saved function, and go into the slow path if needed (or recompile the inlining). I am not sure whether deleting from the dict and then readding the same key should reuse the pair container. I think the only potential issue for the Python version is memory use. There aren't going to be THAT many names being deleted, right? So I say that deleted things in the scope dict should not be removed from the inner dict. I predict that this will simplify a lot of other things, especially when deleting and readding the same name: if you save a pair, and it becomes invalid, you don't have to do another lookup to make sure that it's REALLY gone. If memory is a real concern, deleted pairs can be weakrefed (and saved in a second dict?) until they are reused. This way, pairs which aren't saved by something outside will be removed. For implementation, a Python implementation of the idea has probably already been done. Here are some details: - set: Internally store d._d[k] = k,v. - get: Reject k if d._d[k].key is None. (Names must be strings.) - del: Set d._d[k].key = None and .val = d.NULL to invalidate this entry. For the CPython version, CPython's dict already stores its entries as PyDictKeyEntry (hash, *key, *value), but those entries can move around on resizing. Two possible implementations: 1. Fork dict to store {hash, *kv_pair}. 2. Use an inner dict (like in the Python suggestion) where values are kv_pair. Write the indirection code in C, because scope dicts must be fast. For exposing a pair to Python code, here are two possibilities: 1. Make them Python objects in the first place. 2. Keep a second hash table in lockstep with the first (so that you can do a lookup to find the index in the first, and then use that same index with the second). In this table, store pair objects that have been created. (They can be weakrefed, as before. Unless it's impossible to weakref something you're returning?) This will save memory for pairs that aren't ever exposed. If compact dictionaries are implemented, the second hash table will be a second array instead. To use this kind of scopedict, functions would have to store a list of used names, which is memory overhead. But for what you're doing, some overhead will be necessary anyway. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com