I realized yet another thing, which will reduce overhead: the original array can store values directly, and you maintain the refs by repeatedly updating them when moving refs around. RefCells will point to a pointer to the value cell (which already exists in the table).
- `getitem` will be almost the same as a normal dict: it has to check if value is valid (which it already checked, but it will be invalid a lot more often). - `setitem` the same as a normal dict (since the RefCells will just point to the _address_ of the value pointer in the main table), except that the dict will be bigger, and compaction/expansion has more overhead. No creation of refcells here. - `delitem` will just null the value pointer, which shouldn't cost much more, if it doesn't cost less. - Expansion and compaction will cost more, since we have to copy over the RefCell pointers and also check whether they should be copied. - Deletion of the dict will cost more, due to the additional logic for deciding what to delete, and the RefCells can no longer point into the entry table. They would have to point at the value (requiring logic, or the replacement of a function pointer) or at a new allocated object (requiring an allocation of sizeof(PyObject*) bytes). On Tue, Dec 15, 2015 at 5:38 PM, Victor Stinner <victor.stin...@gmail.com> wrote: > Sorry, I didn't read carefully your email, but I don't think that it's > acceptable to make Python namespaces slower. In FAT mode, we need > versionned dictionaries for module namespace, type namespace, global > namespace, etc. It was actually more "it might be a problem" than "it will be a problem". I don't know if the overhead will be high enough to worry about. It might be dominated by whatever savings there would be by not having to look up names more than once. (Unless builtins get mixed with globals? I think that's solvable, though. It's just that the solutions I can think of have different tradeoffs.) I am confident that the time overhead and the savings will beat the versioning dict. The versioning dict method has to save a reference to the variable value and a reference to the name, and regularly test whether the dict has changed. This method only has to save a reference to a reference to the value (though it might need the name to allow debugging), doesn't care if it's changed, will be an identity (to NULL?) test if it's deleted (and only if it's not replaced after), and absolutely doesn't care if the dict had other updates (which might increase the version number). >>> Do you have an estimation of the cost of the "extra pointer"? Impact >>> on memory and CPU. dict is really a very important type for the >>> performance of Python. If you make dict slower, I'm sure that Python >>> overall will be slower. >> >> I'm proposing it as a subclass. > > Please read the "Versionned dictionary" section of my email: > https://mail.python.org/pipermail/python-dev/2015-December/142397.html > > I explained why using a subclass doesn't work in practice. I've read it again. By subclass, I mean that it implements the same interface. But at the C level, I want to have it be a fork(?) of the current dict implementation. As for `exec`, I think it might be okay for it to be slower at the early stages of this game. Here's the lookup function for a string-only dict (used both for setting and getting): https://github.com/python/cpython/blob/master/Objects/dictobject.c#L443 I want to break that up into two parts: - Figure out the index of the {hash, *key, *val} entry in the array. - Do whatever to it. (In the original: make *value_addr point to the value pointer.) I want to do this so that I can use that index to point into ANOTHER array, which will be the metadata for the refcells (whatever it ends up being). This will mean that there's no second lookup. This has to be done at the C level, because the dict object doesn't expose the index of the {hash, *key, *val} entries on lookup. If you don't want to make it a subclass, then we can propose a new function `get_ref` (or something) for dict's C API (probably a hard sell), which returns RefCell objects, and an additional pointer in `dict` to the RefCells table (so a total of two pointers). When `get_ref` is first called, it will - calloc the RefCell table (which will be the same length as the entry table) - replace all of the dict's functions with ones that know how to deal with the RefCells, - replace itself with a function that knows how to return these refs. - call its replacement. If the dict never gets RefCells, you only pay a few pointers in size, and a few creation/deletion values. This is possible now that the dictionary itself will store values as normal. There might be more necessary. For example, the replaced functions might need to keep pointers to their originals (so that you can slip additional deep C subclasses in). And it might be nice if the `get_index` function could be internally relied upon by the C-level subclasses, because "keeping a metadata table index-synchronized with the real one" is something I've wanted to do for two different dict subclasses now. _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com