Patches item #1615701, was opened at 2006-12-14 14:08 Message generated for change (Comment added) made by doerwalter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615701&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: Creating dicts for dict subclasses Initial Comment: This patch changes dictobject.c so that creating dicts from mapping like objects only uses the internal dict functions if the argument is a *real* dict, not a subclass. This means that overwritten keys() and __getitem__() methods are now honored. In addition to that the fallback implementation now tries iterkeys() before trying keys(). It also adds a PyMapping_IterKeys() macro. ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2006-12-20 13:59 Message: Logged In: YES user_id=89016 Originator: YES To clear up some apparent misunderstandings: This patch does *not* advocate that some dict methods should be implemented by calling other dict methods so that dict subclasses only have to overwrite a few methods to gain a completely consistent implementation. This patch only fixes the dict constructor (and update()) and consists of two parts: (1) There are three code paths in dictobject.c::dict_update_common(): (a) if the constructor argument doesn't have a "keys" attribute treat it as a iterable of items; (b) if the argument has a "keys" attribute, but is not a dict (and not an instance of a subclass of dict), use keys() and __getitem__() to make a copy of the mapping-like object. (c) if the argument has a "keys" attribute and is a dict (or an instance of a subclass of dict) bypass any of the overwritten methods that the object might provide and directly use the dict implementation. This patch changes PyDict_Merge() so that code path (b) is used for dict constructor arguments that are subclasses of dict, so that any overwritten methods are honored. (2) This means that now if a subclass of dict is passed to the constructor or update() the code is IMHO more correct (it honors the reimplemenation of the mapping methods), but slower. To reduce the slowdown instead of using kesY() and __getitem__(), iterkeys() and __getitem__() are used. I can't see why the current behaviour should be better: Yes, it is faster, but it is also wrong: Without the patch the behaviour of dict() and dict.update() depends on the fact whether the argument happens to subclass dict or not. If it doesn't all is well: the argument is treated as a mapping (i.e. keys() and __getitem__() are used) otherwise the methods are completely ignored. So can we agree on the fact that (1) is desirable? (At least Guido said as much: http://mail.python.org/pipermail/python-dev/2006-December/070341.html) BTW, I only added PyMapping_Iterkeys() because it mirrors PyMapping_Keys(). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-12-20 01:13 Message: Logged In: YES user_id=80475 Originator: NO Since update already supports (key, item) changes, I do not see that rationale in trying to expand the definition what is dict-like to include a try-this, then try-that approach. This is a little too ad-hoc for too little benefit. Also, I do not see the point of adding PyMapping_Iterkeys to the C API. It affords no advantage over its macro definition (the current one-way-to-do-it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-12-20 00:00 Message: Logged In: YES user_id=80475 Originator: NO It is also asking for bugs if someone hooks __getitem__ and starts to make possibly invalid assumptions about what other changes occur implicitly. Also, this patch kills the performance of builtin subclasses. If I subclass dict to add a new method, it would suck to have the performance of all of the other methods drop percariously. This patch is somewhat overzealous. It encroaches on the terriority of UserDict.DictMixin which was specifically made for propagating new behaviors. It unnecessarily exposes implementation details. It introduces implicit behaviors that should be done through explicit overrides of methods whose behavior is supposed to change. And, it is on the top of a slippery slope that we don't want to go down (i.e. do we want to guarantee that list.append is implemented in terms of list.extend, etc). Python has no shortage of places where builtin subclasses make direct calls to the underlying C code -- this patch leads to a bottomless pit of changes that kill performance and make implicit side-effects the norm instead of the exception. ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-19 23:29 Message: Logged In: YES user_id=764593 Originator: NO FWIW, I'm not sure I agree on not specifying which methods call share implementation. If someone hooks __getitem__ but not get, that is just asking for bugs. (The implementation of get may -- but need not -- make its own call to __getitem__, and not everyone will make the same decision.) ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-19 23:26 Message: Logged In: YES user_id=764593 Originator: NO As I understand it, the problem is that dict.update is assuming any dict subclass will use the same internal data representation. Restricting the fast path to exactly builtin dicts (not subclasses) fixes the bug, but makes the fallback more frequent. The existing fallback is to call keys(), then iterate over it, retrieving the value for each key. (keys is required for a "minimal mapping" as documented is UserDict, and a few other random places.) The only potential dependency on other methods is his proposed new intermediate path that avoids creating a list of all keys, by using iterkeys if it exists. (I suggested using iteritems to avoid the lookups.) If iter* aren't implemented, the only harm is falling back to the existing fallback of "for k in keys():" ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-12-19 22:07 Message: Logged In: YES user_id=80475 Originator: NO I'm -1 on making ANY guarantees about which methods underlie others -- that would constitute new and everlasting guarantees about how mappings are implemented. Subclasses should explicity override/extend the methods withed changed behavior. If that proves non-trivial, then it is likely there should be a has-a relationship instead of an is-a relationship. Also, it is likely that the subclass will have Liskov substitutability violations. Either way, there is probably a design flaw. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2006-12-19 20:23 Message: Logged In: YES user_id=89016 Originator: YES iteritems() has to create a new tuple for each item, so this might be slower. ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-19 18:50 Message: Logged In: YES user_id=764593 Originator: NO Why are you using iterkeys instead of iteritems? It seems like if they've filled out the interface enough to have iterkeys, they've probably filled it out all the way, and you do need the value as soon as you get the key. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615701&group_id=5470 _______________________________________________ Patches mailing list Patches@python.org http://mail.python.org/mailman/listinfo/patches