Thanks for this thorough review, Raymond! The user research is especially amazing.
And thanks to Antoine for writing the PEP -- you never know how an idea pans out until you've tried it.

--Guido

On Thu, May 14, 2015 at 7:29 AM, Raymond Hettinger <raymond.hettin...@gmail.com> wrote:

> Before the Python 3.5 feature freeze, I should step up and
> formally reject PEP 455 for "Adding a key-transforming
> dictionary to collections".
>
> I had completed an involved review effort a long time ago
> and I apologize for the delay in making the pronouncement.
>
> What made it an interesting choice from the outset is that the
> idea of a "transformation" is an enticing concept that seems
> full of possibility. I spent a good deal of time exploring
> what could be done with it but found that it mostly fell short
> of its promise.
>
> There were many issues. Here are some that were at the top:
>
> * Most use cases don't need or want the reverse lookup feature
>   (what is wanted is a set of one-way canonicalization functions).
>   Those that do would want to have a choice of what is saved
>   (first stored, last stored, n most recent, a set of all inputs,
>   a list of all inputs, nothing, etc). In database terms, it
>   models a many-to-one table (the canonicalization or
>   transformation function) with the one being a primary key into
>   another possibly surjective table of two columns (the
>   key/value store). A surjection into another surjection isn't
>   inherently reversible in a useful way, nor does it seem to be a
>   common way to model data.
>
> * People are creative at coming up with use cases for the TD
>   but then find that the resulting code is less clear, slower,
>   less intuitive, more memory intensive, and harder to debug than
>   just using a plain dict with a function call before the lookup:
>   d[func(key)]. It was challenging to find any existing code
>   that would be made better by the availability of the TD.
>
> * The TD seems to be all about combining data scrubbing
>   (case-folding, unicode canonicalization, type-folding, object
>   identity, unit-conversion, or finding a canonical member of an
>   equivalence class) with a mapping (looking up a value for a
>   given key). Those two operations are conceptually orthogonal.
>   The former doesn't get easier when hidden behind a mapping API
>   and the latter loses the flexibility of choosing your preferred
>   mapping (an ordereddict, a persistentdict, a chainmap, etc) and
>   the flexibility of establishing your own rules for whether and
>   how to do a reverse lookup.
>
>
> Raymond Hettinger
>
>
> P.S. Besides the core conceptual issues listed above, there
> are a number of smaller issues with the TD that surfaced
> during design review sessions. In no particular order, here
> are a few of the observations:
>
> * It seems to require above-average skill to figure out what
>   can be used as a transform function. It is more
>   expert-friendly than beginner-friendly. It takes a little
>   while to get used to it. It wasn't self-evident that
>   transformations happen both when a key is stored and again
>   when it is looked up (contrast this with key-functions for
>   sorting which are called at most once per key).
>
> * The name, TransformDict, suggests that it might transform the
>   value instead of the key or that it might transform the
>   dictionary into something else. The name TransformDict is so
>   general that it would be hard to discover when faced with a
>   specific problem. The name also limits perception of what
>   could be done with it (i.e. a function that logs accesses
>   but doesn't actually change the key).
>
> * The tool doesn't describe itself well. Looking at the
>   help(), or the __repr__(), or the tooltips did not provide
>   much insight or clarity. The dir() shows many of the
>   _abc implementation details rather than the API itself.
>
> * The original key is stored and if you change it, the change
>   isn't stored.
>   The _original dict is private (perhaps to
>   reduce the risk of putting the TD in an inconsistent state)
>   but this limits access to the stored data.
>
> * The TD is unsuitable for bijections because the API is
>   inherently biased with a rich group of operators and methods
>   for forward lookup but has only one method for reverse lookup.
>
> * The reverse feature is hard to find (getitem vs __getitem__)
>   and its output pair is surprising and a bit awkward to use.
>   It provides only one accessor method rather than the full
>   dict API that would be given by a second dictionary. The
>   API hides the fact that there are two underlying dictionaries.
>
> * It was surprising that when d[k] failed, it failed with a
>   transformation exception rather than a KeyError, violating
>   the expectations of the calling code (for example, if the
>   transformation function is int(), the call d["12"]
>   transforms to d[12] and either succeeds in returning a value
>   or in raising a KeyError, but the call d["12.0"] fails with
>   a ValueError). The latter issue limits its substitutability
>   into existing code that expects real mappings and for
>   exposing to end-users as if it were a normal dictionary.
>
> * There were other issues with dict invariants as well and
>   these affected substitutability in a sometimes subtle way.
>   For example, the TD does not work with __missing__().
>   Also, "k in td" does not imply that "k in list(td.keys())".
>
> * The API is at odds with wanting to access the transformations.
>   You pay a transformation cost both when storing and when
>   looking up, but you can't access the transformed value itself.
>   For example, if the transformation is a function that scrubs
>   hand-entered mailing addresses and puts them into a standard
>   format with standard abbreviations, you have no way of getting
>   back to the cleaned-up address.
>
> * One design reviewer summarized her thoughts like this:
>   "There is a learning curve to be climbed to figure out what
>   it does, how to use it, and what the applications [are].
>   But, the [working out the same] examples with plain dicts
>   requires only basic knowledge." -- Patricia
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>

--
--Guido van Rossum (python.org/~guido)
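
[Illustration, not part of the original thread.] Two points from the review can be sketched in a few lines: the plain-dict alternative d[func(key)], and the way a failing transform surfaces its own exception where calling code would expect a KeyError. MiniTransformDict, canonical and lookup_error are hypothetical names written for this sketch. The class is a toy stand-in and not the PEP 455 implementation (it keeps no original keys and offers no reverse lookup). With int() as the transform, int("12.0") raises ValueError in current CPython.

```python
# Illustrative sketch only: MiniTransformDict is a hypothetical stand-in,
# not the PEP 455 TransformDict.

def canonical(key):
    """One-way canonicalization: case-fold a string key."""
    return key.casefold()


# Plain dict with an explicit function call before each store and lookup.
d = {}
d[canonical("Hello")] = 1
value = d[canonical("HELLO")]


class MiniTransformDict:
    """Toy key-transforming mapping: transforms on store and on lookup."""

    def __init__(self, transform):
        self._transform = transform
        self._data = {}

    def __setitem__(self, key, value):
        self._data[self._transform(key)] = value

    def __getitem__(self, key):
        # The transform runs before the lookup, so any exception it raises
        # reaches the caller instead of a KeyError.
        return self._data[self._transform(key)]


def lookup_error(mapping, key):
    """Return the name of the exception raised by mapping[key], or None."""
    try:
        mapping[key]
    except Exception as exc:
        return type(exc).__name__
    return None


td = MiniTransformDict(int)
td["12"] = "twelve"

found = td["12"]                    # key is transformed to 12
missing = lookup_error(td, "13")    # transform succeeds, lookup fails
bad = lookup_error(td, "12.0")      # transform itself fails
```

In this sketch, code that catches only KeyError around td[key] would not handle the "12.0" case, which is the substitutability concern raised in the review. The plain-dict version keeps the canonicalization step visible at the call site.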