On 7/10/06, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
>
> >I've also sometimes thought of unifying dicts and sets by implementing
> >set operations on the keys of dicts only. When the values aren't the
> >same, we could make an arbitrary decision e.g. the left operand wins.
> >You get quite far. E.g.
> >
> >a = {1: "a", 2: "b"}
> >b = {1: "c", 3: "d"}
> >
> ># These already work:
> >assert 1 in a
> >assert 1 in b
> >assert 3 not in a
> ># etc.
> >
> ># These would be new (the equivalent augmented assignments would work too):
> >assert a|b == {1: "a", 2: "b", 3: "c"}
(That should be 3: "d" BTW.)

> >assert a&b == {1: "a"}
> >assert a^b == {2: "b", 3: "d"}
> >assert a-b == {2: "b"}
> >assert b-a == {3: "d"}
> >
> ># We could use these equivalencies:
> >assert {1} == {1: None} == set({1: "a"})  # I.e. canonical sets have
> >all None values
> >
> ># And of course it would solve the empty set notation problem nicely:
> >assert dict() == {} == set()
> >
> >Unfortunately we couldn't redefine <=, <, >=, > to mean various
> >subset/superset tests without backwards incompatibilities (but does
> >anyone care?), and == and != would of course take the values into
> >account, which would occasionally sting. Also, sets would grow some
> >operations that don't make a lot of sense (e.g. __getitem__, get,
> >setdefault), but that's minor once you know the same implementation is
> >used.
> >
> >I still expect there's a fatal flaw in the scheme, but I can't think
> >of it right now...
>
> I don't think there is a fatal flaw in terms of implementation -- a
> little modification of sets.py would demonstrate that with certainty.
>
> There would be some implementation consequences in terms of speed and
> memory usage (we would lose the memory savings and some of the
> speed-ups gained in the Py2.5 implementation of sets). The storage
> size would grow 50%. All set-building operations would incur the
> overhead of copying in values as well as keys.

All of those could probably be eliminated through sheer willpower. E.g.
a flag "are any values non-None" would go a long way. Of course, the
50% growth is only in terms of memory for the dict itself; if we take
the space for the keys into account the percentage would be much
smaller.

> The biggest issues are on the conceptual side -- do we think about set
> operations and mapping operations in the same way or in different
> ways? Guido listed the basic conflicts: 1) arbitrary decisions as to
> which values to keep, 2) unexpected stings from != and == taking
> values into account, 3) operations that don't make sense (__setitem__,
> setdefault, etc).
>
> The deeper problem is that decisions on which values to keep will
> inevitably break set invariants (see a long list of these in
> test.test_set.py):
>
> assert a|b == b|a, 'commutative property'
> assert (a-b) | (a&b) | (b-a) == a|b, 'whole is the sum of the parts'

Yeah, I think when I was considering this earlier (it's been a shower
thought for quite a while, on and off) I considered those fatal flaws.
Today, I think the invariants would simply change to

  assert set(a|b) == set(b|a) == set(a)|set(b) == set(b)|set(a)

with set() being an alias for dict.fromkeys(). (I wonder if we need a
similar thing that doesn't make a copy if the input is already a set,
i.e. has all-None values.)
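For instance, here's a rough pure-Python sketch of the semantics being
discussed -- just an illustration, with made-up names (DSet, as_set); a
real version would presumably be built into dict itself rather than
bolted on as a subclass:

    class DSet(dict):
        """Sketch: dicts with set operators; left operand wins on common keys."""

        def __or__(self, other):       # union
            result = DSet(other)
            result.update(self)        # self's values override other's
            return result

        def __and__(self, other):      # intersection, keeping left-hand values
            return DSet((k, v) for k, v in self.items() if k in other)

        def __sub__(self, other):      # difference
            return DSet((k, v) for k, v in self.items() if k not in other)

        def __xor__(self, other):      # symmetric difference
            return (self - other) | (DSet(other) - self)

    def as_set(d):
        # The keys-only view, i.e. what set() would mean as an alias for
        # dict.fromkeys(): same keys, all values None.
        return DSet.fromkeys(d)

    a = DSet({1: "a", 2: "b"})
    b = DSet({1: "c", 3: "d"})

    assert a | b == {1: "a", 2: "b", 3: "d"}
    assert a & b == {1: "a"}
    assert a ^ b == {2: "b", 3: "d"}
    assert a - b == {2: "b"}
    assert b - a == {3: "d"}

    # a|b != b|a as *dicts* (the values for key 1 differ), but the
    # keys-only invariant still holds:
    assert as_set(a | b) == as_set(b | a) == as_set(a) | as_set(b)

Only the values break the symmetry; viewed through as_set() the usual
algebra is intact, which is why the invariants would merely need
restating rather than being lost.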
> The notion of sets carries so much mathematical significance that the
> loss of expected invariants would be a recurring surprise.

But only when mixing sets with non-sets.

> On the positive side, I've occasionally wanted some set-style
> operations for dictionaries (i.e. give me all the entries in d1 that
> aren't in d2, etc.)

Right. Plus of course there's a long history of pre-2.3 code that
(ab)uses dicts as sets. (We could get rid of the "ab". :-)

> The Lua programming language demonstrated that even unifying lists and
> dicts is possible and that people can get used to just about anything.

And ABC has lists and tables that were quite different from Python's
lists and dicts...

> The question boils down to what is best for the user in terms of
> learnability, clarity, reliability, and performance.

Where the unified data type doesn't necessarily score poorly -- one
less type means less to learn, less code, and fewer choices to make.

I'm still very much undecided, but I don't want to rule this out for
py3k. Perhaps I'll write up a PEP and see how it flies.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)