On 12/31/05, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> [Noam]
> > For example, iteration over a set which once had
> > 1,000,000 members and now has 2 can take 1,000,000 operations every
> > time you traverse all the (2) elements.
>
> Do you find that to be a common or plausible use case?
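
(For anyone who wants to reproduce the behaviour being discussed, something
like the following shows it. This is only an illustration - the printed
timings depend on the Python version and the machine, and the point is just
the contrast between the two loops:

    import time

    s = set(range(1000000))      # the internal table grows to well over
                                 # 1,000,000 slots
    while len(s) > 2:
        s.pop()                  # pop() removes entries but never shrinks
                                 # the table

    start = time.time()
    for _ in range(100):
        for x in s:              # each pass still scans the whole oversized
            pass                 # table to find the 2 live entries
    print('sparse 2-element set:  %.3f seconds' % (time.time() - start))

    t = set(s)                   # rebuilding gives a table sized for 2 entries
    start = time.time()
    for _ in range(100):
        for x in t:
            pass
    print('rebuilt 2-element set: %.3f seconds' % (time.time() - start))

The first loop is far slower even though both sets contain exactly the same
two elements.)
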
I don't have a concrete use case at hand at the moment, but a set which is
repeatedly filled with elements and then emptied by pop operations doesn't
seem that far-fetched to me.

>
> Was Guido's suggestion of s=set(s) unworkable for some reason?  dicts
> and sets emphasize fast lookups over fast iteration -- apps requiring
> many iterations over a collection may be better off converting to a list
> (which has no dummy entries or empty gaps between entries).

It's workable, but I think that most Python programmers haven't read that
specific message, and expect operations that ought to be fast to actually be
fast. Converting to a list won't help the use case above, and anyway, it's
another thing I wouldn't expect anyone to do - there's no reason that
iteration over a set should take a long time.

(I'm speaking from my point of view, which I believe is common. I don't
expect programs I write in Python to be super-fast - if that were the case,
I would write them in C. I do expect them to take a reasonable amount of
time, and in the case of iteration over a set, that means time proportional
to the number of elements in the set.)

>
> Would the case be improved by incurring the time cost of 999,998 tests
> for possible resizing (one for each pop) and some non-trivial number of
> resize operations along the way (each requiring a full-iteration over
> the then current size)?

I believe it would. It seems to me that each of those 999,998 tests takes
not much more than a machine clock cycle, which adds up to about a
millisecond on today's computers. The resize operations will take a few more
milliseconds. None of that really matters, since everything else will
probably take much longer. I just ran this code:

>>> s = set()
>>> for j in xrange(1000000):
...     s.add(j)
...
>>> while s:
...     tmp = s.pop()
...

and it took 2.4 seconds to finish. And that's okay - I'm just saying that a
few additional clock ticks per operation will usually not matter when the
overall complexity stays the same, but a change in the order of complexity
can matter much more.

>
> Even if this unique case could be improved, what is the impact on common
> cases?  Would a downsizing scheme risk thrashing with the
> over-allocation scheme in cases with mixed adds and pops?

I don't think there should be any damage beyond those extra clock ticks. The
simple method I took from "Introduction to Algorithms" works no matter what
sequence of adds and pops you have.

>
> Is there any new information/research beyond what has been obvious from
> the moment the dict resizing scheme was born?

I wanted to say that there isn't any new information, and yet I don't think
I have to assume that everything in current Python is already the best it
can be. All I did was find another reason why a downsizing scheme might be
good, and post it to ask whether people have thought about it. If you have a
document listing all the design decisions that went into the dict
implementation, please send it to me and I won't ask about things that were
already considered.

But actually, the answer is yes. I believe that the current dict resizing
scheme was born before the iterator protocol was introduced, and that may be
one reason why the current scheme doesn't try to minimize the number of
empty hash table entries.

Noam
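
P.S. For concreteness, the rule I have in mind from "Introduction to
Algorithms" (the "dynamic tables" analysis) looks roughly like this. This is
only a sketch with made-up names and the textbook thresholds, not a proposal
for the actual C code in dictobject.c/setobject.c:

    MIN_TABLE_SIZE = 8

    def shrunken_size(table_size, used):
        """Hypothetical helper: the table size to use after a deletion.

        Halve the table while it is less than a quarter full (the CLRS
        rule), so the load factor after shrinking is still below one half.
        The check itself is a few machine instructions per pop; the
        occasional rebuild costs O(n), which amortizes to O(1) per
        operation no matter how adds and pops are interleaved."""
        new_size = table_size
        while new_size > MIN_TABLE_SIZE and used * 4 < new_size:
            new_size //= 2
        return new_size

A deletion would call something like this and, if the result is smaller than
the current table, rebuild the table at the new size. With such a rule, the
set in the example above would be rebuilt into a small table long before it
was down to 2 elements, so iterating over it would again take time
proportional to its length.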