Top posting, as this is more a response to the whole thread. My takeaways from this most excellent discussion:
* There is some benefit to iteritems in python2 when you need it.
* OpenStack does not seem to need it - except in places that operate on
  tens of thousands of large objects concurrently, such as the nova
  scheduler.
* six.anything is more code, and more code is more burden in general.

From this I believe we should distill some clear developer and reviewer
recommendations, which should go in our developer docs:

* Do not use six.iteritems in new patches without a clear reason stated
  and attached.
  - Reasons should clearly state why .items() would be a large enough
    burden, such as "this list will be large and stay resident in
    memory for the duration of the program. Each concurrent request
    will have similar lists."
* -1 patches using six.iteritems in flight now with "Please remove or
  justify six.iteritems usage."
* Patches touching code sections that use six.iteritems should be
  allowed to remove its usage without justification.

I've gone ahead and added this suggestion in a patch to the
infra-manual: https://review.openstack.org/190757

This looks quite a bit like a hacking rule definition. How strongly do
we feel about this? Do we want to require a tag of some kind on lines
that use six.iteritems(), or are we comfortable with this just being in
our python3 porting documentation?

Excerpts from Robert Collins's message of 2015-06-09 17:15:33 -0700:
> I'm very glad folk are working on Python3 ports.
>
> I'd like to call attention to one little wart in that process: I get
> the feeling that folk are applying a massive regex to find things like
> d.iteritems() and convert that to six.iteritems(d).
>
> I'd very much prefer that such a regex approach move things to
> d.items(), which is much easier to read.
>
> Here's why. Firstly, very very very few of our dict iterations are
> going to be performance sensitive in the way that iteritems() matters.
> Secondly, no really - unless you're doing HUGE dicts, it doesn't
> matter. Thirdly. Really, it doesn't.
>
> At 1 million items the overhead is 54ms [1]. If we're doing inner
> loops on million-item dictionaries anywhere in OpenStack today, we
> have a problem. We might want to in e.g. the scheduler... if it held
> in-memory state on a million hypervisors at once, because I don't
> really want to imagine it pulling a million rows from a DB on every
> action. But then, we'd be looking at a whole 54ms. I think we could
> survive, if we did that (which we don't).
>
> So - please, no six.iteritems().
>
> Thanks,
> Rob
>
> [1]
> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
> d.items(): pass'
> 10 loops, best of 3: 76.6 msec per loop
> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
> d.iteritems(): pass'
> 100 loops, best of 3: 22.6 msec per loop
> python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
> d.items(): pass'
> 10 loops, best of 3: 18.9 msec per loop
> pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
> d.items(): pass'
> 10 loops, best of 3: 65.8 msec per loop
> # and out of interest, assuming that that hadn't triggered the JIT....
> but it had.
> pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i
> in d.items(): pass'
> 1000 loops, best of 3: 64.3 msec per loop

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
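P.S. For anyone who wants to sanity-check the recommendation locally, here is a minimal Python 3 sketch of the preferred spelling. The dict size and helper name are illustrative only (the thread's benchmarks used 1,000,000 entries); the point is that on Python 3, d.items() returns a lazy view, so the old Python 2 memory argument for iteritems()/six.iteritems() no longer applies.

```python
# Sketch: plain d.items() is the recommended spelling. On Python 3 it
# returns a lightweight view object, not a materialized list, so there
# is no six.iteritems() needed for memory reasons.
import timeit

# 100,000 entries is an arbitrary size chosen so this runs quickly;
# the quoted benchmarks used 1,000,000.
d = dict(enumerate(range(100000)))

def visit_all_items():
    """Iterate the dict once and count the (key, value) pairs seen."""
    count = 0
    for key, value in d.items():
        count += 1
    return count

print(visit_all_items())  # prints 100000

# Time the loop roughly the way the quoted `timeit` invocations do.
per_loop_ms = timeit.timeit(visit_all_items, number=10) / 10 * 1000
print("%.1f msec per loop" % per_loop_ms)
```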