On 12 June 2015 at 05:39, Dolph Mathews <dolph.math...@gmail.com> wrote:
>
> On Thu, Jun 11, 2015 at 12:34 AM, Robert Collins <robe...@robertcollins.net>
> wrote:
>>
>> On 11 June 2015 at 17:16, Robert Collins <robe...@robertcollins.net>
>> wrote:
>>
>> > This test conflates setup and execution. Better like my example,
>> ...
>>
>> Just had it pointed out to me that I've let my inner asshole out again
>> - sorry. I'm going to step away from the thread for a bit; my personal
>> state (daughter just had a routine but painful operation) shouldn't be
>> taken out on other folk, however indirectly.
>
>
> Ha, no worries. You are completely correct about conflating setup and
> execution. As far as I can tell though, even if I isolate the dict setup
> from the benchmark, I get the same relative differences in results.
> iteritems() was introduced for a reason!
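(As an aside on the benchmarking point above, here's a minimal stdlib-only sketch of isolating the dict setup from the timed statement with timeit; the dict size and pass count are illustrative, not taken from anyone's actual benchmark:)

```python
import timeit

# Keep benchmark setup out of the timed statement, per the "conflates
# setup and execution" point: the dict is built in `setup`, so only the
# iteration itself is timed. Sizes here are purely illustrative.
setup = "d = dict((i, i) for i in range(10000))"
stmt = "for k, v in d.items(): pass"

elapsed = timeit.timeit(stmt, setup=setup, number=100)
print("100 passes over a 10k-item dict took %.4f seconds" % elapsed)
```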
Absolutely: the key question is whether that reason is applicable to us.

> If you don't need to go back to .items()'s copy behavior in py2, then
> six.iteritems() seems to be the best general purpose choice.

> I think Gordon said it best elsewhere in this thread:
>
>> again, i just want to reiterate, i'm not saying don't use items(), i just
>> think we should not blindly use items() just as we shouldn't blindly use
>> iteritems()/viewitems()

I'd like to recap and summarise a bit. I think it's broadly agreed that:

- The three view-based methods -- iteritems, iterkeys, itervalues -- in
  Python 2 became unified with the list-form equivalents in Python 3.
- The view-based methods are substantially faster and lower overhead than
  the list-form methods, approximately 3x.
- We don't have any services today that expect to hold million-item dicts,
  or even 10K-item dicts, in a persistent fashion.
- There's some cognitive overhead involved in reading six.iteritems(d) vs
  d.items().
- We should use d.items() except where it matters.

Where does it matter? We have several process architectures in OpenStack:

- We have API servers that are eventlet (except keystone) WSGI servers.
  They respond to requests over HTTP[S]; each request is independent and
  loads all its state from the DB and/or memcache each time. We don't
  expect large numbers of concurrent active requests per process (where
  large would be e.g. 1000).

- We have MQ servers that are conceptually the same as the WSGI servers,
  just with a different listening protocol. They do sometimes have
  background tasks, and some (e.g. neutron-l3-agent) may hold significant
  cached state between requests, but that's still scoped to medium-sized
  datasets. We expect moderate numbers of concurrent active requests, as
  these are the actual backends doing things for users, but since these
  servers are typically working with actually slow things (e.g. the
  hypervisor), high concurrency typically goes badly :).

- We have CLIs that start up, process some data, and exit.
This includes python-novaclient and nova-manage. They generally work with
very small datasets and have no concurrency at all.

There are two ways that iteritems vs items etc. could matter. One, A), is
memory and CPU on a single use of a very large dict. The other, B), is the
aggregate overhead of many concurrent uses of a single shared dict (or,
C), possibly of N similar-sized dicts).

A) doesn't apply to us in any case I can think of.

B) doesn't apply to us either - our peak concurrency on any single process
is still low (we may manage to make it higher now that we're moving on the
PyMySQL thing, but that's still in progress - and of course there are
tradeoffs with high concurrency depending on the ratio of work-to-wait
each request has). Very high concurrency depends on a very low ratio: to
have 1000 concurrent requests that aren't slowing each other down requires
that each request's wall clock be 1000x the time spent in-process
actioning it, and that there be enough backend capacity (whatever that is)
to dispatch the work to without causing queuing in that part of the
system.

C) we can eliminate via both the argument on B) and via relative
overheads: if we had 10000 1000-item dicts in process at once, the
relative overhead of making items() from them all is approximately the
size of the dicts; but it's almost certain we have much more state hanging
around in each of those 10000 threads than each dict, so the incremental
cost will not dominate the process overheads.

I'm not saying - and haven't said - that iteritems() is never applicable
*in general*; rather, I don't believe it's ever applicable *to us* today,
and I'm arguing that we should default to items() and bring in iteritems()
if and when we need it.
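(For completeness, a quick Python 3 sketch of the unification mentioned above: d.items() on Python 3 returns a live view rather than a list copy, which is why the old Python 2 copy cost isn't there to worry about in the first place:)

```python
# In Python 3, dict.items() returns a live view, not a list copy --
# the behavior that iteritems()/viewitems() provided on Python 2.
d = {"a": 1, "b": 2}
view = d.items()

d["c"] = 3  # mutate the dict after taking the "items"

# The view reflects the mutation: no copy was ever made.
assert ("c", 3) in view
assert len(view) == 3
print(sorted(view))  # [('a', 1), ('b', 2), ('c', 3)]
```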
-Rob

--
Robert Collins <rbtcoll...@hp.com>
Distinguished Technologist
HP Converged Cloud

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev