Re: [openstack-dev] [all][python3] use of six.iteritems()
On Sep 13, 2016 10:42 PM, "Kevin Benton" wrote: > > >All performance matters. All memory consumption matters. Being wasteful over a purely aesthetic few extra characters of code is silly. > > Isn't the logical conclusion of this to write everything in a different language? :) I'm up for it if you are. :D __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 09/13/2016 08:23 PM, Terry Wilson wrote: On Tue, Sep 13, 2016 at 6:31 PM, Jay Pipes wrote: On 09/13/2016 01:40 PM, Terry Wilson wrote: On Thu, Jun 11, 2015 at 8:33 AM, Sean Dague wrote: On 06/11/2015 09:02 AM, Jay Pipes wrote: On 06/11/2015 01:16 AM, Robert Collins wrote: But again - where in OpenStack does this matter the slightest? Precisely. I can't think of a single case where we are iterating over anywhere near the number of dictionary items that we would see any impact whatsoever. In neutron, the ovsdb native code iterates over fairly large dictionaries since the underlying OVS library stores OVSDB tables completely in memory as dicts. I just looked at the code I wrote and it currently uses values() and I now want to switch it to six.itervalues() :p. Best, -jay +1. This is a massive premature optimization which just makes all the code gorpy for no real reason. Premature optimization is about wasting a bunch of time trying to optimize code before you know you need to, not about following the accepted almost-always-faster/always-less-memory-using solution that already exists. Memory-wise it's the difference between a constant 88-byte iterator and the storage for an additional list of tuples. And if Raymond Hettinger, in a talk called "Transforming Code Into Beautiful Idiomatic Python" specifically mentions that people should always use iteritems (https://www.youtube.com/watch?v=OSGv2VnC0go&feature=youtu.be&t=21m24s), I tend to believe him. Sure, it'd be much better if Python 3 and Python 2 both returned iterators for items(), values(), keys(), etc., but it doesn't. Wasting memory for purely aesthetic reasons (they're even both the same number of lines) is just a bad idea, IMNSHO. Is it wasted time to respond to a mailing list post from 18 months ago? -jay Ha! Absolutely it is. Someone posted a Neutron patch haphazardly converting all of the six.iteritems() calls to items() and it struck a nerve. I searched for the thread in gmail not noticing the date. My apologies!
:) Heh, no worries, I was mostly just being tongue-in-cheek :) -jay
Re: [openstack-dev] [all][python3] use of six.iteritems()
>All performance matters. All memory consumption matters. Being wasteful over a purely aesthetic few extra characters of code is silly. Isn't the logical conclusion of this to write everything in a different language? :) On Tue, Sep 13, 2016 at 8:42 AM, Terry Wilson wrote: > On Wed, Jun 10, 2015 at 4:41 AM, Robert Collins > wrote: > > On 10 June 2015 at 21:30, Ihar Hrachyshka wrote: > >> -BEGIN PGP SIGNED MESSAGE- > >> Hash: SHA256 > >> > >> On 06/10/2015 02:15 AM, Robert Collins wrote: > >>> I'm very glad folk are working on Python3 ports. > >>> > >>> I'd like to call attention to one little wart in that process: I > >>> get the feeling that folk are applying a massive regex to find > >>> things like d.iteritems() and convert that to six.iteritems(d). > >>> > >>> I'd very much prefer that such a regex approach move things to > >>> d.items(), which is much easier to read. > >>> > >>> Here's why. Firstly, very very very few of our dict iterations are > >>> going to be performance sensitive in the way that iteritems() > >>> matters. Secondly, no really - unless you're doing HUGE dicts, it > >>> doesn't matter. Thirdly. Really, it doesn't. > >>> > >> > >> Does it hurt though? ;) > > > > Yes. > > > > Its: harder to read. Its going to have to be removed eventually anyway > > (when we stop supporting 2.7). Its marginally slower on 3.x (it has a > > function and an iterator wrapping the actual thing). Its unidiomatic, > > and we get lots of programmers that are new to Python; we should be > > giving them as beautiful code as we can to help them learn. > > If someone is so new they can't handle six.iteritems, they should stay > away from Neutron code. It'll eat them. > > >>> At 1 million items the overhead is 54ms[1]. If we're doing inner > >>> loops on million item dictionaries anywhere in OpenStack today, we > >>> have a problem. We might want to in e.g. the scheduler... 
if it > >>> held in-memory state on a million hypervisors at once, because I > >>> don't really want to imagine it pulling a million rows from a DB on > >>> every action. But then, we'd be looking at a whole 54ms. I think we > >>> could survive, if we did that (which we don't). > >>> > >>> So - please, no six.iteritems(). > > Huge -1 from me. The "I like looking at d.items() more than I like > looking at six.iteritems(d) so make everything (even slightly) less > efficient" argument is insane to me. All performance matters. All > memory consumption matters. Being wasteful over a purely aesthetic few > extra characters of code is silly.
Re: [openstack-dev] [all][python3] use of six.iteritems()
On Tue, Sep 13, 2016 at 6:31 PM, Jay Pipes wrote: > On 09/13/2016 01:40 PM, Terry Wilson wrote: >> >> On Thu, Jun 11, 2015 at 8:33 AM, Sean Dague wrote: >>> >>> On 06/11/2015 09:02 AM, Jay Pipes wrote: On 06/11/2015 01:16 AM, Robert Collins wrote: > > But again - where in OpenStack does this matter the slightest? Precisely. I can't think of a single case where we are iterating over anywhere near the number of dictionary items that we would see any impact whatsoever. >> >> >> In neutron, the ovsdb native code iterates over fairly large >> dictionaries since the underlying OVS library stores OVSDB tables >> completely in memory as dicts. I just looked at the code I wrote and >> it currently uses values() and I now want to switch it to >> six.itervalues() :p. >> Best, -jay >>> >>> >>> +1. >>> >>> This is a massive premature optimization which just makes all the code >>> gorpy for no real reason. >> >> >> Premature optimization is about wasting a bunch of time trying to >> optimize code before you know you need to, not about following the >> accepted almost-always-faster/always-less-memory-using solution that >> already exists. Memory-wise it's the difference between a constant >> 88-byte iterator and the storage for an additional list of tuples. And >> if Raymond Hettinger, in a talk called "Transforming Code Into >> Beautiful Idiomatic Python" specifically mentions that people should >> always use iteritems >> (https://www.youtube.com/watch?v=OSGv2VnC0go&feature=youtu.be&t=21m24s), >> I tend to believe him. Sure, it'd be much better if Python 3 and >> Python 2 both returned iterators for items(), values(), keys(), etc., >> but it doesn't. Wasting memory for purely aesthetic reasons (they're >> even both the same number of lines) is just a bad idea, IMNSHO. > > > Is it wasted time to respond to a mailing list post from 18 months ago? > > -jay Ha! Absolutely it is. 
Someone posted a Neutron patch haphazardly converting all of the six.iteritems() calls to items() and it struck a nerve. I searched for the thread in gmail not noticing the date. My apologies! :) Terry
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 09/13/2016 01:40 PM, Terry Wilson wrote: On Thu, Jun 11, 2015 at 8:33 AM, Sean Dague wrote: On 06/11/2015 09:02 AM, Jay Pipes wrote: On 06/11/2015 01:16 AM, Robert Collins wrote: But again - where in OpenStack does this matter the slightest? Precisely. I can't think of a single case where we are iterating over anywhere near the number of dictionary items that we would see any impact whatsoever. In neutron, the ovsdb native code iterates over fairly large dictionaries since the underlying OVS library stores OVSDB tables completely in memory as dicts. I just looked at the code I wrote and it currently uses values() and I now want to switch it to six.itervalues() :p. Best, -jay +1. This is a massive premature optimization which just makes all the code gorpy for no real reason. Premature optimization is about wasting a bunch of time trying to optimize code before you know you need to, not about following the accepted almost-always-faster/always-less-memory-using solution that already exists. Memory-wise it's the difference between a constant 88-byte iterator and the storage for an additional list of tuples. And if Raymond Hettinger, in a talk called "Transforming Code Into Beautiful Idiomatic Python" specifically mentions that people should always use iteritems (https://www.youtube.com/watch?v=OSGv2VnC0go&feature=youtu.be&t=21m24s), I tend to believe him. Sure, it'd be much better if Python 3 and Python 2 both returned iterators for items(), values(), keys(), etc., but it doesn't. Wasting memory for purely aesthetic reasons (they're even both the same number of lines) is just a bad idea, IMNSHO. Is it wasted time to respond to a mailing list post from 18 months ago? -jay
Re: [openstack-dev] [all][python3] use of six.iteritems()
On Thu, Jun 11, 2015 at 8:33 AM, Sean Dague wrote: > On 06/11/2015 09:02 AM, Jay Pipes wrote: >> On 06/11/2015 01:16 AM, Robert Collins wrote: >>> But again - where in OpenStack does this matter the slightest? >> >> Precisely. I can't think of a single case where we are iterating over >> anywhere near the number of dictionary items that we would see any >> impact whatsoever. In neutron, the ovsdb native code iterates over fairly large dictionaries since the underlying OVS library stores OVSDB tables completely in memory as dicts. I just looked at the code I wrote and it currently uses values() and I now want to switch it to six.itervalues() :p. >> Best, >> -jay > > +1. > > This is a massive premature optimization which just makes all the code > gorpy for no real reason. Premature optimization is about wasting a bunch of time trying to optimize code before you know you need to, not about following the accepted almost-always-faster/always-less-memory-using solution that already exists. Memory-wise it's the difference between a constant 88-byte iterator and the storage for an additional list of tuples. And if Raymond Hettinger, in a talk called "Transforming Code Into Beautiful Idiomatic Python" specifically mentions that people should always use iteritems (https://www.youtube.com/watch?v=OSGv2VnC0go&feature=youtu.be&t=21m24s), I tend to believe him. Sure, it'd be much better if Python 3 and Python 2 both returned iterators for items(), values(), keys(), etc., but it doesn't. Wasting memory for purely aesthetic reasons (they're even both the same number of lines) is just a bad idea, IMNSHO. Terry
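The 88-byte-iterator claim above is easy to check. A rough Python 3 sketch (there is no iteritems() on py3, so iter(d.items()) stands in for it; exact byte counts vary by CPython version and platform):

```python
import sys

d = dict(enumerate(range(1000)))

# Analogue of py2 iteritems(): a small, constant-size view iterator.
it = iter(d.items())
# Analogue of py2 items(): materialize the full list of (key, value) tuples.
materialized = list(d.items())

# The iterator stays tiny no matter how large the dict is; the list
# grows linearly with the number of items (tuples not even counted here).
print(sys.getsizeof(it))
print(sys.getsizeof(materialized))
```

On a 64-bit CPython the iterator is a few dozen bytes regardless of dict size, while the materialized list is several kilobytes here and scales with the dict.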
Re: [openstack-dev] [all][python3] use of six.iteritems()
On Sep 13, 2016, at 10:42 AM, Terry Wilson wrote: > All performance matters. All > memory consumption matters. Being wasteful over a purely aesthetic few > extra characters of code is silly. import this -- Ed Leafe
Re: [openstack-dev] [all][python3] use of six.iteritems()
On Wed, Jun 10, 2015 at 4:41 AM, Robert Collins wrote: > On 10 June 2015 at 21:30, Ihar Hrachyshka wrote: >> -BEGIN PGP SIGNED MESSAGE- >> Hash: SHA256 >> >> On 06/10/2015 02:15 AM, Robert Collins wrote: >>> I'm very glad folk are working on Python3 ports. >>> >>> I'd like to call attention to one little wart in that process: I >>> get the feeling that folk are applying a massive regex to find >>> things like d.iteritems() and convert that to six.iteritems(d). >>> >>> I'd very much prefer that such a regex approach move things to >>> d.items(), which is much easier to read. >>> >>> Here's why. Firstly, very very very few of our dict iterations are >>> going to be performance sensitive in the way that iteritems() >>> matters. Secondly, no really - unless you're doing HUGE dicts, it >>> doesn't matter. Thirdly. Really, it doesn't. >>> >> >> Does it hurt though? ;) > > Yes. > > Its: harder to read. Its going to have to be removed eventually anyway > (when we stop supporting 2.7). Its marginally slower on 3.x (it has a > function and an iterator wrapping the actual thing). Its unidiomatic, > and we get lots of programmers that are new to Python; we should be > giving them as beautiful code as we can to help them learn. If someone is so new they can't handle six.iteritems, they should stay away from Neutron code. It'll eat them. >>> At 1 million items the overhead is 54ms[1]. If we're doing inner >>> loops on million item dictionaries anywhere in OpenStack today, we >>> have a problem. We might want to in e.g. the scheduler... if it >>> held in-memory state on a million hypervisors at once, because I >>> don't really want to imagine it pulling a million rows from a DB on >>> every action. But then, we'd be looking at a whole 54ms. I think we >>> could survive, if we did that (which we don't). >>> >>> So - please, no six.iteritems(). Huge -1 from me. 
The "I like looking at d.items() more than I like looking at six.iteritems(d) so make everything (even slightly) less efficient" argument is insane to me. All performance matters. All memory consumption matters. Being wasteful over a purely aesthetic few extra characters of code is silly.
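The disputed overhead is cheap to measure. A rough Python 3 approximation of the comparison (on py3 there is no iteritems(), so list(d.items()) stands in for py2's copying items() while iterating the view directly stands in for iteritems(); absolute times depend entirely on the machine):

```python
import timeit

d = dict(enumerate(range(1_000_000)))

# Lazy iteration over the view (the behavior six.iteritems() selects on py2).
t_view = timeit.timeit(lambda: sum(1 for _ in d.items()), number=5)

# Materialize the full list of tuples first (py2 items() behavior), then iterate.
t_copy = timeit.timeit(lambda: sum(1 for _ in list(d.items())), number=5)

print("view: %.3fs  copy: %.3fs" % (t_view, t_copy))
```

The copy variant does strictly more work (same iteration plus building a million-entry list), so it is consistently slower; whether the difference matters at OpenStack's actual dict sizes is exactly the point under debate.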
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 12 June 2015 at 05:39, Dolph Mathews dolph.math...@gmail.com wrote: On Thu, Jun 11, 2015 at 12:34 AM, Robert Collins robe...@robertcollins.net wrote: On 11 June 2015 at 17:16, Robert Collins robe...@robertcollins.net wrote: This test conflates setup and execution. Better like my example, ... Just had it pointed out to me that I've let my inner asshole out again - sorry. I'm going to step away from the thread for a bit; my personal state (daughter just had a routine but painful operation) shouldn't be taken out on other folk, however indirectly. Ha, no worries. You are completely correct about conflating setup and execution. As far as I can tell though, even if I isolate the dict setup from the benchmark, I get the same relative differences in results. iteritems() was introduced for a reason! Absolutely: the key question is whether that reason is applicable to us. If you don't need to go back to .items()'s copy behavior in py2, then six.iteritems() seems to be the best general purpose choice. I think Gordon said it best elsewhere in this thread: again, i just want to reiterate, i'm not saying don't use items(), i just think we should not blindly use items() just as we shouldn't blindly use iteritems()/viewitems() I'd like to recap and summarise a bit. I think its broadly agreed that: The three view based methods -- iteritems, iterkeys, itervalues -- in Python2 became unified with the list-form equivalents in Python3. The view based methods are substantially faster and lower overhead than the list form methods, approximately 3x. We don't have any services today that expect to hold million item dicts, or even 10K item dicts in a persistent fashion. There's some cognitive overhead involved in reading six.iteritems(d) vs d.items(). We should use d.items() except where it matters. Where does it matter? We have several process architectures in OpenStack: - We have API servers that are eventlet (except keystone) WSGI servers. 
They respond to requests on HTTP[S], each request is independent and loads all its state from the DB and/or memcache each time. We don't expect large numbers of concurrent active requests per process. (Where large would be e.g. 1000). - We have MQ servers that are conceptually the same as WSGI, just a different listening protocol. They do sometimes have background tasks, and for some (e.g. neutron-l3-agent) may hold significant cached state between requests. But thats still scoped to medium size datasets. We expect moderate numbers of concurrent active requests, as these are the actual backends doing things for users, but since these servers are typically working with actual slow things (e.g. the hypervisor) high concurrency typically goes badly :). - We have CLIs that start up, process some data and exit. This includes python-novaclient and nova-manage. They generally work with very small datasets and have no concurrency at all. There are two ways that iteritems vs items etc could matter. One A) is memory/cpu on single use of very large dicts. The other B) is aggregate overhead on many concurrent uses of a single shared dict (or C) possibly N similar-sized dicts). A) Doesn't apply to us in any case I can think of. B) Doesn't apply to us either - our peak concurrency on any single process is still low (we may manage to make it higher now we're moving on the PyMYSql thing, but thats still in progress - and of course there are tradeoffs with high concurrency depending on the ratio of work-to-wait each request has. Very high concurrency depends on a very low ratio: to have 1000 concurrent requests that aren't slowing each other down requires that each requests wall clock be 1000x the time spent in-process actioning it; and that there be enough backend capacity (whatever that is) to dispatch the work to without causing queuing in that part of the system. 
C) We can eliminate via both the argument on B, and on relative overheads: if we had 1000 1000-item dicts in process at once, the relative overhead of making items() from them all is approx the size of the dicts: but its almost certain we have much more state hanging around in each of those 1000 threads than each dict: so the incremental cost will not dominate the process overheads. I'm not - and haven't - said that iteritems() is never applicable *in general*, rather I don't believe its ever applicable *to us* today: and I'm arguing that we should default to items() and bring in iteritems() if and when we need it. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
Re: [openstack-dev] [all][python3] use of six.iteritems()
no worries... can't speak for that Dolph fellow though :) i think it's good to understand/learn different testing/benchmarking strategies. cheers, gord Date: Thu, 11 Jun 2015 17:34:57 +1200 From: robe...@robertcollins.net To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [all][python3] use of six.iteritems() On 11 June 2015 at 17:16, Robert Collins robe...@robertcollins.net wrote: This test conflates setup and execution. Better like my example, ... Just had it pointed out to me that I've let my inner asshole out again - sorry. I'm going to step away from the thread for a bit; my personal state (daughter just had a routine but painful operation) shouldn't be taken out on other folk, however indirectly. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 6/10/15 11:48 PM, Dolph Mathews wrote: tl;dr *.iteritems() is faster and more memory efficient than .items() in python2* Using xrange() in python2 instead of range() because it's more memory efficient and consistent between python 2 and 3...

# xrange() + .items()
python -m timeit -n 20 for\ i\ in\ dict(enumerate(xrange(1000000))).items():\ pass
20 loops, best of 3: 729 msec per loop
peak memory usage: 203 megabytes

# xrange() + .iteritems()
python -m timeit -n 20 for\ i\ in\ dict(enumerate(xrange(1000000))).iteritems():\ pass
20 loops, best of 3: 644 msec per loop
peak memory usage: 176 megabytes

# python 3
python3 -m timeit -n 20 for\ i\ in\ dict(enumerate(range(1000000))).items():\ pass
20 loops, best of 3: 826 msec per loop
peak memory usage: 198 megabytes

it is just me, or are these differences pretty negligible considering this is the 1 million item dictionary, which in itself is a unicorn in openstack code or really most code anywhere? as was stated before, if we have million-item dictionaries floating around, that code has problems. I already have to wait full seconds for responses to come back when I play around with Neutron + Horizon in a devstack VM, and that's with no data at all. 100ms extra for a hypothetical million item structure would be long after the whole app has fallen over from having just ten thousand of anything, much less a million. My only concern with items() is that it is semantically different in Py2k / Py3k. Code that would otherwise have a dictionary changed size issue under iteritems() / py3k items() would succeed under py2k items(). If such a coding mistake is not covered by tests (as this is a data-dependent error condition), it would manifest as a sudden error condition on Py3k only. And if you really want to see the results with range() in python2... 
# range() + .items()
python -m timeit -n 20 for\ i\ in\ dict(enumerate(range(1000000))).items():\ pass
20 loops, best of 3: 851 msec per loop
peak memory usage: 254 megabytes

# range() + .iteritems()
python -m timeit -n 20 for\ i\ in\ dict(enumerate(range(1000000))).iteritems():\ pass
20 loops, best of 3: 919 msec per loop
peak memory usage: 184 megabytes

To benchmark memory consumption, I used the following on bare metal:
$ valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out $COMMAND_FROM_ABOVE
$ cat massif.out | grep mem_heap_B | sort -u
$ python2 --version
Python 2.7.9
$ python3 --version
Python 3.4.3

On Wed, Jun 10, 2015 at 8:36 PM, gordon chung g...@live.ca mailto:g...@live.ca wrote: Date: Wed, 10 Jun 2015 21:33:44 +1200 From: robe...@robertcollins.net mailto:robe...@robertcollins.net To: openstack-dev@lists.openstack.org mailto:openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [all][python3] use of six.iteritems() On 10 June 2015 at 17:22, gordon chung g...@live.ca mailto:g...@live.ca wrote: maybe the suggestion should be don't blindly apply six.iteritems or items rather than don't apply iteritems at all. admittedly, it's a massive eyesore, but it's a very real use case that some projects deal with large data results and to enforce the latter policy can have negative effects[1]. one million item dictionary might be negligible but in a multi-user, multi-* environment that can have a significant impact on the amount memory required to store everything. [1] disclaimer: i have no real world results but i assume memory management was the reason for the switch in logic from py2 to py3 I wouldn't make that assumption. And no, memory isn't an issue. If you have a million item dict, ignoring the internal overheads, the dict needs 1 million object pointers. The size of a list with those pointers in it is 1M * (pointer size in bytes). E.g. 4M or 8M. 
Nothing to worry about given the footprint of such a program :) iiuc, items() (in py2) will create a copy of the dictionary in memory to be processed. this is useful for cases such as concurrency where you want to ensure consistency but doing a quick test i noticed a massive spike in memory usage between items() and iteritems. 'for i in dict(enumerate(range(1000000))).items(): pass' consumes significantly more memory than 'for i in dict(enumerate(range(1000000))).iteritems(): pass'. on my system, the difference in memory consumption was double when using items() vs iteritems() and the cpu util was significantly more as well... let me know if there's anything that stands out as inaccurate. unless there's something wrong with my ignorant testing above, i think it's something projects should consider when mass applying any iteritems/items patch. cheers, gord
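The memory spike described above can be reproduced without valgrind or htop. On Python 3, tracemalloc shows the same shape: materializing the item list (what py2 items() did) peaks proportionally to the dict, while iterating the view stays flat. A sketch (numbers are machine- and version-dependent):

```python
import tracemalloc

d = dict(enumerate(range(100_000)))

# Peak allocation when building the full list of tuples (py2 items() style).
tracemalloc.start()
items_copy = list(d.items())
copy_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

# Peak allocation when lazily iterating the view (py2 iteritems() style):
# only one (key, value) tuple is alive at a time.
tracemalloc.start()
for _ in d.items():
    pass
view_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

print("copy peak: %d bytes, view peak: %d bytes" % (copy_peak, view_peak))
```

The copy peak is megabytes here (a 100,000-entry list plus 100,000 tuples); the view peak is a handful of bytes.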
Re: [openstack-dev] [all][python3] use of six.iteritems()
it is just me, or are these differences pretty negligible considering this is the 1 million item dictionary, which in itself is a unicorn in openstack code or really most code anywhere? as was stated before, if we have million-item dictionaries floating around, that code has problems. I already have to wait full seconds for responses to come back when I play around with Neutron + Horizon in a devstack VM, and that's with no data at all. 100ms extra for a hypothetical million item structure would be long after the whole app has fallen over from having just ten thousand of anything, much less a million. my concern isn't the 1million item dictionary -- i think we're all just using that as a simple test -- it's the 100 concurrent actions against a 10,000 item dictionary or the 1000 concurrent actions against a 1000 item dictionary... when tracking htop to see memory consumption, items() consistently doubled the memory consumption of iteritems(). again, i just want to reiterate, i'm not saying don't use items(), i just think we should not blindly use items() just as we shouldn't blindly use iteritems()/viewitems() My only concern with items() is that it is semantically different in Py2k / Py3k. Code that would otherwise have a dictionary changed size issue under iteritems() / py3k items() would succeed under py2k items(). If such a coding mistake is not covered by tests (as this is a data-dependent error condition), it would manifest as a sudden error condition on Py3k only.
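That semantic difference is easy to demonstrate on Python 3, where items() is a live view: inserting a new key mid-loop raises at the next iteration step, whereas py2's items() returned a list copy and the same loop would have run to completion. A minimal sketch:

```python
d = {"a": 1, "b": 2}

try:
    for key, value in d.items():   # live view on py3
        d["new_" + key] = value    # grow the dict while iterating it
except RuntimeError as exc:
    # Py3 raises "dictionary changed size during iteration";
    # py2's d.items() list copy would have silently masked this bug.
    print("caught:", exc)
```

This is exactly the data-dependent failure mode described above: code like this passes every py2 test run and only blows up once it executes on py3 with data that takes the mutating branch.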
Re: [openstack-dev] [all][python3] use of six.iteritems()
Hi,

On 10/06/2015 02:15, Robert Collins wrote:
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 76.6 msec per loop
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.iteritems(): pass'
100 loops, best of 3: 22.6 msec per loop
.items() is 3x as slow as .iteritems().

Hum, I don't have the same results. Try attached benchmark. I'm using my own wrapper on top of timeit, because timeit is bad at calibrating the benchmark :-/ timeit gives unreliable results. Results with CPU model Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:

[ 10 keys ]
713 ns: iteritems
922 ns (+29%): items

[ 10^3 keys ]
42.1 us: iteritems
59.4 us (+41%): items

[ 10^6 keys (1 million) ]
89.3 ms: iteritems
442 ms (+395%): items

In my benchmark, .items() is 5x as slow as .iteritems(). The code to iterate on 1 million items takes almost a half second. IMO adding 300 ms to each request is not negligible on an application. If this delay is added multiple times (multiple loops iterating on 1 million items), we may reach up to 1 second on a user request :-/ Anyway, when I write patches to port a project to Python 3, I don't want to touch *anything* in Python 2. The API, the performance, the behaviour, etc. must not change. I don't want to be responsible for a slowdown, and I don't feel able to estimate whether replacing dict.iteritems() with dict.items() has a cost on a real application. As Ihar wrote: it must be done in a separate patch, by developers who know the project well. Currently, most developers writing Python 3 patches are not heavily involved in each ported project. There is also dict.itervalues(), not only dict.iteritems(). for key in dict.iterkeys() can simply be written for key in dict:. 
There is also xrange() vs range(), the debate is similar: https://review.openstack.org/#/c/185418/ For Python 3, I suggest using "from six.moves import range" to get the Python 3 behaviour on Python 2: range() then never creates a temporary list. IMO it makes the code more readable because "for i in xrange(n):" becomes "for i in range(n):". six then only appears in the import line, and range() is better than xrange() for developers starting to learn Python.

Victor

Micro-benchmark for the Python operation "key in dict". Run it with:

./python.orig benchmark.py script bench_str.py --file=orig
./python.patched benchmark.py script bench_str.py --file=patched
./python.patched benchmark.py compare_to orig patched

Download benchmark.py from: https://bitbucket.org/haypo/misc/raw/tip/python/benchmark.py

import gc

def consume_items(dico):
    for key, value in dico.items():
        pass

def consume_iteritems(dico):
    for key, value in dico.iteritems():
        pass

def run_benchmark(bench):
    for nkeys in (10, 10**3, 10**6):
        bench.start_group('%s keys' % nkeys)
        dico = {str(index): index for index in range(nkeys)}
        bench.compare_functions(
            ('iteritems', consume_iteritems, dico),
            ('items', consume_items, dico),
        )
        dico = None
        gc.collect()
        gc.collect()

if __name__ == "__main__":
    import benchmark
    benchmark.main()
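The range() point can be seen directly on Python 3, where range() is a lazy sequence whose footprint does not depend on its length (the same property py2's xrange(), i.e. six.moves.range, has). A quick illustration:

```python
import sys

lazy = range(10**6)          # constant-size object on py3
eager = list(range(10**6))   # py2 range() behavior: a fully realized list

print(sys.getsizeof(lazy))   # tens of bytes, regardless of the length
print(sys.getsizeof(eager))  # megabytes for a million elements
```

This is the same copy-vs-lazy trade-off as items() vs iteritems(), which is why the two debates track each other.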
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 06/11/2015 01:16 AM, Robert Collins wrote: But again - where in OpenStack does this matter the slightest? Precisely. I can't think of a single case where we are iterating over anywhere near the number of dictionary items that we would see any impact whatsoever. Best, -jay
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 06/11/2015 09:02 AM, Jay Pipes wrote: On 06/11/2015 01:16 AM, Robert Collins wrote: But again - where in OpenStack does this matter the slightest? Precisely. I can't think of a single case where we are iterating over anywhere near the number of dictionary items that we would see any impact whatsoever. Best, -jay +1. This is a massive premature optimization which just makes all the code gorpy for no real reason. -Sean -- Sean Dague http://dague.net
Re: [openstack-dev] [all][python3] use of six.iteritems()
On Thu, Jun 11, 2015 at 12:34 AM, Robert Collins robe...@robertcollins.net wrote: On 11 June 2015 at 17:16, Robert Collins robe...@robertcollins.net wrote: This test conflates setup and execution. Better like my example, ... Just had it pointed out to me that I've let my inner asshole out again - sorry. I'm going to step away from the thread for a bit; my personal state (daughter just had a routine but painful operation) shouldn't be taken out on other folk, however indirectly. Ha, no worries. You are completely correct about conflating setup and execution. As far as I can tell though, even if I isolate the dict setup from the benchmark, I get the same relative differences in results. iteritems() was introduced for a reason! If you don't need to go back to .items()'s copy behavior in py2, then six.iteritems() seems to be the best general purpose choice. I think Gordon said it best elsewhere in this thread: again, i just want to reiterate, i'm not saying don't use items(), i just think we should not blindly use items() just as we shouldn't blindly use iteritems()/viewitems() -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 6/11/15 1:39 PM, Dolph Mathews wrote: On Thu, Jun 11, 2015 at 12:34 AM, Robert Collins robe...@robertcollins.net mailto:robe...@robertcollins.net wrote: On 11 June 2015 at 17:16, Robert Collins robe...@robertcollins.net mailto:robe...@robertcollins.net wrote: This test conflates setup and execution. Better like my example, ... Just had it pointed out to me that I've let my inner asshole out again - sorry. I'm going to step away from the thread for a bit; my personal state (daughter just had a routine but painful operation) shouldn't be taken out on other folk, however indirectly. Ha, no worries. You are completely correct about conflating setup and execution. As far as I can tell though, even if I isolate the dict setup from the benchmark, I get the same relative differences in results. iteritems() was introduced for a reason! If you don't need to go back to .items()'s copy behavior in py2, then six.iteritems() seems to be the best general purpose choice. I am firmly in the let's use items() camp. A 100 ms difference for a totally not-real-world case of a dictionary 1M items in size is no kind of rationale for the Openstack project - if someone has a dictionary that's 1M objects in size, or even 100K, that's a bug in and of itself. the real benchmarks we should be using, if we are to even bother at all (which we shouldn't), is to observe if items() vs. iteritems() has *any* difference that is at all measurable in terms of the overall execution of real-world openstack use cases. These nano-differences in speed are immediately dwarfed by all those operations surrounding them long before we even get to the level of RPC overhead. 
I think Gordon said it best elsewhere in this thread: again, i just want to reiterate, i'm not saying don't use items(), i just think we should not blindly use items() just as we shouldn't blindly use iteritems()/viewitems() If a demonstrable difference can be established in terms of real-world use cases for code that is using iteritems() vs. items(), then you can justify this difference. Otherwise, not worth it. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][python3] use of six.iteritems()
+1 : .items() Why can't we just add six.iteritems calls on a case-by-case basis (if that happens)? Regex substitutions for a library call don't make sense to me on such a massive scale.

On 6/11/15 11:00 AM, Victor Stinner wrote: Hi,

On 10/06/2015 02:15, Robert Collins wrote:

python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 76.6 msec per loop
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.iteritems(): pass'
100 loops, best of 3: 22.6 msec per loop

.items() is 3x as slow as .iteritems().

Hum, I don't get the same results. Try the attached benchmark. I'm using my own wrapper on top of timeit, because timeit is bad at calibrating the benchmark :-/ and gives unreliable results.

Results with CPU model Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:

[ 10 keys ] 713 ns: iteritems; 922 ns (+29%): items
[ 10^3 keys ] 42.1 us: iteritems; 59.4 us (+41%): items
[ 10^6 keys (1 million) ] 89.3 ms: iteritems; 442 ms (+395%): items

In my benchmark, .items() is 5x as slow as .iteritems(). The code to iterate over 1 million items takes almost half a second. IMO adding 300 ms to each request is not negligible in an application. If this delay is added multiple times (multiple loops iterating over 1 million items), we may reach up to 1 second on a user request :-/

Anyway, when I write patches to port a project to Python 3, I don't want to change *anything* on Python 2. The API, the performance, the behaviour, etc. must not change. I don't want to be responsible for a slowdown, and I don't feel able to estimate whether replacing dict.iteritems() with dict.items() has a cost in a real application. As Ihar wrote: it must be done in a separate patch, by developers who know the project well. Currently, most developers writing Python 3 patches are not heavily involved in each ported project.

There is also dict.itervalues(), not only dict.iteritems(). for key in dict.iterkeys() can simply be written for key in dict:.
There is also xrange() vs range(); the debate is similar: https://review.openstack.org/#/c/185418/ For Python 3, I suggest using "from six.moves import range" to get the Python 3 behaviour on Python 2: range() always creates an iterator, it doesn't create a temporary list. IMO it makes the code more readable because "for i in xrange(n):" becomes "for i in range(n):", six doesn't appear anywhere outside the imports, and range() is better than xrange() for developers starting to learn Python. Victor __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Thanks, Nikhil __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
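The six wrappers under discussion are thin. A simplified sketch of what six.iteritems and the six.moves range alias do (adapted from memory of six's approach, so treat the details as illustrative rather than six's exact source):

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    range_ = range                      # Python 3 range is already lazy
    def iteritems(d, **kw):
        return iter(d.items(**kw))      # items() is a view; wrap it in an iterator
else:
    range_ = xrange                     # noqa: F821 - only defined on Python 2
    def iteritems(d, **kw):
        return d.iteritems(**kw)

# Either way, callers get a lazy iterator of (key, value) pairs:
d = {"a": 1, "b": 2}
assert sorted(iteritems(d)) == [("a", 1), ("b", 2)]
```

This also shows why six.iteritems is marginally slower on Python 3: it adds a function call and an iter() wrapper around a view that would already work fine in a for loop.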
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 06/11/2015 01:46 PM, Mike Bayer wrote: I am firmly in the let's use items() camp. A 100 ms difference for a totally not-real-world case of a dictionary 1M items in size is no kind of rationale for the Openstack project - if someone has a dictionary that's 1M objects in size, or even 100K, that's a bug in and of itself. the real benchmarks we should be using, if we are to even bother at all (which we shouldn't), is to observe if items() vs. iteritems() has *any* difference that is at all measurable in terms of the overall execution of real-world openstack use cases. These nano-differences in speed are immediately dwarfed by all those operations surrounding them long before we even get to the level of RPC overhead.

Lessons learned in the trenches:

* The best code is the simplest [1] and easiest to read.
* Code is write-once, read-many; clarity is a vital part of the read-many.
* Do not optimize until functionality is complete.
* Optimize only after profiling real world use cases.
* Prior assumptions about what needs optimization are almost always proven wrong by a profiler.
* I/O latency vastly overwhelms most code optimization, making obtuse optimization pointless and detrimental to long term robustness.
* The amount of optimization needed is usually minimal, restricted to just a few code locations, and 80% of the speed increases occur in just the first few tweaks after analyzing profile data.

[1] Compilers can optimize simple code best; simple code is easy to write and easier to read, while at the same time giving the tool chain the best chance of turning your simple code into efficient code. (Not sure how much this applies to Python, but it's certainly true of other compiled languages.)

John __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
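John's "optimize only after profiling" advice costs almost nothing to follow in Python; a minimal stdlib sketch (hot_path is a made-up stand-in for whatever loop is under suspicion):

```python
import cProfile
import io
import pstats

def hot_path():
    """A placeholder workload: sum the values of a 10k-entry dict."""
    d = dict(enumerate(range(10000)))
    total = 0
    for k, v in d.items():   # the loop someone suspects is slow
        total += v
    return total

profiler = cProfile.Profile()
profiler.enable()
hot_path()
profiler.disable()

# Print the top 5 entries by cumulative time; a profiler, not a guess,
# tells you whether the dict iteration even registers.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

In practice the surrounding I/O and RPC calls dominate such profiles, which is exactly Mike's and John's point.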
Re: [openstack-dev] [all][python3] use of six.iteritems()
Top posting as this is more a response to the whole thread. My takeaways from the most excellent discussion:

* There is some benefit to iteritems in python2 when you need it.
* OpenStack does not seem to need it - except in places that are operating on tens of thousands of large objects concurrently, such as the nova scheduler.
* six.anything is more code, and more code is more burden in general.

From this I believe we should distill some clear developer and reviewer recommendations which should go in our developer docs:

* Do not use six.iteritems in new patches without a clear reason stated and attached.
  - Reasons should clearly state why .items() would be a large enough burden, such as "this list will be large and stay resident in memory for the duration of the program. Each concurrent request will have similar lists."
* -1 patches using six.iteritems in flight now with "Please remove or justify six.iteritems usage."
* Patches touching code sections which use six.iteritems should be allowed to remove its usage without justification.

I've gone ahead and added this suggestion in a patch to the infra-manual: https://review.openstack.org/190757

This looks quite a bit like a hacking rule definition. How strongly do we feel about this? Do we want to require a tag of some kind on lines that use six.iteritems(), or are we comfortable with this just being in our python3 porting documentation?

Excerpts from Robert Collins's message of 2015-06-09 17:15:33 -0700: I'm very glad folk are working on Python3 ports. I'd like to call attention to one little wart in that process: I get the feeling that folk are applying a massive regex to find things like d.iteritems() and convert that to six.iteritems(d). I'd very much prefer that such a regex approach move things to d.items(), which is much easier to read. Here's why. Firstly, very very very few of our dict iterations are going to be performance sensitive in the way that iteritems() matters.
Secondly, no really - unless you're doing HUGE dicts, it doesn't matter. Thirdly. Really, it doesn't. At 1 million items the overhead is 54ms[1]. If we're doing inner loops on million item dictionaries anywhere in OpenStack today, we have a problem. We might want to in e.g. the scheduler... if it held in-memory state on a million hypervisors at once, because I don't really want to imagine it pulling a million rows from a DB on every action. But then, we'd be looking at a whole 54ms. I think we could survive, if we did that (which we don't). So - please, no six.iteritems(). Thanks, Rob

[1]
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 76.6 msec per loop
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.iteritems(): pass'
100 loops, best of 3: 22.6 msec per loop
python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 18.9 msec per loop
pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 65.8 msec per loop
# and out of interest, assuming that that hadn't triggered the JIT - but it had.
pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
1000 loops, best of 3: 64.3 msec per loop

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
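The "hacking rule" aside above could be prototyped roughly like this. The check name L999 and the regex are invented for illustration; this is not an actual rule shipped by the hacking project:

```python
import re

SIX_ITER = re.compile(r"\bsix\.iter(items|values|keys)\s*\(")

def check_six_iteritems(logical_line):
    """L999: six.iteritems/itervalues/iterkeys needs a stated justification.

    hacking-style checks are generators that yield (offset, message)
    pairs for each logical line of source.
    """
    match = SIX_ITER.search(logical_line)
    if match and "# noqa" not in logical_line:
        yield (match.start(),
               "L999: use .%s() unless a justification is attached"
               % match.group(1))

# Example: the check flags the six call but not the plain method.
assert list(check_six_iteritems("for k, v in six.iteritems(d):"))
assert not list(check_six_iteritems("for k, v in d.items():"))
```

Whether this belongs in hacking or just in the python3 porting docs is exactly the open question Clark raises.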
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 6/11/15 2:45 PM, John Dennis wrote: On 06/11/2015 01:46 PM, Mike Bayer wrote: I am firmly in the let's use items() camp. A 100 ms difference for a totally not-real-world case of a dictionary 1M items in size is no kind of rationale for the Openstack project - if someone has a dictionary that's 1M objects in size, or even 100K, that's a bug in and of itself. the real benchmarks we should be using, if we are to even bother at all (which we shouldn't), is to observe if items() vs. iteritems() has *any* difference that is at all measurable in terms of the overall execution of real-world openstack use cases. These nano-differences in speed are immediately dwarfed by all those operations surrounding them long before we even get to the level of RPC overhead. Lessons learned in the trenches: * The best code is the simplest [1] and easiest to read. * Code is write-once, read-many; clarity is a vital part of the read-many. +1 * Do not optimize until functionality is complete. +1 * Optimize only after profiling real world use cases. +2! * Prior assumptions about what needs optimization are almost always proven wrong by a profiler. +2! * I/O latency vastly overwhelms most code optimization making obtuse optimization pointless and detrimental to long term robustness. Couldn't agree more * The amount of optimization needed is usually minimal, restricted to just a few code locations and 80% of the speed increases occur in just the first few tweaks after analyzing profile data. [1] Compilers can optimize simple code best, simple code is easy to write and easier to read while at the same time giving the tool chain the best chance of turning your simple code into efficient code. (Not sure how much this applies to Python, but it's certainly true of other compiled languages.) 
John __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Thanks, Nikhil __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 06/09/2015 08:15 PM, Robert Collins wrote: I'm very glad folk are working on Python3 ports. I'd like to call attention to one little wart in that process: I get the feeling that folk are applying a massive regex to find things like d.iteritems() and convert that to six.iteritems(d). I'd very much prefer that such a regex approach move things to d.items(), which is much easier to read. Here's why. Firstly, very very very few of our dict iterations are going to be performance sensitive in the way that iteritems() matters. Secondly, no really - unless you're doing HUGE dicts, it doesn't matter. Thirdly. Really, it doesn't. At 1 million items the overhead is 54ms[1]. If we're doing inner loops on million item dictionaries anywhere in OpenStack today, we have a problem. We might want to in e.g. the scheduler... if it held in-memory state on a million hypervisors at once, because I don't really want to imagine it pulling a million rows from a DB on every action. But then, we'd be looking at a whole 54ms. I think we could survive, if we did that (which we don't). So - please, no six.iteritems(). Thanks, Rob

[1]
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 76.6 msec per loop
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.iteritems(): pass'
100 loops, best of 3: 22.6 msec per loop
python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 18.9 msec per loop
pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 65.8 msec per loop
# and out of interest, assuming that that hadn't triggered the JIT - but it had.
pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
1000 loops, best of 3: 64.3 msec per loop

That's awesome, because those six.iteritems loops make me want to throw up a little. Very happy to have our code just use items instead.
-Sean -- Sean Dague http://dague.net __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 10 June 2015 at 21:30, Ihar Hrachyshka ihrac...@redhat.com wrote: On 06/10/2015 02:15 AM, Robert Collins wrote: I'm very glad folk are working on Python3 ports. I'd like to call attention to one little wart in that process: I get the feeling that folk are applying a massive regex to find things like d.iteritems() and convert that to six.iteritems(d). I'd very much prefer that such a regex approach move things to d.items(), which is much easier to read. Here's why. Firstly, very very very few of our dict iterations are going to be performance sensitive in the way that iteritems() matters. Secondly, no really - unless you're doing HUGE dicts, it doesn't matter. Thirdly. Really, it doesn't. Does it hurt though? ;) Yes. It's: harder to read. It's going to have to be removed eventually anyway (when we stop supporting 2.7). It's marginally slower on 3.x (it has a function and an iterator wrapping the actual thing). It's unidiomatic, and we get lots of programmers that are new to Python; we should be giving them as beautiful code as we can to help them learn. At 1 million items the overhead is 54ms[1]. If we're doing inner loops on million item dictionaries anywhere in OpenStack today, we have a problem. We might want to in e.g. the scheduler... if it held in-memory state on a million hypervisors at once, because I don't really want to imagine it pulling a million rows from a DB on every action. But then, we'd be looking at a whole 54ms. I think we could survive, if we did that (which we don't). So - please, no six.iteritems(). The reason why in e.g. neutron we merged the patch using six.iteritems is that we don't want to go too deep into determining whether the original usage of iteritems() was justified. It's not. The goal of the patch is to get python3 support, not to apply subjective style guidelines, so if someone wants to eliminate .iteritems(), he should create another patch just for that and struggle with reviewing it.
While folks interested python3 can proceed with their work. We should not be afraid of multiple patches. We shouldn't be indeed. All I'm asking is that we don't do poor intermediate patches. I've written code where performance tuning like that around iteritems mattered. That code also needed to optimise tuple unpacking to avoid performance hits and was aiming to manipulate million item data sets from interpreter startup in subsecond times. It was some of the worst, most impenetrable Python code I've ever seen, and while our code has lots of issues, it neither has the same performance context that that did, nor (thankfully) is it such impenetrable code. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 06/10/2015 02:15 AM, Robert Collins wrote: I'm very glad folk are working on Python3 ports. I'd like to call attention to one little wart in that process: I get the feeling that folk are applying a massive regex to find things like d.iteritems() and convert that to six.iteritems(d). I'd very much prefer that such a regex approach move things to d.items(), which is much easier to read. Here's why. Firstly, very very very few of our dict iterations are going to be performance sensitive in the way that iteritems() matters. Secondly, no really - unless you're doing HUGE dicts, it doesn't matter. Thirdly. Really, it doesn't. Does it hurt though? ;) At 1 million items the overhead is 54ms[1]. If we're doing inner loops on million item dictionaries anywhere in OpenStack today, we have a problem. We might want to in e.g. the scheduler... if it held in-memory state on a million hypervisors at once, because I don't really want to imagine it pulling a million rows from a DB on every action. But then, we'd be looking at a whole 54ms. I think we could survive, if we did that (which we don't). So - please, no six.iteritems(). The reason why in e.g. neutron we merged the patch using six.iteritems is that we don't want to go too deep into determining whether the original usage of iteritems() was justified. The goal of the patch is to get python3 support, not to apply subjective style guidelines, so if someone wants to eliminate .iteritems(), he should create another patch just for that and struggle with reviewing it. While folks interested in python3 can proceed with their work. We should not be afraid of multiple patches.
Ihar __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 10 June 2015 at 17:22, gordon chung g...@live.ca wrote: maybe the suggestion should be don't blindly apply six.iteritems or items rather than don't apply iteritems at all. admittedly, it's a massive eyesore, but it's a very real use case that some projects deal with large data results and to enforce the latter policy can have negative effects[1]. one million item dictionary might be negligible but in a multi-user, multi-* environment that can have a significant impact on the amount of memory required to store everything. [1] disclaimer: i have no real world results but i assume memory management was the reason for the switch in logic from py2 to py3 I wouldn't make that assumption. And no, memory isn't an issue. If you have a million item dict, ignoring the internal overheads, the dict needs 1 million object pointers. The size of a list with those pointers in it is 1M * (pointer size in bytes), e.g. 4M or 8M. Nothing to worry about given the footprint of such a program :) -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
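Robert's back-of-the-envelope arithmetic - a million pointers at 8 bytes each is about 8 MB - can be checked from any 64-bit CPython 3 interpreter; sys.getsizeof counts only the list's own pointer array, not the objects it references:

```python
import sys

n = 10**6
lst = list(range(n))       # one PyObject* slot per item
size = sys.getsizeof(lst)  # list header + n pointer slots, referenced ints excluded

print("list of %d pointers: %.2f MB" % (n, size / 1e6))
# On a 64-bit build this is ~8 MB of pointer slots plus a small header;
# a 32-bit build would show ~4 MB instead.
assert 8 * n <= size < 10 * n
```

Which is Robert's point: the list copy itself is single-digit megabytes, noise next to the footprint of a running OpenStack service.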
Re: [openstack-dev] [all][python3] use of six.iteritems()
Date: Wed, 10 Jun 2015 21:33:44 +1200 From: robe...@robertcollins.net To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [all][python3] use of six.iteritems() On 10 June 2015 at 17:22, gordon chung g...@live.ca wrote: maybe the suggestion should be don't blindly apply six.iteritems or items rather than don't apply iteritems at all. admittedly, it's a massive eyesore, but it's a very real use case that some projects deal with large data results and to enforce the latter policy can have negative effects[1]. one million item dictionary might be negligible but in a multi-user, multi-* environment that can have a significant impact on the amount of memory required to store everything. [1] disclaimer: i have no real world results but i assume memory management was the reason for the switch in logic from py2 to py3 I wouldn't make that assumption. And no, memory isn't an issue. If you have a million item dict, ignoring the internal overheads, the dict needs 1 million object pointers. The size of a list with those pointers in it is 1M * (pointer size in bytes), e.g. 4M or 8M. Nothing to worry about given the footprint of such a program :) iiuc, items() (in py2) will create a copy of the dictionary's items in memory to be processed. this is useful for cases such as concurrency where you want to ensure consistency but doing a quick test i noticed a massive spike in memory usage between items() and iteritems(). 'for i in dict(enumerate(range(1000000))).items(): pass' consumes significantly more memory than 'for i in dict(enumerate(range(1000000))).iteritems(): pass'. on my system, the difference in memory consumption was double when using items() vs iteritems() and the cpu util was significantly more as well... let me know if there's anything that stands out as inaccurate. unless there's something wrong with my ignorant testing above, i think it's something projects should consider when mass applying any iteritems/items patch.
cheers,gord __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
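On Python 3 the spike gordon measured disappears, because items() returns a constant-size view and only an explicit list() call materializes the copy. A rough sys.getsizeof illustration (not his exact py2 test):

```python
import sys

d = dict(enumerate(range(10**5)))

view = d.items()          # constant-size view object, no copy made
copy = list(d.items())    # materializes 100k (key, value) tuples

print("items() view:", sys.getsizeof(view), "bytes")
print("list(items()):", sys.getsizeof(copy), "bytes")  # list alone, tuples excluded

assert sys.getsizeof(view) < 200          # view is a few dozen bytes
assert sys.getsizeof(copy) > 10**5 * 8    # one pointer slot per tuple
```

The doubled py2 consumption gordon saw corresponds to the copy line; py3's default behaviour is the view line.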
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 11 June 2015 at 17:16, Robert Collins robe...@robertcollins.net wrote: This test conflates setup and execution. Better like my example, ... Just had it pointed out to me that I've let my inner asshole out again - sorry. I'm going to step away from the thread for a bit; my personal state (daughter just had a routine but painful operation) shouldn't be taken out on other folk, however indirectly. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 11 June 2015 at 15:48, Dolph Mathews dolph.math...@gmail.com wrote: tl;dr .iteritems() is faster and more memory efficient than .items() in python2 Using xrange() in python2 instead of range() because it's more memory efficient and consistent between python 2 and 3...

# xrange() + .items()
python -m timeit -n 20 for\ i\ in\ dict(enumerate(xrange(1000000))).items():\ pass
20 loops, best of 3: 729 msec per loop
peak memory usage: 203 megabytes

This test conflates setup and execution. Better like my example, because otherwise you're not testing iteritems vs items, you're testing dictionary creation time; likewise memory pressure. Your times are meaningless as it stands. To test memory pressure, don't use timeit. Just use the interpreter.

$ python
Python 2.7.8 (default, Oct 20 2014, 15:05:19) [GCC 4.9.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d = dict(enumerate(range(1000000)))
>>> import os
>>> os.getpid()
28345

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28345 robertc 20 0 127260 104568 4744 S 0.0 0.6 0:00.17 python

>>> i = d.items()

28345 robertc 20 0 206524 183560 4744 S 0.0 1.1 0:00.59 python

183560-104568 = 80M to hold a reference to all 1 million items, which indeed is not as efficient as python3. So *IF* we had a million item dict, and absolutely nothing else around, we should care. But again - where in OpenStack does this matter the slightest? No one has disputed that they are different. The assertion that it matters is what is out of line with our reality.

10000 items:

28399 robertc 20 0 31404 8480 4612 S 0.0 0.1 0:00.01 python
28399 robertc 20 0 32172 9268 4612 S 0.0 0.1 0:00.01 python

9268-8480 = 0.8M, which is indeed 2 orders of magnitude less. And I'd STILL challenge anyone to find a place where 10000 items are being passed around within OpenStack's components without it being a bug today.
Optimising away under a M of data when we shouldn't have that many rows/items/whatever in memory in the first place is just entirely missing the point of programming in Python. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
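Robert's top-based measurement can be reproduced in-process on Python 3 with the stdlib tracemalloc module. A sketch (absolute numbers will vary by interpreter; the comparison is what matters):

```python
import tracemalloc

d = dict(enumerate(range(10**5)))

# Pass 1: iterate the lazy view - pair tuples are created and freed as we go.
tracemalloc.start()
for pair in d.items():
    pass
_, peak_iter = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Pass 2: hold the materialized copy - all 100k tuples are live at once.
tracemalloc.start()
pairs = list(d.items())
_, peak_list = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("peak iterating view: %d bytes" % peak_iter)
print("peak holding list:   %d bytes" % peak_list)
assert peak_list > peak_iter  # the copy dominates, matching Robert's top output
```

Like Robert's RES delta, the gap is real but bounded: single-digit megabytes for 100k items, which is his argument for not contorting the code over it.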
Re: [openstack-dev] [all][python3] use of six.iteritems()
tl;dr *.iteritems() is faster and more memory efficient than .items() in python2*

Using xrange() in python2 instead of range() because it's more memory efficient and consistent between python 2 and 3...

# xrange() + .items()
python -m timeit -n 20 for\ i\ in\ dict(enumerate(xrange(1000000))).items():\ pass
20 loops, best of 3: 729 msec per loop
peak memory usage: 203 megabytes

# xrange() + .iteritems()
python -m timeit -n 20 for\ i\ in\ dict(enumerate(xrange(1000000))).iteritems():\ pass
20 loops, best of 3: 644 msec per loop
peak memory usage: 176 megabytes

# python 3
python3 -m timeit -n 20 for\ i\ in\ dict(enumerate(range(1000000))).items():\ pass
20 loops, best of 3: 826 msec per loop
peak memory usage: 198 megabytes

And if you really want to see the results with range() in python2...

# range() + .items()
python -m timeit -n 20 for\ i\ in\ dict(enumerate(range(1000000))).items():\ pass
20 loops, best of 3: 851 msec per loop
peak memory usage: 254 megabytes

# range() + .iteritems()
python -m timeit -n 20 for\ i\ in\ dict(enumerate(range(1000000))).iteritems():\ pass
20 loops, best of 3: 919 msec per loop
peak memory usage: 184 megabytes

To benchmark memory consumption, I used the following on bare metal:

$ valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out $COMMAND_FROM_ABOVE
$ cat massif.out | grep mem_heap_B | sort -u

$ python2 --version
Python 2.7.9
$ python3 --version
Python 3.4.3

On Wed, Jun 10, 2015 at 8:36 PM, gordon chung g...@live.ca wrote: Date: Wed, 10 Jun 2015 21:33:44 +1200 From: robe...@robertcollins.net To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [all][python3] use of six.iteritems() On 10 June 2015 at 17:22, gordon chung g...@live.ca wrote: maybe the suggestion should be don't blindly apply six.iteritems or items rather than don't apply iteritems at all.
admittedly, it's a massive eyesore, but it's a very real use case that some projects deal with large data results and to enforce the latter policy can have negative effects[1]. one million item dictionary might be negligible but in a multi-user, multi-* environment that can have a significant impact on the amount of memory required to store everything. [1] disclaimer: i have no real world results but i assume memory management was the reason for the switch in logic from py2 to py3 I wouldn't make that assumption. And no, memory isn't an issue. If you have a million item dict, ignoring the internal overheads, the dict needs 1 million object pointers. The size of a list with those pointers in it is 1M * (pointer size in bytes), e.g. 4M or 8M. Nothing to worry about given the footprint of such a program :) iiuc, items() (in py2) will create a copy of the dictionary's items in memory to be processed. this is useful for cases such as concurrency where you want to ensure consistency but doing a quick test i noticed a massive spike in memory usage between items() and iteritems(). 'for i in dict(enumerate(range(1000000))).items(): pass' consumes significantly more memory than 'for i in dict(enumerate(range(1000000))).iteritems(): pass'. on my system, the difference in memory consumption was double when using items() vs iteritems() and the cpu util was significantly more as well... let me know if there's anything that stands out as inaccurate. unless there's something wrong with my ignorant testing above, i think it's something projects should consider when mass applying any iteritems/items patch.
cheers, gord

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
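[Editor's note] The python2 measurements above can't be reproduced on python3, but the memory asymmetry the thread is arguing about — a lazily evaluated view versus a materialized list — is easy to see with `sys.getsizeof`. A small illustrative sketch (exact byte counts vary by interpreter and dict size):

```python
import sys

d = dict(enumerate(range(1000)))

items_view = d.items()        # Python 3: a lightweight, lazy view object
items_list = list(d.items())  # a materialized list, like Python 2's .items()

# The view costs a small constant amount of memory regardless of dict
# size; the list grows linearly with the number of entries.
view_size = sys.getsizeof(items_view)
list_size = sys.getsizeof(items_list)
print(view_size, list_size)
```

Note that `getsizeof` on the list counts only the list object itself, not the tuples it points to, so the real gap is even larger than the printed numbers suggest.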
[openstack-dev] [all][python3] use of six.iteritems()
I'm very glad folk are working on Python3 ports. I'd like to call attention to one little wart in that process: I get the feeling that folk are applying a massive regex to find things like d.iteritems() and convert that to six.iteritems(d). I'd very much prefer that such a regex approach move things to d.items(), which is much easier to read. Here's why. Firstly, very very very few of our dict iterations are going to be performance sensitive in the way that iteritems() matters. Secondly, no really - unless you're doing HUGE dicts, it doesn't matter. Thirdly. Really, it doesn't. At 1 million items the overhead is 54ms[1]. If we're doing inner loops on million item dictionaries anywhere in OpenStack today, we have a problem. We might want to in e.g. the scheduler... if it held in-memory state on a million hypervisors at once, because I don't really want to imagine it pulling a million rows from a DB on every action. But then, we'd be looking at a whole 54ms. I think we could survive, if we did that (which we don't). So - please, no six.iteritems(). Thanks, Rob

[1]
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 76.6 msec per loop
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.iteritems(): pass'
100 loops, best of 3: 22.6 msec per loop
python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 18.9 msec per loop
pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 65.8 msec per loop
# and out of interest, in case that hadn't triggered the JIT (it had):
pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
1000 loops, best of 3: 64.3 msec per loop

--
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
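[Editor's note] Robert's footnote uses the timeit command-line interface; the same measurement can be scripted with the `timeit` module, which makes it easy to rerun on any interpreter. A sketch (absolute numbers will vary widely by machine and Python version):

```python
import timeit

# Scripted equivalent of the one-liners in [1]: build a million-item
# dict once in setup, then time a full iteration over its items.
setup = "d = dict(enumerate(range(1000000)))"
per_loop = timeit.timeit("for i in d.items(): pass",
                         setup=setup, number=10) / 10
print("%.1f msec per loop" % (per_loop * 1000.0))
```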
Re: [openstack-dev] [all][python3] use of six.iteritems()
On 06/09/2015 08:15 PM, Robert Collins wrote: [...] So - please, no six.iteritems(). +1 -jay
Re: [openstack-dev] [all][python3] use of six.iteritems()
+1 Don't forget values and keys in addition to items. They aren't as common but come up every so often. I think you can iterate the keys just by iterating on the dict itself. Carl

On Jun 9, 2015 6:18 PM, Robert Collins robe...@robertcollins.net wrote: [...] So - please, no six.iteritems(). Thanks, Rob
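[Editor's note] Carl's point about iterating the dict itself deserves a concrete illustration: looping over a dict yields its keys lazily, with identical spelling on Python 2 and Python 3, so neither keys()/iterkeys() nor a six shim is needed. A minimal sketch:

```python
d = {"a": 1, "b": 2, "c": 3}

# Iterating the dict object itself yields its keys lazily; this works
# the same way on Python 2 and Python 3 with no compatibility helper.
collected = sorted(k for k in d)
print(collected)  # ['a', 'b', 'c']
```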
Re: [openstack-dev] [all][python3] use of six.iteritems()
Huge +1 both for the suggestion and for the reasoning. It's better to avoid substituting language features with a library. Eugene.

On Tue, Jun 9, 2015 at 5:15 PM, Robert Collins robe...@robertcollins.net wrote: [...] So - please, no six.iteritems(). Thanks, Rob
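[Editor's note] For context on what the "library substitute" actually does: six.iteritems() is roughly a thin version dispatch. This is a sketch of the idea, not six's exact source:

```python
import sys

def iteritems(d, **kw):
    """Sketch of a six.iteritems()-style shim (illustrative, not six's
    actual implementation): return a lazy iterator over (key, value)
    pairs on either major Python version."""
    if sys.version_info[0] >= 3:
        return iter(d.items(**kw))
    return d.iteritems(**kw)

pairs = dict(iteritems({"a": 1, "b": 2}))
print(pairs)
```

Since the Python 3 branch just wraps d.items(), the shim buys nothing on Python 3 — which is Eugene's point about preferring the language feature.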
Re: [openstack-dev] [all][python3] use of six.iteritems()
maybe the suggestion should be don't blindly apply six.iteritems or items rather than don't apply iteritems at all. admittedly, it's a massive eyesore, but it's a very real use case that some projects deal with large data results and to enforce the latter policy can have negative effects[1]. one million item dictionary might be negligible but in a multi-user, multi-* environment that can have a significant impact on the amount of memory required to store everything. [1] disclaimer: i have no real world results but i assume memory management was the reason for the switch in logic from py2 to py3

cheers, gord

Date: Wed, 10 Jun 2015 12:15:33 +1200
From: robe...@robertcollins.net
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [all][python3] use of six.iteritems()

[...] So - please, no six.iteritems(). Thanks, Rob
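[Editor's note] The list-vs-iterator trade-off debated in this thread also has a behavioral side beyond memory: a materialized snapshot lets you mutate the dict while looping, whereas iterating a live view (Python 3's items(), or Python 2's iteritems()) while deleting entries raises RuntimeError. A minimal sketch:

```python
d = {"a": 1, "b": 2, "c": 1}

# Iterating over a snapshot (list(d.items())) permits deleting entries
# inside the loop; iterating d.items() directly while deleting would
# raise "RuntimeError: dictionary changed size during iteration".
for k, v in list(d.items()):
    if v == 1:
        del d[k]
print(d)  # {'b': 2}
```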