Re: [openstack-dev] [all][python3] use of six.iteritems()

2016-09-14 Thread Terry Wilson
On Sep 13, 2016 10:42 PM, "Kevin Benton"  wrote:
>
> > All performance matters. All memory consumption matters. Being wasteful
> > over a purely aesthetic few extra characters of code is silly.
>
> Isn't the logical conclusion of this to write everything in a different
> language? :)

I'm up for it if you are. :D


Re: [openstack-dev] [all][python3] use of six.iteritems()

2016-09-13 Thread Jay Pipes

On 09/13/2016 08:23 PM, Terry Wilson wrote:
> On Tue, Sep 13, 2016 at 6:31 PM, Jay Pipes  wrote:
>> On 09/13/2016 01:40 PM, Terry Wilson wrote:
>>>
>>> On Thu, Jun 11, 2015 at 8:33 AM, Sean Dague  wrote:
>>>>
>>>> On 06/11/2015 09:02 AM, Jay Pipes wrote:
>>>>>
>>>>> On 06/11/2015 01:16 AM, Robert Collins wrote:
>>>>>>
>>>>>> But again - where in OpenStack does this matter the slightest?
>>>>>
>>>>> Precisely. I can't think of a single case where we are iterating over
>>>>> anywhere near the number of dictionary items that we would see any
>>>>> impact whatsoever.
>>>
>>> In neutron, the ovsdb native code iterates over fairly large
>>> dictionaries since the underlying OVS library stores OVSDB tables
>>> completely in memory as dicts. I just looked at the code I wrote and
>>> it currently uses values() and I now want to switch it to
>>> six.itervalues() :p.
>>>
>>>>> Best,
>>>>> -jay
>>>>
>>>> +1.
>>>>
>>>> This is a massive premature optimization which just makes all the code
>>>> gorpy for no real reason.
>>>
>>> Premature optimization is about wasting a bunch of time trying to
>>> optimize code before you know you need to, not about following the
>>> accepted almost-always-faster/always-less-memory-using solution that
>>> already exists. Memory-wise it's the difference between a constant
>>> 88-byte iterator and the storage for an additional list of tuples. And
>>> if Raymond Hettinger, in a talk called "Transforming Code Into
>>> Beautiful Idiomatic Python", specifically mentions that people should
>>> always use iteritems
>>> (https://www.youtube.com/watch?v=OSGv2VnC0go&feature=youtu.be&t=21m24s),
>>> I tend to believe him. Sure, it'd be much better if Python 3 and
>>> Python 2 both returned iterators for items(), values(), keys(), etc.,
>>> but they don't. Wasting memory for purely aesthetic reasons (they're
>>> even both the same number of lines) is just a bad idea, IMNSHO.
>>
>> Is it wasted time to respond to a mailing list post from 18 months ago?
>>
>> -jay
>
> Ha! Absolutely it is. Someone posted a Neutron patch haphazardly
> converting all of the six.iteritems() calls to items() and it
> struck a nerve. I searched for the thread in gmail not noticing the
> date. My apologies! :)

Heh, no worries, I was mostly just being tongue-in-cheek :)

-jay



Re: [openstack-dev] [all][python3] use of six.iteritems()

2016-09-13 Thread Kevin Benton
> All performance matters. All memory consumption matters. Being wasteful
> over a purely aesthetic few extra characters of code is silly.

Isn't the logical conclusion of this to write everything in a different
language? :)

On Tue, Sep 13, 2016 at 8:42 AM, Terry Wilson  wrote:

> On Wed, Jun 10, 2015 at 4:41 AM, Robert Collins
>  wrote:
> > On 10 June 2015 at 21:30, Ihar Hrachyshka  wrote:
> >>
> >> On 06/10/2015 02:15 AM, Robert Collins wrote:
> >>> I'm very glad folk are working on Python3 ports.
> >>>
> >>> I'd like to call attention to one little wart in that process: I
> >>> get the feeling that folk are applying a massive regex to find
> >>> things like d.iteritems() and convert that to six.iteritems(d).
> >>>
> >>> I'd very much prefer that such a regex approach move things to
> >>> d.items(), which is much easier to read.
> >>>
> >>> Here's why. Firstly, very very very few of our dict iterations are
> >>> going to be performance sensitive in the way that iteritems()
> >>> matters. Secondly, no really - unless you're doing HUGE dicts, it
> >>> doesn't matter. Thirdly. Really, it doesn't.
> >>>
> >>
> >> Does it hurt though? ;)
> >
> > Yes.
> >
> > It's: harder to read. It's going to have to be removed eventually anyway
> > (when we stop supporting 2.7). It's marginally slower on 3.x (it has a
> > function and an iterator wrapping the actual thing). It's unidiomatic,
> > and we get lots of programmers that are new to Python; we should be
> > giving them as beautiful code as we can to help them learn.
>
> If someone is so new they can't handle six.iteritems, they should stay
> away from Neutron code. It'll eat them.
>
> >>> At 1 million items the overhead is 54ms[1]. If we're doing inner
> >>> loops on million item dictionaries anywhere in OpenStack today, we
> >>> have a problem. We might want to in e.g. the scheduler... if it
> >>> held in-memory state on a million hypervisors at once, because I
> >>> don't really want to imagine it pulling a million rows from a DB on
> >>> every action. But then, we'd be looking at a whole 54ms. I think we
> >>> could survive, if we did that (which we don't).
> >>>
> >>> So - please, no six.iteritems().
>
> Huge -1 from me. The "I like looking at d.items() more than I like
> looking at six.iteritems(d) so make everything (even slightly) less
> efficient" argument is insane to me. All performance matters. All
> memory consumption matters. Being wasteful over a purely aesthetic few
> extra characters of code is silly.


Re: [openstack-dev] [all][python3] use of six.iteritems()

2016-09-13 Thread Terry Wilson
On Tue, Sep 13, 2016 at 6:31 PM, Jay Pipes  wrote:
> On 09/13/2016 01:40 PM, Terry Wilson wrote:
>>
>> On Thu, Jun 11, 2015 at 8:33 AM, Sean Dague  wrote:
>>>
>>> On 06/11/2015 09:02 AM, Jay Pipes wrote:

 On 06/11/2015 01:16 AM, Robert Collins wrote:
>
> But again - where in OpenStack does this matter the slightest?


 Precisely. I can't think of a single case where we are iterating over
 anywhere near the number of dictionary items that we would see any
 impact whatsoever.
>>
>>
>> In neutron, the ovsdb native code iterates over fairly large
>> dictionaries since the underlying OVS library stores OVSDB tables
>> completely in memory as dicts. I just looked at the code I wrote and
>> it currently uses values() and I now want to switch it to
>> six.itervalues() :p.
>>
 Best,
 -jay
>>>
>>>
>>> +1.
>>>
>>> This is a massive premature optimization which just makes all the code
>>> gorpy for no real reason.
>>
>>
>> Premature optimization is about wasting a bunch of time trying to
>> optimize code before you know you need to, not about following the
>> accepted almost-always-faster/always-less-memory-using solution that
>> already exists. Memory-wise it's the difference between a constant
>> 88-byte iterator and the storage for an additional list of tuples. And
>> if Raymond Hettinger, in a talk called "Transforming Code Into
>> Beautiful Idiomatic Python", specifically mentions that people should
>> always use iteritems
>> (https://www.youtube.com/watch?v=OSGv2VnC0go&feature=youtu.be&t=21m24s),
>> I tend to believe him. Sure, it'd be much better if Python 3 and
>> Python 2 both returned iterators for items(), values(), keys(), etc.,
>> but they don't. Wasting memory for purely aesthetic reasons (they're
>> even both the same number of lines) is just a bad idea, IMNSHO.
>
>
> Is it wasted time to respond to a mailing list post from 18 months ago?
>
> -jay

Ha! Absolutely it is. Someone posted a Neutron patch haphazardly
converting all of the six.iteritems() calls to items() and it
struck a nerve. I searched for the thread in gmail not noticing the
date. My apologies! :)

Terry

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][python3] use of six.iteritems()

2016-09-13 Thread Jay Pipes

On 09/13/2016 01:40 PM, Terry Wilson wrote:
> On Thu, Jun 11, 2015 at 8:33 AM, Sean Dague  wrote:
>> On 06/11/2015 09:02 AM, Jay Pipes wrote:
>>> On 06/11/2015 01:16 AM, Robert Collins wrote:
>>>> But again - where in OpenStack does this matter the slightest?
>>>
>>> Precisely. I can't think of a single case where we are iterating over
>>> anywhere near the number of dictionary items that we would see any
>>> impact whatsoever.
>
> In neutron, the ovsdb native code iterates over fairly large
> dictionaries since the underlying OVS library stores OVSDB tables
> completely in memory as dicts. I just looked at the code I wrote and
> it currently uses values() and I now want to switch it to
> six.itervalues() :p.
>
>>> Best,
>>> -jay
>>
>> +1.
>>
>> This is a massive premature optimization which just makes all the code
>> gorpy for no real reason.
>
> Premature optimization is about wasting a bunch of time trying to
> optimize code before you know you need to, not about following the
> accepted almost-always-faster/always-less-memory-using solution that
> already exists. Memory-wise it's the difference between a constant
> 88-byte iterator and the storage for an additional list of tuples. And
> if Raymond Hettinger, in a talk called "Transforming Code Into
> Beautiful Idiomatic Python", specifically mentions that people should
> always use iteritems
> (https://www.youtube.com/watch?v=OSGv2VnC0go&feature=youtu.be&t=21m24s),
> I tend to believe him. Sure, it'd be much better if Python 3 and
> Python 2 both returned iterators for items(), values(), keys(), etc.,
> but they don't. Wasting memory for purely aesthetic reasons (they're
> even both the same number of lines) is just a bad idea, IMNSHO.

Is it wasted time to respond to a mailing list post from 18 months ago?

-jay



Re: [openstack-dev] [all][python3] use of six.iteritems()

2016-09-13 Thread Terry Wilson
On Thu, Jun 11, 2015 at 8:33 AM, Sean Dague  wrote:
> On 06/11/2015 09:02 AM, Jay Pipes wrote:
>> On 06/11/2015 01:16 AM, Robert Collins wrote:
>>> But again - where in OpenStack does this matter the slightest?
>>
>> Precisely. I can't think of a single case where we are iterating over
>> anywhere near the number of dictionary items that we would see any
>> impact whatsoever.

In neutron, the ovsdb native code iterates over fairly large
dictionaries since the underlying OVS library stores OVSDB tables
completely in memory as dicts. I just looked at the code I wrote and
it currently uses values() and I now want to switch it to
six.itervalues() :p.

>> Best,
>> -jay
>
> +1.
>
> This is a massive premature optimization which just makes all the code
> gorpy for no real reason.

Premature optimization is about wasting a bunch of time trying to
optimize code before you know you need to, not about following the
accepted almost-always-faster/always-less-memory-using solution that
already exists. Memory-wise it's the difference between a constant
88-byte iterator and the storage for an additional list of tuples. And
if Raymond Hettinger, in a talk called "Transforming Code Into
Beautiful Idiomatic Python", specifically mentions that people should
always use iteritems
(https://www.youtube.com/watch?v=OSGv2VnC0go&feature=youtu.be&t=21m24s),
I tend to believe him. Sure, it'd be much better if Python 3 and
Python 2 both returned iterators for items(), values(), keys(), etc.,
but they don't. Wasting memory for purely aesthetic reasons (they're
even both the same number of lines) is just a bad idea, IMNSHO.
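For a rough sense of the numbers above, a minimal sketch (assuming a
Python 2 interpreter with six installed; exact sizes vary by build):

import sys
import six

d = dict(enumerate(range(100000)))

it = six.iteritems(d)   # small, constant-size iterator object
lst = d.items()         # Python 2: a new list of 100000 (key, value) tuples

print(sys.getsizeof(it))   # a few dozen bytes, independent of len(d)
print(sys.getsizeof(lst))  # hundreds of KB for the list alone; the
                           # tuples it holds add further overhead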

Terry



Re: [openstack-dev] [all][python3] use of six.iteritems()

2016-09-13 Thread Ed Leafe
On Sep 13, 2016, at 10:42 AM, Terry Wilson  wrote:

> All performance matters. All
> memory consumption matters. Being wasteful over a purely aesthetic few
> extra characters of code is silly.

import this


-- Ed Leafe








Re: [openstack-dev] [all][python3] use of six.iteritems()

2016-09-13 Thread Terry Wilson
On Wed, Jun 10, 2015 at 4:41 AM, Robert Collins
 wrote:
> On 10 June 2015 at 21:30, Ihar Hrachyshka  wrote:
>>
>> On 06/10/2015 02:15 AM, Robert Collins wrote:
>>> I'm very glad folk are working on Python3 ports.
>>>
>>> I'd like to call attention to one little wart in that process: I
>>> get the feeling that folk are applying a massive regex to find
>>> things like d.iteritems() and convert that to six.iteritems(d).
>>>
>>> I'd very much prefer that such a regex approach move things to
>>> d.items(), which is much easier to read.
>>>
>>> Here's why. Firstly, very very very few of our dict iterations are
>>> going to be performance sensitive in the way that iteritems()
>>> matters. Secondly, no really - unless you're doing HUGE dicts, it
>>> doesn't matter. Thirdly. Really, it doesn't.
>>>
>>
>> Does it hurt though? ;)
>
> Yes.
>
> It's: harder to read. It's going to have to be removed eventually anyway
> (when we stop supporting 2.7). It's marginally slower on 3.x (it has a
> function and an iterator wrapping the actual thing). It's unidiomatic,
> and we get lots of programmers that are new to Python; we should be
> giving them as beautiful code as we can to help them learn.

If someone is so new they can't handle six.iteritems, they should stay
away from Neutron code. It'll eat them.

>>> At 1 million items the overhead is 54ms[1]. If we're doing inner
>>> loops on million item dictionaries anywhere in OpenStack today, we
>>> have a problem. We might want to in e.g. the scheduler... if it
>>> held in-memory state on a million hypervisors at once, because I
>>> don't really want to imagine it pulling a million rows from a DB on
>>> every action. But then, we'd be looking at a whole 54ms. I think we
>>> could survive, if we did that (which we don't).
>>>
>>> So - please, no six.iteritems().

Huge -1 from me. The "I like looking at d.items() more than I like
looking at six.iteritems(d) so make everything (even slightly) less
efficient" argument is insane to me. All performance matters. All
memory consumption matters. Being wasteful over a purely aesthetic few
extra characters of code is silly.



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-15 Thread Robert Collins
On 12 June 2015 at 05:39, Dolph Mathews dolph.math...@gmail.com wrote:

 On Thu, Jun 11, 2015 at 12:34 AM, Robert Collins robe...@robertcollins.net
 wrote:

 On 11 June 2015 at 17:16, Robert Collins robe...@robertcollins.net
 wrote:

  This test conflates setup and execution. Better like my example,
 ...

 Just had it pointed out to me that I've let my inner asshole out again
 - sorry. I'm going to step away from the thread for a bit; my personal
 state (daughter just had a routine but painful operation) shouldn't be
 taken out on other folk, however indirectly.


 Ha, no worries. You are completely correct about conflating setup and
 execution. As far as I can tell though, even if I isolate the dict setup
 from the benchmark, I get the same relative differences in results.
 iteritems() was introduced for a reason!

Absolutely: the key question is whether that reason is applicable to us.

 If you don't need to go back to .items()'s copy behavior in py2, then
 six.iteritems() seems to be the best general purpose choice.

 I think Gordon said it best elsewhere in this thread:

 again, i just want to reiterate, i'm not saying don't use items(), i just
 think we should not blindly use items() just as we shouldn't blindly use
 iteritems()/viewitems()

I'd like to recap and summarise a bit.

I think it's broadly agreed that:

The three view-based methods -- iteritems, iterkeys, itervalues -- in
Python 2 became unified with the list-form equivalents in Python 3.

The view-based methods are substantially faster and lower overhead
than the list-form methods, approximately 3x.

We don't have any services today that expect to hold million item
dicts, or even 10K item dicts in a persistent fashion.

There's some cognitive overhead involved in reading six.iteritems(d)
vs d.items().

We should use d.items() except where it matters.


Where does it matter?
We have several process architectures in OpenStack:
- We have API servers that are eventlet (except keystone) WSGI
servers. They respond to requests on HTTP[S], each request is
independent and loads all its state from the DB and/or memcache each
time. We don't expect large numbers of concurrent active requests per
process. (Where large would be e.g. 1000).
- We have MQ servers that are conceptually the same as WSGI, just a
different listening protocol. They do sometimes have background tasks,
and for some (e.g. neutron-l3-agent) may hold significant cached state
between requests. But that's still scoped to medium size datasets. We
expect moderate numbers of concurrent active requests, as these are
the actual backends doing things for users, but since these servers
are typically working with actual slow things (e.g. the hypervisor)
high concurrency typically goes badly :).
- We have CLIs that start up, process some data and exit. This
includes python-novaclient and nova-manage. They generally work with
very small datasets and have no concurrency at all.

There are two ways that iteritems vs items etc could matter. One, A), is
memory/CPU on single use of very large dicts. The other, B), is
aggregate overhead on many concurrent uses of a single shared dict (or,
C), possibly N similar-sized dicts).

A) Doesn't apply to us in any case I can think of.
B) Doesn't apply to us either - our peak concurrency on any single
process is still low (we may manage to make it higher now we're moving
on the PyMySQL thing, but that's still in progress) - and of course
there are tradeoffs with high concurrency depending on the ratio of
work-to-wait each request has. Very high concurrency depends on a very
low ratio: to have 1000 concurrent requests that aren't slowing each
other down requires that each request's wall clock be 1000x the time
spent in-process actioning it; and that there be enough backend
capacity (whatever that is) to dispatch the work to without causing
queuing in that part of the system.
C) We can eliminate via both the argument on B, and on relative
overheads: if we had many 1000-item dicts in process at once, the
relative overhead of making items() from them all is approx the size
of the dicts: but it's almost certain we have much more state hanging
around in each of those threads than each dict: so the
incremental cost will not dominate the process overheads.

I'm not - and haven't - said that iteritems() is never applicable *in
general*; rather, I don't believe it's ever applicable *to us* today:
and I'm arguing that we should default to items() and bring in
iteritems() if and when we need it.
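As a concrete sketch of that default (illustrative names, not from any
particular project):

# Plain items() works unchanged on Python 2 and Python 3; at the
# small-to-medium dict sizes described above, the Python 2 copy is noise.
def summarise(counters):
    return ["%s=%d" % (name, count) for name, count in counters.items()]

print(summarise({"requests": 12, "errors": 1}))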

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-11 Thread gordon chung
no worries... can't speak for that Dolph fellow though :)
i think it's good to understand/learn different testing/benchmarking 
strategies. 

cheers,
gord


 Date: Thu, 11 Jun 2015 17:34:57 +1200
 From: robe...@robertcollins.net
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [all][python3] use of six.iteritems()
 
 On 11 June 2015 at 17:16, Robert Collins robe...@robertcollins.net wrote:
 
  This test conflates setup and execution. Better like my example,
 ...
 
 Just had it pointed out to me that I've let my inner asshole out again
 - sorry. I'm going to step away from the thread for a bit; my personal
 state (daughter just had a routine but painful operation) shouldn't be
 taken out on other folk, however indirectly.
 
 -Rob
 
 -- 
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud
 


Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-11 Thread Mike Bayer



On 6/10/15 11:48 PM, Dolph Mathews wrote:
> tl;dr *.iteritems() is faster and more memory efficient than .items()
> in python2*
>
> Using xrange() in python2 instead of range() because it's more memory
> efficient and consistent between python 2 and 3...
>
> # xrange() + .items()
> python -m timeit -n 20 for\ i\ in\
> dict(enumerate(xrange(1000000))).items():\ pass
>
> 20 loops, best of 3: 729 msec per loop
> peak memory usage: 203 megabytes
>
> # xrange() + .iteritems()
> python -m timeit -n 20 for\ i\ in\
> dict(enumerate(xrange(1000000))).iteritems():\ pass
>
> 20 loops, best of 3: 644 msec per loop
> peak memory usage: 176 megabytes
>
> # python 3
> python3 -m timeit -n 20 for\ i\ in\
> dict(enumerate(range(1000000))).items():\ pass
>
> 20 loops, best of 3: 826 msec per loop
> peak memory usage: 198 megabytes

Is it just me, or are these differences pretty negligible considering
this is the 1 million item dictionary, which in itself is a unicorn in
openstack code or really most code anywhere?

as was stated before, if we have million-item dictionaries floating
around, that code has problems.   I already have to wait full seconds
for responses to come back when I play around with Neutron + Horizon in
a devstack VM, and that's with no data at all.  100ms extra for a
hypothetical million item structure would be long after the whole app
has fallen over from having just ten thousand of anything, much less a
million.


My only concern with items() is that it is semantically different in
Py2k / Py3k.  Code that would otherwise have a "dictionary changed size"
issue under iteritems() / py3k items() would succeed under py2k
items().   If such a coding mistake is not covered by tests (as this is
a data-dependent error condition), it would manifest as a sudden error
condition on Py3k only.
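A small sketch of that semantic difference (runs as-is on Python 3; on
Python 2 the same loop over the list returned by items() completes
silently):

d = {"a": 1, "b": 2}
try:
    for key, value in d.items():  # a view on py3, like iteritems() on py2
        del d[key]                # mutating the dict mid-iteration
except RuntimeError as exc:
    # Python 3 (and py2 iteritems()) detect the mutation; py2's
    # list-returning items() would have iterated over a safe copy.
    print(exc)  # "dictionary changed size during iteration"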







> And if you really want to see the results with range() in python2...
>
> # range() + .items()
> python -m timeit -n 20 for\ i\ in\
> dict(enumerate(range(1000000))).items():\ pass
>
> 20 loops, best of 3: 851 msec per loop
> peak memory usage: 254 megabytes
>
> # range() + .iteritems()
> python -m timeit -n 20 for\ i\ in\
> dict(enumerate(range(1000000))).iteritems():\ pass
>
> 20 loops, best of 3: 919 msec per loop
> peak memory usage: 184 megabytes
>
> To benchmark memory consumption, I used the following on bare metal:
>
> $ valgrind --tool=massif --pages-as-heap=yes
> --massif-out-file=massif.out $COMMAND_FROM_ABOVE
>
> $ cat massif.out | grep mem_heap_B | sort -u
>
> $ python2 --version
> Python 2.7.9
>
> $ python3 --version
> Python 3.4.3


> On Wed, Jun 10, 2015 at 8:36 PM, gordon chung g...@live.ca wrote:
>
>>> Date: Wed, 10 Jun 2015 21:33:44 +1200
>>> From: robe...@robertcollins.net
>>> To: openstack-dev@lists.openstack.org
>>> Subject: Re: [openstack-dev] [all][python3] use of six.iteritems()
>>>
>>> On 10 June 2015 at 17:22, gordon chung g...@live.ca wrote:
>>>> maybe the suggestion should be "don't blindly apply
>>>> six.iteritems or items" rather than "don't apply iteritems at all".
>>>> admittedly, it's a massive eyesore, but it's a very real use case
>>>> that some projects deal with large data results and to enforce the
>>>> latter policy can have negative effects[1]. one million item
>>>> dictionary might be negligible but in a multi-user, multi-*
>>>> environment that can have a significant impact on the amount of
>>>> memory required to store everything.
>>>>
>>>> [1] disclaimer: i have no real world results but i assume
>>>> memory management was the reason for the switch in logic from py2
>>>> to py3
>>>
>>> I wouldn't make that assumption.
>>>
>>> And no, memory isn't an issue. If you have a million item dict,
>>> ignoring the internal overheads, the dict needs 1 million object
>>> pointers. The size of a list with those pointers in it is 1M *
>>> (pointer size in bytes). E.g. 4M or 8M. Nothing to worry about given
>>> the footprint of such a program :)
>>
>> iiuc, items() (in py2) will create a copy of the dictionary in
>> memory to be processed. this is useful for cases such as
>> concurrency where you want to ensure consistency but doing a quick
>> test i noticed a massive spike in memory usage between items() and
>> iteritems.
>>
>> 'for i in dict(enumerate(range(1000000))).items(): pass' consumes
>> significantly more memory than 'for i in
>> dict(enumerate(range(1000000))).iteritems(): pass'. on my system,
>> the difference in memory consumption was double when using items()
>> vs iteritems() and the cpu util was significantly more as well...
>> let me know if there's anything that stands out as inaccurate.
>>
>> unless there's something wrong with my ignorant testing above, i
>> think it's something projects should consider when mass applying
>> any iteritems/items patch.
>>
>> cheers,
>> gord

Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-11 Thread gordon chung
 Is it just me, or are these differences pretty negligible considering
 this is the 1 million item dictionary, which in itself is a unicorn 
 in openstack code or really most code anywhere? 
 
 as was stated before, if we have million-item dictionaries floating 
 around, that code has problems. I already have to wait full seconds 
 for responses to come back when I play around with Neutron + Horizon in 
 a devstack VM, and that's with no data at all. 100ms extra for a 
 hypothetical million item structure would be long after the whole app 
 has fallen over from having just ten thousand of anything, much less a 
 million. 

my concern isn't the 1million item dictionary -- i think we're all just using 
that as a simple test -- it's the 100 concurrent actions against a 10,000 item 
dictionary or the 1000 concurrent actions against a 1000 item dictionary...
when tracking htop to see memory consumption, items() consistently doubled the 
memory consumption of iteritems().

again, i just want to reiterate, i'm not saying don't use items(), i just think 
we should not blindly use items() just as we shouldn't blindly use 
iteritems()/viewitems()

 
 My only concern with items() is that it is semantically different in 
 Py2k / Py3k. Code that would otherwise have a "dictionary changed
 size" issue under iteritems() / py3k items() would succeed under py2k
 items(). If such a coding mistake is not covered by tests (as this is
 a data-dependent error condition), it would manifest as a sudden error 
 condition on Py3k only. 
 
  


Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-11 Thread Victor Stinner

Hi,

On 10/06/2015 02:15, Robert Collins wrote:

> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
> d.items(): pass'
> 10 loops, best of 3: 76.6 msec per loop
>
> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
> d.iteritems(): pass'
> 100 loops, best of 3: 22.6 msec per loop

.items() is 3x as slow as .iteritems(). Hum, I don't have the same
results. Try the attached benchmark. I'm using my own wrapper on top of
timeit, because timeit is bad at calibrating the benchmark :-/ timeit
gives unreliable results.


Results with CPU model: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:

[ 10 keys ]
713 ns: iteritems
922 ns (+29%): items

[ 10^3 keys ]
42.1 us: iteritems
59.4 us (+41%): items


[ 10^6 keys (1 million) ]
89.3 ms: iteritems
442 ms (+395%): items

In my benchmark, .items() is 5x as slow as .iteritems(). The code to
iterate on 1 million items takes almost half a second. IMO adding 300
ms to each request is not negligible for an application. If this delay is
added multiple times (multiple loops iterating on 1 million items), we
may reach up to 1 second on a user request :-/


Anyway, when I write patches to port a project to Python 3, I don't want
to change *anything* for Python 2. The API, the performance, the
behaviour, etc. must not change.


I don't want to be responsible for a slowdown, and I don't feel able to
estimate whether replacing dict.iteritems() with dict.items() has a cost in a
real application.


As Ihar wrote: it must be done in a separate patch, by developers
who know the project well.


Currently, most developers writing Python 3 patches are not heavily 
involved in each ported project.


There is also dict.itervalues(), not only dict.iteritems().

"for key in dict.iterkeys()" can simply be written as "for key in dict:".

There is also xrange() vs range(), the debate is similar:
https://review.openstack.org/#/c/185418/

For Python 3, I suggest using "from six.moves import range" to get the
Python 3 behaviour on Python 2: range() always creates an iterator, it
doesn't create a temporary list. IMO it makes the code more readable
because "for i in xrange(n):" becomes "for i in range(n):". "six" is not
written outside imports, and range() is better than xrange() for
developers starting to learn Python.
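For illustration, the suggested idioms side by side (a sketch assuming
six is installed; the names are placeholders):

from six.moves import range  # Python 3 range() semantics on Python 2

d = {"a": 1, "b": 2}

for key in d:        # instead of d.iterkeys() / six.iterkeys(d)
    pass

for i in range(10):  # instead of xrange(10) on Python 2
    pass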


Victor

Micro-benchmark for the Python operation "key in dict". Run it with:

./python.orig benchmark.py script bench_str.py --file=orig
./python.patched benchmark.py script bench_str.py --file=patched
./python.patched benchmark.py compare_to orig patched

Download benchmark.py from:

https://bitbucket.org/haypo/misc/raw/tip/python/benchmark.py

import gc

def consume_items(dico):
    for key, value in dico.items():
        pass


def consume_iteritems(dico):
    for key, value in dico.iteritems():
        pass


def run_benchmark(bench):
    for nkeys in (10, 10**3, 10**6):
        bench.start_group('%s keys' % nkeys)
        dico = {str(index): index for index in range(nkeys)}

        bench.compare_functions(
            ('iteritems', consume_iteritems, dico),
            ('items', consume_items, dico),
        )
        dico = None
        gc.collect()
        gc.collect()

if __name__ == "__main__":
    import benchmark
    benchmark.main()


Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-11 Thread Jay Pipes

On 06/11/2015 01:16 AM, Robert Collins wrote:
> But again - where in OpenStack does this matter the slightest?

Precisely. I can't think of a single case where we are iterating over
anywhere near the number of dictionary items that we would see any
impact whatsoever.

Best,
-jay



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-11 Thread Sean Dague
On 06/11/2015 09:02 AM, Jay Pipes wrote:
 On 06/11/2015 01:16 AM, Robert Collins wrote:
 But again - where in OpenStack does this matter the slightest?
 
 Precisely. I can't think of a single case where we are iterating over
 anywhere near the number of dictionary items that we would see any
 impact whatsoever.
 
 Best,
 -jay

+1.

This is a massive premature optimization which just makes all the code
gorpy for no real reason.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-11 Thread Dolph Mathews
On Thu, Jun 11, 2015 at 12:34 AM, Robert Collins robe...@robertcollins.net
wrote:

 On 11 June 2015 at 17:16, Robert Collins robe...@robertcollins.net
 wrote:

  This test conflates setup and execution. Better like my example,
 ...

 Just had it pointed out to me that I've let my inner asshole out again
 - sorry. I'm going to step away from the thread for a bit; my personal
 state (daughter just had a routine but painful operation) shouldn't be
 taken out on other folk, however indirectly.


Ha, no worries. You are completely correct about conflating setup and
execution. As far as I can tell though, even if I isolate the dict setup
from the benchmark, I get the same relative differences in results.
iteritems() was introduced for a reason!

If you don't need to go back to .items()'s copy behavior in py2, then
six.iteritems() seems to be the best general purpose choice.

I think Gordon said it best elsewhere in this thread:

 again, i just want to reiterate, i'm not saying don't use items(), i just
think we should not blindly use items() just as we shouldn't blindly use
iteritems()/viewitems()



 -Rob

 --
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-11 Thread Mike Bayer



On 6/11/15 1:39 PM, Dolph Mathews wrote:


On Thu, Jun 11, 2015 at 12:34 AM, Robert Collins 
robe...@robertcollins.net wrote:


On 11 June 2015 at 17:16, Robert Collins
robe...@robertcollins.net wrote:

 This test conflates setup and execution. Better like my example,
...

Just had it pointed out to me that I've let my inner asshole out again
- sorry. I'm going to step away from the thread for a bit; my personal
state (daughter just had a routine but painful operation) shouldn't be
taken out on other folk, however indirectly.


Ha, no worries. You are completely correct about conflating setup and 
execution. As far as I can tell though, even if I isolate the dict 
setup from the benchmark, I get the same relative differences in 
results. iteritems() was introduced for a reason!


If you don't need to go back to .items()'s copy behavior in py2, then 
six.iteritems() seems to be the best general purpose choice.
I am firmly in the "let's use items()" camp.  A 100 ms difference for a
totally not-real-world case of a dictionary 1M items in size is no kind
of rationale for the OpenStack project - if someone has a dictionary
that's 1M objects in size, or even 100K, that's a bug in and of itself.


the real benchmarks we should be using, if we are to even bother at all 
(which we shouldn't), is to observe if items() vs. iteritems() has *any* 
difference that is at all measurable in terms of the overall execution 
of real-world openstack use cases.   These nano-differences in speed are 
immediately dwarfed by all those operations surrounding them long before 
we even get to the level of RPC overhead.






I think Gordon said it best elsewhere in this thread:

 again, i just want to reiterate, i'm not saying don't use items(), i 
just think we should not blindly use items() just as we shouldn't 
blindly use iteritems()/viewitems()
If a demonstrable difference can be established in terms of real-world 
use cases for code that is using iteritems() vs. items(), then you can 
justify this difference.  Otherwise, not worth it.







-Rob

--
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-11 Thread Nikhil Komawar
+1 : .items()

Why can't we just add six.iteritems calls on a case-by-case basis (if that
happens)? Regex substitutions for a library call don't make sense to me
on such a massive scale.

On 6/11/15 11:00 AM, Victor Stinner wrote:
 Hi,

 On 10/06/2015 02:15, Robert Collins wrote:
 python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 76.6 msec per loop

 python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.iteritems(): pass'
 100 loops, best of 3: 22.6 msec per loop

 .items() is 3x as slow as .iteritems(). Hum, I don't have the same
 results. Try the attached benchmark. I'm using my own wrapper on top of
 timeit, because timeit is bad at calibrating the benchmark :-/ timeit
 gives unreliable results.

 Results on with CPU model: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:

 [ 10 keys ]
 713 ns: iteritems
 922 ns (+29%): items

 [ 10^3 keys ]
 42.1 us: iteritems
 59.4 us (+41%): items


 [ 10^6 keys (1 million) ]
 89.3 ms: iteritems
 442 ms (+395%): items

 In my benchmark, .items() is 5x as slow as .iteritems(). The code to
 iterate on 1 million items takes almost half a second. IMO adding 300
 ms to each request is not negligible for an application. If this delay
 is added multiple times (multiple loops iterating on 1 million items),
 we may reach up to 1 second on a user request :-/

 Anyway, when I write patches to port a project to Python 3, I don't
 want to change *anything* for Python 2. The API, the performance, the
 behaviour, etc. must not change.

 I don't want to be responsible for a slowdown, and I don't feel able
 to estimate whether replacing dict.iteritems() with dict.items() has a
 cost in a real application.

 As Ihar wrote: it must be done in a separate patch, by developers
 who know the project well.

 Currently, most developers writing Python 3 patches are not heavily
 involved in each ported project.

 There is also dict.itervalues(), not only dict.iteritems().

 "for key in dict.iterkeys()" can simply be written as "for key in dict:".

 There is also xrange() vs range(), the debate is similar:
 https://review.openstack.org/#/c/185418/

 For Python 3, I suggest using "from six.moves import range" to get
 the Python 3 behaviour on Python 2: range() always creates an
 iterator, it doesn't create a temporary list. IMO it makes the code
 more readable because "for i in xrange(n):" becomes "for i in
 range(n):". "six" is not written outside imports, and range() is better
 than xrange() for developers starting to learn Python.

 Victor



-- 

Thanks,
Nikhil




Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-11 Thread John Dennis

On 06/11/2015 01:46 PM, Mike Bayer wrote:
I am firmly in the "let's use items()" camp.  A 100 ms difference for
a totally not-real-world case of a dictionary 1M items in size is no
kind of rationale for the OpenStack project - if someone has a
dictionary that's 1M objects in size, or even 100K, that's a bug in
and of itself.


the real benchmarks we should be using, if we are to even bother at 
all (which we shouldn't), is to observe if items() vs. iteritems() has 
*any* difference that is at all measurable in terms of the overall 
execution of real-world openstack use cases.   These nano-differences 
in speed are immediately dwarfed by all those operations surrounding 
them long before we even get to the level of RPC overhead.


Lessons learned in the trenches:

* The best code is the simplest [1] and easiest to read.

* Code is write-once, read-many; clarity is a vital part of the read-many.

* Do not optimize until functionality is complete.

* Optimize only after profiling real world use cases.

* Prior assumptions about what needs optimization are almost always 
proven wrong by a profiler.


* I/O latency vastly overwhelms most code optimization making obtuse 
optimization pointless and detrimental to long term robustness.


* The amount of optimization needed is usually minimal, restricted to 
just a few code locations and 80% of the speed increases occur in just 
the first few tweaks after analyzing profile data.


[1] Compilers can optimize simple code best, simple code is easy to 
write and easier to read while at the same time giving the tool chain 
the best chance of turning your simple code into efficient code. (Not 
sure how much this applies to Python, but it's certainly true of other 
compiled languages.)


John



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-11 Thread Clint Byrum
Top posting as this is more a response to the whole thread.

My take aways from the most excellent discussion:

* There is some benefit to iteritems in python2 when you need it.
* OpenStack does not seem to need it
  - Except in places that are operating on tens of thousands of large
objects concurrently such as the nova scheduler.
* six.anything is more code, and more code is more burden in general.

From this I believe we should distill some clear developer
and reviewer recommendations which should go in our developer docs:

* Do not use six.iteritems in new patches without a clear reason
  stated and attached.
  - Reasons should clearly state why .items() would be a large enough
    burden, such as "this list will be large and stay resident in
    memory for the duration of the program. Each concurrent request
    will have similar lists."
* -1 patches using six.iteritems in flight now with "Please remove or
  justify six.iteritems usage."
* Patches touching code sections which use six.iteritems should be
  allowed to remove its usage without justification.

I've gone ahead and added this suggestion in a patch to the
infra-manual:

https://review.openstack.org/190757
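A hypothetical example of the kind of justification that guideline asks
for (the data and helper are invented for illustration):

import six

# Invented example: a mapping that stays resident for the life of the
# process and routinely holds tens of thousands of entries.
host_state = {"host-%d" % i: {"ram_mb": 1024} for i in range(50000)}

# six.iteritems justified: host_state is large and long-lived, and
# items() would build a full list of tuples on every pass under py2.
for host, state in six.iteritems(host_state):
    state["seen"] = True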

This looks quite a bit like a hacking rule definition. How strongly do
we feel about this, do we want to require a tag of some kind on lines
that use six.iteritems(), or are we comfortable with this just being in
our python3 porting documentation?

Excerpts from Robert Collins's message of 2015-06-09 17:15:33 -0700:
 I'm very glad folk are working on Python3 ports.
 
 I'd like to call attention to one little wart in that process: I get
 the feeling that folk are applying a massive regex to find things like
 d.iteritems() and convert that to six.iteritems(d).
 
 I'd very much prefer that such a regex approach move things to
 d.items(), which is much easier to read.
 
 Here's why. Firstly, very very very few of our dict iterations are
 going to be performance sensitive in the way that iteritems() matters.
 Secondly, no really - unless you're doing HUGE dicts, it doesn't
 matter. Thirdly. Really, it doesn't.
 
 At 1 million items the overhead is 54ms[1]. If we're doing inner loops
 on million item dictionaries anywhere in OpenStack today, we have a
 problem. We might want to in e.g. the scheduler... if it held
 in-memory state on a million hypervisors at once, because I don't
 really want to imagine it pulling a million rows from a DB on every
 action. But then, we'd be looking at a whole 54ms. I think we could
 survive, if we did that (which we don't).
 
 So - please, no six.iteritems().
 
 Thanks,
 Rob
 
 
 [1]
 python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 76.6 msec per loop
 python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.iteritems(): pass'
 100 loops, best of 3: 22.6 msec per loop
 python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 18.9 msec per loop
 pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 65.8 msec per loop
 # and out of interest, assuming that that hadn't triggered the JIT
 but it had.
  pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i
 in d.items(): pass'
 1000 loops, best of 3: 64.3 msec per loop
 



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-11 Thread Nikhil Komawar


On 6/11/15 2:45 PM, John Dennis wrote:
 On 06/11/2015 01:46 PM, Mike Bayer wrote:
 I am firmly in the "let's use items()" camp.  A 100 ms difference for
 a totally not-real-world case of a dictionary 1M items in size is no
 kind of rationale for the OpenStack project - if someone has a
 dictionary that's 1M objects in size, or even 100K, that's a bug in
 and of itself.

 the real benchmarks we should be using, if we are to even bother at
 all (which we shouldn't), is to observe if items() vs. iteritems()
 has *any* difference that is at all measurable in terms of the
 overall execution of real-world openstack use cases.   These
 nano-differences in speed are immediately dwarfed by all those
 operations surrounding them long before we even get to the level of
 RPC overhead.

 Lessons learned in the trenches:

 * The best code is the simplest [1] and easiest to read.

 * Code is write-once, read-many; clarity is a vital part of the read-many.


+1

 * Do not optimize until functionality is complete.


+1

 * Optimize only after profiling real world use cases.


+2!

 * Prior assumptions about what needs optimization are almost always
 proven wrong by a profiler.


+2!

 * I/O latency vastly overwhelms most code optimization making obtuse
 optimization pointless and detrimental to long term robustness.


Couldn't agree more

 * The amount of optimization needed is usually minimal, restricted to
 just a few code locations and 80% of the speed increases occur in just
 the first few tweaks after analyzing profile data.

 [1] Compilers can optimize simple code best, simple code is easy to
 write and easier to read while at the same time giving the tool chain
 the best chance of turning your simple code into efficient code. (Not
 sure how much this applies to Python, but it's certainly true of other
 compiled languages.)

 John




-- 

Thanks,
Nikhil




Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-10 Thread Sean Dague
On 06/09/2015 08:15 PM, Robert Collins wrote:
 I'm very glad folk are working on Python3 ports.
 
 I'd like to call attention to one little wart in that process: I get
 the feeling that folk are applying a massive regex to find things like
 d.iteritems() and convert that to six.iteritems(d).
 
 I'd very much prefer that such a regex approach move things to
 d.items(), which is much easier to read.
 
 Here's why. Firstly, very very very few of our dict iterations are
 going to be performance sensitive in the way that iteritems() matters.
 Secondly, no really - unless you're doing HUGE dicts, it doesn't
 matter. Thirdly. Really, it doesn't.
 
 At 1 million items the overhead is 54ms[1]. If we're doing inner loops
 on million item dictionaries anywhere in OpenStack today, we have a
 problem. We might want to in e.g. the scheduler... if it held
 in-memory state on a million hypervisors at once, because I don't
 really to to imagine it pulling a million rows from a DB on every
 action. But then, we'd be looking at a whole 54ms. I think we could
 survive, if we did that (which we don't).
 
 So - please, no six.iteritems().
 
 Thanks,
 Rob
 
 
 [1]
 python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 76.6 msec per loop
 python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.iteritems(): pass'
 100 loops, best of 3: 22.6 msec per loop
 python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 18.9 msec per loop
 pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 65.8 msec per loop
 # and out of interest, assuming that that hadn't triggered the JIT
 but it had.
  pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i
 in d.items(): pass'
 1000 loops, best of 3: 64.3 msec per loop
 

That's awesome, because those six.iteritems loops make me want to throw
up a little. Very happy to have our code just use items instead.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-10 Thread Robert Collins
On 10 June 2015 at 21:30, Ihar Hrachyshka ihrac...@redhat.com wrote:

 On 06/10/2015 02:15 AM, Robert Collins wrote:
 I'm very glad folk are working on Python3 ports.

 I'd like to call attention to one little wart in that process: I
 get the feeling that folk are applying a massive regex to find
 things like d.iteritems() and convert that to six.iteritems(d).

 I'd very much prefer that such a regex approach move things to
 d.items(), which is much easier to read.

 Here's why. Firstly, very very very few of our dict iterations are
 going to be performance sensitive in the way that iteritems()
 matters. Secondly, no really - unless you're doing HUGE dicts, it
 doesn't matter. Thirdly. Really, it doesn't.


 Does it hurt though? ;)

Yes.

It's: harder to read. It's going to have to be removed eventually anyway
(when we stop supporting 2.7). It's marginally slower on 3.x (it has a
function and an iterator wrapping the actual thing). It's unidiomatic,
and we get lots of programmers that are new to Python; we should be
giving them as beautiful code as we can to help them learn.

 At 1 million items the overhead is 54ms[1]. If we're doing inner
 loops on million item dictionaries anywhere in OpenStack today, we
 have a problem. We might want to in e.g. the scheduler... if it
 held in-memory state on a million hypervisors at once, because I
 don't really want to imagine it pulling a million rows from a DB on
 every action. But then, we'd be looking at a whole 54ms. I think we
 could survive, if we did that (which we don't).

 So - please, no six.iteritems().


 The reason why in e.g. neutron we merged the patch using six.iteritems
 is that we don't want to go too deep into determining whether the
 original usage of iteritems() was justified.

It's not.

 The goal of the patch is
 to get python3 support, not to apply subjective style guidelines, so
 if someone wants to eliminate .iteritems(), he should create another
 patch just for that and struggle with reviewing it. While folks
 interested in python3 can proceed with their work.

 We should not be afraid of multiple patches.

We shouldn't be indeed. All I'm asking is that we don't do poor
intermediate patches.

I've written code where performance tuning like that around iteritems
mattered. That code also needed to optimise tuple unpacking to avoid
performance hits and was aiming to manipulate million item data sets
from interpreter startup in subsecond times. It was some of the worst,
most impenetrable Python code I've ever seen, and while our code has
lots of issues, it neither has the same performance context that that
did, nor (thankfully) is it such impenetrable code.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-10 Thread Ihar Hrachyshka

On 06/10/2015 02:15 AM, Robert Collins wrote:
 I'm very glad folk are working on Python3 ports.
 
 I'd like to call attention to one little wart in that process: I
 get the feeling that folk are applying a massive regex to find
 things like d.iteritems() and convert that to six.iteritems(d).
 
 I'd very much prefer that such a regex approach move things to 
 d.items(), which is much easier to read.
 
 Here's why. Firstly, very very very few of our dict iterations are 
 going to be performance sensitive in the way that iteritems()
 matters. Secondly, no really - unless you're doing HUGE dicts, it
 doesn't matter. Thirdly. Really, it doesn't.
 

Does it hurt though? ;)

 At 1 million items the overhead is 54ms[1]. If we're doing inner
 loops on million item dictionaries anywhere in OpenStack today, we
 have a problem. We might want to in e.g. the scheduler... if it
 held in-memory state on a million hypervisors at once, because I
 don't really want to imagine it pulling a million rows from a DB on
 every action. But then, we'd be looking at a whole 54ms. I think we
 could survive, if we did that (which we don't).
 
 So - please, no six.iteritems().
 

The reason why in e.g. neutron we merged the patch using six.iteritems
is that we don't want to go too deep into determining whether the
original usage of iteritems() was justified. The goal of the patch is
to get python3 support, not to apply subjective style guidelines, so
if someone wants to eliminate .iteritems(), he should create another
patch just for that and struggle with reviewing it. While folks
interested python3 can proceed with their work.

We should not be afraid of multiple patches.

Ihar



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-10 Thread Robert Collins
On 10 June 2015 at 17:22, gordon chung g...@live.ca wrote:
 maybe the suggestion should be "don't blindly apply six.iteritems or items"
 rather than "don't apply iteritems at all". admittedly, it's a massive eyesore,
 but it's a very real use case that some projects deal with large data results
 and to enforce the latter policy can have negative effects[1].  one million
 item dictionary might be negligible but in a multi-user, multi-* environment
 that can have a significant impact on the amount of memory required to store
 everything.

 [1] disclaimer: i have no real world results but i assume memory management 
 was the reason for the switch in logic from py2 to py3

I wouldn't make that assumption.

And no, memory isn't an issue. If you have a million item dict,
ignoring the internal overheads, the dict needs 1 million object
pointers. The size of a list with those pointers in it is 1M * (pointer
size in bytes), e.g. 4MB or 8MB. Nothing to worry about given the
footprint of such a program :)
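
As a quick back-of-envelope sketch of that arithmetic (assuming 64-bit
CPython, i.e. 8-byte pointers):

entries = 1000000
pointer_size = 8                  # bytes per object pointer on 64-bit
print(entries * pointer_size)     # 8000000 bytes, i.e. ~8M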

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-10 Thread gordon chung



 Date: Wed, 10 Jun 2015 21:33:44 +1200
 From: robe...@robertcollins.net
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [all][python3] use of six.iteritems()
 
 On 10 June 2015 at 17:22, gordon chung g...@live.ca wrote:
  maybe the suggestion should be "don't blindly apply six.iteritems or items"
  rather than "don't apply iteritems at all". admittedly, it's a massive
  eyesore, but it's a very real use case that some projects deal with large
  data results and to enforce the latter policy can have negative effects[1].
  one million item dictionary might be negligible but in a multi-user,
  multi-* environment that can have a significant impact on the amount of
  memory required to store everything.
 
  [1] disclaimer: i have no real world results but i assume memory management 
  was the reason for the switch in logic from py2 to py3
 
 I wouldn't make that assumption.
 
 And no, memory isn't an issue. If you have a million item dict,
 ignoring the internal overheads, the dict needs 1 million object
 pointers. The size of a list with those pointers in it is 1M * (pointer
 size in bytes), e.g. 4MB or 8MB. Nothing to worry about given the
 footprint of such a program :)
iiuc, items() (in py2) will create a copy of the dictionary's items (a full
list) in memory to be processed. this is useful for cases such as concurrency
where you want to ensure consistency, but doing a quick test i noticed a
massive spike in memory usage with items() compared to iteritems().
'for i in dict(enumerate(range(1000000))).items(): pass' consumes significantly
more memory than 'for i in dict(enumerate(range(1000000))).iteritems(): pass'.
on my system, memory consumption roughly doubled when using items() vs
iteritems(), and cpu util was significantly higher as well... let me know if
there's anything that stands out as inaccurate.
unless there's something wrong with my ignorant testing above, i think it's
something projects should consider when mass applying any iteritems/items patch.
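
one rough way to reproduce that measurement (a sketch only, assuming a
py2 interpreter on Linux, where ru_maxrss is reported in kilobytes; run
each variant in a fresh process):

import resource

d = dict(enumerate(range(1000000)))

before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
for i in d.iteritems():  # swap in d.items() for the other measurement
    pass
after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# growth in peak resident set size attributable to the iteration
print('peak RSS grew by ~%d KB' % (after - before))
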
cheers, gord


Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-10 Thread Robert Collins
On 11 June 2015 at 17:16, Robert Collins robe...@robertcollins.net wrote:

 This test conflates setup and execution. Better like my example,
...

Just had it pointed out to me that I've let my inner asshole out again
- sorry. I'm going to step away from the thread for a bit; my personal
state (daughter just had a routine but painful operation) shouldn't be
taken out on other folk, however indirectly.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-10 Thread Robert Collins
On 11 June 2015 at 15:48, Dolph Mathews dolph.math...@gmail.com wrote:
 tl;dr .iteritems() is faster and more memory efficient than .items() in
 python2


 Using xrange() in python2 instead of range() because it's more memory
 efficient and consistent between python 2 and 3...

 # xrange() + .items()
 python -m timeit -n 20 for\ i\ in\
 dict(enumerate(xrange(100))).items():\ pass
 20 loops, best of 3: 729 msec per loop
 peak memory usage: 203 megabytes

This test conflates setup and execution. Better to do it like my
example, because otherwise you're not testing iteritems vs items,
you're testing dictionary creation time; likewise memory pressure. Your
times are meaningless as they stand.
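
Concretely, the shape I mean keeps dict construction in timeit's setup
argument, so that only the iteration is timed; a sketch using the
stdlib timeit module under py2:

import timeit

setup = "d = dict(enumerate(range(1000000)))"

# setup runs outside the timed statement, so these numbers compare only
# the two iteration strategies
print(min(timeit.repeat("for i in d.items(): pass",
                        setup=setup, repeat=3, number=10)))
print(min(timeit.repeat("for i in d.iteritems(): pass",  # py2 only
                        setup=setup, repeat=3, number=10)))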

To test memory pressure, don't use timeit. Just use the interpreter.
$ python
Python 2.7.8 (default, Oct 20 2014, 15:05:19)
[GCC 4.9.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d = dict(enumerate(range(1000000)))
>>> import os
>>> os.getpid()
28345

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28345 robertc   20   0  127260 104568   4744 S   0.0  0.6   0:00.17 python

>>> i = d.items()

28345 robertc   20   0  206524 183560   4744 S   0.0  1.1   0:00.59 python

183560-104568 = ~80M to hold a reference to all 1 million items, which
indeed is not as efficient as python3. So *IF* we had a million item
dict, and absolutely nothing else around, we should care.
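
The container overhead on its own can be sanity-checked without top; a
sketch (py2; note sys.getsizeof counts only the outer list or iterator,
not the million 2-tuples, which account for most of the ~80M above):

import sys

d = dict(enumerate(range(1000000)))

print(sys.getsizeof(d.items()))      # py2 list of pointers: ~8M
print(sys.getsizeof(d.iteritems()))  # iterator object: well under 1K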

But again - where in OpenStack does this matter the slightest?

No one has disputed that they are different. The assertion that it
matters is what is out of line with our reality.

10000 items:

28399 robertc   20   0   31404   8480   4612 S   0.0  0.1   0:00.01 python
28399 robertc   20   0   32172   9268   4612 S   0.0  0.1   0:00.01 python

9268-8480 = ~0.8M, which is indeed 2 orders of magnitude less. And I'd
STILL challenge anyone to find a place where 10000 items are being
passed around within OpenStack's components without it being a bug
today.

Optimising away under a MB of data when we shouldn't have that many
rows/items/whatever in memory in the first place is just entirely
missing the point of programming in Python.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-10 Thread Dolph Mathews
tl;dr: .iteritems() is faster and more memory efficient than .items() in
python2


Using xrange() in python2 instead of range() because it's more memory
efficient and consistent between python 2 and 3...

# xrange() + .items()
python -m timeit -n 20 for\ i\ in\
dict(enumerate(xrange(1000000))).items():\ pass
20 loops, best of 3: 729 msec per loop
peak memory usage: 203 megabytes

# xrange() + .iteritems()
python -m timeit -n 20 for\ i\ in\
dict(enumerate(xrange(1000000))).iteritems():\ pass
20 loops, best of 3: 644 msec per loop
peak memory usage: 176 megabytes

# python 3
python3 -m timeit -n 20 for\ i\ in\
dict(enumerate(range(1000000))).items():\ pass
20 loops, best of 3: 826 msec per loop
peak memory usage: 198 megabytes


And if you really want to see the results with range() in python2...

# range() + .items()
python -m timeit -n 20 for\ i\ in\
dict(enumerate(range(1000000))).items():\ pass
20 loops, best of 3: 851 msec per loop
peak memory usage: 254 megabytes

# range() + .iteritems()
python -m timeit -n 20 for\ i\ in\
dict(enumerate(range(1000000))).iteritems():\ pass
20 loops, best of 3: 919 msec per loop
peak memory usage: 184 megabytes


To benchmark memory consumption, I used the following on bare metal:

$ valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out \
$COMMAND_FROM_ABOVE
$ cat massif.out | grep mem_heap_B | sort -u

$ python2 --version
Python 2.7.9

$ python3 --version
Python 3.4.3

On Wed, Jun 10, 2015 at 8:36 PM, gordon chung g...@live.ca wrote:

   Date: Wed, 10 Jun 2015 21:33:44 +1200
  From: robe...@robertcollins.net
  To: openstack-dev@lists.openstack.org
  Subject: Re: [openstack-dev] [all][python3] use of six.iteritems()
 
  On 10 June 2015 at 17:22, gordon chung g...@live.ca wrote:
  maybe the suggestion should be "don't blindly apply six.iteritems or
 items" rather than "don't apply iteritems at all". admittedly, it's a massive
 eyesore, but it's a very real use case that some projects deal with large
 data results and to enforce the latter policy can have negative effects[1].
 one million item dictionary might be negligible but in a multi-user,
 multi-* environment that can have a significant impact on the amount of
 memory required to store everything.
 
   [1] disclaimer: i have no real world results but i assume memory
 management was the reason for the switch in logic from py2 to py3
 
  I wouldn't make that assumption.
 
  And no, memory isn't an issue. If you have a million item dict,
  ignoring the internal overheads, the dict needs 1 million object
  pointers. The size of a list with those pointers in it is 1M * (pointer
  size in bytes), e.g. 4MB or 8MB. Nothing to worry about given the
  footprint of such a program :)

 iiuc, items() (in py2) will create a copy of the dictionary in memory to
 be processed. this is useful for cases such as concurrency where you want
 to ensure consistency but doing a quick test i noticed a massive spike in
 memory usage between items() and iteritems.

 'for i in dict(enumerate(range(1000000))).items(): pass' consumes
 significantly more memory than 'for i in
 dict(enumerate(range(1000000))).iteritems(): pass'. on my system, the
 difference in memory consumption was double when using items() vs
 iteritems() and the cpu util was significantly more as well... let me know
 if there's anything that stands out as inaccurate.

 unless there's something wrong with my ignorant testing above, i think
 it's something projects should consider when mass applying any
 iteritems/items patch.

 cheers,
 gord



[openstack-dev] [all][python3] use of six.iteritems()

2015-06-09 Thread Robert Collins
I'm very glad folk are working on Python3 ports.

I'd like to call attention to one little wart in that process: I get
the feeling that folk are applying a massive regex to find things like
d.iteritems() and convert that to six.iteritems(d).

I'd very much prefer that such a regex approach move things to
d.items(), which is much easier to read.

Here's why. Firstly, very very very few of our dict iterations are
going to be performance sensitive in the way that iteritems() matters.
Secondly, no really - unless you're doing HUGE dicts, it doesn't
matter. Thirdly. Really, it doesn't.

At 1 million items the overhead is 54ms[1]. If we're doing inner loops
on million item dictionaries anywhere in OpenStack today, we have a
problem. We might want to in e.g. the scheduler... if it held
in-memory state on a million hypervisors at once, because I don't
really want to imagine it pulling a million rows from a DB on every
action. But then, we'd be looking at a whole 54ms. I think we could
survive, if we did that (which we don't).

So - please, no six.iteritems().

Thanks,
Rob


[1]
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
d.items(): pass'
10 loops, best of 3: 76.6 msec per loop
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
d.iteritems(): pass'
100 loops, best of 3: 22.6 msec per loop
python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
d.items(): pass'
10 loops, best of 3: 18.9 msec per loop
pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
d.items(): pass'
10 loops, best of 3: 65.8 msec per loop
# and out of interest, assuming that that hadn't triggered the JIT
but it had.
pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i
in d.items(): pass'
1000 loops, best of 3: 64.3 msec per loop

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-09 Thread Jay Pipes

On 06/09/2015 08:15 PM, Robert Collins wrote:

I'm very glad folk are working on Python3 ports.

I'd like to call attention to one little wart in that process: I get
the feeling that folk are applying a massive regex to find things like
d.iteritems() and convert that to six.iteritems(d).

I'd very much prefer that such a regex approach move things to
d.items(), which is much easier to read.

Here's why. Firstly, very very very few of our dict iterations are
going to be performance sensitive in the way that iteritems() matters.
Secondly, no really - unless you're doing HUGE dicts, it doesn't
matter. Thirdly. Really, it doesn't.

At 1 million items the overhead is 54ms[1]. If we're doing inner loops
on million item dictionaries anywhere in OpenStack today, we have a
problem. We might want to in e.g. the scheduler... if it held
in-memory state on a million hypervisors at once, because I don't
really want to imagine it pulling a million rows from a DB on every
action. But then, we'd be looking at a whole 54ms. I think we could
survive, if we did that (which we don't).

So - please, no six.iteritems().


+1

-jay



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-09 Thread Carl Baldwin
+1

Don't forget values and keys in addition to items.  They aren't as common
but come up every so often.  I think you can iterate the keys just by
iterating on the dict itself.
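
For instance, a sketch of the equivalents (illustrative only):

import six

d = {'a': 1, 'b': 2}

# keys: iterating the dict itself is already lazy on py2 and py3,
# so no helper is needed
for key in d:
    pass

# values: the same items()-style trade-off applies
for value in six.itervalues(d):  # lazy on both py2 and py3
    pass
for value in d.values():         # list on py2, view on py3
    pass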

Carl
On Jun 9, 2015 6:18 PM, Robert Collins robe...@robertcollins.net wrote:

 I'm very glad folk are working on Python3 ports.

 I'd like to call attention to one little wart in that process: I get
 the feeling that folk are applying a massive regex to find things like
 d.iteritems() and convert that to six.iteritems(d).

 I'd very much prefer that such a regex approach move things to
 d.items(), which is much easier to read.

 Here's why. Firstly, very very very few of our dict iterations are
 going to be performance sensitive in the way that iteritems() matters.
 Secondly, no really - unless you're doing HUGE dicts, it doesn't
 matter. Thirdly. Really, it doesn't.

 At 1 million items the overhead is 54ms[1]. If we're doing inner loops
 on million item dictionaries anywhere in OpenStack today, we have a
 problem. We might want to in e.g. the scheduler... if it held
 in-memory state on a million hypervisors at once, because I don't
 really want to imagine it pulling a million rows from a DB on every
 action. But then, we'd be looking at a whole 54ms. I think we could
 survive, if we did that (which we don't).

 So - please, no six.iteritems().

 Thanks,
 Rob


 [1]
 python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 76.6 msec per loop
 python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.iteritems(): pass'
 100 loops, best of 3: 22.6 msec per loop
 python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 18.9 msec per loop
 pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 65.8 msec per loop
 # and out of interest, assuming that that hadn't triggered the JIT
 but it had.
 pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i
 in d.items(): pass'
 1000 loops, best of 3: 64.3 msec per loop

 --
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-09 Thread Eugene Nikanorov
Huge +1 both for the suggestion and for the reasoning.

It's better to avoid substituting language features with a library.

Eugene.

On Tue, Jun 9, 2015 at 5:15 PM, Robert Collins robe...@robertcollins.net
wrote:

 I'm very glad folk are working on Python3 ports.

 I'd like to call attention to one little wart in that process: I get
 the feeling that folk are applying a massive regex to find things like
 d.iteritems() and convert that to six.iteritems(d).

 I'd very much prefer that such a regex approach move things to
 d.items(), which is much easier to read.

 Here's why. Firstly, very very very few of our dict iterations are
 going to be performance sensitive in the way that iteritems() matters.
 Secondly, no really - unless you're doing HUGE dicts, it doesn't
 matter. Thirdly. Really, it doesn't.

 At 1 million items the overhead is 54ms[1]. If we're doing inner loops
 on million item dictionaries anywhere in OpenStack today, we have a
 problem. We might want to in e.g. the scheduler... if it held
 in-memory state on a million hypervisors at once, because I don't
 really want to imagine it pulling a million rows from a DB on every
 action. But then, we'd be looking at a whole 54ms. I think we could
 survive, if we did that (which we don't).

 So - please, no six.iteritems().

 Thanks,
 Rob


 [1]
 python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 76.6 msec per loop
 python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.iteritems(): pass'
 100 loops, best of 3: 22.6 msec per loop
 python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 18.9 msec per loop
 pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 65.8 msec per loop
 # and out of interest, assuming that that hadn't triggered the JIT
 but it had.
 pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i
 in d.items(): pass'
 1000 loops, best of 3: 64.3 msec per loop

 --
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud



Re: [openstack-dev] [all][python3] use of six.iteritems()

2015-06-09 Thread gordon chung
maybe the suggestion should be "don't blindly apply six.iteritems or items"
rather than "don't apply iteritems at all". admittedly, it's a massive eyesore,
but it's a very real use case that some projects deal with large data results
and to enforce the latter policy can have negative effects[1]. one million
item dictionary might be negligible but in a multi-user, multi-* environment
that can have a significant impact on the amount of memory required to store
everything.

[1] disclaimer: i have no real world results but i assume memory management was 
the reason for the switch in logic from py2 to py3

cheers,
gord



 Date: Wed, 10 Jun 2015 12:15:33 +1200
 From: robe...@robertcollins.net
 To: openstack-dev@lists.openstack.org
 Subject: [openstack-dev] [all][python3] use of six.iteritems()

 I'm very glad folk are working on Python3 ports.

 I'd like to call attention to one little wart in that process: I get
 the feeling that folk are applying a massive regex to find things like
 d.iteritems() and convert that to six.iteritems(d).

 I'd very much prefer that such a regex approach move things to
 d.items(), which is much easier to read.

 Here's why. Firstly, very very very few of our dict iterations are
 going to be performance sensitive in the way that iteritems() matters.
 Secondly, no really - unless you're doing HUGE dicts, it doesn't
 matter. Thirdly. Really, it doesn't.

 At 1 million items the overhead is 54ms[1]. If we're doing inner loops
 on million item dictionaries anywhere in OpenStack today, we have a
 problem. We might want to in e.g. the scheduler... if it held
 in-memory state on a million hypervisors at once, because I don't
 really want to imagine it pulling a million rows from a DB on every
 action. But then, we'd be looking at a whole 54ms. I think we could
 survive, if we did that (which we don't).

 So - please, no six.iteritems().

 Thanks,
 Rob


 [1]
 python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 76.6 msec per loop
 python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.iteritems(): pass'
 100 loops, best of 3: 22.6 msec per loop
 python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 18.9 msec per loop
 pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in
 d.items(): pass'
 10 loops, best of 3: 65.8 msec per loop
 # and out of interest, assuming that that hadn't triggered the JIT
 but it had.
 pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i
 in d.items(): pass'
 1000 loops, best of 3: 64.3 msec per loop

 --
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud
