R. David Murray wrote:
Lie Ryan <lie.1...@gmail.com> wrote:
Matt Nordhoff wrote:
Alan G Isaac wrote:
Hans Larsen wrote:
How could I "take" an element from a set or a frozenset?
On 3/8/2009 2:06 PM Diez B. Roggisch apparently wrote:
You iterate over them. If you only want one value, use
iter(the_set).next()
I recall a claim that

    for result in myset: break

is the most efficient way to get one result.
Is this right? (It seems nearly the same.)

Alan Isaac
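For reference, here is a sketch of the two approaches being compared, in Python 3 spelling (this is my paraphrase, not code from the thread; Python 2's it.next() became it.__next__(), normally called via the next() builtin):

```python
# Two ways to "take" an arbitrary element from a set (Python 3 spelling).
s = {"a", "b", "c"}

# Approach 1: explicit iterator, next(iter(s)) replaces iter(s).next().
first = next(iter(s))

# Approach 2: loop and break; the loop variable keeps the first element
# yielded and survives after the loop.
for result in s:
    break

# Both take *some* element of the set; which one is unspecified.
assert first in s
assert result in s
```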
Checking Python 2.5 on Linux, your solution is much faster, but seeing
as they both come in under a microsecond, it hardly matters.

It's unexpected...

>>> timeit.timeit('res=iter(myset).next()', 'myset=range(1000000)')
0.88944123999999647
>>> timeit.timeit('res=myset.next()', 'myset=range(1000000); myset=iter(myset)')
0.49165520000002516
>>> timeit.timeit('for res in myset: break', 'myset=range(1000000)')
0.32933007999997699

I'd never have expected that for-loop assignment would be even faster than a pre-created iter object (the second test)... but is this leaking of the for-loop variable actually guaranteed behavior?

My guess would be that what's controlling the timing here is
name lookup.  Three in the first example, two in the second,
and one in the third.

You got it:

>>> timeit.timeit('res=myset()', 'myset=range(1000000); myset=iter(myset).next')
0.26465903999999796
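In Python 3 the same bound-method trick reads as follows (a sketch, not from the thread; __next__ replaces Python 2's next method, and the sizes here are chosen so the iterator is never exhausted):

```python
import timeit

# Pre-binding the bound method removes the global lookup of iter() and the
# attribute lookup of .__next__ from the timed statement entirely.
setup = "myset = set(range(100_000)); step = iter(myset).__next__"
t_bound = timeit.timeit("res = step()", setup=setup, number=10_000)

# For comparison: build a fresh iterator on every call.
t_fresh = timeit.timeit("res = next(iter(myset))",
                        setup="myset = set(range(100_000))", number=10_000)

print(t_bound, t_fresh)  # t_bound is typically the smaller of the two
```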


----------------------

The following is a complete benchmark:

>>> timeit.timeit('res=iter(myset).next()', 'myset=range(10000000)', number=10000000)
8.5145002000000432

>>> timeit.timeit('res=myset.next()', 'myset=range(10000000); myset=iter(myset)', number=10000000)
4.5509802800000898

>>> timeit.timeit('for res in myset: break', 'myset=range(10000000)', number=10000000)
2.9994213600000421

>>> timeit.timeit('res=myset()', 'myset=range(10000000); myset=iter(myset).next', number=10000000)
2.2228832400001011

----------------------
I also ran additional timings to measure the overheads:

Local name lookup:
>>> timeit.timeit('myset', 'myset=range(1000000)', number=10000000)
1.1086400799999865

Global name lookup:
>>> timeit.timeit('iter', number=10000000)
1.8149410799999259

Attribute lookup:
>>> timeit.timeit('myset.next', 'myset=range(10000000); myset=iter(myset)', number=10000000)
3.3011333999997987

Combined name lookups that burdened the first test:
>>> timeit.timeit('iter(myset).next', 'myset=range(10000000)', number=10000000)
6.5599374800000305

Creating an iterator:
>>> timeit.timeit('iter(myset)', 'myset=range(1000000)', number=10000000)
4.259406719999788

----------------------
So adjusting the overheads:

Attribute lookup:
>>> timeit.timeit('myset.next', 'myset=range(10000000); myset=iter(myset)', number=10000000)
3.3011333999997987
The attribute-lookup timing also includes a local name lookup (myset), so the real attribute-lookup time should be:
3.3011333999997987 - 1.1086400799999865 = 2.1924933199998122

Creating an iterator:
>>> timeit.timeit('iter(myset)', 'myset=range(1000000)', number=10000000)
4.259406719999788
Creating an iterator involves a global name lookup (iter), so the real time should be:
4.259406719999788 - 1.8149410799999259 = 2.4444656399998621

----------------------
To summarize the adjusted overheads:

Local name lookup: 1.1086400799999865
Global name lookup: 1.8149410799999259
Attribute lookup: 2.1924933199998122
Creating an iterator: 2.4444656399998621
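The same overhead probes can be reproduced on Python 3 roughly like this (a sketch; absolute numbers will differ from the Python 2.5 figures above, and the attribute probed is __next__ rather than next):

```python
import timeit

N = 1_000_000
setup = "myset = set(range(1000)); it = iter(myset)"

overheads = {
    "local name lookup": timeit.timeit("myset", setup=setup, number=N),
    "global name lookup": timeit.timeit("iter", number=N),
    # Looking the method up without calling it, so the iterator never advances.
    "attribute lookup": timeit.timeit("it.__next__", setup=setup, number=N),
    "creating an iterator": timeit.timeit("iter(myset)", setup=setup, number=N),
}
for name, t in overheads.items():
    print(f"{name}: {t:.4f}s for {N} loops")
```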

----------------------
Back to the problem; now we'll adjust the timing of each snippet:
'res=iter(myset).next()': 8.5145002000000432
Adjusting with the combined name lookups:
8.5145002000000432 - 6.5599374800000305 = 1.9545627200000126
Another way to do the adjustment:
Adjusting for the global name lookup (iter):
8.5145002000000432 - 1.8149410799999259 = 6.6995591200001172
Adjusting for iterator creation:
6.6995591200001172 - 2.4444656399998621 = 4.2550934800002551
Adjusting for the attribute lookup:
4.2550934800002551 - 2.1924933199998122 = 2.0626001600004429

'res=myset.next()': 4.5509802800000898
Adjusting with the unadjusted attribute lookup (which already includes the local name lookup):
4.5509802800000898 - 3.3011333999997987 = 1.2498468800002911
Another way to do the adjustment:
Adjusting with local name lookup:
4.5509802800000898 - 1.1086400799999865 = 3.4423402000001033
Adjusting with attribute lookup:
3.4423402000001033 - 2.1924933199998122 = 1.2498468800002911

'for res in myset: break': 2.9994213600000421
Adjusting for local name lookup (myset):
2.9994213600000421 - 1.1086400799999865 = 1.8907812800000556

'res=myset()': 2.2228832400001011
Adjusting for the local name lookup:
2.2228832400001011 - 1.1086400799999865 = 1.1142431600001146

----------------------

To summarize (where two figures appear, they come from the two adjustment routes above):
'res=iter(myset).next()': 1.9545627200000126 / 2.0626001600004429
'res=myset.next()': 1.2498468800002911 / 1.2498468800002911
'for res in myset: break': 1.8907812800000556
'res=myset()': 1.1142431600001146
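The adjustment arithmetic above can be re-run mechanically (the figures are copied verbatim from the timings in this post; this snippet only repeats the subtractions):

```python
# Raw totals (seconds for 10**7 loops) copied from the benchmark above.
raw = {
    "res=iter(myset).next()":  8.5145002000000432,
    "res=myset.next()":        4.5509802800000898,
    "for res in myset: break": 2.9994213600000421,
    "res=myset()":             2.2228832400001011,
}
local_, global_ = 1.1086400799999865, 1.8149410799999259
attr = 3.3011333999997987 - local_       # attribute lookup, local removed
make_iter = 4.259406719999788 - global_  # iterator creation, global removed

adjusted = {
    "res=iter(myset).next()": raw["res=iter(myset).next()"]
                              - global_ - make_iter - attr,
    "res=myset.next()": raw["res=myset.next()"] - local_ - attr,
    "for res in myset: break": raw["for res in myset: break"] - local_,
    "res=myset()": raw["res=myset()"] - local_,
}
for stmt, t in adjusted.items():
    print(f"{stmt}: {t:.4f}")
```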

----------------------

To conclude, 'for res in myset: break' is not actually much faster than 'res=iter(myset).next()'; the former simply avoids a lot of name lookups. The problem with 'res=iter(myset).next()' is that it performs too many name lookups and creates an iterator object on every call.

The fastest method is 'res=myset()', which eliminates the name lookups entirely; after the overheads are subtracted it is still the fastest, nearly twice as fast as the for-loop and iter() variants, though only slightly ahead of the pre-created iterator's .next().
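As a practical aside (mine, not from the thread): in Python 3 the one-liner is next(iter(s)), and passing a default to next() avoids the StopIteration that an empty set would otherwise raise:

```python
s = {10, 20, 30}
elem = next(iter(s))  # some element; raises StopIteration if s is empty

# Supplying a default makes the empty-set case safe.
safe = next(iter(set()), None)

assert elem in s
assert safe is None
```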

DISCLAIMER: I cannot guarantee there aren't any mistakes.

PS: The benchmark results must be taken with a grain of salt. The differences only become apparent over 10000000 (10**7) iterations, which means a one-second difference in the totals is only about 10**-7 seconds (100 ns) per call in reality.