R. David Murray wrote:
Lie Ryan <lie.1...@gmail.com> wrote:
Matt Nordhoff wrote:
Alan G Isaac wrote:
Hans Larsen wrote:
How could I "take" an element from a set or a frozenset?
On 3/8/2009 2:06 PM Diez B. Roggisch apparently wrote:
You iterate over them. If you only want one value, use
iter(the_set).next()
I recall a claim that

    for result in myset: break

is the most efficient way to get one result.
Is this right? (It seems nearly the same.)

Alan Isaac
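For reference, here is a sketch of the two approaches being compared, in Python 3 spelling (this is my paraphrase, not code from the thread; Python 2's it.next() became it.__next__(), normally called via the next() builtin):

```python
# Two ways to "take" an arbitrary element from a set (Python 3 spelling).
s = {"a", "b", "c"}

# Approach 1: explicit iterator, next(iter(s)) replaces iter(s).next().
first = next(iter(s))

# Approach 2: loop and break; the loop variable keeps the first element
# yielded and survives after the loop.
for result in s:
    break

# Both take *some* element of the set; which one is unspecified.
assert first in s
assert result in s
```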
Checking Python 2.5 on Linux, your solution is much faster, but seeing
as they both come in under a microsecond, it hardly matters.

It's unexpected...

>>> timeit.timeit('res=iter(myset).next()', 'myset=range(1000000)')
0.88944123999999647
>>> timeit.timeit('res=myset.next()', 'myset=range(1000000); myset=iter(myset)')
0.49165520000002516
>>> timeit.timeit('for res in myset: break', 'myset=range(1000000)')
0.32933007999997699

I'd never have expected that for-loop assignment would be even faster than a pre-created iter object (the second test)... but is this leaking of the for-loop variable actually guaranteed behavior?

My guess would be that what's controlling the timing here is
name lookup.  Three in the first example, two in the second,
and one in the third.

You got it:

>>> timeit.timeit('res=myset()', 'myset=range(1000000); myset=iter(myset).next')
0.26465903999999796
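In Python 3 the same bound-method trick reads as follows (a sketch, not from the thread; __next__ replaces Python 2's next method, and the sizes here are chosen so the iterator is never exhausted):

```python
import timeit

# Pre-binding the bound method removes the global lookup of iter() and the
# attribute lookup of .__next__ from the timed statement entirely.
setup = "myset = set(range(100_000)); step = iter(myset).__next__"
t_bound = timeit.timeit("res = step()", setup=setup, number=10_000)

# For comparison: build a fresh iterator on every call.
t_fresh = timeit.timeit("res = next(iter(myset))",
                        setup="myset = set(range(100_000))", number=10_000)

print(t_bound, t_fresh)  # t_bound is typically the smaller of the two
```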


----------------------

The following is a complete benchmark:

>>> timeit.timeit('res=iter(myset).next()', 'myset=range(10000000)', number=10000000)
8.5145002000000432

>>> timeit.timeit('res=myset.next()', 'myset=range(10000000); myset=iter(myset)', number=10000000)
4.5509802800000898

>>> timeit.timeit('for res in myset: break', 'myset=range(10000000)', number=10000000)
2.9994213600000421

>>> timeit.timeit('res=myset()', 'myset=range(10000000); myset=iter(myset).next', number=10000000)
2.2228832400001011

----------------------
I also ran additional timings to measure the overheads:

Local name lookup:
>>> timeit.timeit('myset', 'myset=range(1000000)', number=10000000)
1.1086400799999865

Global name lookup:
>>> timeit.timeit('iter', number=10000000)
1.8149410799999259

Attribute lookup:
>>> timeit.timeit('myset.next', 'myset=range(10000000); myset=iter(myset)', number=10000000)
3.3011333999997987

Combined name lookups that burdened the first test:
>>> timeit.timeit('iter(myset).next', 'myset=range(10000000)', number=10000000)
6.5599374800000305

Creating an iterator:
>>> timeit.timeit('iter(myset)', 'myset=range(1000000)', number=10000000)
4.259406719999788

----------------------
So adjusting the overheads:

Attribute lookup:
>>> timeit.timeit('myset.next', 'myset=range(10000000); myset=iter(myset)', number=10000000)
3.3011333999997987
The attribute-lookup timing also includes a local name lookup (myset), so the real attribute-lookup time should be:
3.3011333999997987 - 1.1086400799999865 = 2.1924933199998122

Creating an iterator:
>>> timeit.timeit('iter(myset)', 'myset=range(1000000)', number=10000000)
4.259406719999788
Creating an iterator involves a global name lookup (iter), so the real time should be:
4.259406719999788 - 1.8149410799999259 = 2.4444656399998621

----------------------
To summarize the adjusted overheads:

Local name lookup: 1.1086400799999865
Global name lookup: 1.8149410799999259
Attribute lookup: 2.1924933199998122
Creating an iterator: 2.4444656399998621
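The same overhead probes can be reproduced on Python 3 roughly like this (a sketch; absolute numbers will differ from the Python 2.5 figures above, and the attribute probed is __next__ rather than next):

```python
import timeit

N = 1_000_000
setup = "myset = set(range(1000)); it = iter(myset)"

overheads = {
    "local name lookup": timeit.timeit("myset", setup=setup, number=N),
    "global name lookup": timeit.timeit("iter", number=N),
    # Looking the method up without calling it, so the iterator never advances.
    "attribute lookup": timeit.timeit("it.__next__", setup=setup, number=N),
    "creating an iterator": timeit.timeit("iter(myset)", setup=setup, number=N),
}
for name, t in overheads.items():
    print(f"{name}: {t:.4f}s for {N} loops")
```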

----------------------
Back to the problem; now we'll adjust the timing of each snippet:
'res=iter(myset).next()': 8.5145002000000432
Adjusting with the combined name lookups:
8.5145002000000432 - 6.5599374800000305 = 1.9545627200000126
Another way to do the adjustment:
Adjusting for the global name lookup (iter):
8.5145002000000432 - 1.8149410799999259 = 6.6995591200001172
Adjusting for iterator creation:
6.6995591200001172 - 2.4444656399998621 = 4.2550934800002551
Adjusting for the attribute lookup:
4.2550934800002551 - 2.1924933199998122 = 2.0626001600004429

'res=myset.next()': 4.5509802800000898
Adjusting with the unadjusted attribute lookup (which already includes the local name lookup):
4.5509802800000898 - 3.3011333999997987 = 1.2498468800002911
Another way to do the adjustment:
Adjusting with local name lookup:
4.5509802800000898 - 1.1086400799999865 = 3.4423402000001033
Adjusting with attribute lookup:
3.4423402000001033 - 2.1924933199998122 = 1.2498468800002911

'for res in myset: break': 2.9994213600000421
Adjusting for local name lookup (myset):
2.9994213600000421 - 1.1086400799999865 = 1.8907812800000556

'res=myset()': 2.2228832400001011
Adjusting for the local name lookup:
2.2228832400001011 - 1.1086400799999865 = 1.1142431600001146

----------------------

To summarize (where two figures appear, they come from the two adjustment routes above):
'res=iter(myset).next()': 1.9545627200000126 / 2.0626001600004429
'res=myset.next()': 1.2498468800002911 / 1.2498468800002911
'for res in myset: break': 1.8907812800000556
'res=myset()': 1.1142431600001146
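The adjustment arithmetic above can be re-run mechanically (the figures are copied verbatim from the timings in this post; this snippet only repeats the subtractions):

```python
# Raw totals (seconds for 10**7 loops) copied from the benchmark above.
raw = {
    "res=iter(myset).next()":  8.5145002000000432,
    "res=myset.next()":        4.5509802800000898,
    "for res in myset: break": 2.9994213600000421,
    "res=myset()":             2.2228832400001011,
}
local_, global_ = 1.1086400799999865, 1.8149410799999259
attr = 3.3011333999997987 - local_       # attribute lookup, local removed
make_iter = 4.259406719999788 - global_  # iterator creation, global removed

adjusted = {
    "res=iter(myset).next()": raw["res=iter(myset).next()"]
                              - global_ - make_iter - attr,
    "res=myset.next()": raw["res=myset.next()"] - local_ - attr,
    "for res in myset: break": raw["for res in myset: break"] - local_,
    "res=myset()": raw["res=myset()"] - local_,
}
for stmt, t in adjusted.items():
    print(f"{stmt}: {t:.4f}")
```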

----------------------

To conclude, 'for res in myset: break' is not actually much faster than 'res=iter(myset).next()'; the former simply avoids a lot of name lookups. The problem with 'res=iter(myset).next()' is that it performs too many name lookups and creates an iterator object on every call.

The fastest method is 'res=myset()', which eliminates the name lookups entirely; after the overheads are subtracted it is still the fastest, nearly twice as fast as the for-loop and iter() variants, though only slightly ahead of the pre-created iterator's .next().
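As a practical aside (mine, not from the thread): in Python 3 the one-liner is next(iter(s)), and passing a default to next() avoids the StopIteration that an empty set would otherwise raise:

```python
s = {10, 20, 30}
elem = next(iter(s))  # some element; raises StopIteration if s is empty

# Supplying a default makes the empty-set case safe.
safe = next(iter(set()), None)

assert elem in s
assert safe is None
```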

DISCLAIMER: I cannot guarantee there aren't any mistakes.

PS: The benchmark results must be taken with a grain of salt. The differences only become apparent over 10000000 (10**7) iterations, which means a one-second difference in the totals is only about 10**-7 seconds (100 ns) per call in reality.