[issue2672] speed of set.update(

2008-04-24 Thread John Arbash Meinel

John Arbash Meinel [EMAIL PROTECTED] added the comment:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Alexander Belopolsky wrote:
 Alexander Belopolsky [EMAIL PROTECTED] added the comment:
 
 This has nothing to do with set.update, the difference is due to the
 time to setup the generator:
 
 $ python -m timeit -s 'x = set(range(1)); y = []' 'x.update(y)'
 100 loops, best of 3: 0.38 usec per loop
 $ python -m timeit -s 'x = set(range(1)); y = (i for i in [])'
 'x.update(y)'
 100 loops, best of 3: 0.335 usec per loop
 
 --
 nosy: +belopolsky
 
 __
 Tracker [EMAIL PROTECTED]
 http://bugs.python.org/issue2672
 __
 

That is true, though if I just force a generator overhead:

% python -m timeit -s 'x = set(range(1)); y = []' 'x.update(y)'
100 loops, best of 3: 0.204 usec per loop
% python -m timeit -s 'x = set(range(1)); y = (i for i in [])'
'x.update(y)'
1000 loops, best of 3: 0.173 usec per loop
% python -m timeit -s 'x = set(range(1)); l = []' 'x.update(i for i
in l)'
100 loops, best of 3: 0.662 usec per loop
 python -m timeit -s 'x = set(range(1)); l = []; y = (i for i in l)'
'(i for i in l); x.update(y)'
100 loops, best of 3: 1.87 usec per loop

So if you compare consuming a generator multiple times to creating it
each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator.

So why does: (i for i in l); x.update(y) take an additional 1.208 usec.

(I'm certainly willing to believe that set.update() is generator/list
agnostic, but something weird is still happening.)

John
=:-

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIENAoJdeBCYSNAAMRAk2yAJ4okAalR6zWD0/E5XHei/ckce+L7QCgstEQ
l+6+bl7oAJMhdJ70viqicnQ=
=pLX6
-END PGP SIGNATURE-

--
title: speed of set.update([]) - speed of set.update(

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2672
__
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2672] speed of set.update(

2008-04-24 Thread Alexander Belopolsky

Alexander Belopolsky [EMAIL PROTECTED] added the comment:

On Thu, Apr 24, 2008 at 2:23 PM, John Arbash Meinel
[EMAIL PROTECTED] wrote:
..
  So if you compare consuming a generator multiple times to creating it
  each time, it is 0.662 usec - 0.173 usec = 0.489 usec to create a generator.

  So why does: (i for i in l); x.update(y) take an additional 1.208 usec.

  (I'm certainly willing to believe that set.update() is generator/list
  agnostic, but something weird is still happening.)

I've seen a similar strangeness in timings:

$ python -m timeit '(i for i in [])'
10 loops, best of 3: 4.16 usec per loop

but

$ python -m timeit -s 'x = set()' 'x.update(i for i in [])'
100 loops, best of 3: 1.31 usec per loop

on the other hand,

$ python -m timeit -s 'x = []' 'x.extend(i for i in [])'
10 loops, best of 3: 4.54 usec per loop

How can x.update(i for i in []) take *less* time than simply creating a genexp?

Note that there is no apparent bytecode tricks here:

  1   0 LOAD_CONST   0 (code object genexpr at
0xf7e88920, file stdin, line 1)
  3 MAKE_FUNCTION0
  6 BUILD_LIST   0
  9 GET_ITER
 10 CALL_FUNCTION1
 13 RETURN_VALUE
 dis(lambda:x.update(i for i in []))
  1   0 LOAD_GLOBAL  0 (x)
  3 LOAD_ATTR1 (update)
  6 LOAD_CONST   0 (code object genexpr at
0xf7e88920, file stdin, line 1)
  9 MAKE_FUNCTION0
 12 BUILD_LIST   0
 15 GET_ITER
 16 CALL_FUNCTION1
 19 CALL_FUNCTION1
 22 RETURN_VALUE

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2672
__
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2672] speed of set.update(

2008-04-24 Thread Raymond Hettinger

Raymond Hettinger [EMAIL PROTECTED] added the comment:

John, when y=[], the update method has to create a new list iterator on 
each invocation.  But when y is a genexp, it is self-iterable (iow, iter
(y) will return self, not a new object).

Also, when doing timings, it can be helpful to factor-out the attribute 
lookup:

python -m timeit -s 'x=set(range(1)); y=[]; xu=x.update' 'xu(y)'

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2672
__
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2672] speed of set.update(

2008-04-24 Thread John Arbash Meinel

John Arbash Meinel [EMAIL PROTECTED] added the comment:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Raymond Hettinger wrote:
 Raymond Hettinger [EMAIL PROTECTED] added the comment:
 
 John, when y=[], the update method has to create a new list iterator on 
 each invocation.  But when y is a genexp, it is self-iterable (iow, iter
 (y) will return self, not a new object).
 
 Also, when doing timings, it can be helpful to factor-out the attribute 
 lookup:
 
 python -m timeit -s 'x=set(range(1)); y=[]; xu=x.update' 'xu(y)'
 
 __
 Tracker [EMAIL PROTECTED]
 http://bugs.python.org/issue2672
 __
 

Sure, I wasn't surprised at the set.update(y) versus set.update([])

What I was surprised at is the time for:

(i for i in []) being about 4x longer than
set.update(i for i in [])

Anyway, the original issue is probably closed, whether we want to track
into the generator stuff or not is probably a different issue.

John
=:-

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIEP4EJdeBCYSNAAMRAq+MAKC6tLjEtIBX7YgLNoYEfqjRKB4DzACglXjh
cEVLEP5Hu3vpeVgVYdTbAVc=
=94ja
-END PGP SIGNATURE-

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2672
__
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2672] speed of set.update([])

2008-04-23 Thread Alexander Belopolsky

Alexander Belopolsky [EMAIL PROTECTED] added the comment:

This has nothing to do with set.update, the difference is due to the
time to setup the generator:

$ python -m timeit -s 'x = set(range(1)); y = []' 'x.update(y)'
100 loops, best of 3: 0.38 usec per loop
$ python -m timeit -s 'x = set(range(1)); y = (i for i in [])'
'x.update(y)'
100 loops, best of 3: 0.335 usec per loop

--
nosy: +belopolsky

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2672
__
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2672] speed of set.update([])

2008-04-23 Thread Raymond Hettinger

Raymond Hettinger [EMAIL PROTECTED] added the comment:

I concur.  The source code for set_update() in Objects/setobject.c 
shows that both versions are accessed through the iterator protocol, so 
what you're seeing are the basic performance differences (start-up and 
per-item costs) for genexps vs list iterators.

--
resolution:  - invalid
status: open - closed

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2672
__
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2672] speed of set.update([])

2008-04-22 Thread John Arbash Meinel

New submission from John Arbash Meinel [EMAIL PROTECTED]:

I was performance profiling some of my own code, and I ran into
something unexpected. Specifically,
set.update(empty_generator_expression) was significantly slower than
set.update(empty_list_expression).

I double checked my findings with timeit:

With python 2.4.3:

$ python -m timeit -s 'x = set(range(1))' 'x.update([])'
100 loops, best of 3: 0.296 usec per loop
$ python -m timeit -s 'x = set(range(1))' 'x.update(y for y in [])'
100 loops, best of 3: 0.837 usec per loop
$ python -m timeit -s 'x = set(range(1))' 'x.update([y for y in []])'
100 loops, best of 3: 0.462 usec per loop

With 2.5.1 (on a different machine)
$ python -m timeit -s 'x = set(range(1))' 'x.update([])'
100 loops, best of 3: 0.265 usec per loop
$ python -m timeit -s 'x = set(range(1))' 'x.update(y for y in [])'
100 loops, best of 3: 0.717 usec per loop
$ python -m timeit -s 'x = set(range(1))' 'x.update([y for y in []])'
100 loops, best of 3: 0.39 usec per loop

So generally, it is about 2x faster to create the empty list expression
and pass it in than to use an empty generator.

--
components: Interpreter Core
messages: 65694
nosy: jameinel
severity: normal
status: open
title: speed of set.update([])
versions: Python 2.4, Python 2.5

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2672
__
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2672] speed of set.update([])

2008-04-22 Thread John Arbash Meinel

Changes by John Arbash Meinel [EMAIL PROTECTED]:


--
type:  - performance

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2672
__
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2672] speed of set.update([])

2008-04-22 Thread Raymond Hettinger

Changes by Raymond Hettinger [EMAIL PROTECTED]:


--
assignee:  - rhettinger
nosy: +rhettinger

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2672
__
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com