Re: Performance of list vs. set equality operations

2010-04-10 Thread Stefan Behnel

Steven D'Aprano, 08.04.2010 03:41:

On Wed, 07 Apr 2010 10:55:10 -0700, Raymond Hettinger wrote:


[Gustavo Nare]

In other words: The more different elements two collections have, the
faster it is to compare them as sets. And as a consequence, the more
equivalent elements two collections have, the faster it is to compare
them as lists.

Is this correct?


If two collections are equal, then comparing them as a set is always
slower than comparing them as a list.  Both have to call __eq__ for
every element, but sets have to search for each element while lists can
just iterate over consecutive pointers.

If the two collections have unequal sizes, then both ways immediately
return unequal.



Perhaps I'm misinterpreting what you are saying, but I can't confirm that
behaviour, at least not for subclasses of list:

 class MyList(list):
... def __len__(self):
... return self.n
...
 L1 = MyList(range(10))
 L2 = MyList(range(10))
 L1.n = 9
 L2.n = 10
 L1 == L2
True
 len(L1) == len(L2)
False


This code incorrectly assumes that overriding __len__ has an impact on the 
equality of two lists. If you want to influence the equality, you need to 
override __eq__. If you don't, the original implementation is free to do 
whatever it likes to determine if it is equal to another value or not. If 
it uses __len__ for that or not is only an implementation detail that can't 
be relied upon.


Stefan

--
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-10 Thread Terry Reedy

On 4/10/2010 8:32 AM, Stefan Behnel wrote:

Steven D'Aprano, 08.04.2010 03:41:

On Wed, 07 Apr 2010 10:55:10 -0700, Raymond Hettinger wrote:



If the two collections have unequal sizes, then both ways immediately
return unequal.



Perhaps I'm misinterpreting what you are saying, but I can't confirm that
behaviour, at least not for subclasses of list:

 class MyList(list):
... def __len__(self):
... return self.n
...
 L1 = MyList(range(10))
 L2 = MyList(range(10))
 L1.n = 9
 L2.n = 10
 L1 == L2
True
 len(L1) == len(L2)
False


This code incorrectly assumes that overriding __len__ has an impact on
the equality of two lists. If you want to influence the equality, you
need to override __eq__. If you don't, the original implementation is
free to do whatever it likes to determine if it is equal to another
value or not. If it uses __len__ for that or not is only an
implementation detail that can't be relied upon.


After reading the responses of both you and Raymond, I realized that a) 
there is a real difference between 'checking lengths' and 'calling 
__len__', which I (and apparently the example) had seen as the same and 
b) that the example shows that assuming that they are the same is a 
mistake. Thank you both for the clarification.


Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-09 Thread Gabriel Genellina
En Thu, 08 Apr 2010 21:02:23 -0300, Patrick Maupin pmau...@gmail.com  
escribió:

On Apr 8, 6:35 pm, Gabriel Genellina gagsl-...@yahoo.com.ar wrote:

The CPython source contains lots of shortcuts like that. Perhaps the  
checks should be stricter in some cases, but I imagine it's not so easy  
to fix: lots of code was written in the pre-2.2 era, assuming that  
internal types were not subclassable.


I don't know if it's a good fix anyway.  If you subclass an internal
type, you can certainly supply your own rich comparison methods, which
would (IMO) put the CPU computation burden where it belongs if you
decide to do something goofy like subclass a list and then override
__len__.


We're all consenting adults, that's the Python philosophy, isn't it?
If I decide to make stupid things, it's my fault. I don't see why Python  
should have to prevent that.


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-09 Thread Patrick Maupin
On Apr 9, 1:07 pm, Gabriel Genellina gagsl-...@yahoo.com.ar wrote:
 En Thu, 08 Apr 2010 21:02:23 -0300, Patrick Maupin pmau...@gmail.com  
 escribió:

  On Apr 8, 6:35 pm, Gabriel Genellina gagsl-...@yahoo.com.ar wrote:

  The CPython source contains lots of shortcuts like that. Perhaps the  
  checks should be stricter in some cases, but I imagine it's not so easy  
  to fix: lots of code was written in the pre-2.2 era, assuming that  
  internal types were not subclassable.

  I don't know if it's a good fix anyway.  If you subclass an internal
  type, you can certainly supply your own rich comparison methods, which
  would (IMO) put the CPU computation burden where it belongs if you
  decide to do something goofy like subclass a list and then override
  __len__.

 We're all consenting adults, that's the Python philosophy, isn't it?
 If I decide to make stupid things, it's my fault. I don't see why Python  
 should have to prevent that.

 --
 Gabriel Genellina

Exactly.  I think we're in violent agreement on this issue ;-)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-09 Thread Raymond Hettinger
  I don't know if it's a good fix anyway.  If you subclass an internal
  type, you can certainly supply your own rich comparison methods, which
  would (IMO) put the CPU computation burden where it belongs if you
  decide to do something goofy like subclass a list and then override
  __len__.

 We're all consenting adults, that's the Python philosophy, isn't it?
 If I decide to make stupid things, it's my fault. I don't see why Python  
 should have to prevent that.

Perhaps so for pure python classes, but the C builtins are another
story.

The C containers directly reference underlying structure and methods
for several reasons.  The foremost reason is that if their internal
invariants are violated, they can segfault.  A list's __getitem__
method needs to know the real length (not what you report in __len__)
if it is to avoid writing objects outside of its allocated memory
range.  Another reason is efficiency -- the cost of attribute lookups
is high and would spoil the performance of the builtins if they could
not access their underlying structure and friend methods directly.
It is important to have those perform well because they are used
heavily
in everyday programming.

There are also couple of OOP design considerations.  The
http://en.wikipedia.org/wiki/Open/closed_principle is one example.

Encapsulation is another example.  If you override __len__
in order to influence the behavior of __eq__, then you're
relying on an implementation detail, not the published interface.
Eventhough the length check is an obvious optimization
for list equality and set equality, there is no guarantee
that other implementations of Python use that same pattern.

my-two-cents-ly yours,

Raymond


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-08 Thread Steven D'Aprano
On Wed, 07 Apr 2010 20:14:23 -0700, Raymond Hettinger wrote:

 [Raymond Hettinger]
  If the two collections have unequal sizes, then both ways immediately
  return unequal.
 
 [Steven D'Aprano]
 Perhaps I'm misinterpreting what you are saying, but I can't confirm
 that behaviour, at least not for subclasses of list:
 
 For doubters, see list_richcompare() in
 http://svn.python.org/view/python/trunk/Objects/listobject.c?
revision=78522view=markup

So what happens in my example with a subclass that (falsely) reports a 
different length even when the lists are the same?

I can guess that perhaps Py_SIZE does not call the subclass __len__ 
method, and therefore is not fooled by it lying. Is that the case?


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-08 Thread Terry Reedy

On 4/8/2010 3:07 AM, Steven D'Aprano wrote:

On Wed, 07 Apr 2010 20:14:23 -0700, Raymond Hettinger wrote:


[Raymond Hettinger]

If the two collections have unequal sizes, then both ways immediately
return unequal.


[Steven D'Aprano]

Perhaps I'm misinterpreting what you are saying, but I can't confirm
that behaviour, at least not for subclasses of list:


For doubters, see list_richcompare() in
http://svn.python.org/view/python/trunk/Objects/listobject.c?

revision=78522view=markup

So what happens in my example with a subclass that (falsely) reports a
different length even when the lists are the same?

I can guess that perhaps Py_SIZE does not call the subclass __len__
method, and therefore is not fooled by it lying. Is that the case?


Adding a print call within __len__ should determine that.



--
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-08 Thread Raymond Hettinger
[Steven D'Aprano]
 So what happens in my example with a subclass that (falsely) reports a
 different length even when the lists are the same?

 I can guess that perhaps Py_SIZE does not call the subclass __len__
 method, and therefore is not fooled by it lying. Is that the case?

Yes.  Py_SIZE() gets the actual size of the underlying list.

The methods for most builtin containers typically access the
underlying structure directly.  That makes them fast and allows
them to maintain their internal invariants.


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-08 Thread Gabriel Genellina
En Thu, 08 Apr 2010 04:07:53 -0300, Steven D'Aprano  
ste...@remove.this.cybersource.com.au escribió:

On Wed, 07 Apr 2010 20:14:23 -0700, Raymond Hettinger wrote:

[Raymond Hettinger]



 If the two collections have unequal sizes, then both ways immediately
 return unequal.


[Steven D'Aprano]

Perhaps I'm misinterpreting what you are saying, but I can't confirm
that behaviour, at least not for subclasses of list:


For doubters, see list_richcompare() in
http://svn.python.org/view/python/trunk/Objects/listobject.c?

revision=78522view=markup

So what happens in my example with a subclass that (falsely) reports a
different length even when the lists are the same?

I can guess that perhaps Py_SIZE does not call the subclass __len__
method, and therefore is not fooled by it lying. Is that the case?


Yes. Py_SIZE is a generic macro, it returns the ob_size field from the  
object structure. No method is called at all.


Another example: the print statement bypasses the sys.stdout.write()  
method and calls directly fwrite() at the C level when it determines that  
sys.stdout is a `file` instance. Even if it's a subclass of file, so  
overriding write() in Python code does not work.


The CPython source contains lots of shortcuts like that. Perhaps the  
checks should be stricter in some cases, but I imagine it's not so easy to  
fix: lots of code was written in the pre-2.2 era, assuming that internal  
types were not subclassable.


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-08 Thread Patrick Maupin
On Apr 8, 6:35 pm, Gabriel Genellina gagsl-...@yahoo.com.ar wrote:

 The CPython source contains lots of shortcuts like that. Perhaps the  
 checks should be stricter in some cases, but I imagine it's not so easy to  
 fix: lots of code was written in the pre-2.2 era, assuming that internal  
 types were not subclassable.

I don't know if it's a good fix anyway.  If you subclass an internal
type, you can certainly supply your own rich comparison methods, which
would (IMO) put the CPU computation burden where it belongs if you
decide to do something goofy like subclass a list and then override
__len__.

Regards,
Pat
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-07 Thread Raymond Hettinger
[Gustavo Nare]
 In other words: The more different elements two collections have, the
 faster it is to compare them as sets. And as a consequence, the more
 equivalent elements two collections have, the faster it is to compare
 them as lists.

 Is this correct?

If two collections are equal, then comparing them as a set is always
slower than comparing them as a list.  Both have to call __eq__ for
every element, but sets have to search for each element while lists
can just iterate over consecutive pointers.

If the two collections have unequal sizes, then both ways immediately
return unequal.

If the two collections are unequal but have the same size, then
the comparison time is data dependent (when the first mismatch
is found).


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-07 Thread Steven D'Aprano
On Wed, 07 Apr 2010 10:55:10 -0700, Raymond Hettinger wrote:

 [Gustavo Nare]
 In other words: The more different elements two collections have, the
 faster it is to compare them as sets. And as a consequence, the more
 equivalent elements two collections have, the faster it is to compare
 them as lists.

 Is this correct?
 
 If two collections are equal, then comparing them as a set is always
 slower than comparing them as a list.  Both have to call __eq__ for
 every element, but sets have to search for each element while lists can
 just iterate over consecutive pointers.
 
 If the two collections have unequal sizes, then both ways immediately
 return unequal.


Perhaps I'm misinterpreting what you are saying, but I can't confirm that 
behaviour, at least not for subclasses of list:

 class MyList(list):
... def __len__(self):
... return self.n
...
 L1 = MyList(range(10))
 L2 = MyList(range(10))
 L1.n = 9
 L2.n = 10
 L1 == L2
True
 len(L1) == len(L2)
False




-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-07 Thread Patrick Maupin
On Apr 7, 8:41 pm, Steven D'Aprano
ste...@remove.this.cybersource.com.au wrote:
 On Wed, 07 Apr 2010 10:55:10 -0700, Raymond Hettinger wrote:
  [Gustavo Nare]
  In other words: The more different elements two collections have, the
  faster it is to compare them as sets. And as a consequence, the more
  equivalent elements two collections have, the faster it is to compare
  them as lists.

  Is this correct?

  If two collections are equal, then comparing them as a set is always
  slower than comparing them as a list.  Both have to call __eq__ for
  every element, but sets have to search for each element while lists can
  just iterate over consecutive pointers.

  If the two collections have unequal sizes, then both ways immediately
  return unequal.

 Perhaps I'm misinterpreting what you are saying, but I can't confirm that
 behaviour, at least not for subclasses of list:

  class MyList(list):

 ...     def __len__(self):
 ...             return self.n
 ... L1 = MyList(range(10))
  L2 = MyList(range(10))
  L1.n = 9
  L2.n = 10
  L1 == L2
 True
  len(L1) == len(L2)

 False

 --
 Steven

I think what he is saying is that the list __eq__ method will look at
the list lengths first.  This may or may not be considered a subtle
bug for the edge case you are showing.

If I do the following:

 L1 = range(1000)
 L2 = range(1000)
 L3 = range(1001)
 L1 == L2
True
 L1 == L3
False


I don't even need to run timeit -- the True takes awhile to print
out, while the False prints out immediately.

Regards,
Pat
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-07 Thread Raymond Hettinger
[Raymond Hettinger]
  If the two collections have unequal sizes, then both ways immediately
  return unequal.

[Steven D'Aprano]
 Perhaps I'm misinterpreting what you are saying, but I can't confirm that
 behaviour, at least not for subclasses of list:

For doubters, see list_richcompare() in
http://svn.python.org/view/python/trunk/Objects/listobject.c?revision=78522view=markup

if (Py_SIZE(vl) != Py_SIZE(wl)  (op == Py_EQ || op == Py_NE)) {
/* Shortcut: if the lengths differ, the lists differ */
PyObject *res;
if (op == Py_EQ)
res = Py_False;
else
res = Py_True;
Py_INCREF(res);
return res;
}

And see set_richcompare() in
http://svn.python.org/view/python/trunk/Objects/setobject.c?revision=78886view=markup

case Py_EQ:
if (PySet_GET_SIZE(v) != PySet_GET_SIZE(w))
Py_RETURN_FALSE;
if (v-hash != -1  
((PySetObject *)w)-hash != -1 
v-hash != ((PySetObject *)w)-hash)
Py_RETURN_FALSE;
return set_issubset(v, w);


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Performance of list vs. set equality operations

2010-04-06 Thread Gustavo Narea
Hello!

Could you please confirm whether my understanding of equality
operations in sets and lists is correct? This is how I think things
work, partially based on experimentation and the online documentation
for Python:

When you compare two lists, *every* element of one of the lists is
compared against the element at the same position in the other list;
that comparison is done by the __eq__() method (or the equivalent for
builtin types). This is interrupted when a result is False.

When you compare two sets, there's a loop over all the elements of the
first set, where the hash for that element is looked up in the second
set:
 - If this hash matches the hash for one or more elements in the
second set, the element in the first set is compared (with __eq__ or
equivalent) against the elements in the second set which have the same
hash. When a result is True, nothing else is done on that element and
the loop takes the next element in the first set; when all the results
are False, the loop ends and the two sets are not equivalent.
 - If the hash doesn't match that of an element in the second set,
then the loop ends and the two sets are not equivalent.

So this means that:
 1.- When you have two collections which have the same elements, the
equality operation will *always* be faster with lists.
 2.- When you have two collections with different elements, the
equality operation *may* be faster with sets.

For example, if you have two collections of 1,000 elements each and
998 of them are equivalent, comparing both collections as sets will be
slower than doing it with lists. But if you have two collections of
1,000 elements each and 998 of them are not equivalent, then comparing
both collections as lists will be slower than doing it with sets.

The performance of equality operations on sets is directly
proportional to the amount of different elements in both sets, while
the performance of equality operations on lists is simply proportional
to the cardinality of the collection.

In other words: The more different elements two collections have, the
faster it is to compare them as sets. And as a consequence, the more
equivalent elements two collections have, the faster it is to compare
them as lists.

Is this correct?

This is why so many people advocate the use of sets instead of lists/
tuples in similar situations, right?

Cheers,

 - Gustavo.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-06 Thread Chris Colbert
the proof is in the pudding:

In [1]: a = range(1)

In [2]: s = set(a)

In [3]: s2 = set(a)

In [5]: b = range(1)

In [6]: a == b
Out[6]: True

In [7]: s == s2
Out[7]: True

In [8]: %timeit a == b
1000 loops, best of 3: 204 us per loop

In [9]: %timeit s == s2
1 loops, best of 3: 124 us per loop


On Tue, Apr 6, 2010 at 2:11 PM, Gustavo Narea m...@gustavonarea.net wrote:
 Hello!

 Could you please confirm whether my understanding of equality
 operations in sets and lists is correct? This is how I think things
 work, partially based on experimentation and the online documentation
 for Python:

 When you compare two lists, *every* element of one of the lists is
 compared against the element at the same position in the other list;
 that comparison is done by the __eq__() method (or the equivalent for
 builtin types). This is interrupted when a result is False.

 When you compare two sets, there's a loop over all the elements of the
 first set, where the hash for that element is looked up in the second
 set:
  - If this hash matches the hash for one or more elements in the
 second set, the element in the first set is compared (with __eq__ or
 equivalent) against the elements in the second set which have the same
 hash. When a result is True, nothing else is done on that element and
 the loop takes the next element in the first set; when all the results
 are False, the loop ends and the two sets are not equivalent.
  - If the hash doesn't match that of an element in the second set,
 then the loop ends and the two sets are not equivalent.

 So this means that:
  1.- When you have two collections which have the same elements, the
 equality operation will *always* be faster with lists.
  2.- When you have two collections with different elements, the
 equality operation *may* be faster with sets.

 For example, if you have two collections of 1,000 elements each and
 998 of them are equivalent, comparing both collections as sets will be
 slower than doing it with lists. But if you have two collections of
 1,000 elements each and 998 of them are not equivalent, then comparing
 both collections as lists will be slower than doing it with sets.

 The performance of equality operations on sets is directly
 proportional to the amount of different elements in both sets, while
 the performance of equality operations on lists is simply proportional
 to the cardinality of the collection.

 In other words: The more different elements two collections have, the
 faster it is to compare them as sets. And as a consequence, the more
 equivalent elements two collections have, the faster it is to compare
 them as lists.

 Is this correct?

 This is why so many people advocate the use of sets instead of lists/
 tuples in similar situations, right?

 Cheers,

  - Gustavo.
 --
 http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-06 Thread Gustavo Narea
On Apr 6, 7:28 pm, Chris Colbert sccolb...@gmail.com wrote:
 the proof is in the pudding:

 In [1]: a = range(1)

 In [2]: s = set(a)

 In [3]: s2 = set(a)

 In [5]: b = range(1)

 In [6]: a == b
 Out[6]: True

 In [7]: s == s2
 Out[7]: True

 In [8]: %timeit a == b
 1000 loops, best of 3: 204 us per loop

 In [9]: %timeit s == s2
 1 loops, best of 3: 124 us per loop


I think you meant to set s2 = set(b):
=
In [1]: a = range(1)

In [2]: b = range(1)

In [3]: s1 = set(a)

In [4]: s2 = set(a)

In [5]: s3 = set(b)

In [6]: %timeit a == b
1 loops, best of 3: 191 us per loop

In [7]: %timeit s1 == s2
1 loops, best of 3: 118 us per loop

In [8]: %timeit s1 == s3
1000 loops, best of 3: 325 us per loop
=

Cheers.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-06 Thread Chris Colbert
:slaps forehead:

good catch.

On Tue, Apr 6, 2010 at 5:18 PM, Gustavo Narea m...@gustavonarea.net wrote:
 On Apr 6, 7:28 pm, Chris Colbert sccolb...@gmail.com wrote:
 the proof is in the pudding:

 In [1]: a = range(1)

 In [2]: s = set(a)

 In [3]: s2 = set(a)

 In [5]: b = range(1)

 In [6]: a == b
 Out[6]: True

 In [7]: s == s2
 Out[7]: True

 In [8]: %timeit a == b
 1000 loops, best of 3: 204 us per loop

 In [9]: %timeit s == s2
 1 loops, best of 3: 124 us per loop


 I think you meant to set s2 = set(b):
 =
 In [1]: a = range(1)

 In [2]: b = range(1)

 In [3]: s1 = set(a)

 In [4]: s2 = set(a)

 In [5]: s3 = set(b)

 In [6]: %timeit a == b
 1 loops, best of 3: 191 us per loop

 In [7]: %timeit s1 == s2
 1 loops, best of 3: 118 us per loop

 In [8]: %timeit s1 == s3
 1000 loops, best of 3: 325 us per loop
 =

 Cheers.
 --
 http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-06 Thread Jack Diederich
On Tue, Apr 6, 2010 at 2:11 PM, Gustavo Narea m...@gustavonarea.net wrote:
 Hello!

 Could you please confirm whether my understanding of equality
 operations in sets and lists is correct? This is how I think things
 work, partially based on experimentation and the online documentation
 for Python:

 When you compare two lists, *every* element of one of the lists is
 compared against the element at the same position in the other list;
 that comparison is done by the __eq__() method (or the equivalent for
 builtin types). This is interrupted when a result is False.

 When you compare two sets, there's a loop over all the elements of the
 first set, where the hash for that element is looked up in the second
 set:
[snip]
 In other words: The more different elements two collections have, the
 faster it is to compare them as sets. And as a consequence, the more
 equivalent elements two collections have, the faster it is to compare
 them as lists.

 Is this correct?

Yes, but faster isn't the same thing as free.  I still get bitten
occasionally by code that blows away the difference by including a
set-wise assert() in a loop.   Also, lists can have mutable members so
there are times you really /do/ want to compare lists instead of
hashes.

-Jack
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Performance of list vs. set equality operations

2010-04-06 Thread Lie Ryan
On 04/07/10 04:11, Gustavo Narea wrote:
 Hello!
 
 Could you please confirm whether my understanding of equality
 operations in sets and lists is correct? This is how I think things
 work, partially based on experimentation and the online documentation
 for Python:
 
 When you compare two lists, *every* element of one of the lists is
 compared against the element at the same position in the other list;
 that comparison is done by the __eq__() method (or the equivalent for
 builtin types). This is interrupted when a result is False.
 
 When you compare two sets, there's a loop over all the elements of the
 first set, where the hash for that element is looked up in the second
 set:
  - If this hash matches the hash for one or more elements in the
 second set, the element in the first set is compared (with __eq__ or
 equivalent) against the elements in the second set which have the same
 hash. When a result is True, nothing else is done on that element and
 the loop takes the next element in the first set; when all the results
 are False, the loop ends and the two sets are not equivalent.
  - If the hash doesn't match that of an element in the second set,
 then the loop ends and the two sets are not equivalent.

I have not seen python's set implementation, but if you keep a bitmap of
hashes that already exist in a set, you can compare 32 or 64 items (i.e.
the computer's native word-size) at a time instead of comparing items
one-by-one[1]; this could marginally improve set operation's performance
for doing comparisons, difference, update, etc. Anyone that have seen
python's set source code can confirm whether such thing are implemented
in python's set?

[1] the possibility of hash collision complicates this a little bit, I
haven't fully thought out the the consequences of the interaction of
such bitmap with hash collision handling (there was a PyCon 2010 talk
The Mighty Dictionary by Brandon Craig Rhodes describing collision
handling
http://python.mirocommunity.org/video/1591/pycon-2010-the-mighty-dictiona).
-- 
http://mail.python.org/mailman/listinfo/python-list