Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-05 Thread Chris Bergstresser
On Wed, Nov 4, 2009 at 7:07 PM, Raymond Hettinger pyt...@rcn.com wrote:
 [Steven D'Aprano]
 Anyway, given the level of opposition to the suggestion, I'm no longer
 willing to carry the flag for it. If anyone else -- perhaps the OP --
 feels they want to take it any further, be my guest.

   I feel pretty strongly that it's a wart in the language, and a
sufficiently strong one that it should be remedied.  I'm happy to
champion it, but haven't the faintest idea what that entails.

 Summarizing my opposition to a new set method:
 1) there already are at least two succinct ways to get the same effect
 2) those ways work with any container, not just sets
 3) set implementations in other languages show that this isn't needed.
 4) there is value to keeping the API compact
 5) isn't needed for optimization (selecting the same value in a loop makes
 no sense)
 6) absence of real-world code examples that would be meaningfully improved

 I would be happy to add an example to the docs so that this thread
 can finally end.

   Adding an example to the docs does not solve the problem, which is
if you come across the following code:

 for x in s:
 break

... it really looks like it does nothing.  It's only because of the
slightly idiosyncratic way Python handles variable scoping that it has
an effect at all, and that effect isn't overtly related to what the
code says, which is Iterate over all the elements in this set, then
immediately stop after the first one.  s.get() or s.pick() are both
more succinct and more clear, saying Get me an arbitrary element from
this set.  To make matters worse, for x in s: break fails silently
when s is empty, and x = iter(s).next() raises a StopIteration
exception.  Neither is clear.
   The obvious way, for newcomers, of achieving the effect is:

 x = s.pop()
 s.add(x)

... and that's simply horrible in terms of efficiency.  So the
obvious way of doing it in Python is wrong(TM), and the correct
way of doing it is obscure and raises misleading exceptions.

I suppose, mulling things over, the method should be called
.pick(), which avoids any confusion with .get().  And, as I've stated,
I think it should return a member of the set, with no guarantees what
member of the set is returned.  It could be the same one every time,
or a random one, or the last one placed in the set.
   For cases where people want to cycle through the members of the set
in a predictable order, they can either copy the contents into a list
(sorted or unsorted) *or* subclass set and override the .pick() method
to place stronger guarantees on the API.

So, summarizing my responses:

1) the two succinct ways are unclear and not immediately obvious
2) the existing methods aren't needed for other objects
3) set implementations in other languages are irrelevant
4) this is a small, targeted change which not make the API disordered or unruly
5) could very well be needed for optimization, in cases where
constructing an iterator is expensive
6) there have been several real-world examples posted which would be
improved by this change

-- Chris
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-05 Thread Alexander Belopolsky
On Thu, Nov 5, 2009 at 3:43 PM, Chris Bergstresser ch...@subtlety.com wrote:
 .. and x = iter(s).next() raises a StopIteration
 exception.

And that's why the documented recipe should probably recommend
next(iter(s), default) instead.  Especially because iter(s).next() is
not even valid code in 3.0.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-05 Thread Martin v. Löwis
I feel pretty strongly that it's a wart in the language, and a
 sufficiently strong one that it should be remedied.  I'm happy to
 champion it, but haven't the faintest idea what that entails.

There are two ways

a) write a library that provides what you want, publish it on PyPI,
   and report back in a few years of how many users your library has,
   what they use it for, and why it should become builtin
b) write a PEP, wait a few years for the language moratorium to be
   lifted, provide an implementation, and put the PEP for pronouncement.
   Careful reading of the Moratorium PEP may allow shortening of the
   wait.

In any case, it seems that this specific change will see some
opposition. So you will need to convince the opposition, one way or
the other.

The obvious way, for newcomers, of achieving the effect is:
 
  x = s.pop()
  s.add(x)
 
 ... and that's simply horrible in terms of efficiency.  So the
 obvious way of doing it in Python is wrong(TM), and the correct
 way of doing it is obscure and raises misleading exceptions.

If you chose to write a PEP, include a proof that this approach is
indeed horrible in terms of efficiency; I question that.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-05 Thread Chris Bergstresser
On Thu, Nov 5, 2009 at 3:21 PM, Martin v. Löwis mar...@v.loewis.de wrote:
 There are two ways

 a) write a library that provides what you want, publish it on PyPI,
   and report back in a few years of how many users your library has,
   what they use it for, and why it should become builtin

This clearly isn't called for in this case.  We're talking about a
single function on a collection.  In this case, importing an
alternative set API (and maintaining the dependency) is more work than
just writing your own workaround.  The purpose of adding a method is
to prevent the need of everyone writing their own workaround.

 b) write a PEP, wait a few years for the language moratorium to be
   lifted, provide an implementation, and put the PEP for pronouncement.
   Careful reading of the Moratorium PEP may allow shortening of the
   wait.

   Clearly, I'll need to write up the PEP.

 In any case, it seems that this specific change will see some
 opposition. So you will need to convince the opposition, one way or
 the other.

   I doubt some of the people on either side are going to be
convinced.  I'd settle for convincing most of the fence-sitters, along
with a few of the loyal opposition.

-- Chris
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-05 Thread Raymond Hettinger


[Chris Bergstresser]
 Clearly, I'll need to write up the PEP.

Why not write a short, fast get_first() function for your utils directory and 
be done with it?
That could work with sets, mappings, generators, and other containers and 
iterators.
No need to fatten the set/frozenset API for something so trivial and so rarely 
needed.


Raymond
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-05 Thread geremy condra
On Thu, Nov 5, 2009 at 4:09 PM, Alexander Belopolsky
alexander.belopol...@gmail.com wrote:
 On Thu, Nov 5, 2009 at 3:43 PM, Chris Bergstresser ch...@subtlety.com wrote:
 .. and x = iter(s).next() raises a StopIteration
 exception.

 And that's why the documented recipe should probably recommend
 next(iter(s), default) instead.  Especially because iter(s).next() is
 not even valid code in 3.0.

This seems reasonably legible to you? Strikes me as coding by
incantation. Also, while I've heard people say that the naive
approach is slower, I'm not getting that result. Here's my test:


 smrt = timeit.Timer(next(iter(s)), s=set(range(100)))
 smrt.repeat(10)
[1.2845709323883057, 0.60247397422790527, 0.59621405601501465,
0.59133195877075195, 0.58387589454650879, 0.56839084625244141,
0.56839680671691895, 0.56877803802490234, 0.56905913352966309,
0.56846404075622559]

 naive = timeit.Timer(x=s.pop();s.add(x), s=set(range(100)))
 naive.repeat(10)
[0.93139314651489258, 0.53566789627075195, 0.53674602508544922,
0.53608798980712891, 0.53634309768676758, 0.53557991981506348,
0.53578495979309082, 0.53666114807128906, 0.53576493263244629,
0.53491711616516113]


Perhaps my test is flawed in some way?

Geremy Condra
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-05 Thread James Y Knight

On Nov 5, 2009, at 6:04 PM, geremy condra wrote:

Perhaps my test is flawed in some way?


Yes: you're testing the speed of something that makes absolutely no  
sense to do in a tight loop, so *who the heck cares how fast any way  
of doing it is*!


Is this thread over yet?

James
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-05 Thread Yuvgoog Greenle
On Fri, Nov 6, 2009 at 1:17 AM, James Y Knight f...@fuhm.net wrote:


 Is this thread over yet?


Sorry, I just had to point out that pop/add has a side effect that would be
apparent on a set that multiple threads access - it loses an item and then
gets it back. Sounds like a sleeper race condition that's going to be rare
but extremely hard to find if it does occur. Crooked as a gil.

--yuv
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-05 Thread geremy condra
On Thu, Nov 5, 2009 at 6:17 PM, James Y Knight f...@fuhm.net wrote:
 On Nov 5, 2009, at 6:04 PM, geremy condra wrote:

 Perhaps my test is flawed in some way?

 Yes: you're testing the speed of something that makes absolutely no sense to
 do in a tight loop, so *who the heck cares how fast any way of doing it is*!

 Is this thread over yet?

 James

I'm testing the speed because the claim was made that the pop/add
approach was inefficient. Here's the full quote:

The obvious way, for newcomers, of achieving the effect is:

  x = s.pop()
  s.add(x)

 ... and that's simply horrible in terms of efficiency.  So the
 obvious way of doing it in Python is wrong(TM), and the correct
 way of doing it is obscure and raises misleading exceptions.

Since I use this in a graph theory library that I am currently trying
to scale to handle 300 million node simulations, and this is used in
several relevant algorithms, I was concerned.

Geremy Condra
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-05 Thread Chris Bergstresser
On Thu, Nov 5, 2009 at 6:30 PM, geremy condra debat...@gmail.com wrote:
 I'm testing the speed because the claim was made that the pop/add
 approach was inefficient. Here's the full quote:

    The obvious way, for newcomers, of achieving the effect is:

  x = s.pop()
  s.add(x)

 ... and that's simply horrible in terms of efficiency.  So the
 obvious way of doing it in Python is wrong(TM), and the correct
 way of doing it is obscure and raises misleading exceptions.

   I was talking mainly from a theoretical standpoint, and because the
library I'm working on is designed to work seamlessly over the
network.  In those cases, where the set the user is working with is
actually a proxy object across the wire, the time to acquire the
locks, remove the object, release the locks, reacquire the locks, add
the object, then rerelease the locks is *significantly* more expensive
than just noting the set hasn't changed and returning a cached object
from it.

-- Chris
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-05 Thread John Arbash Meinel
geremy condra wrote:
 On Thu, Nov 5, 2009 at 4:09 PM, Alexander Belopolsky
 alexander.belopol...@gmail.com wrote:
 On Thu, Nov 5, 2009 at 3:43 PM, Chris Bergstresser ch...@subtlety.com 
 wrote:
 .. and x = iter(s).next() raises a StopIteration
 exception.
 And that's why the documented recipe should probably recommend
 next(iter(s), default) instead.  Especially because iter(s).next() is
 not even valid code in 3.0.
 
 This seems reasonably legible to you? Strikes me as coding by
 incantation. Also, while I've heard people say that the naive
 approach is slower, I'm not getting that result. Here's my test:
 
 
 smrt = timeit.Timer(next(iter(s)), s=set(range(100)))
 smrt.repeat(10)
 [1.2845709323883057, 0.60247397422790527, 0.59621405601501465,
 0.59133195877075195, 0.58387589454650879, 0.56839084625244141,
 0.56839680671691895, 0.56877803802490234, 0.56905913352966309,
 0.56846404075622559]
 
 naive = timeit.Timer(x=s.pop();s.add(x), s=set(range(100)))
 naive.repeat(10)
 [0.93139314651489258, 0.53566789627075195, 0.53674602508544922,
 0.53608798980712891, 0.53634309768676758, 0.53557991981506348,
 0.53578495979309082, 0.53666114807128906, 0.53576493263244629,
 0.53491711616516113]
 
 
 Perhaps my test is flawed in some way?
 
 Geremy Condra

Well, it also uses a fairly trivial set. 'set(range(100))' is going to
generally have 0 collisions and everything will hash to a unique bucket.
 As such, pop ing an item out of the hash is a single val = table[int 
mask]; table[int  mask] = _dummy, and then looking it up again
requires 2 table lookups (one finds the _dummy, the next finds that the
chain is broken and can rewrite the _dummy.)

However, if a set is more full, or has more collisions, or ... then
pop() and add() become relatively more expensive.

Surprising to me, is that next(iter(s)) was actually slower than
.pop() + .add() for 100 node set in my testing:

$ alias TIMEIT=python -m timeit -s 's = set(range(100)'
$ TIMEIT x = next(iter(s))
100 loops, best of 3: 0.263 usec per loop

$ TIMEIT x = s.pop(); s.add(x)
100 loops, best of 3: 0.217 usec per loop

though both are much slower than the fastest we've found:
$ TIMEIT for x in s: break
1000 loops, best of 3: 0.0943 usec per loop


now, what about a set with *lots* of collisions. Create 100 keys that
all hash to the same bucket:
aliase TIMEIT=python -m timeit -s 's = set([x*1024*1024 for x in
range(100)])'
$ TIMEIT x = next(iter(s))
100 loops, best of 3: 0.257 usec per loop

$ TIMEIT x = s.pop(); s.add(x)
100 loops, best of 3: 0.218 usec per loop

I tried a few different ways, and I got the same results, until I did:

$ python -m timeit -s s = set(range(10, 1000100)) next(iter(s))
1000 loops, best of 3: 255 usec per loop

 Now something seems terribly wrong here. next(iter(s)) suddenly
jumps up from being  0.3 us, to being more than 200us. Or ~1000x slower.

I realize the above has 900k keys, which is big. But 'next(iter(s))'
should only touch 1, and, in fact, should always return the *first*
entry. My best guess is just that the *first* entry in the internal set
table is no longer close to offset 0, which means that 'next(iter(s))'
has to evaluate a bunch of table slots before it finds a non-empty entry.


Anyway, none of the proposals will really ever be faster than:
  for x in s: break

It is a bit ugly of a construct, but you don't have an attribute lookup,
etc. As long as you don't do:
  for x in s: pass

Then it stays nice and fast.

John
=:-
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-05 Thread geremy condra
On Thu, Nov 5, 2009 at 11:21 PM, John Arbash Meinel
john.arbash.mei...@gmail.com wrote:
 geremy condra wrote:
 On Thu, Nov 5, 2009 at 4:09 PM, Alexander Belopolsky
 alexander.belopol...@gmail.com wrote:
 On Thu, Nov 5, 2009 at 3:43 PM, Chris Bergstresser ch...@subtlety.com 
 wrote:
 .. and x = iter(s).next() raises a StopIteration
 exception.
 And that's why the documented recipe should probably recommend
 next(iter(s), default) instead.  Especially because iter(s).next() is
 not even valid code in 3.0.

 This seems reasonably legible to you? Strikes me as coding by
 incantation. Also, while I've heard people say that the naive
 approach is slower, I'm not getting that result. Here's my test:


 smrt = timeit.Timer(next(iter(s)), s=set(range(100)))
 smrt.repeat(10)
 [1.2845709323883057, 0.60247397422790527, 0.59621405601501465,
 0.59133195877075195, 0.58387589454650879, 0.56839084625244141,
 0.56839680671691895, 0.56877803802490234, 0.56905913352966309,
 0.56846404075622559]

 naive = timeit.Timer(x=s.pop();s.add(x), s=set(range(100)))
 naive.repeat(10)
 [0.93139314651489258, 0.53566789627075195, 0.53674602508544922,
 0.53608798980712891, 0.53634309768676758, 0.53557991981506348,
 0.53578495979309082, 0.53666114807128906, 0.53576493263244629,
 0.53491711616516113]


 Perhaps my test is flawed in some way?

 Geremy Condra

 Well, it also uses a fairly trivial set. 'set(range(100))' is going to
 generally have 0 collisions and everything will hash to a unique bucket.
  As such, pop ing an item out of the hash is a single val = table[int 
 mask]; table[int  mask] = _dummy, and then looking it up again
 requires 2 table lookups (one finds the _dummy, the next finds that the
 chain is broken and can rewrite the _dummy.)

 However, if a set is more full, or has more collisions, or ... then
 pop() and add() become relatively more expensive.

 Surprising to me, is that next(iter(s)) was actually slower than
 .pop() + .add() for 100 node set in my testing:

 $ alias TIMEIT=python -m timeit -s 's = set(range(100)'
 $ TIMEIT x = next(iter(s))
 100 loops, best of 3: 0.263 usec per loop

 $ TIMEIT x = s.pop(); s.add(x)
 100 loops, best of 3: 0.217 usec per loop

 though both are much slower than the fastest we've found:
 $ TIMEIT for x in s: break
 1000 loops, best of 3: 0.0943 usec per loop


 now, what about a set with *lots* of collisions. Create 100 keys that
 all hash to the same bucket:
 aliase TIMEIT=python -m timeit -s 's = set([x*1024*1024 for x in
 range(100)])'
 $ TIMEIT x = next(iter(s))
 100 loops, best of 3: 0.257 usec per loop

 $ TIMEIT x = s.pop(); s.add(x)
 100 loops, best of 3: 0.218 usec per loop

 I tried a few different ways, and I got the same results, until I did:

 $ python -m timeit -s s = set(range(10, 1000100)) next(iter(s))
 1000 loops, best of 3: 255 usec per loop

  Now something seems terribly wrong here. next(iter(s)) suddenly
 jumps up from being  0.3 us, to being more than 200us. Or ~1000x slower.

 I realize the above has 900k keys, which is big. But 'next(iter(s))'
 should only touch 1, and, in fact, should always return the *first*
 entry. My best guess is just that the *first* entry in the internal set
 table is no longer close to offset 0, which means that 'next(iter(s))'
 has to evaluate a bunch of table slots before it finds a non-empty entry.


 Anyway, none of the proposals will really ever be faster than:
  for x in s: break

 It is a bit ugly of a construct, but you don't have an attribute lookup,
 etc. As long as you don't do:
  for x in s: pass

 Then it stays nice and fast.

 John
 =:-

Thanks for the info. Doing a bit more digging it appears that taking
the lookup out of the picture puts the pop/add more or less on par
with the for x in s: break solution, with the former behaving more
predictably and the latter averaging marginally faster times. Either
way, the loss in readability isn't worth it to me to get the minor
improvement in performance.

Given that adding a set.pick method would incur half the lookup
cost that pop/add does, I think its reasonable to say that even
using the fastest method proposed to implement it won't differ
all that greatly from the least efficient proposal, and that its
therefore pointless to consider set.pick from an optimisation
standpoint. Aesthetics, of course, are another thing altogether.

Geremy Condra
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-05 Thread Steven D'Aprano
On Fri, 6 Nov 2009 10:52:54 am Yuvgoog Greenle wrote:
 On Fri, Nov 6, 2009 at 1:17 AM, James Y Knight f...@fuhm.net wrote:
  Is this thread over yet?

 Sorry, I just had to point out that pop/add has a side effect that
 would be apparent on a set that multiple threads access - it loses an
 item and then gets it back. Sounds like a sleeper race condition
 that's going to be rare but extremely hard to find if it does occur.
 Crooked as a gil.


Surely Raymond's suggestion also suffers from a similar race condition?

for x in set:
return x

creates a set_iterator. If another thread modifies the original set 
after the set_iterator is created but before the return, you would get 
a mysterious and almost impossible to debug RuntimeError.



-- 
Steven D'Aprano
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-04 Thread Raymond Hettinger



[Steven D'Aprano]

Anyway, given the level of opposition to the suggestion, I'm no longer
willing to carry the flag for it. If anyone else -- perhaps the OP --
feels they want to take it any further, be my guest.


[geremy condra]

I've said before that I'd like there to be one, standard way of
doing this. A function call- set.pick() seems reasonably named
to me- is probably the cleanest way to do that. Absent that,
an example in the docs that illustrates the preferred idiom
would be great. 


Summarizing my opposition to a new set method:
1) there already are at least two succinct ways to get the same effect
2) those ways work with any container, not just sets
3) set implementations in other languages show that this isn't needed.
4) there is value to keeping the API compact
5) isn't needed for optimization (selecting the same value in a loop makes no 
sense)
6) absence of real-world code examples that would be meaningfully improved

I would be happy to add an example to the docs so that this thread
can finally end.


Raymond
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-04 Thread Eric Smith

Raymond Hettinger wrote:

Summarizing my opposition to a new set method:
1) there already are at least two succinct ways to get the same effect
2) those ways work with any container, not just sets
3) set implementations in other languages show that this isn't needed.
4) there is value to keeping the API compact
5) isn't needed for optimization (selecting the same value in a loop 
makes no sense)

6) absence of real-world code examples that would be meaningfully improved

I would be happy to add an example to the docs so that this thread
can finally end.


Please do!

Eric.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-04 Thread Terry Reedy

Eric Smith wrote:

Raymond Hettinger wrote:

Summarizing my opposition to a new set method:
1) there already are at least two succinct ways to get the same effect
2) those ways work with any container, not just sets
3) set implementations in other languages show that this isn't needed.
4) there is value to keeping the API compact
5) isn't needed for optimization (selecting the same value in a loop 
makes no sense)
6) absence of real-world code examples that would be meaningfully 
improved


Agreed


I would be happy to add an example to the docs so that this thread
can finally end.


Please do!


Yes!

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-03 Thread Raymond Hettinger

Personally, I have found it useful in doco I write to have a section on
Common Tasks, with recommended/suggested examples of how to do them and
short rationale for the chosen method. It seems to me that if .pick()
is frequently desired and None of the standard solutions are obvious
or easily discoverable then they should be _made_ so with documentation.

Comments?


That's reasonable.  It's in the same category as people who can't figure-out how to clear a list because they forgot about slice 
notation.


Raymond


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-01 Thread Steven D'Aprano
It seems that even those originally asking for set retrieval have gone 
silent, so I guess this isn't going anywhere.

However, for the benefit of future discussions (because I'm sure this 
question will be raised again), I'd like to answer a couple of points 
raised by Stephen.

On Sat, 31 Oct 2009 01:42:46 pm Stephen J. Turnbull wrote:
   If folks prefer a different name, by all means suggest it. I like
   the name suggested by Wikipedia, pick.

 Pick has a connotation of removal in many contexts: Pick a card,
 any card.  

And in just as many, there is no connotation of removal. Pick a 
colour.

For what it's worth, Forth and similar stack-based languages usually 
have a function pick equivalent to pop-without-removal. pick seems to 
be a standard term for this behaviour.


 Like get, to me it has a flavor of 
 according to some criterion: a band of picked men.  I would 
 expect a pick or get operation to take an optional predicate. 

I think you are underestimating the power of context here. In practice, 
method names are interpreted in the context of the data being operated 
on. We don't get confused that ConfigParser.get() has a different 
signature to dict.get(), which is different again from Queue.get().

list.pop() takes an index; dict.pop() takes a key and an optional second 
argument; set.pop() takes no argument at all. Is anyone actually 
confused by this?

If not, why would set.get() be more confusing? I think far too many 
people say this is confusing when what they really mean is I don't 
like this.


 The use case I have so far
 observed people to have in mind is a cast from an equivalence class
 to a representative member.

The difficulty is that we actually have two related, but different, 
behaviours, and sadly we've conflated the two of them by using the same 
name get for both.

One behaviour is what Wikipedia calls pick:

set.pick() - return an arbitrary element from set without removing it

This is not useful for the cast you describe. For that, you need a 
method that takes an argument. I'm going to call it retrieve, to 
avoid the baggage of get:

set.retrieve(x) - return the element from set equal to x


In the first case, it seems self-evident to me that having arbitrary 
mean the same element each time it is called really would condemn 
pick() to be unused, but I don't care enough to argue.

In the second case, the retrieval isn't arbitrary, so this is not a 
problem that needs solving.


   No. Since sets are unordered, there's no way to seek to a specific
   element.

 I think people will realize that in fact *these* sets are ordered and
 they will demand a seek function for various practical purposes.

We can iterate over dicts, and the order is arbitrary but consistent. 
Where are the people clamouring for dict.seek()?




-- 
Steven D'Aprano
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-11-01 Thread Willi Richert
Am Sonntag, 1. November 2009 12:21:15 schrieben Sie:
 It seems that even those originally asking for set retrieval have gone 
 silent

Nope. Stilll following and waiting for the verdict of the community after 
having filed the corpus delicti [1] 

wr

[1]: http://bugs.python.org/issue7212

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-10-30 Thread Steven D'Aprano
On Fri, 30 Oct 2009 03:00:27 pm Raymond Hettinger wrote:

[skipping to the last paragraph]

 Sorry for so many questions

Don't be sorry. These are good questions, and I'll try to answer them. 
But keep in mind that this isn't my proposal -- I vary between +0 and 
+1 on the proposal depending on the time of the day *wink*


 (1) it will only fail if the set is empty;

 Just like next() except that next() gives you the option to supply a
 default and can be used on any iterator (perhaps iter(s) or
 itertools.cycle(s) etc).

Yes. I had considered suggesting that sets become their own iterator, so 
you could do the following:

 s = set([1,2])
 next(s)
2

but that leads to the problem that if you accept a set from somewhere 
else, you have no idea whether next(s) will succeed or raise 
StopIteration. It may have already been exhausted before it reached 
you.


  (2) it should be efficient;

 Is this about optimization?

Not primarily. Perhaps I should have said it should not be inefficient. 
It's primarily about there being One Obvious Way to extract an 
arbitrary item from a set -- this is a reoccurring question on 
comp.lang.python. Being efficient is a bonus, but it shouldn't be 
inefficient.


 I wouldn't expect x=s.get() to beat for x in s: break.
 Attribute lookup and method calls usually are slower
 than equivalents using built-in syntax with specific opcodes.

Obviously any hypothetical solution would need to be benchmarked, but I 
wouldn't be concerned if s.get() was marginally slower than for `x in 
s: break`. 

When people are left to come up with their own ad hoc solutions, we've 
seen some poor solutions. I've seen, possibly in this very thread, the 
following O(N) suggestions:

for x in s:
pass

x = list(s)[0]

The usual technique people tend to come up with is:

x = s.pop()
s.add(x)

Which strikes many people (including myself) as inelegant. Surely get 
an element is a more fundamental operation than get an element and 
remove it?


  (3) if you call it repeatedly on a set without modifying the set,
  you will cycle through each element in turn in some unspecified
  arbitrary order.

 What's wrong with using next()?  That is what it's for.

You can't call next() on a set. You have to call next(iter(set)). From 
Python 3.0:

 next(set([1,2]))
Traceback (most recent call last):
  File stdin, line 1, in module
TypeError: set object is not an iterator
 next(iter(set([1,2])))
1

Unless I missed something, this is so unobvious that nobody has 
suggested it in the thread yet.


 What about this proposal is specific to sets, i.e. why don't you want
 the same thing for lists. tuples, strings, file objects, or any other
 iterable?

Sequences such as lists, tuples and strings have easy random access. 
It's easy to get an arbitrary element: pick an index i by whatever 
method you like, and get seq[i]. Many file objects are similar, you 
have random access with seek().

It is unpractical and an over-generalisation to apply this to general 
iterables, for reasons I'm sure I don't need to go into. But sets and 
frozensets aren't lazily generated streams, they actually do store the 
elements inside their data structure in the same way that lists do.


 Does this proposal pass the test of being self-descriptive?  Can you
 write a code fragment that exercises the cycling behavior, show it to
 another programmer, and have them correctly deduce what the code does
 (i.e. that different values are returned, that it fails when the set
 it empty, that it wraps around and never terminates)?  Can they
 readily differentiate it for dict.get() which has decidedly different
 semantics?

I don't expect ConfigParser.get() to have the same semantics as 
dict.get(), and they're much more closely related than sets and dicts. 
I don't understand why so many people apparently have a problem with 
the name get(), but that's okay, I'm not wedded to the idea either. If 
folks prefer a different name, by all means suggest it. I like the name 
suggested by Wikipedia, pick.

As for being self-descriptive, I don't know, I haven't tried the 
experiment.


  To clarify point 3, given:
 
  x = set.get()
  y = set.get()
 
  then x and y will only be the same element if set has length one.

 So, it can't even be used for looping through a set because there is
 no termination?

That's a feature, not a bug! This is not intended to be a replacement 
for standard set iteration. Anyone writing the following:

for _ in len(s):
process(s.get())

instead of 

for x in s:
process(x)

will be taken out and slapped with a large fish.

My reasoning for the given behaviour is as follows:

* If you want the same element twice from a set, the One Obvious Way is 
to get an element from the set (somehow!) and assign it to a local 
variable:

x = s.get()
process(x)
# later...
process(x)

So having get() return the same value each time is not useful or 
necessary.

* If you want a random element on each call, the One Obvious Way is 

Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-10-30 Thread Nick Coghlan
Steven D'Aprano wrote:
 So you want to introduce additional, hidden state to sets? (to make
 sure that successive invocations return different values)
 
 If you can think of any other way to efficiently cycle over the elements 
 in a set, I'm all for it :)

for x in itertools.cycle(s):
  # this is an infinite loop

Having a pick() or get() method that returns an arbitrary member of a
set makes sense to me. Having any state on the set that guarantees
successive calls to get will return different values feels wrong -
creating an object with that extra state is what iter(s) is for.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-10-30 Thread Stephen J. Turnbull
Steven D'Aprano writes:

  The usual technique people tend to come up with is:
  
  x = s.pop()
  s.add(x)
  
  Which strikes many people (including myself) as inelegant. Surely get 
  an element is a more fundamental operation than get an element and 
  remove it?

Not in a literal urn or bag.  See sampling with replacement in any
statistics text.  Note that in order to have your cake and eat it
too there's an implicit copy operation (although in Python it will be
a cheap ref copy rather than an expensive object copy).

I'm afraid that the various conflicting intuitions here are all
correct.  That is going to make design impossible without some
compelling use cases.

   What's wrong with using next()?  That is what it's for.
  
  You can't call next() on a set. You have to call next(iter(set)).
  [...]
  Unless I missed something, this is so unobvious that nobody has
  suggested it in the thread yet.

I've seen it at least twice (Nick suggested it IIRC), although that
may have been on Python-Ideas (which is where this thread belongs
IMHO).

  If folks prefer a different name, by all means suggest it. I like
  the name suggested by Wikipedia, pick.

Pick has a connotation of removal in many contexts: Pick a card,
any card.  Pick it up off the floor (it's not in the set of things
on the floor any more).  Like get, to me it has a flavor of according
to some criterion: a band of picked men.  I would expect a pick or
get operation to take an optional predicate.  But then TOOWTDI is

for x in container:
if predicate(x):
break
else:
x = default_x# or perhaps raise in cases where there
 # theoretically should be an element

  My reasoning for the given behaviour is as follows:
  
  * If you want the same element twice from a set,

Nobody is asking for that AFAICS.  The use case I have so far observed
people to have in mind is a cast from an equivalence class to a
representative member.  They don't care whether the member is the same
or not.  If they wanted iterator behavior, getting it would not be a
problem.  next(iter(foo_set)) is TOOWTDI.  If they wanted cyclical
behavior, itertools.cycle.  If they wanted random behavior, make a
list and choose from it.

In one leading subcase, the equivalence class is a singleton.  In this
use case what people really want, I suspect, is behavior like Python
strings and characters: a singleton set should provide the same
operations as its unique element and vice versa.

  So having get() return the same value each time is not useful or 
  necessary.

Which is not the point.  The point is that having get return a
different value if possible is not useful or necessary in at least
some of the leading use cases.  OTOH, we don't know enough about
weird use cases where having a different value returned is important
to decide how that should be done, where weird means outside of the
list of already Obvious Ways.

  No. Since sets are unordered, there's no way to seek to a specific 
  element.

I think people will realize that in fact *these* sets are ordered and
they will demand a seek function for various practical purposes.

   Sorry for so many questions, but I honestly think there are too many
   unresolved design issues.

I have to agree with Raymond.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-10-29 Thread Raymond Hettinger


[Steven D'Aprano]

The suggested
semantics for set.get() with no arguments, as I understand them, are:

(1) it will only fail if the set is empty;


Just like next() except that next() gives you the option to supply a default
and can be used on any iterator (perhaps iter(s) or itertools.cycle(s) etc).



(2) it should be efficient;


Is this about optimization?

I wouldn't expect x=s.get() to beat for x in s: break.
Attribute lookup and method calls usually are slower
than equivalents using built-in syntax with specific opcodes.



(3) if you call it repeatedly on a set without modifying the set, you
will cycle through each element in turn in some unspecified arbitrary
order.


What's wrong with using next()?  That is what it's for.

What about this proposal is specific to sets, i.e. why don't you want the same thing for lists. tuples, strings, file objects, or 
any other iterable?


Does this proposal pass the test of being self-descriptive?  Can you write a code fragment that exercises the cycling behavior, show 
it to another programmer, and have them correctly deduce what the code does (i.e. that different values are returned, that it fails 
when the set it empty, that it wraps around and never terminates)?  Can they readily differentiate it for dict.get() which has 
decidedly different semantics?





To clarify point 3, given:

x = set.get()
y = set.get()

then x and y will only be the same element if set has length one.


So, it can't even be used for looping through a set because there is no 
termination?




I believe that the patch supplied by Willi Richart implemented these
behaviours.

http://bugs.python.org/issue7212


So you want to introduce additional, hidden state to sets? (to make sure that 
successive invocations return different values)

Do you want a thread local version too? (so that two threads can call gets without stomping on each other's guarantees that 
successive calls will produce distinct elements)


Do you have any real-world use-cases where next(), for-loops, or itertools 
wouldn't suffice?

Is there a precedent in *any* other language you've ever seen?  (setl has an arb function but it makes no promises about returning 
different values on consequetive calls; otherwise, I've never seen an equivalent in any other set implementation).


Do you think the return-different-values-on-successive-calls semantics is self-evident and non-magical as compared to a straight 
for-loop or next(it)?


ISTM, that when streams have non-destructive getters with self-advancing pointers, they also have a seek() function so that it can 
be controlled.  Will this proposal need a seek() method too?


Sorry for so many questions, but I honestly think there are too many unresolved design issues.  We've seen no real-world source code 
that would be improved fwith the proposal.  I think it sounds conceptually tempting and is fun to theorize about, but it actual 
implementation it will make sets more difficult to learn and it would quickly become a piece of rarely used, poorly understood 
cruft.



Raymond

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-10-27 Thread Raymond Hettinger


[geremy condra]

Was it ever decided whether this would fall under the moratorium?


Decided isn't the right word:
http://mail.python.org/pipermail/python-dev/2009-October/093373.html

FWIW, I'm a strong -1 on both proposals.

Just add a short get_one() function and a get_equivalent() recipe
to your utils directory.  That will get the job done (thought I don't
expect that you will *ever* make much use of either one).
No need to complexify a type that is currently very simple. 



Raymond



P.S.  get_equivalent:  http://code.activestate.com/recipes/499299/
get_one = lambda s, default=None:  next(iter(s), default)

The first works with all type that defines __contains__.
The second works for any iterable.
Neither of these concepts are specific to set objects.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

2009-10-27 Thread geremy condra
On Tue, Oct 27, 2009 at 3:49 PM, Raymond Hettinger pyt...@rcn.com wrote:

 [geremy condra]

 Was it ever decided whether this would fall under the moratorium?

 Decided isn't the right word:
 http://mail.python.org/pipermail/python-dev/2009-October/093373.html


snip

I'm unclear- does that imply that this is this going to be decided on a
case-by-case basis in the future or have the details for the big M just
not been hammered out yet?

Geremy Condra
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com