[Python-ideas] Re: giving set.add a return value

2020-07-06 Thread Chris Angelico
On Tue, Jul 7, 2020 at 12:53 PM Stephen J. Turnbull
 wrote:
>
> Christopher Barker writes:
>
>  > I meant in the context of putting things in, or testing whether they are
>  > in, sets. Not in any other context.
>
> OK, but restricting context that way is asking for eventual trouble,
> because there's one more thing you can do with sets: get stuff out of
> them (by iterating).
>
>  > But to see if I understand correctly, you couldn’t quite use a set to
>  > intern objects: once 2.0 was in there you could not put 2 in as
>  > well.
>
> Good point.  You could delete one and replace it with the other, but
> not have both.  That does kind of defeat the whole idea of the
> "intern" nomenclature.
>

"Interning" is usually only done with strings. To get a similar
effect, you could put (type(x), x) into the tuple, which would allow
you to have both the int 2 and the float 2.0 in the interning set. Or
have separate sets for different types.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BIWJ2KWJ5FLEU2ZCBKU26FY6LSESPBLE/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-07-06 Thread Stephen J. Turnbull
Christopher Barker writes:

 > I meant in the context of putting things in, or testing whether they are
 > in, sets. Not in any other context.

OK, but restricting context that way is asking for eventual trouble,
because there's one more thing you can do with sets: get stuff out of
them (by iterating).

 > But to see if I understand correctly, you couldn’t quite use a set to
 > intern objects: once 2.0 was in there you could not put 2 in as
 > well.

Good point.  You could delete one and replace it with the other, but
not have both.  That does kind of defeat the whole idea of the
"intern" nomenclature.

Steve
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QUFENUDPGNQZHH35F7ZA2SBLTAAH4FGV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-07-05 Thread Christopher Barker
Sorry, this got lost in the shuffle without being sent, and the
conversation seems to have died down. but since I wrote it ...

On Sat, Jul 4, 2020 at 8:25 AM Stephen J. Turnbull <
turnbull.stephen...@u.tsukuba.ac.jp> wrote:

> Christopher Barker writes:
>
>  > I am really confused here -- I'm suggesting that is should NOT be
> possible
>  > to tell the difference.
>
> But it is possible to tell the difference, so that suggestion is moot.


I meant in the context of putting things in, or testing whether they are
in, sets. Not in any other context.

There are languages where there is no numeric tower, just numbers, and
> [0][0], [0][0.0], and [0][0+0j] all just work.  Python isn't one of
> them.  Fewer places you can tell in Python 3 than in Python 2, I
> believe, but several remain.


Sure, but putting them in sets, or using them as a key in a dict is not one
of them.

set.intern does yield the same result regardless of what's passed in:
> whichever is in the set.  That's *why* I can call it "intern".[1]


Fair enough. You can of course propose any functionality you want. I’m
suggesting that my understanding of what you are proposing doesn’t match
well with the rest of how sets work.

But to see if I understand correctly, you couldn’t quite use a set to
intern objects: once 2.0 was in there you could not put 2 in as well. You
could know that 2.0 was already in there when you tried to add 2, but it
would still never be added.

It seems if you really want to intern objects, you’d need something kind of
like, but not quite, a set().

Note: this all is only an issue for object that are of a different type,
but can compare equal and have the same hash - maybe the numeric types are
the only ones in the stdlib for which this is the case. Nevertheless, they
are commonly used :-)

Nope: turns out NamedTuples and Tuple are the "same" as far as sets are
concerned as well.

In [66]: s = {p}


In [67]: s

Out[67]: {Point(x=2, y=3)}

In [68]: s.add((2,3))


In [69]: s

Out[69]: {Point(x=2, y=3)}

In [70]: s = {(2,3)}


In [71]: s.add(p)


In [72]: s

Out[72]: {(2, 3)}

So if you want something that treats different types with the same value as
different, you really will need a different storage mechanism than a set.

-CHB
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JKK6J7CGWLLSOQATFD5JRL6VAS3YQGGI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-07-03 Thread Random832
On Mon, Jun 29, 2020, at 04:51, Stephen J. Turnbull wrote:
> Of course, we may prefer a boolean return, despite the general rule
> about returning elements.  I'm single-threaded, and so agnostic on
> that. :-)  But if it turns out that somebody *wants* to check "2 is 
> 2.0",
> this .add_unique can serve both purposes.

I don't think this is a good idea - for one thing, an element return means you 
can't handle, or have to handle specially, None being in a set.

If we want a way to check the type or identity of an item in a set, I think it 
would be best to add another method to access the item. Or an operator - i.e. 
check if type(set[2]) is int or float, or whatever it is you want. I did once 
write a way to do that in O(1) time in unmodified python [abusing the __eq__ 
method of a custom class to  smuggle the value out] as an exercise, 
incidentally, and it seemed to work fine.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/NBMII2CZRA7SSZSRMH3LTDUECUQV3BGB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-07-02 Thread Stephen J. Turnbull
Christopher Barker writes:
 > On Mon, Jun 29, 2020 at 1:53 AM Stephen J. Turnbull <
 > turnbull.stephen...@u.tsukuba.ac.jp> wrote:

 > > But if it turns out that somebody *wants* to check "2 is 2.0",
 > > this .add_unique can serve both purposes.

 > can it though?

Yes.

 > (or should it?)

YMMV, but I think that's not our call.  This check *can* be done
without this facility, but AFAICS it would be O(n) (iterate over the
set, testing for equality, then testing for identity if found).
Providing this function simply optimizes that to O(1).

The argument against this isn't "should we distinguish int from float
in principle", it's "should we bother with a status return at all" and
"if we bother, why not KISS with a boolean return", don't you think?

 > -- the fact is that it was decided the 2 and 2.0 are equivalent as
 > members of a set (which makes great sense, there are they SAME
 > number, and the difference is an implementation detail),

No, it's not just an implementation detail.  If 2.0 can sneak into an
integer set, so can Inf and Nan, but any integer calculation using
members of that set will not be prepared for that.  And it's possible
that 2.001 can sneak in as well, changing an exact integer
calculation into floating point calculation with rather different
rules.

This is the same kind of argument used to justify zip(strict=True).
I.e, this only happens if there's a programmer error, and if it does,
the feature helps to debug that error.

 > I've lost track, but the answer to
 > 
 > a_set.add_unique(2)
 > should always be the same as
 > a_set.add_unique(2.0)
 > 
 > for the same set.

True.  What about that?  That's what makes it possible to tell whether
you added the same object or merely an equal one.  I don't have time
to think carefully about it, but it now occurs to me to spell
add_unique as "intern", as in the idiom

x = a_set.intern(x)

Of course we already have this for str, but only str, with sys.intern.
It would likely be a rare use case compared to strings, but in theory
any immutable could benefit from this.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TIHXYHI5H7U5SITKIYGHINYDFTVGV2MB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-29 Thread Christopher Barker
On Mon, Jun 29, 2020 at 1:53 AM Stephen J. Turnbull <
turnbull.stephen...@u.tsukuba.ac.jp> wrote:

> But if it turns out that somebody *wants* to check "2 is 2.0",
> this .add_unique can serve both purposes.
>

can it though? (or should it?) -- the fact is that it was decided the 2 and
2.0 are equivalent as members of a set (which makes great sense, there are
they SAME number, and the difference is an implementation detail), so the
answer to:

2.0 in {2}
and
2 in {2.0}

is True

I've lost track, but the answer to

a_set.add_unique(2)
should always be the same as
a_set.add_unique(2.0)

for the same set.

-CHB


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/G52XJGXY7HGYGSPNFEOPX53OPL3JYBY3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-29 Thread Stephen J. Turnbull
Bar Harel writes:

 > The original example of "if dict.setdefault()" is problematic as well for
 > the exact same reason. You can't tell if the default value was already
 > there in the first place.

That's not a problem.  In both cases if you need an object that can't
possibly be there, you have

# 'default' by analogy to dict.setdefault.  It's a bad name for
# this argument to add_unique, and does not have the semantics of
# inserting default in the set.

olditem = someset.add_unique(thing, default=(sentinel:=object()))
if olditem is not sentinel:
# handle dupe

More likely you trust yourself not to insert it, and use a sentinel
defined previously.

Of course, we may prefer a boolean return, despite the general rule
about returning elements.  I'm single-threaded, and so agnostic on
that. :-)  But if it turns out that somebody *wants* to check "2 is 2.0",
this .add_unique can serve both purposes.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PV6JTCENHCHSRCY4B2YX4LZRNAEWDUFE/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-27 Thread Chris Angelico
On Sun, Jun 28, 2020 at 3:39 AM Steven D'Aprano  wrote:
>
> TL;DR
>
> If the `set.add` feature can return a flag stating whether or not the
> element was previously present or not cheaply and atomically, without
> slowing down the single-threaded case with locks, then that's great (as
> I already stated...) and this entire subthread about threading is
> irrelevant and you can stop reading now.
>
> If not, then we should consider carefully the consequences of adding a
> feature that will either slow down the common single-threaded case, or
> potentially be an attractive nuisance in the multi-threaded case.
>

CPython has the GIL. Every other Python will have some equivalent way
to ensure that its own internal data structures maintain their
integrity.

At some point, there will be a bit of code somewhere that decides
whether to change some bits in memory, or not change some bits in
memory. In CPython, there's basically no way that this can be done
safely without also guaranteeing the atomicity needed, and therefore
there is no problem. Other implementations may have something slightly
different, but the chances of a problem that doesn't also cause
fundamental internal issues would be extremely low.

The worst case scenario that I can imagine is a non-refcounting Python
(let's say it uses mark-and-sweep garbage collection), a set is
defined as a hash table, and insertion into it is performed by writing
a single pointer into a single cell in the base array. In a
multithreaded situation, you could have two threads simultaneously try
to add the same item, or simultaneously try to add two different
items. In order to guarantee that adding two items won't accidentally
overwrite one (which would imply that "s.add(1)" could silently
discard a completely unrelated value), there would need to be a safe
way to write one pointer into one cell with a guarantee that the cell
had previously been empty. And that is, in fact, precisely what is
needed for the proposed feature: "insert this, and be certain that it
hadn't previously been here". This is basically what "lock cmpxchg" is
for on the Intel CPUs, and equivalents on others.

There could be lock-free code before and after this (although in
CPython, the GIL basically means there won't be), but at some point,
there would need to be that "final action" that is atomic, and it
*already* needs to be. This feature does not increase the atomicity
requirements.

> which is why people much smarter than me have spent decades designing
> languages and execution models specifically to deal with concurrency and
> parallel computing safely:
>
> https://en.wikipedia.org/wiki/List_of_concurrent_and_parallel_programming_languages

Well... yes. That's exactly what I'm building on here. If you have a
Python implementation that includes threads, then this functionality
MUST already exist.

> or simply recommending that, when possible, we ought to avoid mutable
> shared state (in the same way we avoid GOTOs and global variables and
> other anti-patterns):
>
> https://jaxenter.com/akka-anti-patterns-shared-mutable-state-128827.html
>
> https://exploringjs.com/deep-js/ch_shared-mutable-state.html
>
> http://web.mit.edu/6.031/www/fa17/classes/20-thread-safety/

Right. Avoid it where possible, because it has many costs. But
fundamentally it can't be completely eliminated without some
ridiculous dances, and I'm assuming here that the set *is* being
mutated from multiple threads simultaneously. If it isn't (either
because it's local to one thread, or it's in a "one writer, many
readers" model), it's easy, and the only place that needs to have any
locking whatsoever is a hash resize or equivalent. This discussion is
all about the worst case scenario: two threads concurrently trying to
insert into the same set.

> I'm not sure why this is so controversial. I'm making a moderate claim
> here, threading is much harder to get right than sequential code.

And I'm making a counter-claim: threading is an already-solved
problem. Sure it's harder, but it's already harder and we're dealing
with it.

> Please let's not over-generalise here, I'm not opposed to learning in
> general, I'm pointing out that writing concurrent-style code for
> something which is trivially thought of as "if the element has not been
> seen, add it" is not the most natural way of thinking about the task.

Should you have to think about resource leakage with open() calls, or
do you just learn to follow best practice and use a 'with' block?
Sometimes we don't need to learn about concurrency to do the right
thing, we just learn what's the right thing, and if we're curious as
to WHY it's the right thing, we can *then* learn about concurrency.

> > > So `add` would have to be atomic at the C level. If it is already
> > > atomic, great, that's one objection neutralised. But if not, then will
> > > making it atomic have a performance impact?
> >
> > Question: If it ISN'T atomic at the C level, what happens?
>
> Sometimes it will return Fal

[Python-ideas] Re: giving set.add a return value

2020-06-27 Thread Steven D'Aprano
TL;DR

If the `set.add` feature can return a flag stating whether or not the 
element was previously present or not cheaply and atomically, without 
slowing down the single-threaded case with locks, then that's great (as 
I already stated...) and this entire subthread about threading is 
irrelevant and you can stop reading now.

If not, then we should consider carefully the consequences of adding a 
feature that will either slow down the common single-threaded case, or 
potentially be an attractive nuisance in the multi-threaded case.


On Sat, Jun 27, 2020 at 06:46:47PM +1000, Chris Angelico wrote:

> I grew up with concurrency being a perfectly normal thing. To me,
> EVERYTHING is concurrent. (Back then, it was single-core CPUs, so
> technically most machine code instructions could be assumed to be
> atomic, but at any higher level, you could be preempted at any
> moment.)

Cool, but it ignores the point that when you access the file system, 
you cannot avoid concurrency because that's what the OS gives you whether 
you want it or not. That's not what happens with adding elements to a 
set, unless you are sharing it between threads.

(For the record, I too grew up with concurrency being a perfectly normal 
thing. Every desktop OS I've ever touched has had some form of multi- 
processing or another. Classic MacOS had cooperative multiprocessing and 
desk accessories, DOS had TSRs, and the first computer I used for 
serious coding was Unix.)


> > Unless you are in the [Quadrant of Hell](https://imgur.com/a/jjA52vI)
> > you don't even need to think about this for sets. There are no Time
> > Of Check to Time Of Use issues worth thinking about in the given
> > example, and frankly Chris I doubt that you surround ever set access
> > with locks in single-threaded code because TOCTOU is "stronger" (more
> > important?) than writing clear and simple code. And if you do, you're
> > probably wondering why Python is so slow *wink*
> 
> And that is a fundamental difference between our philosophies. You
> consider threads to be terrifyingly scary, 

Oh I do, sometimes I just start screaming at the very thought. I've even 
been known to wet myself.


> and shared mutable data to be "hell". I consider this to be the normal 
> state of things, and that whenever you care, you take care. It 
> actually doesn't come up all that often.

I'm sure you're very good at what you do, but deadlocks, resource 
starvation, non-deterministic execution order, race conditions, etc are 
genuine problems for people, and concurrent code is much harder to write 
correctly than sequential code, and even more difficult to debug. 
Stories like these are hardly rare:

http://thedailywtf.com/articles/Sprite_Threading

http://worsethanfailure.com/articles/Flossin_0x27__My_Threads

Other people have suggested that concurrent code is often riddled with 
race conditions:

"Because concurrency is so incredibly difficult, many code bases I have 
worked on have these sorts of bugs….everywhere"

https://medium.com/dm03514-tech-blog/golang-candidates-and-contexts-a-heuristic-approach-to-race-condition-detection-e2b230e70d08

and the consequences of race conditions have sometimes been 
catastrophic:

https://www.securityfocus.com/news/8412

https://en.wikipedia.org/wiki/Therac-25

which is why people much smarter than me have spent decades designing 
languages and execution models specifically to deal with concurrency and 
parallel computing safely:

https://en.wikipedia.org/wiki/List_of_concurrent_and_parallel_programming_languages

or simply recommending that, when possible, we ought to avoid mutable 
shared state (in the same way we avoid GOTOs and global variables and 
other anti-patterns):

https://jaxenter.com/akka-anti-patterns-shared-mutable-state-128827.html

https://exploringjs.com/deep-js/ch_shared-mutable-state.html

http://web.mit.edu/6.031/www/fa17/classes/20-thread-safety/

I'm not sure why this is so controversial. I'm making a moderate claim 
here, threading is much harder to get right than sequential code.


> > But I'll grant you that I wasn't thinking about threaded code. The
> > example given was single-threaded, but maybe a better example would
> > involve threading. But that supports my point: thinking concurrently
> > doesn't come naturally.
> 
> Nor does thinking algebraically, nor thinking like a database, nor
> thinking in terms of diamond inheritance and an MRO. These things have
> to be learned. Is it a problem to have to learn things?

Maybe. It depends on the things you have to learn and why you need to. 
Personally I don't feel that learning category theory and monads just to 
be able to print "Hello World" would be a good use of my time.

https://en.wikipedia.org/wiki/Monad_%28functional_programming%29

Please let's not over-generalise here, I'm not opposed to learning in 
general, I'm pointing out that writing concurrent-style code for 
something which is trivially thought of as "if the element has not been 
seen, add it

[Python-ideas] Re: giving set.add a return value

2020-06-27 Thread Bar Harel
>
> Given dict.setdefault, a parallel method like set.add_unique, which
> returns the existing element or None, might be the better approach.
>

Although it breaks the convention of these methods returning the item, I'm
not sure returning the element is a good idea.
"set.add_unique(None)"  will return None.

The original example of "if dict.setdefault()" is problematic as well for
the exact same reason. You can't tell if the default value was already
there in the first place.

@Chris
The set.add is already threadsafe as GIL is not checked while running the
function, but it shouldn't be relied upon, if only because of ABC not
guaranteeing other implementations to be threadsafe, let alone other python
implementations as well.

De-facto I've never seen TOCTOU issues with these things as they don't
contain external resource access and are usually surrounded by locks for
plenty of other reasons, but theoretically it can be useful to prevent the
need of locks.

Either way, modifying the existing method is probably a no-no - it will
break backwards compatibility in both Python and C-API.

Regarding a new method, what are the actual arguments in favor? Apart from
saving one line of code ofc. Besides, I don't believe trying to 1-line it
makes the code any clearer, but that's personal taste.

>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WQDGF25GDFID73F4FKCJAGBH6T4ETGQX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-27 Thread Stephen J. Turnbull
Dan Sommers writes:

 > Perhaps by your standard [it's not EAFP].  The code I wrote[:]

already_there = seen.add(element)
if already_there:
# handle the duplicate case

 > performs an operation, and then asks whether or not some condition
 > was met, as opposed to asking whether the condition is met first,
 > and then conditionally performing the operation.

Aside: Checking status returns seems like LBYL to me, too.  EAFP
doesn't ask at all, it responds to a tug on its sleeve. :-)

I think the point that we could cheaply have a (nearly?) atomic
"check_presence_and_add_if_not" operation is valid.  I can see how
that might be useful in coordinating resource access, for example.

Given dict.setdefault, a parallel method like set.add_unique, which
returns the existing element or None, might be the better approach.
This would also allow the programmer to check for (2 is not 2.0) if he
wants to.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CMBPSULCM42B6EYIP2N7EWZBYQ7YUSGR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-27 Thread Chris Angelico
On Sat, Jun 27, 2020 at 10:06 PM Rob Cliffe  wrote:
>
>
>
> On 27/06/2020 05:33, Chris Angelico wrote:
> > On Sat, Jun 27, 2020 at 2:29 PM Steven D'Aprano  wrote:
> >> Seriously, I genuinely thought that the existing behaviour was the
> >> opposite and that `add` unconditionally added the element. "Last seen
> >> wins". If I was designing sets, that's probably how I would design it.
> >> After all, it's called *add*, not *add if not already there*. I was so
> >> sure that this was the current behaviour that I didn't bother to check
> >> it before posting, which is rare for me.
> >>
> >> So I think this counts as the principle of maximal surprise :-)
> > Well, that's not how Python's dicts OR sets work, so that would be
> > inconsistent. But feel free to do it your own way in your own
> > language.
> >
> > ChrisA
> > ___
> >
> I don't know what you mean Chris, ISTM that *is* the way dicts work:
>
>  >>> d = { 'one' : 1 }
>  >>> d['one'] = 1.0
>  >>> d
> {'one': 1.0}
>  >>>
>  >>> d = { 'one' : 1 }
>  >>> d.update( {'one' : 1.0} )
>  >>> d
> {'one': 1.0}
>
> Have I misunderstood you?

Sets behave like dict keys, not their values. Obviously the last
*value* will win (because that's how assignment works), but with the
keys, the one already in the dictionary will be retained. Switching
your example around:

>>> d = {1: 'one'}
>>> d[1.0] = 'one'
>>> d
{1: 'one'}
>>> d = {1: 'one'}
>>> d.update({1.0: 'one'})
>>> d
{1: 'one'}
>>> dict(((1, 'one'), (1.0, 'one')))
{1: 'one'}
>>> {1: 'one', 1.0: 'one'}
{1: 'one'}

When sets were introduced, I believe this behaviour of dicts was
deliberately mimicked for sets, although those who've been Pythonning
for longer than I have would be better able to say.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SZPNUALECZMA4FLF5NV2SJFWJUWD7UID/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-27 Thread Dan Sommers



On Saturday, June 27, 2020, at 03:46 -0500, Chris Angelico wrote:

I grew up with concurrency being a perfectly normal thing. To 
me,
EVERYTHING is concurrent. (Back then, it was single-core CPUs, 
so
technically most machine code instructions could be assumed to 
be

atomic, but at any higher level, you could be preempted at any
moment.)


Concurrent and asynchronous.  And these funny red hats.

Way back when (1989ish?), I ported an [in house, proprietary] OS 
from the MC68000 to the MC68020 and MC68030 
(https://en.wikipedia.org/wiki/Motorola_68000_series), and one of 
the "breakthroughs" was that individual instructions on a single 
core CPU were *not* atomic.  IIRC, the worst case was that an 
individual instruction could be interrupted three or four times 
due to page faults, plus certain I/O or other interrupts.  Not to 
mention that consecutive instructions could overlap in weird ways 
or be executed out of order.  But I digress.


I, too, found out later that concurrency was hard, mostly from 
people who *hadn't* programmed at that bare metal level.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FETSQQNWSQF6LSJP6LYGTE7DGF5MUGDC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-27 Thread Dan Sommers



On Friday, June 26, 2020, at 23:25 -0500, Steven D'Aprano wrote:


On Fri, Jun 26, 2020 at 06:16:05AM -0500, Dan Sommers wrote:


>already_there = seen.add(element)
>if already_there:
># handle the duplicate case
>
>Who thinks like that? *wink*

Anyone who practices EAFP rather than LBYL?  Or is that why 
you're 
winking?


That doesn't come naturally, and in single-threaded code it's 
also 
unnecessary.


Not unlike ChrisA, I grew up in a multiprocessor, multithreaded, 
asynchronous world, and I don't always assume that single-threaded 
code will stay that way.


By the way, since there's no try...except involved and no need 
to catch 
an exception, it's not EAFP.


Perhaps by your standard.  The code I wrote performs an operation, 
and then asks whether or not some condition was met, as opposed to 
asking whether the condition is met first, and then conditionally 
performing the operation.


>But either way, you also have to decide whether the `add` (or 
>the new 
>method) should *unconditionally* insert the element, or only 
>do 
>so if it 
>wasn't present. This makes a big difference:

>
>seen = {2}
>already_there = seen.add(2.0)
>
>At this point, is `seen` the set {2} or {2.0}? Justify why one 
>answer is 
>the best answer.


The actual best answer is left as an exercise for the 
interested 
reader, but whatever it is, it's justified by backwards 
compatibility, the existing definition of "present," and the 
principle of least surprise:


The documentation doesn't guarantee one behaviour or another:

https://docs.python.org/3/library/stdtypes.html#frozenset.add


Not explicitly, no, but the definition of a set (from that same 
section of documentation) is "an unordered collection of distinct 
hashable objects."  Is 2 distinct from 2.0?  Not according to 
Python:


   Python 3.8.3 (default, May 17 2020, 18:15:42) 
   [GCC 10.1.0] on linux
   Type "help", "copyright", "credits" or "license" for more 
   information.

   >>> 2 == 2.0
   True


   Python 3.8.3 (default, May 17 2020, 18:15:42)
   [GCC 10.1.0] on linux
   Type "help", "copyright", "credits" or "license" for more 
   information.

   >>> seen = {2}
   >>> 2.0 in seen
   True
   >>> seen.add(2.0)
   >>> seen
   {2}


Well I'm completely surprised, because I expected `add` to, you 
know, 
actually add the element replacing the one that was already 
there!


But 2 == 2.0, and I asked first, and the answer was that 2.0 was 
already in the set.  (Unsurprisingly, I get the same results for 
2+0i and 2.0+0.0j.)  So how would you know whether or not set.add 
added the [not-new] element or just left the set as is?  Do you 
write code that cares?


Seriously, I genuinely thought that the existing behaviour was 
the 
opposite and that `add` unconditionally added the element. "Last 
seen 
wins". If I was designing sets, that's probably how I would 
design it. 
After all, it's called *add*, not *add if not already there*. I 
was so 
sure that this was the current behaviour that I didn't bother to 
check 
it before posting, which is rare for me.


So I think this counts as the principle of maximal surprise :-)


Then there's a bug in the documentation.  Perhaps the word 
"distinct" or the description of set.add is insufficient.


Relatedly, would your design include both remove and discard?

Should the flag be "element was already present and nothing was 
added" 
or "element was not there, so something was added"?


Given that the name of the method in question is "add," IMO the 
flag would indicate that the element was added.  Please don't read 
too much into my answer(s), though, I usually write code more like 
this:


   # process unique elements of iterable
   for element in set(iterable):
   process_element(element)

and I don't care about handling duplicates one at a time, so I 
don't care which of multiple non-distinct elements end up in my 
set.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PTL4DVOY7WJBE4AD2ZEXIILTWJ75HGO7/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-27 Thread Chris Angelico
On Sat, Jun 27, 2020 at 4:54 PM Steven D'Aprano  wrote:
>
> On Fri, Jun 26, 2020 at 10:10:22PM +1000, Chris Angelico wrote:
> > On Fri, Jun 26, 2020 at 7:58 PM Steven D'Aprano  wrote:
> > > Most importantly, it matches the way people think about the task:
> > >
> > > # Task: look for duplicates
> > > if element in seen:
> > > # it's a duplicate
> > > ...
> > > else:
> > > # never seen before, so remember it
> > > ...
> >
> > Do you do this:
> >
> > if file exists:
> > # read the file
> > else:
> > # create the file
>
> No, because I have learned that this is both unreliable and dangerous.
> But that's how I *think* about the problem, at least initially. I
> expect that if you were prototyping the code on a whiteboard, you
> probably would too.
>
> Checking existence of an element in a set is not comparable to file
> access. File access has a lot more that can go wrong than just whether
> or not the file exists. There are permission errors and transient errors
> that bulletproof code needs to deal with. And file access is always
> concurrent, since the OS is concurrent. (Unless you have the luxury
> of working on a single user, single process OS.)

I grew up with concurrency being a perfectly normal thing. To me,
EVERYTHING is concurrent. (Back then, it was single-core CPUs, so
technically most machine code instructions could be assumed to be
atomic, but at any higher level, you could be preempted at any
moment.)

> So long as you know you are dealing with hashable objects, `set.add`
> should always succeed.
>
>
> > Or would you try/except around an open call? The TOCTOU issue may be
> > less obvious with sets, but it's still a reasonable thing to concern
> > yourself with.
>
> Unless you are in the [Quadrant of Hell](https://imgur.com/a/jjA52vI)
> you don't even need to think about this for sets. There are no Time
> Of Check to Time Of Use issues worth thinking about in the given
> example, and frankly Chris I doubt that you surround ever set access
> with locks in single-threaded code because TOCTOU is "stronger" (more
> important?) than writing clear and simple code. And if you do, you're
> probably wondering why Python is so slow *wink*

And that is a fundamental difference between our philosophies. You
consider threads to be terrifyingly scary, and shared mutable data to
be "hell". I consider this to be the normal state of things, and that
whenever you care, you take care. It actually doesn't come up all that
often.

> But I'll grant you that I wasn't thinking about threaded code. The
> example given was single-threaded, but maybe a better example would
> involve threading. But that supports my point: thinking concurrently
> doesn't come naturally.

Nor does thinking algebraically, nor thinking like a database, nor
thinking in terms of diamond inheritance and an MRO. These things have
to be learned. Is it a problem to have to learn things?

> Is this check and insert version of `add` threadsafe? If not, then you
> still need manual locking and this proposal doesn't get you much; if
> anything it's an attractive nuisance because it looks atomic but isn't:
>
> seen = {}
> assert seen.add(None) is False
>
> This could fail in threaded code if the underlying `add` method
> is not thread-safe:
>
> - your thread calls `add`, which checks that None is not an element;
> - between the check and the actual insertion, another thread adds None;
> - and your thread redundently adds it a second time and returns False.
>
> I don't know about Python sets, but most Java collections are not
> thread safe unless explicitly documented as such, so I wouldn't be the
> least bit surprised if this could happen.
>
> So `add` would have to be atomic at the C level. If it is already
> atomic, great, that's one objection neutralised. But if not, then will
> making it atomic have a performance impact?

Question: If it ISN'T atomic at the C level, what happens? I can
imagine a few possibilities, including utter memory corruption,
leaking of objects, insertion of two identical elements into a set,
and many other things that would fundamentally violate Python's own
integrity. What would happen that would be considered acceptable, but
wouldn't guarantee this? I would expect that set.add() is guaranteed
to not add a duplicate, and that right there basically guarantees
everything that I need. It doesn't have to be completely atomic but it
does have to maintain the set invariants.

CPython guarantees this level of safety using the GIL. Other Pythons
may do it in other ways. It hardly matters, because any way will
suffice.

> I'm not very interested in slowing down every single `set.add` just so I
> have the opportunity to write less understandable code when I don't need
> to. But if `set.add` is already atomic and hence thread safe, and the
> other issues with this proposal are addressed, then I don't object to
> this proposal. (I reserve the right to change my m

[Python-ideas] Re: giving set.add a return value

2020-06-27 Thread Steven D'Aprano
On Sat, Jun 27, 2020 at 04:57:32PM +1000, Steven D'Aprano wrote:

> I did find one thing... I was removing an unorded sequence by dropping 
> the elements into a set then converting back to a list, and documented 
> it as "last seen wins" in the case of duplicates.

Sheesh, I was removing *duplicates* from an *unordered* sequence.

I need a proof reader.


-- 
Steven
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JN5GSYXB6UPT7QMWTDMBKXVHSI6CNE6M/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-27 Thread Steven D'Aprano
On Sat, Jun 27, 2020 at 06:09:02PM +1200, Greg Ewing wrote:
> On 27/06/20 4:25 pm, Steven D'Aprano wrote:
> >Seriously, I genuinely thought that the existing behaviour was the
> >opposite and that `add` unconditionally added the element. "Last seen
> >wins".
> 
> The fact that you haven't noticed until now suggests that
> you've never written any code that depends on the difference.

No, because I've tried avoid code that would behave differently in those 
cases.

I did find one thing... I was removing an unorded sequence by dropping 
the elements into a set then converting back to a list, and documented 
it as "last seen wins" in the case of duplicates.



-- 
Steven
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VQXYMGYVAGBX3F26OXSXFQTRD2EVBRE2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-26 Thread Steven D'Aprano
On Fri, Jun 26, 2020 at 10:10:22PM +1000, Chris Angelico wrote:
> On Fri, Jun 26, 2020 at 7:58 PM Steven D'Aprano  wrote:
> > Most importantly, it matches the way people think about the task:
> >
> > # Task: look for duplicates
> > if element in seen:
> > # it's a duplicate
> > ...
> > else:
> > # never seen before, so remember it
> > ...
> 
> Do you do this:
> 
> if file exists:
> # read the file
> else:
> # create the file

No, because I have learned that this is both unreliable and dangerous. 
But that's how I *think* about the problem, at least initially. I 
expect that if you were prototyping the code on a whiteboard, you 
probably would too.

Checking existence of an element in a set is not comparable to file 
access. File access has a lot more that can go wrong than just whether 
or not the file exists. There are permission errors and transient errors 
that bulletproof code needs to deal with. And file access is always 
concurrent, since the OS is concurrent. (Unless you have the luxury 
of working on a single user, single process OS.)

So long as you know you are dealing with hashable objects, `set.add` 
should always succeed.


> Or would you try/except around an open call? The TOCTOU issue may be
> less obvious with sets, but it's still a reasonable thing to concern
> yourself with.

Unless you are in the [Quadrant of Hell](https://imgur.com/a/jjA52vI) 
you don't even need to think about this for sets. There are no Time 
Of Check to Time Of Use issues worth thinking about in the given 
example, and frankly Chris I doubt that you surround ever set access 
with locks in single-threaded code because TOCTOU is "stronger" (more 
important?) than writing clear and simple code. And if you do, you're 
probably wondering why Python is so slow *wink*

But I'll grant you that I wasn't thinking about threaded code. The 
example given was single-threaded, but maybe a better example would 
involve threading. But that supports my point: thinking concurrently 
doesn't come naturally.

Is this check and insert version of `add` threadsafe? If not, then you 
still need manual locking and this proposal doesn't get you much; if 
anything it's an attractive nuisance because it looks atomic but isn't:

seen = {}
assert seen.add(None) is False

This could fail in threaded code if the underlying `add` method 
is not thread-safe:

- your thread calls `add`, which checks that None is not an element;
- between the check and the actual insertion, another thread adds None;
- and your thread redundently adds it a second time and returns False.

I don't know about Python sets, but most Java collections are not 
thread safe unless explicitly documented as such, so I wouldn't be the 
least bit surprised if this could happen.

So `add` would have to be atomic at the C level. If it is already 
atomic, great, that's one objection neutralised. But if not, then will 
making it atomic have a performance impact?

I'm not very interested in slowing down every single `set.add` just so I 
have the opportunity to write less understandable code when I don't need 
to. But if `set.add` is already atomic and hence thread safe, and the 
other issues with this proposal are addressed, then I don't object to 
this proposal. (I reserve the right to change my mind :-)

Just don't tell me that this:

was_already_there = seen.add(element)
if was_already_there:
...

is a more natural way of thinking about the problem of checking for 
duplicates than:

if element in seen:
...


> > This idiom works with sets, it works with lists, it works with trees, it
> > works with anything that supports membership testing and inserting into
> > a collection. It is the natural way to think about the task.
> 
> So would "ensure this is here, and let me know if you created it".

Are you proposing to add this to lists, deques, dicts, Mappings, 
Sequences and all other collections? I think you'll need a PEP :-)


> > But either way, you also have to decide whether the `add` (or the new
> > method) should *unconditionally* insert the element, or only do so if it
> > wasn't present. This makes a big difference:
> >
> > seen = {2}
> > already_there = seen.add(2.0)
> >
> >
> > At this point, is `seen` the set {2} or {2.0}? Justify why one answer is
> > the best answer.
> >
> 
> It should do the same thing that happens already: keep the existing
> one.

I actually thought that the existing behaviour was the opposite, and 
for once I failed to check it before posting. Oops!

But that's not really important. What is important is the question of 
which of these we wish to match:

# 1
if element in seen:
print("duplicate element")
seen.add(element)

# 2
if element in seen:
print("duplicate element")
else:
seen.add(element)

They have different semantics, and I can imagine cases where both would 
be useful. Although I suppos

[Python-ideas] Re: giving set.add a return value

2020-06-26 Thread Greg Ewing

On 27/06/20 4:25 pm, Steven D'Aprano wrote:

Seriously, I genuinely thought that the existing behaviour was the
opposite and that `add` unconditionally added the element. "Last seen
wins".


The fact that you haven't noticed until now suggests that
you've never written any code that depends on the difference.


If I was designing sets, that's probably how I would design it.
After all, it's called *add*, not *add if not already there*.


But equally, if it's already there, there's no need to do
anything, right?

I don't think it's possible to intuit the behaviour from
the name "add".

I also think that it doesn't matter, and if it does, then
a set probably isn't the right data structure for your
application.

--
Greg
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/WDNSZDNHGYBXFEIFXY3XYHW6CZGPIN2E/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-26 Thread Chris Angelico
On Sat, Jun 27, 2020 at 2:29 PM Steven D'Aprano  wrote:
> Seriously, I genuinely thought that the existing behaviour was the
> opposite and that `add` unconditionally added the element. "Last seen
> wins". If I was designing sets, that's probably how I would design it.
> After all, it's called *add*, not *add if not already there*. I was so
> sure that this was the current behaviour that I didn't bother to check
> it before posting, which is rare for me.
>
> So I think this counts as the principle of maximal surprise :-)

Well, that's not how Python's dicts OR sets work, so that would be
inconsistent. But feel free to do it your own way in your own
language.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LT74WCVQSXAYZEDOEIVHL73QZ4YJHLI3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-26 Thread Steven D'Aprano
On Fri, Jun 26, 2020 at 06:16:05AM -0500, Dan Sommers wrote:

> >already_there = seen.add(element)
> >if already_there:
> ># handle the duplicate case
> >
> >Who thinks like that? *wink*
> 
> Anyone who practices EAFP rather than LBYL?  Or is that why you're 
> winking?

That doesn't come naturally, and in single-threaded code it's also 
unnecessary.

By the way, since there's no try...except involved and no need to catch 
an exception, it's not EAFP.



> >But either way, you also have to decide whether the `add` (or 
> >the new 
> >method) should *unconditionally* insert the element, or only do 
> >so if it 
> >wasn't present. This makes a big difference:
> >
> >seen = {2}
> >already_there = seen.add(2.0)
> >
> >At this point, is `seen` the set {2} or {2.0}? Justify why one 
> >answer is 
> >the best answer.
> 
> The actual best answer is left as an exercise for the interested 
> reader, but whatever it is, it's justified by backwards 
> compatibility, the existing definition of "present," and the 
> principle of least surprise:

The documentation doesn't guarantee one behaviour or another:

https://docs.python.org/3/library/stdtypes.html#frozenset.add

 
>Python 3.8.3 (default, May 17 2020, 18:15:42)
>[GCC 10.1.0] on linux
>Type "help", "copyright", "credits" or "license" for more 
>information.
>>>> seen = {2}
>>>> 2.0 in seen
>True
>>>> seen.add(2.0)
>>>> seen
>{2}

Well I'm completely surprised, because I expected `add` to, you know, 
actually add the element replacing the one that was already there!

Seriously, I genuinely thought that the existing behaviour was the 
opposite and that `add` unconditionally added the element. "Last seen 
wins". If I was designing sets, that's probably how I would design it. 
After all, it's called *add*, not *add if not already there*. I was so 
sure that this was the current behaviour that I didn't bother to check 
it before posting, which is rare for me.

So I think this counts as the principle of maximal surprise :-)

Should the flag be "element was already present and nothing was added" 
or "element was not there, so something was added"?



-- 
Steven
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3Z5KH7FCAUK6WLXOBLMBSKC54BMZ2H57/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-26 Thread Chris Angelico
On Fri, Jun 26, 2020 at 7:58 PM Steven D'Aprano  wrote:
> Most importantly, it matches the way people think about the task:
>
> # Task: look for duplicates
> if element in seen:
> # it's a duplicate
> ...
> else:
> # never seen before, so remember it
> ...

Do you do this:

if file exists:
# read the file
else:
# create the file

Or would you try/except around an open call? The TOCTOU issue may be
less obvious with sets, but it's still a reasonable thing to concern
yourself with.

> This idiom works with sets, it works with lists, it works with trees, it
> works with anything that supports membership testing and inserting into
> a collection. It is the natural way to think about the task.

So would "ensure this is here, and let me know if you created it". Or
"add this, and throw an exception if it was already there", which
comes to the same thing, but uses exception handling as its mechanism.

> Now, I agree that sometimes for the sake of efficiency, we need to do
> things in a weird way. But membership testing in sets is cheap, it's
> not a linear search of a huge list. So the efficiency argument here is
> weak.

TOCTOU is stronger.

> But either way, you also have to decide whether the `add` (or the new
> method) should *unconditionally* insert the element, or only do so if it
> wasn't present. This makes a big difference:
>
> seen = {2}
> already_there = seen.add(2.0)
>
>
> At this point, is `seen` the set {2} or {2.0}? Justify why one answer is
> the best answer.
>

It should do the same thing that happens already: keep the existing
one. I don't see why this is even in question. Are you searching that
hard for objections?

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/G2RRY7BJLFROU7DMTOWL44X455254RBC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-26 Thread Dan Sommers



On Friday, June 26, 2020, at 04:54 -0500, Steven D'Aprano wrote:


On Thu, Jun 25, 2020 at 05:27:16PM +0300, Ben Avrahami wrote:

Hey all,
Often I've found this kind of code:

seen = set()
for i in iterable:
  if i in seen:
...  # do something in case of duplicates
  else:
seen.add(i)
... # do something in case of first visit

This kind of code appears whenever one needs to check for 
duplicates in
case of a user-submitted iterable, or when we loop over a 
recursive
iteration that may involve cycles (graph search or the 
like). This code
could be improved if one could ensure an item is in the set, 
and get

whether it was there before in one operation.



Whereas this:

# Check for duplicates.
add element to the collection
was it already there before you just added it?

is a weird way to think about the problem.

already_there = seen.add(element)
if already_there:
# handle the duplicate case

Who thinks like that? *wink*


Anyone who practices EAFP rather than LBYL?  Or is that why you're 
winking?


But either way, you also have to decide whether the `add` (or 
the new 
method) should *unconditionally* insert the element, or only do 
so if it 
wasn't present. This makes a big difference:


seen = {2}
already_there = seen.add(2.0)

At this point, is `seen` the set {2} or {2.0}? Justify why one 
answer is 
the best answer.


The actual best answer is left as an exercise for the interested 
reader, but whatever it is, it's justified by backwards 
compatibility, the existing definition of "present," and the 
principle of least surprise:


   Python 3.8.3 (default, May 17 2020, 18:15:42)
   [GCC 10.1.0] on linux
   Type "help", "copyright", "credits" or "license" for more 
   information.

   >>> seen = {2}
   >>> 2.0 in seen
   True
   >>> seen.add(2.0)
   >>> seen
   {2}

Unless the new method is called set.add_with_different_semantics. 
;-)

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/X7MDBK5BGA4G3W4H7UNGNGJSSWI3ZT7G/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-26 Thread Steven D'Aprano
On Thu, Jun 25, 2020 at 05:27:16PM +0300, Ben Avrahami wrote:
> Hey all,
> Often I've found this kind of code:
> 
> seen = set()
> for i in iterable:
>   if i in seen:
> ...  # do something in case of duplicates
>   else:
> seen.add(i)
> ... # do something in case of first visit
> 
> This kind of code appears whenever one needs to check for duplicates in
> case of a user-submitted iterable, or when we loop over a recursive
> iteration that may involve cycles (graph search or the like). This code
> could be improved if one could ensure an item is in the set, and get
> whether it was there before in one operation.

I disagree that it would be improved. The existing idiom is simple and 
flexible and perfectly understandable even if you don't known the gory 
details about how sets work.

Most importantly, it matches the way people think about the task:

# Task: look for duplicates
if element in seen:
# it's a duplicate
...
else:
# never seen before, so remember it
...

This idiom works with sets, it works with lists, it works with trees, it 
works with anything that supports membership testing and inserting into 
a collection. It is the natural way to think about the task.

Whereas this:

# Check for duplicates.
add element to the collection
was it already there before you just added it?

is a weird way to think about the problem.

already_there = seen.add(element)
if already_there:
# handle the duplicate case

Who thinks like that? *wink*

Now, I agree that sometimes for the sake of efficiency, we need to do 
things in a weird way. But membership testing in sets is cheap, it's 
not a linear search of a huge list. So the efficiency argument here is 
weak.

If you can demonstrate a non-contrived case where the efficiency 
argument was strong, then I might be persuaded that a new method (not 
`add`) could be justified.

But either way, you also have to decide whether the `add` (or the new 
method) should *unconditionally* insert the element, or only do so if it 
wasn't present. This makes a big difference:

seen = {2}
already_there = seen.add(2.0)


At this point, is `seen` the set {2} or {2.0}? Justify why one answer is 
the best answer.


-- 
Steven
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/M4BDA4AR555XY666ZNU5A5O4TNYDZLGR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Bar Harel
I believe I'm -0 for modifying the existing add() and +0 for a new function.

By the way, a good phrasing of the general mutating/non-mutating convention
opens the built-in types section of the docs:

"Some collection classes are mutable. The methods that add, subtract, or
rearrange their members in place, and don’t return a specific item, never
return the collection instance itself but None."

Note the use of the word item, rather than any return value.

Out of curiosity though, were we to change set.add(), how could we keep
backwards compatibility in the c-api?

On the Python level, we might find people do "val = set.add(i) or i", or
even "any(set.add(i) for i in range (10))" which would break. I myself
wrote the former quite often with "list.append" etc.

On the c-api, the problem might be worse -
 I wouldn't be surprised if people use PyCheckExact or pointer comparison
for the results of both PySet_Add and set_add of the set object.

Is there a way to maintain compatibility on the c-api that I'm not aware
of?

On Fri, Jun 26, 2020, 3:41 AM Greg Ewing 
wrote:

> On 26/06/20 4:01 am, Steele Farnsworth wrote:
> > My point was only that, as far as I know, all the methods for built in
> > container types that serve only to change what is contained return None,
> > and that this was an intentional design choice, so changing it in one
> > case would have to evoke a larger discussion about what those sorts of
> > methods should return.
>
> The reason for that convention is so that there is no confusion
> about which methods return new objects and which modify the object
> in place.
>
> However, achieving that goal only requires that mutating methods
> don't return 'self'. It doesn't mean they can't return some other
> value that might be useful.
>
> A wider argument against that might be that methods should be
> classifiable into "procedures" (that have side effects but not
> return values) and "functions" (that return values but don't
> have side effects). I'm not sure whether that's considered part
> of the Python design philosophy -- I don't remember seeing much
> discussion about it.
>
> In this particular case, it might be better to add a new method
> rather than redefining the existing 'add' method, because if code
> assuming the new behaviour were run on an older version of Python,
> it would fail in an obscure way.
>
> --
> Greg
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/ZYYUMHMD2SCNOETAPQQHDCET7JUQ5ZNZ/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SAZ5GTS6THVYUYMKU2VNE5RI3IJI3YCC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Greg Ewing

On 26/06/20 4:01 am, Steele Farnsworth wrote:
My point was only that, as far as I know, all the methods for built in 
container types that serve only to change what is contained return None, 
and that this was an intentional design choice, so changing it in one 
case would have to evoke a larger discussion about what those sorts of 
methods should return.


The reason for that convention is so that there is no confusion
about which methods return new objects and which modify the object
in place.

However, achieving that goal only requires that mutating methods
don't return 'self'. It doesn't mean they can't return some other
value that might be useful.

A wider argument against that might be that methods should be
classifiable into "procedures" (that have side effects but not
return values) and "functions" (that return values but don't
have side effects). I'm not sure whether that's considered part
of the Python design philosophy -- I don't remember seeing much
discussion about it.

In this particular case, it might be better to add a new method
rather than redefining the existing 'add' method, because if code
assuming the new behaviour were run on an older version of Python,
it would fail in an obscure way.

--
Greg
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ZYYUMHMD2SCNOETAPQQHDCET7JUQ5ZNZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Greg Ewing

On 26/06/20 2:45 am, Steele Farnsworth wrote:
Personally, I'd want to see mutator methods return `self` so you can do 
more than one mutation in a statement, but the convention is that all 
the mutator methods return `None`.


I would say the convention is that mutator methods don't return 'self'.
That doesn't preclude them from returning some other useful value.

--
Greg
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HF5GGQ6WZD3RQ2J2H7AHZQPM24RH2YFT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Jonathan Fine
Hi Ben

You've definitely got a point. When you do
seen.add(key)
it's sometimes important to know if this changes 'seen'.

Here's a one-liner that does what you want:
key in seen or seen.add(key)

It's a bit obscure, and perhaps a bit too much
https://en.wikipedia.org/wiki/Code_golf. So here's a worked example.

>>> seen = set()
>>> 0 in seen or seen.add(0)
>>> 0 in seen or seen.add(0)
True

The first time '0 in seen' is False, so the OR part 'seen.add(0)' is
executed. As you and others have said, seen.add(key) always return None.
(The REPL does not echo the expression value if it is None.)

The second time '0 in seen' is True, so the OR part is not done, and the
value of the expression is True.

If you want, in your own code you can subclass set to provide the behaviour
you think is missing. Or just write a utility function. I think that would
in practice be better.

>>> def in_or_add(s, key):
...return key in s or s.add(key) or False

>>> seen = set()
>>> in_or_add(seen, 0)
False
>>> in_or_add(seen, 0)
True

An aside. Your proposal is:

> value. set.add would return True if the item was already in the set prior
> to insertion, and False otherwise.


This allows you to simplify your code example. But I find the semantics a
bit odd. If set.add(key) were to return a boolean, I'd expect it to be True
if the item genuinely was added. But your semantics are the other way
around.

My proposal of in_or_add and your revision of set.add both have the same
semantics. I think that is clear. But they have different names. If enough
people use a utility function such as in_or_add(s, key), then that makes a
good case for adding it as a set method.

Two final points. While processing the 'not key in seen' branch, an
exception might occur. Sometimes it's better to add the key to 'seen' once
all the associated processing is complete. (An exception that leaves the
system in an inconsistent state can cause problems later.) The second point
is that I'm not confident about 'in_or_add' as a name for the method.

Once again, thank you for your interesting idea.
-- 
Jonathan
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/AIZUO34NMWKH5GRZSVQ4BNF66OILU5VI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Brett Cannon
On Thu, Jun 25, 2020 at 10:10 AM Ben Avrahami 
wrote:

> On Thu, Jun 25, 2020 at 7:55 PM Christopher Barker 
> wrote:
>
>> On Thu, Jun 25, 2020 at 9:02 AM Steele Farnsworth 
>> wrote:
>> indeed -- and that is pretty darn baked in to Python, so I don't think
>> it's going to change.
>>
>
30 years of history beats out 8 years since the last discussion of this. 😉


>
>>
> Except this convention doesn't hold for dict.setvalue (which I did
> misspell, sorry), or for dict.pop.
>

I don't know what you mean by setvalue(), but I'm assuming you meant
setdefault(). If that's true, then the guideline isn't "all mutating
methods must return None", it's "*unless* the method is specifically
designed to have a return value based on the purpose of the method, the
method should return None".


> Both these methods fundamentally mutate the collection, but they also
> return a value (which could be retrieved by a non-mutating method), that
> somewhat pertains to the operation performed.
>

Both pop() and setdefault() are meant to return something, they just also
happen to mutate things. You're wanting the reverse and that isn't how
Python has chosen to go.

Same with people who bring up wanting a fluent API where 'self' is always
returned. It's a very specific decision that for methods that do nothing
but mutation they return None to signify that you're not getting back a new
object, but something that mutated in-place. This is why list.sort(), for
instance, returns None; you are not getting a new copy of the list sorted,
you actually changed the original object which you should have named with a
variable. That way you have to access the object by the name you already
gave it as a reminder its the same object.

Plus Python isn't big on the crazy one-lineers where you might want to do
an inline call of add() to simply save yourself one line of code. Explicit
is better than implicit, and that can mean needing to simply do something
in two lines instead of one.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XBUPWZ3ZPEQZ233TF6CRYPMGXQ7XSJ7Y/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Ben Avrahami
On Thu, Jun 25, 2020 at 7:55 PM Christopher Barker 
wrote:

> On Thu, Jun 25, 2020 at 9:02 AM Steele Farnsworth 
> wrote:
> indeed -- and that is pretty darn baked in to Python, so I don't think
> it's going to change.
>
>
Except this convention doesn't hold for dict.setvalue (which I did
misspell, sorry), or for dict.pop. Both these methods fundamentally mutate
the collection, but they also return a value (which could be retrieved by a
non-mutating method), that somewhat pertains to the operation performed.


> Note: I"n not sure your example with setdefault is correct:
>

Fair enough, I misspelled setdefault and messed up the example, what I
meant was:
seen = {}
for k, v in iterable_of_tuples:
  if seen.setdefault(k,v) is not v:
... # duplicate key
  else:
... # new key
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PSQRU7QZHFFTFA45DQJKQF4RLFIHSGMF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Christopher Barker
On Thu, Jun 25, 2020 at 9:02 AM Steele Farnsworth 
wrote:

> My point was only that, as far as I know, all the methods for built in
> container types that serve only to change what is contained return None,
> and that this was an intentional design choice, so changing it in one case
> would have to evoke a larger discussion about what those sorts of methods
> should return.
>

indeed -- and that is pretty darn baked in to Python, so I don't think it's
going to change.

Note: I"n not sure your example with setdefault is correct:

seen = {}
for i in iterable:
  if seen.set_default(i, some_value) is not None:
...  # do something in case of duplicates
  else:
... # do something in case of first visit

A) you spelled setdefault() wrong -- no underscore

B) seen.setdefault(i, some_value) will return some_value if it's not there,
and whatever the value it is it is (in this case, starting with an empty
dict, it will be some_value always.

Running this code:

some_value = "sentinel"

iterable = [3, 2, 4, 2, 3]

seen = {}
for i in iterable:
if seen.setdefault(i, some_value) is not None:
# do something in case of duplicates
print(f'{i} was already in there')
else:
# do something in case of first visit
print(f'{i} was not already there')

results in:

In [9]: run in_set.py

3 was already in there
2 was already in there
4 was already in there
2 was already in there
3 was already in there

so not working.

But you can make it work if you reset the value:

some_value = "sentinel"

iterable = [3, 2, 4, 2, 3]

seen = {}
for i in iterable:
if seen.setdefault(i, None) is not None:
# do something in case of duplicates
print(f'{i} was already in there')
else:
# do something in case of first visit
print(f'{i} was not already there')
seen[i] = some_value

In [11]: run in_set.py

3 was not already there
2 was not already there
4 was not already there
2 was already in there
3 was already in there

But this is a bit klunky as well, not really any better than the set
version.




However, for the case at hand, adding a method similar to the
dict.setdefault() would be a reasonable thing to do. I'm not sure what to
call it, or what the API should be, but maybe:

class my_set(set):

def add_if_not_there(self, item):
if item in self:
return True
else:
self.add(item)
return False

seen = my_set()

for i in iterable:
if seen.add_if_not_there(i):
print(f'{i} was already in there')
else:
print(f'{i} was not already there')

However, while dict.setdefault does clean up and clarify otherwise somewhat
ugly code, I'm not sure this is that much better than:

for i in iterable:
if i in seen:
print(f'{i} was already in there')
else:
seen.add(i)
print(f'{i} was not already there')

But feel free to make the case :-)

Note that setdefault is in the MutableMapping ABC, so there could be some
debate about whether to add this new method to the MutableSet ABC.

-CHB




>
> I wouldn't be opposed to that discussion happening and for any changes
> that are made to happen within 3.x because I doubt that very much code that
> currently exists depends on these methods returning None or even use what
> they return at all.
>
> On Thu, Jun 25, 2020, 10:28 AM Ben Avrahami 
> wrote:
>
>> Hey all,
>> Often I've found this kind of code:
>>
>> seen = set()
>> for i in iterable:
>>   if i in seen:
>> ...  # do something in case of duplicates
>>   else:
>> seen.add(i)
>> ... # do something in case of first visit
>>
>> This kind of code appears whenever one needs to check for duplicates in
>> case of a user-submitted iterable, or when we loop over a recursive
>> iteration that may involve cycles (graph search or the like). This code
>> could be improved if one could ensure an item is in the set, and get
>> whether it was there before in one operation. This may seem overly
>> specific, but dicts do do this:
>>
>> seen = {}
>> for i in iterable:
>>   if seen.set_default(i, some_value) is not None:
>> ...  # do something in case of duplicates
>>   else:
>> ... # do something in case of first visit
>>
>> I think the set type would benefit greatly from its add method having a
>> return value. set.add would return True if the item was already in the set
>> prior to insertion, and False otherwise.
>>
>> Looking at the Cpython code, the set_add_entry already detects existing
>> entries, adding a return value would require no additional complexity.
>>
>> Any thoughts?
>> ___
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/6WYNYNG5J5HBD3PA7PW75RP4PMLOMH4C/
>> Code of Conduct: http://p

[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Ben Avrahami
On Thu, Jun 25, 2020 at 6:44 PM Alex Hall  wrote:

> Previous discussions on this:
>
>
> https://mail.python.org/archives/list/python-ideas@python.org/thread/ASHOHN32BQPBVPIGBZQRS24XHXFMB6XZ/
>
> https://mail.python.org/archives/list/python-ideas@python.org/thread/K5SS62AB5DFFZIJ7ASKPLB2P3XGSYFPC/
>  (seems
> like part of the above discussion that got separated)
>
> https://mail.python.org/archives/list/python-ideas@python.org/thread/CKF2GFI3HKQAAYRCMMRPTFMONG3UGO4T/#CKF2GFI3HKQAAYRCMMRPTFMONG3UGO4T
>

The latest of these discussions is over 8 years old, and I think it
deserves bringing up again.

The convention that "methods that modify self return None" is useful, but I
would say that dict's equivalent method: dict.setdefault is a modifying
method that also returns a pertinent (and often useful value), even though
it could too be split into two methods (__getitem__ and __setitem__ if the
key is missing), as does dict.pop (BTW, I think that set.discard should
also return a boolean, but that discussion can wait).

Indeed, adding a return value to set.add would not improve runtime
performance, but I believe it would both improve readability and make code
more fluent to write.

As for teaching, I don't think this additional bit of complexity would be
too difficult to learn. "set.add ensures that an item is in the set and
returns whether it was already included" seems pretty straightforward to
me. Not to mention that anyone coming from C#, Java, C++, and many other
languages would already be familiar with this concept.

Overall, I think my gripe is that, currently, set() does not have the
capabilities of a dict that maps to NoneType, that the simplest
implementation of set in python is more powerful than the standard
library's version.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UHCVO5PAS7DJXSGLMYEW72BRCE3RZUYT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Steele Farnsworth
My point was only that, as far as I know, all the methods for built in
container types that serve only to change what is contained return None,
and that this was an intentional design choice, so changing it in one case
would have to evoke a larger discussion about what those sorts of methods
should return.

I wouldn't be opposed to that discussion happening and for any changes that
are made to happen within 3.x because I doubt that very much code that
currently exists depends on these methods returning None or even use what
they return at all.

On Thu, Jun 25, 2020, 10:28 AM Ben Avrahami  wrote:

> Hey all,
> Often I've found this kind of code:
>
> seen = set()
> for i in iterable:
>   if i in seen:
> ...  # do something in case of duplicates
>   else:
> seen.add(i)
> ... # do something in case of first visit
>
> This kind of code appears whenever one needs to check for duplicates in
> case of a user-submitted iterable, or when we loop over a recursive
> iteration that may involve cycles (graph search or the like). This code
> could be improved if one could ensure an item is in the set, and get
> whether it was there before in one operation. This may seem overly
> specific, but dicts do do this:
>
> seen = {}
> for i in iterable:
>   if seen.set_default(i, some_value) is not None:
> ...  # do something in case of duplicates
>   else:
> ... # do something in case of first visit
>
> I think the set type would benefit greatly from its add method having a
> return value. set.add would return True if the item was already in the set
> prior to insertion, and False otherwise.
>
> Looking at the Cpython code, the set_add_entry already detects existing
> entries, adding a return value would require no additional complexity.
>
> Any thoughts?
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/6WYNYNG5J5HBD3PA7PW75RP4PMLOMH4C/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ILNANLAGZR3S6VBMK7FJXUZZUMKGKJOV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Alex Hall
Previous discussions on this:

https://mail.python.org/archives/list/python-ideas@python.org/thread/ASHOHN32BQPBVPIGBZQRS24XHXFMB6XZ/
https://mail.python.org/archives/list/python-ideas@python.org/thread/K5SS62AB5DFFZIJ7ASKPLB2P3XGSYFPC/
(seems
like part of the above discussion that got separated)
https://mail.python.org/archives/list/python-ideas@python.org/thread/CKF2GFI3HKQAAYRCMMRPTFMONG3UGO4T/#CKF2GFI3HKQAAYRCMMRPTFMONG3UGO4T

On Thu, Jun 25, 2020 at 4:30 PM Ben Avrahami  wrote:

> Hey all,
> Often I've found this kind of code:
>
> seen = set()
> for i in iterable:
>   if i in seen:
> ...  # do something in case of duplicates
>   else:
> seen.add(i)
> ... # do something in case of first visit
>
> This kind of code appears whenever one needs to check for duplicates in
> case of a user-submitted iterable, or when we loop over a recursive
> iteration that may involve cycles (graph search or the like). This code
> could be improved if one could ensure an item is in the set, and get
> whether it was there before in one operation. This may seem overly
> specific, but dicts do do this:
>
> seen = {}
> for i in iterable:
>   if seen.set_default(i, some_value) is not None:
> ...  # do something in case of duplicates
>   else:
> ... # do something in case of first visit
>
> I think the set type would benefit greatly from its add method having a
> return value. set.add would return True if the item was already in the set
> prior to insertion, and False otherwise.
>
> Looking at the Cpython code, the set_add_entry already detects existing
> entries, adding a return value would require no additional complexity.
>
> Any thoughts?
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/6WYNYNG5J5HBD3PA7PW75RP4PMLOMH4C/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/Y7TV4ZQOXE3BEIEDU7STKST4EL5GXUQW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: giving set.add a return value

2020-06-25 Thread Steele Farnsworth
Personally, I'd want to see mutator methods return `self` so you can do
more than one mutation in a statement, but the convention is that all the
mutator methods return `None`.

On Thu, Jun 25, 2020, 10:28 AM Ben Avrahami  wrote:

> Hey all,
> Often I've found this kind of code:
>
> seen = set()
> for i in iterable:
>   if i in seen:
> ...  # do something in case of duplicates
>   else:
> seen.add(i)
> ... # do something in case of first visit
>
> This kind of code appears whenever one needs to check for duplicates in
> case of a user-submitted iterable, or when we loop over a recursive
> iteration that may involve cycles (graph search or the like). This code
> could be improved if one could ensure an item is in the set, and get
> whether it was there before in one operation. This may seem overly
> specific, but dicts do do this:
>
> seen = {}
> for i in iterable:
>   if seen.set_default(i, some_value) is not None:
> ...  # do something in case of duplicates
>   else:
> ... # do something in case of first visit
>
> I think the set type would benefit greatly from its add method having a
> return value. set.add would return True if the item was already in the set
> prior to insertion, and False otherwise.
>
> Looking at the Cpython code, the set_add_entry already detects existing
> entries, adding a return value would require no additional complexity.
>
> Any thoughts?
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/6WYNYNG5J5HBD3PA7PW75RP4PMLOMH4C/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/GQ3GYFCYMQSXRM2OPJ6ZRAKS4IYNETMG/
Code of Conduct: http://python.org/psf/codeofconduct/