[Python-ideas] Re: Argumenting in favor of first()

Andrew Barnert via Python-ideas Sat, 07 Dec 2019 20:31:38 -0800

On Dec 7, 2019, at 18:09, Wes Turner <[email protected]> wrote:
> 
> 
>> On Sat, Dec 7, 2019, 8:20 PM Andrew Barnert <[email protected]> wrote:
>> On Dec 7, 2019, at 07:33, Wes Turner <[email protected]> wrote:
>> > 
>> > 
>> > +1 for itertools.first(seq, default=Exception) *and* itertools.one(seq, 
>> > default=Exception)
>> 
>> What does default=Exception mean? What happens if you pass a different 
>> value? Does it do one thing if the argument is a type that’s a subclass of 
>> Exception (or of BaseException?) and a different thing if it’s any other 
>> value?
> 
> 
> That's a good point: Exception is a bad sentinel value. Is None a good 
> default value? What if the genexpr'd iterable is [None, 2, 3]


That’s a common issue in Python. When you can’t use None as a sentinel because 
it could be a valid user input or return value, you just create a private 
module or class attribute that can’t equal anything the user could pass in, 
like this:

    _sentinel = object()

And then:

    def spam(stuff, default=_sentinel):
        if default is _sentinel:
            # No default was passed: do the single-argument behavior here.
            ...
        else:
            # A default was passed (even None): do the default-value behavior.
            ...
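Applied to the function under discussion, the pattern might look like the minimal sketch below. The name first and the error message are illustrative assumptions; this is not the more-itertools source, just the same sentinel idea filled in so that None stays a legal item and a legal default.

```python
_sentinel = object()  # private marker: no caller can pass this by accident


def first(iterable, default=_sentinel):
    """Return the first item of iterable, or default if it is empty.

    Raises ValueError if iterable is empty and no default was passed.
    """
    for item in iterable:
        return item  # stop after the very first item
    if default is _sentinel:
        raise ValueError("first() was called on an empty iterable with no default")
    return default


print(first([None, 2, 3]))         # None: a real item, not "missing"
print(first([], default="empty"))  # empty
```

Because the check is `default is _sentinel` rather than `default is None`, first([], default=None) returns None instead of raising.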

This seems like the kind of thing that should be explained somewhere in every 
tutorial (including the official one), but most people end up finding it only 
by accident, reading some code that uses it and trying to figure out what it 
does and why. That’s also how people figure out how useful two-argument iter is, 
along with a couple of other things.
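For anyone who hasn’t run into it: iter(callable, sentinel) keeps calling callable until it returns sentinel. A small sketch, with an in-memory io.BytesIO standing in for a real binary file:

```python
import io
from functools import partial

# An in-memory stream stands in for a real binary file here.
stream = io.BytesIO(b"abcdefgh")

# iter(callable, sentinel) calls stream.read(3) repeatedly and stops as soon
# as a call returns the sentinel b"" (end of stream).
blocks = list(iter(partial(stream.read, 3), b""))
print(blocks)  # [b'abc', b'def', b'gh']
```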

>> Also, “seq” implies that you’re expecting these to be used on sequences, not 
>> general iterables. In that case, why not just use [0]?
> 
> 
> I chose `seq` as the argument because I was looking at 
> toolz.itertoolz.first(), which has no default= argument.

Ah. I’m not sure why toolz uses seq for arguments that are usually iterators, 
but I guess that’s not horrible or anything. In itertools and more-itertools, 
the argument is usually called iterable, and seq is reserved specifically for 
the ones that should be sequences, like chunked(iterable, n) vs. sliced(seq, 
n). But as useful as that convention is, I suppose it’s not a universal thing 
that everyone knows and follows; it’s not documented or explained anywhere, you 
just kind of have to guess the distinction from the names.
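To make the naming distinction concrete, here is a rough sketch of why one function can take any iterable while the other needs a sequence. chunked_sketch and sliced_sketch are hypothetical names, not the more-itertools implementations; they only mirror the behavior the names suggest.

```python
from itertools import islice


def chunked_sketch(iterable, n):
    """Yield lists of up to n items; works on any iterable, even an iterator."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))  # pull up to n items off the iterator
        if not chunk:
            return
        yield chunk


def sliced_sketch(seq, n):
    """Yield slices of up to n items; needs len() and slicing, i.e. a sequence."""
    for start in range(0, len(seq), n):
        yield seq[start:start + n]


print(list(chunked_sketch(iter(range(7)), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
print(list(sliced_sketch("abcdefg", 3)))        # ['abc', 'def', 'g']
```

Passing a generator to sliced_sketch fails with TypeError, since generators have no len(); that is exactly the requirement the seq name is meant to signal.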

> Though, .first() (or .one()) on an unordered iterable is effectively 
> first(shuffle(iterable)), which *could* raise an annotation exception at 
> compile time. 

I’m not sure what you mean by an “annotation exception”. You mean an error from 
static type checking in mypy or something? I’m not sure why it would be an 
error, unless you got the annotation wrong. It should be Iterable, and that 
will work for iterators and sequences and sets and so on just fine.
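A sketch of what that annotation could look like, using a hypothetical first_annotated so it doesn’t collide with any real API. Lists, generators and sets all satisfy Iterable[int], so a checker such as mypy should accept each call below.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def first_annotated(iterable: Iterable[T]) -> T:
    """Return the first item of any iterable (hypothetical helper).

    Raises ValueError if the iterable is empty.
    """
    for item in iterable:
        return item
    raise ValueError("first_annotated() called on an empty iterable")


print(first_annotated([3, 1, 2]))               # 3
print(first_annotated(x * 10 for x in (4, 5)))  # 40
print(first_annotated({7}))                     # 7
```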

Also, it’s not really like shuffle, because even most “unordered iterables” in 
Python, like sets, actually have an order. It’s not guaranteed to be a 
meaningful one, but it’s not guaranteed to be meaningless either. If you need 
a genuinely unpredictable order (e.g., you’re creating a guessing game where 
you don’t want the answer to be the same every time anyone runs the game, or 
for security reasons), you really do need to explicitly randomize. For example, 
if s = set(range(9, -1, -1)), it’s not guaranteed anywhere that next(iter(s)) 
will be 0, but it is still always 0 in any version of CPython. Worse, whatever 
next(iter(s)) is, if you call next(iter(s)) again (without mutating s in 
between), you’ll get the same value from the new iterator in any version of any 
Python implementation.
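A short sketch of the difference between arbitrary and random: iterating the same unmutated set twice starts from the same item, so an unpredictable pick has to be made explicitly. random.choice needs a sequence, hence the tuple() conversion; secrets.choice is the usual tool when the pick is security-sensitive.

```python
import random
import secrets

s = {3, 1, 4, 5, 9, 2, 6}

# Same unmutated set, two fresh iterators: the same starting item.
# The order is arbitrary, but it is not random.
assert next(iter(s)) == next(iter(s))

# An unpredictable pick has to be asked for explicitly.
pick = random.choice(tuple(s))

# For security-sensitive choices, prefer the secrets module.
secure_pick = secrets.choice(tuple(s))

print(pick in s, secure_pick in s)  # True True
```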

But if you don’t care whether it’s meaningful or meaningless, first, one, etc. 
on a set are fine.
    
> Sets are unordered iterables and so aren't sequences; aren't OrderedIterables.

Right, but a sequence isn’t just an ordered iterable; it’s also random-access 
indexable (plus a few other things). An itertools.count(), a typical sorteddict 
type, a typical linked list, etc. are all ordered but not sequences. The 
more-itertools functions that require sequences (and name them seq) usually 
require indexing or slicing.
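The distinction can be checked directly with collections.abc.Sequence. In the sketch below, nth_item is a hypothetical helper showing that an ordered non-sequence like itertools.count() can only be walked, not indexed.

```python
from collections.abc import Sequence
from itertools import count, islice


def nth_item(iterable, n):
    """Return item n of any iterable by consuming the first n items.

    Hypothetical helper; raises StopIteration if the iterable is too short.
    """
    return next(islice(iterable, n, None))


print(isinstance([10, 20], Sequence))  # True: lists are indexable
print(isinstance(range(5), Sequence))  # True: so are ranges
print(isinstance(count(), Sequence))   # False: ordered, but not indexable
print(isinstance({1, 2}, Sequence))    # False: sets are neither

try:
    count()[5]
except TypeError:
    print("itertools.count() does not support indexing")

print(nth_item(count(), 5))  # 5, reached by consuming items 0 through 4
```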

>> Arguably, first, and maybe some of its cousins, should go into the recipes. 
>> And I don’t see any reason they shouldn’t be identical to the versions in 
>> more-itertools, but if there is one, it should be coordinated with Erik Rose 
>> in some way so they stay in sync.
> 
> 
> Oh hey, "more-itertools". I should've found that link in the cpython docs.

Well, it was only added to the docs in, I believe, 3.8, so a lot of people 
probably haven’t seen the link yet. (That’s always a problem for a widely-used 
decades-old language that evolves over 18-month cycles and carefully preserves 
backward compatibility; you can’t expect everyone to always know the latest of 
anything the way you can with something like Swift. But if we’re talking about 
further changes beyond what’s in 3.8, I think we have to assume that the docs 
change will take effect before anything new we propose would.)

> https://more-itertools.readthedocs.io/en/stable/_modules/more_itertools/more.html#first
>  :
>  
>     def first(iterable, default=_marker)
> 
> That makes more sense than default=Exception. 
> 
> FWIW, more-itertools .one() raises ValueError (or whatever's passed as 
> too_short= or too_long= kwargs). Default subclasses of ValueError may not be 
> justified?

I don’t _think_ they are. It’s probably pretty rare that you want to switch on 
the type programmatically (e.g., use different handlers for too short and too 
long), and it’s pretty trivial to add your own subclasses. You do often want to 
be able to distinguish them as a human when debugging your code, but that’s 
already taken care of by the exception message text. (That’s only documented in 
the examples, but is that a problem?)
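To illustrate how little it takes to add your own subclasses, here is a sketch of a one()-style function that accepts too_short= and too_long= exceptions, as described in the quote above. one_sketch, the error classes and the message text are assumptions for illustration, not copied from more-itertools.

```python
_sentinel = object()


class TooShortError(ValueError):
    """Hypothetical caller-defined subclass for 'no items'."""


class TooLongError(ValueError):
    """Hypothetical caller-defined subclass for 'more than one item'."""


def one_sketch(iterable, too_short=None, too_long=None):
    """Return the only item of iterable.

    Raises ValueError, or the given too_short/too_long exception,
    when there are zero items or more than one item.
    """
    it = iter(iterable)
    only = next(it, _sentinel)
    if only is _sentinel:
        raise too_short or ValueError("too few items in iterable (expected 1)")
    if next(it, _sentinel) is not _sentinel:
        raise too_long or ValueError("too many items in iterable (expected 1)")
    return only


print(one_sketch(["x"]))  # x

try:
    one_sketch([1, 2], too_long=TooLongError("got more than one match"))
except TooLongError as exc:
    print("handled separately:", exc)
```

Since both subclasses derive from ValueError, callers who don’t care about the distinction can keep catching ValueError.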

>> Maybe first is so useful, so much more so than all of the other very useful 
>> recipes, including things like consume, flatten, and unique (which IIRC were 
>> the ones that convinced everyone it’s time to add a more-itertools link to 
>> the docs), that it needs to be slightly more discoverable—e.g., by 
>> itertools.<TAB> completion? But that seems unlikely given that they’ve been 
>> recipes for decades and first wasn’t. 
> 
> 
> def itertools._check_more_itertools():
>    """ https://more-itertools.readthedocs.io/en/stable/api.html """

I’m not sure what this is intended to mean. Are you suggesting we could add 
this as an empty function just so that tab completion, dir, help, IDE 
mechanisms, etc. could make it more discoverable? 

If so, you’d want to give it a non-private name (most of those things will 
ignore a name starting with an underscore; some of them will use __all__ to 
override that, but others won’t). Other than that, it might not be a bad idea. A 
lot of people do explore by IDE completion, apparently.
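Star imports are one concrete mechanism that behaves exactly this way: underscore names are skipped unless __all__ lists them. The sketch below builds a throwaway module; discoverability_demo and check_more_itertools are hypothetical names, not proposed API.

```python
import sys
import types


def _check_more_itertools():
    """See https://more-itertools.readthedocs.io/en/stable/api.html"""


def check_more_itertools():
    """See https://more-itertools.readthedocs.io/en/stable/api.html"""


# A throwaway module standing in for itertools (hypothetical name).
demo = types.ModuleType("discoverability_demo")
demo._check_more_itertools = _check_more_itertools
demo.check_more_itertools = check_more_itertools
sys.modules[demo.__name__] = demo

# Without __all__, a star import skips the underscore-prefixed name.
plain_ns = {}
exec("from discoverability_demo import *", plain_ns)
print("_check_more_itertools" in plain_ns)  # False
print("check_more_itertools" in plain_ns)   # True

# With __all__, exactly the listed names are exported, underscore or not.
demo.__all__ = ["_check_more_itertools", "check_more_itertools"]
all_ns = {}
exec("from discoverability_demo import *", all_ns)
print("_check_more_itertools" in all_ns)    # True
```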

>> And it seems even less likely for one, which nobody has mentioned in this 
>> thread yet.
>> 
>> If there’s a general argument that linking to more-itertools hasn’t helped 
>> anything, or that the recipes are still useless until someone makes the 
>> often-proposed/never-followed-through change of finding a way to make the 
>> recipes individually searchable and linkable, or whatever, that’s fine, but 
>> it’s not really an argument against making a special case for one that isn’t 
>> made for unique or consume.
> 
> 
> Is programming by Exception faster or preferable to a sys.version_info 
> conditional?

I don’t know if it’s faster (and I doubt that matters), and I’m sure you could 
argue other pros and cons each way, but it is a long-standing common idiom (at 
least back to the 2.5 days, when half the web services on the internet probably 
started by importing json with a fallback to simplejson) to do it your first 
way:

> try:
>    from itertools import one, first
> except ImportError:
>    from more_itertools.more import one, first
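Because itertools.first and itertools.one don’t exist in any release, the sketch below uses math.prod (added in Python 3.8) as a real stand-in to put the two idioms side by side. The fallback body is an illustrative assumption.

```python
import sys

# Feature detection (EAFP): try the import and fall back if it fails.
try:
    from math import prod  # added in Python 3.8
except ImportError:
    import operator
    from functools import reduce

    def prod(iterable, start=1):
        """Fallback product for Pythons without math.prod."""
        return reduce(operator.mul, iterable, start)

# Version check (LBYL): the same decision, keyed on sys.version_info.
if sys.version_info >= (3, 8):
    from math import prod as prod_by_version
else:
    prod_by_version = prod  # reuse the fallback defined above

print(prod([2, 3, 4]))             # 24
print(prod_by_version([2, 3, 4]))  # 24
```

One practical argument for the try/except form is that it tests for the feature itself, so it doesn’t need to know which version, or which implementation, introduced it.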

But why does one() need to be added to itertools in the first place? Is it 
really that much more commonly needed than flatten, consume, etc., or so much 
harder to write yourself (maybe not inherently, but because its target audience 
is more novice-y), or what?

You need some argument for that, strong enough to overcome the status quo, the 
cost of expanding itertools (which makes it harder to find the stuff that really 
does need to be there), and the fact that you’d either have to implement it in C 
or convert itertools into a Python-and-C module to do it, etc. Otherwise, either 
just adding it to the recipes or doing nothing at all seems like the right choice.
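For scale, here are the consume and flatten recipes mentioned above. They closely follow the versions in the itertools documentation’s recipes section, though the current docs are the authority, since recipes are occasionally revised. Each one is only a line or two of real logic.

```python
from collections import deque
from itertools import chain, islice


def consume(iterator, n=None):
    """Advance the iterator n steps ahead. If n is None, consume entirely."""
    if n is None:
        # Feed the entire iterator into a zero-length deque.
        deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n.
        next(islice(iterator, n, n), None)


def flatten(list_of_lists):
    """Flatten one level of nesting."""
    return chain.from_iterable(list_of_lists)


it = iter(range(10))
consume(it, 3)
print(next(it))                         # 3
print(list(flatten([[1, 2], [3], []])))  # [1, 2, 3]
```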

_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/JELWWQ6OKR7I4UUKBKVUK4WNSLI2A7Y4/
Code of Conduct: http://python.org/psf/codeofconduct/
