Andrew Barnert writes:

 > >> The answer is that files are iterators, while lists are… well,
 > >> there is no word.
 > > 
 > > As Chris B said, sure there are words:  File objects are *already*
 > > iterators, while lists are *not*.  My question is, "why isn't that
 > > instructive?"
 > 
 > Well, it’s not _completely_ not instructive, it’s just not
 > _sufficiently_ instructive.
 > 
 > Language is more useful when the concepts it names carve up the
 > world in the same way you usually think about it.

True.  But that doesn't mean we need names for everything.  In your
"phases of matter" example, there are two characteristics, fluidity
(which gases and liquids have, but solids don't) and compressibility
(which gases have, but neither solids nor liquids do).  Here the
tripartite vocabulary makes sense, since they're orthogonal, and (in
our modern world) all three concepts are everyday experience.

 > Yes, it’s true that we can talk about “iterables that are not
 > iterators”. But that doesn’t mean there’s no need for a word.

True, but that also doesn't mean there *is* need for a word.

 > We don’t technically need the word “liquid” because we could always
 > talk about “compressibles that are not solid” (or “fluids that are
 > not gas”)

True, but neither "compressibles" nor "fluids" "is a thing".  Instead,
in everyday language "fluid" is pretty much synonymous with "liquid",
and AFAIK there are no compressibles that aren't fluids, so
"compressible" is pretty much purely an adjective.  OTOH, it's useful
to pick out each phase of matter separately.

You haven't made an argument that it's useful to pick out "iterables
that aren't iterators" separately yet, except that you believe that a
word would help (which to me is evidence for the need, but not very
strong evidence).

The reason I'm quite unpersuaded is that there's also a concept of
marked vs unmarked in linguistics.  Marked concepts are explicitly
indicated; unmarked concepts require an explicit contrast with the
marked concept, or they get folded into the generic word, leaving some
ambiguity that gets resolved by context.  (This can get really
persnickety with no obvious rules even in the same domain.  For
example, with gender, "he" is unmarked, and you need to disambiguate
"male person" from "person of unknown gender" from context, at least
in traditional English grammar.  While "she" is marked.  By contrast,
"male" and "female" are both unambiguous.[1])

Now, it seems to me that we are only ever going to discuss iterators
in the context of iteration, which means our domain of discourse is
pretty much restricted to iterables.  (In the sense that there's
nothing left to discuss about iteration once you've classed an entity
as "not iterable".)  Given the way iterable and iterator are defined,
it seems perfectly reasonable to me that iterator would be marked,
non-iterator iterable left to its own devices, and the word "iterable"
disambiguated from context, or perhaps marked with some fairly clumsy
modifier.

So how can one explain "the problem with re-iterating files"?  Here's
how I would (now that I've thought more about it than I should ;-):

Student: OK, so we use 'for' to iterate over lists etc.  And it's cool
         that we can do "for line in file".  But how come if I need to
         do it twice, with lists I can just use a new 'for' statement,
         but with files nothing useful happens?
Teacher: That's a good question.  You know that "things we can use in
         a for statement" are called "iterables", right?
         Well, files are a special kind of iterable called
         "iterator", and you can "start them where you left off" with
         a new 'for' statement.
Student: But the 'for' statement runs out!  You don't want to restart
         in the middle!
Teacher: Exactly!  And that's why nothing useful happens when you use
         a second for statement on an already-open file.  But you can
         use 'break' to stop partway through.
Student: Huh?  What's that good for?
Teacher: [Gives relevant example: paragraph-wise processing in text
         files with empty line paragraph breaks, message-wise
         processing in mbox files, etc.]
Student: Well, OK.  But that's not what I expected or wanted.
Teacher: [Presses "play" on Rolling Stones tune cued up for this moment.
         Continues as voice-over.]
         True enough.  I wasn't there when they designed this
         interface to files, so I'm not sure of all the reasons, but I do
         find it useful for the kind of processing I described
         earlier.  Of course, you can get the effect you want by using
         'open' again.  It's a little annoying that *you* have to remember
         to do this.  Also, there is a way to reset files the way you
         want.  Just use the '.seek(0)' method on the file before the
         second 'for' statement.
Student: Hey, wait!  Suppose I wanted to "restart where I left off" in
         iterating over a list.  I guess that just doesn't work?
Teacher: [Wishes she had more students like this.]
         Another good question.  If you want to do that, you have to
         construct an iterator from the list: 'lit = iter(l)'.  Now
         iterate over 'lit', and you can break in the middle and
         restart with a new 'for' statement, just like with files.
         It's a little annoying that you have to remember ...
Student: [clobbers teacher with a handy copy of Python Essential Reference]
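
A minimal sketch of the file behaviour the teacher describes.  I'm
assuming io.StringIO as a stand-in for a file opened in text mode, and
the sample text and variable names are made up for illustration:

```python
import io

# Assumption: io.StringIO stands in for an already-open text file.
f = io.StringIO("para one, line 1\npara one, line 2\n\npara two, line 1\n")

first_pass = [line for line in f]      # reads every line
second_pass = [line for line in f]     # the file is exhausted: nothing left

f.seek(0)                              # rewind to the beginning
after_seek = [line for line in f]      # now the lines come back

# Stop partway with 'break', then resume with a new 'for' statement.
f.seek(0)
paragraph_one = []
for line in f:
    if line == "\n":                   # an empty line ends the paragraph
        break
    paragraph_one.append(line)

paragraph_two = [line for line in f]   # picks up right after the empty line
```

The second loop is the "nothing useful happens" case, and the last two
loops are the paragraph-wise processing the teacher has in mind.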

The point of the little dialogue is that although the word "iterator"
is used, the student only has to remember it until the end of any
sentence in which it's used.  I think the student's responses are
quite natural, and they don't mention "iterator".  I suspect this
student won't remember 'iter' but I bet she does remember '.seek(0)'.
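
For completeness, the teacher's 'lit = iter(l)' suggestion can be
sketched like this (the list contents and the "stop" marker are
invented for the example):

```python
l = ["a", "b", "stop", "c", "d"]

lit = iter(l)                  # construct an iterator from the list

before = []
for item in lit:
    if item == "stop":         # break in the middle
        break
    before.append(item)

after = []
for item in lit:               # a new 'for' restarts where we left off
    after.append(item)

again = [item for item in l]   # the list itself always starts over
```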

On the other hand, what is there to explain *specifically* about
iterables that aren't iterators that explaining about iterables
doesn't do just as well?  I guess there's the inverse of the "why
doesn't it work with files?" question, but does that ever get asked?
Surely almost all students encounter iteration over sequences first,
and only later over iterators?

 > > 2.  The *for* statement and the *next* builtin require an iterator
 > >   object to work.  Since for *always* needs an iterator object, it
 > >   automatically converts the "in" object to an iterator implicitly.
 > >   (Technical note: for the convenience of implementors of 'for',
 > >   when iter is applied to an iterator, it always returns the
 > >   iterator itself.)

 > I think this is more complicated than people need to know, or
 > usually learn. People use for loops almost from the start, but many
 > people get by with never calling next. All you need is the concept
 > “thing that can be used in a for loop”, which we call
 > “iterable”.

Conceded.  "Had I only more time, I would have written a much shorter
post."
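
For reference, the mechanism in my point 2 can be sketched as below.
This is only a rough equivalent of what 'for' does, not the
interpreter's actual implementation, and 'manual_for' is a made-up
name:

```python
def manual_for(iterable):
    """Collect items in the order a 'for' statement would visit them."""
    it = iter(iterable)            # the implicit conversion to an iterator
    seen = []
    while True:
        try:
            item = next(it)        # ask the iterator for the next item
        except StopIteration:      # the iterator has run out
            break
        seen.append(item)
    return seen

numbers = [1, 2, 3]
it = iter(numbers)
same_object = iter(it) is it             # iter() on an iterator returns it
new_object = iter(numbers) is not numbers  # a list gets a separate iterator
```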

 > “Iterable” is the fundamental concept.

We agree on this too.

 > Of course you will need to learn the concept “iterator” pretty soon
 > anyway, but only because Python actually gives you iterators all
 > over the place. [...] You want to know whether they can be used in
 > for loops

I think now you are over-thinking this.  Iterators *are* iterables.
You have one because somebody told you it's iterable, and you want to
use it in a 'for' loop.  You only need to know that it's an iterator
if you want to re-iterate from the beginning, rather than re-start
from where you left off.

"Iterator" is the marked case.  But the "marker" is that you find out
about it when it doesn't "do what I meant".

 > I think many people do get this, and that’s exactly what leads to
 > confusion. They think that “lazy” and “iterator” (or “consumed when
 > you loop over it”) go together. But they don’t.

I'll grant that my words admit such confusion, especially if people
are predisposed to it.  I think they are.  After all, none of your
"many people" have read my thoughts on the matter before this thread!
Just as there are times when LBYL is the appropriate programming
technique (even though EAFP is possible), sometimes people who don't
read the whole relevant manual section in advance are going to get
burned by their guesses and analogies (especially if they got them
from others of the same type).

 > > Back to the discussion: the child can touch both, and does so
 > > frequently (assuming you don't feed them from the dog's bowl and
 > > also bathe them regularly).  They've seen glasses break, most
 > > likely, and splashed water.
 > 
 > And someone learning Python does get to touch both things
 > here. They get lists, dicts, and ranges, and they get files, zips,
 > and enumerate. Both categories come up pretty early in learning
 > Python, just like both solids and liquids come up pretty early in
 > learning to be human.

No, they don't, in a sense I explained.  Until the student has a use
case where they need to restart (either where they left off or from
the beginning) they can't tell the difference because they just put
the whatever in a 'for' statement which works like magic -- and to
them it is pure magic, because they don't know what iterable or
iterator or __iter__ or iter or __next__ or next are.  They just know
you can use lists and some other things in a 'for' statement.  The
restart distinction may not come up for a long time.  I didn't really
have a use case for it, until one time I wanted to do something with
mbox files and I didn't like what the mailbox module does.  So I had
to roll my own.
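
To make that concrete, here is a small sketch using some of the
objects you listed (files are left out to keep it self-contained; the
helper name 'make_samples' is mine).  A single pass looks the same for
all of them, and only a second pass over the same object shows the
difference:

```python
from collections.abc import Iterator

def make_samples():
    # Fresh objects each time, because some of them are consumed by looping.
    return {
        "list": [1, 2],
        "dict": {"a": 1, "b": 2},
        "range": range(2),
        "zip": zip([1, 2], "ab"),
        "enumerate": enumerate("ab"),
    }

# One pass: every object just works in a 'for'.
first_pass = {name: [x for x in obj] for name, obj in make_samples().items()}

# Two passes over the same object: iterators come back empty.
second_pass = {}
for name, obj in make_samples().items():
    for _ in obj:
        pass
    second_pass[name] = [x for x in obj]

# The formal classification, for comparison.
is_iterator = {name: isinstance(obj, Iterator)
               for name, obj in make_samples().items()}
```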

 > No, it’s iterables whose purpose is being fed to a for statement. 

I disagree, both in the abstract (Sequences are iterable, but don't
necessarily have an __iter__, and so I don't see how you can support
your assertion that their purpose is to be fed to 'for') and in the
concrete (lots of iterables with __iter__ are instantiated and never
intended to be iterated, yet are useful).  By contrast, every iterator
has an __iter__, and the technical term for an iterator that is never
iterated is "garbage".
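
A sketch of the abstract point, assuming a hypothetical class
'Squares' that implements only the old sequence protocol
(__getitem__ with integer indexes starting at 0, ending with
IndexError):

```python
class Squares:
    """Hypothetical sequence-like class: __getitem__ only, no __iter__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)    # signals the end of the sequence
        return index * index

sq = Squares(4)

has_iter = hasattr(sq, "__iter__")     # no __iter__ on the object...
values = [v for v in sq]               # ...yet 'for' can still loop over it
values_again = [v for v in sq]         # and it restarts from the beginning

it = iter(sq)                          # the iterator that 'for' uses
iterator_has_iter = hasattr(it, "__iter__")
```

The object itself has no __iter__, but the iterator built for it does,
which is the asymmetry I mean.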

 > Yes, iterators are what for statements use under the covers to deal
 > with iterables, but you don’t need to learn that until well after
 > you’ve learned that iterators are what you get from open and zip.

True enough, my bad.  I was confounding two documentation problems
there.  One is teaching new users, and the other is helping experts
get it exactly right.  I've mixed them up quite a bit, but my list of
5 points should be thought of as aimed at a concise but comprehensive
description rather than a tutorial.

 > You don’t have to call them “file iterators”, you just have to have
 > the word “iterator” lying around to teach them when they ask why
 > they can’t loop over a file twice. Which we do.

Eh, that's my argument. :-)

 > In the same way, you don’t need to call lists “list iterables”[.]

And there's no way that I would.  "Iterable" is an adjective.  The
usage "iterables" for the class of iterable objects is something of an
abuse.[2]  My point about files is that they're the thing I would
expect would be most folks' first unpleasant encounter with an
exhausted iterator object, and by naming them as "file iterators" you
might be able to induce a lot of "a ha!" moments.  You come around to
a related suggestion below.  I admit that the "file iterator"
suggestion is pretty implausible.

 > You just need to have the word “iterable” lying around to teach them
 > when they ask what other kinds of things can go in a for loop.

I don't think you meant to write that: when they ask that, you don't
say "iterables, of course", you say "tuples, sets, and perhaps
surprisingly dicts, as well as dict views, and many other things."
It's only when you or the student need a name for that whole class
that you bring up the term "iterable" (at least in its noun form).
But I don't think that comes up, at least on the student side, for
quite a while.  A good student might ask "what else is iterable?" but
"What else can I use in a 'for' statement?" is perfectly serviceable.
I suppose the teacher might find it painful to completely avoid the
term "iterable" (especially as an adjective, and "iterator", for that
matter), but I would solve that problem as in the dialogue: just use
them in such a way that the student doesn't need to remember them.  I
think that's quite do-able, even natural.

I do not claim this leaves the student with a complete and
satisfactory understanding of the concept of iterator, merely that it
allows them to understand the difference between iterables that start
from where they left off and those that begin again at the beginning.

 > And you don’t need to call lists “list collections”, you just need
 > to have the word “collection” lying around to teach them when they
 > ask why ranges and lists and dicts let you loop over their values
 > over and over.

Have you ever been asked that, outside of the context of explaining
why files, zips, etc. don't allow re-iteration from the start?  Has
anyone come to you puzzled because the second loop over a list did
useful work?

 > > We have that word and distinction.  A file object *is* an
 > > iterator.  A list is *not* an iterator.  *for* works *with*
 > > iterators internally, and *on* iterables through the magic of
 > > __iter__.
 > 
 > “Not an iterator” is not a word. Of course you _can_ talk about
 > things that don’t have names by being circuitous, but it’s harder.

Or you can not talk about them at all.  This is very frustrating,
because I agree with everything you say as a general principle, but
your concrete discussion never refers to iterators or iterables.  It's
always an analogy to birds and reptiles and plasmas and liquids.

I think that analogy breaks down because I doubt new programmers get
confused by the fact that they can re-iterate over lists.  Like, not
ever.  I'd even bet that students who try breaking out, then
restarting where they left off, and have it fail by restarting from
the beginning, are disappointed but not shocked.  So when do you
*need* to talk about non-iterator iterables?  Outside of threads like
this one?

 > And in practice, people do need to think about “things that can be
 > looped over repeatedly and give you their values over and over”,
 > and having to say “iterables that are not iterators” may be
 > technically sufficient, but practically it makes communication and
 > thought harder.

Or you can just treat "things that can be looped over repeatedly and
give you their values over and over" as the unmarked case of "iterable",
and speak of "iterators" when you need to distinguish the marked
case.[3]  Use of "marking" is something we do all the time.  I can't
say for sure that it would work here, but nothing you've written yet
convinces me it wouldn't.

 > It means we have to be more verbose and less to the point,

It doesn't mean we *have* to be more verbose, in principle.  "Marking"
works fine in natural language, just as anaphoric "it" does.  I may be
missing something, but you need to be more concrete about what the
need for this word (yet to be named) is.

 > and people make silly mistakes like the one in the parent thread,
 > and people make more serious mistakes like teaching others that
 > ranges are iterators,

Indeed they do.  I don't think that has as much to do with people not
having a word for iterables that aren't iterators as it does with them
not understanding what an iterator is.  Just because you have a word,
say "nandaro", for iterables that aren't iterators doesn't mean that
otherwise well-informed people will correctly classify ranges as
nandaro rather than incorrectly as iterators.
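
Whatever word one uses, the classification itself is easy to check.
A small sketch (the variable names are mine):

```python
r = range(3)

# next() needs an iterator, and a range is not one.
try:
    next(r)
except TypeError:
    range_is_iterator = False
else:
    range_is_iterator = True

# A range is lazy (it stores no list of values) yet restarts every time.
first = list(r)
second = list(r)

# Each call to iter() hands back a new, independent iterator.
it1, it2 = iter(r), iter(r)
next(it1)                              # advance only the first iterator
independent = (next(it1), next(it2))   # they keep separate positions
```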

As far as I can tell, most of the rest of your post addresses an
argument that I'm not making, and I don't know how to do it better, so
I'm just going to let it rest there.

As mentioned above, this captures a good bit of what I'm trying to get
at:

 > On the other hand, this would certainly get the notion of “files
 > are streams” across to novices (as opposed to people coming from
 > other languages) faster and more easily than we do today, which
 > might help a lot of them. It might even turn out to solve the “why
 > can’t I loop over this file twice” question for a lot of people in
 > a different way, and that different way might be something you
 > could build on to explain the difference between zip and
 > range. “Like a stream” is much more accurate than “because it wants
 > to be lazy”, and maybe easier to understand as well.


Footnotes: 
[1]  Or maybe "marked" doesn't apply here because those words are on
equal footing -- I'm not a linguist, I've just heard the concept
discussed by real linguists.

[2]  Linguists have a technical term for this kind of "abuse" but I
don't remember it.

[3]  I recognize that you can create objects that break this
dichotomy.  I doubt they're important enough to impede discussion for
lack of the word for "non-iterator iterables".  Again, concrete
examples would really help.

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VNUP3DKVWHFG2PZP4TW7LXVEDY2STA4J/
Code of Conduct: http://python.org/psf/codeofconduct/
