Andrew Barnert writes:

> And I’m pretty sure that’s exactly the confusion that led you to
> think that dict_keys have weird behavior,

That wasn't me .... I'm here to discuss documentation, not dict or
sequence views. ;-) Changing the subject field to match.

> Students often want to know why this doesn’t work:
>
>     with open("file") as f:
>         for line in file:
>             do_stuff(line)
>         for line in file:
>             do_other_stuff(line)

Sure. *Some* students do. I've never gotten that question from mine,
though I do occasionally see

    with open("file") as f:
        for line in f:          # ;-)
            do_stuff(line)
    with open("file") as f:
        for line in f:
            do_other_stuff(line)

I don't know, maybe they asked the student next to them. :-)

> The answer is that files are iterators, while lists are… well,
> there is no word.

As Chris B said, sure there are words: File objects are *already*
iterators, while lists are *not*. My question is, "why isn't that
instructive?"

> We shouldn’t define everything up front, just the most important
> things. But this is one of the most important things. People need
> to understand this distinction very early on to use Python,

No, they don't. They neither understand, nor (to a large extent) do
they *need* to. We cannot solve the problem of "lazy in the technical
sense" programming by improving Python. It's a matter of optimizing
programmer effort. If cargo culting and asking on Stack Overflow and
bitching on Twitter or your personal blog when software doesn't DWIM
is psychologically (and frequently time management-ly) cheaper than
learning How Things Work, that's what people are going to do. I can't
tell them they're wrong (except my own students, and they mostly
ignore me until they run out of options other than listening to
me :-).

ISTM that all we need to say is that

1. An *iterator* is a Python object whose only necessary function is
   to return an object when next is applied to it. Its purpose is to
   keep track of "next" for *for*. (It might do other useful things
   for the user, eg, file objects.)

2. The *for* statement and the *next* builtin require an iterator
   object to work.
   Since for *always* needs an iterator object, it automatically
   converts the "in" object to an iterator implicitly. (Technical
   note: for the convenience of implementors of 'for', when iter is
   applied to an iterator, it always returns the iterator itself.)

3. When a "generic" iterator "runs out", it's exhausted, it's truly
   done. It is no longer useful, and there's nothing you can do but
   throw it away. Generic iterators do not have a reset method.
   Specialized iterators may provide one, but most do not.

4. Objects that can be converted to iterators are *iterables*.
   Trivially, iterators are iterable (see technical note supra).

5. Most Python objects are not iterators, but many can be converted.
   However, some Python objects are constructed as iterators because
   they want to be "lazy". Examples are files (so that a huge file
   can be processed line by line without reading the whole thing into
   memory) and "generators" which yield a new item each time they are
   called.

But AFAIK we *do* say that, and it doesn't get through.

> I can teach a child why a glass will break permanently when you hit
> it while a lake won’t by using the words “solid” and “liquid”.

Terrible example, since a glass is just a geologically slow liquid.
;-) Back to the discussion: the child can touch both, and does so
frequently (assuming you don't feed them from the dog's bowl and also
bathe them regularly). They've seen glasses break, most likely, and
splashed water.

Iterators have one overriding purpose: to be fed to *for* statements,
be exhausted, and then discarded. This is so important that it's done
implicitly and in every single *for* statement. We have the necessary
word, "iterator," but students don't have the necessary experience of
"touching" the iterator that *for* actually iterates over instead of
the list that is explicit in the *for* statement. That iterator is
created implicitly and becomes garbage as soon as the *for* statement
finishes.
And there's no way for the student to touch it, it doesn't have a
name!

If you want to fix nomenclature, don't call them "files," don't call
them "file objects," call them "file iterators". Then students have
an everyday iterator they can touch. I'll guarantee that causes other
problems, though, and gets a ton of resistance. Even from me. :-)

> Yes, and defining terminology for the one distinction that almost
> always is relevant helps distinguish that distinction from the
> other ones that rarely come up. Most people (especially novices)
> don’t often need to think about the distinction between iterables
> that are sized and also containers vs. those that are not both
> sized and containers, so the word for that doesn’t buy us much. But
> the distinction between iterators and things-like-list-and-so-on
> comes up earlier, and a lot more often, so a word for that would
> buy us a lot more.

We have that word and distinction. A file object *is* an iterator.
A list is *not* an iterator. *for* works *with* iterators internally,
and *on* iterables through the magic of __iter__.

> > But you *don't* use seek(0) on files (which are not iterators, and in
> > fact don't actually exist inside of Python, only names for them do).
> > You use them on opened *file objects* which are iterators.
>
> A file object is a file, in the same way that a list object is a
> list and an int object is an int.

No, it's not the same: your level of abstraction is so high that
you've lost sight of the iterable/iterator distinction. All of the
latter objects own their own data in a way that a file object does
not. All of the latter objects are different from their iterators
(where such iterators exist), while the file object is not.

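
To make "different from their iterators" something a student can
touch, here is a minimal sketch. It assumes io.StringIO as a stand-in
for an opened file object (so nothing needs to exist on disk), and
manual_for is a hypothetical helper that only approximates what a
*for* statement does; it is not how the interpreter is implemented.

```python
import io


def manual_for(iterable, body):
    """Approximate a for statement (hypothetical helper, for illustration)."""
    iterator = iter(iterable)        # the implicit conversion done by 'for'
    while True:
        try:
            item = next(iterator)    # ask the iterator for the next item
        except StopIteration:        # the iterator is exhausted
            break
        body(item)


numbers = [1, 2, 3]
stream = io.StringIO("a\nb\n")       # assumed stand-in for an opened file

# A list is a different object from its iterator; the stream is its own.
list_is_own_iterator = iter(numbers) is numbers
stream_is_own_iterator = iter(stream) is stream

first_pass, second_pass = [], []
manual_for(numbers, first_pass.append)
manual_for(numbers, second_pass.append)    # new iterator, so all items again

stream_first, stream_second = [], []
manual_for(stream, stream_first.append)
manual_for(stream, stream_second.append)   # same exhausted iterator, so nothing

print(list_is_own_iterator, stream_is_own_iterator)
print(first_pass, second_pass)
print(stream_first, stream_second)
```

The list gives the same items on both passes because each call to
iter builds a fresh iterator, while the stream hands back itself and
so has nothing left the second time.
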
> The fact that we use “file” ambiguously for a bunch of related but
> contradictory abstractions (a stream that you can read or write, a
> directory entry, the thing an inode points to, a document that an
> app is working on, …) makes it a bit more confusing, but
> unfortunately that ambiguity is forced on people before they even
> get to their first attempt at programming, so it’s probably too
> late for Python to help (or hurt).

Agreed. I would be much happier if we could discuss an example that
is *not* iterating over files but *does* come up every day on Stack
Overflow. Maybe zips would work but I'm not sure the motivation comes
together the way it does for files (why do zips want to be lazy? what
are the compelling examples for zip of "restarting the iteration
where you left off" with a new *for* statement?)

> > When you open a file again, by default you get a new iterator
> > which begins at the beginning, as you want for those others.
> > My point is that none of the other types you mention are iterators.
>
> I don’t get what you’re driving at here.

Simply that we have the necessary distinction already: iterators vs.
everything else. IMO the problem is that the students have zero or
very little experience of iterators other than files, and so think of
file objects as weird iterables, rather than as iterators.

> Lists, sets, ranges, dict_keys, etc. are not iterators. You can
> write `for x in xs:` over and over and get the values over and
> over. Because each time, you get a new iterator over their values.

You and I know that, because we know what an iterator is, and we know
it's there because it has to be: *for* doesn't iterate anything but
an iterator. But (except via a bytecode-level debugger) nobody has
ever seen that iterator. You can use iter to get a similar iterator,
of course, but it's not the same object that any for statement ever
used.
(Unless you explicitly created it with iter, but then you can't
re-run the for statement on it the way you do with a list.)

> > The difference with files is just that they happen to exist in
> > Python as iterables. But after
>
> _What_ exists in Python as iterables?

Lists, tuples, sets, dicts, and other containers.

> Files, maps, zips, generators, etc. are not like that. They’re
> iterators. If you write `for x in xs:` twice, you get nothing the
> second time, because each time you’re using the same iterator, and
> you’ve already used it up. Because iter(xs) is xs when it’s a file
> or generator etc.

Genexps are iterators, but generators (in the sense of the product of
a def that contains "yield") are not even iterable. Those are
iterator factories.

> The only representation of files in Python is file objects -- the
> thing you get back from open (or socket.makefile or io.StringIO or
> whatever else) -- and those are iterators.

The thought occurred to me, "What if that was a bad decision? Maybe
in principle files shouldn't be iterators, but rather iterables with
a real __iter__ that creates the iterator." I realized that I'd
already answered my own question in part: I find it easy to imagine
cases where I'd want to get some lines of input from a file as a
higher-level unit, then stop and do some processing. The killer app
for me is mbox files. Another plausible case is reading top-level
Lisp expressions from a file (although that doesn't necessarily
divide neatly into lines.) I also found it surprisingly complicated
to think about the consequences to the type of making that change.

Going back to the documentation theme, maybe one way to approach
explaining iterators is to start with the use case of files as
(non-seekable) streams, show how 'for iteration' can be "restarted"
where you left off in the file, and teach that "this is the canonical
behavior of iterators; lists etc are *iterable* because 'for'
automatically converts them to iterators 'behind the scenes'".
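
A rough sketch of what such a tutorial example might look like,
assuming io.StringIO as a stand-in for a non-seekable text stream and
a made-up, mbox-flavored message (the header names and body lines are
purely illustrative):

```python
import io

# Assumed stand-in for a stream; the contents are invented for the example.
stream = io.StringIO(
    "Subject: iterators\n"
    "From: someone@example.invalid\n"
    "\n"
    "first body line\n"
    "second body line\n"
)

# First for statement: collect header lines, stop at the blank line.
headers = []
for line in stream:
    if line == "\n":
        break
    headers.append(line.rstrip("\n"))

# Second for (here a comprehension) on the SAME object: it resumes
# right after the blank line instead of starting over.
body = [line.rstrip("\n") for line in stream]

# A list, by contrast, gets a new iterator each time and starts over.
items = ["x", "y", "z"]
for item in items:
    break                             # abandon the first iteration early
restarted = [item for item in items]

print(headers)
print(body)
print(restarted)
```

The stream remembers where the first loop stopped because it is the
iterator; the list starts from the top because *for* quietly builds a
new iterator for it every time.
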
If sockets or pipes were more familiar to beginning programmers, they
might be better examples, but I think that files-as-streams might be
the most familiar and approachable, though real files are far more
flexible than just unseekable streams.

I'll try to take a look at the "official" tutorials and language
documentation "sometime soon" and see if maybe this idea could be
applied to improve them.

Steve
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/54J6KBD7YLAGQXN3VLYKG3GAPXLVRQFH/
Code of Conduct: http://python.org/psf/codeofconduct/