[rust-dev] Proposal for clarifying the iterator protocol

Kevin Ballard Sat, 03 Aug 2013 18:18:27 -0700

The iterator protocol, as I'm sure you're aware, is the protocol that
defines the behavior of the Iterator trait. Unfortunately, at the moment
the trait does not document what happens if you call `.next()` on an
iterator after a previous call has returned `None`. According to Daniel
Micay, the intention was that the iterator would return `None` forever.
However, this is not guaranteed by at least one iterator adaptor (Scan),
nor is it documented. Furthermore, no thought has been given to what
happens if an iterator pipeline has side-effects. A trivial example of the
side-effect problem is this:


    let x = [1,2,3];
    let mut it = x.iter().peek_(|x| printfln!(*x)).scan(true, |st, &x| {
        if *st { *st = false; Some(x) } else { None }
    });
    (it.next(), it.next(), it.next())

This results in `(Some(1), None, None)` but it prints out

    &1
    &2
    &3
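
For readers who want to reproduce this, here is a rough equivalent in present-day Rust syntax. It is only a sketch: it assumes `inspect` as the modern counterpart of `peek_`, and it records the side-effects in a `Vec` instead of printing them so that they can be checked. The function name `run_pipeline` is made up for illustration.

```rust
use std::cell::RefCell;

/// Hypothetical helper: runs the pipeline from the example and returns
/// (the values the side-effecting closure saw, the three results of `.next()`).
fn run_pipeline() -> (Vec<i32>, [Option<i32>; 3]) {
    // Stand-in for the printing side-effect: every element that flows
    // through `inspect` is recorded here.
    let seen = RefCell::new(Vec::new());
    let x = [1, 2, 3];
    let results = {
        let mut it = x
            .iter()
            .inspect(|v| seen.borrow_mut().push(**v))
            .scan(true, |st, &v| {
                if *st {
                    *st = false;
                    Some(v)
                } else {
                    None
                }
            });
        // Keep calling `.next()` even after `None` has been returned.
        [it.next(), it.next(), it.next()]
    };
    (seen.into_inner(), results)
}
```

Calling `run_pipeline()` yields the results `[Some(1), None, None]`, yet all three elements are recorded as seen: the upstream iterator keeps being driven after `scan` has returned `None`.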

After giving it some thought, I came up with 3 possible definitions for
behavior in this case:

1. Once `.next()` has returned `None`, it will return `None` forever.
Furthermore, calls to `.next()` after `None` has been returned will not
trigger side-effects in the iterator pipeline. This means that once
`.next()` has returned `None`, it becomes idempotent.

   This is most likely going to be what people will assume the iterator
protocol defines, in the absence of any explicit statement. What's more,
they probably won't even consider the side-effects case.

   Implementing this will require care be given to every single iterator
and iterator adaptor. Most iterators will probably behave like this (unless
they use a user-supplied closure), but a number of different iterator
adaptors will need to track this explicitly with a bool flag. It's likely
that user-supplied iterator adaptors will forget to enforce this and will
therefore behave subtly wrong in the face of side-effects.

2. Once `.next()` has returned `None`, it will return `None` forever. No
statement is made regarding side-effects.

   This is what most people will think they're assuming, if asked. The
danger here is that they will almost certainly actually assume #1, and thus
may write subtly incorrect code if they're given an iterator pipeline
with side-effects.

   This is easier to implement than #1. Most iterators will do this
already. Iterator adaptors will generally only have to take care when they
use a user-supplied closure (e.g. `scan()`).

3. The behavior of `.next()` after `None` has been returned is left
undefined. Individual iterators may choose to define behavior here however
they see fit.

   This is what we actually have implemented in the standard libraries
today. It's also by far the easiest to implement, as iterators and adaptors
may simply choose to not define any particular behavior.

   This is made more attractive by the fact that some iterators may choose
to actually define behavior that's different than "return `None` forever".
For example, a user may write an iterator that wraps non-blocking I/O,
returning `None` when there's no data available and returning `Some(x)`
again once more data comes in. Or if you don't like that example, they
could write an iterator that may be updated to contain more data after
being exhausted.

   The downside is that users may assume #1 when #3 holds, which is why
this needs to be documented properly.
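
To make the "updated after being exhausted" case under #3 concrete, here is a minimal sketch in present-day Rust. The `QueueDrain` type and the `resumable_demo` function are hypothetical names invented for this illustration; the iterator simply pops from a shared queue, so it returns `None` while the queue is empty and `Some(x)` again once more data is pushed.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Hypothetical iterator that drains a queue shared with a producer.
/// It makes no promise to keep returning `None` once it has done so.
struct QueueDrain<T> {
    queue: Rc<RefCell<VecDeque<T>>>,
}

impl<T> Iterator for QueueDrain<T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        // `None` only means "nothing available right now".
        self.queue.borrow_mut().pop_front()
    }
}

/// Hypothetical demo: interleaves pushes with calls to `.next()`.
fn resumable_demo() -> Vec<Option<i32>> {
    let queue = Rc::new(RefCell::new(VecDeque::new()));
    let mut it = QueueDrain { queue: Rc::clone(&queue) };
    let mut out = Vec::new();

    queue.borrow_mut().push_back(1);
    out.push(it.next()); // data available
    out.push(it.next()); // queue is empty for now
    queue.borrow_mut().push_back(2);
    out.push(it.next()); // more data arrived after `None`
    out
}
```

Here `resumable_demo()` returns `[Some(1), None, Some(2)]`, which is legal under #3 but would violate both #1 and #2.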

---

I believe that #3 is the right behavior to define. This gives the most
flexibility to individual iterators, and we can provide an iterator adaptor
that gives any iterator the behavior defined by #1 (see Fuse in PR
#8276 <https://github.com/mozilla/rust/pull/8276>).
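
As an illustration of how such an adaptor can provide #1 with a single bool flag, here is a minimal sketch in present-day Rust. It is not the code from the pull request; `FuseSketch` and `fused_pipeline` are hypothetical names, and `inspect` is assumed as the modern counterpart of `peek_`.

```rust
use std::cell::Cell;

/// Hypothetical adaptor: once the wrapped iterator has returned `None`,
/// it is never called again, so no further side-effects can occur.
struct FuseSketch<I> {
    iter: I,
    done: bool,
}

impl<I: Iterator> Iterator for FuseSketch<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.done {
            // Do not touch the underlying pipeline after exhaustion.
            return None;
        }
        let item = self.iter.next();
        if item.is_none() {
            self.done = true;
        }
        item
    }
}

/// Hypothetical demo: counts how often the side-effecting closure runs
/// when the scan pipeline is wrapped in `FuseSketch`.
fn fused_pipeline() -> (usize, [Option<i32>; 3]) {
    let calls = Cell::new(0usize);
    let x = [1, 2, 3];
    let results = {
        let inner = x
            .iter()
            .inspect(|_| calls.set(calls.get() + 1))
            .scan(true, |st, &v| {
                if *st {
                    *st = false;
                    Some(v)
                } else {
                    None
                }
            });
        let mut it = FuseSketch { iter: inner, done: false };
        [it.next(), it.next(), it.next()]
    };
    (calls.get(), results)
}
```

With the wrapper in place the results are still `[Some(1), None, None]`, but the side-effecting closure runs only twice: once for the element that was yielded and once for the call that produced the first `None`. The third call never reaches the underlying pipeline.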

I am not strongly opposed to defining #1 instead, but I am mildly worried
about the likelihood that users will implement iterators that don't have
this guarantee, as this is not something that can be statically checked by
the compiler. What's more, if an iterator breaks this guarantee, the
problem will show up in the code that calls it, rather than in the iterator
itself, which may make debugging harder.

I am strongly opposed to #2. If we guarantee that an iterator that returns
`None` once will return `None` forever, users will assume that this means
that `.next()` becomes idempotent (with regard to side-effects) after
`None` is returned, but this will not be true. Furthermore, users will
probably not even realize they've made a bad assumption, as most users will
not be thinking about side-effects when consuming iterators.

I've already gone ahead and implemented #3 in pull request
#8276 <https://github.com/mozilla/rust/pull/8276>.

-Kevin

