Re: [rust-dev] Proposal for clarifying the iterator protocol

Daniel Micay Sat, 03 Aug 2013 22:50:38 -0700

On Sat, Aug 3, 2013 at 9:18 PM, Kevin Ballard <kball...@gmail.com> wrote:
> The iterator protocol, as I'm sure you're aware, is the protocol that
> defines the behavior of the Iterator trait. Unfortunately, at the moment the
> trait does not document what happens if you call `.next()` on an iterator
> after a previous call has returned `None`. According to Daniel Micay, the
> intention was that the iterator would return `None` forever. However, this
> is not guaranteed by at least one iterator adaptor (Scan), nor is it
> documented. Furthermore, no thought has been given to what happens if an
> iterator pipeline has side-effects. A trivial example of the side-effect
> problem is this:
>
>     let x = [1,2,3];
>     let mut it = x.iter().peek_(|x| printfln!(*x)).scan(true, |st, &x| { if
> *st { *st = false; Some(x) } else { None } });
>     (it.next(), it.next(), it.next())
>
> This results in `(Some(1), None, None)` but it prints out
>
>     &1
>     &2
>     &3
>
> After giving it some thought, I came up with 3 possible definitions for
> behavior in this case:
>
> 1. Once `.next()` has returned `None`, it will return None forever.
> Furthermore, calls to `.next()` after `None` has been returned will not
> trigger side-effects in the iterator pipeline. This means that once
> `.next()` has returned `None`, it becomes idempotent.
>
>    This is most likely going to be what people will assume the iterator
> protocol defines, in the absence of any explicit statement. What's more,
> they probably won't even consider the side-effects case.
>
>    Implementing this will require care be given to every single iterator and
> iterator adaptor. Most iterators will probably behave like this (unless they
> use a user-supplied closure), but a number of different iterator adaptors
> will need to track this explicitly with a bool flag. It's likely that
> user-supplied iterator adaptors will forget to enforce this and will
> therefore behave subtlely wrong in the face of side-effects.
>
> 2. Once `.next()` has returned `None`, it will return `None` forever. No
> statement is made regarding side-effects.
>
>    This is what most people will think they're assuming, if asked. The
> danger here is that they will almost certainly actaully assume #1, and thus
> may write subtlely incorrect code if they're given an iterator pipeline with
> side-effects.
>
>    This is easier to implement than #1. Most iterators will do this already.
> Iterator adaptors will generally only have to take care when they use a
> user-supplied closure (e.g. `scan()`).
>
> 3. The behavior of `.next()` after `None` has been returned is left
> undefined. Individual iterators may choose to define behavior here however
> they see fit.
>
>    This is what we actually have implemented in the standard libraries
> today. It's also by far the easiest to implement, as iterators and adaptors
> may simply choose to not define any particular behavior.
>
>    This is made more attractive by the fact that some iterators may choose
> to actually define behavior that's different than "return `None` forever".
> For example, a user may write an iterator that wraps non-blocking I/O,
> returning `None` when there's no data available and returning `Some(x)`
> again once more data comes in. Or if you don't like that example, they could
> write an iterator that may be updated to contain more data after being
> exhausted.
>
>    The downside is that users may assume #1 when #3 holds, which is why this
> needs to be documented properly.
>
> ---
>
> I believe that #3 is the right behavior to define. This gives the most
> flexibility to individual iterators, and we can provide an iterator adaptor
> that gives any iterator the behavior defined by #1 (see Fuse in PR #8276).
>
> I am not strongly opposed to defining #1 instead, but I am mildly worried
> about the likelihood that users will implement iterators that don't have
> this guarantee, as this is not something that can be statically checked by
> the compiler. What's more, if an iterator breaks this guarantee, the problem
> will show up in the code that calls it, rather than in the iterator itself,
> which may make debugging harder.
>
> I am strongly opposed to #2. If we guarantee that an iterator that returns
> `None` once will return `None` forever, users will assume that this means
> that `.next()` becomes idempotent (with regards to side-effects) after
> `None` is returned, but this will not be true. Furthermore, users will
> probably not even realize they've made a bad assumption, as most users will
> not be thinking about side-effects when consuming iterators.
>
> I've already gone ahead and implemented #3 in pull request #8276.
>
> -Kevin


I'm leaning towards #2 or #3, mostly because adaptors *not*
dispatching to the underlying next() implementation are too complex.

I took a look at the behaviour of Python's iterators in these corner
cases as good baseline for comparison:

~~~
>>> def peek(it):
...     for x in it:
...         print(x)
...         yield x
...
>>> xs = [1, 2, 3]
>>> ys = [1, 2, 3, 4, 5]
~~~

You can tell their `zip` function short-circuits, and simply
dispatches to the underlying implementations. Rust's `zip` is similar
but doesn't currently short-circuit (it might as well).

~~~
>>> it = zip(peek(ys), xs)
>>> next(it)
1
(1, 1)
>>> next(it)
2
(2, 2)
>>> next(it)
3
(3, 3)
>>> next(it)
4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> next(it)
5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> it = zip(xs, peek(ys))
>>> next(it)
1
(1, 1)
>>> next(it)
2
(2, 2)
>>> next(it)
3
(3, 3)
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
~~~

It also makes no attempt to store whether it has stopped internally,
and will start yielding again if each iterator yields an element when
zip asks for them one by one (keeping in mind that it short-circuits).

Most other language keep `hasNext` and `next` separate (D and Scala,
among others) leading to more corner cases, and they do not seem to
clearly define the semantics for side effects down the pipeline.

http://dlang.org/phobos/std_range.html
http://www.scala-lang.org/api/current/scala/collection/Iterator.html
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev

Re: [rust-dev] Proposal for clarifying the iterator protocol

Reply via email to