Re: [rust-dev] Proposal for clarifying the iterator protocol

Jason Fager Sun, 04 Aug 2013 17:54:10 -0700

Not confused, I understand your point about for loops not caring about what
happens with additional 'next' calls.


But as a user of an iterator, I have the expectation that a for loop
exhausts the elements available from an iterator unless I return early or
break.  Designing an iterator that intentionally sidesteps that expectation
seems like a bad idea.  Principle of least astonishment, etc.
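
To make that concern concrete, here is a minimal sketch in present-day Rust
syntax (not the 2013 syntax used elsewhere in this thread). The `Resumable`
type and the `collect_with_for` helper are hypothetical names invented for
this illustration. The sketch assumes an iterator that reports `None` at the
end of each batch and then resumes, which is the kind of behavior option #3
would permit:

```rust
/// Hypothetical iterator: yields items batch by batch, returning `None`
/// at the end of each batch and resuming with the next batch afterwards.
struct Resumable {
    batches: Vec<Vec<u32>>,
    batch: usize,
    pos: usize,
}

impl Iterator for Resumable {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        // No batches left at all: stay at `None`.
        let current = self.batches.get(self.batch)?;
        if let Some(&x) = current.get(self.pos) {
            self.pos += 1;
            Some(x)
        } else {
            // End of this batch: report `None` now, resume on the next call.
            self.batch += 1;
            self.pos = 0;
            None
        }
    }
}

/// Runs one `for` loop over the iterator and returns what the loop saw.
fn collect_with_for(it: &mut Resumable) -> Vec<u32> {
    let mut seen = Vec::new();
    for x in &mut *it {
        seen.push(x);
    }
    seen
}

fn main() {
    let mut it = Resumable { batches: vec![vec![1, 2], vec![3]], batch: 0, pos: 0 };
    let first = collect_with_for(&mut it);
    let second = collect_with_for(&mut it);
    println!("first loop: {:?}, second loop: {:?}", first, second);
}
```

A single `for` loop stops at the first `None`, so it sees only the first
batch; the remaining element only shows up if the caller knows to loop again.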

And yes, of course, iterators already return Option.  But they return
Option to satisfy the *iterator protocol*, not the use cases you
described.  I'm talking about adding another layer of Option.

So say I want to implement non-blocking I/O that plays nice w/ the iterator
protocol.  An implementation taking advantage of option #3 would look
something like:

loop {
    for i in iter {
        foo(i);
    }
    // other stuff
    if no_really_im_done {
        break;
    }
}


While hoisting the Iterator's type into another layer of Option looks like:

for i in iter {
    match i {
        Some(i) => foo(i),
        None => {
            // other stuff
        }
    }
}

The outer Option allows the iterator protocol to work as expected, i.e.
iterate over all available elements in the iterator, and the inner
implements the non-blocking protocol you're looking for.
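
A minimal, self-contained sketch of the hoisted version in present-day Rust
syntax follows. The `Polling` type and the `drain` function are hypothetical
names made up for this illustration, and the queue of canned poll results
stands in for a real non-blocking source:

```rust
use std::collections::VecDeque;

/// Hypothetical non-blocking source. Each queued entry is one poll result:
/// `Some(x)` means data was ready, `None` means nothing was available yet.
struct Polling {
    polls: VecDeque<Option<u32>>,
}

impl Iterator for Polling {
    // The outer Option (returned by `next`) signals real exhaustion;
    // the inner Option signals "no data right now".
    type Item = Option<u32>;

    fn next(&mut self) -> Option<Option<u32>> {
        self.polls.pop_front()
    }
}

/// Consumes the source with a single `for` loop, returning the items seen
/// and the number of polls that had no data.
fn drain(source: Polling) -> (Vec<u32>, usize) {
    let mut items = Vec::new();
    let mut idle_polls = 0;
    for poll in source {
        match poll {
            Some(x) => items.push(x),
            None => idle_polls += 1, // "other stuff" would go here
        }
    }
    (items, idle_polls)
}

fn main() {
    let source = Polling {
        polls: VecDeque::from(vec![Some(1), None, Some(2), Some(3), None]),
    };
    let (items, idle_polls) = drain(source);
    println!("items = {:?}, idle polls = {}", items, idle_polls);
}
```

Here one `for` loop visits every poll result, and the loop ends only when the
source is really finished.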







On Sun, Aug 4, 2013 at 8:12 PM, Kevin Ballard <[email protected]> wrote:

> I suspect you're confused about something.
>
> The for loop doesn't care in the slightest what an iterator does after
> it's returned None. All 3 approaches work equally well as far as the for
> loop is concerned.
>
> And I'm not sure what you mean by "design Iterators that would take
> advantage of the undefined behavior". If an iterator defines how it behaves
> after returning None, then it's defined behavior. If you're using iterators
> and you know your entire iterator pipeline, then you can use whatever
> behavior the iterators involved define. You only need to restrict yourself
> to what the iterator protocol defines if you don't know what iterator
> you're consuming.
>
> I also don't understand your suggestion about using Option. Iterators
> *already* return an Option.
>
> -Kevin
>
> On Aug 4, 2013, at 4:45 PM, Jason Fager <[email protected]> wrote:
>
> Of course.  I think I'm reacting more to the possible use cases you
> described for option 3 than the actual meaning of it.  It seems like a
> really bad idea to design iterators that would take advantage of the
> undefined behavior, not least b/c it's unexpected and not supported by the
> most pervasive client of the iterator protocol (the for loop, in the sense
> of actually iterating through all elements available through the iterator),
> but that doesn't mean option 3 is in itself the wrong thing to do.
>
> But addressing the use cases you mentioned, if you need that kind of
> functionality, shouldn't you be hoisting the iterator's return type into
> its own Option?  i.e., an Iterator<T> should become an
> Iterator<Option<T>>?
>
>
> On Sun, Aug 4, 2013 at 6:23 PM, Kevin Ballard <[email protected]> wrote:
>
>> The new for loop works with all 3 of these. Your output shows that it
>> queried .next() twice, and got a single Some(1) result back. Once it gets
>> None, it never calls .next() again, whereas the 3 behaviors stated
>> previously are exclusively concerned with what happens if you call .next()
>> again after it has already returned None.
>>
>> -Kevin
>>
>> P.S. I changed the email address that I'm subscribed to this list with,
>> so apologies for any potential confusion.
>>
>> On Aug 4, 2013, at 6:18 AM, Jason Fager <[email protected]> wrote:
>>
>> The new for loop already assumes #2, right?
>>
>> let x = [1,2,3];
>> let mut it = x.iter().peek_(|x| printfln!(*x)).scan(true, |st, &x| {
>>     if *st { *st = false; Some(x) } else { None }
>> });
>>
>> for i in it {
>>     printfln!("from for loop: %?", i);
>> }
>>
>>
>> Which produces:
>>
>> &1
>> from for loop: 1
>> &2
>>
>>
>>
>> On Sun, Aug 4, 2013 at 1:49 AM, Daniel Micay <[email protected]> wrote:
>>
>>> On Sat, Aug 3, 2013 at 9:18 PM, Kevin Ballard <[email protected]>
>>> wrote:
>>> > The iterator protocol, as I'm sure you're aware, is the protocol that
>>> > defines the behavior of the Iterator trait. Unfortunately, at the moment
>>> > the trait does not document what happens if you call `.next()` on an
>>> > iterator after a previous call has returned `None`. According to Daniel
>>> > Micay, the intention was that the iterator would return `None` forever.
>>> > However, this is not guaranteed by at least one iterator adaptor (Scan),
>>> > nor is it documented. Furthermore, no thought has been given to what
>>> > happens if an iterator pipeline has side-effects. A trivial example of
>>> > the side-effect problem is this:
>>> >
>>> >     let x = [1,2,3];
>>> >     let mut it = x.iter().peek_(|x| printfln!(*x)).scan(true, |st, &x| {
>>> >         if *st { *st = false; Some(x) } else { None }
>>> >     });
>>> >     (it.next(), it.next(), it.next())
>>> >
>>> > This results in `(Some(1), None, None)` but it prints out
>>> >
>>> >     &1
>>> >     &2
>>> >     &3
>>> >
>>> > After giving it some thought, I came up with 3 possible definitions for
>>> > behavior in this case:
>>> >
>>> > 1. Once `.next()` has returned `None`, it will return None forever.
>>> > Furthermore, calls to `.next()` after `None` has been returned will not
>>> > trigger side-effects in the iterator pipeline. This means that once
>>> > `.next()` has returned `None`, it becomes idempotent.
>>> >
>>> >    This is most likely going to be what people will assume the iterator
>>> > protocol defines, in the absence of any explicit statement. What's more,
>>> > they probably won't even consider the side-effects case.
>>> >
>>> >    Implementing this will require care be given to every single iterator
>>> > and iterator adaptor. Most iterators will probably behave like this
>>> > (unless they use a user-supplied closure), but a number of different
>>> > iterator adaptors will need to track this explicitly with a bool flag.
>>> > It's likely that user-supplied iterator adaptors will forget to enforce
>>> > this and will therefore behave subtly wrong in the face of side-effects.
>>> >
>>> > 2. Once `.next()` has returned `None`, it will return `None` forever. No
>>> > statement is made regarding side-effects.
>>> >
>>> >    This is what most people will think they're assuming, if asked. The
>>> > danger here is that they will almost certainly actually assume #1, and
>>> > thus may write subtly incorrect code if they're given an iterator
>>> > pipeline with side-effects.
>>> >
>>> >    This is easier to implement than #1. Most iterators will do this
>>> > already. Iterator adaptors will generally only have to take care when
>>> > they use a user-supplied closure (e.g. `scan()`).
>>> >
>>> > 3. The behavior of `.next()` after `None` has been returned is left
>>> > undefined. Individual iterators may choose to define behavior here
>>> > however they see fit.
>>> >
>>> >    This is what we actually have implemented in the standard libraries
>>> > today. It's also by far the easiest to implement, as iterators and
>>> > adaptors may simply choose to not define any particular behavior.
>>> >
>>> >    This is made more attractive by the fact that some iterators may
>>> > choose to actually define behavior that's different than "return `None`
>>> > forever". For example, a user may write an iterator that wraps
>>> > non-blocking I/O, returning `None` when there's no data available and
>>> > returning `Some(x)` again once more data comes in. Or if you don't like
>>> > that example, they could write an iterator that may be updated to
>>> > contain more data after being exhausted.
>>> >
>>> >    The downside is that users may assume #1 when #3 holds, which is why
>>> > this needs to be documented properly.
>>> >
>>> > ---
>>> >
>>> > I believe that #3 is the right behavior to define. This gives the most
>>> > flexibility to individual iterators, and we can provide an iterator
>>> > adaptor that gives any iterator the behavior defined by #1 (see Fuse in
>>> > PR #8276).
>>> >
>>> > I am not strongly opposed to defining #1 instead, but I am mildly worried
>>> > about the likelihood that users will implement iterators that don't have
>>> > this guarantee, as this is not something that can be statically checked
>>> > by the compiler. What's more, if an iterator breaks this guarantee, the
>>> > problem will show up in the code that calls it, rather than in the
>>> > iterator itself, which may make debugging harder.
>>> >
>>> > I am strongly opposed to #2. If we guarantee that an iterator that
>>> > returns `None` once will return `None` forever, users will assume that
>>> > this means that `.next()` becomes idempotent (with regards to
>>> > side-effects) after `None` is returned, but this will not be true.
>>> > Furthermore, users will probably not even realize they've made a bad
>>> > assumption, as most users will not be thinking about side-effects when
>>> > consuming iterators.
>>> >
>>> > I've already gone ahead and implemented #3 in pull request #8276.
>>> >
>>> > -Kevin
>>>
>>> I'm leaning towards #2 or #3, mostly because adaptors *not*
>>> dispatching to the underlying next() implementation are too complex.
>>>
>>> I took a look at the behaviour of Python's iterators in these corner
>>> cases as a good baseline for comparison:
>>>
>>> ~~~
>>> >>> def peek(it):
>>> ...     for x in it:
>>> ...         print(x)
>>> ...         yield x
>>> ...
>>> >>> xs = [1, 2, 3]
>>> >>> ys = [1, 2, 3, 4, 5]
>>> ~~~
>>>
>>> You can tell their `zip` function short-circuits, and simply
>>> dispatches to the underlying implementations. Rust's `zip` is similar
>>> but doesn't currently short-circuit (it might as well).
>>>
>>> ~~~
>>> >>> it = zip(peek(ys), xs)
>>> >>> next(it)
>>> 1
>>> (1, 1)
>>> >>> next(it)
>>> 2
>>> (2, 2)
>>> >>> next(it)
>>> 3
>>> (3, 3)
>>> >>> next(it)
>>> 4
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> StopIteration
>>> >>> next(it)
>>> 5
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> StopIteration
>>> >>> next(it)
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> StopIteration
>>> >>> it = zip(xs, peek(ys))
>>> >>> next(it)
>>> 1
>>> (1, 1)
>>> >>> next(it)
>>> 2
>>> (2, 2)
>>> >>> next(it)
>>> 3
>>> (3, 3)
>>> >>> next(it)
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> StopIteration
>>> ~~~
>>>
>>> It also makes no attempt to store whether it has stopped internally,
>>> and will start yielding again if each iterator yields an element when
>>> zip asks for them one by one (keeping in mind that it short-circuits).
>>>
>>> Most other languages keep `hasNext` and `next` separate (D and Scala,
>>> among others), leading to more corner cases, and they do not seem to
>>> clearly define the semantics for side effects down the pipeline.
>>>
>>> http://dlang.org/phobos/std_range.html
>>> http://www.scala-lang.org/api/current/scala/collection/Iterator.html
>>> _______________________________________________
>>> Rust-dev mailing list
>>> [email protected]
>>> https://mail.mozilla.org/listinfo/rust-dev
>>>
>>
>
>

_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
