Re: [rust-dev] Proposal for clarifying the iterator protocol

Kevin Ballard Mon, 05 Aug 2013 16:09:35 -0700

As an example of something that #3 makes possible, here's a proof-of-concept 
implementation of the previously-mentioned push back behavior (renamed here as 
.push_front(), because .push_back() in this context sounds like it pushes onto 
the end):


pub struct PushFront<A, T> {
    priv val: Option<A>,
    priv iter: T
}

impl<A, T: Iterator<A>> PushFront<A, T> {
    #[inline]
    fn new(iter: T) -> PushFront<A, T> {
        PushFront{ val: None, iter: iter }
    }
    
    /// Pushes the value onto the front of the iterator, replacing any previous
    /// pushed value if it hasn't already been yielded.
    ///
    /// Only one value may be buffered at any time.
    #[inline]
    fn push_front(&mut self, val: A) {
        self.val = Some(val);
    }
}

impl<A, T: Iterator<A>> Iterator<A> for PushFront<A, T> {
    #[inline]
    fn next(&mut self) -> Option<A> {
        if self.val.is_some() {
            self.val.take()
        } else {
            self.iter.next()
        }
    }
    
    #[inline]
    fn size_hint(&self) -> (uint, Option<uint>) {
        let (low, hi) = self.iter.size_hint();
        if self.val.is_some() {
            (low+1, hi.map_consume(|x| if x < uint::max_value { x+1 } else { x 
}))
        } else {
            (low, hi)
        }
    }
}

impl<A, T: DoubleEndedIterator<A>> DoubleEndedIterator<A> for PushFront<A, T> {
    #[inline]
    fn next_back(&mut self) -> Option<A> {
        match self.iter.next() {
            Some(x) => Some(x),
            None if self.val.is_some() => self.val.take(),
            _ => None
        }
    }
}

impl<A: Clone, T: RandomAccessIterator<A>> RandomAccessIterator<A> for 
PushFront<A, T> {
    #[inline]
    fn indexable(&self) -> uint {
        let x = self.iter.indexable();
        if self.val.is_some() {
            if x < uint::max_value { x+1 } else { x }
        } else { x }
    }
    
    #[inline]
    fn idx(&self, index: uint) -> Option<A> {
        if index == 0 && self.val.is_some() {
            self.val.clone()
        } else {
            self.iter.idx(index-1)
        }
    }
}

-Kevin

On Aug 4, 2013, at 8:33 PM, Kevin Ballard <[email protected]> wrote:

> You could certainly design a non-blocking IO iterator that way. My example of 
> a non-blocking IO iterator wasn't meant to illustrate something I'd actually 
> recommend you do (since the iterator protocol isn't really a great fit for 
> this), but rather show the flexibility of the approach.
> 
> That said, in the non-blocking IO approach, it does still "exhaust the 
> elements available", it just so happens that more elements become available 
> over time.
> 
> A better example might be an iterator that allows you to push more elements 
> onto the end. Or one that allows you to move the iterator backwards in time, 
> so to speak, so it re-emits the same elements it emitted previously (e.g. 
> give it a .prev() method). Or merely one that allows you to push back a 
> single element (that gets yielded on the next call to .next()), which is 
> actually fairly common in character iterators in other languages (gives you 
> the equivalent of a .peek() operation, by consuming a value and then 
> immediately pushing it back if you don't want it). None of these examples are 
> possible with approaches #1 or #2, only with approach #3. And all of them are 
> still compatible with the basic for loop.
> 
> -Kevin
> 
> On Aug 4, 2013, at 5:53 PM, Jason Fager <[email protected]> wrote:
> 
>> Not confused, I understand your point about for loops not caring about what 
>> happens with additional 'next' calls.  
>> 
>> But as a user of an iterator, I have the expectation that a for loop 
>> exhausts the elements available from an iterator unless I return early or 
>> break.  Designing an iterator that intentionally sidesteps that expectation 
>> seems like a bad idea.  Principle of least astonishment, etc.  
>> 
>> And yes, of course, iterators already return Option.  But they return Option 
>> to satisfy the *iterator protocol*, not the use cases you described.  I'm 
>> talking about adding another layer of Option  
>> 
>> So say I want to implement non-blocking io that plays nice w/ the iterator 
>> protocol.  Using an implementation taking advantage of option#3 look 
>> something like:
>> 
>> loop {
>>     for i in iter {
>>         foo(i);
>>     }
>>     // other stuff
>>     if(noReallyImDone) {
>>         break;
>>     }
>> }
>> 
>> 
>> While hoisting the Iterator's type into another layer of Option looks like:
>> 
>> for i in iter {
>>     match i {
>>         Some(i) => foo(i);
>>         None => {
>>            //other stuff
>>         }
>>     }
>> }
>> 
>> The outer Option allows the iterator protocol to work as expected, i.e. 
>> iterate over all available elements in the iterator, and the inner 
>> implements the non-blocking protocol you're looking for.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Sun, Aug 4, 2013 at 8:12 PM, Kevin Ballard <[email protected]> wrote:
>> I suspect you're confused about something.
>> 
>> The for loop doesn't care in the slightest what an iterator does after it's 
>> returned None. All 3 approaches work equally well as far as the for loop is 
>> concerned.
>> 
>> And I'm not sure what you mean by "design Iterators that would take 
>> advantage of the undefined behavior". If an iterator defines how it behaves 
>> after returning None, then it's defined behavior. If you're using iterators 
>> and you know your entire iterator pipeline, then you can use whatever 
>> behavior the iterators involved define. You only need to restrict yourself 
>> to what the iterator protocol defines if you don't know what iterator you're 
>> consuming.
>> 
>> I also don't understand your suggestion about using Option. Iterators 
>> already return an Option.
>> 
>> -Kevin
>> 
>> On Aug 4, 2013, at 4:45 PM, Jason Fager <[email protected]> wrote:
>> 
>>> Of course.  I think I'm reacting more to the possible use cases you 
>>> described for option 3 than the actual meaning of it.  It seems like a 
>>> really bad idea to design iterators that would take advantage of the 
>>> undefined behavior, not least b/c it's unexpected and not supported by the 
>>> most pervasive client of the iterator protocol (the for loop, in the sense 
>>> of actually iterating through all elements available through the iterator), 
>>> but that doesn't mean option 3 is in itself the wrong thing to do.  
>>> 
>>> But addressing the use cases you mentioned, if you need that kind of 
>>> functionality, shouldn't you be hoisting the iterator's return type into 
>>> its own Option?  i.e., an Iterator<T> should be become an 
>>> Iterator<Option<T>>?
>>> 
>>> 
>>> On Sun, Aug 4, 2013 at 6:23 PM, Kevin Ballard <[email protected]> wrote:
>>> The new for loop works with all 3 of these. Your output shows that it 
>>> queried .next() twice, and got a single Some(1) result back. Once it gets 
>>> None, it never calls .next() again, whereas the 3 behaviors stated 
>>> previously are exclusively concerned with what happens if you call .next() 
>>> again after it has already returned None.
>>> 
>>> -Kevin
>>> 
>>> P.S. I changed the email address that I'm subscribed to this list with, so 
>>> apologies for any potential confusion.
>>> 
>>> On Aug 4, 2013, at 6:18 AM, Jason Fager <[email protected]> wrote:
>>> 
>>>> The new for loop already assumes #2, right?
>>>> 
>>>> let x = [1,2,3];
>>>> let mut it = x.iter().peek_(|x| printfln!(*x)).scan(true, |st, &x| { if 
>>>> *st { *st = false; Some(x) } else { None } });
>>>> 
>>>> for i in it {
>>>>     printfln!("from for loop: %?", i);
>>>> }
>>>> 
>>>> 
>>>> Which produces:
>>>> 
>>>> &1
>>>> from for loop: 1
>>>> &2
>>>> 
>>>> 
>>>> 
>>>> On Sun, Aug 4, 2013 at 1:49 AM, Daniel Micay <[email protected]> wrote:
>>>> On Sat, Aug 3, 2013 at 9:18 PM, Kevin Ballard <[email protected]> wrote:
>>>> > The iterator protocol, as I'm sure you're aware, is the protocol that
>>>> > defines the behavior of the Iterator trait. Unfortunately, at the moment 
>>>> > the
>>>> > trait does not document what happens if you call `.next()` on an iterator
>>>> > after a previous call has returned `None`. According to Daniel Micay, the
>>>> > intention was that the iterator would return `None` forever. However, 
>>>> > this
>>>> > is not guaranteed by at least one iterator adaptor (Scan), nor is it
>>>> > documented. Furthermore, no thought has been given to what happens if an
>>>> > iterator pipeline has side-effects. A trivial example of the side-effect
>>>> > problem is this:
>>>> >
>>>> >     let x = [1,2,3];
>>>> >     let mut it = x.iter().peek_(|x| printfln!(*x)).scan(true, |st, &x| { 
>>>> > if
>>>> > *st { *st = false; Some(x) } else { None } });
>>>> >     (it.next(), it.next(), it.next())
>>>> >
>>>> > This results in `(Some(1), None, None)` but it prints out
>>>> >
>>>> >     &1
>>>> >     &2
>>>> >     &3
>>>> >
>>>> > After giving it some thought, I came up with 3 possible definitions for
>>>> > behavior in this case:
>>>> >
>>>> > 1. Once `.next()` has returned `None`, it will return None forever.
>>>> > Furthermore, calls to `.next()` after `None` has been returned will not
>>>> > trigger side-effects in the iterator pipeline. This means that once
>>>> > `.next()` has returned `None`, it becomes idempotent.
>>>> >
>>>> >    This is most likely going to be what people will assume the iterator
>>>> > protocol defines, in the absence of any explicit statement. What's more,
>>>> > they probably won't even consider the side-effects case.
>>>> >
>>>> >    Implementing this will require care be given to every single iterator 
>>>> > and
>>>> > iterator adaptor. Most iterators will probably behave like this (unless 
>>>> > they
>>>> > use a user-supplied closure), but a number of different iterator adaptors
>>>> > will need to track this explicitly with a bool flag. It's likely that
>>>> > user-supplied iterator adaptors will forget to enforce this and will
>>>> > therefore behave subtlely wrong in the face of side-effects.
>>>> >
>>>> > 2. Once `.next()` has returned `None`, it will return `None` forever. No
>>>> > statement is made regarding side-effects.
>>>> >
>>>> >    This is what most people will think they're assuming, if asked. The
>>>> > danger here is that they will almost certainly actaully assume #1, and 
>>>> > thus
>>>> > may write subtlely incorrect code if they're given an iterator pipeline 
>>>> > with
>>>> > side-effects.
>>>> >
>>>> >    This is easier to implement than #1. Most iterators will do this 
>>>> > already.
>>>> > Iterator adaptors will generally only have to take care when they use a
>>>> > user-supplied closure (e.g. `scan()`).
>>>> >
>>>> > 3. The behavior of `.next()` after `None` has been returned is left
>>>> > undefined. Individual iterators may choose to define behavior here 
>>>> > however
>>>> > they see fit.
>>>> >
>>>> >    This is what we actually have implemented in the standard libraries
>>>> > today. It's also by far the easiest to implement, as iterators and 
>>>> > adaptors
>>>> > may simply choose to not define any particular behavior.
>>>> >
>>>> >    This is made more attractive by the fact that some iterators may 
>>>> > choose
>>>> > to actually define behavior that's different than "return `None` 
>>>> > forever".
>>>> > For example, a user may write an iterator that wraps non-blocking I/O,
>>>> > returning `None` when there's no data available and returning `Some(x)`
>>>> > again once more data comes in. Or if you don't like that example, they 
>>>> > could
>>>> > write an iterator that may be updated to contain more data after being
>>>> > exhausted.
>>>> >
>>>> >    The downside is that users may assume #1 when #3 holds, which is why 
>>>> > this
>>>> > needs to be documented properly.
>>>> >
>>>> > ---
>>>> >
>>>> > I believe that #3 is the right behavior to define. This gives the most
>>>> > flexibility to individual iterators, and we can provide an iterator 
>>>> > adaptor
>>>> > that gives any iterator the behavior defined by #1 (see Fuse in PR 
>>>> > #8276).
>>>> >
>>>> > I am not strongly opposed to defining #1 instead, but I am mildly worried
>>>> > about the likelihood that users will implement iterators that don't have
>>>> > this guarantee, as this is not something that can be statically checked 
>>>> > by
>>>> > the compiler. What's more, if an iterator breaks this guarantee, the 
>>>> > problem
>>>> > will show up in the code that calls it, rather than in the iterator 
>>>> > itself,
>>>> > which may make debugging harder.
>>>> >
>>>> > I am strongly opposed to #2. If we guarantee that an iterator that 
>>>> > returns
>>>> > `None` once will return `None` forever, users will assume that this means
>>>> > that `.next()` becomes idempotent (with regards to side-effects) after
>>>> > `None` is returned, but this will not be true. Furthermore, users will
>>>> > probably not even realize they've made a bad assumption, as most users 
>>>> > will
>>>> > not be thinking about side-effects when consuming iterators.
>>>> >
>>>> > I've already gone ahead and implemented #3 in pull request #8276.
>>>> >
>>>> > -Kevin
>>>> 
>>>> I'm leaning towards #2 or #3, mostly because adaptors *not*
>>>> dispatching to the underlying next() implementation are too complex.
>>>> 
>>>> I took a look at the behaviour of Python's iterators in these corner
>>>> cases as good baseline for comparison:
>>>> 
>>>> ~~~
>>>> >>> def peek(it):
>>>> ...     for x in it:
>>>> ...         print(x)
>>>> ...         yield x
>>>> ...
>>>> >>> xs = [1, 2, 3]
>>>> >>> ys = [1, 2, 3, 4, 5]
>>>> ~~~
>>>> 
>>>> You can tell their `zip` function short-circuits, and simply
>>>> dispatches to the underlying implementations. Rust's `zip` is similar
>>>> but doesn't currently short-circuit (it might as well).
>>>> 
>>>> ~~~
>>>> >>> it = zip(peek(ys), xs)
>>>> >>> next(it)
>>>> 1
>>>> (1, 1)
>>>> >>> next(it)
>>>> 2
>>>> (2, 2)
>>>> >>> next(it)
>>>> 3
>>>> (3, 3)
>>>> >>> next(it)
>>>> 4
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>> StopIteration
>>>> >>> next(it)
>>>> 5
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>> StopIteration
>>>> >>> next(it)
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>> StopIteration
>>>> >>> it = zip(xs, peek(ys))
>>>> >>> next(it)
>>>> 1
>>>> (1, 1)
>>>> >>> next(it)
>>>> 2
>>>> (2, 2)
>>>> >>> next(it)
>>>> 3
>>>> (3, 3)
>>>> >>> next(it)
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>> StopIteration
>>>> ~~~
>>>> 
>>>> It also makes no attempt to store whether it has stopped internally,
>>>> and will start yielding again if each iterator yields an element when
>>>> zip asks for them one by one (keeping in mind that it short-circuits).
>>>> 
>>>> Most other language keep `hasNext` and `next` separate (D and Scala,
>>>> among others) leading to more corner cases, and they do not seem to
>>>> clearly define the semantics for side effects down the pipeline.
>>>> 
>>>> http://dlang.org/phobos/std_range.html
>>>> http://www.scala-lang.org/api/current/scala/collection/Iterator.html
>>>> _______________________________________________
>>>> Rust-dev mailing list
>>>> [email protected]
>>>> https://mail.mozilla.org/listinfo/rust-dev
>>>> 
>>>> _______________________________________________
>>>> Rust-dev mailing list
>>>> [email protected]
>>>> https://mail.mozilla.org/listinfo/rust-dev
>>> 
>>> 
>> 
>> 
> 

_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev

Re: [rust-dev] Proposal for clarifying the iterator protocol

Reply via email to