Hi,
I'm excited about Rust's iterator library based on external style.
On the other hand, I was thinking about extending current "for" syntax/protocol
to support returning values, which is useful for filter_map() like
functions [*0]. (It might have been discussed before, but I cannot
find it in rust-dev archives)

Example:

    fn test() {
        let src = ~[0, 1, 2, 3, 4];
        let mut sum = 0;
        let dst = for src.filter_map() |&x| {
            sum += x;
            if sum == 10 {
                break x + 1; // => (false, Some(x + 1))
            }
            else if sum > 10 {
                break        // => (false, None)
            }

            if x % 2 == 0 {
                loop         // => (true, None)
            }
            else {
                loop x + 1;  // => (true, Some(x + 1))
            }
        }
    }
    // In extended "for", loop/break statements in the closures are
    // transformed with following rules:
    //   loop      => return (true , None)
    //   loop val  => return (true , Some(val))
    //   break     => return (false, None)
    //   break val => return (false, Some(val))

vec::filter_map() should be slightly modified from original [*1]:

    pure fn filter_map<T, U: Copy>(v: &[T], f: fn(t: &T) -> (bool,
Option<U>)) -> ~[U] {
        let mut result = ~[];
        for each(v) |elem| {
            let (cont, val) = f(elem);
            match val {
                None => { /* no-op */ }
                Some(result_elem) => unsafe { result.push(result_elem); }
            }
            cont || break;
        }
        result
    }

With this extension, closure-calling "for" syntactic sugar is still useful even
if Rust's iterators become external style.  In Daniel Micay's example:

    fn main() {
        let mut it = Counter::new(0.0, 1.0)
            .take_while(|x| *x < 10000000.0)
            .transform(|x| x / 2.0)
            .transform(|x| x + 2.0);
    }

This can be rewritten as follows (with appropriate FilterMapIterator<>
implementation):

    fn main() {
        let mut it = for Counter::new(0.0, 1.0).filter_map() |&x| {
            if (x < 10000000.0) {
                x / 2.0 + 2.0
            }
            else {
                break;
            }
        }
    }

Any combination of transform/filter/skip_whike/take_while/skip/take/scan can be
converted.  I think this is more readable and expressive.

Many languages have special syntax for manipulating iterators (Python's
generator comprehension, C#'s query expression, Haskell's List monads, etc.  I
know they are based on very different approaches and different range of
application, but for similar purposes).  This extension may be seen as one of
them.


[*0]: Sorry for out of the topic in this thread, but I think it is also useful
(and better consistency) to enable "while" statement to return value like
following:

    let xs = ~[0, 1, 2, 4, 5];
    let found_two =
        while i < xs.size() {
            if xs[i] == 2 {
                break true
            }
        }
        else {
            // "else" clause is executed when "while" statement is terminated
            // NOT by a "break" statement. (Python behaves like the same)
            false
        }

[*1]: Original implementation vec::filter_map():

    pure fn filter_map<T, U: Copy>(v: &[T], f: fn(t: &T) -> Option<U>) -> ~[U] {
        let mut result = ~[];
        for each(v) |elem| {
            match f(elem) {
                None => { /* no-op */ }
                Some(result_elem) => unsafe { result.push(result_elem); }
            }
        }
        result


On Thu, Jun 6, 2013 at 1:09 PM, Daniel Micay <[email protected]> wrote:
> A quick terminology refresher, for those who aren't familiar with it:
>
> * Internal iterator: takes a closure, runs the closure until it asks to break
> * External iterator: state machine, advanced by the caller in a loop
>
> To a caller, external iterators provide the most functionality, because they
> can be used as an internal iterator. You lose the state of an internal 
> iterator
> by breaking out of iterator, so generic algorithms like zip, union, intersect
> and merge can't be implemented for a pair of iterators.
>
> # Issues with internal iterators in Rust
>
> A few months ago, we only had internal iterators and there were no generic
> algorithms to use with any iterator - only with BaseIter's `each` method.
>
> Rust's internal iterators implement the protocol encoded in Rust's for
> statement, but it's not possible to give them all a common trait or implement
> generic methods or functions taking any internal iterator.
>
> As a workaround, we can write algorithms assuming internal iterators only take
> one argument (the closure):
>
>     fn count<T>(f: &fn(fn(T) -> bool) -> bool) -> uint
>
> The caller has to use a partial function to call these adaptors, for which we
> lack sugar:
>
>     count(|f| uint::range(0, 10, f))
>
> For simple functions, this is fairly reasonable once you're used to it. It
> quickly gets out of control though, even for a simple function like filter:
>
>     filter<T>(pred: &fn(&T) -> bool, input: &fn(&fn(T) -> bool) -> bool, 
> output: &fn(T) -> bool) -> bool {}
>
> Sadly, `filter_ref` is also needed to work around closures behaving badly with
> lifetimes. An example of the problem with `fold_ref`:
>
>     fn product<T: One + Mul<T, T>>(iter: &fn(f: &fn(&T) -> bool) -> bool) -> 
> T {
>         fold_ref(One::one::<T>(), iter, |a, x| *a = a.mul(x))
>     }
>
> Since `product` expects an iterator yielding `&T` (a borrowed pointer in any
> region), it won't work with `fold` because that requires the borrowed pointer
> to have the same type (and thus lifetime) for every iteration like `&'a int`.
>
> This issue with borrowed pointers was blocking me from replacing the existing
> algorithms reimplemented for both `str` and `vec` with the generic ones.
>
> Chaining together iteration algorithms is a common use case, but even chaining
> two together is confusing at first:
>
>     to_vec(|g| filter(|&x| *x < 3, |f| xs.each(f), g)
>
> Another more alarming issue is that with internal iterators, the `break` and
> `return` statements don't always work in a `for` loop. If the iterator isn't
> implemented correctly, the loop will keep going.
>
> This also has borrow checking implications, because the compiler can't assume
> those statements actually cause flow control to leave the loop immediately.
>
> # External iterators
>
> Based on the above, you might think generic iteration algorithms in Rust are a
> bleak prospect. However, we already have a nice external iterator library, and
> they don't suffer from the above issues.
>
> All kinds of external iterators implement the following trait, whether they 
> are
> a fibonacci number generator, a reverse iterator over a vector or iterator 
> over
> a range in a sorted set:
>
>     pub trait Iterator<A> {
>         /// Advance the iterator and return the next value. Return `None` 
> when the end is reached.
>         fn next(&mut self) -> Option<A>;
>     }
>
> Generic adaptors are implemented on `Iterator`, and many of them are 
> `Iterator`
> implementations themselves:
>
>     use std::iterator::*;
>
>     fn main() {
>         let mut it = Counter::new(0.0, 1.0)
>             .take_while(|x| *x < 10000000.0)
>             .transform(|x| x / 2.0)
>             .transform(|x| x + 2.0);
>         println(it.fold(0.0, |a, b| a + b).to_str())
>     }
>
> If you're curious, the optimized LLVM IR: http://ix.io/5Xl
>
> Unlike internal iterators, external iterators only run one iteration at a 
> time,
> so a `for` loop designed for them would always be able to succeed with `break`
> and `return`. It would also be able to avoid the ugly `advance` wrapper
> currently required to use external iterators with `for`.
>
>     // The current situation, wrapping an external iterator as an internal one
>     //
>     // Since the advance method is not known the be correct, borrow checking
>     // still assumes `return` and `break` are imperfect.
>     for xs.zip(ys).advance |x| { ... }
>
>     // A hypothetical `for` loop using the `Iterator` trait
>     for iterator |x| { ... }
>
>     // It could also fall back to an `Iterable` trait and obtain an iterator
>     for container |x| { ... }
>
> External iterators also avoid the problems with references and closures,
> because they simply return `T` rather than passing it to a closure.
>
> # Why not just switch to external iterators?
>
> Algorithms that can be represented easily without the call stack are as easy 
> to
> write as either an internal or external iterator.
>
> However, without support for compiling a function to a state machine (C# does
> this for yield), traversal of a recursive data structure has to be manually
> translated to a procedural algorithm with an explicit stack. For complex data
> structures, this process can be very difficult.
>
> I'm hopeful Rust will gain support for this down the road after 1.0. If it
> does, there will be no reason to write immutable internal iterators.
>
> Another issue is mutability, as you can write iterators that are able to 
> mutate
> containers. With internal iterators, this is easy to do with safe code. With
> external ones, it will `unsafe` and won't be easy to get right.
>
> # Solution?
>
> I don't have any proposal to completely solve this issue. :)
>
> I think extending the built-in `for` loop to work with external iterators
> should be considered, because right now the verbosity discourages using them
> and makes borrow checking more painful than it has to be.
>
> It could treat functions as internal iterators, and look for an `Iterator`
> implementation (using a `lang` item) for external ones.
>
> Python's `for` loop starts by looking for an iterator (a `__next__` method) 
> and
> falls back to an iterable (an `__iter__` method) so behaviour like this isn't
> an alien concept.
>
> _______________________________________________
> Rust-dev mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/rust-dev
>
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev

Reply via email to