Hi Marijn,

Sorry for the late reply. Overall, I am a fan of this idea. In the previous thread the idea of bounded polymorphism came up and it's been rattling around in my head ever since: I'm glad to see it be sketched out in more detail. To my mind, this need not conflict with the previous proposals: basically it can be seen as a pure addition that allows one to avoid constructing interface instances but still make use of dynamic dispatch.

Some comments:

- I don't see why every category must declare an associated interface. I like the idea of *allowing* categories to declare associated interfaces, as it will lead to better error reporting, but I think *requiring* it is too much. It implies that the main purpose of a category is dynamic dispatch and I am not sure that will be the case.

- I also like the idea that an interface can be created after the fact and make use of methods declared in categories or elsewhere that are not aware of the new interface. This is not possible in your proposal unless you declare a wrapper "instance." Maybe this is not a big deal in practice, or even a good thing, as you avoid the potential problem of the compiler assembling an interface out of various unrelated methods that happen to have the right names.

- I think the term instance ought to be reserved for runtime values: most languages that aren't Haskell use instance to refer to an instance of a type, such as an object etc. I'm not sure why Haskell felt the need to change that convention but I think it's confusing and there is no need for us to follow suit. Therefore, I will keep calling instances categories :).

- I really like the idea of making copy, equality, and other such things interface functions (not sure whether send will work or not). I am a bit concerned about the definition of comparison and equality because they really want to have two receiver types. I guess that one could do:

  interface ord<T> {
    fn lt(other: T) -> bool;
  }

and now you can write a sort function:

  fn sort<S:ord<S>>(v: [S]) { ... }

assuming F-bounded polymorphism, where the bounds of the type variable S can reference S. We have to make sure we get this right of course.

- It also raises some questions about static overloading. Assuming that there are multiple ord<> categories declared for the type of the variable `x`, what happens when `x.lt(y)` is invoked? Under my proposal, I suppose, each category would have a unique name, and an explicit category name would be required: `x.foo::lt(y)`. The more Java-like solution would be to decide which `lt()` is invoked based on the type of `y`, but this complicates type inference and is confusing to boot (static overloading vs dynamic dispatch).



Niko


On 11/25/11 8:06 AM, Marijn Haverbeke wrote:
Niko proposed categories [1] two weeks ago. I'm happy that we're
looking in this direction. Niko's proposal makes interfaces
structural. I'm going to argue that nominal interfaces have advantages
both in programmer ergonomics and in compilation-model complexity.

[1]:https://mail.mozilla.org/pipermail/rust-dev/2011-November/000941.html

I'll be sticking rather closely to Haskell's type class system, which
has proven itself in practice. If you aren't already enthusiastic
about Haskell's type classes, I recommend watching Simon Peyton Jones'
talk about them [2]. He goes over the way type classes can be
implemented, and shows a number of really cool applications.

[2]:http://channel9.msdn.com/posts/MDCC-TechTalk-Classes-Jim-but-not-as-we-know-them
(try to skip the first 3 minutes, they might spoil your appetite)

Context: I'd like to implement some minimum viable dynamic dispatch
system and get rid of our `obj` implementation before the first public
release. Something that can be extended later with classes and traits,
but for now just allows us to define vtables that accompany types.

To recap, Niko's categories look like this:

     category vec_seq([T]) {
       fn len() ->  uint { vec::len(self) }
       fn iter(f: block(T)) { for elt in self { f(elt); } }
     }

Having that, you can do

     import vec::vec_seq;
     assert [1].len() == 1u;
     [1, 2, 3].iter({|i| log i; })

Which is statically resolved. Dynamic dispatch (if I understand
correctly), would look like this:

     type iterable<T>  = iface { fn iter(block(T)); };
     fn append_to_vec<copy T>(x: iterable<T>, y: itable<T>) ->  [T] {
         let r = [];
         x.iter {|e| r += [e];}
         y.iter {|e| r += [e];}
         r
     }
     // Assuming there exists a category for lists that implements iter
     append_to_vec([1, 2, 3] as iterable, list::cons(4, list::nil) as iterable)

That causes the compiler to create two vtables, both containing an
`iter` method, and wrap the arguments in {vtable, @value} structures
when they are cast to `iterable` (they'll probably have to be boxed to
make sure the size of such a value is uniform, and cleanup is
straightforward).

Alternatively, my proposal looks like this: interfaces could be fixed
groups of methods, that are always implemented all at once.

     // Define an interface called `seq`
     interface seq<T>  {
         fn len() ->  uint;
         fn iter(f: block(T));
     }
     // Declare [T] to be an instance of seq
     instance [T] seq<T>  {
         fn len() ->  uint { vec::len(self) }
         fn iter(f: block(T)) { for elt in self { f(elt); } }
     }

The static way to use this would look the same as above. If you've
imported `vec::seq` (std::vec's implementation of seq), you can simply
say [1].len(). If there is any instance in scope that applies to type
[int] and has a method `len`, that instance's implementation is
called. If multiple interfaces are found, the one in the closes scope
is chosen. If they are in the same scope, or no interface is found,
you get a compiler error.

Dynamic dispatch works differently.

     // Declare T to be an instance of the seq interface
     fn total_len<T:seq>(seqs: [T]) ->  uint {
         let cnt = 0u;
         for s in seqs { count += s.len(); }
         count
     }

In this proposal, the seq vtable is not something that get attached to
the value by casting it to an interface, but rather acts as an
implicit parameter to the function. The cool thing is that we already
have these implicit parameters -- they map very closely to our type
descriptors, which we are implicitly passing to generics. What would
happen, for such a call, is that the compiler notices that the type
parameter has an interface bound, so that instead of passing just a
tydesc, it passes both a tydesc and the `seq` vtable that belongs to
that type. Inside the function, `s` is known to be of type `T:seq`, so
the `s.len` call looks up the `len` method in the vtable passed for
type parameter T.

You can also require type parameter to conform to multiple interfaces,
as in `fn foo<T: seq, print>(...)` -- that requires passing multiple
vtables. (Niko: this is the thing you asked about. Turns out it's not
hard to do.)

It should be noted that this has both advantages and disadvantages
compared to the 'wrap by casting to interface approach'. For one
thing, it doesn't allow this

     fn draw_all<T: drawable>(shapes: [T]) { for s in shapes { s.draw(); } }

.. or at least, it doesn't do what you want, because it requires all
arguments to be of the same type, and only passes a single vtable. An
extension can be implemented (at a later time), to support this
instead:

     fn draw_all(shapes: [drawable]) { ... }
     draw_all([my_circle as drawable, my_rectangle as drawable]);

The drawable interface, when used as a type instead of a type
parameter bound, would denote a boxed value paired with a vtable, just
like in Niko's proposal.

And the good part: In the case where the interface is used as a type
parameter bound, which should cover most use cases, things do not have
to be boxed to be handled generically, and the content of regular data
structures (such as `[int]`) can be approached generically. This is
fast, and it allows type classes to applied all over the place...

Everything that's currently an obj could become an instance. We'd get
static, super-fast dispatch when using them monomorphically, and be
able to decide on our own representation (rather than being locked
into a boxed representation, as objs are) for the values. Being able
to define methods on built-in types means that many things wouldn't
require defining a new representation at all.

The operations that are currently magically implemented by the
compiler and runtime, such as comparing and logging, could be methods
on interfaces (see Haskell's Eq, Cmp, and Show type classes). That'd
make them overridable and extendable.

With a Sufficiently Smart Inliner, we could even do arithmetic with
methods, and get operator overloading for free (see Haskell's Num type
class), and allow things like generic implementations of sum, average,
and similar numeric operations over sequences of `T: num` type.

Our type parameter bounds, `copy` and `send`, could become interfaces,
with suitable implementations in the standard library. Copying of
generic types would be forbidden, but `copy` would be a method of the
`copy` typeclass, so you could say `copy(elt)` if you declared your
type parameter with a `: copy` bound.

Implementing a system like this in its simple form is not terribly
hard, especially since we already have tydescs implemented. Making it
as powerful as Haskell's requires some extensions (most importantly:
default implementations of methods, and interfaces that are
parameterized with other interfaces), but those can be tackled
piecemeal. There's even a credible path towards multiple dispatch
(methods dispatching on the type of more than one argument), though it
requires a more complicated interface-dispatching mechanism.


I'd like to spend next week implementing this. Comments, additions,
and violently disagreeing flames are welcome.

Cheers,
Marijn
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev



_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev

Reply via email to