Bill (cc'ing rust-dev)-

[Executive summary: the provided proposal needs further work.]

I do not know which discussion of data sort refinements you had been reading; I would have liked context about where you were coming from. I assume it was either Niko's blog post [1] or issue #1679 [2], but I would prefer not to make any assumption at all.

[1]: http://smallcultfollowing.com/babysteps/blog/2012/08/24/datasort-refinements/
[2]: https://github.com/mozilla/rust/issues/1679

Some immediate thoughts:

* This strikes me as an extreme change to the language, but perhaps my gut is overly conservative.

-- (At first I thought you were suggesting adding headers to every struct, but then I realized that the compiler should be able to insert the tags at the points where a struct is passed into an evaluation context expecting a structural-enum. So it's not as extreme a change as I had initially worried; but I still worry.)

-- I think one of Niko's points in his blog post was that his proposal was not an extreme change.

* You have not addressed in your proposal how you would change the match syntax to deal with non-struct variants, such as ~str or int.

-- (I would probably just sidestep this by including the hypothetical stipulation that you mentioned, where only structs can be part of a structural enum; then I think the match syntax can remain largely unchanged, but see caveats with next bullet.)

* Finally, I think your note at the end about generic instantiation is a bigger problem than you make it out to be.

-- For example, can I actually expect to be able to write code that processes arguments of type "A | B | S<Y>" ?

  struct S<Y> { y: Y }
  fn <A,B,Y>(x: A | B | S<Y>) {
    match x {
      ... what could go here ? ..., // early case clauses, maybe to handle A
      S{ y: the_y } => { ... handle the_y ... },
      ... and what goes here ? ...  // late case clauses, maybe to handle B
    }
  }

There is the issue you already pointed out, where a type variable might be instantiated to S<Y>. But could one also be instantiated to S<Z>? (Do the tags on the variants need to encode the whole type, and not just which struct it is?) And what about the poor user who did not consider that the variants might alias each other, and assumed that all the clauses in the code for A | B | S<Y> were disjoint? In fact the clauses potentially overlap due to that aliasing, and thus the order of the cases in the above is now crucial, yes?
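To make the overlap concern concrete, here is a minimal sketch in present-day Rust rather than in the proposed syntax. It assumes that a match on a structural enum would amount to trying type tests in clause order, and it models that with `std::any::Any`. The function name `classify` and its clause order are hypothetical and only there for illustration.

```rust
use std::any::Any;

#[allow(dead_code)]
struct S<Y> {
    y: Y,
}

// Hypothetical stand-in for a match on a value of type `A | B | S<Y>`
// with an early clause for A, then the S clause, then a late clause for B.
// The clauses are tried in order, so the first matching type test wins.
fn classify<A: 'static, B: 'static, Y: 'static>(x: &dyn Any) -> &'static str {
    if x.is::<A>() {
        "A"
    } else if x.is::<S<Y>>() {
        "S<Y>"
    } else if x.is::<B>() {
        "B"
    } else {
        "no clause"
    }
}
```

With A = i32, an S<u8> value reaches the S clause. With A = S<u8>, the same value is caught by the earlier A clause, so the clause order decides the outcome. An S<u16> value matches no clause when Y = u8, which suggests the tag would have to encode the whole instantiated type.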

-- Another example: Can/should I now just throw seemingly unrelated structs into a match, in order to anticipate that the parameters will be instantiated with that struct? Consider the following:

  struct S<Y> { y: Y }
  struct T<Z> { z: Z }

  fn <A,B,Y>(x: A | B | S<Y>, f(ab: A | B) -> int) -> int {
    match x {
      T{ z: the_z } => { who knows, maybe A or B were instantiated with T<Z>, handle it },
      S{ y: the_y } => { ... handle the_y ... },
      other => return f(other)
    }
  }

-- Perhaps I am misunderstanding your proposal, and your hypothetical type system would reject the T clause in the latter example, and the *only* option for handling parametric variants in structural-enums is via a catch-all clause (that can pass the problem off to another function, as illustrated by the final clause in the latter example).


I do not want to spend too much time trying to infer the fine details of what you propose; this e-mail may be prohibitively long as it is. I just wanted to put down my initial thoughts.

It is possible that a more conservative approach would be easier for me to swallow. (And it is also possible that other developers will be enthused about tackling these issues, rather than worried.)

Cheers,
-Felix

On 28/08/2013 00:58, Bill Myers wrote:
I was reading a proposal about adding "datasort refinements" to make enum variants first-class types, and it seems to me there is a simpler and more effective way of solving the problem.

The idea is that if A, B and C are types, then "A | B | C" is a "structural" enum type that can be either A, B or C.

In addition, A can be implicitly converted to "A | B", "A | B" can be implicitly converted to "A | B | C", and also "(A | B) | C" and "A | (B | C)" are equivalent to "A | B | C", and finally "C | B | A" is equivalent to "A | B | C" (to support the latter, the implementation needs to sort variants in some arbitrary total order before assigning tag numbers).
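The equivalences above can be sketched as a normalization step. This is only an illustration in present-day Rust under assumptions of my own: the `Ty` model, the helper names, and the choice of lexicographic order on type names as the "arbitrary total order" are all hypothetical and not part of the proposal.

```rust
// Hypothetical model of a type expression: a named type or a union of types.
#[derive(Debug, PartialEq)]
enum Ty {
    Named(String),
    Union(Vec<Ty>),
}

// Convenience constructor for a named type.
fn named(name: &str) -> Ty {
    Ty::Named(name.to_string())
}

// Flatten nested unions and sort the variant names, so that
// "(A | B) | C", "A | (B | C)" and "C | B | A" all normalize identically.
fn canonical_variants(ty: &Ty) -> Vec<String> {
    fn collect(ty: &Ty, out: &mut Vec<String>) {
        match ty {
            Ty::Named(name) => out.push(name.clone()),
            Ty::Union(parts) => {
                for part in parts {
                    collect(part, out);
                }
            }
        }
    }
    let mut names = Vec::new();
    collect(ty, &mut names);
    names.sort();
    names
}

// The tag number of a variant is its index in the canonical order.
fn tag_of(ty: &Ty, variant: &str) -> Option<usize> {
    canonical_variants(ty)
        .iter()
        .position(|name| name.as_str() == variant)
}
```

Under this model the tag of a variant depends only on the set of variants, not on how the union was written.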

Furthermore, a way to bind variables to an "or" pattern is introduced, to allow converting "A | B | C" to "A | B" in the case that it holds an A or a B.

This way, one can rewrite Option as a type alias like this:
struct Some<T>(T);
struct None;

type Option<T> = None | Some<T>;

Which is like the current Option, but also makes None and Some<T> first-class types.

The current enum syntax can remain as syntax sugar for the above code.
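For comparison, the closest approximation in present-day Rust seems to be a nominal enum that wraps first-class variant structs, with `From` impls standing in for the implicit conversions. This is a sketch only; the names `SomeVal`, `NoneVal`, `Opt` and `unwrap_or` are hypothetical, chosen to avoid clashing with the standard `Option`.

```rust
// Each variant is its own first-class type.
#[derive(Debug, PartialEq)]
struct SomeVal<T>(T);

#[derive(Debug, PartialEq)]
struct NoneVal;

// Nominal stand-in for the structural type "None | Some<T>".
#[derive(Debug, PartialEq)]
enum Opt<T> {
    None(NoneVal),
    Some(SomeVal<T>),
}

// Explicit conversions standing in for the implicit "A to A | B" widening.
impl<T> From<NoneVal> for Opt<T> {
    fn from(n: NoneVal) -> Self {
        Opt::None(n)
    }
}

impl<T> From<SomeVal<T>> for Opt<T> {
    fn from(s: SomeVal<T>) -> Self {
        Opt::Some(s)
    }
}

// Matching still sees every variant, so exhaustiveness checking is unchanged.
fn unwrap_or<T>(x: Opt<T>, default: T) -> T {
    match x {
        Opt::Some(SomeVal(v)) => v,
        Opt::None(NoneVal) => default,
    }
}
```

The difference from the proposal is that the widening here must be requested with `.into()` and the union has to be declared by name.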

The only issue I see is what to do for code such as "let mut x = Some(3); x = None;": with this proposal, Some and None are separate unrelated types, so we either have this code emit an error, or x must be given the type "Some<int> | None" automatically, which however can lead to obscure error messages if one mistakenly attempts to assign a string to it, causing the type to become "Some<int> | None | ~str" (i.e. the user might be told that a match is not exhaustive because it does not handle the "~str" case, rather than that they assigned a ~str to an Option-typed variable).

It should be possible to allow this, and make the error-emitting code use heuristics to figure out whether it is more likely that the user assigned a value of the wrong type, or used an enum improperly (for example, by looking at whether the implicitly created enum type is ever written explicitly in the source, and whether the deduced structural enum type is being used in places that require a non-enum type).

Alternatively, one can stipulate that only types that are structs, or that are structs marked "enum struct" or "case struct" or similar can become part of an inferred structural enum, but this seems unappealing.

Note that some structural enums can change representation depending on generic instantiation, since "T | int" becomes just "int" if T = int, while it is "~str | int" if T = ~str (and similarly for "Some<T> | Some<int>"), but this should not be a problem.



_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev


--
irc: pnkfelix on irc.mozilla.org
email: {fklock, pnkfelix}@mozilla.com
