Bill (cc'ing rust-dev)-
[Executive summary: the provided proposal needs further work.]
I do not know which discussion of data sort refinements you had been
reading; I would have liked context about where you were coming from. I
assume it was either Niko's blog post [1] or issue #1679 [2], but I
would prefer not to make any assumption at all.
[1]: http://smallcultfollowing.com/babysteps/blog/2012/08/24/datasort-refinements/
[2]: https://github.com/mozilla/rust/issues/1679
Some immediate thoughts:
* This strikes me as an extreme change to the language, but perhaps my
gut is overly conservative.
-- (At first I thought you were suggesting adding headers to every
struct, but then I realized that the compiler should be able to insert
the tags at the points where a struct is passed into an evaluation
context expecting a structural-enum. So it's not as extreme a change as
I had initially worried; but I still worry.)
-- I think one of Niko's points in his blog post was that his proposal
was not an extreme change.
* You have not addressed in your proposal how you would change the match
syntax to deal with non-struct variants, such as ~str or int.
-- (I would probably just sidestep this by including the hypothetical
stipulation that you mentioned, where only structs can be part of a
structural enum; then I think the match syntax can remain largely
unchanged, but see caveats with next bullet.)
* Finally, I think your note at the end about generic instantiation is a
bigger problem than you make it out to be.
-- For example, can I actually expect to be able to write code that
processes arguments of type "A | B | S<Y>"?
    struct S<Y> { y: Y }
    // (the function name "process" is only a placeholder)
    fn process<A,B,Y>(x: A | B | S<Y>) {
        match x {
            ... what could go here ? ..., // early case clauses, maybe to handle A
            S{ y: the_y } => { ... handle the_y ... },
            ... and what goes here ? ... // late case clauses, maybe to handle B
        }
    }
There is the issue you already pointed out, where a type variable might
be instantiated to S<Y>. But could they also be instantiated to S<Z>?
(Do the tags on the variants need to encode the whole type, and not just
which struct it is?) And what about the poor user who didn't think
about the fact that they might alias each other, and thought that all
the clauses in the code for A | B | S<Y> were disjoint, but in fact they
potentially overlap due to the potential aliasing, and thus the order of
the cases in the above is now crucial, yes?
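For contrast, here is a minimal sketch in present-day Rust, which has no
structural enums. The nominal enum OneOf3 and the function describe are
hypothetical names introduced only for this illustration. Because each
variant carries its own constructor, instantiating A with S<Y> does not
make the first and third arms overlap, which is the guarantee that seems
to be at risk for "A | B | S<Y>".

```rust
use std::fmt::Debug;

struct S<Y> {
    y: Y,
}

// Hypothetical nominal stand-in for the structural type "A | B | S<Y>".
enum OneOf3<A, B, Y> {
    First(A),
    Second(B),
    Third(S<Y>),
}

// The arms are disjoint by construction: the tag records which
// constructor was used, not which type the payload happens to have.
fn describe<A, B, Y: Debug>(x: &OneOf3<A, B, Y>) -> String {
    match x {
        OneOf3::First(_) => "first (the A slot)".to_string(),
        OneOf3::Second(_) => "second (the B slot)".to_string(),
        OneOf3::Third(S { y }) => format!("third (the S<Y> slot), y = {:?}", y),
    }
}
```

With A instantiated to S<i32>, a value built with First and a value built
with Third hold the same payload type yet still reach different arms, so
the order of the arms does not matter here.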
-- Another example: Can/should I now just throw seemingly unrelated
structs into a match, in order to anticipate that the parameters will
be instantiated with that struct? Consider the following:
    struct S<Y> { y: Y }
    struct T<Z> { z: Z }
    // (the function name "process" is only a placeholder)
    fn process<A,B,Y>(x: A | B | S<Y>, f(ab: A | B) -> int) -> int {
        match x {
            T{ z: the_z } => { ... who knows, maybe A or B were
                               instantiated with T<Z>; handle it ... },
            S{ y: the_y } => { ... handle the_y ... },
            other => return f(other)
        }
    }
-- Perhaps I am misunderstanding your proposal, and your hypothetical
type system would reject the T clause in the latter example, and the
*only* option for handling parametric variants in structural-enums is
via a catch all clause (that can pass the problem off to another
function, as illustrated by the final clause in the latter example).
I do not want to spend too much time trying to infer the fine details of
what you propose; this e-mail may be prohibitively long as it is. I
just wanted to put down my initial thoughts.
It is possible that a more conservative approach would be easier for me
to swallow. (And it is also possible that other developers will be
enthused about tackling these issues, rather than worried.)
Cheers,
-Felix
On 28/08/2013 00:58, Bill Myers wrote:
I was reading a proposal about adding "datasort refinements" to make
enum variants first-class types, and it seems to me there is a simpler
and more effective way of solving the problem.
The idea is that if A, B and C are types, then "A | B | C" is a
"structural" enum type that can be either A, B or C.
In addition, A can be implicitly converted to "A | B", "A | B" can be
implicitly converted to "A | B | C", and also "(A | B) | C" and "A |
(B | C)" are equivalent to "A | B | C", and finally "C | B | A" is
equivalent to "A | B | C" (to support the latter, the implementation
needs to sort variants in some arbitrary total order before assigning
tag numbers).
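The flattening and ordering rules above can be sketched as a small
canonicalization pass. This is only an illustration in present-day Rust:
the Ty model, the helper names, and the choice of lexicographic order on
type names as the "arbitrary total order" are all assumptions, not part
of the proposal.

```rust
use std::collections::BTreeSet;

// Hypothetical model of a type expression: a named type or a union.
#[derive(Debug)]
enum Ty {
    Named(String),
    Union(Vec<Ty>),
}

// Convenience constructor for a named type.
fn named(s: &str) -> Ty {
    Ty::Named(s.to_string())
}

// Flatten nested unions into a sorted, duplicate-free set of names.
fn collect(ty: &Ty, out: &mut BTreeSet<String>) {
    match ty {
        Ty::Named(name) => {
            out.insert(name.clone());
        }
        Ty::Union(members) => {
            for m in members {
                collect(m, out);
            }
        }
    }
}

// Assign tag numbers in sorted order, so the result does not depend on
// how the union was written down.
fn tags(ty: &Ty) -> Vec<(String, usize)> {
    let mut names = BTreeSet::new();
    collect(ty, &mut names);
    names.into_iter().enumerate().map(|(i, n)| (n, i)).collect()
}
```

Under these assumptions, "A | B | C" and "(C | B) | A" produce the same
tag assignment.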
Furthermore, a way to bind variables to an "or" pattern is introduced
to allow converting "A | B | C" to "A | B" in the case that it holds
an A or a B.
This way, one can rewrite Option as a type alias like this:
struct Some<T>(T);
struct None;
type Option<T> = None | Some<T>;
Which is like the current Option, but also makes None and Some<T>
first-class types.
The current enum syntax can remain as syntax sugar for the above code.
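For comparison, the same shape can be approximated in present-day Rust
without structural enums. In this sketch the names SomeVal, NoneVal, Opt
and unwrap_some are hypothetical, and the implicit conversion from A to
"A | B" has to be spelled out as explicit From impls.

```rust
// Each former variant is a type in its own right.
#[derive(Debug, PartialEq)]
struct SomeVal<T>(T);

#[derive(Debug, PartialEq)]
struct NoneVal;

// Nominal stand-in for the structural alias "None | Some<T>".
#[derive(Debug, PartialEq)]
enum Opt<T> {
    HasSome(SomeVal<T>),
    HasNone(NoneVal),
}

// The widening "Some<T> -> None | Some<T>", written by hand.
impl<T> From<SomeVal<T>> for Opt<T> {
    fn from(v: SomeVal<T>) -> Self {
        Opt::HasSome(v)
    }
}

// The widening "None -> None | Some<T>", written by hand.
impl<T> From<NoneVal> for Opt<T> {
    fn from(v: NoneVal) -> Self {
        Opt::HasNone(v)
    }
}

// A function that can only ever receive the "Some" case.
fn unwrap_some<T>(v: SomeVal<T>) -> T {
    v.0
}
```

A caller would write "let x: Opt<i32> = SomeVal(3).into();" where the
proposal would convert implicitly.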
The only issue I see is what to do for code such as "let mut x =
Some(3); x = None;": with this proposal, Some and None are separate
unrelated types, so we either have this code emit an error, or x must
be given the type "Some<int> | None" automatically, which however can
lead to obscure error messages if one mistakenly attempts to assign a
string to it causing the type to become "Some<int> | None | ~str"
(i.e. the user might be told that a match is not exhaustive because it
does not handle the "~str" case, rather than that they assigned a ~str
to an Option-typed variable).
It should be possible to allow this, and make the error-emitting code
use heuristics to figure out whether it is more likely that the user
assigned a value of the wrong type, or used an enum improperly (for
example, by looking at whether the implicitly created enum type is
ever written explicitly in the source, and whether the deduced
structural enum type is being used in places that require a non-enum
type).
Alternatively, one can stipulate that only types that are structs, or
that are structs marked "enum struct" or "case struct" or similar can
become part of an inferred structural enum, but this seems unappealing.
Note that some structural enums can change representations depending on
generic instantiation, since "T | int" becomes just "int" if T = int,
while it is "~str | int" if T = ~str (and similar for "Some<T> |
Some<int>"), but this should not be a problem.
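The collapse described above can be sketched by modelling a union as a
list of type names, substituting a concrete name for the parameter "T",
and dropping duplicates. The function instantiate and the
string-based model are assumptions made purely for illustration.

```rust
use std::collections::BTreeSet;

// Replace the parameter "T" with a concrete type name, then remove
// duplicates; the surviving names are the variants that need a tag.
fn instantiate(union: &[&str], t: &str) -> Vec<String> {
    let set: BTreeSet<String> = union
        .iter()
        .map(|name| {
            if *name == "T" {
                t.to_string()
            } else {
                name.to_string()
            }
        })
        .collect();
    set.into_iter().collect()
}
```

In this model, "T | int" keeps two variants when T = ~str but only one
when T = int, so the number of tags depends on the instantiation.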
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
--
irc: pnkfelix on irc.mozilla.org
email: {fklock, pnkfelix}@mozilla.com