We have discussed this point for some time, though I don't know if anyone raised the security review angle specifically before.

I think there is a good case to be made that lint warnings adequately address this problem. You can set the lint level so that it is an error if the case distinction is not observed in a pattern binding. Then a reviewer can be confident that, as long as the code compiles, nothing surprising is going on.
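
As a rough illustration in present-day Rust (not the 2012 syntax under discussion), the sketch below denies the naming lints inside one module. The module `audited`, the constant `LIMIT`, and the function `classify` are made-up names, and the lint names are those of current rustc, which is an assumption relative to the compiler discussed here. Under these settings an uppercase name in a pattern has to be an existing constant and a lowercase name has to be a new binding, at least for items declared under the same lint settings.

```rust
// Assumption: lint names are those of present-day rustc.
#[deny(non_snake_case, non_camel_case_types, non_upper_case_globals)]
mod audited {
    // Hypothetical constant used only for this illustration.
    pub const LIMIT: u32 = 10;

    pub fn classify(n: u32) -> &'static str {
        match n {
            // Uppercase: if LIMIT were not a constant in scope, this would be
            // a binding named LIMIT, which non_snake_case rejects.
            LIMIT => "exactly the limit",
            // Lowercase: a fresh binding for whatever value reached this arm.
            other if other > LIMIT => "over the limit",
            _ => "under the limit",
        }
    }
}
```

Calling `audited::classify(10)` takes the first arm because `LIMIT` is compared by value, while `audited::classify(11)` binds `other` and takes the second.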



Niko

On 8/16/12 1:26 PM, Nathan wrote:
Hello,

I'm brand new to rust.  I've read the tutorial once and not the
manual.  My first attempt at running "make" failed with a compiler
error (I'll email or irc separately).  I apologize if I'm re-covering
ground or missing something, but I felt this general point was
important to make as early as possible in a language's design.  I hope
I'm not too late.

In another thread about naming conventions, it sounds as if there is a
grammatical ambiguity which is resolved by scoping rules at compile
time:

match myfoo {
   bar => /* ... stuff ... */
}

What does the left hand side of the match rule represent?  IIUC, it is
impossible to tell in rust without understanding what "bar" signifies
in the surrounding scope.  If there is an enum discriminator named
"bar" and myfoo is of that type, then it means "match if myfoo is a
bar value".  On the other hand if this is not the case, it means
"create a new binding called bar and bind it to the value in
question".

Is this true?

If so, I suggest this is a serious problem which naming conventions
will not solve.  If it is not the case, please ignore this email and
tell me to rtfm.  ;-)
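
The two readings described above can be sketched in present-day Rust, which is an assumption here since the snippet above uses the 2012 syntax. The enum `Color` and both function names are hypothetical. The `match` text is identical in the two functions; only the surrounding scope differs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red,
    Green,
}

// The variants are imported, so `Red` in the pattern names Color::Red.
fn with_import(c: Color) -> &'static str {
    use Color::*;
    match c {
        Red => "matched the Red variant",
        _ => "something else",
    }
}

// No import: `Red` is a fresh binding that matches every value.
// Current rustc flags this through several lints (for example
// bindings_with_variant_name); they are silenced here only to show
// the behaviour.
#[allow(bindings_with_variant_name, non_snake_case, unused_variables, unreachable_patterns)]
fn without_import(c: Color) -> &'static str {
    match c {
        Red => "bound a fresh variable named Red",
        _ => "something else",
    }
}
```

Passing `Color::Green` to both functions shows the difference: the first falls through to the wildcard arm, the second takes the first arm because the binding accepts any value.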

To see why it is a problem, consider two use cases: a person learning
the language, and a person auditing code for bugs.  (For bonus
material, see the post-script.)


A person learning the language may learn only one semantic
interpretation first, or may learn both but forget about one.  They
write code depending on the one semantic interpretation they are
familiar with.  It works.  Then one day, there's a strange compiler
error.  Hopefully it's a very clear compiler error.  That could save
some time.  Either way they probably need to revisit two different
parts of language reference documentation to get a full understanding
of the issue.  Even if the "official documentation" has a single point
that contrasts these two semantic possibilities, any 3rd party books,
blogs, tutorials, etc... will reinforce the misunderstanding.


Case two: A code auditor is looking for bugs, possibly subtle bugs or
security flaws.  They don't have a compiler.  They're looking at a
printout in an underground lair with no electronics allowed, or
equivalently it's an interview question.

Now, they see a pattern match rule where the left hand side is "bar".
Even though they know the language perfectly, they cannot know the
semantics here without understanding the scope of enum discriminators.
Does this require looking at more than one file?  If so, the problem
complexity branches out indefinitely.  They must do this for *every*
such matching rule, even though many rules may be simple binding
patterns rather than enum discriminators.

See my last C example to illustrate the compound problems of grammar
ambiguity *and* importing definitions from other files.  If the imported
definitions are not explicitly named in relation to where they are
imported from, then an auditor must now read *every* file imported,
and they must do this recursively.  (Let's hope they have IDE
support.)  If the imported names are explicitly associated with which
source they are imported from, the auditor must recurse but at least
it's linear instead of exponential.


The solution I'm proposing is to alter the grammar so that it's
possible to tell, by looking at only the pattern matching text and
without knowing any other context, whether it is a discriminator match
or a new binding.  There are at least two ways to do this:

One is to ensure that it's always possible to tell, when looking at an
identifier in *any* context, whether it is a discriminator or a
binding/reference.  Haskell does this elegantly, IMO, by forcing
discriminators to start with upper case and bindings/references to
start with lower case.  Any other rule that prevents the identifiers
from overlapping is sufficient.  I prefer this approach because it
solves the ambiguity problem for *every* grammar production which
involves either a reference/binding *or* an enum discriminator.

Another is to change the specific match syntax so you say something like:

match myfoo {
   discriminator bar => /* yes, this is a klunky new keyword, so I
don't recommend this in practice, but it makes the point. */
   bar => /* bare identifiers are always bindings. */
}

-or-

match myfoo {
   'bar => /* This is just the same as the last, except we use a sigil
instead of a keyword.  It's compact.  This could be considered an
identifier disambiguation approach if all discriminator identifiers
always begin with ' or some other sigil. */
   bar => /* bare identifiers are always bindings. */
}

-or-

match myfoo {
   MyEnum.bar => /* Always require the type for discriminators, at
least in this context.  Klunky if other contexts do not require the
type.  Klunky since the type of myfoo is already specified. */
   bar => /* bare identifiers are always bindings. */
}

-or-

match myfoo {
   bar => /* bare identifiers are always discriminators. */
   let bar => /* bindings always use let (because it is similar to a
let binding). kind of klunky and maybe confusing placement of the
keyword.  Plus nested patterns get klunky: */
   [bar, let bar] => /* match a list/sequence/array thingy with a bar
discriminator value and any other value which is bound to bar.
Contrived but shows the grammar distinction in compound matches. */
}
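
The third alternative above (always naming the type) resembles qualified path patterns in present-day Rust, which are written with `::` rather than `.`. Here is a minimal sketch under that assumption, with `Shape` and `describe` as made-up names. A qualified path in a pattern can only refer to an existing item and never introduces a binding, so this arm reads the same regardless of what is in scope.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Shape {
    Circle,
    Square,
}

fn describe(pair: (Shape, Shape)) -> String {
    match pair {
        // `Shape::Circle` is a path to a variant; `other` is a new binding.
        (Shape::Circle, other) => format!("circle, then {:?}", other),
        // `first` is a new binding; `_` ignores the second element.
        (first, _) => format!("starts with {:?}", first),
    }
}
```

This mirrors the compound case in the last example above: one element of the pattern is a discriminator test and the other is a binding, and the two are distinguishable from the pattern text alone.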



Anyway, please understand that those proposed syntaxes are just
"ballpark" since I don't understand the grammar well, nor the
style/community/taste.  The main point is that grammars which are
ambiguous without compile/run-time context are fraught with peril.


Regards,
Nathan Wilcox


PS:

Maybe a simpler way to state my desire is: Make it so that it's very
hard to compete in an "underhanded backdoor" competition for rust and
very easy to audit code for bugs.

See for example this competition where entries look like correct C
code to tally votes, but they surreptitiously skew the results in the
favor of the author:
http://graphics.stanford.edu/~danielrh/vote/scores.html

When I am emperor, all language designers will be forced to audit all
entries for all "underhanded backdoor" competitions for all other
languages before they are allowed to design their language.  ;-)  (You
may surmise that I was a security auditor in the past...)

One of my favorites is here:
http://graphics.stanford.edu/~danielrh/vote/mzalewski.c

That entry is a case where examining a bit of text does not tell you
its semantics because it may either be a variable reference *or* a
macro instance and the only way to know is to have a mental model of
the macros and variables in scope.  If instead, all macro expansions
required a $ prefix or whatever, there would be no ambiguity and the
bug would be much easier to track down.


PPS:

Some other simple ambiguities in languages I kind of know off the top
of my head which have led to real world bugs I wrote or had to fix:

javascript:
x = 5; // ambiguity: Either reassign a declared binding in a
       // containing scope, or create a new global binding.

erlang:
X = 5, % ambiguity: Either create a new binding called "X" referring
       % to 5 *or* try to match the existing binding "X" to the value 5.

C:
#include "define_foo.h"

static int bar = 42;

int main(int argc, char** argv) {
   foo(bar); // Ambiguous, even without macros, depending on the
             // contents of define_foo.h
   bar = 7; // Possibly invalid, depending on the contents of define_foo.h
   return bar;
}

Here are at least two possible contents for define_foo.h:
int foo(int bar);

-or-

typedef char foo;
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
