Summary: You may be interested to

 * check the new rules and guarantees on bytecode loading to make sure
   they're compatible with any sandboxing uses that you have; and/or

 * check the new `racket/fasl` library for design or implementation
   flaws, especially considering that the new format is intended to be
   forward-compatible forever.

If those topics are not of interest, you can safely skip this long
message!


Some history and motivation:

Racket (well, MzScheme) was originally intended to enforce safety
everywhere with C code as the only escape hatch. That is, the original
intent was that you can't crash the runtime system by writing only
Racket (MzScheme) code. This guarantee was meant to carry over to
bytecode, so generators and sources of bytecode would not have to be
trusted; a bytecode validator ensured that, in the absence of unsafe
operations, any loaded bytecode would be prevented from crashing the
system.

We abandoned the constraint of safe-only Racket code with the
introduction of `ffi/unsafe`. We doubled down with libraries like
`racket/unsafe/ops`. Then `typed/racket` started generating unsafe
operations where types could enforce safe use of the unsafe primitives.
Along similar lines, we started changing macro-implemented forms like
`for ... in-list` to expand to unsafe operations within a loop, where
the macro generates needed checks outside of the loop. Finally, we
started making the bytecode optimizer substitute unsafe operations for
safe ones when previous operations guard the argument already; for
example, `(and (pair? v) (car v))` is compiled as `(and (pair? v)
(unsafe-car v))`.
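
To make that last substitution concrete, here is a hand-written sketch of
the two forms. It is only an illustration: the function names are made up
for this example, and the optimizer performs the rewrite internally
rather than through source code like this.

```racket
#lang racket/base
(require racket/unsafe/ops)

;; Safe version: `car` checks again that its argument is a pair.
(define (first-or-false v)
  (and (pair? v) (car v)))

;; Hand-written equivalent of the substituted form: the `pair?` test
;; already guards the access, so `unsafe-car` can skip the check.
(define (first-or-false/unsafe v)
  (and (pair? v) (unsafe-car v)))

(first-or-false '(1 2 3))
(first-or-false/unsafe '(1 2 3))
(first-or-false/unsafe 'not-a-pair)
```

Both functions agree on every input, because `unsafe-car` is reached only
after `pair?` has succeeded.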

These changes moved the boundary where safety was intended to be
enforced. Some source languages can remain safe as long as you don't
use things with `unsafe` in the name, and changing the code inspector
can disallow access to those unsafe operations at the source level. But
safety is generally gone at the bytecode level; the bytecode
"validator" can't really validate, since all bets are currently off
whenever bytecode refers to an unsafe operation.

To avoid unsafety in non-trusting contexts, if the current code
inspector is not the original code inspector, the reader in Racket v6.x
and earlier refuses to load bytecode that refers to an unsafe operator.
So, sandboxed applications can be prevented from crashing the system
via bytecode. Given how often bytecode now contains references to unsafe
operations, that means very few Racket modules can be loaded in
bytecode form when the code inspector is changed.
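
For anyone who wants to check their own sandboxing setup, here is a
minimal sketch of loading bytecode in a trusting and a non-trusting
context. The helper `read-and-run` and the variable names are
hypothetical, invented for this example. What happens in the
non-trusting case depends on the Racket version, so the sketch only
captures the outcome instead of assuming one.

```racket
#lang racket/base

;; Compile a small expression and serialize it in bytecode form.
(define ns (make-base-namespace))
(define bytecode
  (let ([out (open-output-bytes)])
    (write (parameterize ([current-namespace ns])
             (compile '(+ 1 2)))
           out)
    (get-output-bytes out)))

;; Hypothetical helper: read the bytecode back and try to run it,
;; returning either the result or the error message as a string.
(define (read-and-run bstr)
  (with-handlers ([exn:fail? exn-message])
    (let ([code (parameterize ([read-accept-compiled #t])
                  (read (open-input-bytes bstr)))])
      (parameterize ([current-namespace ns])
        (eval code)))))

;; Trusting context: the original code inspector is in place.
(define trusted-result (read-and-run bytecode))

;; Non-trusting context: a fresh sub-inspector replaces the original.
(define untrusted-result
  (parameterize ([current-code-inspector (make-inspector)])
    (read-and-run bytecode)))

trusted-result
untrusted-result
```

Comparing `trusted-result` and `untrusted-result` on a given Racket
build shows whether bytecode loaded under a changed code inspector is
allowed to run there.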

For various reasons, preserving the status quo is inconvenient with the
new expander and module system. The problems are related to the
internal reorganization that replaces module primitives with linklet
primitives. Instead of trying to validate bytecode (which,
unsurprisingly, is a source of bugs itself), it seems better to just
make all bytecode non-runnable when loaded in a non-trusting context.
Ideas like bytecode validation also don't adapt easily to
Racket-on-Chez, where machine code is generally trusted.

I see one catch. The bytecode format was used to implement the
`racket/fasl` library, which turns an S-expression into a form that can
be loaded more quickly. It certainly makes sense to read encoded data
even in a non-trusting context. One solution is to implement
`racket/fasl` without using the bytecode reader. While we're at it, we
can fix the inconvenience of a fasl format that's specific to a Racket
version.


As of the latest commits to the Racket repo:

 * Reading bytecode remains safe.

   Unlike Racket v6.x, well-formed bytecode is not rejected by the
   reader on the grounds that it refers to unsafe operations, because
   merely reading those references is not a problem. So, the reader now
   works in some cases where it couldn't work before, and the `read`
   operation remains overall just as safe as before.

 * Reading bytecode with a non-original code inspector marks the loaded
   code as non-runnable.

   Changing the code inspector is already the mechanism for disallowing
   access to unsafe operators at the source level. So, this change
   doesn't add a new requirement for creating a non-trusting context.
   Instead, it takes away the ability to sometimes load bytecode in a
   non-trusting context.

 * The bytecode pseudo-validator is disabled.

   Since the bytecode "validator" can't validate code that has unsafe
   operations, it can't validate real bytecode. Meanwhile, disabling
   the pseudo-validator provides a slight improvement in load time,
   around 5%.

   For now, the pseudo-validator is still available by setting
   `PLT_VALIDATE_LOAD`, but I don't expect it to survive in the long
   run. Keeping the optimizer and validator in sync seems like more
   trouble than it is worth.

 * The `racket/fasl` library now implements its own format for fast
   loading, instead of using `(compile `(quote ,v))` and relying on the
   bytecode format.

   The new fasl format is meant to be portable and forward-compatible.
   Part of portability is that the same format works with the current
   Racket VM and Racket-on-Chez. Forward compatibility means that a
   fasl encoding created today should be readable forever via
   `fasl->s-exp` (although a future `s-exp->fasl` may gain extra
   behavior that can't be read by older versions).

   Keeping that in mind, I'd appreciate reviews of the implementation
   in "collects/racket/fasl.rkt" to look for design or implementation
   flaws that we don't want to deal with forever.
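
As a starting point for reviewing the fasl library, here is a small
round-trip sketch using `s-exp->fasl` and `fasl->s-exp`. The sample
datum and the variable names are made up for illustration; the sketch
exercises only plain data and says nothing about the encoding itself.

```racket
#lang racket/base
(require racket/fasl)

;; An arbitrary sample S-expression (illustrative data only).
(define v '(config (name "demo") (sizes #(1 2 3)) (ratio 0.5)))

;; With no output port, `s-exp->fasl` returns the encoding as bytes.
(define encoded (s-exp->fasl v))

;; `fasl->s-exp` accepts either a byte string or an input port.
(define decoded (fasl->s-exp encoded))

;; The same round trip through ports.
(define out (open-output-bytes))
(s-exp->fasl v out)
(define decoded-via-port
  (fasl->s-exp (open-input-bytes (get-output-bytes out))))

(equal? v decoded)
(equal? v decoded-via-port)
```

Because decoding goes through `fasl->s-exp` and not the bytecode
reader, it does not require `read-accept-compiled` to be enabled.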

Overall, I think these changes preserve safety and practical
functionality, but the goal of this long message is to help make
sure that I haven't missed something.
