Summary: You may be interested to

* check the new rules and guarantees on bytecode loading to make sure they're compatible with any sandboxing uses that you have; and/or
* check the new `racket/fasl` library for design or implementation flaws, especially considering that the new format is intended to be forward-compatible forever.

If those topics are not of interest, you can safely skip this long message!

Some history and motivation:

Racket (well, MzScheme) was originally intended to enforce safety everywhere with C code as the only escape hatch. That is, the original intent was that you can't crash the runtime system by writing only Racket (MzScheme) code. This guarantee was meant to carry over to bytecode, so generators and sources of bytecode would not have to be trusted; a bytecode validator ensured that, in the absence of unsafe operations, any loaded bytecode would be prevented from crashing the system.

We abandoned the constraint of safe-only Racket code with the introduction of `ffi/unsafe`. We doubled down with libraries like `racket/unsafe/ops`. Then `typed/racket` started generating unsafe operations where types could enforce safe use of the unsafe primitives. Along similar lines, we started changing macro-implemented forms like `for ... in-list` to expand to unsafe operations within a loop, where the macro generates needed checks outside of the loop. Finally, we started making the bytecode optimizer substitute unsafe operations for safe ones when previous operations guard the argument already; for example, `(and (pair? v) (car v))` is compiled as `(and (pair? v) (unsafe-car v))`.

These changes moved the boundary where safety was intended to be enforced. Some source languages can remain safe as long as you don't use things with `unsafe` in the name, and changing the code inspector can disallow access to those unsafe operations at the source level. But safety is generally gone at the bytecode level; the bytecode "validator" can't really validate, since all bets are currently off whenever bytecode refers to an unsafe operation.
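To make the optimizer substitution concrete, here is a minimal sketch of the two forms written out by hand. The function names are made up for illustration, and the second definition only mirrors the shape described above; it is not actual compiler output.

```racket
#lang racket/base
(require racket/unsafe/ops)

;; Safe form: `car` re-checks that its argument is a pair.
(define (first-or-false/safe v)
  (and (pair? v) (car v)))

;; Hand-written counterpart of the substituted form: the `pair?` test
;; already guards `v`, so `unsafe-car` can skip the check. Calling
;; `unsafe-car` on a non-pair without such a guard is undefined behavior.
(define (first-or-false/unsafe v)
  (and (pair? v) (unsafe-car v)))

(first-or-false/safe '(1 2 3))      ; => 1
(first-or-false/unsafe '(1 2 3))    ; => 1
(first-or-false/unsafe 'not-a-pair) ; => #f
```

Both functions behave the same on every input, because the guard runs first. The difference only matters to something inspecting the compiled form, which now contains a reference to an unsafe primitive.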
To avoid unsafety in non-trusting contexts, if the current code inspector is not the original code inspector, the reader in Racket v6.x and earlier refuses to load bytecode that refers to an unsafe operator. So, sandboxed applications can be prevented from crashing the system via bytecode. Given how often bytecode now contains references to unsafe operations, that means very few Racket modules can be loaded in bytecode form when the code inspector is changed.

For various reasons, preserving the status quo is inconvenient with the new expander and module system. The problems are related to the internal reorganization that replaces module primitives with linklet primitives. Instead of trying to validate bytecode (which, unsurprisingly, is a source of bugs itself), it seems better to just make all bytecode non-runnable when loaded in a non-trusting context. Ideas like bytecode validation also don't adapt easily to Racket-on-Chez, where machine code is generally trusted.

I see one catch. The bytecode format was used to implement the `racket/fasl` library, which turns an S-expression into a form that can be loaded more quickly. It certainly makes sense to read encoded data even in a non-trusting context. One solution is to implement `racket/fasl` without using the bytecode reader. While we're at it, we can fix the inconvenience of a fasl format that's specific to a Racket version.

As of the latest commits to the Racket repo:

* Reading bytecode remains safe. Unlike Racket v6.x, well-formed bytecode is not rejected by the reader on the grounds that it refers to unsafe operations, because merely reading those references is not a problem. So, the reader now works in some cases where it couldn't work before, and the `read` operation remains overall just as safe as before.

* Reading bytecode with a non-original code inspector marks the loaded code as non-runnable.
  Changing the code inspector is already the mechanism for disallowing access to unsafe operators at the source level. So, this change doesn't add a new requirement for creating a non-trusting context. Instead, it takes away the ability to sometimes load bytecode in a non-trusting context.

* The bytecode pseudo-validator is disabled. Since the bytecode "validator" can't validate code that has unsafe operations, it can't validate real bytecode. Meanwhile, disabling the pseudo-validator provides a slight improvement in load time, around 5%. For now, the pseudo-validator is still available by setting `PLT_VALIDATE_LOAD`, but I don't expect it to survive in the long run. Keeping the optimizer and validator in sync seems like more trouble than it is worth.

* The `racket/fasl` library now implements its own format for fast loading, instead of using `(compile `(quote ,v))` and relying on the bytecode format. The new fasl format is meant to be portable and forward-compatible. Part of portability is that the same format works with the current Racket VM and Racket-on-Chez. Forward compatibility means that a fasl encoding created today should be readable forever via `fasl->s-exp` (although a future `s-exp->fasl` may gain extra behavior that can't be read by older versions). Keeping that in mind, I'd appreciate reviews of the implementation in "collects/racket/fasl.rkt" to look for design or implementation flaws that we don't want to deal with forever.

Overall, I think these changes preserve safety and practical functionality, but the goal of this long message is to help make sure that I haven't missed something.
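For anyone who wants to poke at the new `racket/fasl` format while reviewing it, a round trip can be sketched roughly as follows. The sample datum and variable names are invented for illustration; only `s-exp->fasl` and `fasl->s-exp` come from the library.

```racket
#lang racket/base
(require racket/fasl)

;; An arbitrary S-expression to encode (hypothetical sample data).
(define v '(config (name "demo") (sizes (1 2 3)) (ratio 1/3)))

;; With no output port given, `s-exp->fasl` returns the encoding as bytes.
(define encoded (s-exp->fasl v))

;; `fasl->s-exp` accepts either the byte string or an input port.
(define decoded-from-bytes (fasl->s-exp encoded))
(define decoded-from-port (fasl->s-exp (open-input-bytes encoded)))

(bytes? encoded)                ; => #t
(equal? v decoded-from-bytes)   ; => #t
(equal? v decoded-from-port)    ; => #t
```

Since the encoding is plain bytes, a forward-compatibility check could be as simple as saving `encoded` to a file with one Racket build and decoding it with a later one.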
