I was (relatively) recently bitten by an issue with putting contracts on serializable structs. What's worse, once I figured out what was going on, I realized that I'd run into this very issue before and even discussed it on this list <https://groups.google.com/d/topic/racket-users/xIrqdJE-qdk/discussion>. In that case, though, I'd encountered the problem with higher-order contracts and other complications: I hadn't fully appreciated that the same caveats apply to much simpler cases.
I want to write up this problem for the list, both because I suspect others are making the same oversight I was and because I think this problem deserves a linguistic solution. In essence, the problem is that, if you are using `serializable-struct`, any contracts you may put on the struct (usually through a `struct` clause in `contract-out`) are not enforced when instances are deserialized with `deserialize`. Here is an example program: #lang racket (module server racket (require racket/serialize) (provide (contract-out [struct boxed-set ([v set?])])) (serializable-struct boxed-set (v) #:transparent)) (module bad racket (require racket/serialize (submod ".." server)) (provide (contract-out [make-empty-boxed-set/bad (-> boxed-set?)])) (define (corrupt-serialized s) (list-set s 6 '(0 "not a set"))) (define (make-empty-boxed-set/bad) (deserialize (corrupt-serialized (serialize (boxed-set (set))))))) (module main racket (require (submod ".." server) (submod ".." bad)) (boxed-set-v (make-empty-boxed-set/bad))) The `server` module tries to protect its `boxed-set` datatype with a contract in just the way I was doing, which I expect many other Racketeers are doing as well. If I'd written such a module with `struct` instead of `serializable-struct`, I would know that no other module will be able to construct a `boxed-set` instance with a bad value. If I know that my own module doesn't create malformed instances, either (perhaps because it doesn't construct any instances), I could correctly assume that every instance satisfies its contract, giving all of the familiar benefits for all parties when reasoning about their code. Using `serializable-struct` opens a loophole in this guarantee. The expansion of `serializable-struct` uses the same lower-level mechanisms available for implementing a custom serialized representation: it generates a `prop:serializable` property and a `deserialize-info` value, which is exported from the `deserialize-info` submodule as `deserialize-info:boxed-set-v0`. The `deserialize-info` value encapsulates a function to construct a `boxed-set` instance from the serialized representation, which is implemented using the `boxed-set` constructor. The key point is that the expansion of `serializable-struct` uses the version of the `boxed-set` constructor from inside the `server` module, which *is not protected by a contract.* Thus, `deserialize` effectively exposes the contract-less constructor, albeit quite indirectly. The `bad` module demonstrates how malformed instances of `boxed-set` can be constructed without triggering a contract violation. The `main` module then attempts to use `boxed-set-v` on a malformed value it has gotten from `bad`, triggering this exception, which accurately blames `server` for failing to live up to its promised contract on `boxed-set-v`: boxed-set-v: broke its own contract promised: set? produced: "not a set" in: the range of (-> boxed-set? set?) contract from: (<...>/serial-contract.rkt server) blaming: (<...>/serial-contract.rkt server) (assuming the contract is correct) at: <...>/serial-contract.rkt:7.23 The especially pernicious part is that, if `main` hadn't used `boxed-set-v`, the error might not have been detected until the value had flowed far away into some unrelated code. Why is this significant in practice? Realistically, I don't expect programmers would write functions like `corrupt-serialized` in an attempt to deliberately exploit some library's invariant, and the chance of someone doing so accidentally seems fairly remote. What worries me much more is that I read serializable structs in over the network, stick them in databases, and write them to disk. In the absence of a contract, any struct that has ever been deserialized may be silently corrupt. Some bits could have been flipped or some other process could have mangled a file on disk. An attacker could have intentionally manipulated a value sent over the network as part of some potential exploit. Perhaps most likely, the could have been a backwards-incompatible change (perhaps an unintentional one) to the contract on some field of the struct. A contract couldn't protect me from encountering a malformed serialized representation, but it would let me detect the error as soon as I try to deserialize the bad value, rather than delaying the error until I actually attempt to access a bad field. As things stand, if I usually just pass the value around and move it in and out of storage, I might be unknowingly passing around bad data indefinitely. I'm particularly concerned that people may be vulnerable to these possibilities without realizing it, as I have been. In every other way I can think of, `serializable-struct` is a drop-in replacement for `struct`, and, as I explained, a `server` module that used `struct` would have been sound—so it's surprising that the version with `serializable-struct` isn't, especially since the unprotected constructor is exposed through a side-channel that isn't obvious in the source text of the program. I said at the outset that I think this problem deserves a linguistic solution. Ultimately, I would like it to be as easy to implement a protected serializable struct as it is to use `struct` and `contract-out`. All of the workarounds that I know of seem to have significant limitations. One approach would be to drop down to the level of `prop:serializable` and `make-deserialize-info` and write manual checks, but that seems bad for all of the reasons contracts are so much better than manual checks in general. Robby had suggested <https://groups.google.com/d/msg/racket-users/xIrqdJE-qdk/B20DSUEjBAAJ> expecting clients not to use `deserialize` directly, which could help in a number of cases, but it would be a problem when working with libraries like `#lang web-server` that expect to be given serializable values and to serialize them for you. Matthew gave an example <https://groups.google.com/d/msg/racket-users/xIrqdJE-qdk/zoHUWdEiBAAJ> showing how to put the `deserialize-info` in a separate module, so that it uses the contracted version of the constructor: this could well be the basis for a robust solution, but it would definitely need a friendlier API than manipulating module-path indices. It might well help to add some hooks to `racket/serialize`—maybe something like `chaperone-serializable-struct`? I don't have a very concrete vision of how to approach this, but I think it's worth doing. In the mean time, I hope this is helpful to anyone else who may think they've been enforcing stronger guarantees than they actually have. -Philip -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.