[ The Types Forum, http://lists.seas.upenn.edu/mailman/listinfo/types-list ]
This is cool! Can you point me to relevant parts of the repo with examples? On Mon, Aug 22, 2022 at 1:03 AM Kalani Thielen <[email protected]> wrote: > Hi Jimmy, > > I’ve used a similar observation in this PL/compiler that I made for Morgan > Stanley a few years back: > https://urldefense.com/v3/__https://github.com/morganstanley/hobbes__;!!IBzWLUs!WAhCjaIxpVfUqykQo-tm_T17e9eXap8gH814S1Bs3-XuAVaYnKFOUf-oUA4tNwSkQq5IKR7_9ZrTBo2NETIYGQI-xJ68$ > > > … although I didn’t require strict field placement but just convertibility > as determined by structural subtyping. So e.g. ‘{x:int,y:int,z:int}’ is a > subtype of ‘{z:long,x:double}’ as witnessed by an inferred function that > drops “y”, swaps “x” and “z” while widening them into their respective > types. As a special case it optimizes for ’trivial convertibility’ (where > the supertype is a prefix of the subtype) and the identity function is > inferred. There’s a flipped rule for sum types, where it’s always safe to > convert into a large subsuming sum (hence the type 0 as the subtype for all > types and 1 as the supertype for all types). > > In this PL, I defined those rules as a system of type classes (ala > Haskell), with the additional feature that constraints on qualified types > are always eliminated at compile-time (since where this is used at Morgan > Stanley, runtime performance penalties have to be minimized). > > As well as being used to decide compatibility between file formats, this > method is also useful to decide compatibility between independent message > passing processes. Previously at Morgan Stanley, we had a lot of software > that kept “version numbers” and it could be very difficult to figure out if > two “versions” of a message format could be used together (or on a larger > scale, if a whole system was even coherent and ready to trade the next > day). Structural subtyping makes that problem much simpler, and also gives > you a straightforward explanation for why e.g. two programs can’t > communicate. > > Also on the subject of file formats, beyond just “flat headers”, I found > dependent sum types very useful to make “file references” as e.g. “T@f” > is an offset in the file “f” (dynamically determined) where a “T” value can > be found (which, if you mmap a large file, you still need to resolve > outside of the file contents since different programs will map at different > base addresses). With this it’s easy to build regular recursive types and > give type assignments to e.g. btrees and skip lists, decide subtyping on > indexes this way, etc. > > I also added probability distributions to structured types, so again > requiring inference of a subtyping witness between convertible structures > with different expectations, and used those expectations to encode > structures down to their entropy limit — so that we can minimally store > billions of ticks split out by symbol in a skip list and efficiently read > them out of mapped files concurrently, automatically converting into a few > different types for reporting tools and ad-hoc queries. > > Hope this helps! > > > > > On Aug 20, 2022, at 12:07 AM, Jimmy Koppel <[email protected]> wrote: > > > > [ The Types Forum, > http://lists.seas.upenn.edu/mailman/listinfo/types-list ] > > > > There’s a beautiful fact, not often discussed, that the theory of > subtyping > > explains forwards compatibility between file formats. > > > > > > For example here’s a simple file format for images: > > > > > > type header_length_v1 = {x : uint32 | x >= 2} > > > > type width = uint32 > > > > type height = uint32 > > > > type header_v1(n) = width * height * uint32[n-2] > > > > type image_data = uint32[] > > > > > > type image_file_v1 = {n: header_length_v1} * header_v1(n) * image_data > > > > > > > > This type allows for arbitrary extensions to the header, so that any > > extension to this format which adds more header fields is a subtype. A > > reader for this file format is thus forwards-compatible with any format > > extension which adds more header fields. > > > > > > However, even though I wrote it them above as products, the components > of a > > binary file format are not products. In particular, they do not allow > > nesting, e.g.: (a*b*c)*d is not compatible with (a*b)*d. Compatibility > > between two binary formats is determined by the same data being in “the > > same place.” Even though, classically, uint32 * uint8 is a subtype of > > uint32, one could not replace width in the type above with (uint32 * > uint8), > > as that would move the location of the height from bytes 8-11 to bytes > > 9-12, breaking existing code. > > > > > > For my class, I made up a “flat sequence type” to describe file formats, > > which effectively functions as a product type which must be > > right-associated. I would instead write the header(n) type as width ++ > > height ++ uint32[n-2]. I have not worked out all the theory, but I’ve > > noticed fewer students making mistakes about binary compatibly since > > introducing this idea. > > > > > > Can anyone point me to related work? > > > > > > Yours, > > > > Jimmy Koppel > > > > MIT CSAIL > > -- Jimmy Koppel twitter.com/jimmykoppel https://urldefense.com/v3/__http://www.jameskoppelcoaching.com__;!!IBzWLUs!WAhCjaIxpVfUqykQo-tm_T17e9eXap8gH814S1Bs3-XuAVaYnKFOUf-oUA4tNwSkQq5IKR7_9ZrTBo2NETIYGZRG-er8$
