Re: [racket-users] How would you implement autoquoted atoms?
Instead of a new `#%q-expression` form, I think there's potential to use `#%datum` or `quote` itself for this.

Potentially, the only thing that makes numbers (for instance) special is that the reader, printer, IDE, and bytecode systems already know what module(s) the number structure type(s) come from. As long as user-defined structure types are able to provide each of those systems with the same knowledge (e.g. through a structure type property), then they can have the same benefits.

One complication: User-defined structure types typically aren't interoperable across phases and module registries, the way kernel-defined, cross-phase persistent structure types like numbers and lists are. Even if we know what their module path is, that's not all the information needed; some of the information only "exists" once a generative `struct` definition has created it.

In particular, I think using `quote` on user-defined types brings up cross-phase difficulties: A `quote` expression is typically computed at phase (N + 1) for use as a constant at phase N. If we expect to be able to compute it at compile time using phase-(N + 1) instances of user-defined structure types, and if we expect to be able to process it at run time using phase-N instances of those types, then the `quote` operation itself needs to perform some kind of marshalling between those.

How to do that marshalling? Well, whatever materials we need, the structure type property can provide them. For instance, certain structure types might go through no change at all (e.g. simple procedures, perhaps). Certain others may do their marshalling using a (module path, identifier, data) intermediate stage just like Matthew Flatt and Alexis King are talking about. Maybe some values would even use complex higher-order marshalling behaviors (similar to contracts or an FFI), letting us take an object that uses phase-(N + 1) tools internally and wrap it up in such a way that it can process phase-N input and output values.
And maybe some steps of the marshalling would use side effects, for instance to implement the interning zeRusski is talking about.

Whatever the technique we use for marshalling any particular structure type, once the marshalling has completed at phase N, we're often going to want a value that's compatible with the phase-N instance of the structure type. And that means that by the time we ever see that result value, the structure type must have been defined at phase N already -- which means the module where the `quote` appears should (at least indirectly) have a phase-N dependency on the module that contains that definition.

To make this work, this time it can be the structure type itself that "carries its own `require` at all times" (via the structure type property), and a `quote` implicitly acts as a `require` for all the structure types that appear inside it. This has a lot in common with the `#q` approach, but it seamlessly blends in user-defined types with core types: We can say that when we `quote` numbers and lists, we implicitly `require` their modules too, but that since those modules are part of the kernel, the `require` has just been imperceptible the whole time.

For types that need to be marshalled to bytecode or saved as plain text from a graphical editor, I agree that the (module path, identifier, data) format seems like a fine choice.

That said, I do want to point out that in this approach, the *bytecode* and *plain text* uses of (module path, identifier, data) triples would be subtly different from each other. In bytecode, the module path would be required at phase 0 (because, as far as I understand it, phase 0 is all that there is in the bytecode) and the construction would be performed near the start of the module. In plain text Racket code, that phase 0 behavior only happens as the *result* of compiling a `quote` form.
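
As a rough illustration of a structure type "carrying" its own marshalling recipe, here is a minimal sketch. The property name `prop:marshal`, the `marshal` helper, and the module path `my-lib/quaternion` are hypothetical names made up for this example; nothing like them exists in Racket today, and the real design would live in the expander rather than in user code.

```racket
#lang racket/base

;; Hypothetical property: its value is a procedure that turns an instance
;; into a (module-path identifier data) triple.
(define-values (prop:marshal marshalable? marshal-ref)
  (make-struct-type-property 'marshal))

(struct quaternion (a b c d)
  #:property prop:marshal
  (lambda (q)
    (list 'my-lib/quaternion   ; hypothetical module path
          'quaternion          ; constructor name assumed to be exported there
          (list (quaternion-a q) (quaternion-b q)
                (quaternion-c q) (quaternion-d q)))))

;; Marshal any value whose structure type carries the property.
(define (marshal v)
  (if (marshalable? v)
      ((marshal-ref v) v)
      (error 'marshal "no marshalling recipe for ~e" v)))

(marshal (quaternion 1 2 3 4))
```

The triple contains only symbols and numbers, so it can cross a phase boundary or be written to a file; rebuilding the value on the other side still requires that the named module be available at that phase, which is the dependency discussed above.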

Since `quote` marshals the value down one phase, it must have started out at phase 1, and thus we need the reader to return a phase-1 instantiation of the value. This suggests that although we would use a reader syntax like `#q(module-path identifier data)` in this approach, its behavior would be to `require` that module path at phase 1 and perform the construction immediately.

On Tuesday, April 23, 2019 at 12:45:04 PM UTC-7, zeRusski wrote:
>
> (begin-for-syntax
>   (list 1 #k(foo) 2))
> ;; => ; tag: undefined;
>
> This can be solved with (require (for-syntax prelude/tags)) but as with other autoquoted types I'd probably want to be able to just write them in any phase. Docs say some stuff about namespaces having a scope that crosses all phases plus separate scopes for each phase. Is there a way for a binding to span all phases without cooperation from the user?
>

I think one thing that might help is to have #k(foo) read as:

  (
    (let ()
      (local-require (only-in prelude/tags tag))
      tag)
    'foo)

This still supposes that `#%app`, `let`, `lo
Re: [racket-users] How would you implement autoquoted atoms?
On Tuesday, 23 April 2019 15:57:52 UTC+1, Matthew Flatt wrote:
>
> This response will be rambling, too. :)

And here I thought I asked an embarrassingly silly question :)

While implementing a "naive" version I ran into two issues that I kind of predicted upfront, but just wanted to make sure they indeed would present a problem:

  #lang prelude/tags

  (list 1 #k(foo) 2)
  ;; => (tag 'foo) as rewritten by our extended reader,
  ;; where tag is a struct provided by prelude/tags

  (begin-for-syntax
    (list 1 #k(foo) 2))
  ;; => ; tag: undefined;

This can be solved with (require (for-syntax prelude/tags)) but as with other autoquoted types I'd probably want to be able to just write them in any phase. Docs say some stuff about namespaces having a scope that crosses all phases plus separate scopes for each phase. Is there a way for a binding to span all phases without cooperation from the user?

Another problem is with REPL. Above runs fine when I run the module, but not if I type in REPL.

  scratch.rkt> (list 1 #k(foo))
  ; stdin::1273: read-syntax: bad syntax `#k`
  ; foo: undefined;
  ; cannot reference an identifier before its definition
  ;   in module: "/Users/russki/Code/scratch.rkt"

What's up with that? Does the reader there need to be defined specially somehow?

> I would be really happy to see someone experiment with these ideas, and
> I'm pretty sure they could be implemented mostly by changing the
> expander and reader in "racket/src/expander"

I'd love to see this implemented, but Racket internals terrify me. As you can see above I can barely cope with the basics :)

--
You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
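
For reference, the "naive" reader extension described in this message can be sketched with a readtable. This is only an assumption about how `prelude/tags` might work, since its source is not shown in the thread; `read-k`, `k-readtable`, and `read-with-k` are names invented for the sketch.

```racket
#lang racket/base

;; Reader macro for #k: reads the following datum, e.g. (foo), and
;; rewrites it to the expression (tag 'foo).
;; The optional arguments are the source location that the reader
;; supplies in read-syntax mode; this sketch ignores them.
(define (read-k ch in [src #f] [line #f] [col #f] [pos #f])
  (define payload (read/recursive in))
  ;; In read-syntax mode the payload is a syntax object, so unwrap it.
  (define datum (if (syntax? payload) (syntax->datum payload) payload))
  (list 'tag (list 'quote (car datum))))

;; Hypothetical readtable that extends the current one with #k.
(define k-readtable
  (make-readtable (current-readtable) #\k 'dispatch-macro read-k))

;; Read one datum from a string with the extended readtable installed.
(define (read-with-k str)
  (parameterize ([current-readtable k-readtable])
    (read (open-input-string str))))

(read-with-k "(list 1 #k(foo) 2)")
```

The result refers to `tag` only by name, so `tag` has to be bound at whichever phase the form ends up being expanded; that is the `begin-for-syntax` failure shown above. The REPL error may have a similar flavor: as far as I understand it, interactions are read with whatever `current-readtable` is in effect at the time, not with the module's `#lang` reader, so the readtable would need to be installed for the REPL as well.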
Re: [racket-users] How would you implement autoquoted atoms?
I find this email fascinating, as about three weeks ago, Spencer Florence and I discussed something almost identical, from the module path + symbol protocol all the way down to the trouble with `quote`. I had been intending to experiment with implementing the idea at some point, but I already have a few too many balls in the air right now (from the intdef changes I’ve been exploring to the `hash-key` implementation I started fiddling with, plus starting the process of looking for a new job), so I probably won’t get to it any time soon. But I’ll go on record as being interested in doing so.

> On Apr 23, 2019, at 09:57, Matthew Flatt wrote:
>
> This response will be rambling, too. :)
>
> Especially with your follow-up message, I think you're getting to a problem that we've wrestled with for a while. Sometimes we've called it the "graphical syntax" problem, because it's related to having non-text syntax, such as images in DrRacket (which are currently implemented in an ad hoc way). Another example could be adding quaternion literals, analogous to complex-number literals. In the cases that we've considered, we want the language to be extensible with a new kind of literal, but there's not necessarily any specific import of the language extension in the program. That means there's a set of binding, evaluation, and composition problems to solve.
>
>
> I've discussed the problem the most with William Hatch, and here's as far as we got with some ideas.
>
> There could be a new primitive datatype --- at the levels of symbols, pairs, vectors, etc. --- to let the reader and expander communicate. Just to have some concrete syntax for the default reader and printer, let's say that the new kind of value can be written with `#q`, perhaps of the form
>
>   #q(module-path identifier data)
>
> The intent of the module-path and identifier components is to give the value a kind of binding. That binding is analogous to syntax objects, but without actually using syntax objects, which is arguably the wrong concept to pull into the reader level. The remaining data is payload to be interpreted by the module-path and identifier combination, such as image data or real numbers for the components of a quaternion.
>
> Of course, a reader might construct these values as a result of parsing some other text, but the idea is that printing out the result from that reader with the default printer would use this `#q` notation, and then that printed form could be read back in. That is, the values can be consistently marshaled and unmarshaled, just like pairs and vectors and numbers.
>
> The benefit of a new datatype is that it can have its own dispatch rule in the expander. Probably a `#q` in an expression position would get wrapped by an implicit `#%q-expression`, or something like that, which would give a language control over whether it wants to allow arbitrary literal values. But the default `#%q-expression` would consult the value's "binding" via the module-path and identifier to expand the value, which might inline an image or quaternion construction, or something like that. In effect, the reader form carries its own `require` at all times.
>
> Maybe interning corresponds to an expansion that lifts out a calculation (in the sense of `syntax-local-lift-expression`), or maybe that's not good enough; I'm not sure.
>
> We imagined that the primitive `quote` form might do something similar to `#%q-expression` in the case that an image or quaternion is part of a quoted S-expression. But, then, does there need to be an even stronger `quote` that doesn't try to expand the `#q` content? I don't know.
>
> Meanwhile, the module-path and identifier combination could also identify a value-specific printer, where images might recognize when the output context can support rendering the actual image, while quaternions might print using "+" and "i" and "j". Or maybe that problem should be left to `prop:custom-write`.
>
> At the level of writing down programs, the examples of images and quaternions seem different. For images, DrRacket and other editors have to include the concept of images somehow, and they insert values that turn into `#q` forms when the program is viewed as a character sequence. But quaternions are written with characters, so maybe that syntax is more like `@` reading in that a language constructor on the `#lang` line would add quaternion syntax to the readtable (which would work for S-expression languages).
>
>
> Overall, this reply is intended as a kind of endorsement and elaboration of your thoughts: Yes, this is an interesting problem, and it seems to need something new in Racket. And, yes, adding some new datatype (with some default syntax) seems like the right direction, mainly because it could trigger a new kind of dispatch in the expander. Probably that new datatype should have something built-in that amounts to a binding for its compile-time and run-time realization.
>
> I would be really happy to see someone
Re: [racket-users] How would you implement autoquoted atoms?
This response will be rambling, too. :)

Especially with your follow-up message, I think you're getting to a problem that we've wrestled with for a while. Sometimes we've called it the "graphical syntax" problem, because it's related to having non-text syntax, such as images in DrRacket (which are currently implemented in an ad hoc way). Another example could be adding quaternion literals, analogous to complex-number literals. In the cases that we've considered, we want the language to be extensible with a new kind of literal, but there's not necessarily any specific import of the language extension in the program. That means there's a set of binding, evaluation, and composition problems to solve.


I've discussed the problem the most with William Hatch, and here's as far as we got with some ideas.

There could be a new primitive datatype --- at the levels of symbols, pairs, vectors, etc. --- to let the reader and expander communicate. Just to have some concrete syntax for the default reader and printer, let's say that the new kind of value can be written with `#q`, perhaps of the form

  #q(module-path identifier data)

The intent of the module-path and identifier components is to give the value a kind of binding. That binding is analogous to syntax objects, but without actually using syntax objects, which is arguably the wrong concept to pull into the reader level. The remaining data is payload to be interpreted by the module-path and identifier combination, such as image data or real numbers for the components of a quaternion.

Of course, a reader might construct these values as a result of parsing some other text, but the idea is that printing out the result from that reader with the default printer would use this `#q` notation, and then that printed form could be read back in. That is, the values can be consistently marshaled and unmarshaled, just like pairs and vectors and numbers.

The benefit of a new datatype is that it can have its own dispatch rule in the expander. Probably a `#q` in an expression position would get wrapped by an implicit `#%q-expression`, or something like that, which would give a language control over whether it wants to allow arbitrary literal values. But the default `#%q-expression` would consult the value's "binding" via the module-path and identifier to expand the value, which might inline an image or quaternion construction, or something like that. In effect, the reader form carries its own `require` at all times.

Maybe interning corresponds to an expansion that lifts out a calculation (in the sense of `syntax-local-lift-expression`), or maybe that's not good enough; I'm not sure.

We imagined that the primitive `quote` form might do something similar to `#%q-expression` in the case that an image or quaternion is part of a quoted S-expression. But, then, does there need to be an even stronger `quote` that doesn't try to expand the `#q` content? I don't know.

Meanwhile, the module-path and identifier combination could also identify a value-specific printer, where images might recognize when the output context can support rendering the actual image, while quaternions might print using "+" and "i" and "j". Or maybe that problem should be left to `prop:custom-write`.

At the level of writing down programs, the examples of images and quaternions seem different. For images, DrRacket and other editors have to include the concept of images somehow, and they insert values that turn into `#q` forms when the program is viewed as a character sequence. But quaternions are written with characters, so maybe that syntax is more like `@` reading in that a language constructor on the `#lang` line would add quaternion syntax to the readtable (which would work for S-expression languages).


Overall, this reply is intended as a kind of endorsement and elaboration of your thoughts: Yes, this is an interesting problem, and it seems to need something new in Racket. And, yes, adding some new datatype (with some default syntax) seems like the right direction, mainly because it could trigger a new kind of dispatch in the expander. Probably that new datatype should have something built-in that amounts to a binding for its compile-time and run-time realization.

I would be really happy to see someone experiment with these ideas, and I'm pretty sure they could be implemented mostly by changing the expander and reader in "racket/src/expander" --- although some cooperation from the bytecode writer and reader is probably also needed, and I'd be happy to help more there.


At Tue, 23 Apr 2019 06:08:05 -0700 (PDT), zeRusski wrote:
> I must apologise, for what follows will be more of a rambling than an exercise in clear thinking. That is because I am a bit stuck and thought I'd seek help.
>
> I have been thinking some about languages and how it isn't always easy to clearly separate language being implemented from the language used to implement it. The picture gets particularly blurry in Lisps. This time around the question that gave me pause was one of implementing symbols. Better still Racket keywords
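
To make the idea of a literal that "carries its own `require`" a little more concrete, here is a minimal run-time sketch of interpreting such a triple. It is not the proposed design: in the proposal the expander (or `#%q-expression`) would consult the binding, whereas this sketch uses `dynamic-require` at run time. The helper name `unmarshal` is made up, and complex numbers stand in for quaternions, which Racket does not provide.

```racket
#lang racket/base

;; Interpret a (module-path identifier data) triple by looking the
;; identifier up in the named module and applying it to the payload.
;; This is a run-time stand-in for what the expander would do.
(define (unmarshal triple)
  (define mod-path (car triple))
  (define id (cadr triple))
  (define data (caddr triple))
  (apply (dynamic-require mod-path id) data))

;; The triple names racket/base's make-rectangular with payload (1 2),
;; so the result is the complex number built from those two parts.
(unmarshal '(racket/base make-rectangular (1 2)))
```

Because the triple names its module explicitly, the program that contains it needs no import of its own for the literal to be constructed, which is the property the `#q` form is meant to have.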
[racket-users] How would you implement autoquoted atoms?
I must apologise, for what follows will be more of a rambling than an exercise in clear thinking. That is because I am a bit stuck and thought I'd seek help.

I have been thinking some about languages and how it isn't always easy to clearly separate language being implemented from the language used to implement it. The picture gets particularly blurry in Lisps. This time around the question that gave me pause was one of implementing symbols. Better still Racket keywords, since like many lispy terms "symbol" has so many confusing meanings that it's nigh impossible to tell what people mean exactly. I specifically talk about autoquoted datums. Two interned symbols that are equal? are eq?, two keywords that are equal? are eq?, 42 is eq? to 42, etc. Symbols are a bad example because people often think about 'symbol or identifier with semantics being: perform variable lookup.

Someone on this list said everything in Racket is a struct, so let's start there.

  (struct kw (symbol))

We can also come up with some syntactic representation and extend our language with read and read-syntax that translate this new syntax into kw-struct as needed. But then we also demand that two syntactically equal kws end up being the same value in the language, so no matter where our reader encounters #kw(foo) it must produce the same value. This must be true across module boundaries, too. Just like Racket keywords.

So, what are we to do? There's a time when the reader runs, followed by expansion. Does this mean they need to communicate somehow? Also, the reader "runs", that is it is written in Racket (or some derivative) after all, but reader's environment isn't one where expansion happens, and that of the final code being evaled is different still. Right?

To ensure eq? of two kws with the same printed representation we'll probably want to keep some global table around that keeps track of "interned" kws. So, for any two #kw(foo), our reader would have to produce something like (lookup-intern-kw #:symbol 'foo), which at run-time would consult the table of kws and return the (kw 'foo) already there, or create a fresh entry and return that new struct. Two observations: (a) it follows that the global table is one that must exist at runtime - not while the reader runs, and (b) we end up relying on the host language for symbol equality after all: 'foo is eq? to 'foo, and that allows us to key the table by symbols e.g. 'foo.

Is this how you would do it? Is there a better way that involves the reader more and relies on the runtime less?

Bonus question. What if we allow families of kws effectively partitioning kws into namespaces: #kw(family name). This appears a small variation of the above, where you'd simply assemble a compound symbol from family and name to use for the table lookup. That is until you allow parameterizing by "current-family", so kw declaration can omit the family part and it gets inserted as needed - not unreasonable in a language with modules or explicit namespaces. We could allow something like this:

  #lang racket/kws
  #:current-family addams
  #kw(morticia)

Now any kw within a module without family must translate into one of addams family. But also any #kw(addams morticia) in a different module must be eq? to the one above and in fact to any one like that anywhere. One exception is probably if we send them across Racket places which IIUC amount to running separate VMs.

In the above example the reader would have to be aware of #:current-family declaration that may appear at the top of the module. We'd probably translate that to some (current-family 'addams) parameter setup, or wrap #%module-begin body in parameterize, then every kw without explicit family would have to check the (current-family) parameter. Is there a way to push this more to the read-time? If there is, what happens if we load the module and enter REPL? Could we ensure its reader is properly parameterized that it would use appropriate current-family?

How screwed up is my thinking here? Is there a way to leverage the reader more and rely on the runtime less? I imagine that'd make kws discussed lighter weight? We talk about phases some in Racket, but reader runs somewhere or rather sometime, too. I'd like to have a clearer picture in my head, I guess.

Thanks
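
The run-time intern table described in this message can be sketched as follows. This is a minimal sketch under a few assumptions: the `kw` struct gets an extra `family` field (the message's version has only `symbol`), the table is keyed by a (family . name) pair instead of a compound symbol, and `kw-table` and the `'default` family are invented for the example.

```racket
#lang racket/base

;; A kw records its family and its name. The family field is an
;; addition for this sketch; the original struct is (struct kw (symbol)).
(struct kw (family symbol))

;; Hypothetical global intern table, keyed by (family . name) pairs.
;; make-hash compares keys with equal?, so pairs of the same symbols
;; find the same entry.
(define kw-table (make-hash))

;; Default family for kws written without one, e.g. #kw(morticia).
(define current-family (make-parameter 'default))

;; Return the unique kw for this family and name, creating it on first use.
(define (lookup-intern-kw #:symbol name #:family [family (current-family)])
  (hash-ref! kw-table (cons family name) (lambda () (kw family name))))

(define a (lookup-intern-kw #:symbol 'foo))
(define b (lookup-intern-kw #:symbol 'foo))
(eq? a b)

;; A kw read under (current-family 'addams) is eq? to an explicit
;; #kw(addams morticia) elsewhere.
(eq? (parameterize ([current-family 'addams])
       (lookup-intern-kw #:symbol 'morticia))
     (lookup-intern-kw #:symbol 'morticia #:family 'addams))
```

One caveat with this design: the table lives in the module that defines it, so each instantiation of that module gets its own table. Modules that share one instance see eq? kws, but a separate instantiation at another phase or in another place would intern separately, which is the cross-phase issue the replies in this thread discuss.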