Re: [racket-users] Re: Readers and Namespaces
On Thu, Aug 17, 2017 at 10:11:12PM -0700, Alexis King wrote:
> Generally, my recommendation is to essentially define your language in
> two passes: a direct translation to s-expressions, followed by a phase
> of macroexpansion. The first phase is what your reader interacts with.
> Give the primitives that your reader produces names that are unlikely
> to conflict with users’ code; specifically, prefix them with “#%” so
> that they are clearly special. It can also be useful to include
> characters in the identifier names that aren’t even valid identifier
> characters in your language, but this is not always possible if any
> character can be legally used in an identifier. Either way, this means
> your my-let macro should likely be named something like #%let or
> #%my-language-let, and your reader should produce syntax objects without
> lexical context that use these #%-prefixed primitives.

Just a clarification about how `#%` is special: in general, identifiers prefixed with `#%` signal that they may be redefined for other languages. For instance, identifiers like #%app, #%datum, #%module-begin, etc., are added by the macro expander at various points, and people are invited to implement new versions of them when making a new language so that these positions in a program mean something different. So if your reader may be used by multiple languages, beware that you are communicating that #%whatever is a place where you invite the new language to hijack it.

> It feels a little inelegant that Racket’s hygiene system does not extend
> to read-time, since it means the sort of hacks necessary in unhygienic
> languages are sometimes necessary in Racket when implementing a reader.
> Fortunately, a reader is much more self-contained and smaller in scope
> than a macro-enabled language, so it usually isn’t a big deal. Perhaps
> some language will eventually motivate a hygienic reader layer, but
> Racket doesn’t currently have one.
The problem is that the guarantees of separate compilation (all the stuff about module visits and instantiation) only apply to expansion, not reading. However, if you write a macro that uses `read-syntax` and returns the result, you can have binding information on the syntax objects and it will work as expected -- the visit and instantiation guarantees will be enforced because you are in macro-expansion time, not the initial read time.

If the initial `read-syntax` used on a module were treated as if it were a function used in phase 1, and the visits and instantiations of the modules used by the reader were done as they are with expansion, I think it would work for the reader to return syntax objects with binding information.

-- 
You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
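The macro-based route described in this message (calling `read-syntax` from a macro and returning the result) might look roughly like the sketch below. It is not code from the thread: the name `parse-string` is hypothetical, and the sketch assumes the source text arrives as a string literal. The read happens at expansion time, and the result is then given the lexical context of the use site so its identifiers carry binding information.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; parse-string is a hypothetical name used only for this sketch.
(define-syntax (parse-string stx)
  (syntax-case stx ()
    [(_ str)
     (string? (syntax-e #'str))
     (let* ([in  (open-input-string (syntax-e #'str))]
            ;; This call runs at expansion time (phase 1), not at the
            ;; initial read of the module.
            [raw (read-syntax 'embedded in)])
       ;; read-syntax attaches no lexical context, so borrow the context
       ;; of the string literal at the use site; keep raw's source location.
       (datum->syntax #'str (syntax->datum raw) raw))]))

;; The text "(+ 1 2)" is read and then expanded as ordinary Racket code.
(define result (parse-string "(+ 1 2)"))
```

Here `+` in the parsed text resolves to the binding visible where `parse-string` is used, which is the behavior the message says a plain reader cannot provide.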
Re: [racket-users] Re: Readers and Namespaces
If you are implementing a new language, this can be done by delegating the "=-to-let" work to the macro expander and letting the reader simply parenthesize the input into s-expression form. In this approach, we don't need to worry about being hygienic in the reader, since the actual work is done by the macro expander.

For example, we can make the reader turn the following input

    ; a-test.rkt
    #lang my-let-lang
    x = 5;
    a = 8;
    my-let = 10;
    = = a + a;
    u = x + my-let + = + = + = + = + =;
    printf("u = ~a\n", u);

into some parenthesized form

    (module a-test my-let-lang
      (= x 5)
      (= a 8)
      (= my-let 10)
      (= = (+ a a))
      (= u (+ x my-let = = = = =))
      (printf "u = ~a\n" u))

and implement a module-begin macro that turns the parenthesized body into my-let.

    (require (for-syntax racket/base syntax/parse))
    (provide (rename-out [new-module-begin #%module-begin]))
    (require (rename-in racket [#%module-begin racket-module-begin]))

    (define-syntax (new-module-begin stx)
      (syntax-parse stx
        ; match "=" by name, so it is not treated as a pattern variable
        #:datum-literals (=)
        ; parse the s-exp-ized "=" syntax here
        [(_ (= x:id e:expr) ... body)
         ; make-nested-my-let is assumed to be defined elsewhere
         #'(racket-module-begin
            (make-nested-my-let ((x e) ...) body))]))

--Shu-Hung

On Fri, Aug 18, 2017 at 9:21 AM, gfb wrote:
> Assuming the setup where you make a module syntax object and call
> strip-context on it, you can add a scope to all the user's identifiers after
> that so they're not considered “above” any of the language's identifiers.
> Make a function to do the marking:
>
>   (define marker (make-syntax-introducer #true))
>
> Then walk the syntax object tree and replace each user identifier ‘id’ with
> (marker id 'add). Depending on your parsing setup, you could have a specific
> non-terminal for places a user identifier occurs and have a very generic
> syntax tree walker that looks for the non-terminal and adds the mark to the
> identifier.
>
> If you make a second marker for all other identifiers you encounter, then
> none of the user's identifiers will be “under” the language's bindings, and
> error messages will be better if the user accidentally uses (without binding)
> one of the language's names.
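The `make-nested-my-let` helper used in the module-begin macro above is not defined anywhere in the thread. One possible shape, offered only as a guess at the intent, is a recursive macro that turns each `(x e)` pair into a nested `let`, so that later right-hand sides can see earlier bindings. The sketch below assumes exactly that behavior and uses plain `let` where the original presumably used the poster's own `my-let`.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Hypothetical definition: the thread only names this helper.
(define-syntax (make-nested-my-let stx)
  (syntax-case stx ()
    ;; No bindings left: the body is the result.
    [(_ () body) #'body]
    ;; Bind the first name, then nest the remaining bindings inside it.
    [(_ ((x e) rest ...) body)
     #'(let ([x e])
         (make-nested-my-let (rest ...) body))]))

;; Mirrors the a-test.rkt example: the user binds both `my-let` and `=`.
(define u
  (make-nested-my-let ((x 5) (a 8) (my-let 10) (= (+ a a)))
    (+ x my-let = = = = =)))
```

Because the `let` in the template is introduced by the macro, a user binding named `=` or `my-let` cannot capture it, which is the point of moving the work out of the reader.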
Re: [racket-users] Re: Readers and Namespaces
Assuming the setup where you make a module syntax object and call strip-context on it, you can add a scope to all the user's identifiers after that so they're not considered “above” any of the language's identifiers. Make a function to do the marking:

    (define marker (make-syntax-introducer #true))

Then walk the syntax object tree and replace each user identifier ‘id’ with (marker id 'add). Depending on your parsing setup, you could have a specific non-terminal for places a user identifier occurs and have a very generic syntax tree walker that looks for the non-terminal and adds the mark to the identifier.

If you make a second marker for all other identifiers you encounter, then none of the user's identifiers will be “under” the language's bindings, and error messages will be better if the user accidentally uses (without binding) one of the language's names.
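The tree walk described above could be sketched as follows. This is an illustration under assumptions, not the poster's code: `mark-ids` and the `user-id?` predicate are hypothetical names, and a real parser would more likely tag user identifiers at the non-terminal where they are recognized. The sketch only descends into pairs, which is enough for list-shaped module bodies.

```racket
#lang racket/base
(require syntax/strip-context)

;; One fresh scope for user identifiers, a second for everything else.
(define user-marker  (make-syntax-introducer #t))
(define other-marker (make-syntax-introducer #t))

;; mark-ids and user-id? are hypothetical; user-id? decides which
;; identifiers came from the user's program.
(define (mark-ids stx user-id?)
  (let loop ([s stx])
    (cond
      [(identifier? s)
       (if (user-id? s)
           (user-marker s 'add)
           (other-marker s 'add))]
      ;; Rebuild compound syntax, keeping source location and properties.
      [(syntax? s) (datum->syntax s (loop (syntax-e s)) s s)]
      [(pair? s) (cons (loop (car s)) (loop (cdr s)))]
      [else s])))

;; Usage: strip the context first, then mark.
(define stripped (strip-context #'(= x 5)))
(define marked
  (mark-ids stripped (lambda (id) (eq? (syntax-e id) 'x))))
```

After marking, the datum is unchanged, but the marked `x` carries a scope that the stripped `x` does not, so the two are no longer `bound-identifier=?`.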
Re: [racket-users] Re: Readers and Namespaces
On Friday, August 18, 2017 at 1:11:16 AM UTC-4, Alexis King wrote:
> > On Aug 17, 2017, at 21:52, Sam Waxman wrote:
> >
> > On a related note, I've read that read-syntax is supposed to return a
> > syntax-object whose lexical context is stripped. Why is that? Doesn't
> > that make it impossible for the language to know the difference
> > between the let I used in an earlier file and the let that the user
> > types as an identifier?
>
> Yes, this is likely the cause of your problem (and it’s what I guessed
> was probably going on before you sent the second post). This is, for
> better or for worse, currently a no-no in Racket: syntax objects
> produced by a #lang’s reader are supposed to only have source locations
> and syntax properties on them, not lexical context.
>
> There are various reasons this can be explained, some historical, others
> technical. One explanation I’ve heard is that valid programs read by
> `read-syntax` are supposed to also be valid programs if read by `read`,
> which means lexical context shouldn’t matter. In practice, I think this
> is probably not a very strong argument (I think `read`’s existence and
> its relationship to the #lang protocol is mostly a historical artifact
> in Racket), but there are also questions about the meaning of syntax
> objects with lexical context that have not yet been seen by the
> macroexpander. Those technical details are outside of my realm of
> knowledge, but I can at least offer some solutions.
>
> Generally, my recommendation is to essentially define your language in
> two passes: a direct translation to s-expressions, followed by a phase
> of macroexpansion. The first phase is what your reader interacts with.
> Give the primitives that your reader produces names that are unlikely
> to conflict with users’ code; specifically, prefix them with “#%” so
> that they are clearly special. It can also be useful to include
> characters in the identifier names that aren’t even valid identifier
> characters in your language, but this is not always possible if any
> character can be legally used in an identifier. Either way, this means
> your my-let macro should likely be named something like #%let or
> #%my-language-let, and your reader should produce syntax objects without
> lexical context that use these #%-prefixed primitives.
>
> The good news is, once you have converted your #lang’s source syntax to
> s-expressions, you can hand those expressions off to the macroexpander,
> and then you can use whatever hygienic names you wish. You can define
> your #%-prefixed primitives as hygienic macros that expand to
> nicely-scoped syntax objects with lexical context. For example, in a
> non-s-expression language I implemented in Racket, I converted lambda
> syntax into uses of a #%lambda form, and since the language supported
> pattern-matching, I defined #%lambda as a macro that expanded to plain
> old match-lambda** from racket/match. At that point, the name will not
> conflict, because macroexpansion is hygienic.
>
> It feels a little inelegant that Racket’s hygiene system does not extend
> to read-time, since it means the sort of hacks necessary in unhygienic
> languages are sometimes necessary in Racket when implementing a reader.
> Fortunately, a reader is much more self-contained and smaller in scope
> than a macro-enabled language, so it usually isn’t a big deal. Perhaps
> some language will eventually motivate a hygienic reader layer, but
> Racket doesn’t currently have one.

Wow, I'm very surprised that the reader doesn't have hygiene! Seems odd that you can't do a number of things if your language's identifiers are allowed to have all the characters present in Racket's. Thankfully my language doesn't allow # or % to be id characters, so I can do as you say and just prefix my macros with them.

Thanks!
Re: [racket-users] Re: Readers and Namespaces
> On Aug 17, 2017, at 21:52, Sam Waxman wrote:
>
> On a related note, I've read that read-syntax is supposed to return a
> syntax-object whose lexical context is stripped. Why is that? Doesn't
> that make it impossible for the language to know the difference
> between the let I used in an earlier file and the let that the user
> types as an identifier?

Yes, this is likely the cause of your problem (and it’s what I guessed was probably going on before you sent the second post). This is, for better or for worse, currently a no-no in Racket: syntax objects produced by a #lang’s reader are supposed to only have source locations and syntax properties on them, not lexical context.

There are various reasons this can be explained, some historical, others technical. One explanation I’ve heard is that valid programs read by `read-syntax` are supposed to also be valid programs if read by `read`, which means lexical context shouldn’t matter. In practice, I think this is probably not a very strong argument (I think `read`’s existence and its relationship to the #lang protocol is mostly a historical artifact in Racket), but there are also questions about the meaning of syntax objects with lexical context that have not yet been seen by the macroexpander. Those technical details are outside of my realm of knowledge, but I can at least offer some solutions.

Generally, my recommendation is to essentially define your language in two passes: a direct translation to s-expressions, followed by a phase of macroexpansion. The first phase is what your reader interacts with. Give the primitives that your reader produces names that are unlikely to conflict with users’ code; specifically, prefix them with “#%” so that they are clearly special. It can also be useful to include characters in the identifier names that aren’t even valid identifier characters in your language, but this is not always possible if any character can be legally used in an identifier. Either way, this means your my-let macro should likely be named something like #%let or #%my-language-let, and your reader should produce syntax objects without lexical context that use these #%-prefixed primitives.

The good news is, once you have converted your #lang’s source syntax to s-expressions, you can hand those expressions off to the macroexpander, and then you can use whatever hygienic names you wish. You can define your #%-prefixed primitives as hygienic macros that expand to nicely-scoped syntax objects with lexical context. For example, in a non-s-expression language I implemented in Racket, I converted lambda syntax into uses of a #%lambda form, and since the language supported pattern-matching, I defined #%lambda as a macro that expanded to plain old match-lambda** from racket/match. At that point, the name will not conflict, because macroexpansion is hygienic.

It feels a little inelegant that Racket’s hygiene system does not extend to read-time, since it means the sort of hacks necessary in unhygienic languages are sometimes necessary in Racket when implementing a reader. Fortunately, a reader is much more self-contained and smaller in scope than a macro-enabled language, so it usually isn’t a big deal. Perhaps some language will eventually motivate a hygienic reader layer, but Racket doesn’t currently have one.
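The second pass described above can be illustrated with a small sketch. It assumes the reader emits a form named `#%my-language-let` (one of the names suggested in the message) with ordinary `let`-style binding pairs; the exact shape the reader produces is an assumption made for this example. The primitive is then an ordinary hygienic macro.

```racket
#lang racket/base
(require (for-syntax racket/base))
(provide #%my-language-let)

;; The reader would emit (#%my-language-let ([x e] ...) body ...) with no
;; lexical context; this macro supplies the meaning during expansion.
(define-syntax (#%my-language-let stx)
  (syntax-case stx ()
    [(_ ([x e] ...) body ...)
     ;; This `let` is Racket's, whatever the user's program binds.
     #'(let ([x e] ...) body ...)]))

;; A user variable that happens to be named `let` does not interfere:
(define answer
  ((lambda (let)
     (#%my-language-let ([x 5]) (+ x let)))
   10))
```

Here `answer` is 15: the `let` introduced by the macro refers to Racket's binding, while the `let` in the body refers to the user's variable, which is the separation the original question was after.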