Re: [racket-users] Re: Readers and Namespaces

2017-08-22 Thread William G Hatch

On Thu, Aug 17, 2017 at 10:11:12PM -0700, Alexis King wrote:

> Generally, my recommendation is to essentially define your language in
> two passes: a direct translation to s-expressions, followed by a phase
> of macroexpansion. The first phase is what your reader interacts with.
> Give the primitives that your reader produces names that are unlikely
> to conflict with users’ code — specifically, prefix them with “#%” so
> that they are clearly special. It can also be useful to include
> characters in the identifier names that aren’t even valid identifier
> characters in your language, but this is not always possible if any
> character can be legally used in an identifier. Either way, this means
> your my-let macro should likely be named something like #%let or
> #%my-language-let, and your reader produce syntax objects without
> lexical context that use these #%-prefixed primitives.


Just a clarification about how `#%` is special: in general identifiers
prefixed with `#%` signal that they may be redefined for other
languages.  For instance, identifiers like #%app, #%datum,
#%module-begin, etc., are added by the macro expander at various points
and people are invited to implement new versions of them when making a
new language so that these positions in a program mean something
different.  So if your reader may be used by multiple languages,
beware that you are communicating that #%whatever is a place where you
invite the new language to hijack it.
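To make the "hijack" concrete, here is a minimal sketch (not from the thread) of a toy module language that replaces `#%app`. The names `quoting-lang`, `quoting-app`, and `user` are made up for illustration, and submodules are used only so the whole thing fits in one runnable file.

```racket
#lang racket
;; Toy module language (hypothetical): everything from racket, except
;; that `#%app` is replaced by quoting-app.
(module quoting-lang racket
  (provide (except-out (all-from-out racket) #%app)
           (rename-out [quoting-app #%app]))
  ;; Each application (f arg ...) becomes a list headed by the quoted
  ;; operator. The arguments are still evaluated, so nested applications
  ;; are reified too. Inside this template, `list` is applied with
  ;; racket's own #%app, because the template has quoting-lang's context.
  (define-syntax-rule (quoting-app f arg ...)
    (list 'f arg ...)))

;; A module written in the toy language. The expander wraps (+ 1 (* 2 3))
;; in an implicit `#%app` with the user's lexical context, and in this
;; module that name refers to quoting-app.
(module user (submod ".." quoting-lang)
  (provide result)
  (define result (+ 1 (* 2 3))))

(require 'user)
result ; '(+ 1 (* 2 3))
```

The same mechanism applies to any `#%`-prefixed name a reader emits: whichever language ends up providing that name decides what it means.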


> It feels a little inelegant that Racket’s hygiene system does not extend
> to read-time, since it means the sort of hacks necessary in unhygienic
> languages are sometimes necessary in Racket when implementing a reader.
> Fortunately, a reader is much more self-contained and smaller in scope
> than a macro-enabled language, so it usually isn’t a big deal. Perhaps
> some language will eventually motivate a hygienic reader layer, but
> Racket doesn’t currently have one.


The problem is that the guarantees of separate compilation (all the
stuff about module visits and instantiation) only apply to expansion,
not reading.  However, if you write a macro that uses `read-syntax`
and returns the result, you can have binding information on the syntax
objects and it will work as expected -- the visit and instantiation
guarantees will be enforced because you are in macro-expansion time,
not the initial read time.  If the initial `read-syntax` used on a
module were treated as if it were a function used in phase 1, and the
visits and instantiations of the modules used by the reader were done
as they are with expansion, I think it would work for the reader to
return syntax objects with binding information.
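A rough sketch of the macro-time alternative described above: `inline-source` (a hypothetical name) calls `read-syntax` during expansion and then uses `replace-context` from `syntax/strip-context` to give the freshly read syntax the lexical context of the macro use. Because this happens inside expansion, the usual binding rules apply and `x` and `+` resolve as ordinary references in the enclosing module.

```racket
#lang racket
(require (for-syntax syntax/strip-context))

;; Hypothetical macro: read Racket source from a string literal at
;; expansion time, then give every part of the result the lexical
;; context of the macro use site.
(define-syntax (inline-source stx)
  (syntax-case stx ()
    [(_ str)
     (string? (syntax-e #'str))
     (let* ([in (open-input-string (syntax-e #'str))]
            [read-stx (read-syntax (syntax-source #'str) in)])
       ;; read-stx has no lexical context of its own; borrow the use site's
       (replace-context stx read-stx))]))

(define x 10)
(define result (inline-source "(+ x 1)"))
result ; 11
```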

--
You received this message because you are subscribed to the Google Groups "Racket 
Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [racket-users] Re: Readers and Namespaces

2017-08-18 Thread Shu-Hung You
If you are implementing a new language, this can be done by delegating
the "=-to-let" work to the macro expander and letting the reader simply
parenthesize the input into s-expression form. In this approach, we
don't need to worry about being hygienic in the reader since the
actual work is done by the macro expander.

For example, we can make the reader turn the following input

; a-test.rkt
#lang my-let-lang
x = 5;
a = 8;
my-let = 10;
= = a + a;
u = x + my-let + = + = + = + = + =;
printf("u = ~a\n", u);

into some parenthesized form

(module a-test my-let-lang
  (= x 5) (= a 8) (= my-let 10) (= = (+ a a))
  (= u (+ x my-let = = = = =))
  (printf "u = ~a\n" u))

and then implement a #%module-begin macro that turns the parenthesized
body into nested my-let forms.

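#lang racket
;; main module of my-let-lang (sketch)
(require (for-syntax syntax/parse)
         (rename-in racket [#%module-begin racket-module-begin]))
(provide (except-out (all-from-out racket) #%module-begin)
         (rename-out [new-module-begin #%module-begin]))

(define-syntax (new-module-begin stx)
  (syntax-parse stx
    #:datum-literals (=) ; otherwise `=` would be a pattern variable
    ; parse the s-exp-ized "=" syntax here
    [(_ (= x:id e:expr) ... body)
     #'(racket-module-begin
        (make-nested-my-let ([x e] ...) body))]))

;; hypothetical stand-in for the helper: one nested `let` per clause
(define-syntax (make-nested-my-let stx)
  (syntax-parse stx
    [(_ () body) #'body]
    [(_ ([x e] more ...) body)
     #'(let ([x e]) (make-nested-my-let (more ...) body))]))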
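For the reader half, a hypothetical helper that parenthesizes one statement of the simple `NAME = TERM + TERM ... ;` shape from a-test.rkt might look like the following. It is only a sketch under stated assumptions: tokens are separated by whitespace, the `printf(...)` line is not handled, and `parse-statement`, `parse-sum`, and `token->datum` are made-up names. The result is built with `datum->syntax #f`, so it carries no lexical context.

```racket
#lang racket
;; Hypothetical sketch: parenthesize one my-let-lang statement of the form
;;   NAME = TERM + TERM ... ;
;; Tokens must be separated by whitespace; function calls are not handled.

;; A token is a number if it parses as one, otherwise an identifier.
(define (token->datum tok)
  (or (string->number tok) (string->symbol tok)))

;; Token list for "a + b + c" -> '(a b c)
(define (parse-sum toks)
  (match toks
    [(list t) (list (token->datum t))]
    [(list t "+" more ...) (cons (token->datum t) (parse-sum more))]
    [_ (error 'parse-sum "unsupported expression: ~a" toks)]))

(define (parse-statement line)
  ;; drop surrounding whitespace and the trailing semicolon, then tokenize
  (define toks
    (string-split (string-trim (string-trim line) ";" #:left? #f)))
  (match toks
    [(list name "=" rhs ..1)
     (define terms (parse-sum rhs))
     ;; no lexical context: the macro expander gives the names meaning later
     (datum->syntax #f
                    (list '= (token->datum name)
                          (if (null? (cdr terms)) (car terms) (cons '+ terms))))]
    [_ (error 'parse-statement "not a binding statement: ~a" line)]))

(syntax->datum (parse-statement "= = a + a;"))
; '(= = (+ a a))
(syntax->datum (parse-statement "u = x + my-let + = + = + = + = + =;"))
; '(= u (+ x my-let = = = = =))
```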

--Shu-Hung




Re: [racket-users] Re: Readers and Namespaces

2017-08-18 Thread gfb
Assuming the setup where you make a module syntax object and call strip-context 
on it, you can add a scope to all the user's identifiers after that so they're 
not considered “above” any of the language's identifiers. Make a function to do 
the marking:

(define marker (make-syntax-introducer #true))

Then walk the syntax object tree and replace each user identifier ‘id’ with 
(marker id 'add). Depending on your parsing setup, you could have a specific 
non-terminal for places a user identifier occurs and have a very generic syntax 
tree walker that looks for the non-terminal and adds the mark to the identifier.

If you make a second marker for all other identifiers you encounter, then none 
of the user's identifiers will be “under” the language's bindings, and error 
messages will be better if the user accidentally uses (without binding) one of 
the language's names.
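A minimal sketch of that marking step. Purely for illustration, it assumes the language's own primitives are exactly the identifiers whose names start with `#%` and that everything else came from the user; `mark-ids` and `user-id?` are hypothetical names, and a real implementation would key off the parser's user-identifier non-terminal instead of a naming rule.

```racket
#lang racket
(require syntax/strip-context)

;; Two fresh scopes: one for the user's identifiers, one for the language's.
(define user-marker (make-syntax-introducer #t))
(define lang-marker (make-syntax-introducer #t))

;; Assumption for this sketch only: language primitives are exactly the
;; identifiers whose names start with "#%"; everything else is user code.
(define (user-id? id)
  (not (regexp-match? #rx"^#%" (symbol->string (syntax-e id)))))

;; Generic walker: add the matching scope to every identifier in a
;; proper-list-shaped syntax tree, keeping source locations and
;; properties. Vectors, dotted pairs, etc. are left untouched.
(define (mark-ids stx)
  (cond
    [(identifier? stx)
     (if (user-id? stx) (user-marker stx 'add) (lang-marker stx 'add))]
    [(syntax->list stx)
     => (lambda (elems) (datum->syntax stx (map mark-ids elems) stx stx))]
    [else stx]))

;; Stand-in for a module body after strip-context.
(define stripped (strip-context #'(#%let x 5)))
(define marked (mark-ids stripped))
(define marked-x (cadr (syntax->list marked)))

(syntax->datum marked)                              ; '(#%let x 5)
(bound-identifier=? marked-x (datum->syntax #f 'x)) ; #f
```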



Re: [racket-users] Re: Readers and Namespaces

2017-08-17 Thread Sam Waxman
On Friday, August 18, 2017 at 1:11:16 AM UTC-4, Alexis King wrote:
> Generally, my recommendation is to essentially define your language in
> two passes: a direct translation to s-expressions, followed by a phase
> of macroexpansion. The first phase is what your reader interacts with.
> Give the primitives that your reader produces names that are unlikely
> to conflict with users’ code — specifically, prefix them with “#%” so
> that they are clearly special. It can also be useful to include
> characters in the identifier names that aren’t even valid identifier
> characters in your language, but this is not always possible if any
> character can be legally used in an identifier. Either way, this means
> your my-let macro should likely be named something like #%let or
> #%my-language-let, and your reader produce syntax objects without
> lexical context that use these #%-prefixed primitives.
> 
> It feels a little inelegant that Racket’s hygiene system does not extend
> to read-time, since it means the sort of hacks necessary in unhygienic
> languages are sometimes necessary in Racket when implementing a reader.
> Fortunately, a reader is much more self-contained and smaller in scope
> than a macro-enabled language, so it usually isn’t a big deal. Perhaps
> some language will eventually motivate a hygienic reader layer, but
> Racket doesn’t currently have one.

Wow, I'm very surprised that the reader doesn't have hygiene! Seems odd that 
you can't do a number of things if your language's identifiers are allowed to 
have all the characters present in racket's.

Thankfully my language doesn't allow # or % to be id characters, so I can do 
as you say and just prefix my macros with them. Thanks! 



Re: [racket-users] Re: Readers and Namespaces

2017-08-17 Thread Alexis King
> On Aug 17, 2017, at 21:52, Sam Waxman  wrote:
> 
> On a related note, I've read that read-syntax is supposed to return a
> syntax-object whose lexical context is stripped. Why is that? Doesn't
> that make it impossible for the language to know the difference
> between the let I used in an earlier file and the let that the user
> types as an identifier?

Yes, this is likely the cause of your problem (and it’s what I guessed
was probably going on before you sent the second post). This is, for
better or for worse, currently a no-no in Racket — syntax objects
produced by a #lang’s reader are supposed to only have source locations
and syntax properties on them, not lexical context.
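A small check (not from the thread) shows concretely why this bites in Sam's case: a `let` read from user source and a `let` the language emits are both context-free identifiers, so the expander cannot tell them apart. The source name `a-test.rkt` and the name `#%my-let` are just examples.

```racket
#lang racket
;; A `let` typed by the user, produced the way a reader produces
;; identifiers: source location, but no lexical context.
(define user-let (read-syntax 'a-test.rkt (open-input-string "let")))

;; A `let` the reader might emit for the language's own form, also
;; built without lexical context.
(define lang-let (datum->syntax #f 'let))

;; Same name, same (empty) set of scopes: indistinguishable.
(bound-identifier=? user-let lang-let)                     ; #t
;; A #%-prefixed name (hypothetical here) for the language's form differs.
(bound-identifier=? user-let (datum->syntax #f '#%my-let)) ; #f
;; Source information does survive.
(syntax-source user-let)                                   ; 'a-test.rkt
```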

There are various reasons this can be explained, some historical, others
technical. One explanation I’ve heard is that valid programs read by
`read-syntax` are supposed to also be valid programs if read by `read`
— this means lexical context shouldn’t matter. In practice, I think this
is probably not a very strong argument (I think `read`’s existence and
its relationship to the #lang protocol is mostly a historical artifact
in Racket), but there are also questions about the meaning of syntax
objects with lexical context that have not yet been seen by the
macroexpander. Those technical details are outside of my realm of
knowledge, but I can at least offer some solutions.

Generally, my recommendation is to essentially define your language in
two passes: a direct translation to s-expressions, followed by a phase
of macroexpansion. The first phase is what your reader interacts with.
Give the primitives that your reader produces names that are unlikely
to conflict with users’ code — specifically, prefix them with “#%” so
that they are clearly special. It can also be useful to include
characters in the identifier names that aren’t even valid identifier
characters in your language, but this is not always possible if any
character can be legally used in an identifier. Either way, this means
your my-let macro should likely be named something like #%let or
#%my-language-let, and your reader produce syntax objects without
lexical context that use these #%-prefixed primitives.
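The reader side of this recommendation could be sketched as follows for a hypothetical language `my-lang` whose statements look like `x = 5;`. The names `statement->datum`, `my-read-syntax`, `my-module`, `my-lang`, and `#%my-let` are all invented for illustration, and right-hand sides are limited to a single token to keep it short.

```racket
#lang racket
;; Hypothetical first pass: each `NAME = VALUE;` line becomes
;; (#%my-let NAME VALUE). `#%my-let` is a name my-lang users cannot
;; type, assuming `#` and `%` are not identifier characters in my-lang.
(define (statement->datum line)
  (match (regexp-match #px"^\\s*(\\S+?)\\s*=\\s*(\\S+?)\\s*;\\s*$" line)
    [(list _ name rhs)
     `(#%my-let ,(string->symbol name) ,(read (open-input-string rhs)))]
    [#f (error 'my-lang "cannot parse statement: ~a" line)]))

;; Shaped like a #lang reader's read-syntax: it returns a module form,
;; and datum->syntax with #f attaches no lexical context. (A real reader
;; would also thread source locations through.)
(define (my-read-syntax src in)
  (define body
    (for/list ([line (in-lines in)]
               #:unless (string=? (string-trim line) ""))
      (statement->datum line)))
  (datum->syntax #f `(module my-module my-lang ,@body)))

(syntax->datum (my-read-syntax 'demo (open-input-string "x = 5;\ny = x;\n")))
; '(module my-module my-lang (#%my-let x 5) (#%my-let y x))
```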

The good news is, once you have converted your #lang’s source syntax to
s-expressions, you can hand those expressions off to the macroexpander,
and then you can use whatever hygienic names you wish. You can define
your #%-prefixed primitives as hygienic macros that expand to
nicely-scoped syntax objects with lexical context. For example, in a
non-s-expression language I implemented in Racket, I converted lambda
syntax into uses of a #%lambda form, and since the language supported
pattern-matching, I defined #%lambda as a macro that expanded to plain
old match-lambda** from racket/match. At that point, the name will not
conflict, because macroexpansion is hygienic.
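A sketch of the `#%lambda` idea, assuming a made-up clause syntax of one match pattern per argument followed by a body; the actual language's form may well differ. The expansion is just `match-lambda**`, and `racket/match` ships with `#lang racket`.

```racket
#lang racket
;; Hypothetical surface syntax: each clause lists one match pattern per
;; argument, then a body. The expansion is plain match-lambda**.
(define-syntax-rule (#%lambda [(pat ...) body ...] ...)
  (match-lambda** [(pat ...) body ...] ...))

;; User code may pick any names, including `match`, without clashing
;; with anything the macro introduces, because expansion is hygienic.
(define describe
  (#%lambda [(0 match) (list 'zero match)]
            [(n match) (list n match)]))

(describe 0 'a) ; '(zero a)
(describe 3 'b) ; '(3 b)
```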

It feels a little inelegant that Racket’s hygiene system does not extend
to read-time, since it means the sort of hacks necessary in unhygienic
languages are sometimes necessary in Racket when implementing a reader.
Fortunately, a reader is much more self-contained and smaller in scope
than a macro-enabled language, so it usually isn’t a big deal. Perhaps
some language will eventually motivate a hygienic reader layer, but
Racket doesn’t currently have one.
