Re: [racket-users] How would you implement autoquoted atoms?

Matthew Flatt Tue, 23 Apr 2019 07:58:16 -0700

This response will be rambling, too. :)

Especially with your follow-up message, I think you're getting to a
problem that we've wrestled with for a while. Sometimes we've called it
the "graphical syntax" problem, because it's related to having non-text
syntax, such as images in DrRacket (which are currently implemented in
an ad hoc way). Another example could be adding quaternion literals,
analogous to complex-number literals. In the cases that we've
considered, we want the language to be extensible with a new kind of
literal, but there's not necessarily any specific import of the language
extension in the program. That means there's a set of binding,
evaluation, and composition problems to solve.

I've discussed the problem the most with William Hatch, and here's as
far as we got with some ideas.

There could be a new primitive datatype --- at the level of symbols,
pairs, vectors, etc. --- to let the reader and expander communicate.
Just to have some concrete syntax for the default reader and printer,
let's say that the new kind of value can be written with `#q`, perhaps
of the form

  #q(<module-path> <identifier> <S-expression>)

The intent of the <module-path> and <identifier> components is to give
the value a kind of binding. That binding is analogous to syntax
objects, but without actually using syntax objects, which is arguably
the wrong concept to pull into the reader level. The remaining
<S-expression> is payload to be interpreted by the <module-path> and
<identifier> combination, such as image data or real numbers for the
components of a quaternion.

Of course, a reader might construct these values as a result of parsing
some other text, but the idea is that printing out the result from that
reader with the default printer would use this `#q` notation, and then
that printed form could be read back in. That is, the values can be
consistently marshaled and unmarshaled, just like pairs and vectors and
numbers.
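
Racket's prefab structures already round-trip through the default printer and reader (as `#s(...)`), so a rough way to experiment with the marshaling part is a sketch like the following. The struct name `q`, the module path `quaternion/literal`, and the identifier `make-quaternion` are all made up for illustration. A prefab struct has no binding and no expander dispatch, which is exactly what the proposed datatype would add.

```racket
#lang racket/base
;; Hypothetical stand-in for the proposed #q datatype: a prefab struct
;; whose fields mirror <module-path>, <identifier>, and <S-expression>.
(struct q (module-path id payload) #:prefab)

;; A value that some custom reader might produce for a quaternion literal.
;; 'quaternion/literal and 'make-quaternion are invented names.
(define v (q 'quaternion/literal 'make-quaternion '(1 2 3 4)))

;; Print it with the default printer...
(define text
  (let ([out (open-output-string)])
    (write v out)
    (get-output-string out)))
;; text is "#s(q quaternion/literal make-quaternion (1 2 3 4))"

;; ...and read the printed form back in with the default reader.
(define v2 (read (open-input-string text)))
```

The value read back is `equal?` to the original but not `eq?`, since prefab values are compared structurally.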

The benefit of a new datatype is that it can have its own dispatch rule
in the expander. Probably a `#q` in an expression position would get
wrapped by an implicit `#%q-expression`, or something like that, which
would give a language control over whether it wants to allow arbitrary
literal values. But the default `#%q-expression` would consult the
value's "binding" via the <module-path> and <identifier> to expand the
value, which might inline an image or quaternion construction, or
something like that. In effect, the reader form carries its own
`require` at all times.
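
A run-time approximation of consulting the value's "binding" can be sketched with `dynamic-require`, assuming a hypothetical prefab struct `q` in place of the new datatype. A real `#%q-expression` would presumably do this resolution during expansion (and might inline the result), so this only illustrates the lookup; `racket/base`'s `vector` stands in for an image or quaternion constructor.

```racket
#lang racket/base
;; Hypothetical stand-in for the proposed datatype.
(struct q (module-path id payload) #:prefab)

;; Resolve the <module-path>/<identifier> "binding" and apply it to the
;; payload. This is a run-time analogue of what a default #%q-expression
;; might do at expansion time; realize-q is an invented name.
(define (realize-q v)
  (define make (dynamic-require (q-module-path v) (q-id v)))
  (apply make (q-payload v)))

(realize-q (q 'racket/base 'vector '(1 2 3 4))) ; => '#(1 2 3 4)
```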

Maybe interning corresponds to an expansion that lifts out a
calculation (in the sense of `syntax-local-lift-expression`), or maybe
that's not good enough; I'm not sure.
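
For the interning idea, `syntax-local-lift-expression` moves an expression to the enclosing module's top level and returns an identifier bound to its value. A sketch, with a made-up macro name `lifted-literal`:

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Hypothetical macro: lift the construction of a literal's value to the
;; module's top level, so it is evaluated once (when the module is
;; instantiated) rather than each time the surrounding code runs.
(define-syntax (lifted-literal stx)
  (syntax-case stx ()
    [(_ e) (syntax-local-lift-expression #'e)]))

(define (get-point)
  (lifted-literal (vector 1 2)))

(eq? (get-point) (get-point)) ; => #t
```

Each use site gets its own lifted definition, so two separate occurrences of `(lifted-literal (vector 1 2))` still produce different vectors; making equal literals `eq?` across sites or modules would need something more, such as a shared table.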

We imagined that the primitive `quote` form might do something similar
to `#%q-expression` in the case that an image or quaternion is part of
a quoted S-expression. But, then, does there need to be an even
stronger `quote` that doesn't try to expand the `#q` content? I don't
know.

Meanwhile, the <module-path> and <identifier> combination could also
identify a value-specific printer, where images might recognize when
the output context can support rendering the actual image, while
quaternions might print using "+" and "i" and "j". Or maybe that
problem should be left to `prop:custom-write`.
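
For comparison, the `prop:custom-write` route looks roughly like this for a hypothetical `quaternion` struct (not an existing Racket type). It assumes real coefficients and ignores the write/display/print mode.

```racket
#lang racket/base
;; Hypothetical quaternion struct that prints as a+bi+cj+dk.
(struct quaternion (a b c d)
  #:property prop:custom-write
  (lambda (v port mode)
    ;; Render one imaginary component with an explicit sign.
    (define (term n unit)
      (string-append (if (negative? n) "-" "+")
                     (number->string (abs n))
                     unit))
    (write-string
     (string-append (number->string (quaternion-a v))
                    (term (quaternion-b v) "i")
                    (term (quaternion-c v) "j")
                    (term (quaternion-d v) "k"))
     port)))

(format "~a" (quaternion 1 -2 3 -4)) ; => "1-2i+3j-4k"
```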

At the level of writing down programs, the examples of images and
quaternions seem different. For images, DrRacket and other editors have
to include the concept of images somehow, and they insert values that
turn into `#q` forms when the program is viewed as a character
sequence. But quaternions are written with characters, so maybe that
syntax is more like `@` reading in that a language constructor on the
`#lang` line would add quaternion syntax to the readtable (which would
work for S-expression languages).
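
A readtable extension of that kind might be sketched as follows. The `#Q(a b c d)` notation and the `(quaternion a b c d)` form it produces are hypothetical (chosen so as not to collide with the `#q` form above). A real language would install the readtable in its own reader; here it is installed with `parameterize` to keep the demo self-contained, and only plain `read` is exercised.

```racket
#lang racket/base
;; A dispatch-macro procedure receives the character after `#` and the
;; input port (plus source-location arguments when called by read-syntax).
(define (read-quaternion-literal ch in . srcloc-args)
  (define parts (read/recursive in))
  (unless (and (list? parts) (= (length parts) 4) (andmap real? parts))
    (error 'read-quaternion-literal
           "expected four real numbers, got ~e" parts))
  ;; Produce a hypothetical (quaternion a b c d) form.
  (cons 'quaternion parts))

(define quaternion-readtable
  (make-readtable (current-readtable)
                  #\Q 'dispatch-macro read-quaternion-literal))

(define (read-with-quaternions str)
  (parameterize ([current-readtable quaternion-readtable])
    (read (open-input-string str))))

(read-with-quaternions "(list #Q(1 2 3 4))") ; => '(list (quaternion 1 2 3 4))
```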

Overall, this reply is intended as a kind of endorsement and
elaboration of your thoughts: Yes, this is an interesting problem, and
it seems to need something new in Racket. And, yes, adding some new
datatype (with some default syntax) seems like the right direction,
mainly because it could trigger a new kind of dispatch in the expander.
Probably that new datatype should have something built-in that amounts
to a binding for its compile-time and run-time realization.

I would be really happy to see someone experiment with these ideas, and
I'm pretty sure they could be implemented mostly by changing the
expander and reader in "racket/src/expander" --- although some
cooperation from the bytecode writer and reader is probably also needed,
and I'd be happy to help more there.

At Tue, 23 Apr 2019 06:08:05 -0700 (PDT), zeRusski wrote:
> I must apologize: what follows will be more of a ramble than an 
> exercise in clear thinking. That is because I am a bit stuck and thought 
> I'd seek help.
> 
> I have been thinking some about languages and how it isn't always easy to 
> clearly separate the language being implemented from the language used to 
> implement it. The picture gets particularly blurry in Lisps. This time 
> around the question that gave me pause was one of implementing symbols. 
> Better still, Racket keywords, since like many lispy terms "symbol" has so 
> many confusing meanings that it's nigh impossible to tell what people mean 
> exactly. I'm specifically talking about autoquoted datums. Two interned symbols 
> that are equal? are eq?, two keywords that are equal? are eq?, 42 is eq? to 
> 42, etc. Symbols are a bad example because people often think about 'symbol or 
> identifier with semantics being: perform variable lookup.
> 
> Someone on this list said everything in Racket is a struct, so let's start 
> there.
>  
>  (struct kw (symbol))
> 
> We can also come up with some syntactic representation and extend our 
> language with read and read-syntax that translate this new syntax into 
> kw-struct as needed. But then we also demand that two syntactically equal 
> kws end up being the same value in the language, so no matter where our 
> reader encounters #kw(foo) it must produce the same value. This must be 
> true across module boundaries, too. Just like Racket keywords. So, what are 
> we to do? There's the time when the reader runs, followed by expansion. Does 
> this mean they need to communicate somehow? Also, the reader "runs", that 
> is, it is written in Racket (or some derivative) after all, but the reader's 
> environment isn't one where expansion happens, and that of the final code 
> being evaled is different still. Right? 
> 
> To ensure eq? of two kws with the same printed representation we'll 
> probably want to keep some global table around that keeps track of 
> "interned" kws. So, for any two #kw(foo), our reader would have to produce 
> something like (lookup-intern-kw #:symbol 'foo), which at run-time would 
> consult the table of kws and return the (kw 'foo) already there, or create 
> a fresh entry and return that new struct. Two observations: (a) it follows 
> that the global table is one that must exist at runtime, not while the 
> reader runs, and (b) we end up relying on the host language for symbol 
> equality after all: 'foo is eq? 'foo, and that allows us to key the table by 
> symbols such as 'foo.
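
A minimal sketch of that run-time table, following the names in the message (`kw`, `lookup-intern-kw`); none of this is an existing Racket API. In the usual case, two modules in the same program share the table because they share one instance of the module that defines it.

```racket
#lang racket/base
;; The kw struct from the message.
(struct kw (symbol))

;; Run-time intern table keyed by symbol. Symbols are themselves interned,
;; so an eq?-based hash suffices. Entries are never removed.
(define kw-table (make-hasheq))

;; What the reader's output for #kw(foo) might call at run time.
(define (lookup-intern-kw #:symbol sym)
  (hash-ref! kw-table sym (lambda () (kw sym))))
```
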
> 
> Is this how you would do it? Is there a better way that involves the reader 
> more and relies on the runtime less?
> 
> Bonus question. What if we allow families of kws effectively partitioning 
> kws into namespaces: #kw(family name). This appears to be a small variation of 
> the above, where you'd simply assemble a compound symbol from family and 
> name to use for the table lookup. That is until you allow parameterizing by 
> "current-family", so kw declaration can omit the family part and it gets 
> inserted as needed - not unreasonable in a language with modules or 
> explicit namespaces. We could allow something like this:
> 
> #lang racket/kws
> #:current-family addams
> 
> #kw(morticia)
> 
> now any kw within a module without a family must translate into one of the addams 
> family. But also any #kw(addams morticia) in a different module must be eq? 
> to the one above, and in fact to any one like that anywhere. One exception 
> is probably if we send them across Racket spaces, which IIUC amount to 
> running separate VMs. In the above example the reader would have to be 
> aware of the #:current-family declaration that may appear at the top of the 
> module. We'd probably translate that to some (current-family 'addams) 
> parameter setup, or wrap the #%module-begin body in parameterize; then every kw 
> without an explicit family would have to check the (current-family) parameter. 
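
With families and a `current-family` parameter, a similar run-time table might look like this (hypothetical names throughout). Here the key is a `(family . name)` pair in an `equal?`-based table rather than a compound symbol.

```racket
#lang racket/base
;; kw extended with a family; current-family supplies the default.
(struct kw (family name))

(define current-family (make-parameter #f))

;; equal?-based table keyed on (family . name) pairs.
(define kw-table (make-hash))

;; The #:family default is read from the parameter at call time, so a bare
;; #kw(morticia) could expand to (lookup-intern-kw 'morticia).
(define (lookup-intern-kw name #:family [family (current-family)])
  (hash-ref! kw-table (cons family name) (lambda () (kw family name))))
```

Because the default is evaluated at each call, the family is decided at run time here, not at read time.
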
> 
> Is there a way to push this more to the read-time? If there is, what 
> happens if we load the module and enter the REPL? Could we ensure its reader is 
> properly parameterized so that it would use the appropriate current-family?
> 
> How screwed up is my thinking here? Is there a way to leverage the reader 
> more and rely on the runtime less? I imagine that would make the kws discussed 
> here lighter weight? We talk about phases some in Racket, but the reader runs 
> somewhere, or rather sometime, too. I'd like to have a clearer picture in my 
> head, I guess.
> 
> Thanks
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
