Re: [Readable-discuss] Proposed update and expansion of SRFI-49 (I-expressions) - indented s-expressions

David A. Wheeler Sun, 01 Jul 2012 09:22:55 -0700

Alan Manuel Gloria <[email protected]> said:
> I find looking at Scheme code gives me headaches. This now makes
> sweet-expressions a high priority for me.


Excellent!  Some things I like, some things I don't or have concerns about.  
Let's talk this through!!


> So I propose the following parser for sweet expressions....

Great, nice to see a specific (counter) proposal.


> ; ignore completely empty lines
> swt-expr -> empty-line* swt-expr-core
>  $2

I think this comment is misleading. I think what you mean is
"ignore completely empty lines *before* a new sweet-expression".
I found that for interactive use, it's important to interpret a blank line
as "this is the end of the expression" once you've started, per my discussion:
 http://www.dwheeler.com/readable/sweet-expressions.html
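
To make the blank-line behavior concrete, here is a minimal Python sketch, not the
actual reader. The name split_expressions is hypothetical, and the sketch assumes
the input is already a list of lines and that comment-only lines need no special
handling. Blank lines before an expression are skipped; a blank line after an
expression has started ends it.

```python
def split_expressions(lines):
    """Group input lines into chunks, one chunk per top-level expression.

    Assumption (illustrative only): a line containing nothing but
    whitespace counts as "blank".
    """
    chunks = []
    current = []
    for line in lines:
        if line.strip() == "":
            if current:
                # Blank line after content: the expression is finished.
                chunks.append(current)
                current = []
            # Blank line before any content: ignored.
        else:
            current.append(line)
    if current:
        # End of input also closes an expression in progress.
        chunks.append(current)
    return chunks
```

For interactive use this means the reader can evaluate an expression as soon as
the user enters an empty line, without waiting for the start of the next one.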


> ; some simple utility parsers
> empty-line -> htspace* eol
> htspace -> SPACE | TAB | FORMFEED | VTAB | comment
> comment -> COMMENT-MARKER (not eol)*
> eol -> CR | LF
> ; tabs are disallowed currently
> ; we could define a preprocessor
> ; that expands tabs.

I think tabs really need to be allowed, and that we must not presume any
particular expansion rule.
A lot of people use tabs, and there is no real standard for tab expansion.
(The only "standard", 8-space positions, is widely ignored; I hate them myself.)

My implementation accepts tabs, spaces, or a combo, as long as you're
consistent.  So if your next indent uses 2 spaces, then all the following items
with that indent (or more) need to use 2 spaces.  I think that is
far more likely to gain acceptance, which is the key thing here.

I'm not sure what you mean with the level rules.
I think we must NOT assume that every next level indent is exactly
one space; while that's easy to implement, the result is hard to read.
I typically use 2 or 4 spaces for each level, for example.
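
One way to read "consistent" without assuming any tab width or any fixed number
of characters per level is to compare the raw indentation strings by prefix. The
Python sketch below illustrates that reading only; it is not a copy of the
implementation, and the name compare_indent is hypothetical.

```python
def compare_indent(previous, current):
    """Compare two indentation strings made of spaces and/or tabs.

    Returns "same", "deeper", or "shallower". Raises ValueError when
    neither string is a prefix of the other, since the two cannot be
    ordered without assuming a tab expansion rule.
    """
    if current == previous:
        return "same"
    if current.startswith(previous):
        return "deeper"
    if previous.startswith(current):
        return "shallower"
    raise ValueError("inconsistent indentation")
```

Under this reading, two spaces followed by four spaces is "deeper", a tab
followed by a tab plus two spaces is "deeper", but two spaces followed by a
single tab is rejected.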

> ; FORMFEED and VTAB are
> ; not always represented as
> ; single characters either...

But they are single chars in the datastream, and that's all that matters.

> ; implements "whitespace at top-level
> ; DISABLES I-expressions"
> swt-expr-core -> htspace+ mdn-expr
>   $2
> ; I-expressions must start at
> ; indent 0.  Sorry dwheeler.

I'm not sure why you're sorry :-). I've gone back & forth on how to
interpret this situation; I can be easily talked into this change.

In section 6.1 "Indentation issues" of:
  http://www.dwheeler.com/readable/version02.html
I discussed this; I even said:
"We could require that the top-level line begin at the left edge. This is not 
unknown; Python, a popular language using indentation, requires that the top 
level begin at the left edge (and raises an error if an attempt is made 
otherwise). This completely eliminates the need for hidden state - top level 
statements only start at the left edge, so there's nothing to remember."

Quietly disabling indentation processing, if the first line is indented,
would make processing old code (and disabling indentation when undesired)
especially easy, so there are advantages to this idea.

The main thing that worries me about silently disabling indentation processing
if the text doesn't start at the left edge is that it'd be easy to have code that
quietly fails to be interpreted properly just because it doesn't
start on the left edge.  That's easy to have happen.
If the indent is just one space, it's not obvious it's happening, especially
if it's quoted from another source.  The main alternative I considered
was making beginning-from-indent straight illegal (like Python).

One alternative would be a single warning the first time it happens when
reading a stream (a file or interactive session). E.G.:
"WARNING: Text begins indented, indentation processing disabled".
Then you can still use it, but at least you'd know whether you intended to.
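
A rough Python sketch of the warn-once idea follows. The class and method names
are hypothetical, and it assumes one object is created per stream (file or
interactive session) and consulted with the first line of each top-level
expression.

```python
import sys


class IndentWarningReader:
    """Tracks whether the 'begins indented' warning was already issued."""

    MESSAGE = "WARNING: Text begins indented, indentation processing disabled"

    def __init__(self, warn=None):
        self.warned = False
        # By default, report on stderr; a caller may pass its own reporter.
        if warn is None:
            warn = lambda msg: print(msg, file=sys.stderr)
        self.warn = warn

    def indentation_enabled(self, first_line):
        """Return True if indentation processing applies to this expression."""
        if first_line[:1] in (" ", "\t"):
            if not self.warned:
                self.warn(self.MESSAGE)
                self.warned = True
            return False
        return True
```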

So, I think this is a very reasonable idea. It's not what I had
originally proposed, but I had noted this as an area I was especially
uncertain of.  Comments, anyone?


> Proposal:
> 1.  Remove the SPLICE-at-the-end rule!...

If it's impractical to implement, then it must die.
It's not entirely clear to me it's impractical to implement, though.

But even if we can implement it, do we *need* the splice-at-the-end rule?
If it's not worth the trouble, let's junk it.

Part of my concern was with stuff like:
  (myfunction :option1 (f a) :option2 (g b) ...).
Where there are parameters at the same list level, but you want singletons
or pairings to make it clear that there is a structure beyond what's
in the list.  This happens a *LOT* in some Lisp-based languages
(including Common Lisp and Arc).

But it looks like the EOL case really isn't needed; you can just do this:
  myfunction
    :option1 \ f(a)
    :option2 \ g(b)
    ...

So maybe we just don't need it.  I proposed the rule for discussion because
I was sure it needed discussion.  Thanks!


> ... If GROUP is instead "." as proposed by Arne, we might
> actually be able to code closer to the parser declarative spec.

"Group" isn't *that* hard to implement, just read in the atom, and
compare after you're done... then change the list based on what you have.
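
As a rough illustration of that "read, then compare" approach, here is a Python
sketch. It assumes expressions are represented as nested Python lists with atoms
as strings, and that a leading GROUP atom is simply dropped so that the remaining
items form the list; apply_group is a hypothetical name.

```python
GROUP = "group"  # the grouping atom; could be swapped for punctuation


def apply_group(expr):
    """Post-process a freshly read list: drop a leading GROUP atom.

    Anything that is not a list starting with GROUP is returned unchanged.
    """
    if isinstance(expr, list) and expr and expr[0] == GROUP:
        return expr[1:]
    return expr
```

So a line "group" with the children (x 2) and (y 3) is first read as
(group (x 2) (y 3)) and then rewritten to ((x 2) (y 3)).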

But yes, an alphabetic atom for a special construct is really odd.
I only accepted this because there was an SRFI
that did this, and I was trying to avoid reinventing the wheel.
Using punctuation to control grouping *does* make sense.
One disadvantage: then we can't appeal to the SRFI, and any current users
would have to change their code (though it'd be trivial to do).

What's the sense of everyone else?  Should we switch the grouping
construct to punctuation, such as "." or "\"?  If so, which one?
As I mentioned earlier, I'm uncomfortable with "." as its replacement,
because that is too easy to not see, and grouping is *important*.

One minor downside of giving initial "\" with following whitespace a
meaning is that it makes it harder in some Lisp variants
to define or address atoms that begin
with whitespace.  I've never seen that actually *used* for any purpose unless
that Lisp didn't support strings, so I think that downside is worth it.
In Common Lisp, you could just switch to "| atom_beginning_with_space|"
anyway.

Another downside is that it means that the parser has to deal with
parsing anything with "\" while doing indentation processing.
But parsers don't have to be written often; we want to make it easy to
develop software and data, and if a small one-time cost has many benefits,
that's okay.


> ...
> dwheeler mentioned the use of "\" for the GROUP character. It happens to
> be the same as the SPLICE character. My initial instinct is that this is a
> non-breaking change, i.e. using the same character for both will not break
> things, as long as we remove the SPLICE-at-the-eol rule (i.e. only allow
> SPLICE at the start or in the middle of things). This means that the
> "GROUP" meaning of the character is not ambiguous with SPLICE-at-the-eol -
> remember, a "\" on a line by itself is either GROUP eol or SPLICE eol.
> ...
> For now I think we should investigate the following alternatives:
> 1. \ = GROUP = SPLICE, remove SPLICE-at-the-eol rule.
> 2. . = GROUP, \ = SPLICE
> 2.1. remove SPLICE-at-the-eol rule.
> 2.2. don't remove SPLICE-at-the-eol rule.

All options have their pluses.  I'm leaning more towards #1 than #2.

I find it useful to walk through some examples and see what they look like.
Let's look at #1 for a moment; at the beginning of the line, "\" would mean
"wrap with an extra (...)" (like a zero-length function name).
For example: "(let ((x 2) (y 3)) (* x y))" can currently be represented as:
let
  group
    x 2
    y 3
  {x * y}

Changing "group" to "\" would mean it would look like this:
let
  \
    x 2
    y 3
  {x * y}

or alternatively:
let
  \
    x 2 \ y 3
  {x * y}

Or alternatively:
let
  \ x 2
    y 3
  {x * y}

Since x(2) == (x 2), the following (really clean format) would also work:
let
  \ x(2) y(3)
  {x * y}

I do NOT think we should accept this as a synonym:
let
  \ x 2 \ y 3
  {x * y}

Because I think *that* should mean the same as:
let
  \ x 2
  y 3
  {x * y}
which would mean "(let ((x 2)) (y 3) (* x y))"
and NOT:         "(let ((x 2) (y 3)) (* x y))".
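
The readings above can be checked with a small Python sketch of option #1. It is
an illustration under stated assumptions, not a proposed implementation:
indentation uses spaces only, every token is a whitespace-separated atom (so no
parentheses, no {x * y}, no x(2); the body is written as the line "* x y"
instead), blank lines are skipped, and a repeated leading "\" is rejected rather
than interpreted. All function names are hypothetical.

```python
GROUP = "\\"  # the single-backslash token, used for both GROUP and SPLICE


def logical_lines(text):
    """Turn raw text into (indent, tokens) pairs, expanding "\\" markers."""
    result = []
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue  # blank line: skipped in this sketch
        indent = len(raw) - len(raw.lstrip(" "))
        if tokens[:2] == [GROUP, GROUP]:
            raise ValueError("repeated leading \\ is not handled here")
        # A "\" after the first token is a SPLICE: it starts a new logical
        # line at the same indentation as the physical line.
        segments = [[]]
        for i, tok in enumerate(tokens):
            if tok == GROUP and i > 0:
                segments.append([])
            else:
                segments[-1].append(tok)
        if any(not seg for seg in segments):
            raise ValueError("empty segment next to a SPLICE marker")
        for n, seg in enumerate(segments):
            if n == 0 and seg[0] == GROUP and len(seg) > 1:
                # Leading "\" with content: the content becomes the first
                # child line, indented to the column where it starts.
                rest = raw[indent + 1:]
                child_indent = indent + 1 + len(rest) - len(rest.lstrip(" "))
                result.append((indent, [GROUP]))
                result.append((child_indent, seg[1:]))
            else:
                result.append((indent, seg))
    return result


def parse_block(lines, pos, indent):
    """Parse consecutive logical lines at exactly `indent`."""
    exprs = []
    while pos < len(lines) and lines[pos][0] == indent:
        tokens = lines[pos][1]
        pos += 1
        children = []
        if pos < len(lines) and lines[pos][0] > indent:
            children, pos = parse_block(lines, pos, lines[pos][0])
        head = [] if tokens == [GROUP] else tokens
        if len(head) == 1 and not children:
            exprs.append(head[0])          # a lone atom stays an atom
        else:
            exprs.append(head + children)  # otherwise build a list
    return exprs, pos


def read_all(text):
    """Read every top-level expression; top level must start at column 0."""
    lines = logical_lines(text)
    exprs, pos = parse_block(lines, 0, 0)
    if pos != len(lines):
        raise ValueError("inconsistent indentation")
    return exprs


def to_sexpr(expr):
    """Render nested lists of atoms as a traditional s-expression string."""
    if isinstance(expr, list):
        return "(" + " ".join(to_sexpr(e) for e in expr) + ")"
    return expr
```

With these assumptions, the form with "\ x 2" followed by an aligned "y 3" reads
as (let ((x 2) (y 3)) (* x y)), while "\ x 2 \ y 3" on one line reads as
(let ((x 2)) (y 3) (* x y)).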

What should repeated "\" mean at the beginning? I.E., what should this mean?:
  \ \ a \ b
I think that after handling the first "\", we should recurse the rule, so
the first two "\" would both be leading "\"s.  Thus, this would be the same as:
  \ \ a
      b
which equals
  \ ((a b))
which would be "(((a b)))".


Any other thoughts on those alternatives?  Any other alternatives?

A key issue in the next implementation is that I think it needs to be
"obviously correct".  The current sweet-expression implementation works
well enough to be useful, but I think people will want to be confident
that it is rock-solid.

--- David A. Wheeler


_______________________________________________
Readable-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/readable-discuss
