On Mon, Jul 2, 2012 at 12:22 AM, David A. Wheeler <[email protected]> wrote:
> Alan Manuel Gloria <[email protected]> said:
>> I find looking at Scheme code gives me headaches.  This now makes
>> sweet-expressions a high priority for me.
>
> Excellent!  Some things I like, some things I don't or have concerns about.
> Let's talk this through!!
>
>
>> So I propose the following parser for sweet expressions....
>
> Great, nice to see a specific (counter) proposal.
>
>
>> ; ignore completely empty lines
>> swt-expr -> empty-line* swt-expr-core
>>   $2
>
> I think this comment is misleading.  I think what you mean is
> "ignore completely empty lines *before* a new sweet-expression".
> I found that for interactive use, it's important to interpret a blank line
> as "this is the end of the expression" once you've started, per my discussion:
>  http://www.dwheeler.com/readable/sweet-expressions.html
>
>
>> ; some simple utility parsers
>> empty-line -> htspace* eol
>> htspace -> SPACE | TAB | FORMFEED | VTAB | comment
>> comment -> COMMENT-MARKER (not eol)*
>> eol -> CR | LF
>> ; tabs are disallowed currently
>> ; we could define a preprocessor
>> ; that expands tabs.
>
> I think tabs really need to be allowed, and that we must not presume any
> particular expansion rule.
> A lot of people use tabs, and there is no real standard for tab expansion.
> (The only "standard", 8-space positions, is widely ignored; I hate them
> myself.)
>
> My implementation accepts tabs, spaces, or a combo, as long as you're
> consistent.  So if your next indent uses 2 spaces, then all the following
> items with that indent (or more) need to use 2 spaces.  I think that is
> far more likely to gain acceptance, which is the key thing here.
>
> I'm not sure what you mean with the level rules.
> I think we must NOT assume that every next level indent is exactly
> one space; while that's easy to implement, it's a hard-to-read result.
> I typically use 2 or 4 spaces for each level, for example.

It's a declarative expression of the desired indentation rules.
Although the (body lvl) looks as if lvl is an input parameter, it
isn't.  In a "real" implementation it becomes an output parameter,
because we need to ensure that for an (i-expr lvl) -> head
eol-empty-lines (body inlvl), lvl < inlvl.  So it's not one space:
it's any number of spaces, as long as lvl < inlvl.

Like I said, it's a declarative form that will require quite a bit of
massaging to convert to an implementation.  It's more like math than
an actual program ^^.

To support the extension "any combination of spaces and tabs", we
could instead use a list of space and tab characters, in reverse, and
modify the (spaces n) parser to use that instead (I suppose we should
rename it to starting-indent instead).  Also, instead of the relation
lvl < inlvl we should define some relation that basically goes "lvl
should be a tail sublist of inlvl, that is not the same as inlvl".

Also, let me reiterate, this is a declarative form, so (spaces n)
doesn't necessarily mean that n is an input.  It would probably have
to be implemented with the n as an output of the sub-parser spaces.

>
>> ; FORMFEED and VTAB are
>> ; not always represented as
>> ; single characters either...
>
> But they are single chars in the datastream, and that's all that matters.
>

But when you're aligning stuff, the number of characters in which your
editor shows them matters.  I guess again we could just use the
"perfect sublist" relation and allow FORMFEED and VTAB in the spaces
production (and rename it to starting-indent).

>> ; implements "whitespace at top-level
>> ;  DISABLES I-expressions"
>> swt-expr-core -> htspace+ mdn-expr
>>   $2
>> ; I-expressions must start at
>> ; indent 0.  Sorry dwheeler.
>
> I'm not sure why you're sorry :-).  I've gone back & forth on how to
> interpret this situation; I can be easily talked into this change.
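
The "tail sublist" relation on indentation described above can be sketched in a few lines. This is only an illustration, not the specced parser: the names `INDENT_CHARS`, `reversed_indent` and `is_proper_tail` are made up for this sketch, and it assumes that only SPACE and TAB count as indentation characters (FORMFEED and VTAB could be added to `INDENT_CHARS` if they are allowed in starting-indent).

```python
INDENT_CHARS = " \t"  # assumption: only SPACE and TAB count as indentation


def reversed_indent(line):
    """Return the leading indentation of `line` as a list of characters, innermost first."""
    width = len(line) - len(line.lstrip(INDENT_CHARS))
    return list(reversed(line[:width]))


def is_proper_tail(lvl, inlvl):
    """True when `lvl` is a tail sublist of `inlvl` and strictly shorter than it."""
    start = len(inlvl) - len(lvl)
    return start > 0 and inlvl[start:] == lvl


parent = reversed_indent("  define x")
child = reversed_indent("  \tbody")
print(is_proper_tail(parent, child))  # True: two spaces, then one more tab
```

With this relation no tab width is ever assumed: a child line is deeper only if it repeats the parent's exact indentation characters and then adds at least one more, so a tab-indented parent followed by a space-indented child is simply not a match.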
>
> In section 6.1 "Indentation issues" of:
>  http://www.dwheeler.com/readable/version02.html
> I discussed this; I even said:
> "We could require that the top-level line begin at the left edge. This is not
> unknown; Python, a popular language using indentation, requires that the top
> level begin at the left edge (and raises an error if an attempt is made
> otherwise). This completely eliminates the need for hidden state - top level
> statements only start at the left edge, so there's nothing to remember."
>
> Quietly disabling indentation processing, if the first line is indented,
> would make processing old code (and disabling indentation when undesired)
> especially easy, so there are advantages to this idea.
>
> The main thing that worries me about silently disabling indentation processing
> if they don't start on the edge is that it'd be easy to have code that
> quietly fails to be interpreted properly just because it doesn't
> start on the left edge.  That's easy to have happen.
> If the indent is just one space, it's not obvious it's happening, especially
> if it's quoted from another source.  The main alternative I considered
> was making beginning-from-indent straight illegal (like Python).

This is probably easiest, then.

>
> One alternative would be a single warning the first time it happens when
> reading a stream (a file or interactive session).  E.G.:
> "WARNING: Text begins indented, indentation processing disabled".
> Then you can use it... but at least you know you intended to.
>
> So, I think this is a very reasonable idea.  It's not what I had
> originally proposed, but I had noted this as an area I was especially
> uncertain of.  Comments, anyone?
>

Well, currently the specced parser will actively skip empty lines to
look for the continuation of the body.  So no amount of ENTER ENTER
will actually get the expression read in on the REPL.  LOL.

There are a few reasons why I skip over empty lines instead of
completing and returning the expression:

1.
By doing so, I can treat either CR or LF as eol.  As it happens, a DOS
encoding means that the eol is actually encoded as CR LF.  But by
skipping over empty lines, the "extra" LF is simply treated as an
empty line and skipped over.

2. Because in a REALLY long program, we want to separate code with
empty lines sometimes.  Even "inner", indented code.  In particular,
consider that the "module" syntax in R6RS requires all module contents
to be sub-expressions of the upper module syntax form: so, every
internal function must be indented within that form.  If empty lines
ended an expression, then the writer of the module couldn't separate
functions of the module with empty lines, because the expression being
read in is the module expression.

One alternative is to simply make the following changes:

1. Rename eol-empty-lines to eol-comment-lines.
2. Modify eol-empty-lines to:
   eol-comment-lines -> htspace* eol comment-line*
3. Modify eol to:
   eol -> CR LF
   eol -> CR
   eol -> LF
4. Add:
   comment-line -> htspace* COMMENT-MARKER (not eol)* eol

>
>> Proposal:
>> 1.  Remove the SPLICE-at-the-end rule!...
>
> If it's impractical to implement, then it must die.
> It's not entirely clear to me it's impractical to implement, though.
>
> But even if we can implement it, do we *need* the splice-at-the-end rule?
> If it's not worth the trouble, let's junk it.
>
> Part of my concern was with stuff like:
>  (myfunction :option1 (f a) :option2 (g b) ...).
> Where there are parameters at the same list level, but you want singletons
> or pairings to make it clear that there is a structure beyond what's
> in the list.  This happens a *LOT* in some Lisp-based languages
> (including Common Lisp and Arc).
>
> But it looks like the EOL case really isn't needed; you can just do this:
> myfunction
>   :option1 \ f(a)
>   :option2 \ g(b)
>   ...
>
> So maybe we just don't need it.  I proposed the rule for discussion because
> I was sure it needed discussion.  Thanks!
>
>
>> ... If GROUP is instead "."
>> as proposed by Arne, we might
>> actually be able to code closer to the parser declarative spec.
>
> "Group" isn't *that* hard to implement, just read in the atom, and
> compare after you're done... then change the list based on what you have.

Please also consider:

> A key issue in the next implementation is that I think it needs to be
> "obviously correct".  The current sweet-expression implementation works
> well enough to be useful, but I think people will want to be confident
> that it is rock-solid.

Now, it would probably be easier to show obvious correctness if the
structure of the implementation is "near enough" to the structure of
the specced parser.  And it's hard to follow that structure while
being constrained by one-character-lookahead AND a multi-character
syntax element.

So my implementation will use a single-character element for GROUP,
because I want to be as near to the specced parser as possible.

>
> But yes, an alphabetic atom for a special construct is really odd.
> I only accepted this because there was an SRFI
> that did this, and I was trying to avoid recreating the wheel.
> Using punctuation to control grouping *does* make sense.
> One disadvantage: then we can't appeal to the SRFI, and any current users
> would have to change their code (though it'd be trivial to do).
>
> What's the sense of everyone else?  Should we switch the grouping
> construct to punctuation, such as "." or "\"?  If so, which one?
> As I mentioned earlier, I'm uncomfortable with "." as its replacement,
> because that is too easy to not see, and grouping is *important*.
>
> One minor downside of giving initial "\" with following whitespace a
> meaning is that it makes it harder in some Lisp variants
> to define or address atoms that begin
> with whitespace.  I've never seen that actually *used* for any purpose unless
> that Lisp didn't support strings, so I think that downside is worth it.
> In Common Lisp, you could just switch to "| atom_beginning_with_space|"
> anyway.
>
> Another downside is that it means that the parser has to deal with
> parsing anything with "\" while doing indentation processing.
> But parsers don't have to be written often; we want to make it easy to
> develop software and data, and if a small one-time cost has many benefits,
> that's okay.
>
>
>> ...
>> dwheeler mentioned the use of "\" for the GROUP character.  It happens to
>> be the same as the SPLICE character.  My initial instinct is that this is a
>> non-breaking change, i.e. using the same character for both will not break
>> things, as long as we remove the SPLICE-at-the-eol rule (i.e. only allow
>> SPLICE at the start or in the middle of things).  This means that the
>> "GROUP" meaning of the character is not ambiguous with SPLICE-at-the-eol -
>> remember, a "\" on a line by itself is either GROUP eol or SPLICE eol.
>> ...
>> For now I think we should investigate the following alternatives:
>> 1. \ = GROUP = SPLICE, remove SPLICE-at-the-eol rule.
>> 2. . = GROUP, \ = SPLICE
>> 2.1. remove SPLICE-at-the-eol rule.
>> 2.2. don't remove SPLICE-at-the-eol rule.
>
> All options have their pluses.  I'm leaning more towards #1 than #2.
>
> I find it useful to walk through some examples and see what they look like.
> Let's look at #1 for a moment; at the beginning of the line, "\" would mean
> "wrap with an extra (...)" (like a no-length function name).
> For example: "(let ((x 2) (y 3)) (* x y))" can currently be represented as:
> let
>   group
>     x 2
>     y 3
>   {x * y}
>
> Changing "group" to "\" would mean it would look like this:
> let
>   \
>     x 2
>     y 3
>   {x * y}
>
> or alternatively:
> let
>   \
>     x 2 \ y 3
>   {x * y}
>
> Or alternatively:
> let
>   \ x 2
>     y 3
>   {x * y}
>
> Since x(2) == (x 2), the following (really clean format) would also work:
> let
>   \ x(2) y(3)
>   {x * y}
>
> I do NOT think we should accept this as a synonym:
> let
>   \ x 2 \ y 3
>   {x * y}
>
> Because I think *that* should mean the same as:
> let
>   \ x 2
>   y 3
>   {x * y}
> which would mean "(let ((x 2)) (y 3) (* x y))"
> and NOT: "(let ((x 2) (y 3)) (* x y))".
>
> What should repeated "\" mean at the beginning?  I.E., what should this mean?:
> \ \ a \ b
> I think that after handling the first "\", we should recurse the rule, so
> the first two "\" would both be leading "\"s.  Thus, this would be the same
> as:
> \ \ a
>   b
> which equals
> \ ((a b))
> which would be "(((a b)))".
>
>
> Any other thoughts on those alternatives?  Any other alternatives?

The original purpose of the SPLICE rule was to support Arc and CL.

In addition, Egil Moller mentioned that GROUP was intended to simply
be an "invisible" symbol.  Thus:

group foo bar
===>
foo bar
===>
(foo bar)

i.e. it doesn't *actually* wrap an additional () layer: group is just
a symbol that gets dropped "magically", even though indentation
processing will see group.

However changing "group" to "\" and changing its meaning to "wrap
another layer of ()" means:

\ foo bar
===>
((foo bar))

So this is definitely a change.
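
To make the difference between the two readings concrete, here is a minimal sketch that models only a single line with no indented children. It is an illustration of the two behaviours contrasted above, not code from either implementation: the function names are hypothetical, and `rest` stands for the tokens that follow the leading marker ("group" or "\") on the line.

```python
def read_line(tokens):
    """Plain rule for a line with no indented children:
    one atom stays an atom, several tokens become a list."""
    return tokens[0] if len(tokens) == 1 else list(tokens)


def group_invisible(rest):
    """GROUP as an invisible symbol: drop the marker, read the rest of the line normally."""
    return read_line(rest)


def group_wraps(rest):
    """Marker read as 'wrap another layer of ()': read the rest, then enclose it in one more list."""
    return [read_line(rest)]


def show(expr):
    """Render nested Python lists as an s-expression string."""
    if isinstance(expr, list):
        return "(" + " ".join(show(e) for e in expr) + ")"
    return expr


print(show(group_invisible(["foo", "bar"])))  # (foo bar)
print(show(group_wraps(["foo", "bar"])))      # ((foo bar))
```

Under the first reading a line holding the marker and a single atom yields just that atom; under the second it yields a one-element list, which is where the two interpretations visibly diverge.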

Note that my original proposal, for Arc, was this:

if
  cond1 \ expr1
  cond2 \ expr2
  \ expr3

Which was intended to be:

(if
  cond1 expr1
  cond2 expr2
  expr3)

But if we change \ to mean "definitely add another layer of ()"
instead of "act as if we indented up to here, but skip this symbol"
(the way GROUP currently acts), then the Arc example is parsed as:

(if
  cond1 (expr1)
  cond2 (expr2)
  (expr3))

So I'm wondering if we're going forward too fast and forgetting why
the rule got there in the first place.

The reason why I think GROUP and SPLICE can be the same is because of
Egil Moller's explanation that GROUP is intended to be an invisible
symbol when at the head of an indentation.  So, SPLICE and Egil
Moller's GROUP act the same when at the start of a line.

>
> A key issue in the next implementation is that I think it needs to be
> "obviously correct".  The current sweet-expression implementation works
> well enough to be useful, but I think people will want to be confident
> that it is rock-solid.
>
> --- David A. Wheeler
