Alan Manuel Gloria <[email protected]> said:
> I find looking at Scheme code gives me headaches. This now makes
> sweet-expressions a high priority for me.

Excellent! Some things I like, some things I don't or have concerns about. Let's talk this through!!

> So I propose the following parser for sweet expressions....

Great, nice to see a specific (counter) proposal.

> ; ignore completely empty lines
> swt-expr -> empty-line* swt-expr-core
> $2

I think this comment is misleading. I think what you mean is "ignore completely empty lines *before* a new sweet-expression". I found that for interactive use, it's important to interpret a blank line as "this is the end of the expression" once you've started, per my discussion:
http://www.dwheeler.com/readable/sweet-expressions.html

> ; some simple utility parsers
> empty-line -> htspace* eol
> htspace -> SPACE | TAB | FORMFEED | VTAB | comment
> comment -> COMMENT-MARKER (not eol)*
> eol -> CR | LF
> ; tabs are disallowed currently
> ; we could define a preprocessor
> ; that expands tabs.

I think tabs really need to be allowed, and that we must not presume any particular expansion rule. A lot of people use tabs, and there is no real standard for tab expansion. (The only "standard", 8-space positions, is widely ignored; I hate them myself.) My implementation accepts tabs, spaces, or a combo, as long as you're consistent. So if your next indent uses 2 spaces, then all the following items with that indent (or more) need to use 2 spaces. I think that is far more likely to gain acceptance, which is the key thing here.

I'm not sure what you mean with the level rules. I think we must NOT assume that every next level indent is exactly one space; while that's easy to implement, it's a hard-to-read result. I typically use 2 or 4 spaces for each level, for example.

> ; FORMFEED and VTAB are
> ; not always represented as
> ; single characters either...

But they are single chars in the datastream, and that's all that matters.

> ; implements "whitespace at top-level
> ; DISABLES I-expressions"
> swt-expr-core -> htspace+ mdn-expr
> $2
> ; I-expressions must start at
> ; indent 0.
> Sorry dwheeler.

I'm not sure why you're sorry :-). I've gone back & forth on how to interpret this situation; I can be easily talked into this change.

In section 6.1 "Indentation issues" of:
http://www.dwheeler.com/readable/version02.html
I discussed this; I even said:
"We could require that the top-level line begin at the left edge. This is not unknown; Python, a popular language using indentation, requires that the top level begin at the left edge (and raises an error if an attempt is made otherwise). This completely eliminates the need for hidden state - top level statements only start at the left edge, so there's nothing to remember."

Quietly disabling indentation processing, if the first line is indented, would make processing old code (and disabling indentation when undesired) especially easy, so there are advantages to this idea.

The main thing that worries me about silently disabling indentation processing if they don't start on the edge is that it'd be easy to have code that quietly fails to be interpreted properly just because it doesn't start on the left edge. That's easy to have happen. If the indent is just one space, it's not obvious it's happening, especially if it's quoted from another source. The main alternative I considered was making beginning-from-indent straight illegal (like Python).

One alternative would be a single warning the first time it happens when reading a stream (a file or interactive session). E.G.:
"WARNING: Text begins indented, indentation processing disabled".
Then you can use it... but at least you know you intended to.

So, I think this is a very reasonable idea. It's not what I had originally proposed, but I had noted this as an area I was especially uncertain of.

Comments, anyone?

> Proposal:
> 1. Remove the SPLICE-at-the-end rule!...

If it's impractical to implement, then it must die. It's not entirely clear to me it's impractical to implement, though.
But even if we can implement it, do we *need* the splice-at-the-end rule? If it's not worth the trouble, let's junk it.

Part of my concern was with stuff like: (myfunction :option1 (f a) :option2 (g b) ...), where there are parameters at the same list level, but you want singletons or pairings to make it clear that there is a structure beyond what's in the list. This happens a *LOT* in some Lisp-based languages (including Common Lisp and Arc).

But it looks like the EOL case really isn't needed; you can just do this: myfunction :option1 \ f(a) :option2 \ g(b) ...

So maybe we just don't need it. I proposed the rule for discussion because I was sure it needed discussion. Thanks!

> ... If GROUP is instead "." as proposed by Arne, we might
> actually be able to code closer to the parser declarative spec.

"Group" isn't *that* hard to implement, just read in the atom, and compare after you're done... then change the list based on what you have.

But yes, an alphabetic atom for a special construct is really odd. I only accepted this because there was an SRFI that did this, and I was trying to avoid recreating the wheel. Using punctuation to control grouping *does* make sense. One disadvantage: then we can't appeal to the SRFI, and any current users would have to change their code (though it'd be trivial to do).

What's the sense of everyone else? Should we switch the grouping construct to punctuation, such as "." or "\"? If so, which one?

As I mentioned earlier, I'm uncomfortable with "." as its replacement, because that is too easy to not see, and grouping is *important*.

One minor downside of giving initial "\" with following whitespace a meaning is that it makes it harder in some Lisp variants to define or address atoms that begin with whitespace. I've never seen that actually *used* for any purpose unless that Lisp didn't support strings, so I think that downside is worth it. In Common Lisp, you could just switch to "| atom_beginning_with_space|" anyway.

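To make the earlier remark about GROUP concrete ("read in the atom, and compare after you're done... then change the list"), here is a minimal Python sketch. It is only an illustration under stated assumptions, not the actual sweet-expression reader: atoms are modeled as plain strings, lists as Python lists, and the names GROUP and build_line are hypothetical. It assumes SRFI-49-style behaviour, where a line holding a lone atom with no children stands for that atom itself.

```python
# Hypothetical, simplified model of indentation-line handling.
# Atoms are plain strings; lists are Python lists.
GROUP = "group"  # assumption: could become "\\" if punctuation is chosen


def build_line(atoms, children):
    """Combine the atoms read from one line with the already-built
    results of its more-indented child lines."""
    items = list(atoms) + list(children)
    if items and items[0] == GROUP:
        # The marker only contributes the extra list level: drop it
        # and keep everything else as a list.
        return items[1:]
    if len(items) == 1:
        # A lone atom with no children stands for itself,
        # not for a one-element list.
        return items[0]
    return items


# "group" with child lines "x 2" and "y 3" yields ((x 2) (y 3)).
bindings = build_line(["group"], [build_line(["x", "2"], []),
                                  build_line(["y", "3"], [])])
# {x * y} is passed in already converted to (* x y).
expr = build_line(["let"], [bindings, ["*", "x", "y"]])
print(expr)  # ['let', [['x', '2'], ['y', '3']], ['*', 'x', 'y']]
```

The comparison happens once, after the whole line has been collected, so switching the marker from an alphabetic atom to punctuation would only change the GROUP constant in this model.
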
Another downside is that it means that the parser has to deal with parsing anything with "\" while doing indentation processing. But parsers don't have to be written often; we want to make it easy to develop software and data, and if a small one-time cost has many benefits, that's okay.

> ...
> dwheeler mentioned the use of "\" for the GROUP character. It happens to
> be the same as the SPLICE character. My initial instinct is that this is a
> non-breaking change, i.e. using the same character for both will not break
> things, as long as we remove the SPLICE-at-the-eol rule (i.e. only allow
> SPLICE at the start or in the middle of things). This means that the
> "GROUP" meaning of the character is not ambiguous with SPLICE-at-the-eol -
> remember, a "\" on a line by itself is either GROUP eol or SPLICE eol.
> ...
> For now I think we should investigate the following alternatives:
> 1. \ = GROUP = SPLICE, remove SPLICE-at-the-eol rule.
> 2. . = GROUP, \ = SPLICE
> 2.1. remove SPLICE-at-the-eol rule.
> 2.2. don't remove SPLICE-at-the-eol rule.

All options have their pluses. I'm leaning more towards #1 than #2.

I find it useful to walk through some examples and see what they look like. Let's look at #1 for a moment; at the beginning of the line, "\" would mean "wrap with an extra (...)" (like a no-length function name).

For example: "(let ((x 2) (y 3)) (* x y))" can currently be represented as: let group x 2 y 3 {x * y}

Changing "group" to "\" would mean it would look like this: let \ x 2 y 3 {x * y} or alternatively: let \ x 2 \ y 3 {x * y} Or alternatively: let \ x 2 y 3 {x * y}

Since x(2) == (x 2), the following (really clean format) would also work: let \ x(2) y(3) {x * y}

I do NOT think we should accept this as a synonym: let \ x 2 \ y 3 {x * y} Because I think *that* should mean the same as: let \ x 2 y 3 {x * y} which would mean "(let ((x 2)) (y 3) (* x y))" and NOT: "(let ((x 2) (y 3)) (* x y))".

What should repeated "\" mean at the beginning?
I.E., what should this mean?: \ \ a \ b

I think that after handling the first "\", we should recurse the rule, so the first two "\" would both be leading "\"s. Thus, this would be the same as: \ \ a b which equals \ ((a b)) which would be "(((a b)))".

Any other thoughts on those alternatives? Any other alternatives?

A key issue in the next implementation is that I think it needs to be "obviously correct". The current sweet-expression implementation works well enough to be useful, but I think people will want to be confident that it is rock-solid.

--- David A. Wheeler

_______________________________________________
Readable-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/readable-discuss
