On Fri, Dec 14, 2012 at 7:01 AM, David A. Wheeler <dwhee...@dwheeler.com> wrote:
> Alan Manuel Gloria:
>> As you pointed out before: nobody seems to have made any particularly
>> serious attempts at describing indentation-sensitive syntax before.
>> There's the standard "INDENT" "DEDENT" tokenization, but that may not
>> work well with us, what with "!" being an indentation space outside of
>> () [] {} but being a character inside of them.
>
> Yep, but it looks like we'll have to wade in anyway.
>
> I've been reading over the two "spec-*" files (including the one you made).
> They're both obsoleted, but I think we can use them as a starting point. I
> propose the following:
> * let's focus on getting a correct BNF embedded in the draft SRFI, instead of
>   a separate file, to avoid trying to sync them up.
> * we'll leave the existing spec-* files in the directory for now, so we can
>   easily consult them.
> * I propose using "::=" for definitions instead of "->"; the former is more
>   convenient inside HTML.
> * Let's start by focusing ONLY on the t-expression material (indentation,
>   etc.), and IGNORE at first the BNF for n-expressions and c-expressions.
>   After all, we already have SRFI-105, so we can just appeal to that, and just
>   stop at "n-expr" in the BNF for now. We might want to add that later, but
>   I'm not worried about that part.

Seems fine.

> * After looking at your try at the spec, I think I see how to do it (and in a
>   simpler way) with INDENT/DEDENT/SAME. So let's try that to start with. We
>   may need to tackle it several times before we get a clear model, another
>   reason to wait to deal with the BNF of n-expressions.

There are some issues here. For example, I assume that the n-expr
production will end up also encountering INDENT/DEDENT/SAME tokens.

Question: what happens when n-expr production encounters INDENT/DEDENT/SAME?

a) treat as whitespace. Fine, what happens when you do this?

(foo
! bar)

Which gets tokenized to:

LPAREN 'foo INDENT 'bar RPAREN

Following "INDENT is whitespace inside n-expr" means we get (foo bar),
but current actual code emits (foo ! bar). Our current code is
constrained to have only one-char lookahead. (arguably this is our
desired behavior, but it's not how we currently do things in the
actual code)

b) don't generate INDENT/DEDENT/SAME while in the n-expr production.
But that requires changing tokenizers during the production, which
implies that we're better off expressing the tokenizer as part of the
parser itself. (this is the main reason why my spec- formulation uses
parameterized productions) If we switch tokenizers, then we need to
have some serious thought on the mechanics of that switching! And
that implies (I think, maybe you'll get a better idea) using something
like my parameterized spec- formulation.

--

Formalization is a tricky thing. Personally I'd go for option (a) but
it seems to require a better code organization than what we currently
have. Incidentally, I think it's fortunate that I wrote letterfall - if
we decide to reorg the code, we have something substantial to test it
on ^^

Sincerely,
AmkG
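
For illustration only, here is a rough sketch of what option (b) could
look like if the tokenizer simply tracks bracket depth and stops
emitting INDENT/DEDENT/SAME while inside () [] {}. It is written in
Python rather than Scheme to keep it short, and it is not our actual
code: the name tokenize, the token spellings, and the simplifications
(indentation compared by length only, blank lines skipped, no
strings/comments, any leading "!" treated as indentation) are all
assumptions made for the sketch.

```python
import re

OPENERS = ("(", "[", "{")
CLOSERS = (")", "]", "}")

def tokenize(text):
    """Hypothetical sketch: indentation tokens only at bracket depth 0."""
    tokens = []
    indents = [""]  # stack of currently open indentation strings
    depth = 0       # nesting depth of () [] {}
    for line in text.splitlines():
        pos = 0
        if depth == 0:
            # Outside brackets, leading spaces, tabs and "!" are indentation.
            lead = re.match(r"[ \t!]*", line)
            indent, pos = lead.group(0), lead.end()
            if pos == len(line):
                continue  # blank line: ignored in this sketch
            if indent == indents[-1]:
                tokens.append("SAME")
            elif len(indent) > len(indents[-1]):
                indents.append(indent)
                tokens.append("INDENT")
            else:
                while len(indents) > 1 and len(indents[-1]) > len(indent):
                    indents.pop()
                    tokens.append("DEDENT")
        # Inside brackets we fall straight through to here, so "!" is
        # scanned as an ordinary symbol character.
        for m in re.finditer(r"[()\[\]{}]|[^\s()\[\]{}]+", line[pos:]):
            tok = m.group(0)
            if tok in OPENERS:
                depth += 1
            elif tok in CLOSERS:
                depth -= 1
            tokens.append(tok)
    return tokens

print(tokenize("(foo\n! bar)"))
print(tokenize("foo\n! bar\nbaz"))
```

With this arrangement the "!" on the second line of (foo / ! bar) comes
out as a plain symbol, which matches what the current code emits, while
the same "! bar" line outside brackets produces an INDENT. The cost is
exactly the one described under (b): the tokenizer needs parser state
(the bracket depth), so the two cannot be specified independently.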