On Fri, Dec 14, 2012 at 7:01 AM, David A. Wheeler <dwhee...@dwheeler.com> wrote:
> Alan Manuel Gloria:
>> As you pointed out before: nobody seems to have made any particularly
>> serious attempts at describing indentation-sensitive syntax before.
>> There's the standard "INDENT" "DEDENT" tokenization, but that may not
>> work well with us, what with "!" being an indentation space outside of
>> () [] {} but being a character inside of them.
>
> Yep, but it looks like we'll have to wade in anyway.
>
> I've been reading over the two "spec-*" files (including the one you made).  
> They're both obsoleted, but I think we can use them as a starting point.  I 
> propose the following:
> * let's focus on getting a correct BNF embedded in the draft SRFI, instead of 
> a separate file, to avoid trying to sync them up.
> * we'll leave the existing spec-* files in the directory for now, so we can 
> easily consult them.
> * I propose using "::=" for definitions instead of "->"; the former is more 
> convenient inside HTML.
> * Let's start by focusing ONLY on the t-expression material (indentation, 
> etc.), and IGNORE at first the BNF for n-expressions and c-expressions.  
> After all, we already have SRFI-105, so we can just appeal to that, and just 
> stop at "n-expr" in the BNF for now.  We might want to add that later, but 
> I'm not worried about that part.

Seems fine.

> * After looking at your try at the spec, I think I see how to do it (and in a 
> simpler way) with INDENT/DEDENT/SAME.  So let's try that to start with.  We 
> may need to tackle it several times before we get a clear model, another 
> reason to wait to deal with the BNF of n-expressions.

There are some issues here.

For example, I assume that the n-expr production will end up also
encountering INDENT/DEDENT/SAME tokens.

Question: what happens when the n-expr production encounters INDENT/DEDENT/SAME?

a) treat as whitespace.  Fine, what happens when you do this?

(foo
! bar)

Which gets tokenized to:

LPAREN
'foo
INDENT
'bar
RPAREN

Following the rule "INDENT is whitespace inside n-expr" means we get
(foo bar), but the current code actually emits (foo ! bar).  Our current
code is constrained to only one character of lookahead.

(Arguably, treating INDENT as whitespace here is the behavior we want,
but it's not what the actual code currently does.)
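
To make option (a) concrete, here is a minimal Python sketch, not our actual reader code.  It assumes the token stream listed above is already available as a list of strings; the names LAYOUT, parse_n_expr and _parse_list are hypothetical.  Inside a list it just skips layout tokens as if they were whitespace:

```python
# Hypothetical layout token names, taken from the INDENT/DEDENT/SAME proposal.
LAYOUT = {"INDENT", "DEDENT", "SAME"}

def parse_n_expr(tokens):
    """Parse one parenthesized list, treating layout tokens as whitespace."""
    it = iter(tokens)
    if next(it) != "LPAREN":
        raise SyntaxError("expected LPAREN")
    return _parse_list(it)

def _parse_list(it):
    items = []
    for tok in it:
        if tok in LAYOUT:
            continue                        # option (a): ignore layout tokens
        if tok == "LPAREN":
            items.append(_parse_list(it))   # nested list shares the iterator
        elif tok == "RPAREN":
            return items
        else:
            items.append(tok)
    raise SyntaxError("unclosed list")

tokens = ["LPAREN", "foo", "INDENT", "bar", "RPAREN"]
print(parse_n_expr(tokens))  # ['foo', 'bar']
```

Note that the "!" is already gone by the time the parser sees the stream, which is exactly why this option cannot reproduce (foo ! bar).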

b) don't generate INDENT/DEDENT/SAME while in the n-expr production.
But that requires changing tokenizers during the production, which
implies that we're better off expressing the tokenizer as part of the
parser itself.

(this is the main reason why my spec- formulation uses parameterized
productions)
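
For comparison, a rough Python sketch of option (b), under heavy simplifying assumptions: only ( and ) are tracked (no [] {} and no strings or comments), and instead of real INDENT/DEDENT/SAME tokens it emits a single hypothetical ("LAYOUT", text) token for any leading run of space, tab, or "!".  The point is only that the tokenizer needs to know the nesting depth, which is parser state:

```python
def tokenize(text):
    """Emit layout tokens only at paren depth 0; inside parens '!' is a symbol."""
    tokens, depth, i, line_start = [], 0, 0, True
    while i < len(text):
        ch = text[i]
        if ch == "\n":
            line_start = True
            i += 1
        elif line_start and depth == 0 and ch in " \t!":
            j = i
            while j < len(text) and text[j] in " \t!":
                j += 1
            tokens.append(("LAYOUT", text[i:j]))   # stand-in for INDENT/DEDENT/SAME
            line_start = False
            i = j
        elif ch in " \t":
            line_start = False
            i += 1
        elif ch in "()":
            depth += 1 if ch == "(" else -1        # no check for unbalanced ')'
            tokens.append("LPAREN" if ch == "(" else "RPAREN")
            line_start = False
            i += 1
        else:
            j = i
            while j < len(text) and text[j] not in " \t\n()":
                j += 1
            tokens.append(text[i:j])               # plain symbol, may be "!"
            line_start = False
            i = j
    return tokens

print(tokenize("(foo\n! bar)"))  # ['LPAREN', 'foo', '!', 'bar', 'RPAREN']
print(tokenize("foo\n! bar"))    # ['foo', ('LAYOUT', '! '), 'bar']
```

Here the depth counter is effectively a tokenizer switch driven by what has been parsed so far.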

If we switch tokenizers, then we need to give some serious thought to
the mechanics of that switching!  And that implies (I think; maybe
you'll come up with a better idea) using something like my
parameterized spec- formulation.

--

Formalization is a tricky thing.  Personally I'd go for option (a), but
it seems to require better code organization than we currently
have.

Incidentally, I think it's fortunate that I wrote letterfall - if we
decide to reorg the code, we have something substantial to test it on
^^

Sincerely,
AmkG

_______________________________________________
Readable-discuss mailing list
Readable-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/readable-discuss