The ANTLR v3 book ("The Definitive ANTLR Reference") specifically discusses how
to process indentation, which I thought was a good thing.
But now that I'm looking at it more carefully (page 95), I'm realizing that the
book is just wrong. The book proposes triggering indent processing when there
are 1-or-more indent characters on a line. That cannot possibly work; that
would mean that DEDENTs would never be generated on blank lines or lines that
do not begin with an indent character.
Instead, in the ANTLR implementation, we should trigger as part of EOL
processing. After processing an EOL, read in any following indentation, and
emit INDENT/DEDENT from that. This fails to deal with indentation at the
beginning of a file, but we can detect & process that case specially. It's
basically what we do now. Emitting several tokens from one EOL requires that
ANTLR be configured to allow multiple emits per lexical token, but that only
required a few lines of code from its FAQ (which I've already added).
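To make the EOL-triggered approach concrete, here is a minimal sketch in
Python rather than in ANTLR grammar actions. It is not our actual
implementation; the names (indent_tokens, handle_indent) and the token
tuples are made up for illustration, indentation is compared by character
count only, and only "\n" is treated as an EOL. The plain list of tokens
stands in for the multiple-emit support in the ANTLR lexer.

```python
def indent_tokens(text):
    """Return (kind, value) tokens; indentation is examined only at the
    start of the file and immediately after each EOL.
    Hypothetical sketch: indentation is measured by character count."""
    tokens = []
    stack = [0]              # indentation widths of the enclosing blocks
    n = len(text)

    def handle_indent(pos):
        # Read any indentation at pos, then emit INDENT or DEDENT tokens.
        start = pos
        while pos < n and text[pos] in " \t":
            pos += 1
        width = pos - start
        if width > stack[-1]:
            stack.append(width)
            tokens.append(("INDENT", ""))
        while width < stack[-1]:
            # One EOL may close several blocks, hence several emits.
            stack.pop()
            tokens.append(("DEDENT", ""))
        if width != stack[-1]:
            raise SyntaxError("dedent does not match any outer level")
        return pos

    pos = handle_indent(0)   # special case: indentation at start of file
    while pos < n:
        if text[pos] == "\n":
            tokens.append(("EOL", ""))
            pos = handle_indent(pos + 1)   # triggered by the EOL
        else:
            start = pos
            while pos < n and text[pos] != "\n":
                pos += 1
            tokens.append(("TEXT", text[start:pos]))
    return tokens
```

Because the check runs after every EOL, a blank line or an unindented line
sees a width of 0 and produces the DEDENTs that the book's approach would
miss. Note that the closing DEDENTs at end of input only appear here when
the input ends with an EOL.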
That algorithm also fails to deal with EOF without a preceding EOL. We could
deal with that as a special case too, though I'm inclined to just forbid it in
the spec. Handling EOF without a preceding EOL is ugly in general, and it's
not something you see in practice in source or structured data.
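The two options for EOF without a preceding EOL could look roughly like
this. Again this is only a sketch under assumptions: check_final_eol and
its strict flag are hypothetical names, only "\n" counts as an EOL, and
empty input is accepted as-is.

```python
def check_final_eol(text, strict=True):
    """Make sure non-empty input ends with an EOL before it is lexed.
    Hypothetical helper: strict=True forbids a missing final EOL,
    strict=False handles it as a special case by adding one."""
    if text == "" or text.endswith("\n"):
        return text
    if strict:
        raise ValueError("input must end with an EOL")
    return text + "\n"       # synthesize the missing EOL
```

Forbidding the case keeps the lexer free of a second code path; the
lenient variant keeps the special case outside the indent logic by fixing
up the input first.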
--- David A. Wheeler
_______________________________________________
Readable-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/readable-discuss