[Readable-discuss] ANTLR and indentation: ANTLR book wrong

David A. Wheeler Thu, 03 Jan 2013 03:47:13 -0800

The ANTLR v3 book ("The Definitive ANTLR Reference") specifically discusses how 
to process indentation, which I thought was a good thing.


But now that I'm looking at it more carefully (page 95), I'm realizing that the 
book is just wrong.  The book proposes triggering indent processing when there 
are one or more indent characters at the start of a line.  That cannot possibly 
work: the trigger never fires on blank lines or on lines that do not begin with 
an indent character, so DEDENTs would never be generated for them.
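To make the failure concrete, here is a small Python model of that trigger. It is only my simplified reading of the approach described above, not the book's actual grammar; the function name and token tuples are made up for illustration.

```python
import re

def naive_indent_tokens(lines):
    """Model of the flawed trigger: indentation is examined only on
    lines that start with at least one indent character."""
    stack = [0]      # currently open indentation widths
    tokens = []
    for line in lines:
        m = re.match(r"[ \t]+", line)
        if m is None:
            # No leading indent characters, so the rule never fires and
            # no DEDENT is produced even if levels are still open.
            tokens.append(("LINE", line))
            continue
        width = len(m.group())
        if width > stack[-1]:
            stack.append(width)
            tokens.append(("INDENT", width))
        while width < stack[-1]:
            stack.pop()
            tokens.append(("DEDENT", width))
        tokens.append(("LINE", line.strip()))
    return tokens, stack

tokens, stack = naive_indent_tokens(["a", "  b", "c"])
```

With the input above, the line "c" returns to column 0 but no DEDENT is emitted, and the indentation stack still holds the level opened by "  b".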

Instead, in the ANTLR implementation, we should trigger as part of EOL 
processing.  After processing an EOL, read in any following indentation, and 
emit INDENT/DEDENT from that.  This approach fails to deal with indents at the 
beginning of a file, but we can detect & process that specially.  It's 
basically what we do now.  It requires that ANTLR be configured to allow 
multiple emits per lexical token, but that only required a few lines of code 
from its FAQ (which I've already added).
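A rough Python sketch of the EOL-triggered approach follows. It is not the ANTLR grammar itself; the function names and token shapes are hypothetical, only "\n" is treated as EOL, whitespace-only lines get no special treatment, and inconsistent dedents are not validated.

```python
def read_indent(text, pos):
    """Return (width, new_pos) for the run of spaces/tabs at pos."""
    start = pos
    while pos < len(text) and text[pos] in " \t":
        pos += 1
    return pos - start, pos

def indent_tokens(text):
    stack = [0]      # currently open indentation widths
    tokens = []

    def emit_indent_change(width):
        # One EOL may produce several tokens (the "multiple emits" case).
        if width > stack[-1]:
            stack.append(width)
            tokens.append(("INDENT", width))
        while width < stack[-1]:
            stack.pop()
            tokens.append(("DEDENT", stack[-1]))

    # Special case: the first line has no preceding EOL.
    width, pos = read_indent(text, 0)
    emit_indent_change(width)

    while pos < len(text):
        eol = text.find("\n", pos)
        if eol == -1:
            # EOF without a preceding EOL: deliberately not handled here.
            tokens.append(("TEXT", text[pos:]))
            break
        if eol > pos:
            tokens.append(("TEXT", text[pos:eol]))
        tokens.append(("EOL", None))
        # The trigger: after every EOL, read the following indentation.
        width, pos = read_indent(text, eol + 1)
        emit_indent_change(width)
    return tokens

kinds = [kind for kind, _ in indent_tokens("a\n  b\n    c\nd\n")]
```

In this example the EOL after "c" is followed by zero indentation, so that single EOL yields two DEDENT tokens. Because the check runs after every EOL, it fires on unindented and blank lines as well.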

That algorithm also fails to deal with EOF without a preceding EOL.  We could 
deal with that as a special case too, though I'm inclined to just forbid it in 
the spec.  Handling EOF without a preceding EOL is ugly in general, and it's 
not something you see in practice in source or structured data.
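The two options for EOF without a preceding EOL might look like this in Python. Both helper names are invented for illustration, and only "\n" is treated as an EOL here.

```python
def require_final_eol(text):
    """Option 1: forbid it. Reject non-empty input whose last line
    is not terminated by an EOL."""
    if text and not text.endswith("\n"):
        raise ValueError("input must end with an end-of-line")
    return text

def ensure_final_eol(text):
    """Option 2: special-case it. Append the missing EOL so the normal
    EOL processing can close any open indentation levels."""
    if text and not text.endswith("\n"):
        return text + "\n"
    return text
```

Either way, the indentation logic itself only ever has to reason about lines that end in an EOL.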

 --- David A. Wheeler

