On Wed, Dec 19, 2012 at 7:30 AM, David A. Wheeler <dwhee...@dwheeler.com> wrote:
> It's certainly easy to get the BNF wrong by hand.
>
> I did a short look to see if there were some potential parsing tools that 
> might help us do this.  Minimum requirements: LL-based (so that implementing 
> recursive descent later would be easy) and a GPL-compatible OSS license.  
> EBNF with a nice notation so you can say hspace+ and so on, highly desirable. 
>  Also desired: maintained/widely-used, and generates multiple languages (to 
> avoid generators that are tied to one language).  I then walked through this 
> to find candidates: 
> http://en.wikipedia.org/wiki/Comparison_of_parser_generators
>
> Likely-useful tools in a first-pass sweep are ANTLR, Coco/R, Grammatica, 
> JavaCC, and APG.
>
> I do *not* know if any of these can handle INDENT/DEDENT guards, or in some 
> other way handle indents, which rather matters.  ANTLR's LL(*) algorithm 
> looks like we might manage to get INDENT/DEDENT tokens out of it, by matching 
> on indent characters and then taking some complex actions.
>
> What do you think?  Should we try to use a tool, like ANTLR?
>
> Alternatively, for just computing LL followsets, that could be calculated 
> using a rough tool for the purpose.

*shrug* yes, we should use some tool.

For what it's worth, Haskell's Parsec library combined with Haskell's
Monad Transformer library allows using the same syntax for both INDENT
/ DEDENT / SAME -guarded parsers, and basic parsers (like n-expr) that
don't have INDENT/DEDENT/SAME, and allowing the second to be used
inside the first.

Parsec also defaults to LL(1), meaning 1-item lookahead, which is a
necessity in actual Scheme implementations, since we expect to require
only peek-char (it supports limited-length lookahead using the 'try'
combinator, so if we avoid that, we know it's strictly LL(1)).  So if
we can get it working in Parsec, we can reasonably expect to get it
working in Scheme implementations without unget-char, only peek-char.

You know, "guarded" parsers are not standard in parsing lore.  So we
may need to hack support for INDENT / DEDENT/ SAME on whatever parser
generator we use.  Ideally, we should be able to delete the actions of
the parser in the parser spec and the parser will still, at the
minimum, be able to either signal a parse completion, or a parse
failure.  So even if we use a tool, I suspect that ideally, we would
have a translator on top of this tool (a preprocessor) that provides
INDENT/DEDENT/SAME or makes some valid transformation.

------------------------------------------------------------------------------
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
_______________________________________________
Readable-discuss mailing list
Readable-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/readable-discuss

Reply via email to