On Thu, Dec 20, 2012 at 12:16 AM, David A. Wheeler
<[email protected]> wrote:
> Alan Manuel Gloria:
>> *shrug* yes, we should use some tool.
>>
>> For what it's worth, Haskell's Parsec library combined with Haskell's
>> Monad Transformer library allows using the same syntax for both
>> INDENT / DEDENT / SAME-guarded parsers and basic parsers (like n-expr)
>> that don't have INDENT/DEDENT/SAME, and allows the second to be used
>> inside the first.
>>
>> Parsec also defaults to LL(1), meaning 1-item lookahead, which is a
>> necessity in actual Scheme implementations, since we expect to require
>> only peek-char (it supports limited-length lookahead using the 'try'
>> combinator, so if we avoid that, we know it's strictly LL(1)). So if
>> we can get it working in Parsec, we can reasonably expect to get it
>> working in Scheme implementations without unget-char, only peek-char.
>
> I agree that having a parser with significant indent-processing capabilities
> would be a big plus.
>
> I didn't include Parsec for several reasons, all based on the fact that
> Parsec is totally tied to Haskell:
> * I have serious notation concerns. We want to create a spec that will be
> read by others as part of the SRFI. ANTLR's notation is really excellent,
> it looks "just like the books". APG's is nasty. When I look at the Parsec
> example here:
> http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/Parsing
> it's clear that the Parsec notation is basically... Haskell. (Well, what a
> surprise :-) ). I like Parsec's notation better than APG's, but Parsec's
> notation is a REALLY different notation than "usual" BNFs and is not at all
> "what the books use".
> * A *lot* of people don't grok Haskell, and it's certainly not my strongest
> language either. I particularly worry that we'll need to handle certain
> cases specially if we want to seriously implement the spec with the tool, and
> at that point I'll end up throwing up my hands. I'm confident of my ability
> to fiddle with ANTLR and its ilk, but not with Parsec/Haskell. There are
> ports of Parsec, but they're tied to their languages too, and it's not clear
> that the ports are as widely used/supported.
> * It'd also be nice to be able to generate Javascript, so that we could have
> it working directly on the website. Parsec can't do that, again since it's
> tied to Haskell. That's not as important as the other issues.
>
> I won't *categorically* rule out Parsec... just say that there were *reasons*
> I didn't seriously consider Parsec.
>
> Do you (or anyone else here) have experience with Parsec, ANTLR, or similar?
> I've used bison/yacc several times, and I've done recursive descent by hand,
> but I've never used an LL-based parsing tool.
I pretty much learned parsing from Parsec, so I'm actually more
familiar with Parsec's notation than with the standard grammar notation
in books. At its core it's just a recursive descent parser, with a major
hack that depends on laziness so that it defaults to LL(1) and can still
support limited-length LL(k) parsing.
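To make the "defaults to LL(1)" point concrete, here is a rough sketch in Python rather than Haskell. The names (Stream, char, string, choice, attempt) are made up for illustration and are not Parsec's API; attempt only loosely imitates what 'try' does. The idea shown: an alternative that fails after consuming input commits the parse, so without attempt the parser never needs more than a peek.

```python
class ParseError(Exception):
    pass

class Stream:
    """Flat character stream; pos only moves backwards inside attempt()."""
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Rough analogue of Scheme's peek-char: look without consuming.
        return self.text[self.pos] if self.pos < len(self.text) else None

    def read(self):
        ch = self.peek()
        if ch is not None:
            self.pos += 1
        return ch

def char(expected):
    def parse(s):
        if s.peek() != expected:
            raise ParseError(f"expected {expected!r} at {s.pos}")
        return s.read()
    return parse

def string(expected):
    def parse(s):
        for c in expected:
            char(c)(s)
        return expected
    return parse

def choice(*parsers):
    """Ordered alternatives; move on only if the failed one consumed nothing."""
    def parse(s):
        for p in parsers:
            start = s.pos
            try:
                return p(s)
            except ParseError:
                if s.pos != start:
                    raise  # input was consumed: committed, no backtracking
        raise ParseError(f"no alternative matched at {s.pos}")
    return parse

def attempt(p):
    """Hypothetical stand-in for 'try': rewind on failure (needs more than peek)."""
    def parse(s):
        start = s.pos
        try:
            return p(s)
        except ParseError:
            s.pos = start
            raise
    return parse

# Without attempt, "let" eats the 'l' of "lambda" and the choice is committed.
strict = choice(string("let"), string("lambda"))
# With attempt, the failed "let" is rewound and "lambda" gets its turn.
lenient = choice(attempt(string("let")), string("lambda"))
```

Calling strict(Stream("lambda")) raises ParseError, while lenient(Stream("lambda")) succeeds. A grammar that never needs attempt is one that should port to a reader with only peek-char.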
>
>> You know, "guarded" parsers are not standard in parsing lore. So we
>> may need to hack support for INDENT / DEDENT / SAME on whatever parser
>> generator we use. Ideally, we should be able to delete the actions of
>> the parser in the parser spec and the parser will still, at the
>> minimum, be able to either signal a parse completion, or a parse
>> failure. So even if we use a tool, I suspect that ideally, we would
>> have a translator on top of this tool (a preprocessor) that provides
>> INDENT/DEDENT/SAME or makes some valid transformation.
>
> I think you're right we'll probably need to handle indentation specially no
> matter what. Traditionally parsers have a lexing preprocessor, and it
> appears to me that most people just bake indentation handling into a
> preprocessor of some kind. Abbreviation+space is easily handled that way
> too.
Yes, but that just moves the magic somewhere else, and we can't use a
preprocessor since we are "supposed" to use only the flat character
streams provided by Scheme.
Basically my idea is that INDENT / DEDENT / SAME is part of the
parser-parser's syntax (i.e. not something defined in the parser spec,
but a feature of the tool that reads the parser spec, which the spec
then takes advantage of).
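One possible reading of this, sketched in Python under my own assumptions: the guard is a primitive supplied by the parsing machinery, and the grammar rules just call it. Everything here (Reader, guard, parse_block, parse_item, parse_word) is a hypothetical name, the "grammar" is a toy where each line holds one word, and blank lines, tabs, and comments are ignored. The reader uses only one character of lookahead on a flat stream and remembers the indent of the current line instead of running a preprocessor.

```python
import io

class Reader:
    """Flat character stream with peek/read only, plus the current line's indent."""
    def __init__(self, text):
        self.port = io.StringIO(text)
        self.ahead = self.port.read(1)      # the single character of lookahead
        self.indent = self._read_indent()

    def peek(self):
        return self.ahead                   # "" at end of input

    def read(self):
        ch = self.ahead
        self.ahead = self.port.read(1)
        return ch

    def _read_indent(self):
        n = 0
        while self.peek() == " ":
            self.read()
            n += 1
        return n

    def end_line(self):
        # Consume the newline (if any), then measure the next line's indent once.
        if self.peek() == "\n":
            self.read()
        self.indent = self._read_indent()

def guard(r, level):
    """The guard primitive: compare the current line's indent with a reference."""
    if r.peek() == "":
        return "DEDENT"                     # end of input closes every block
    if r.indent > level:
        return "INDENT"
    if r.indent == level:
        return "SAME"
    return "DEDENT"

def parse_word(r):
    """Basic parser with no guards: read up to end of line."""
    chars = []
    while r.peek() not in ("", "\n"):
        chars.append(r.read())
    return "".join(chars)

def parse_item(r):
    """item := word EOL (INDENT block)?"""
    level = r.indent
    word = parse_word(r)
    r.end_line()
    children = []
    if guard(r, level) == "INDENT":
        children = parse_block(r, r.indent)
    return (word, children)

def parse_block(r, level):
    """block := (SAME item)*, ending at the first DEDENT."""
    items = []
    while guard(r, level) == "SAME":
        items.append(parse_item(r))
    return items

tree = parse_block(Reader("a\n  b\n  c\n    d\ne\n"), 0)
```

Note that parse_word, which knows nothing about indentation, is used unchanged inside the guarded rules, and deleting the tuple-building "actions" would still leave a recognizer that either finishes or fails.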
> I've done a little reading on ANTLR, which appears to be one of the major
> LL-based parsers around. Several people *have* implemented indentation
> processing in it as well, though it's certainly not a strength of ANTLR.
>
> A problem with any of these tools is that there are some complicating factors
> in sweet-expressions that make it easy to use and understand, but unusual to
> parse:
> * Indentation-sensitivity... but only outside character pairs
Yes, this is the big one.
> * The "\\" and "$" have a different semantic meaning from "{\\}" and "{$}".
> One obvious way is to memorize the first character, read in an n-expr, and
> then compare, but that doesn't sit well with traditional LL-tools.
Dunno, it seems to me that specifying a higher-priority parse for
SPLICE etc. with a lower-priority parse for n-expr etc. should work
well enough.
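A small sketch of what I mean by priority, in Python with hypothetical names (MARKERS, parse_marker, parse_n_expr, parse_item, parse_line). The n-expr rule is only a stand-in that grabs any run of non-space characters, and the sketch looks ahead in a string rather than using peek-char, so it illustrates the ordering only, not the LL(1) constraint. I'm assuming a marker counts only when it stands alone, followed by whitespace or end of input.

```python
import re

# Marker lexemes and the names used for them in this sketch.
MARKERS = {"\\\\": "SPLICE", "$": "SUBLIST"}

def parse_marker(text, pos):
    """Higher-priority rule: a bare marker followed by whitespace or end of input."""
    for lexeme, name in MARKERS.items():
        end = pos + len(lexeme)
        if text.startswith(lexeme, pos) and (end == len(text) or text[end].isspace()):
            return name, end
    return None

def parse_n_expr(text, pos):
    """Lower-priority stand-in for n-expr: any run of non-space characters."""
    m = re.compile(r"\S+").match(text, pos)
    return (("N-EXPR", m.group()), m.end()) if m else None

def parse_item(text, pos):
    # Ordered choice: the first rule that matches wins.
    for rule in (parse_marker, parse_n_expr):
        result = rule(text, pos)
        if result is not None:
            return result
    return None

def parse_line(text):
    items, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        item, pos = parse_item(text, pos)
        items.append(item)
    return items

items = parse_line("a $ b \\\\ {$} $x")
```

Because the marker rule is tried first and only matches a standalone marker, "{$}" and "$x" fall through to the n-expr rule without any need to read an n-expr first and compare afterwards.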
> * The "$" and "\\" have slightly different semantics at beginning of line vs.
> middle of line. By itself, trivial, but less so when combined with above.
A difference in semantics, but not necessarily in syntax ***I think***...
> * Abbreviation + space/eol after any indent has a special meaning
> * # <CHAR> can do so many things, e.g., #|...|#, and in some cases can set
> the indent level.
> * ; on a line by itself is ignored.
> The real challenge is trying not to read any characters unless truly
> necessary, so we can reuse the underlying readers, and that drives us towards
> LL-style and recursive descent parsers.
Agree with these.
Sincerely,
AmkG
_______________________________________________
Readable-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/readable-discuss