On Thu, Dec 20, 2012 at 12:16 AM, David A. Wheeler
<dwhee...@dwheeler.com> wrote:
> Alan Manuel Gloria:
>> *shrug* yes, we should use some tool.
>>
>> For what it's worth, Haskell's Parsec library combined with Haskell's
>> Monad Transformer library allows using the same syntax for both INDENT
>> / DEDENT / SAME -guarded parsers, and basic parsers (like n-expr) that
>> don't have INDENT/DEDENT/SAME, and allowing the second to be used
>> inside the first.
>>
>> Parsec also defaults to LL(1), meaning 1-item lookahead, which is a
>> necessity in actual Scheme implementations, since we expect to require
>> only peek-char (it supports limited-length lookahead using the 'try'
>> combinator, so if we avoid that, we know it's strictly LL(1)).  So if
>> we can get it working in Parsec, we can reasonably expect to get it
>> working in Scheme implementations without unget-char, only peek-char.
>
> I agree that having a parser with significant indent-processing capabilities 
> would be a big plus.
>
> I didn't include Parsec for several reasons, all based on the fact that 
> Parsec is totally tied to Haskell:
> * I have serious notation concerns.  We want to create a spec that will be 
> read by others as part of the SRFI.   ANTLR's notation is really excellent, 
> it looks "just like the books".  APG's is nasty.  When I look at the Parsec 
> example here: 
> http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/Parsing  
> it's clear that the Parsec notation is basically... Haskell.  (Well, what a 
> surprise :-) ).  I like Parsec's notation better than APG's, but Parsec's 
> notation is a REALLY different notation than "usual" BNFs and is not at all 
> "what the books use".
> * A *lot* of people don't grok Haskell, and it's certainly not my strongest 
> language either.   I particularly worry that we'll need to handle certain 
> cases specially if we want to seriously implement the spec with the tool, and 
> at that point I'll end up throwing up my hands. I'm confident of my ability 
> to fiddle with ANTLR and its ilk, but not with Parsec/Haskell.  There are 
> ports of Parsec, but they're tied to their languages too, and it's not clear 
> that the ports are as widely used/supported.
> * It'd also be nice to be able to generate Javascript, so that we could have 
> it working directly on the website.  Parsec can't do that, again since it's 
> tied to Haskell.  That's not as important as the other issues.
>
> I won't *categorically* rule out Parsec... just say that there were *reasons* 
> I didn't seriously consider Parsec.
>
> Do you (or anyone else here) have experience with Parsec, ANTLR, or similar?  
> I've used bison/yacc several times, and I've done recursive descent by hand, 
> but I've never used an LL-based parsing tool.

I pretty much learned parsing from Parsec, so I'm actually more
familiar with Parsec than with the standard parsing syntax in books.
At its core it's just a recursive descent parser, with a major hack
depending on laziness so that it defaults to LL(1), and can support
limited-length LL(k) parsing.
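
To make the "recursive descent with only peek-char" point concrete, here is a minimal sketch in Python (not Parsec, and not code from the readable project). The `CharStream` class and the tiny grammar are made up for illustration; the only thing it is meant to show is that every decision is taken from one peeked character, with no unget.

```python
import io

class CharStream:
    # Hypothetical flat character stream: only peek() and read(),
    # mirroring Scheme's peek-char / read-char. There is no unget.
    def __init__(self, text):
        self._src = io.StringIO(text)
        self._next = self._src.read(1)   # one buffered char; "" means EOF

    def peek(self):
        return self._next

    def read(self):
        ch = self._next
        self._next = self._src.read(1)
        return ch

def skip_spaces(s):
    while s.peek() != "" and s.peek().isspace():
        s.read()

def parse_expr(s):
    # expr ::= '(' expr* ')' | atom
    # The branch is chosen from a single peeked character (LL(1)).
    skip_spaces(s)
    ch = s.peek()
    if ch == "":
        raise SyntaxError("unexpected end of input")
    if ch == ")":
        raise SyntaxError("unexpected ')'")
    if ch == "(":
        s.read()
        items = []
        while True:
            skip_spaces(s)
            if s.peek() == "":
                raise SyntaxError("unclosed '('")
            if s.peek() == ")":
                s.read()
                return items
            items.append(parse_expr(s))
    # atom: read until whitespace, a parenthesis, or EOF
    chars = []
    while s.peek() != "" and not s.peek().isspace() and s.peek() not in "()":
        chars.append(s.read())
    return "".join(chars)
```

Note that the atom loop stops *before* the delimiter, so the character after an atom is still in the stream for whoever reads next.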

>
>> You know, "guarded" parsers are not standard in parsing lore.  So we
>> may need to hack support for INDENT / DEDENT/ SAME on whatever parser
>> generator we use.  Ideally, we should be able to delete the actions of
>> the parser in the parser spec and the parser will still, at the
>> minimum, be able to either signal a parse completion, or a parse
>> failure.  So even if we use a tool, I suspect that ideally, we would
>> have a translator on top of this tool (a preprocessor) that provides
>> INDENT/DEDENT/SAME or makes some valid transformation.
>
> I think you're right we'll probably need to handle indentation specially no 
> matter what.  Traditionally parsers have a lexing preprocessor, and it 
> appears to me that most people just bake indentation handling into a 
> preprocessor of some kind.  Abbreviation+space is easily handled that way
> too.

Yes, but that just moves the magic somewhere else, and we can't use a
preprocessor since we are "supposed" to use only flat character
streams provided by Scheme.

Basically my idea is that INDENT / DEDENT / SAME is part of the syntax
of the parser generator (the "parser parser"): not something defined
in the parser spec, but a built-in feature of the tool that reads the
spec, which the spec then takes advantage of.
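
As a rough illustration of what such a built-in guard would have to compute, here is a small Python sketch. It uses the common indent-stack technique; the function name `classify_indent` and the return shape are my own invention for this example, not anything from the spec or from an existing tool.

```python
def classify_indent(stack, indent):
    # stack:  indentation strings seen so far, outermost first; stack[0] is "".
    # indent: the leading whitespace of the line about to be parsed.
    # Returns (guards, new_stack), where guards is a list of
    # "SAME", "INDENT" or "DEDENT" (assumed names, taken from this thread).
    top = stack[-1]
    if indent == top:
        return ["SAME"], stack
    if indent.startswith(top):
        # Deeper than the current level: one INDENT, push the new level.
        return ["INDENT"], stack + [indent]
    # Shallower: pop levels until one matches, one DEDENT per popped level.
    new_stack = list(stack)
    dedents = []
    while new_stack and new_stack[-1] != indent:
        new_stack.pop()
        dedents.append("DEDENT")
    if not new_stack:
        raise SyntaxError("indentation does not match any outer level")
    return dedents, new_stack
```

A rule guarded by INDENT would then only be allowed to proceed when the first element returned is "INDENT", and so on. The sketch says nothing about indentation inside paired characters, where it would have to be switched off.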

> I've done a little reading on ANTLR, which appears to be one of the major 
> LL-based parsers around.  Several people *have* implemented indentation 
> processing in it as well, though it's certainly not a strength of ANTLR.
>
> A problem with any of these tools is that there are some complicating factors
> in sweet-expressions that make them easy to use and understand, but unusual to
> parse:
> * Indentation-sensitivity... but only outside character pairs

Yes, this is the big one.

> * The "\\" and "$" have a different semantic meaning from "{\\}" and "{$}".  
> One obvious way is to memorize the first character, read in an n-expr, and 
> then compare, but that doesn't sit well with traditional LL-tools.

Dunno, it seems to me that specifying a higher-priority parse for
SPLICE and the like, with a lower-priority parse for n-expr and the
like, should work well enough.
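
A sketch of what I mean by prioritized parses, in Python. Everything here is a stand-in: `splice`, `n_expr` and `choice` are hypothetical names, I am assuming SPLICE is the "\\" marker followed by whitespace, and the `n_expr` function is just "a run of non-whitespace", not the real n-expr grammar. It also backtracks over a string (roughly what Parsec's 'try' allows), so it is not strictly LL(1).

```python
def splice(text, pos):
    # Higher-priority parse: the two-character marker \\ counts as SPLICE
    # only when followed by whitespace or end of input (an assumption).
    if text.startswith("\\\\", pos):
        end = pos + 2
        if end == len(text) or text[end].isspace():
            return ("SPLICE", end)
    return None

def n_expr(text, pos):
    # Lower-priority stand-in for a real n-expr parser:
    # consume a run of non-whitespace characters.
    end = pos
    while end < len(text) and not text[end].isspace():
        end += 1
    if end == pos:
        return None
    return (("n-expr", text[pos:end]), end)

def choice(*parsers):
    # Try each parser in order at the same position; the first success wins.
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

item = choice(splice, n_expr)
```

With this ordering, a bare "\\" followed by a space is taken as SPLICE, while "{\\}" never matches the SPLICE parse and falls through to the n-expr parse, so nothing has to be memorized and compared after the fact.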

> * The "$" and "\\" have slightly different semantics at beginning of line vs. 
> middle of line.  By itself, trivial, but less so when combined with above.

Semantics, but not necessarily syntax ***I think***...

> * Abbreviation + space/eol after any indent has a special meaning
> * # <CHAR> can do so many things, e.g., #|...|#, and in some cases can set 
> the indent level.
> * ; on a line by itself is ignored.
> The real challenge is trying not to read any characters unless truly 
> necessary, so we can reuse the underlying readers, and that drives us towards 
> LL-style and recursive descent parsers.

Agree with these.

Sincerely,
AmkG

_______________________________________________
Readable-discuss mailing list
Readable-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/readable-discuss