Alan Manuel Gloria:
> *shrug* yes, we should use some tool.
> 
> For what it's worth, Haskell's Parsec library combined with Haskell's
> Monad Transformer library allows using the same syntax for both INDENT
> / DEDENT / SAME -guarded parsers, and basic parsers (like n-expr) that
> don't have INDENT/DEDENT/SAME, and allows the second to be used
> inside the first.
>
> Parsec also defaults to LL(1), meaning 1-item lookahead, which is a
> necessity in actual Scheme implementations, since we expect to require
> only peek-char (it supports limited-length lookahead using the 'try'
> combinator, so if we avoid that, we know it's strictly LL(1)).  So if
> we can get it working in Parsec, we can reasonably expect to get it
> working in Scheme implementations without unget-char, only peek-char.
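To make the "peek-char only, no unget-char" constraint concrete, here is a minimal Python sketch of an LL(1) recursive-descent reader. It is not Parsec and not any existing Scheme reader. `PeekReader`, `read_datum` and `skip_spaces` are hypothetical names, and the sketch handles only symbols and parenthesized lists. Every decision is made from a single peeked character, and nothing is ever pushed back.

```python
import io


class PeekReader:
    """Hypothetical wrapper exposing only peek_char/read_char, in the
    spirit of Scheme's peek-char/read-char; there is no unget."""

    def __init__(self, text):
        self._stream = io.StringIO(text)
        self._next = self._stream.read(1)  # the single lookahead character

    def peek_char(self):
        return self._next  # '' signals end of input

    def read_char(self):
        c = self._next
        self._next = self._stream.read(1)
        return c


def skip_spaces(r):
    while r.peek_char() != '' and r.peek_char().isspace():
        r.read_char()


def read_datum(r):
    """Read a symbol or a parenthesized list, choosing each branch
    from one peeked character (LL(1))."""
    skip_spaces(r)
    c = r.peek_char()
    if c == '':
        raise EOFError("unexpected end of input")
    if c == ')':
        raise SyntaxError("unexpected ')'")
    if c == '(':
        r.read_char()  # consume '('
        items = []
        while True:
            skip_spaces(r)
            nxt = r.peek_char()
            if nxt == ')':
                r.read_char()  # consume ')' and stop; what follows stays unconsumed
                return items
            if nxt == '':
                raise EOFError("missing ')'")
            items.append(read_datum(r))
    # Otherwise a symbol: take characters up to a delimiter, leaving it unread.
    chars = []
    while r.peek_char() not in ('', '(', ')') and not r.peek_char().isspace():
        chars.append(r.read_char())
    return ''.join(chars)
```

Because the reader stops right after the closing paren, whatever follows is left for the next call, which is the property an implementation with only peek-char needs.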

I agree that having a parser with significant indent-processing capabilities 
would be a big plus.

I didn't include Parsec for several reasons, all based on the fact that Parsec 
is totally tied to Haskell:
* I have serious notation concerns.  We want to create a spec that will be read
by others as part of the SRFI.  ANTLR's notation is really excellent; it looks
"just like the books".  APG's is nasty.  When I look at the Parsec example
here: http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/Parsing
it's clear that the Parsec notation is basically... Haskell.  (Well, what a
surprise :-) ).  I like Parsec's notation better than APG's, but Parsec's
notation is REALLY different from "usual" BNFs and is not at all
"what the books use".
* A *lot* of people don't grok Haskell, and it's certainly not my strongest 
language either.   I particularly worry that we'll need to handle certain cases 
specially if we want to seriously implement the spec with the tool, and at that 
point I'll end up throwing up my hands. I'm confident of my ability to fiddle 
with ANTLR and its ilk, but not with Parsec/Haskell.  There are ports of 
Parsec, but they're tied to their languages too, and it's not clear that the 
ports are as widely used/supported.
* It'd also be nice to be able to generate JavaScript, so that we could have it
working directly on the website.  Parsec can't do that, again since it's tied
to Haskell.  That's not as important as the other issues.

I won't *categorically* rule out Parsec... just say that there were *reasons* I 
didn't seriously consider Parsec.

Do you (or anyone else here) have experience with Parsec, ANTLR, or similar?  
I've used bison/yacc several times, and I've done recursive descent by hand, 
but I've never used an LL-based parsing tool.

> You know, "guarded" parsers are not standard in parsing lore.  So we
> may need to hack support for INDENT / DEDENT/ SAME on whatever parser
> generator we use.  Ideally, we should be able to delete the actions of
> the parser in the parser spec and the parser will still, at the
> minimum, be able to either signal a parse completion, or a parse
> failure.  So even if we use a tool, I suspect that ideally, we would
> have a translator on top of this tool (a preprocessor) that provides
> INDENT/DEDENT/SAME or makes some valid transformation.

I think you're right that we'll probably need to handle indentation specially no
matter what.  Traditionally parsers have a lexing preprocessor, and it appears
to me that most people just bake indentation handling into a preprocessor of
some kind.  Abbreviation+space is easily handled that way too.
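A lexing preprocessor of that kind might look roughly like the Python sketch below. It is a hypothetical simplification, not the sweet-expression rules. `indent_tokens` is an invented name, indentation is assumed to be spaces only, blank lines are skipped, and character pairs, comments and abbreviations are not handled at all.

```python
def indent_tokens(lines):
    """Hypothetical preprocessor: turn leading spaces into INDENT / DEDENT /
    SAME tokens, passing each line's remaining text through unchanged."""
    stack = [0]   # indentation widths of the currently open levels
    tokens = []
    for line in lines:
        body = line.lstrip(' ')
        if body.strip() == '':
            continue  # blank lines produce no tokens
        width = len(line) - len(body)
        if not tokens:
            if width != 0:
                raise IndentationError("first line must not be indented")
        elif width > stack[-1]:
            stack.append(width)
            tokens.append('INDENT')
        elif width == stack[-1]:
            tokens.append('SAME')
        else:
            while width < stack[-1]:
                stack.pop()
                tokens.append('DEDENT')
            if width != stack[-1]:
                raise IndentationError("dedent to a level that was never opened")
        tokens.append(body.rstrip())
    tokens.extend('DEDENT' for _ in stack[1:])  # close levels still open at EOF
    return tokens
```

In this sketch a line that closes one or more levels gets only DEDENT tokens, not an extra SAME. Whether the grammar wants DEDENT followed by SAME instead is a design choice the spec would have to pin down.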

I've done a little reading on ANTLR, which appears to be one of the major
LL-based parser generators around.  Several people *have* implemented
indentation processing in it as well, though it's certainly not a strength of
ANTLR.

A problem with any of these tools is that there are some complicating factors
in sweet-expressions that make the notation easy to use and understand, but
unusual to parse:
* Indentation-sensitivity... but only outside character pairs
* The "\\" and "$" markers have a different meaning from "{\\}" and "{$}".
One obvious way to handle this is to remember the first character, read in an
n-expr, and then compare, but that doesn't sit well with traditional LL tools.
* The "$" and "\\" have slightly different semantics at the beginning of a line
vs. the middle of a line.  By itself that's trivial, but less so when combined
with the above.
* Abbreviation + space/eol after any indent has a special meaning
* # <CHAR> can do so many things, e.g., #|...|#, and in some cases can set the 
indent level.
* ; on a line by itself is ignored.
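The first item above, indentation that only matters outside character pairs, can at least be tracked in one left-to-right pass with a stack of expected closing characters, never looking back. A hypothetical Python sketch follows. `significant_newlines` is an invented name, and strings, comments and character literals are deliberately not recognized.

```python
OPENERS = {'(': ')', '[': ']', '{': '}'}
CLOSERS = set(OPENERS.values())


def significant_newlines(text):
    """Return offsets of newlines outside every (), [] and {} pair,
    i.e. the newlines where indentation would be significant.
    Hypothetical sketch: strings, comments and character literals
    are not recognized."""
    expected = []  # stack of closing characters still owed
    result = []
    for i, c in enumerate(text):
        if c in OPENERS:
            expected.append(OPENERS[c])
        elif c in CLOSERS:
            if not expected or expected.pop() != c:
                raise SyntaxError(f"unbalanced {c!r} at offset {i}")
        elif c == '\n' and not expected:
            result.append(i)
    return result
```

For example, in `"f (a\n   b)\ng x\n"` the newline inside the parentheses is ignored and only the two later newlines are reported.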
The real challenge is trying not to read any characters unless truly necessary, 
so we can reuse the underlying readers, and that drives us towards LL-style and 
recursive descent parsers.
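The "remember the first character, read an n-expr, then compare" approach for "\\" versus "{\\}" could be sketched in Python as below. `read_simple_nexpr` is a hypothetical stand-in for a real n-expression reader (it only understands a bare symbol or a one-element `{x}`), and the `'MARKER'`/`'DATUM'` tags are invented labels. Note that the decision is made only after the whole n-expr has been read, which is exactly what fits poorly with lookahead-driven LL tools.

```python
# The two marker spellings; '\\\\' is Python for the two-character "\\".
MARKERS = {'\\\\', '$'}


def read_simple_nexpr(text):
    """Hypothetical stand-in for a real n-expression reader: handles only
    a bare symbol or a one-element curly form such as {x}."""
    if text.startswith('{') and text.endswith('}'):
        return text[1:-1].strip()
    return text


def classify(text):
    """Decide whether a token is a marker or an ordinary datum."""
    first = text[:1]                        # 1. remember the first character
    datum = read_simple_nexpr(text)         # 2. read the whole n-expr
    if first != '{' and datum in MARKERS:   # 3. compare
        return ('MARKER', datum)
    return ('DATUM', datum)
```

So a bare `$` comes back as a marker, while `{$}` yields the ordinary symbol `$`.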

--- David A. Wheeler

_______________________________________________
Readable-discuss mailing list
Readable-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/readable-discuss

Reply via email to