On Thu, Dec 20, 2012 at 12:16 AM, David A. Wheeler <dwhee...@dwheeler.com> wrote:
> Alan Manuel Gloria:
>> *shrug* yes, we should use some tool.
>>
>> For what it's worth, Haskell's Parsec library combined with Haskell's
>> Monad Transformer library allows using the same syntax for both INDENT
>> / DEDENT / SAME -guarded parsers, and basic parsers (like n-expr) that
>> don't have INDENT/DEDENT/SAME, and allowing the second to be used
>> inside the first.
>>
>> Parsec also defaults to LL(1), meaning 1-item lookahead, which is a
>> necessity in actual Scheme implementations, since we expect to require
>> only peek-char (it supports limited-length lookahead using the 'try'
>> combinator, so if we avoid that, we know it's strictly LL(1)). So if
>> we can get it working in Parsec, we can reasonably expect to get it
>> working in Scheme implementations without unget-char, only peek-char.
>
> I agree that having a parser with significant indent-processing capabilities
> would be a big plus.
>
> I didn't include Parsec for several reasons, all based on the fact that
> Parsec is totally tied to Haskell:
> * I have serious notation concerns. We want to create a spec that will be
>   read by others as part of the SRFI. ANTLR's notation is really excellent,
>   it looks "just like the books". APG's is nasty. When I look at the Parsec
>   example here:
>   http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/Parsing
>   it's clear that the Parsec notation is basically... Haskell. (Well, what a
>   surprise :-) ). I like Parsec's notation better than APG's, but Parsec's
>   notation is a REALLY different notation than "usual" BNFs and is not at all
>   "what the books use".
> * A *lot* of people don't grok Haskell, and it's certainly not my strongest
>   language either. I particularly worry that we'll need to handle certain
>   cases specially if we want to seriously implement the spec with the tool, and
>   at that point I'll end up throwing up my hands. I'm confident of my ability
>   to fiddle with ANTLR and its ilk, but not with Parsec/Haskell. There are
>   ports of Parsec, but they're tied to their languages too, and it's not clear
>   that the ports are as widely used/supported.
> * It'd also be nice to be able to generate Javascript, so that we could have
>   it working directly on the website. Parsec can't do that, again since it's
>   tied to Haskell. That's not as important as the other issues.
>
> I won't *categorically* rule out Parsec... just say that there were *reasons*
> I didn't seriously consider Parsec.
>
> Do you (or anyone else here) have experience with Parsec, ANTLR, or similar?
> I've used bison/yacc several times, and I've done recursive descent by hand,
> but I've never used an LL-based parsing tool.

I pretty much learned parsing from Parsec, so I'm actually more
familiar with Parsec than with the standard parsing syntax in books.
At its core it's just a recursive descent parser, with a major hack
depending on laziness so that it defaults to LL(1), and can support
limited-length LL(k) parsing.

>
>> You know, "guarded" parsers are not standard in parsing lore. So we
>> may need to hack support for INDENT / DEDENT/ SAME on whatever parser
>> generator we use. Ideally, we should be able to delete the actions of
>> the parser in the parser spec and the parser will still, at the
>> minimum, be able to either signal a parse completion, or a parse
>> failure. So even if we use a tool, I suspect that ideally, we would
>> have a translator on top of this tool (a preprocessor) that provides
>> INDENT/DEDENT/SAME or makes some valid transformation.
>
> I think you're right we'll probably need to handle indentation specially no
> matter what. Traditionally parsers have a lexing preprocessor, and it
> appears to me that most people just bake indentation handling into a
> preprocessor of some kind. Handling abbreviation+space is easily handled
> that way too.

Yes, but that just moves the magic somewhere else, and we can't use a
preprocessor since we are "supposed" to use only flat character
streams provided by Scheme.

Basically my idea is that INDENT / DEDENT / SAME is part of the parser
parser's syntax (i.e. not a feature of the parser spec, but a feature
of the parser parser, which the parser spec takes advantage of).

> I've done a little reading on ANTLR, which appears to be one of the major
> LL-based parsers around. Several people *have* implemented indentation
> processing in it as well, though it's certainly not a strength of ANTLR.
>
> A problem with any of these tools is that there are some complicating factors
> in sweet-expressions that make it easy to use and understand, but unusual to
> parse:
> * Indentation-sensitivity... but only outside character pairs

Yes, this is the big one.

> * The "\\" and "$" have a different semantic meaning from "{\\}" and "{$}".
>   One obvious way is to memorize the first character, read in an n-expr, and
>   then compare, but that doesn't sit well with traditional LL-tools.

Dunno, it seems to me that specifying a higher-priority parse for
SPLICE etc etc with a lower priority parse for n-expr etc etc should
work well enough.

> * The "$" and "\\" have slightly different semantics at beginning of line vs.
>   middle of line. By itself, trivial, but less so when combined with above.

Semantics, but not necessarily syntax ***I think***...

> * Abbreviation + space/eol after any indent has a special meaning
> * # <CHAR> can do so many things, e.g., #|...|#, and in some cases can set
>   the indent level.
> * ; on a line by itself is ignored.
> The real challenge is trying not to read any characters unless truly
> necessary, so we can reuse the underlying readers, and that drives us towards
> LL-style and recursive descent parsers.

Agree with these.

Sincerely,
AmkG

_______________________________________________
Readable-discuss mailing list
Readable-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/readable-discuss
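
A minimal sketch of the peek-char-only constraint discussed in this thread, written in Python rather than Scheme or Parsec purely for illustration. It is not code from the readable project. Every name here (PeekStream, read_indent, classify, indent_tokens) is hypothetical, and it makes simplifying assumptions: indentation consists of spaces only, blank lines are skipped, and the INDENT / DEDENT / SAME markers are collected in a list just to make the behaviour visible. The only point is that each line's indentation can be classified from a flat character stream with one character of lookahead and no unget-char.

```python
class PeekStream:
    """Hypothetical character source offering only peek and read,
    mirroring Scheme's peek-char / read-char (no unget)."""

    def __init__(self, text):
        self._text = text
        self._pos = 0

    def peek(self):
        # Next character without consuming it; '' at end of input.
        return self._text[self._pos] if self._pos < len(self._text) else ''

    def read(self):
        # Consume and return the next character; '' at end of input.
        ch = self.peek()
        if ch:
            self._pos += 1
        return ch


def read_indent(stream):
    """Consume leading spaces, deciding when to stop by peeking only."""
    width = 0
    while stream.peek() == ' ':
        stream.read()
        width += 1
    return width


def classify(width, stack):
    """Compare a line's indentation with the stack of open levels.
    Assumption: indentation is compared by width alone."""
    if width > stack[-1]:
        stack.append(width)
        return ['INDENT']
    if width == stack[-1]:
        return ['SAME']
    markers = []
    while width < stack[-1]:
        stack.pop()
        markers.append('DEDENT')
    if width != stack[-1]:
        raise ValueError('inconsistent indentation')
    return markers


def indent_tokens(text):
    """Return indentation markers interleaved with the line contents."""
    stream = PeekStream(text)
    stack = [0]
    out = []
    while stream.peek() != '':
        width = read_indent(stream)
        line = []
        while stream.peek() not in ('', '\n'):
            line.append(stream.read())
        stream.read()  # consume the newline, if there is one
        if not line:
            continue   # simplification: blank lines are skipped
        out.extend(classify(width, stack))
        out.append(''.join(line))
    # Close any levels still open at end of input.
    out.extend('DEDENT' for _ in stack[1:])
    return out
```

For the input "a\n  b\n  c\nd\n", indent_tokens returns SAME, a, INDENT, b, SAME, c, DEDENT, d. A real sweet-expression reader would also have to suspend this logic inside paired characters, which this sketch does not attempt.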