It's certainly easy to get the BNF wrong by hand. I took a quick look to see whether any parsing tools might help us with this. Minimum requirements: LL-based (so that implementing recursive descent later would be easy) and a GPL-compatible OSS license. Highly desirable: EBNF with a nice notation, so you can say hspace+ and so on. Also desired: maintained and widely used, and able to generate multiple languages (to avoid generators that are tied to one language). I then walked through this list to find candidates: http://en.wikipedia.org/wiki/Comparison_of_parser_generators
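As an aside on why the LL requirement matters: with an LL grammar, each EBNF rule maps almost mechanically onto one recursive-descent function, and a repetition like hspace+ becomes "match once, then loop". Here is a minimal Python sketch of that mapping. The rule "item : hspace+ NAME" and every function name are made up for illustration; they are not from our actual grammar.

```python
# Hypothetical EBNF rules, for illustration only (not from the real grammar):
#   item   : hspace+ NAME ;
#   hspace : ' ' | '\t' ;

class ParseError(Exception):
    pass

def parse_hspace_plus(text, pos):
    """hspace+ : at least one space or tab. Returns the new position."""
    start = pos
    while pos < len(text) and text[pos] in " \t":
        pos += 1
    if pos == start:
        raise ParseError(f"expected whitespace at offset {pos}")
    return pos

def parse_name(text, pos):
    """NAME : one or more letters. Returns (name, new position)."""
    start = pos
    while pos < len(text) and text[pos].isalpha():
        pos += 1
    if pos == start:
        raise ParseError(f"expected a name at offset {pos}")
    return text[start:pos], pos

def parse_item(text, pos=0):
    """item : hspace+ NAME. One function per rule, called in rule order."""
    pos = parse_hspace_plus(text, pos)
    return parse_name(text, pos)
```

Calling parse_item on a string with leading whitespace returns the name and the offset where parsing stopped; a string with no leading whitespace raises ParseError.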
Likely-useful tools from a first-pass sweep are ANTLR, Coco/R, Grammatica, JavaCC, and APG. I do *not* know whether any of these can handle INDENT/DEDENT guards, or handle indentation in some other way, which rather matters. ANTLR's LL(*) algorithm looks like it might let us get INDENT/DEDENT tokens out of it, by matching on indent characters and then taking some complex actions.

What do you think? Should we try to use a tool, like ANTLR? Alternatively, if all we need is to compute LL follow sets, a rough special-purpose tool could do that.

--- David A. Wheeler

****LIST OF TOOLS****

ANTLR - capable, BSD, LL(*). As a practical matter you must buy the book ($24 for the v3 electronic edition, inc. PDF). Extremely popular, active development, lively support. Many generated languages, including JavaScript (for v3). On Fedora, install "antlr3" (just "antlr" installs v2) - very convenient! Complication: it is in a major version transition; ANTLR v4 is due out Dec. 2012, and a v4 book draft is out. But v4 is Java-only for now, and its documentation looks really sketchy. It might be best to stick with v3 for now. Notation looks really clean.
http://antlr.org/
Example (v3):
  expr : term ( ( PLUS | MINUS ) term )* ;
  term : factor ( ( MULT | DIV ) factor )* ;
  factor : NUMBER ;
Example (v4) (it appears to have some LL left-recursion extensions):
  grammar Expr;
  prog: (expr NEWLINE)* ;
  expr: expr ('*'|'/') expr
      | expr ('+'|'-') expr
      | INT
      | '(' expr ')'
      ;

Coco/R - capable: Attributed grammar, LL(k). GPL (with exceptions, no prob). Different programs for different generated languages (ugh, but not relevant?)
Example:
  CompilationUnit = [ "package" Qualident ';' ] { ImportDeclaration } { TypeDeclaration } (. CODE HERE .) .
http://www.ssw.uni-linz.ac.at/Research/Projects/Coco/

Grammatica: "C# and Java parser generator (compiler compiler). It improves upon similar tools (like yacc and ANTLR) by creating well-commented and readable source code, by having automatic error recovery and detailed error messages, and by support for testing and debugging grammars without generating source code." "Grammatica supports LL(k) grammars with an unlimited number of look-ahead tokens." GNU LGPL.
Example:
  ImportList = "IMPORTS" SymbolsFromModule* ";" ;
http://grammatica.percederberg.net/

JavaCC: Claims "most popular" for Java. Originally by Sun. Originally wasn't OSS, now it is. Slightly ugly notation; productions have type declarations and look like Java. Reasonable if you're doing Java of course, but that's not the point in this case. You enter ".jj" files.
Example:
  void enumerator() : {} { <ID> ("=" constant_expression())? }

APG - capable, GPL, web version. Lots of capabilities. Can generate JavaScript, and that's a big plus. Ugh: uses prefix repetition notation *(...), a BNF notation I (Wheeler) really hate (it's not what the textbooks do, so it's painful to read).

NO:
AXE - Baked into C++.
HiLexed - looks unfinished.
SableCC - LALR.
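On the INDENT/DEDENT question: whichever tool we pick, one common approach (it is the one Python's own tokenizer uses) is to keep a stack of indentation widths in the lexer and emit INDENT/DEDENT tokens whenever the width changes. Below is a rough Python sketch of that idea, independent of any parser generator. The token names and the function name are mine, and it assumes spaces-only indentation; tabs, comments, and any special rules our notation needs are ignored here.

```python
def indent_tokens(source):
    """Yield (kind, text) tokens: INDENT, DEDENT, and LINE.

    Assumes indentation uses spaces only; blank lines are skipped.
    """
    stack = [0]  # indentation widths of the currently open blocks
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines never change the indentation level
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            yield ("INDENT", "")
        else:
            # Shallower: close blocks until we reach a matching width.
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", "")
            if width != stack[-1]:
                raise ValueError(f"inconsistent dedent: {raw!r}")
        yield ("LINE", raw.strip())
    # Close every block still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")
```

A generated LL parser could then treat INDENT and DEDENT like ordinary bracket tokens. Whether ANTLR or the others make it easy to plug in a hand-written token source like this is exactly the part I have not checked.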
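And on just computing follow sets: that probably does not need much of a tool. Here is a rough Python sketch of the textbook fixed-point FIRST/FOLLOW computation, run on the ANTLR v3 example grammar from the tool list, with the EBNF ( ... )* loops expanded by hand into plain BNF helper rules. The helper names expr_tail and term_tail, the "$" end marker, and the empty tuple for an empty production are conventions I picked for this sketch.

```python
EPS = "ε"   # marker meaning "can derive the empty string"
END = "$"   # end-of-input marker

# Each nonterminal maps to a list of productions; () is the empty production.
GRAMMAR = {
    "expr": [("term", "expr_tail")],
    "expr_tail": [("PLUS", "term", "expr_tail"), ("MINUS", "term", "expr_tail"), ()],
    "term": [("factor", "term_tail")],
    "term_tail": [("MULT", "factor", "term_tail"), ("DIV", "factor", "term_tail"), ()],
    "factor": [("NUMBER",)],
}

def first_of_sequence(symbols, first):
    """FIRST of a symbol sequence; includes EPS if the whole sequence can vanish."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # a terminal is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no set grows
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                new = first_of_sequence(prod, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:  # iterate until no set grows
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                for i, sym in enumerate(prod):
                    if sym not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    rest = first_of_sequence(prod[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]  # what follows nt can follow sym too
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

Calling compute_follow(GRAMMAR, "expr") returns a dict from each nonterminal to its FOLLOW set. Something this size would at least let us cross-check hand-written BNF, even if we end up not adopting a full generator.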