I'm happy to announce the CoCo/r for Python release candidate. CoCo/r is a scanner generator and LL(k) parser generator that has already been ported to many languages. CocoPy 1.1.0rc can be found in the Python Package Index.
CoCo/r for Python now passes all tests in the official COCO test suite (http://www.ssw.uni-linz.ac.at/coco/).

Features:
- The generated scanner and parser are completely independent. Either is easily replaced by a handwritten equivalent.
- The syntax for the target language specification is reminiscent of the Pascal family of languages.
- The production syntax is not the usual UNIX regular expression syntax. Instead, CoCo uses the much more readable EBNF syntax introduced by Niklaus Wirth. The same syntax is used for writing productions for both the regular (token syntax) and context-free (phrase or statement syntax) grammars.
- The specification language supports both inherited and synthesized attributes, and allows semantic actions to be embedded anywhere in a production.
- The generated scanners are DFAs.
- The generated parsers are recursive descent (hence the /r in CoCo/r).
- Features like comments and case (in)sensitivity are handled automatically.
- Lexemes may be context dependent.
- CoCo provides a unique and easy-to-use error recovery system, which addresses a difficult problem in recursive descent parsers.
- The developer can customize error messages and even include their own.

What follows is a very simple example compiler included in the distribution.
# ======================================= Calc.atg
COMPILER Calc
# ---------------------------------------------------
# Everything here goes into the parser class
   VARS = [ 0 ] * 1000   # Create an array to hold the variables

   def getSpix( self ):
      varName = self.token.val.upper()   # Grab the most recently parsed lexeme
      if len(varName) >= 2:
         return 26*(ord(varName[1])-ord('A'))+(ord(varName[0])-ord('A'))
      else:
         return ord(varName[0])-ord('A')

   def getNumber( self ):
      return int(self.token.val)

   def newVar( self, spix ):
      self.VARS[ spix ] = 0

   def getVar( self, spix ):
      return self.VARS[ spix ]

   def writeVal( self, val ):
      print val

   def readVal( self, spix ):
      self.VARS[ spix ] = int(raw_input( 'Read >' ))

   def setVar( self, spix, val ):
      self.VARS[ spix ] = val
# End of definitions for parser class
# -----------------------------------------

IGNORECASE

CHARACTERS
   letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
   digit  = "0123456789".
   eol    = CHR(13) .
   lf     = CHR(10) .

TOKENS
   ident  = letter {letter | digit} .
   number = digit {digit} .

COMMENTS
   FROM '--' TO eol

IGNORE eol + lf

PRODUCTIONS
   Calc =
      [Declarations] StatSeq .

   Declarations =
      'VAR'
      Ident<out spix>        (. self.newVar(spix) .)
      { ',' Ident<out spix>  (. self.newVar(spix) .)
      } ';' .

   StatSeq =
      Stat {';' Stat}.

   Stat =
        "READ"  Ident<out spix>              (. self.readVal(spix) .)
      | "WRITE" Expr<out val>                (. self.writeVal(val) .)
      | Ident<out spix> ":=" Expr<out val>   (. self.setVar(spix, val) .) .

   Expr<out exprVal> =
      Term<out exprVal>
      {   '+' Term<out termVal>   (. exprVal += termVal .)
        | '-' Term<out termVal>   (. exprVal -= termVal .)
      } .

   Term<out termVal> =
      Fact<out termVal>
      {   '*' Fact<out factVal>   (. termVal *= factVal .)
        | '/' Fact<out factVal>   (. termVal /= factVal .)
      } .

   Fact<out factVal> =
        Ident<out spix>           (. factVal = self.getVar(spix) .)
      | number                    (. factVal = self.getNumber() .)
      | '(' Expr<out factVal> ')' .

   Ident<out spix> =
      ident                       (. spix = self.getSpix() .) .

END Calc.
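
As a side note on the getSpix helper in the grammar above, here is a small stand-alone sketch of the arithmetic it performs when mapping an identifier to a slot in VARS. The function name slot_index is my own and is not part of CocoPy or the distributed example; it only copies the expression from Calc.atg so the mapping can be tried outside the generated parser.

```python
def slot_index(name):
    """Hypothetical stand-alone copy of the getSpix arithmetic from Calc.atg."""
    name = name.upper()                      # the grammar is IGNORECASE
    first = ord(name[0]) - ord('A')          # 0..25 for A..Z
    if len(name) >= 2:
        # The second character selects a block of 26 slots.
        return 26 * (ord(name[1]) - ord('A')) + first
    return first


for name in ("A", "Z", "AB", "zz"):
    print(name, slot_index(name))
```

Only the first two characters take part in the calculation, so longer names that share a two-character prefix share a slot. A digit in the second position gives a negative value, which Python list indexing accepts by counting from the end of VARS.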
# --------------------------------------------------------------- Sample input calc.inp
VAR
   A,B,C,D;

WRITE 1+(2*3)+4;
WRITE 100/10;
READ A;
WRITE A;
B := A*16;
WRITE B*2
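
To give a feel for what "recursive descent" means for this grammar, here is a rough hand-written sketch in Python 3 with one method per production of Calc.atg. It is not the code CocoPy generates and it does not use the CocoPy API. The names tokenize and CalcSketch are made up for this illustration, a regular expression stands in for the generated DFA scanner, variables live in a dict rather than the VARS array, READ takes its values from a list instead of the keyboard, and '/' is written as integer division.

```python
import re

KEYWORDS = ("VAR", "READ", "WRITE")
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z][A-Za-z0-9]*)|(:=|[-+*/();,]))")


def tokenize(text):
    """Regex stand-in for the scanner: returns a list of (kind, value) pairs."""
    text = re.sub(r"--[^\r\n]*", "", text).rstrip()   # drop '--' comments
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError("unexpected character at offset %d" % pos)
        number, ident, symbol = m.groups()
        if number is not None:
            tokens.append(("number", int(number)))
        elif ident is not None:
            word = ident.upper()                       # IGNORECASE
            tokens.append((word if word in KEYWORDS else "ident", word))
        else:
            tokens.append((symbol, symbol))
        pos = m.end()
    tokens.append(("EOF", None))
    return tokens


class CalcSketch:
    """One method per production; WRITE results are collected in a list."""

    def __init__(self, text, inputs=()):
        self.tokens = tokenize(text)
        self.i = 0
        self.vars = {}
        self.inputs = iter(inputs)     # values consumed by READ
        self.output = []

    def peek(self):
        return self.tokens[self.i][0]

    def expect(self, kind):
        tok_kind, value = self.tokens[self.i]
        if tok_kind != kind:
            raise SyntaxError("expected %s, got %s" % (kind, tok_kind))
        self.i += 1
        return value

    def calc(self):                    # Calc = [Declarations] StatSeq .
        if self.peek() == "VAR":
            self.declarations()
        self.stat()
        while self.peek() == ";":      # StatSeq = Stat {';' Stat} .
            self.expect(";")
            self.stat()
        self.expect("EOF")
        return self.output

    def declarations(self):            # 'VAR' Ident {',' Ident} ';'
        self.expect("VAR")
        self.vars[self.expect("ident")] = 0
        while self.peek() == ",":
            self.expect(",")
            self.vars[self.expect("ident")] = 0
        self.expect(";")

    def stat(self):                    # READ | WRITE | assignment
        if self.peek() == "READ":
            self.expect("READ")
            self.vars[self.expect("ident")] = int(next(self.inputs))
        elif self.peek() == "WRITE":
            self.expect("WRITE")
            self.output.append(self.expr())
        else:
            name = self.expect("ident")
            self.expect(":=")
            self.vars[name] = self.expr()

    def expr(self):                    # Term {'+' Term | '-' Term}
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.expect(self.peek())
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):                    # Fact {'*' Fact | '/' Fact}
        value = self.fact()
        while self.peek() in ("*", "/"):
            op = self.expect(self.peek())
            rhs = self.fact()
            value = value * rhs if op == "*" else value // rhs
        return value

    def fact(self):                    # Ident | number | '(' Expr ')'
        if self.peek() == "ident":
            return self.vars.get(self.expect("ident"), 0)
        if self.peek() == "number":
            return self.expect("number")
        self.expect("(")
        value = self.expr()
        self.expect(")")
        return value


sample = ("VAR A,B,C,D; WRITE 1+(2*3)+4; WRITE 100/10; "
          "READ A; WRITE A; B := A*16; WRITE B*2")
print(CalcSketch(sample, inputs=["3"]).calc())   # [11, 10, 3, 96]
```

Each EBNF repetition { ... } becomes a while loop and each alternative becomes an if branch on the next token, which is the same shape the productions have in the grammar. The sketch has no error recovery; it simply raises SyntaxError on the first unexpected token.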