On Apr 9, 7:19 am, "Michael Yanowitz" <[EMAIL PROTECTED]> wrote:
> Hello:
>
> I have been searching for an easy solution, and hopefully one
> has already been written, so I don't want to reinvent the wheel:
>
> Suppose I have a string of expressions such as:
> "((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))"
> I would like to split it up into something like:
> [ "OR",
>   "(($IP = "127.1.2.3") AND ($AX < 15))",
>   "(($IP = "127.1.2.4") AND ($AY != 0))" ]
>
> which I may then decide whether or not to further split into:
> [ "OR",
>   ["AND", "($IP = "127.1.2.3")", "($AX < 15)"],
>   ["AND", "($IP = "127.1.2.4")", "($AY != 0)"] ]
>
> Is there an easy way to do this?
> I tried using regular expressions, re, but I don't think it is
> recursive enough. I really want to break it up from:
> (E1 AND_or_OR E2) and make that into [AND_or_OR, E1, E2]
> and apply the same to E1 and E2 recursively until E1[0] != '('
>
> But the main problem I am running into is, how do I split this up
> by outer parentheses, so that I get the proper '(' and ')' to split
> this up correctly?
>
> Thanks in advance:
> Michael Yanowitz
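The outer-parentheses split asked about above can also be done by hand, by tracking nesting depth while scanning the string. What follows is a minimal sketch under stated assumptions: only AND and OR are supported (OR binding looser than AND), operators are whole words surrounded by whitespace, and double-quoted strings never contain escaped quotes. The function names strip_outer_parens, top_level_positions, split_once and split_recursive are hypothetical.

```python
# Minimal sketch: split a boolean expression at its outermost operator by
# tracking parenthesis depth. Assumes only AND/OR, whole-word operators,
# and double-quoted strings with no escaped quotes inside them.

def strip_outer_parens(expr):
    """Drop parentheses that wrap the whole expression (repeatedly)."""
    expr = expr.strip()
    if not (expr.startswith("(") and expr.endswith(")")):
        return expr
    depth = 0
    in_string = False
    for i, ch in enumerate(expr):
        if ch == '"':
            in_string = not in_string
        elif in_string:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and i != len(expr) - 1:
                # The first "(" closes before the end, e.g. "(a) AND (b)".
                return expr
    return strip_outer_parens(expr[1:-1])


def top_level_positions(expr, op):
    """Return the indices where `op` appears as a whole word at depth 0."""
    positions = []
    depth = 0
    in_string = False
    for i, ch in enumerate(expr):
        if ch == '"':
            in_string = not in_string
        elif in_string:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and expr.startswith(op, i):
            end = i + len(op)
            before_ok = i == 0 or expr[i - 1].isspace()
            after_ok = end == len(expr) or expr[end].isspace()
            if before_ok and after_ok:
                positions.append(i)
    return positions


def split_once(expr):
    """Split 'E1 OP E2' into [OP, E1, E2]; return the bare string if no OP."""
    expr = strip_outer_parens(expr)
    for op in ("OR", "AND"):  # lowest precedence first
        positions = top_level_positions(expr, op)
        if positions:
            i = positions[-1]  # last occurrence -> left-associative split
            return [op, expr[:i].strip(), expr[i + len(op):].strip()]
    return expr


def split_recursive(expr):
    """Apply split_once until only bare comparisons remain as leaves."""
    parts = split_once(expr)
    if isinstance(parts, str):
        return parts
    op, left, right = parts
    return [op, split_recursive(left), split_recursive(right)]


if __name__ == "__main__":
    test = '((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))'
    print(split_once(test))
    print(split_recursive(test))
```

split_once gives the first shape asked for (the two parenthesized halves of the OR). split_recursive keeps going, so its leaves come back as bare comparisons such as '$IP = "127.1.2.3"' without their parentheses. Splitting at the last top-level occurrence of the lowest-precedence operator keeps a chain like A AND B AND C left-associative.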
This problem is right down the pyparsing fairway! Pyparsing is a module for defining recursive-descent parsers, and it has some built-in help just for applications such as this.

You start by defining the basic elements of the text to be parsed. In your sample text, you are combining a number of relational comparisons, each made up of a variable name and a literal integer or quoted string. Using pyparsing classes, we define these:

    varName = Word("$", alphas, min=2)
    integer = Word("0123456789").setParseAction( lambda t : int(t[0]) )
    varVal = dblQuotedString | integer

varName is a "word" whose first character is a $, followed by 1 or more letters (min=2 is what requires at least one letter after the $). integer is a "word" made up of 1 or more digits, and we add a parse action to convert these to Python ints. varVal shows that a value can be an integer or a dblQuotedString (a common expression included with pyparsing).

Next we define the set of relational operators, and the comparison expression:

    relationalOp = oneOf("= < > >= <= !=")
    comparison = Group(varName + relationalOp + varVal)

oneOf takes care of testing longer operators such as ">=" before their prefixes such as ">", so the order in the string does not matter. Wrapping the comparison in Group keeps its three tokens together as a sublist, separate from the surrounding expressions.

Now for the most complicated part: using the operatorPrecedence helper from pyparsing. It is possible to create the recursive grammar explicitly, but this is another very common application, so pyparsing includes a helper for it too. Here is your set of operations defined using operatorPrecedence:

    boolExpr = operatorPrecedence( comparison,
        [
        ( "AND", 2, opAssoc.LEFT ),
        ( "OR", 2, opAssoc.LEFT ),
        ])

operatorPrecedence takes 2 arguments: the base-level or atom expression (in your case, the comparison expression), and a list of tuples listing the operators in descending order of precedence (highest first). Each tuple gives the operator, the number of operands (1 or 2), and whether it is right- or left-associative.
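To show roughly what operatorPrecedence builds for you (newer pyparsing releases also provide this helper as infixNotation), here is a minimal hand-written recursive-descent sketch in plain Python, with one parsing level per entry in the operator list, highest priority first. It assumes the same token shapes as the grammar above ($-prefixed names, double-quoted strings kept with their quotes, non-negative integers) and handles only AND and OR. The names TOKEN_RE, tokenize and Parser are hypothetical. Runs of the same left-associative operator are kept flat, as in [a, 'AND', b, 'AND', c], which is intended to mirror how pyparsing groups them.

```python
import re

# Hypothetical tokenizer for the same token shapes as the pyparsing grammar.
TOKEN_RE = re.compile(r'''
    \s*(?:
        (?P<var>\$[A-Za-z]+)          # $IP, $AX, ...
      | (?P<str>"[^"]*")              # "127.1.2.3" (quotes kept, like dblQuotedString)
      | (?P<int>\d+)                  # 15, 0
      | (?P<relop>>=|<=|!=|=|<|>)     # two-character operators tried first
      | (?P<paren>[()])
      | (?P<kw>\b(?:AND|OR)\b)
    )''', re.VERBOSE)


def tokenize(text):
    """Turn the input string into a list of (kind, value) pairs."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at {pos}: {text[pos:pos + 10]!r}")
        kind = m.lastgroup
        value = m.group(kind)
        tokens.append((kind, int(value) if kind == "int" else value))
        pos = m.end()
    return tokens


class Parser:
    # One entry per precedence level, highest priority first, in the same
    # order as the operator list passed to operatorPrecedence.
    LEVELS = ["AND", "OR"]

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def take(self, kind, value=None):
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind!r}, got {tok_value!r}")
        self.pos += 1
        return tok_value

    def parse(self):
        result = self.level(len(self.LEVELS) - 1)
        if self.pos != len(self.tokens):
            raise SyntaxError(f"unexpected trailing token {self.peek()[1]!r}")
        return result

    def level(self, n):
        """Parse operands joined by the operator at level n (n < 0: atom)."""
        if n < 0:
            return self.atom()
        op = self.LEVELS[n]
        items = [self.level(n - 1)]
        while self.peek() == ("kw", op):
            self.take("kw", op)
            items += [op, self.level(n - 1)]
        # A run such as a AND b AND c stays flat: [a, 'AND', b, 'AND', c].
        return items[0] if len(items) == 1 else items

    def atom(self):
        """A parenthesized expression or a single comparison."""
        if self.peek() == ("paren", "("):
            self.take("paren", "(")
            inner = self.level(len(self.LEVELS) - 1)
            self.take("paren", ")")
            return inner
        var = self.take("var")
        relop = self.take("relop")
        kind = self.peek()[0]
        if kind not in ("str", "int"):
            raise SyntaxError(f"expected a value after {var} {relop}")
        return [var, relop, self.take(kind)]


if __name__ == "__main__":
    test = '((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))'
    print(Parser(tokenize(test)).parse())
```

For the test string this returns the same nesting as the asList() output printed in the reply, minus its outermost list: parseString returns a list of top-level tokens, while Parser.parse returns the single top-level expression. Adding a precedence level is just another entry in LEVELS, which is the bookkeeping operatorPrecedence does for you.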
Now the only thing left to do is use boolExpr to parse your test string:

    results = boolExpr.parseString('((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))')

pyparsing returns parsed tokens as a rich object of type ParseResults. This object can be accessed as a list, a dict, or an object instance with named attributes. For this example, we'll actually create a nested list using the ParseResults asList method. Passing this list to the pprint module:

    pprint.pprint( results.asList() )

prints:

    [[[['$IP', '=', '"127.1.2.3"'], 'AND', ['$AX', '<', 15]], 'OR', [['$IP', '=', '"127.1.2.4"'], 'AND', ['$AY', '!=', 0]]]]

Here is the whole program in one chunk (I also added support for NOT, which has higher priority than AND and is right-associative):

    import pprint
    from pyparsing import (oneOf, Word, alphas, dblQuotedString, nums,
                           Group, operatorPrecedence, opAssoc)

    test = '((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY != 0)))'

    # Basic elements: "$" plus at least one letter, ints, quoted strings
    varName = Word("$", alphas, min=2)
    integer = Word(nums).setParseAction( lambda t : int(t[0]) )
    varVal = dblQuotedString | integer

    # One relational comparison, kept together as a group
    relationalOp = oneOf("= < > >= <= !=")
    comparison = Group(varName + relationalOp + varVal)

    # Operators in descending order of precedence; newer pyparsing
    # releases also provide this helper under the name infixNotation
    boolExpr = operatorPrecedence( comparison,
        [
        ( "NOT", 1, opAssoc.RIGHT ),
        ( "AND", 2, opAssoc.LEFT ),
        ( "OR", 2, opAssoc.LEFT ),
        ])

    pprint.pprint( boolExpr.parseString(test).asList() )

The pyparsing wiki includes some related examples, SimpleBool.py and SimpleArith.py; go to http://pyparsing.wikispaces.com/Examples.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
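Once you have the nested list from asList(), a few lines of recursive Python are enough to evaluate it against a set of variable values. This is a minimal sketch: the evaluate function, the RELOPS table, and the sample variable bindings are hypothetical. It assumes the node shapes visible in the printed asList() output ([var, relop, value] comparisons and flat [a, 'AND', b, ...] runs), plus ['NOT', operand] for the NOT level, and it strips the surrounding quotes that dblQuotedString leaves on string values.

```python
import operator

# Relational operators from the grammar mapped to Python functions.
RELOPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def evaluate(node, env):
    """Evaluate one node of the nested list against the bindings in env.

    Assumed node shapes:
      [var, relop, value]            a comparison
      ['NOT', operand]               unary NOT
      [a, 'AND', b, 'AND', c, ...]   a run of one binary operator
    """
    if len(node) == 2 and node[0] == "NOT":
        return not evaluate(node[1], env)
    if isinstance(node[0], str) and node[0].startswith("$"):
        var, relop, value = node
        if isinstance(value, str):
            value = value[1:-1]  # dblQuotedString keeps its quotes
        return RELOPS[relop](env[var], value)
    result = evaluate(node[0], env)
    # Walk the (operator, operand) pairs left to right; "and"/"or" short-circuit.
    for op, operand in zip(node[1::2], node[2::2]):
        if op == "AND":
            result = result and evaluate(operand, env)
        elif op == "OR":
            result = result or evaluate(operand, env)
        else:
            raise ValueError(f"unknown operator {op!r}")
    return result


if __name__ == "__main__":
    # The asList() output printed above; [0] unwraps the top-level list.
    parsed = [[[['$IP', '=', '"127.1.2.3"'], 'AND', ['$AX', '<', 15]], 'OR',
               [['$IP', '=', '"127.1.2.4"'], 'AND', ['$AY', '!=', 0]]]]
    env = {"$IP": "127.1.2.4", "$AX": 20, "$AY": 3}  # hypothetical values
    print(evaluate(parsed[0], env))  # True: the second AND branch matches
```

With these bindings the first AND branch fails on the $IP test, while the second one succeeds, so the OR as a whole is true.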