Re: parallel class structures for AST-based objects
Steve Howell schrieb: On Nov 21, 4:07 pm, MRAB pyt...@mrabarnett.plus.com wrote: I don't see the point of EvalNode and PrettyPrintNode. Why don't you just give Integer, Sum and Product 'eval' and 'pprint' methods? That's a good question, and it's the crux of my design dilemma. If ALL I ever wanted to to with Integer/Sum/Product was to eval() and pprint(), then I would just add those methods to Integer, Sum, and Product, and be done with it, as you suggest. But what happens when somebody wants to extend capability? Should every future software developer that wants to use Integer/Sum/Product extend those classes to get work done? What if they want to store additional state for nodes? What's usually done is to create visitors/matchers. Those traverse the AST, and either only visit, or return transformed versions of it (think e.g. algebraic optimization) A visitor will roughly look like this: class ASTVisitor(object): def visit(self, node): name = node.__class__.__name__.lower() if hasattr(self, visit_%s % name): getattr(self, visit_%s % name)(node) for child in node: self.visit(child) You can of course chose another type of dispatch, using e.g. a generic method. Then you create Visitors for specific tasks - pretty-printing, evaluation, rewriting. Those don't have the overhead of your current design with all those factory-mapping stuff, and yet you aren't forced to put logic into AST you don't want there. Diez -- http://mail.python.org/mailman/listinfo/python-list
Re: parallel class structures for AST-based objects
On Sun, Nov 22, 2009 at 4:50 AM, Diez B. Roggisch de...@nospam.web.de wrote: Steve Howell schrieb: On Nov 21, 4:07 pm, MRAB pyt...@mrabarnett.plus.com wrote: I don't see the point of EvalNode and PrettyPrintNode. Why don't you just give Integer, Sum and Product 'eval' and 'pprint' methods? That's a good question, and it's the crux of my design dilemma. If ALL I ever wanted to to with Integer/Sum/Product was to eval() and pprint(), then I would just add those methods to Integer, Sum, and Product, and be done with it, as you suggest. But what happens when somebody wants to extend capability? Should every future software developer that wants to use Integer/Sum/Product extend those classes to get work done? What if they want to store additional state for nodes? What's usually done is to create visitors/matchers. Those traverse the AST, and either only visit, or return transformed versions of it (think e.g. algebraic optimization) A visitor will roughly look like this: class ASTVisitor(object): def visit(self, node): name = node.__class__.__name__.lower() if hasattr(self, visit_%s % name): getattr(self, visit_%s % name)(node) for child in node: self.visit(child) You can of course chose another type of dispatch, using e.g. a generic method. Then you create Visitors for specific tasks - pretty-printing, evaluation, rewriting. Those don't have the overhead of your current design with all those factory-mapping stuff, and yet you aren't forced to put logic into AST you don't want there. FWIW I often use John Aycock's SPARK (Scanning, Parsing, and Rewriting Kit) for this sort of thing. It has a GenericASTTraversal which is a Visitor pattern according to Design Patterns. http://pages.cpsc.ucalgary.ca/~aycock/spark/ It's apparently distributed with the python source, but it's not in the standard library, more's the pity IMO. There's a bit of a learning curve but it's well worth it. ~Simon -- http://mail.python.org/mailman/listinfo/python-list
Re: parallel class structures for AST-based objects
On Nov 22, 7:55 am, Simon Forman sajmik...@gmail.com wrote: On Sun, Nov 22, 2009 at 4:50 AM, Diez B. Roggisch de...@nospam.web.de wrote: Steve Howell schrieb: On Nov 21, 4:07 pm, MRAB pyt...@mrabarnett.plus.com wrote: I don't see the point of EvalNode and PrettyPrintNode. Why don't you just give Integer, Sum and Product 'eval' and 'pprint' methods? That's a good question, and it's the crux of my design dilemma. If ALL I ever wanted to to with Integer/Sum/Product was to eval() and pprint(), then I would just add those methods to Integer, Sum, and Product, and be done with it, as you suggest. But what happens when somebody wants to extend capability? Should every future software developer that wants to use Integer/Sum/Product extend those classes to get work done? What if they want to store additional state for nodes? What's usually done is to create visitors/matchers. Those traverse the AST, and either only visit, or return transformed versions of it (think e.g. algebraic optimization) A visitor will roughly look like this: class ASTVisitor(object): def visit(self, node): name = node.__class__.__name__.lower() if hasattr(self, visit_%s % name): getattr(self, visit_%s % name)(node) for child in node: self.visit(child) You can of course chose another type of dispatch, using e.g. a generic method. Then you create Visitors for specific tasks - pretty-printing, evaluation, rewriting. Those don't have the overhead of your current design with all those factory-mapping stuff, and yet you aren't forced to put logic into AST you don't want there. FWIW I often use John Aycock's SPARK (Scanning, Parsing, and Rewriting Kit) for this sort of thing. It has a GenericASTTraversal which is a Visitor pattern according to Design Patterns. http://pages.cpsc.ucalgary.ca/~aycock/spark/ It's apparently distributed with the python source, but it's not in the standard library, more's the pity IMO. There's a bit of a learning curve but it's well worth it. Thanks, Simon, I think something like GenericASTTraversal should work for me in most cases. A couple of folks have suggested an idea that is in his paper, which is to use a single class for something PrettyPrinter, and then use reflection to find methods on PrettyPrinter to pretty-print sums, products, integers, etc. -- http://mail.python.org/mailman/listinfo/python-list
Re: parallel class structures for AST-based objects
Steve Howell wrote: I have been writing some code that parses a mini-language, and I am running into what I know is a pretty common design pattern problem, but I am wondering the most Pythonic way to solve it. Basically, I have a bunch of really simple classes that work together to define an expression--in my oversimplified example code below, those are Integer, Sum, and Product. Then I want to write different modules that work with those expression objects. In my example, I have a parallel set of classes that enable you to evaluate an expression, and another set of objects that enable you to pretty-print the expression. The below code works as intended so far (tested under 2.6), but before I go too much further with this design, I want to get a sanity check and some ideas on other ways to represent the interrelationships within the code. Basically, the issue here is that you have varying behavior in two dimensions--a node right now is only a Product/Integer/ Sum so far, but I might want to introduce new concepts like Difference, Quotient, etc. And right now the only things that you can do to expressions is eval() them and pprint() them, but you eventually might want to operate on the expressions in new ways, including fairly abstract operations that go beyond a simple walking of the tree. Here is the code: ### # base classes just represents the expression itself, which # get created by a parser or unit tests # example usage is # expression = Product(Sum(Integer(5),Integer(2)), Integer(6)) class Integer: def __init__(self, val): self.val = val class BinaryOp: def __init__(self, a,b): self.a = a self.b = b class Sum(BinaryOp): pass class Product(BinaryOp): pass class EvalNode: def __init__(self, node): self.node = node def evaluatechild(self, child): return EvalNode.factory(child).eval() @staticmethod def factory(child): mapper = { 'Sum': SumEvalNode, 'Product': ProductEvalNode, 'Integer': IntegerEvalNode } return abstract_factory(child, mapper) class SumEvalNode(EvalNode): def eval(self): a = self.evaluatechild(self.node.a) b = self.evaluatechild(self.node.b) return a + b class ProductEvalNode(EvalNode): def eval(self): a = self.evaluatechild(self.node.a) b = self.evaluatechild(self.node.b) return a * b class IntegerEvalNode(EvalNode): def eval(self): return self.node.val ### class PrettyPrintNode: def __init__(self, node): self.node = node def pprint_child(self, child): return PrettyPrintNode.factory(child).pprint() @staticmethod def factory(child): mapper = { 'Sum': SumPrettyPrintNode, 'Product': ProductPrettyPrintNode, 'Integer': IntegerPrettyPrintNode } return abstract_factory(child, mapper) class SumPrettyPrintNode(PrettyPrintNode): def pprint(self): a = self.pprint_child(self.node.a) b = self.pprint_child(self.node.b) return '(the sum of %s and %s)' % (a, b) class ProductPrettyPrintNode(PrettyPrintNode): def pprint(self): a = self.pprint_child(self.node.a) b = self.pprint_child(self.node.b) return '(the product of %s and %s)' % (a, b) class IntegerPrettyPrintNode(PrettyPrintNode): def pprint(self): return self.node.val ## # Not sure where this method really wants to be structurally, # or what it should be named, but it reduces some duplication def abstract_factory(node, node_class_mapper): return node_class_mapper[node.__class__.__name__](node) expression = Product(Sum(Integer(5),Integer(2)), Integer(6)) evaluator = EvalNode.factory(expression) print evaluator.eval() pprinter = PrettyPrintNode.factory(expression) print pprinter.pprint() I don't see the point of EvalNode and PrettyPrintNode. Why don't you just give Integer, Sum and Product 'eval' and 'pprint' methods? -- http://mail.python.org/mailman/listinfo/python-list
Re: parallel class structures for AST-based objects
On Nov 21, 4:07 pm, MRAB pyt...@mrabarnett.plus.com wrote: I don't see the point of EvalNode and PrettyPrintNode. Why don't you just give Integer, Sum and Product 'eval' and 'pprint' methods? That's a good question, and it's the crux of my design dilemma. If ALL I ever wanted to to with Integer/Sum/Product was to eval() and pprint(), then I would just add those methods to Integer, Sum, and Product, and be done with it, as you suggest. But what happens when somebody wants to extend capability? Should every future software developer that wants to use Integer/Sum/Product extend those classes to get work done? What if they want to store additional state for nodes? -- http://mail.python.org/mailman/listinfo/python-list
Re: parallel class structures for AST-based objects
On 22 Nov, 00:07, MRAB pyt...@mrabarnett.plus.com wrote: Steve Howell wrote: I have been writing some code that parses a mini-language, and I am running into what I know is a pretty common design pattern problem, but I am wondering the most Pythonic way to solve it. Basically, I have a bunch of really simple classes that work together to define an expression--in my oversimplified example code below, those are Integer, Sum, and Product. Then I want to write different modules that work with those expression objects. In my example, I have a parallel set of classes that enable you to evaluate an expression, and another set of objects that enable you to pretty-print the expression. The below code works as intended so far (tested under 2.6), but before I go too much further with this design, I want to get a sanity check and some ideas on other ways to represent the interrelationships within the code. Basically, the issue here is that you have varying behavior in two dimensions--a node right now is only a Product/Integer/ Sum so far, but I might want to introduce new concepts like Difference, Quotient, etc. And right now the only things that you can do to expressions is eval() them and pprint() them, but you eventually might want to operate on the expressions in new ways, including fairly abstract operations that go beyond a simple walking of the tree. Here is the code: ### # base classes just represents the expression itself, which # get created by a parser or unit tests # example usage is # expression = Product(Sum(Integer(5),Integer(2)), Integer(6)) class Integer: def __init__(self, val): self.val = val class BinaryOp: def __init__(self, a,b): self.a = a self.b = b class Sum(BinaryOp): pass class Product(BinaryOp): pass class EvalNode: def __init__(self, node): self.node = node def evaluatechild(self, child): return EvalNode.factory(child).eval() @staticmethod def factory(child): mapper = { 'Sum': SumEvalNode, 'Product': ProductEvalNode, 'Integer': IntegerEvalNode } return abstract_factory(child, mapper) class SumEvalNode(EvalNode): def eval(self): a = self.evaluatechild(self.node.a) b = self.evaluatechild(self.node.b) return a + b class ProductEvalNode(EvalNode): def eval(self): a = self.evaluatechild(self.node.a) b = self.evaluatechild(self.node.b) return a * b class IntegerEvalNode(EvalNode): def eval(self): return self.node.val ### class PrettyPrintNode: def __init__(self, node): self.node = node def pprint_child(self, child): return PrettyPrintNode.factory(child).pprint() @staticmethod def factory(child): mapper = { 'Sum': SumPrettyPrintNode, 'Product': ProductPrettyPrintNode, 'Integer': IntegerPrettyPrintNode } return abstract_factory(child, mapper) class SumPrettyPrintNode(PrettyPrintNode): def pprint(self): a = self.pprint_child(self.node.a) b = self.pprint_child(self.node.b) return '(the sum of %s and %s)' % (a, b) class ProductPrettyPrintNode(PrettyPrintNode): def pprint(self): a = self.pprint_child(self.node.a) b = self.pprint_child(self.node.b) return '(the product of %s and %s)' % (a, b) class IntegerPrettyPrintNode(PrettyPrintNode): def pprint(self): return self.node.val ## # Not sure where this method really wants to be structurally, # or what it should be named, but it reduces some duplication def abstract_factory(node, node_class_mapper): return node_class_mapper[node.__class__.__name__](node) expression = Product(Sum(Integer(5),Integer(2)), Integer(6)) evaluator = EvalNode.factory(expression) print evaluator.eval() pprinter = PrettyPrintNode.factory(expression) print pprinter.pprint() I don't see the point of EvalNode and PrettyPrintNode. Why don't you just give Integer, Sum and Product 'eval' and 'pprint' methods? This looks more structurally sound: class Node(object): def eval(self): raise NotImplementedError def pprint(self): raise NotImplementedError class BinaryOperatorNode(Node): operator = None def __init__(self, first, second): self.first = first self.second = second def eval(self): return self.operator(self.first.eval(),
Re: parallel class structures for AST-based objects
On Nov 21, 4:33 pm, Richard Thomas chards...@gmail.com wrote: This looks more structurally sound: class Node(object): def eval(self): raise NotImplementedError def pprint(self): raise NotImplementedError My objection to the interface you describe is that Node defines the type of operations that can be done to it by third-party code, which is something that I cannot predict, and which (pscouples every subclass of Node to eval() and pprint(), even though those subclasses might not care about either operation. They might be reimplementing only one of those operations, or some completely different operation. class Sum(BinaryOperatorNode): operator = lambda x, y: x + y class Product(BinaryOperatorNode): operator = lambda x, y: x * y I do like the notion of driving up Sum/Product abstractions like operator into BinaryOperatorNode, but that is sort of an artifact of my toy example, not my main design dilemma. I don't know what you're doing exactly but if all you need is to be able to parse and evaluate expressions then you can get very decent mileage out of overriding operators, to the extent that the whole thing you are trying to do could be a single class... class Expression(object): def __init__(self, func): self.func = func def __call__(self, **context): while isinstance(self, Expression): self = self.func(context) return self def __add__(self, other): return Expression(lambda context: self.func(context) + other) def __mul__(self, other): [snipped] It is my fault for muddying the conversation with a toy example that evaluates arithmetic expressions. My particular problem involves a mini-language that describes how you want to descend Python data structures. But maybe that's not what you need. No need to overengineer if it is though, keep it simple, simple is better than complex. Yep! I am trying to keep things simple, but my wish to extend is not speculative at this point; it is real. I have an expression syntax, which I call pyDTL, that I want these operations on: * generate two different versions of Python code that expresses the operation * eagerly evaluate the expression on some object * pretty-print the expression itself * use the expression as a prototype for stub objects to determine what operations they allow * etc All of those requirements create the need to somehow create new objects or functions that correspond to the same AST. I describe pyDTL here: http://showellonprogramming.blogspot.com/2009/11/mini-ddl-for-python.html Here is a simple example: echo '{.foo, .bar(){.spam, .eggs} }' | python dtl/parse.py dict( foo = obj.foo, bar = (lambda bar: dict( spam = bar.spam, eggs = bar.eggs, ))(obj.bar()), ) ### -- http://mail.python.org/mailman/listinfo/python-list
Re: parallel class structures for AST-based objects
Steve Howell wrote: My objection to the interface you describe is that Node defines the type of operations that can be done to it by third-party code, which is something that I cannot predict I think you have the right idea with a mapping from node classes to implementations of operations, but your intermediate classes SumPrettyPrintNode etc. are not necessary. Just put functions implementing the operations directly into the mapping. Another thing is that the mappings should be somewhere global that can be extended, perhaps through a registration interface. Putting the mapping dicts inside the node classes doesn't gain you anything over simply using methods. All you've done is reimplement method dispatch. A third thing is that instead of just looking up the node class in the mapping, you may want to walk up the MRO and try each of the base classes until you find a match. That way, nodes will effectively inherit implementations from their base classes for any operations that they don't explicitly implement. This inheritance behaviour becomes important if you have two developers A and B, where A adds new classes and B adds new operations. If A and B don't talk to each other, you necessarily end up with gaps in the matrix: A's classes won't have implementations for B's operations. The best you can hope for is that the missing operations will somehow fall back gracefully on default implementations. Inheritance allows this to be achieved: if A derives all his classes from existing node classes, and B provides default implementations of all his operations for the root Node class, then some implementation will always be found for every class/operation combination. Now, there's actually a very easy way of implementing all this. Your registration interface could simply be def register_operation(klass, operation_name, function): setattr(klass, operation_name, function) In other words, monkey-patch the classes to add the new functions directly as methods. Since it's so simple, you could even do away with the registration function altogether and just have developers insert methods directly into the classes. Some people might consider this an ugly hack, but the end result is almost exactly the same, it's very simple, and it's very efficient, making use of Python's existing method dispatch machinery instead of reinventing it yourself. One reason you might want to keep the registration function, even if all it's doing is modifying the classes, is to make it easier to find where operations are being defined, by searching for calls to register_operation(). -- Greg -- http://mail.python.org/mailman/listinfo/python-list