Re: parallel class structures for AST-based objects

2009-11-22 Thread Diez B. Roggisch

Steve Howell schrieb:

On Nov 21, 4:07 pm, MRAB pyt...@mrabarnett.plus.com wrote:

I don't see the point of EvalNode and PrettyPrintNode. Why don't you
just give Integer, Sum and Product 'eval' and 'pprint' methods?


That's a good question, and it's the crux of my design dilemma.  If
ALL I ever wanted to to with Integer/Sum/Product was to eval() and
pprint(), then I would just add those methods to Integer, Sum, and
Product, and be done with it, as you suggest.  But what happens when
somebody wants to extend capability?  Should every future software
developer that wants to use Integer/Sum/Product extend those classes
to get work done?  What if they want to store additional state for
nodes?



What's usually done is to create visitors/matchers. Those traverse the 
AST, and either only visit, or return transformed versions of it (think 
e.g. algebraic optimization)


A visitor will roughly look like this:


class ASTVisitor(object):



   def visit(self, node):
   name = node.__class__.__name__.lower()
   if hasattr(self, visit_%s % name):
   getattr(self, visit_%s % name)(node)
   for child in node:
   self.visit(child)



You can of course chose another type of dispatch, using e.g. a generic 
method.


Then you create Visitors for specific tasks - pretty-printing, 
evaluation, rewriting. Those don't have the overhead of your current 
design with all those factory-mapping stuff, and yet you aren't forced 
to put logic into AST you don't want there.



Diez

--
http://mail.python.org/mailman/listinfo/python-list


Re: parallel class structures for AST-based objects

2009-11-22 Thread Simon Forman
On Sun, Nov 22, 2009 at 4:50 AM, Diez B. Roggisch de...@nospam.web.de wrote:
 Steve Howell schrieb:

 On Nov 21, 4:07 pm, MRAB pyt...@mrabarnett.plus.com wrote:

 I don't see the point of EvalNode and PrettyPrintNode. Why don't you
 just give Integer, Sum and Product 'eval' and 'pprint' methods?

 That's a good question, and it's the crux of my design dilemma.  If
 ALL I ever wanted to to with Integer/Sum/Product was to eval() and
 pprint(), then I would just add those methods to Integer, Sum, and
 Product, and be done with it, as you suggest.  But what happens when
 somebody wants to extend capability?  Should every future software
 developer that wants to use Integer/Sum/Product extend those classes
 to get work done?  What if they want to store additional state for
 nodes?


 What's usually done is to create visitors/matchers. Those traverse the AST,
 and either only visit, or return transformed versions of it (think e.g.
 algebraic optimization)

 A visitor will roughly look like this:


 class ASTVisitor(object):



   def visit(self, node):
       name = node.__class__.__name__.lower()
       if hasattr(self, visit_%s % name):
           getattr(self, visit_%s % name)(node)
       for child in node:
           self.visit(child)



 You can of course chose another type of dispatch, using e.g. a generic
 method.

 Then you create Visitors for specific tasks - pretty-printing, evaluation,
 rewriting. Those don't have the overhead of your current design with all
 those factory-mapping stuff, and yet you aren't forced to put logic into AST
 you don't want there.



FWIW I often use John Aycock's SPARK (Scanning, Parsing, and Rewriting
Kit) for this sort of thing.  It has a GenericASTTraversal which is a
Visitor pattern according to Design Patterns.

http://pages.cpsc.ucalgary.ca/~aycock/spark/

It's apparently distributed with the python source, but it's not in
the standard library, more's the pity IMO.

There's a bit of a learning curve but it's well worth it.

~Simon
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parallel class structures for AST-based objects

2009-11-22 Thread Steve Howell
On Nov 22, 7:55 am, Simon Forman sajmik...@gmail.com wrote:
 On Sun, Nov 22, 2009 at 4:50 AM, Diez B. Roggisch de...@nospam.web.de wrote:



  Steve Howell schrieb:

  On Nov 21, 4:07 pm, MRAB pyt...@mrabarnett.plus.com wrote:

  I don't see the point of EvalNode and PrettyPrintNode. Why don't you
  just give Integer, Sum and Product 'eval' and 'pprint' methods?

  That's a good question, and it's the crux of my design dilemma.  If
  ALL I ever wanted to to with Integer/Sum/Product was to eval() and
  pprint(), then I would just add those methods to Integer, Sum, and
  Product, and be done with it, as you suggest.  But what happens when
  somebody wants to extend capability?  Should every future software
  developer that wants to use Integer/Sum/Product extend those classes
  to get work done?  What if they want to store additional state for
  nodes?

  What's usually done is to create visitors/matchers. Those traverse the AST,
  and either only visit, or return transformed versions of it (think e.g.
  algebraic optimization)

  A visitor will roughly look like this:

  class ASTVisitor(object):

    def visit(self, node):
        name = node.__class__.__name__.lower()
        if hasattr(self, visit_%s % name):
            getattr(self, visit_%s % name)(node)
        for child in node:
            self.visit(child)

  You can of course chose another type of dispatch, using e.g. a generic
  method.

  Then you create Visitors for specific tasks - pretty-printing, evaluation,
  rewriting. Those don't have the overhead of your current design with all
  those factory-mapping stuff, and yet you aren't forced to put logic into AST
  you don't want there.

 FWIW I often use John Aycock's SPARK (Scanning, Parsing, and Rewriting
 Kit) for this sort of thing.  It has a GenericASTTraversal which is a
 Visitor pattern according to Design Patterns.

 http://pages.cpsc.ucalgary.ca/~aycock/spark/

 It's apparently distributed with the python source, but it's not in
 the standard library, more's the pity IMO.

 There's a bit of a learning curve but it's well worth it.


Thanks, Simon, I think something like GenericASTTraversal should work
for me in most cases.

A couple of folks have suggested an idea that is in his paper, which
is to use a single class for something PrettyPrinter, and then use
reflection to find methods on PrettyPrinter to pretty-print sums,
products, integers, etc.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parallel class structures for AST-based objects

2009-11-21 Thread MRAB

Steve Howell wrote:

I have been writing some code that parses a mini-language, and I am
running into what I know is a pretty common design pattern problem,
but I am wondering the most Pythonic way to solve it.

Basically, I have a bunch of really simple classes that work together
to define an expression--in my oversimplified example code below,
those are Integer, Sum, and Product.

Then I want to write different modules that work with those expression
objects.  In my example, I have a parallel set of classes that enable
you to evaluate an expression, and another set of objects that enable
you to pretty-print the expression.

The below code works as intended so far (tested under 2.6), but before
I go too much further with this design, I want to get a sanity check
and some ideas on other ways to represent the interrelationships
within the code.  Basically, the issue here is that you have varying
behavior in two dimensions--a node right now is only a Product/Integer/
Sum so far, but I might want to introduce new concepts like
Difference, Quotient, etc.  And right now the only things that you can
do to expressions is eval() them and pprint() them, but you eventually
might want to operate on the expressions in new ways, including fairly
abstract operations that go beyond a simple walking of the tree.

Here is the code:

###
# base classes just represents the expression itself, which
# get created by a parser or unit tests
# example usage is
# expression = Product(Sum(Integer(5),Integer(2)), Integer(6))
class Integer:
def __init__(self, val):
self.val = val

class BinaryOp:
def __init__(self, a,b):
self.a = a
self.b = b

class Sum(BinaryOp):
pass

class Product(BinaryOp):
pass



class EvalNode:
def __init__(self, node):
self.node = node

def evaluatechild(self, child):
return EvalNode.factory(child).eval()

@staticmethod
def factory(child):
mapper = {
'Sum': SumEvalNode,
'Product': ProductEvalNode,
'Integer': IntegerEvalNode
}
return abstract_factory(child, mapper)

class SumEvalNode(EvalNode):
def eval(self):
a = self.evaluatechild(self.node.a)
b = self.evaluatechild(self.node.b)
return a + b

class ProductEvalNode(EvalNode):
def eval(self):
a = self.evaluatechild(self.node.a)
b = self.evaluatechild(self.node.b)
return a * b

class IntegerEvalNode(EvalNode):
def eval(self): return self.node.val

###

class PrettyPrintNode:
def __init__(self, node):
self.node = node

def pprint_child(self, child):
return PrettyPrintNode.factory(child).pprint()

@staticmethod
def factory(child):
mapper = {
'Sum': SumPrettyPrintNode,
'Product': ProductPrettyPrintNode,
'Integer': IntegerPrettyPrintNode
}
return abstract_factory(child, mapper)

class SumPrettyPrintNode(PrettyPrintNode):
def pprint(self):
a = self.pprint_child(self.node.a)
b = self.pprint_child(self.node.b)
return '(the sum of %s and %s)' % (a, b)

class ProductPrettyPrintNode(PrettyPrintNode):
def pprint(self):
a = self.pprint_child(self.node.a)
b = self.pprint_child(self.node.b)
return '(the product of %s and %s)' % (a, b)

class IntegerPrettyPrintNode(PrettyPrintNode):
def pprint(self): return self.node.val

##
# Not sure where this method really wants to be structurally,
# or what it should be named, but it reduces some duplication

def abstract_factory(node, node_class_mapper):
return node_class_mapper[node.__class__.__name__](node)


expression = Product(Sum(Integer(5),Integer(2)), Integer(6))

evaluator = EvalNode.factory(expression)
print evaluator.eval()

pprinter = PrettyPrintNode.factory(expression)
print pprinter.pprint()


I don't see the point of EvalNode and PrettyPrintNode. Why don't you
just give Integer, Sum and Product 'eval' and 'pprint' methods?
--
http://mail.python.org/mailman/listinfo/python-list


Re: parallel class structures for AST-based objects

2009-11-21 Thread Steve Howell
On Nov 21, 4:07 pm, MRAB pyt...@mrabarnett.plus.com wrote:

 I don't see the point of EvalNode and PrettyPrintNode. Why don't you
 just give Integer, Sum and Product 'eval' and 'pprint' methods?

That's a good question, and it's the crux of my design dilemma.  If
ALL I ever wanted to to with Integer/Sum/Product was to eval() and
pprint(), then I would just add those methods to Integer, Sum, and
Product, and be done with it, as you suggest.  But what happens when
somebody wants to extend capability?  Should every future software
developer that wants to use Integer/Sum/Product extend those classes
to get work done?  What if they want to store additional state for
nodes?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parallel class structures for AST-based objects

2009-11-21 Thread Richard Thomas
On 22 Nov, 00:07, MRAB pyt...@mrabarnett.plus.com wrote:
 Steve Howell wrote:
  I have been writing some code that parses a mini-language, and I am
  running into what I know is a pretty common design pattern problem,
  but I am wondering the most Pythonic way to solve it.

  Basically, I have a bunch of really simple classes that work together
  to define an expression--in my oversimplified example code below,
  those are Integer, Sum, and Product.

  Then I want to write different modules that work with those expression
  objects.  In my example, I have a parallel set of classes that enable
  you to evaluate an expression, and another set of objects that enable
  you to pretty-print the expression.

  The below code works as intended so far (tested under 2.6), but before
  I go too much further with this design, I want to get a sanity check
  and some ideas on other ways to represent the interrelationships
  within the code.  Basically, the issue here is that you have varying
  behavior in two dimensions--a node right now is only a Product/Integer/
  Sum so far, but I might want to introduce new concepts like
  Difference, Quotient, etc.  And right now the only things that you can
  do to expressions is eval() them and pprint() them, but you eventually
  might want to operate on the expressions in new ways, including fairly
  abstract operations that go beyond a simple walking of the tree.

  Here is the code:

      ###
      # base classes just represents the expression itself, which
      # get created by a parser or unit tests
      # example usage is
      # expression = Product(Sum(Integer(5),Integer(2)), Integer(6))
      class Integer:
          def __init__(self, val):
              self.val = val

      class BinaryOp:
          def __init__(self, a,b):
              self.a = a
              self.b = b

      class Sum(BinaryOp):
          pass

      class Product(BinaryOp):
          pass

      

      class EvalNode:
          def __init__(self, node):
              self.node = node

          def evaluatechild(self, child):
              return EvalNode.factory(child).eval()

          @staticmethod
          def factory(child):
              mapper = {
                  'Sum': SumEvalNode,
                  'Product': ProductEvalNode,
                  'Integer': IntegerEvalNode
                  }
              return abstract_factory(child, mapper)

      class SumEvalNode(EvalNode):
          def eval(self):
              a = self.evaluatechild(self.node.a)
              b = self.evaluatechild(self.node.b)
              return a + b

      class ProductEvalNode(EvalNode):
          def eval(self):
              a = self.evaluatechild(self.node.a)
              b = self.evaluatechild(self.node.b)
              return a * b

      class IntegerEvalNode(EvalNode):
          def eval(self): return self.node.val

      ###

      class PrettyPrintNode:
          def __init__(self, node):
              self.node = node

          def pprint_child(self, child):
              return PrettyPrintNode.factory(child).pprint()

          @staticmethod
          def factory(child):
              mapper = {
                  'Sum': SumPrettyPrintNode,
                  'Product': ProductPrettyPrintNode,
                  'Integer': IntegerPrettyPrintNode
                  }
              return abstract_factory(child, mapper)

      class SumPrettyPrintNode(PrettyPrintNode):
          def pprint(self):
              a = self.pprint_child(self.node.a)
              b = self.pprint_child(self.node.b)
              return '(the sum of %s and %s)' % (a, b)

      class ProductPrettyPrintNode(PrettyPrintNode):
          def pprint(self):
              a = self.pprint_child(self.node.a)
              b = self.pprint_child(self.node.b)
              return '(the product of %s and %s)' % (a, b)

      class IntegerPrettyPrintNode(PrettyPrintNode):
          def pprint(self): return self.node.val

      ##
      # Not sure where this method really wants to be structurally,
      # or what it should be named, but it reduces some duplication

      def abstract_factory(node, node_class_mapper):
          return node_class_mapper[node.__class__.__name__](node)

      expression = Product(Sum(Integer(5),Integer(2)), Integer(6))

      evaluator = EvalNode.factory(expression)
      print evaluator.eval()

      pprinter = PrettyPrintNode.factory(expression)
      print pprinter.pprint()

 I don't see the point of EvalNode and PrettyPrintNode. Why don't you
 just give Integer, Sum and Product 'eval' and 'pprint' methods?

This looks more structurally sound:

class Node(object):
   def eval(self):
  raise NotImplementedError
   def pprint(self):
  raise NotImplementedError

class BinaryOperatorNode(Node):
   operator = None
   def __init__(self, first, second):
  self.first = first
  self.second = second
   def eval(self):
  return self.operator(self.first.eval(), 

Re: parallel class structures for AST-based objects

2009-11-21 Thread Steve Howell
On Nov 21, 4:33 pm, Richard Thomas chards...@gmail.com wrote:

 This looks more structurally sound:

 class Node(object):
    def eval(self):
       raise NotImplementedError
    def pprint(self):
       raise NotImplementedError


My objection to the interface you describe is that Node defines the
type of operations that can be done to it by third-party code, which
is something that I cannot predict, and which (pscouples every
subclass of Node to eval() and pprint(), even though those subclasses
might not care about either operation.  They might be reimplementing
only one of those operations, or some completely different operation.


 class Sum(BinaryOperatorNode):
    operator = lambda x, y: x + y

 class Product(BinaryOperatorNode):
    operator = lambda x, y: x * y


I do like the notion of driving up Sum/Product abstractions like
operator into BinaryOperatorNode, but that is sort of an artifact of
my toy example, not my main design dilemma.

 I don't know what you're doing exactly but if all you need is to be
 able to parse and evaluate expressions then you can get very decent
 mileage out of overriding operators, to the extent that the whole
 thing you are trying to do could be a single class...

 class Expression(object):
def __init__(self, func):
   self.func = func
def __call__(self, **context):
   while isinstance(self, Expression):
  self = self.func(context)
   return self
def __add__(self, other):
   return Expression(lambda context: self.func(context) + other)
def __mul__(self, other):
 [snipped]

It is my fault for muddying the conversation with a toy example that
evaluates arithmetic expressions.  My particular problem involves a
mini-language that describes how you want to descend Python data
structures.


 But maybe that's not what you need. No need to overengineer if it is
 though, keep it simple, simple is better than complex.


Yep!  I am trying to keep things simple, but my wish to extend is not
speculative at this point; it is real.  I have an expression syntax,
which I call pyDTL, that I want these operations on:

  * generate two different versions of Python code that expresses the
operation
  * eagerly evaluate the expression on some object
  * pretty-print the expression itself
  * use the expression as a prototype for stub objects to determine
what operations they allow
  * etc

All of those requirements create the need to somehow create new
objects or functions that correspond to the same AST.

I describe pyDTL here:

http://showellonprogramming.blogspot.com/2009/11/mini-ddl-for-python.html

Here is a simple example:

 echo '{.foo, .bar(){.spam, .eggs} }' | python dtl/parse.py
 dict(
 foo = obj.foo,
 bar = (lambda bar:
 dict(
 spam = bar.spam,
 eggs = bar.eggs,
 ))(obj.bar()),
 )


###
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parallel class structures for AST-based objects

2009-11-21 Thread Gregory Ewing

Steve Howell wrote:


My objection to the interface you describe is that Node defines the
type of operations that can be done to it by third-party code, which
is something that I cannot predict


I think you have the right idea with a mapping from node
classes to implementations of operations, but your
intermediate classes SumPrettyPrintNode etc. are not
necessary. Just put functions implementing the operations
directly into the mapping.

Another thing is that the mappings should be somewhere
global that can be extended, perhaps through a registration
interface. Putting the mapping dicts inside the node classes
doesn't gain you anything over simply using methods. All
you've done is reimplement method dispatch.

A third thing is that instead of just looking up the node
class in the mapping, you may want to walk up the MRO and
try each of the base classes until you find a match. That
way, nodes will effectively inherit implementations from
their base classes for any operations that they don't
explicitly implement.

This inheritance behaviour becomes important if you have
two developers A and B, where A adds new classes and B
adds new operations. If A and B don't talk to each other,
you necessarily end up with gaps in the matrix: A's classes
won't have implementations for B's operations.

The best you can hope for is that the missing operations
will somehow fall back gracefully on default implementations.
Inheritance allows this to be achieved: if A derives all
his classes from existing node classes, and B provides
default implementations of all his operations for the root
Node class, then some implementation will always be found
for every class/operation combination.

Now, there's actually a very easy way of implementing all
this. Your registration interface could simply be

  def register_operation(klass, operation_name, function):
setattr(klass, operation_name, function)

In other words, monkey-patch the classes to add the new
functions directly as methods. Since it's so simple, you
could even do away with the registration function altogether
and just have developers insert methods directly into the
classes.

Some people might consider this an ugly hack, but the
end result is almost exactly the same, it's very simple,
and it's very efficient, making use of Python's existing
method dispatch machinery instead of reinventing it
yourself.

One reason you might want to keep the registration function,
even if all it's doing is modifying the classes, is to
make it easier to find where operations are being defined,
by searching for calls to register_operation().

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list