At 10:57 AM 4/7/2006 -0700, Guido van Rossum wrote:
>Here I'm not so sure. Are you sure those results are unexpected? It
>seems you are saying that if I have a class C deriving from A and B
>(in that order) where both A and B implement a method foo(), and C
>doesn't, the call C().foo() would be ambiguous. But in fact the whole
>point of having a defined MRO is to foster the expectation that it
>will call A's foo(), because that one comes first in the MRO.
>
>So at least for the single-argument variant I would think that my
>algorithm gives results consistent with the definition of inheritance
>in new-style classes.
The ambiguity is only in context -- or rather, in the requirement for
context. If you use the argument's MRO, then you can't statically
determine the dominance hierarchy, because whether A or B dominates
depends on the argument you're actually testing at runtime! This makes
it harder to tell ahead of time whether you've defined the right
methods. In particular, note that if you have cases for A and for B,
you can easily be misled into thinking that one (let's say A) dominates
the other, because you currently only have class C(A,B). You can then
conclude that your list of cases is comprehensive and unambiguous --
until somebody creates class D(B,A) and suddenly your rules are
ambiguous again.

By the way, my rewrite of dominates() actually had a bug in it also, as
did your original. Yours used 'is' to compare the tuple signatures
where it should've used ==, and mine checked individual types with 'is'
instead of comparing the entire tuple with ==. Instead of:

    all(d is not s and issubclass(d,s) for d,s in zip(dom, sub))

I should've written:

    dom != sub and all(issubclass(d,s) for d,s in zip(dom, sub))

> > Unfortunately, this entire approach is subject to similar scale
> > issues as PyProtocols/RuleDispatch.
>
>Even with the cache I put in? The hairy algorithm doesn't get invoked
>more than once per actual signature (type tuple).

You've measured it now, but yes, my own measurements of similar
techniques in RuleDispatch showed the same effect, and as with your
measurement, it is the Python code that builds the tuple, rather than
the tuple allocation itself, that produces the delays. The current
version of RuleDispatch goes to some lengths to avoid creating any
temporary data structures (where possible) for this reason.

>I like the idea of analyzing the methods for dominance relationships
>ahead of time. But I'm not sure why we should then resort to
>generating code. Shouldn't it be just as easy to create a data
>structure that can be walked by fixed code? (Just like a
>recursive-descent parser can be written as a fixed "machine" that
>interprets a data structure representing the parse information.)

That's what RuleDispatch does currently, but it's considerably slower
than an AST-manipulation approach would be, and it is also much slower
when it has to handle expressions other than "plain old arguments".
Calling it "generating code", however, is a misleading way of thinking
about it, in that it implies something much more complex than is
necessary. The data structure created to do the dispatching is not
much different from an AST, and in fact RuleDispatch uses its own
simple AST for processing expressions anyway. The parts of
RuleDispatch that process expressions are effectively recapitulating
the Python interpreter, so in a future version I would much rather
just take an AST right off a function, like this:

    @some_function.when(lambda: x<27 and isinstance(y,int))
    def twenty_seven_and_int(x,y):
        ...

to replace my current use of string parsing for these conditions. At
that point RuleDispatch would basically just be performing relatively
simple AST pruning and grafting, taking the conditional expression(s)
and function bodies and grafting them into a larger dispatch tree.
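To make that prune-and-graft idea concrete, here's a toy sketch of the
technique (this is not RuleDispatch code; build_dispatcher, NoMatch,
and the fixed two-argument signature are all invented for
illustration, and string parsing stands in for pulling the AST off a
lambda). It grafts each condition's AST in as the test of an 'if'
whose body returns a call to the matching handler, then compiles the
whole chain as a single function:

    import ast

    class NoMatch(Exception):
        pass

    def build_dispatcher(cases, namespace):
        # cases: list of (condition_source, handler_name) pairs.
        body = []
        for cond_src, handler in cases:
            test = ast.parse(cond_src, mode="eval").body
            call = ast.parse("%s(x, y)" % handler, mode="eval").body
            body.append(ast.If(test=test,
                               body=[ast.Return(value=call)],
                               orelse=[]))
        # Falling off the end of the if-chain means no case matched.
        body.append(ast.parse("raise NoMatch()").body[0])
        # Graft the if-chain into a function skeleton and compile it.
        func = ast.parse("def dispatch(x, y): pass").body[0]
        func.body = body
        module = ast.parse("")
        module.body = [func]
        ast.fix_missing_locations(module)
        exec(compile(module, "<dispatch>", "exec"), namespace)
        return namespace["dispatch"]

    def twenty_seven_and_int(x, y):
        return "matched"

    ns = {"NoMatch": NoMatch,
          "twenty_seven_and_int": twenty_seven_and_int}
    dispatch = build_dispatcher(
        [("x < 27 and isinstance(y, int)", "twenty_seven_and_int")], ns)
    print(dispatch(3, 4))   # -> matched

A real implementation would of course order and merge the tests into a
proper dispatch tree rather than a linear if-chain, but the grafting
itself is mechanical.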
Since we already have the mechanisms to manage such ASTs and generate
bytecode, and since alternative implementations are going to have to
provide the same AST APIs at some point, ISTM that this is the natural
way to go for the long term, and it provides the most opportunities
for customized dispatching techniques, such as RuleDispatch's custom
method combinations. (These allow you to control, among other things,
whether there are "next methods" or not.)

Note that such AST prune-and-graft operations could also potentially
implement things like a 'next_method()' call, by substituting the
function's previous AST. This would also allow you to do a simple
monkeypatching style of overloading (i.e., just replacing the existing
function body and chaining) for the very simplest cases with only a
handful of methods. And setting a function's AST needn't immediately
cause compilation; it could be delayed until the function is actually
called, thus nullifying any cost besides the tree grafting itself for
functions that aren't used in a particular program run. Granted, this
definition-time cost might still be higher than the definition-time
cost of a more limited dispatch facility, such as the "tuple of types"
approach you've been prototyping.

In truth, my thoughts on AST munging are still essentially
speculation; they reflect my design for the next version of
RuleDispatch, rather than an implemented and tested design at this
point.

I should also clarify that I think of dispatch from a slightly
different perspective, since RuleDispatch allows tests on arbitrary
conditions and expressions. As a result, there are constraints on the
ordering of tests, because of things like a condition for "x>0 and
y/x<27", where the second expression would produce an error if it were
evaluated out of order. RuleDispatch already has the machinery needed
to manage these issues and build a single dispatch tree for
interpretation, however, and it's actually not that complex:
RuleDispatch in total is only about 4000 lines of Python, not counting
tests, but counting all those blank lines I put between screenfuls of
code. :) It does not reuse any of the stdlib compiler library, either,
as I found it too slow and too big for the relatively simple task of
processing logical expressions -- the expression processor is ~700
lines, or ~1100 if you include the concrete syntax -> abstract syntax
converter. Some of this machinery wouldn't be needed if there were a
nice fast AST API and I could pull the ASTs directly from functions
instead of parsing them. That leaves about 2900 lines of "meat":
basically the index classes for handling class lookups and other
comparison operators, plus the code to build and interpret dispatch
trees. The interpretation part would of course go away under an AST
regime, while the rest would stay about the same.

Naturally, if your goals are more modest -- dealing only with actual
arguments, no expressions, no operators other than isinstance(), and
no ability to apply boolean logic to the conditions (e.g.
"isinstance(arg1,A) and isinstance(arg1,B)", or perhaps "not
isinstance(arg1,C)") -- then the logic for an interpreter is indeed
much simpler. In that case, you can just use a series of dictionaries
that each argument is checked against, one at a time, and you can
build those dictionaries in a straightforward way at registration
time.
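For instance, the registration side of that nested-dictionary scheme
might look like this minimal sketch (SimpleGeneric, register() and
root_registry are hypothetical names, chosen only to line up with the
lookup loop below, and it assumes every method takes the same number
of arguments):

    class SimpleGeneric:
        def __init__(self):
            # Nested dicts: each level maps a class to either the dict
            # for the next argument position or, at the last level, to
            # the function to call.
            self.root_registry = {}

        def register(self, *types):
            # Decorator: file 'func' under one dictionary level per
            # argument position, creating levels as needed.
            def decorator(func):
                reg = self.root_registry
                for t in types[:-1]:
                    reg = reg.setdefault(t, {})
                reg[types[-1]] = func
                return func
            return decorator

    g = SimpleGeneric()

    @g.register(int, str)
    def int_str(x, y):
        return "int, str"

Detecting ambiguous signatures (the "placeholder" mentioned below)
would also happen here at registration time, by comparing each new
signature against the ones already registered; that part is omitted.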
In that approach, the lookup loop could look something like:

    reg = self.root_registry
    for arg in args:
        for cls in type(arg).__mro__:   # not exactly, but sort of
            if cls in reg:
                reg = reg[cls]
                break
        else:
            raise NoMatch   # (or fall back to default)
    # reg is now a function, or a placeholder indicating ambiguity

You wouldn't actually use __mro__, in order to avoid the unstable
dominance hierarchy problem, but you get the gist. If you use
RuleDispatch and restrict what operations you use, you'll actually get
almost exactly the above; it's just that RuleDispatch may not evaluate
the args in strictly left-to-right order, depending on how "good" an
argument is for dispatching. (RuleDispatch evaluates the "goodness" of
expressions, such as arguments, by how well they fan out and flatten
the dispatch tree, so it may decide that the last argument is in fact
a better thing to check first.)

This approach of course has more hash table lookups than yours, so
these are all tradeoffs to be considered and, more importantly,
measured. But in general I tend to see this space through the lens of
"arbitrary boolean expressions" and thus may occasionally intuit wrong
answers for the more limited scope you're contemplating here.
Actually, I keep feeling like you're "cheating" because you can get
away with so little code -- and I keep having to remind myself that
it's because there are so many things that the cached type-tuple
approach simply cannot do. (That's the approach you are using, and the
one other folks wrote papers about prior to the Chambers & Chen paper
on using dispatch trees to handle arbitrary predicates.)

Anyway, I just got home from a 6+ hour flight, and I'm probably not
being very coherent right now, so I'm going to go get some sleep. :)