Howdy, This looks like it could be an amazing MMD performance gain, I would love to work on it. How well-tested is our MMD right now? I will have to take a look.
Duke On Fri, Nov 27, 2009 at 1:36 PM, Allison Randal <[email protected]> wrote: > I've been mulling ways to speed up our MMD in the back of my mind for a > while now. Had a thought this morning as I was working on some quantum > coursework (semi-related to hamming distance). Just a general idea at > the moment, opening it up for group thoughts. > > The tree of parents for a class can be represented as a hypercube graph, > where each parent has an n-bit representation and the Manhattan distance > between a child and any of its parents is the Hamming distance between > the corresponding nodes in the graph, which can be calculated by > counting up the 1 bits on a simple XOR between the n-bit strings for the > child and the parent. So, if you have an inheritance tree like: > > G H I > \ | / > D E F > \ / \ / > B C > \ / > A > > Where A inherits from B and C, B inherits from D and E, etc. Then you can > represent it as: > > A: 0000 > B: 0001 > C: 0010 > D: 0101 > E: 0011 > F: 0110 > G: 1101 > H: 1011 > I: 1110 > > The distance between A and B is: > > 0000 > XOR 0001 > ---- > 0001 = 1 > > 0000 > XOR 1110 > ---- > 1110 = 3 > > (And actually, looking at that, if you always mark the bottom-most child as > all 0, then you don't even have to calculate the XOR for > bottom-child-to-parent, since it's the same as the parent's bitstring.) > > The cost is in setting up the hypercube graph for the inheritance hierarchy > of a particular class. Need to dig into it some more to see how cheap that > could be. The graph could at least be cached as a set of integer "parent > ids" in the child class after it's constructed the first time. > > Wikipedia has a good summary of Hamming distance and hypercube graphs. > > Some relevant work, there's likely more to be found: > http://portal.acm.org/citation.cfm?id=301378 > > "We show that the multi-method dispatching problem can be transformed to a > geometric problem on multi-dimensional integer grids, for which we then > develop a data structure that uses near-linear space and has log-logarithmic > query time. This gives a solution whose performance almost matches that of > the best known algorithm for mono-method dispatching." > > Allison > > _______________________________________________ > http://lists.parrot.org/mailman/listinfo/parrot-dev > -- Jonathan "Duke" Leto [email protected] http://leto.net _______________________________________________ http://lists.parrot.org/mailman/listinfo/parrot-dev
