Andrew Whitworth wrote:
The cost is in setting up the hypercube graph for the inheritance hierarchy
of a particular class. Need to dig into it some more to see how cheap that
could be. The graph could at least be cached as a set of integer "parent
ids" in the child class after it's constructed the first time.
This does sound like a pretty neat idea, but I share your worry that
generating the hypercube is going to be expensive. Combine that with
the fact that we would have to completely recompute the hypercube
every time we modified the hierarchy at all, and we couldn't lazily
reuse hypercubes computed for class B when we assign B as a parent of
class A, etc. Of course, creating new classes and modifying existing
hierarchies is likely an infrequent-enough operation for most cases
that the expense will be amortized by fast subsequent lookups.
Actually, I'm not so much concerned that generating the hypercube will
be expensive, more that it'll be a choice between "cheaper than what we
have now", "cheap", and "can we push it hard to make it really cheap?".
But, I haven't spent much time on it yet. A simple first take on how to
do it would be:
- Set the child ID to 0 (i.e. 0000000000)
- Iterate over the immediate parents.
- Set each parent ID to one bit different from child
(i.e. 0000000001, 0000000010, 0000000100, 0000001000)
- Recursively iterate over parents of parents
- Set parent-of-parent ID to one bit different from parent
(i.e. 0000001001, 0000001010, 0000001100)
Could be simpler if you don't bother with unique IDs and just mark the
level (i.e. immediate parents are all 0000000001, next level is all
0000000011, etc). There are some trade-offs there. Using an integer
means we're limited to the size of the integer, 4,294,967,295 unique
parents or 32 levels in a 32-bit integer.
We already invalidate some caches in the Class on modification, so that
part isn't difficult. Some choices would need to be made on how to store
the cache of IDs in the child Class for: the smallest possible storage,
the quickest possible lookup, and easily invalidated (so, likely all in
one place, perhaps a struct pointer in the PMC struct). And low-level
types can't be modified anyway, so one hypercube graph for all of them
could be generated at compile-time.
How does Parrot currently calculate the manhattan distance between a
child and parent type?
Slowly. The horror is mmd_distance in src/multidispatch.c (called by
Parrot_mmd_sort_candidates). That function and many of the functions it
calls are remnants from the old MMD system, left in place for backward
compatibility.
It iterates through integer type arrays of both the call and the
multisub (first creating the arrays if they don't exist), performs a
series of manual checks on the types (which it repeats every time a
dispatch is made on a particular type), and iterates through all the
parents of each call argument (which it repeats every time a dispatch is
made).
So, the step of "iterating over the parents" has to be done anyway. The
hypercube graph ads some bit math, plus cache storage and retrieval.
Bonus points if we can eliminate most of the manual checks (building
core types into the hypergraph) and eliminate creating fixed integer
arrays for the sub and the call.
Some relevant work, there's likely more to be found:
http://portal.acm.org/citation.cfm?id=301378
/me really needs to get a proper ACM subscription eventually to read
all the cool papers there.
Any chance I could get a copy of this one?
Yes, ACM has a sane policy about redistribution of papers. I'll email
you a copy. (Also to anyone else who wants it.)
Allison
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev