Andrew Whitworth wrote:

The cost is in setting up the hypercube graph for the inheritance hierarchy
of a particular class. Need to dig into it some more to see how cheap that
could be. The graph could at least be cached as a set of integer "parent
ids" in the child class after it's constructed the first time.

This does sound like a pretty neat idea, but I share your worry that
generating the hypercube is going to be expensive. Combine that with
the fact that we would have to completely recompute the hypercube
every time we modified the hierarchy at all, and we couldn't lazily
reuse hypercubes computed for class B when we assign B as a parent of
class A, etc. Of course, creating new classes and modifying existing
hierarchies is likely an infrequent-enough operation for most cases
that the expense will be amortized by fast subsequent lookups.

Actually, I'm not so much concerned that generating the hypercube will be expensive, more that it'll be a choice between "cheaper than what we have now", "cheap", and "can we push it hard to make it really cheap?". But, I haven't spent much time on it yet. A simple first take on how to do it would be:

- Set the child ID to 0 (i.e. 0000000000)

- Iterate over the immediate parents.
  - Set each parent ID to one bit different from child
    (i.e. 0000000001, 0000000010, 0000000100, 0000001000)
  - Recursively iterate over parents of parents
    - Set parent-of-parent ID to one bit different from parent
      (i.e. 0000001001, 0000001010, 0000001100)

Could be simpler if you don't bother with unique IDs and just mark the level (i.e. immediate parents are all 0000000001, next level is all 0000000011, etc). There are some trade-offs there. Using an integer means we're limited to the size of the integer, 4,294,967,295 unique parents or 32 levels in a 32-bit integer.

We already invalidate some caches in the Class on modification, so that part isn't difficult. Some choices would need to be made on how to store the cache of IDs in the child Class for: the smallest possible storage, the quickest possible lookup, and easily invalidated (so, likely all in one place, perhaps a struct pointer in the PMC struct). And low-level types can't be modified anyway, so one hypercube graph for all of them could be generated at compile-time.

How does Parrot currently calculate the manhattan distance between a
child and parent type?

Slowly. The horror is mmd_distance in src/multidispatch.c (called by Parrot_mmd_sort_candidates). That function and many of the functions it calls are remnants from the old MMD system, left in place for backward compatibility.

It iterates through integer type arrays of both the call and the multisub (first creating the arrays if they don't exist), performs a series of manual checks on the types (which it repeats every time a dispatch is made on a particular type), and iterates through all the parents of each call argument (which it repeats every time a dispatch is made).

So, the step of "iterating over the parents" has to be done anyway. The hypercube graph ads some bit math, plus cache storage and retrieval.

Bonus points if we can eliminate most of the manual checks (building core types into the hypergraph) and eliminate creating fixed integer arrays for the sub and the call.

Some relevant work, there's likely more to be found:
http://portal.acm.org/citation.cfm?id=301378

/me really needs to get a proper ACM subscription eventually to read
all the cool papers there.

Any chance I could get a copy of this one?

Yes, ACM has a sane policy about redistribution of papers. I'll email you a copy. (Also to anyone else who wants it.)

Allison
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev

Reply via email to