"Robert W. Cunningham" wrote:

> I've been working with hash functions recently, but more info is needed
> before I can make any kind of recommendation:
> 
> 1. How large is "N" expected to be?  Does it have a hard upper limit?
> (If not, can we assign one?)
> 
> 2. Can the hash use a fixed overhead (allocate and optimize for maximum
> value of N), or should it grow and shrink with N?

Fixed is good.  It could be tied directly to the size N of the meta
info array.  If possible, it might be nice to be able to scale N and
the hash together to gather some real data later.


> 3. I suspect it may be possible to integrate the hash into your meta
> info array.  Is that array statically allocated?  Is it ever dynamically
> reallocated?  How large are the entries in the meta info array?

Fixed to N as above.  Perhaps make it scale based on powers of 2,
or whatever works out to be simple.  But performance is the key;
flexibility has to be a secondary goal.
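Just to show what I mean by power-of-2 sizing (all the names here are
hypothetical, not actual plex86 code): with the table size a power of 2,
the modulo in the index computation reduces to a single AND.

```c
#include <stdint.h>

#define META_SIZE_LOG2 8                      /* hypothetical: 256 entries */
#define META_SIZE      (1u << META_SIZE_LOG2)
#define META_MASK      (META_SIZE - 1)

/* Index into a fixed, power-of-2 sized table: the expensive
 * modulo operation reduces to one AND with the mask. */
static inline unsigned meta_index(uint32_t hash)
{
    return hash & META_MASK;
}
```

Scaling then just means changing META_SIZE_LOG2 and rebuilding, which
keeps the hot path at a single instruction.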


> 4. It is always possible to trade compute time for space in a hash.
> Just how slow is the hash allowed to be?  (How often will it be hit?)
> Conversely, what is the maximum tolerable space penalty?  (May it, for
> example, equal the size of your meta info array?)

Let me explain how this hash fits in so you can get the big picture.
Most of the dynamically generated code will be just the original
instructions.  But when we see computed or out-of-region static
branches, a small sequence which calls a handler function will be
generated.

So the handler function will be called by many generated sequences.
In the original C code, you could imagine that a loop might be
calling a C function many times/second.  And we would be redirecting
that call via our handler routine.  So, it's important that our
handler routine be _very_ fast.
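To make the hot path concrete, here is one way the handler's lookup
could be structured (a sketch with made-up names, not the actual plex86
code): a direct probe into the fixed table, with the common hit case
falling straight through and misses punted to a slow path.

```c
#include <stdint.h>

#define TABLE_SIZE 256                        /* hypothetical fixed size */

typedef struct {
    uint32_t guest_page;                      /* guest page number (addr >> 12) */
    void    *translated;                      /* base of translated code, or NULL */
} meta_entry_t;

static meta_entry_t meta_table[TABLE_SIZE];

/* Hypothetical hot-path lookup: one shift, one mask, one compare,
 * then dispatch.  Collisions and untranslated pages fall off to a
 * slower handler, so the common case stays a handful of instructions. */
static void *lookup_translated(uint32_t guest_addr)
{
    uint32_t page = guest_addr >> 12;                 /* 4k pages */
    meta_entry_t *e = &meta_table[page & (TABLE_SIZE - 1)];

    if (e->guest_page == page && e->translated)
        return e->translated;                         /* fast hit */
    return 0;                                         /* punt to slow path */
}
```

The point of the structure is that the frequent case (the loop calling
the same C function over and over) touches one cache line and takes one
predictable branch.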


> 5. Is the page address already part of the meta info you save?  Is the
> array relatively small?  If so, you may want to consider keeping the
> array sorted, then use a simple binary (or other) search of the array to
> do the lookup.  It ain't elegant, but it uses absolutely no overhead.
> By the same token, for larger arrays an AVL tree would add little
> overhead, but greatly simplify the process of keeping the array data
> sorted, and traversing the tree will be much faster than most searches.

Sure, I was planning on storing the page address in the meta info.  The
size of the meta array would be determined by the number of guest pages
for which we have any amount of translated code.  Some pages may
have only small portions translated, others a higher density.  Maybe
if we had 256K of actual translated code, that might translate into
some 128 or 256 meta entries - I'm just pulling these numbers out of
thin air.


> It looks to me like all possible mappings through the hash can't be
> known in advance, so that eliminates the use of a perfect hash (absolute
> minimum on both speed and space).  Without a perfect hash, you will have
> one of two things:  Collisions (and collision resolution algorithms) or
> gaps in the hash table(s).  Collisions save space, but cost time to
> resolve.
> 
> If all the page addresses are already mapped into x86 GDT (?)
> structures, I believe there may already be some prior art on solving
> this problem (or a similar one).

You mean mapped into the page tables?


> Finally, I'm not really clear how you want the hash to work:  Will you
> want *any* address from the page to map to the right array entry, or
> just the *start* address of the page?  Are the pages of uniform size
> (4k), or do they vary (clusters of MMU pages)?

Uniform 4k pages.  I'm looking to map based on the page address only
(the upper 20 bits).  For the offset within the page, I then also need
to hash to find the address of the translated instruction, but I have
an idea for that already.
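For the page-address part, one simple option (an assumption on my part,
not a committed scheme) is to fold the 20 page bits down with a
shift-XOR before masking, so pages that differ only in their high bits
don't all collide:

```c
#include <stdint.h>

/* Hypothetical: fold the 20-bit page number into a small table index.
 * XOR-folding the high bits into the low bits spreads out guest pages
 * that differ only in the upper part of the address. */
static inline uint32_t page_hash(uint32_t addr, unsigned table_bits)
{
    uint32_t page = addr >> 12;               /* upper 20 bits of address */
    return (page ^ (page >> 10)) & ((1u << table_bits) - 1);
}
```

That's still only three or four instructions, so it shouldn't hurt the
fast path much if plain masking turns out to collide too often.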

Thanks,
-Kevin

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Kevin Lawton                        [EMAIL PROTECTED]
MandrakeSoft, Inc.                  Plex86 developer
http://www.linux-mandrake.com/      http://www.plex86.org/
