Ramon van Handel wrote:
> Actually no, my idea was a hybrid between this and your quasi-dynamic
> translation idea. I don't use a static handler but actually generate
> pieces of in-place code in a code cache, which I then jump to. This
> gives the possibility to do dynamic translation as well as emulation,
> without the disadvantage of having to replace *all* of the code (the
> disadvantage with respect to this method is a slightly larger overhead).
Oh, I see: multiple branch targets into generated code.

Nevertheless, there's something we need to be aware of for
_any_ technique that doesn't defer to monitor (ring0) space to handle
virtualization. At ring3, your generated code cannot manipulate the
TLB entries. The current framework assumes that the currently
executing page runs in the address space of the real guest page, and
that it effectively has 'execute-only' status due to the split I&D TLB
trick. We currently handle execution on a page-at-a-time basis.

To handle branch transfers at ring3, as in generated code, we would
need to remove our use of (and need for) the TLB trickery and related
code. And since our translated code page would no longer run in the
same place as the real code page, we would need to do the following
to make such a decoupling work:
- Use modified CS descriptor and EIP values that point into the
  translated code buffer rather than into the guest code. The monitor
  page tables have to be set to give ring3 access to this area.
- Virtualize any instructions that access data via a CS: segment
  override prefix, since we no longer run in the space of the guest
  CS descriptor. These could be emulated in monitor space for now.
- Make certain translation meta-information tables available to ring3,
  because the translated code and its routines need access to them.
- Keep tables that map translated code addresses to the corresponding
  guest code addresses, and program the monitor to use these rather
  than assume that exception stack frame information is directly
  usable (as is currently the case).

These are items that the "Ramon Approach", the quasi-dynamic
translation approach, or any other ring3 approach would need to handle.
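
As a rough illustration of the last item in the list, here is a minimal C
sketch of a translated-to-guest address table. It assumes a hypothetical
table with one entry per translated guest instruction, sorted by offset in
the translated code buffer, and assumes the modified CS base is the start of
that buffer, so the EIP in an exception stack frame is directly an offset
into it. The names (xlat_map_entry_t, xlat_to_guest) are made up for
illustration. It also assumes each guest instruction's translation can be
restarted from its beginning, which a real monitor would have to guarantee
or track separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map entry: one per translated guest instruction.
   Entries are sorted by xlat_off. */
typedef struct {
    uint32_t xlat_off;   /* offset of the translation in the code buffer */
    uint32_t guest_eip;  /* EIP of the guest instruction it came from */
} xlat_map_entry_t;

/* Map an EIP taken from an exception stack frame (assumed to be an offset
   into the translated buffer) back to a guest EIP by binary search for the
   last entry with xlat_off <= xlat_eip.
   Returns 0 on success, or -1 if xlat_eip precedes every entry.
   Note: no upper-bound check is done past the last translation. */
int xlat_to_guest(const xlat_map_entry_t *map, size_t n,
                  uint32_t xlat_eip, uint32_t *guest_eip)
{
    assert(guest_eip != NULL);
    /* Invariant: entries before lo have xlat_off <= xlat_eip,
       entries at or after hi have xlat_off > xlat_eip. */
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (map[mid].xlat_off <= xlat_eip)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;
    *guest_eip = map[lo - 1].guest_eip;
    return 0;
}
```

For example, with entries {0, 0x1000}, {5, 0x1002}, {12, 0x1003}, a fault
at translated offset 7 falls inside the second translation and maps back to
guest EIP 0x1002.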

My thinking is that once we have decoupled the translated code
from the address space of the real code, and require all the
framework of dynamically translated code anyway, the motivation for
using the current simple-minded 1:1 SBE strategy diminishes.
That said, the idea of simply extending what we have is a good
one, so I've been thinking over the Ramon Approach in parallel.

BTW, a special consideration when virtualizing with an instruction
of size N (an extra case to add to prescan) is that you have to
scan forward over the size of the instruction you are using (say,
5 bytes for a branch) to see whether any prescanned instructions start
in that range. For a 1:1 replacement, such as a branch-for-branch
replacement, this is unlikely except with overlapping instructions.
But when replacing smaller instructions, you may well step
on downstream instructions. You will need to mark each of these
offsets as containing a virtualized instruction.

The downside is that if you do step on instructions that were
OK'd before, then you have to dump the cache for that page:
we don't store branch information, so there may be OK'd branches
that would then branch into the middle of the branch instruction
you used to virtualize the instruction.
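
The forward scan and the resulting cache-dump decision can be sketched in
C roughly as follows. This is only an illustration under assumptions: a
hypothetical per-page array holds one prescan state byte per offset
(PS_NONE, PS_OK, PS_VIRTUALIZED), and mark_patch() is an invented name, not
existing prescan() code. It marks every prescanned instruction that starts
inside the patch as virtualized and reports whether any of them had already
been OK'd, which is the case that forces a dump of the page's cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE 4096

/* Hypothetical per-offset prescan state for one guest code page. */
enum {
    PS_NONE        = 0,  /* no known instruction starts here */
    PS_OK          = 1,  /* an instruction starts here and was OK'd */
    PS_VIRTUALIZED = 2   /* an instruction starting here is virtualized */
};

/* Record that the instruction at `off` is virtualized using a patch
   instruction of `patch_len` bytes (e.g. 5 for a JMP rel32). Every
   prescanned instruction starting inside the patch is stepped on and is
   marked virtualized too. Returns true if any of them had been OK'd:
   since branch information is not stored, an OK'd branch may target one
   of them, so the caller must dump the cached page. */
bool mark_patch(uint8_t state[GUEST_PAGE_SIZE], size_t off, size_t patch_len)
{
    /* This sketch assumes patches never cross the page boundary. */
    assert(patch_len >= 1 && off + patch_len <= GUEST_PAGE_SIZE);
    bool must_dump = false;
    state[off] = PS_VIRTUALIZED;
    for (size_t i = off + 1; i < off + patch_len; i++) {
        if (state[i] == PS_OK)
            must_dump = true;
        if (state[i] != PS_NONE)
            state[i] = PS_VIRTUALIZED;
    }
    return must_dump;
}
```

A 5-byte branch that replaces a 5-byte instruction 1:1 only covers bytes
inside that same instruction, so it returns false unless overlapping
instructions were prescanned there; replacing a 1-byte instruction with the
same patch is where OK'd neighbours get stepped on.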

Additionally, we would need to add some more logic to prescan().
The problem is that prescan() assumes the one-byte INT3 instruction
is used to virtualize. Thus, if a static intra-page branch occurs,
the instruction is OK'd regardless of whether its target is a known
virtualized instruction. You would have to change this logic to
virtualize such branch instructions if they hit the middle of your
branch instruction (i.e. target a downstream instruction it stepped on).
If these branch instructions are in turn short form, you may
also step on yet more instructions by inserting another long-form
branch, causing another cache page dump and more overhead from calling
the handlers instead of just executing the branches directly.
I'm not sure how much back-propagation virtualization will
be necessary in real code.
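
The extra prescan() logic, including the back-propagation, might look
roughly like the following C sketch. All names here (branch_t, patch_at,
propagate_virtualization) and the per-byte map are hypothetical. The map
records whether a byte is the first byte (PATCH_HEAD) or a stepped-on byte
(PATCH_BODY) of a virtualizing branch. A branch that targets a PATCH_HEAD
still lands on the handler branch, as it would on an INT3, so it stays
OK'd; one that targets a PATCH_BODY is itself replaced with a 5-byte
branch, which may step on further instructions, so the scan repeats until
nothing changes. Where a 5-byte patch would run off the end of the page,
the sketch falls back to a one-byte INT3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE 4096
#define LONG_BRANCH_LEN 5   /* JMP rel32: opcode E9 plus 4-byte displacement */

/* Hypothetical per-byte patch map for one guest code page. */
enum { UNTOUCHED = 0, PATCH_HEAD = 1, PATCH_BODY = 2 };

/* Hypothetical record of a static intra-page branch found by prescan. */
typedef struct {
    uint16_t off;       /* page offset of the branch instruction */
    uint16_t target;    /* page offset of its static target */
    bool virtualized;
} branch_t;

/* Virtualize the instruction at `off` with a long branch to a handler.
   If the original was a short branch (2 bytes), the last 3 patch bytes
   step on downstream instructions; any previous patch head it covers
   becomes a body byte (a conservative choice). */
static void patch_at(uint8_t map[GUEST_PAGE_SIZE], size_t off)
{
    assert(off < GUEST_PAGE_SIZE);
    map[off] = PATCH_HEAD;
    if (off + LONG_BRANCH_LEN > GUEST_PAGE_SIZE)
        return;                 /* INT3 fallback: occupies one byte only */
    for (size_t i = off + 1; i < off + LONG_BRANCH_LEN; i++)
        map[i] = PATCH_BODY;
}

/* Repeat until no new branch needs virtualizing. Returns the number of
   branches that received their own patch. Terminates because each
   branch is virtualized at most once. */
size_t propagate_virtualization(uint8_t map[GUEST_PAGE_SIZE],
                                branch_t *br, size_t n)
{
    size_t patched = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            assert(br[i].off < GUEST_PAGE_SIZE && br[i].target < GUEST_PAGE_SIZE);
            if (br[i].virtualized)
                continue;
            if (map[br[i].off] != UNTOUCHED) {
                /* The branch itself was stepped on and no longer runs
                   natively, so it needs no patch of its own. */
                br[i].virtualized = true;
                continue;
            }
            if (map[br[i].target] == PATCH_BODY) {
                patch_at(map, br[i].off);
                br[i].virtualized = true;
                patched++;
                changed = true;
            }
        }
    }
    return patched;
}
```

For example, with a patch at offset 10, a short branch at offset 0 that
targets 12 gets patched, and its new 5-byte patch covers offset 3, so a
second short branch at offset 20 targeting 3 is caught on the next pass.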

Also, what would your strategy be for 16-bit code? As much
as I hate it, we need to be able to run this crud fairly fast,
since it's used to boot OSes etc., and some people even run DOS. :^}
The SBE extended by branches kinda breaks down in 16-bit space.

I've actually considered (for the quasi-DT method) translating
code for a 16-bit CS to run in a 32-bit CS by inserting OPSIZE and
ADDRSIZE prefixes, while letting the natural {ES,DS,SS,FS,GS} accesses
work as normal. This way we could also easily branch to handler/generated
code, as well as inline other code snippets.
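
The prefix idea can be sketched in C as a per-instruction re-encoding.
This is an assumption-laden illustration, and widen_16_to_32() is a
made-up name. In a 32-bit code segment, 0x66 (OPSIZE) and 0x67 (ADDRSIZE)
each select the 16-bit size, so the sketch toggles them rather than always
inserting them: a 16-bit instruction that already carries 0x66 is a 32-bit
operation and must lose the prefix. Segment overrides and the other legacy
prefixes are kept as-is, and the opcode, ModRM, displacement and immediate
bytes are unchanged because the effective sizes are preserved. It
deliberately ignores SSE opcodes where 0x66 is a mandatory prefix, relative
branches (which would need retargeting, since JMP rel16 truncates EIP to
16 bits), and stack width, which comes from the SS descriptor's B bit
rather than from any prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OPSIZE   0x66  /* operand-size override prefix */
#define ADDRSIZE 0x67  /* address-size override prefix */

/* True for x86 legacy prefixes: LOCK, REPNE, REP, the six segment
   overrides, and the operand-/address-size overrides. */
static bool is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:
    case 0x2E: case 0x36: case 0x3E: case 0x26: case 0x64: case 0x65:
    case OPSIZE: case ADDRSIZE:
        return true;
    default:
        return false;
    }
}

/* Re-encode one instruction decoded in a 16-bit CS so it keeps the same
   operand and address size when executed from a 32-bit CS. Writes at
   most len + 2 bytes to `out` and returns the new length. */
size_t widen_16_to_32(const uint8_t *in, size_t len, uint8_t *out)
{
    assert(in != NULL && out != NULL);
    bool had_op = false, had_addr = false;
    size_t i = 0, o = 0;

    /* Skip the prefixes, noting which size overrides were present. */
    while (i < len && is_legacy_prefix(in[i])) {
        if (in[i] == OPSIZE)
            had_op = true;
        if (in[i] == ADDRSIZE)
            had_addr = true;
        i++;
    }
    assert(i < len);  /* an opcode must follow the prefixes */

    /* Toggle the size overrides relative to the 16-bit encoding. */
    if (!had_op)
        out[o++] = OPSIZE;
    if (!had_addr)
        out[o++] = ADDRSIZE;
    for (size_t j = 0; j < i; j++)      /* keep segment/LOCK/REP prefixes */
        if (in[j] != OPSIZE && in[j] != ADDRSIZE)
            out[o++] = in[j];
    for (; i < len; i++)                /* opcode, ModRM, disp, imm as-is */
        out[o++] = in[i];
    return o;
}
```

For instance, the 16-bit `mov ax, es:[bx]` (26 8B 07) becomes
66 67 26 8B 07, which a 32-bit CS still decodes as a 16-bit load through
ES using [bx] addressing.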
-Kevin
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Kevin Lawton [EMAIL PROTECTED]
MandrakeSoft, Inc. Plex86 developer
http://www.linux-mandrake.com/ http://www.plex86.org/