On Fri, Dec 22, 2000 at 01:37:40PM -0500, Kevin Lawton wrote:
> Hey,
> 
> What better to ponder over the holiday season, than ideas for
> enhancing performance in plex86.
> 
> This would be a good chance for people with interests in
> performance and architectural aspects of plex86, to weigh in.

Hi Kevin,

Thanks for the very informative write-up of the existing mechanism,
and the proposed enhancements.  There is one part of the existing
system that I do not fully understand.

>  E) Branch instructions.  We have to handle branch instructions of
>     both static and calculated offsets, in such a way that execution
>     does not occur in areas of untranslated code.  Additionally, because
>     translated code fragments may at any time be invalidated, we need
>     to generate efficient code which can handle branching to other
>     translated code which may be invalidated in the future.
[...]
> Scan Before Execute (SBE): our current strategy
> ===============================================
[..]
> Referring to the points above, A and D are not an issue with
> this strategy.  For E, in-page static branches are allowed to execute
> natively, but out-of-page and calculated instructions are virtualized.
[..]
> The downside of this strategy is that the technique for virtualizing
> certain instructions (replacing with INT3) is performance expensive.
> Each execution of that instruction invokes an exception, which transitions
> from ring3 (where the guest is executed) to the ring0 (where the monitor
> is executed), processing in the monitor, and a return transition back
> to the guest.  Since this INT3 is not specific to any particular
> virtualized instruction, processing has to be more generic and thus
> even more expensive.  If there are a large number of out-of-page
> branches (for example, C function calls outside of the page) or
> calculated branches (for example, optimized switch statements or
> dereferenced function calls), then performance penalties can be big.

From the above excerpt, you seem to be implying that one of the main
performance bottlenecks of the current code comes from plex86
virtualizing branch instructions, given the current cost of
virtualizing instructions via the INT3 method.

I do not understand why plex needs to virtualize out-of-page and
dynamic branch instructions.  Obviously, there needs to be a mechanism
that prevents the guest code from running untranslated instructions
(your point E), but couldn't this be achieved via manipulation of the
page tables?

Specifically, if plex were to manipulate the page tables in such a way
that only translated pages were permitted to be executed, then each
time a branch to an untranslated page occurred, a page-fault would
result.  In this scenario, plex could then translate the page in
question (assuming any translations were necessary), map it into the
guest address space, and resume execution.  For branches into pages
that plex could not fully translate, plex could flood-fill the
untranslated parts of the page with INT3, so that any branch into an
untranslated part of that page traps immediately.
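
To make the idea concrete, here is a toy Python model of the scheme.
It is purely illustrative and is not plex86 code: the names ToyMonitor,
scanned_ok and branch are made up for this sketch, and the byte ranges
that scan cleanly are simply handed in rather than computed by a real
prescanner.  It only counts how often the monitor would be entered.

```python
PAGE_SIZE = 4096
INT3 = 0xCC  # x86 one-byte breakpoint opcode


class ToyMonitor:
    """Toy model of the page-fault idea; every name here is hypothetical."""

    def __init__(self, guest_pages, scanned_ok):
        self.guest_pages = guest_pages  # page number -> bytes of guest code
        self.scanned_ok = scanned_ok    # page number -> [(start, end), ...]
        self.mapped = {}                # pages already translated and mapped
        self.page_faults = 0
        self.int3_traps = 0

    def _translate(self, page_no):
        # Start from a page flood-filled with INT3, then copy in only the
        # byte ranges that were scanned and found safe to run natively.
        code = bytearray([INT3]) * PAGE_SIZE
        for start, end in self.scanned_ok.get(page_no, []):
            code[start:end] = self.guest_pages[page_no][start:end]
        return code

    def branch(self, addr):
        page_no, offset = divmod(addr, PAGE_SIZE)
        if page_no not in self.mapped:
            # First branch into this page: fault, translate, map, resume.
            self.page_faults += 1
            self.mapped[page_no] = self._translate(page_no)
        ranges = self.scanned_ok.get(page_no, [])
        if not any(start <= offset < end for start, end in ranges):
            # Landed in the flood-filled region: INT3 enters the monitor.
            self.int3_traps += 1
            return "int3"
        return "native"


# Two pages of NOPs; page 0 scans clean, page 1 only in its first 256 bytes.
guest = {0: bytes([0x90]) * PAGE_SIZE, 1: bytes([0x90]) * PAGE_SIZE}
monitor = ToyMonitor(guest, {0: [(0, PAGE_SIZE)], 1: [(0, 256)]})
results = [monitor.branch(a) for a in (0x0010, 0x1000, 0x1004, 0x1800)]
print(results, monitor.page_faults, monitor.int3_traps)
```

In this model the monitor is entered once per newly touched page, plus
once for each branch that lands in an untranslated region, rather than
on every out-of-page or calculated branch.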

Unless I'm missing something, the above scheme could permit large
portions of the code to be translated and then run natively.

I assume I'm missing something, but what, I'm not sure.  :-)
-Kevin
