John Bäckstrand wrote:
>
> > Well, can't we simply use the MMU to mark pages where MMIO can happen
> > and then use the page fault to process them? This way, we don't
> > need to care about checking addresses. We only check them when a
> > fault happens.
>
> IIRC, that's very slow. An interrupt plus processing per memory access
> isn't very fast. BTW, isn't it done like this today?
>
> And didn't the dosemu guy talk about this? I think he said that
> doing it by PG:ing was slow...
It's currently done with page-fault protections. Any interrupt-based
mechanism will be slow, since it costs a fault per memory access.
The best way is to not do any memory-mapped I/O (MMIO) at all,
and use a special driver instead. Short of that, the next fastest
option is adaptive translated code: code that calculates
the memory address, checks whether it falls within MMIO
bounds, and directs the access to real memory or to the MMIO
mechanisms accordingly. For example, a translated code
sequence can use the native LEA instruction to calculate
the address, then compare it to the bounds 0xa0000..0xbffff.
If the address is outside, just access memory. If it is within,
route the data access to MMIO. The MMIO handling code can either
buffer a number of accesses when conditions warrant it,
or even keep some of the specific device's handling resident.
For example, MMIO handling code could contain the VGA
memory access --> latch --> VGA framebuffer logic.
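A rough C model of what such a bounds-checked translated access would do, assuming a toy 1 MiB guest memory array and hypothetical stub handlers `mmio_read8`/`mmio_write8` (nothing here is plex86 code). `effective_addr` just mirrors what LEA computes (base + index*scale + disp); real translated code would do this inline in x86 rather than through C calls.

```c
#include <stdint.h>
#include <assert.h>

/* Bounds of the MMIO window mentioned above (VGA memory). */
#define MMIO_LO 0xA0000u
#define MMIO_HI 0xBFFFFu
#define MEM_SIZE (1u << 20)          /* toy guest: 1 MiB of memory */

static uint8_t guest_mem[MEM_SIZE];

/* Hypothetical MMIO back end: only records accesses for illustration. */
static uint32_t mmio_last_addr;
static unsigned mmio_count;

static uint8_t mmio_read8(uint32_t addr) {
    mmio_last_addr = addr;
    mmio_count++;
    return 0xFF;                     /* placeholder device data */
}

static void mmio_write8(uint32_t addr, uint8_t val) {
    (void)val;                       /* a real device model would consume it */
    mmio_last_addr = addr;
    mmio_count++;
}

/* Same arithmetic LEA performs: base + index*scale + disp (mod 2^32). */
static uint32_t effective_addr(uint32_t base, uint32_t index,
                               uint32_t scale, int32_t disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;
}

static int in_mmio(uint32_t addr) {
    return addr >= MMIO_LO && addr <= MMIO_HI;
}

/* Translated byte load: plain memory unless the address hits MMIO. */
static uint8_t checked_load8(uint32_t base, uint32_t index,
                             uint32_t scale, int32_t disp) {
    uint32_t a = effective_addr(base, index, scale, disp);
    if (!in_mmio(a))
        return guest_mem[a & (MEM_SIZE - 1)];  /* toy wrap into the array */
    return mmio_read8(a);                      /* route to MMIO handling */
}

/* Translated byte store with the same check. */
static void checked_store8(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp, uint8_t v) {
    uint32_t a = effective_addr(base, index, scale, disp);
    if (!in_mmio(a))
        guest_mem[a & (MEM_SIZE - 1)] = v;
    else
        mmio_write8(a, v);
}
```

The common case (ordinary memory) costs one compare pair and a branch, with no fault and no trip to the monitor.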
If this MMIO handling code is placed in the same ring as
the guest and its translated code (even ring 3), such handling
could potentially be done without faults or monitor context
switches.
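As an example of device handling that could stay resident next to the translated code, here is a heavily simplified model of the VGA latch path. It assumes four 64 KiB planes, read mode 0, and only write modes 0 and 1; write mode 0 is stripped of rotate, set/reset, logical ops, and bit mask. The struct and function names are made up for illustration.

```c
#include <stdint.h>
#include <assert.h>

/* Simplified VGA planar memory: four 64 KiB planes plus four latches. */
struct vga {
    uint8_t plane[4][65536];
    uint8_t latch[4];
    uint8_t read_map;    /* Graphics Controller Read Map Select, 0..3 */
    uint8_t map_mask;    /* Sequencer Map Mask, one bit per plane */
    uint8_t write_mode;  /* only 0 and 1 are modeled here */
};

/* Read mode 0: a CPU read loads all four latches and returns the
   byte from the plane chosen by Read Map Select. */
static uint8_t vga_read(struct vga *v, uint16_t off) {
    for (int p = 0; p < 4; p++)
        v->latch[p] = v->plane[p][off];
    assert(v->read_map < 4);
    return v->latch[v->read_map];
}

/* Write mode 1 copies the latches into the enabled planes (the CPU
   byte is ignored); simplified write mode 0 stores the CPU byte. */
static void vga_write(struct vga *v, uint16_t off, uint8_t val) {
    for (int p = 0; p < 4; p++) {
        if (!(v->map_mask & (1u << p)))
            continue;                /* plane disabled by Map Mask */
        v->plane[p][off] = (v->write_mode == 1) ? v->latch[p] : val;
    }
}

/* Guest address to plane offset, assuming the A0000-AFFFF mapping. */
static uint16_t vga_offset(uint32_t addr) {
    return (uint16_t)(addr - 0xA0000u);
}
```

Read mode 0 followed by write mode 1 is the classic fast screen-to-screen copy, which is why keeping the latches inside the handler (rather than faulting per byte) matters.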
I say 'adaptive' because I envision it best to generate such
MMIO bounds checks only once a given instruction has accessed
MMIO and a fault has initially been generated in the monitor.
Once the instruction is known to access such memory, perhaps
translate it with such checks from that point forward, or even
retranslate it without them once the bounds check starts failing
(finding ordinary memory). Leave the page protection on for
MMIO pages.
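One possible shape for that adaptive policy, written as a sketch with hypothetical names (`xlated_insn`, `retranslate`, `DROP_CHECK_AFTER`). The threshold of consecutive non-MMIO accesses before dropping the check is an assumed tuning knob, not something from plex86.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* How an instruction is currently translated. */
enum xlate_kind { XLATE_NATIVE, XLATE_CHECKED };

/* Hypothetical per-instruction translation state. */
struct xlated_insn {
    uint32_t guest_eip;
    enum xlate_kind kind;
    unsigned misses;      /* consecutive checks that found ordinary memory */
};

/* Assumed knob: drop the check after this many non-MMIO accesses. */
#define DROP_CHECK_AFTER 1000u

/* Stand-in for regenerating the translated code sequence. */
static void retranslate(struct xlated_insn *x, enum xlate_kind k) {
    x->kind = k;
    x->misses = 0;
}

/* Monitor path: a native translation faulted on a protected MMIO page,
   so from now on translate it with the inline bounds check. */
static void on_mmio_fault(struct xlated_insn *x) {
    if (x->kind == XLATE_NATIVE)
        retranslate(x, XLATE_CHECKED);
}

/* Called from the checked code's bounds check; was_mmio is its result. */
static void on_bounds_check(struct xlated_insn *x, bool was_mmio) {
    assert(x->kind == XLATE_CHECKED);
    if (was_mmio) {
        x->misses = 0;
    } else if (++x->misses >= DROP_CHECK_AFTER) {
        /* Safe because MMIO pages stay protected: a later MMIO
           access from the native version simply faults again. */
        retranslate(x, XLATE_NATIVE);
    }
}
```

Keeping the page protection on is what makes dropping the check safe; the check is purely a performance optimization layered over the fault path.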
Though IMO it's best to just get rid of the MMIO stuff
and forget about the above. Either have special drivers, or use
the video card directly somehow if you want performance with
the crappy 640x480x16 mode.
With that in mind, one of the dominant performance hits
will come from virtualizing out-of-page (or out-of-page-cluster)
and calculated branches. Ramon's idea of virtualizing these
by changing them into calls to handler routines may work quite well
for this. Such routines could be placed in whatever ring
the guest executes in. This would give us decent performance
while letting most code run natively without any translated-code
overhead. And translating x86 code carries quite a bit more
overhead than you might think. Given that the guest and the
handling code run at ring 3, there is the bonus of paging
protection for the monitor: even if the guest code runs away,
it can't step on the monitor.
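A guess at what such a branch handler might check, not Ramon's actual design: it assumes the monitor keeps a bitmap of guest code pages it has already scanned and cleared for native execution, and that `monitor_prepare_page` stands in for the expensive trip into the monitor. All names and the toy 1 MiB address space are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

#define PAGE_SHIFT 12
#define NPAGES 256u              /* toy guest: 1 MiB of 4 KiB pages */

/* One bit per guest code page already scanned and cleared to run natively. */
static uint8_t page_ok[NPAGES / 8];
static unsigned monitor_exits;   /* counts simulated exits to the monitor */

static uint32_t page_of(uint32_t eip) {
    return (eip >> PAGE_SHIFT) % NPAGES;  /* toy wrap into the bitmap */
}

static bool page_is_prepared(uint32_t eip) {
    uint32_t pg = page_of(eip);
    return (page_ok[pg / 8] & (1u << (pg % 8))) != 0;
}

/* Stand-in for the slow path: the monitor scans/virtualizes the page. */
static void monitor_prepare_page(uint32_t eip) {
    uint32_t pg = page_of(eip);
    monitor_exits++;
    page_ok[pg / 8] |= (uint8_t)(1u << (pg % 8));
}

/* A calculated or out-of-page branch is rewritten into a call to this.
   Running in the guest's ring, the fast path needs no fault and no
   monitor context switch; it returns where native execution resumes. */
static uint32_t branch_handler(uint32_t target) {
    if (!page_is_prepared(target))
        monitor_prepare_page(target);
    return target;
}
```

Only the first branch into a new page pays for a monitor exit; later branches into the same page take the in-ring fast path.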
-Kevin
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Kevin Lawton [EMAIL PROTECTED]
MandrakeSoft, Inc. Plex86 developer
http://www.linux-mandrake.com/ http://www.plex86.org/