Hi Timo,

Regarding #1, we actually have this working in our tree. It was required for 
the new SYSENTRY C handler. So thank you for the advice and for pointing that 
out. Unfortunately I will not credit you for the idea since the code was 
already written, I hope this is understandable. We had to do this not for 
performance, but for compatibility with the Eoi helper function which is 
exported. The performance difference between pushad and doing the instructions 
manually (and later fixing up) was actually not that big, since pushad has 
certain pipeline advantages. However, when dealing with interrupts, it starts 
becoming more of a problem (although that code is not in mainline yet). Also 
your sample code has some bugs: the last integer register save is duplicated, 
and it would be better to use a symbolic name instead of 10 * 4. That value is 
also wrong, since it depends on whether or not the trap generates an error code 
or not. x86 is not easy, have to be careful!

Regarding #2, I unfortunately do not see the need to add more assembly when it 
is not needed. There are no subtle compiler breakages involved, since it is 
impossible for the compiler to add a DS/ES access for no reason at all (can you 
come up with an example?), this is part of the contract the compiler has to 
make (if compilers did this, it would break a whole class of embedded and 
real-mode software). Because the stack is safe, I think it's worth avoiding the 
extra lines of assembly. The C inlines will produce exactly the same code 
anyway. Also, it's not as easy as you make it seem, since for a FAST-V86 trap, 
we don't want to save the registers, just change them. So we need to insert a 
check for EFLAGS to see if the V8086 flag is there or not, then conditionally 
do the registers, then jump to the C code that will do another conditional 
check, and pick the right trap entry C code... it gets quite complicated. Since 
there are no "real" issue with doing it in C, other than personal choice and 
style, I think we'll stick with what we have.

The patch for #1 will also start introducing some other performance enhancing 
features, such as branch prediction hints for the CPU and compiler, and other 
little boosts. With the pusha conversion code gone, the C code is actually more 
efficient than previous assembly implementations.

Thanks for your comments and suggestions, please keep them coming. Do not be 
discouraged by this particular decision regarding #2 (it's also slightly 
political) as a sign or pattern that any suggestion will be ignored (you were 
not the first to bring up #1, otherwise we'd have done it thanks to you).

Also, if anyone has any experience with MSC++ and could port the GCC-centric 
macros, that would be much appreciated. I've given up on MSC++ after the 
compiler and linker have tried, year after year, to add more and more shit to 
what used to be a simple portable codebase. Buffer this, buffer that, nx this, 
dep that, safe seh here, hotpatch there, manifest files to even get it to 
run... ARGH!

Finally, the recent fixes have solved the double fault problem, but it seems 
certain flavours of VMWare use a different Video BIOS than my test machines and 
that V8086 C emulation code must be hitting a bug. It could also be we're 
seeing the BOP and not handling it. I'll look into this today. Note that the 
ASM implementation of PUSHF  (or was it POPF) was completely broken. Someone 
must've made some serious typos when *ahem* writing the code. Hopefully there 
are no dependencies on that bug.

-r

> Hi,
> 
> I suggest the following changes to the current implementation:
> 
> 1.) Replace the PUSHA in the trap entry code, by a series of MOVs to the
> correct stack positions. The same for the trap exit code.
> KiTrapFrameFromPushaStack can be removed then. This would result in more
> clear, less complex and faster code, with only  a few additional
> assembly instructions in  the trap entry macro. The exit could be done
> either by normal call/return or by a jmp to an exit handler.
> 
> 2.) Segements should be fixed up before entering C code, as not doing so
> may introduce possible compiler dependend breakages. It's also much
> cleaner and there's no reason to do the same stuff later in inline
> assembly instead of direcly in the asm entry point.
> 
> The resulting code might looks something like this:
> 
>     /* Allocate KTRAP_FRAME */
>     sub esp, KTRAP_FRAME_LENGTH - 10 * 4
> 
>     /* Save integer registers */
>     mov [esp + KTRAP_FRAME_EBP], ebp
>     mov [esp + KTRAP_FRAME_EBX], ebx
>     mov [esp + KTRAP_FRAME_ESI], esi
>     mov [esp + KTRAP_FRAME_EDI], edi
>     mov [esp + KTRAP_FRAME_EAX], eax
>     mov [esp + KTRAP_FRAME_ECX], ecx
>     mov [esp + KTRAP_FRAME_EDX], edx
>     mov [esp + KTRAP_FRAME_EBX], ebx
> 
>     /* Save segment regs */
>     mov [esp + KTRAP_FRAME_SEGDS], ds
>     mov [esp + KTRAP_FRAME_SEGES], es
> 
>     /* Fixup segment regs */
>     mov ax, KGDT_R3_DATA | RPL_MASK
>     mov ds, ax
>     mov es, ax
> 
> 
> Timo
> 
> 
> _______________________________________________
> Ros-dev mailing list
> [email protected]
> http://www.reactos.org/mailman/listinfo/ros-dev



_______________________________________________
Ros-dev mailing list
[email protected]
http://www.reactos.org/mailman/listinfo/ros-dev

Reply via email to