Hi,
I just had a look at the AMD64 ABI spec
(http://www.x86-64.org/documentation/abi-0.99.pdf). This spec is valid at least
for Linux and OS-X (see
http://developer.apple.com/library/mac/#documentation/DeveloperTools/Conceptual/LowLevelABI/140-x86-64_Function_Calling_Conventions/x86_64.html#//apple_ref/doc/uid/TP40005035-SW1).
Section 3.2.1 "Registers and the Stack Frame" gives a detailed description of
the register usage. According to the summary given in Figure 3.4 "Register
usage", the following registers are preserved across function calls.
%rbx callee-saved register; optionally used as base pointer
%rsp stack pointer
%rbp callee-saved register; optional used as frame pointer
%r12 - %r15 callee-saved registers
%mxcsr SSE2 control and status word
%fcw x87 control word
The both variants of switch_amd64_unix.h preserve these registers with the
exception of the mxcsr and the fcw register.
Then I had a closer look at the current implementation of switch_amd64_unix.h.
I feel there are a few weak points:
1. The REGS_TO_SAVE list is used to mark registers as clobbered. Therefore the
compiler preserves the initial value of the register, if and only if the
register must be preserved according to the ABI specification in effect. The
two register lists
/* minimum set according to the ABI spec */
#define REGS_TO_SAVE "rbx", "r12", "r13", "r14", "r15"
and
/* maximum set accepted by the compiler */
#define REGS_TO_SAVE "memory", "rax", "rbx", "rcx", "rdx", "rsi", "rdi", \
"r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", \
"st(1)", "st(2)", "st(3)", "st(4)", "st(5)", "st(6)", "st(7)", \
"xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7", \
"xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15"
yield the same or equivalent assembly code. Tested with gcc 3.4.6, 4.1.2 and
4.5.1.
2. Depending on the usage of the -fomit-frame-pointer flag we must or must not
include "rbp" in REGS_TO_SAVE. Unfortunately there's a gcc bug. Of my 3 gcc
versions, only 4.5.1 did preserve the "rbp" register, if -fomit-frame-pointer
is in effect and "rbp" was added to REGS_TO_SAVE.
3. Using the clobber mechanism is a mixed blessing. On the one hand the
compiler knows what to preserve. On the other hand, we have no guarantee, at
which point in time the compiler restores the registers. The compile may
restore the values immediately after the first __asm__ statement. Fortunately
current gcc versions don't do it.
4. The function slp_switch is declared static. Therefore the ABI specification
does not apply to this function. Section 3.2: "The standard calling sequence
requirements apply only to global functions.". However the ABI applies to the
calling function slp_transfer(), but this function is more complicated and the
generated assembly code is harder to read.
5. The registers mxcsr and the fcw are currently not preserved.
6. The last __asm__ statement is dead code (after a "return") and therefore
does not generate code.
------8<-------------8<---------------8<-----------------
#define REGS_TO_SAVE "rdx", "rbx", "r12", "r13", "r14", "r15", "r9", "r8",
"rdi", "rsi", "rcx", "rbp"
static int
slp_switch(void)
{
register long *stackref, stsizediff;
__asm__ volatile ("" : : : REGS_TO_SAVE);
__asm__ ("movq %%rsp, %0" : "=g" (stackref));
{
SLP_SAVE_STATE(stackref, stsizediff);
__asm__ volatile (
"addq %0, %%rsp\n"
"addq %0, %%rbp\n"
:
: "r" (stsizediff)
);
SLP_RESTORE_STATE();
return 0;
}
__asm__ volatile ("" : : : REGS_TO_SAVE);
}
------8<-------------8<---------------8<-----------------
None of these weak points is a real bug and the implementation works fine for
me. Nevertheless I propose to fix these weak points. My preliminary patch is
included below. I'm not sure, that the patch already has production quality,
but it is a starting point. Unfortunately I have access to an OS-X system and
therefore I'm not able test my patch on OS-X.
@Jeff: did you compare the generated assembly code before and after your patch?
What gcc version are you using?
And now folks, comments please.
Regards
Anselm
--- stackless-python/configure.in.orig 2011-07-13 15:39:28.020192050 -0400
+++ stackless-python/configure.in 2011-07-13 15:38:17.719109312 -0400
@@ -318,7 +318,7 @@
# Stackless flags for compiling the hard switching code
case $MACHDEP in
darwin)
- SLPFLAGS="-fomit-frame-pointer -O2"
+ SLPFLAGS="-fomit-frame-pointer -DSLP_NO_FAME_POINTER -O2"
;;
*)
SLPFLAGS="-fno-omit-frame-pointer -O2"
@@ -4303,7 +4303,7 @@
AC_SUBST(SLPFLAGS)
SLPFLAGS="-fno-omit-frame-pointer -O2"
case $MACHDEP in
-darwin) SLPFLAGS="-fomit-frame-pointer -O2";;
+darwin) SLPFLAGS="-fomit-frame-pointer -DSLP_NO_FAME_POINTER -O2";;
'') SLPFLAGS="-fno-omit-frame-pointer -O2";;
esac
--- stackless-python/Stackless/platf/switch_amd64_unix.h.orig 2011-07-11
08:03:37.000000000 -0400
+++ stackless-python/Stackless/platf/switch_amd64_unix.h 2011-07-13
15:27:20.983033224 -0400
@@ -36,26 +36,85 @@
#define STACK_MAGIC 0
-#define REGS_TO_SAVE "rdx", "rbx", "r12", "r13", "r14", "r15", "r9", "r8",
"rdi", "rsi", "rcx", "rbp"
+/*
+ * The AMD64 ABI Draft 0.99.5, Figure 3.4 "Register usage"
+ * defines the following registers as "Preserved across
+ * function calls":
+ * %rbx callee-saved register; optionally used as base pointer
+ * %rsp stack pointer
+ * %rbp callee-saved register; optional used as frame pointer
+ * %r12 - %r15 callee-saved registers
+ * %mxcsr SSE2 control and status word
+ * %fcw x87 control word
+ *
+ * The compiler always preserves the %rsp register.
+ *
+ * Depending on the usage of a frame pointer, which is optional
+ * for the AMD64 ABI, the compiler already preserves the %rbp
+ * register.
+ *
+ * For the other registers (except %mxcsr and %fcw) we tell the compiler,
+ * that we are going to clobber the registers. The compiler will then save the
registers
+ * for us. (However the compiler gives no guarantee, when it will restore
+ * the registers.) And the compiler only preserves those registers, that must
+ * be preserved according to the calling convention. It does not preserve any
other
+ * register, that may be modified during a function call. Therefore specifying
additional
+ * registers has no effect at all. Take a look at the generated assembly code!
+ *
+ * If we want more control, we need to preserve the
+ * registers explicitly similar to %mxcsr and %fcw.
+ *
+ */
+
+#if 0
+/* Registers marked as clobbered, minimum set according to the ABI spec. */
+#define REGS_TO_SAVE "rbx", "r12", "r13", "r14", "r15"
+#else
+/* Maximum possible clobber list, gives the same assembly code as the minimum
set */
+#define REGS_TO_SAVE "memory", "rax", "rbx", "rcx", "rdx", "rsi", "rdi", \
+ "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
\
+ "st(1)", "st(2)", "st(3)", "st(4)", "st(5)", "st(6)", "st(7)", \
+ "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7", \
+ "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15"
+#endif
+
+#ifdef SLP_NO_FAME_POINTER
+#define FRAME_POINTER_REG , "rbp"
+#else
+#define FRAME_POINTER_REG
+#endif
-static int
+/*
+ * You may want to make the function static enable optimizations.
+ */
+#if 0
+static
+#endif
+int
slp_switch(void)
{
register long *stackref, stsizediff;
- __asm__ volatile ("" : : : REGS_TO_SAVE);
+ int mxcsr; short x87cw;
+ __asm__ volatile (
+ "fstcw %0\n\t"
+ "stmxcsr %1\n\t"
+ : "=m" (x87cw), "=m" (mxcsr) : : REGS_TO_SAVE FRAME_POINTER_REG );
__asm__ ("movq %%rsp, %0" : "=g" (stackref));
{
SLP_SAVE_STATE(stackref, stsizediff);
__asm__ volatile (
- "addq %0, %%rsp\n"
- "addq %0, %%rbp\n"
+ "addq %0, %%rsp\n\t"
+ "addq %0, %%rbp\n\t"
:
: "r" (stsizediff)
);
SLP_RESTORE_STATE();
+ __asm__ volatile (
+ "ldmxcsr %1\n\t"
+ "fldcw %0\n\t"
+ : : "m" (x87cw), "m" (mxcsr) );
return 0;
}
- __asm__ volatile ("" : : : REGS_TO_SAVE);
}
#endif
Am 12.07.2011 15:03, schrieb Richard Tew:
> On Tue, Jul 12, 2011 at 3:30 PM, Anselm Kruis
> <[email protected]> wrote:
>> I'm far from understanding the code completely.
>
> If you have any questions, feel free to ask on the list.
>
>> I looked at the switching code of the greenlet project
>> (http://codespeak.net/py/0.9.2/apigen/source/c-extension/greenlet/switch_amd64_unix.h.html)
>> Pretty much the same as our old version of the code.
>
> The original stack switching code for greenlet was taken from
> Stackless. However, be aware that greenlet is no longer hosted on
> codespeak, and that the place to go for it is here:
>
> https://bitbucket.org/ambroff/greenlet/
>
>> Here is some discussion about the context switch for amd64:
>> http://code.google.com/p/coev/wiki/GreenletProblems The page contains a link
>
> That's an interesting page. I had no idea that greenlet was so
> problematic (in theory). In practice, I imagine that the selective
> range of switched registers are reasonable and sufficient for normal
> usage. Of course, that doesn't make it right, but I will touch on
> that further below. Regarding more architecture neutral problems like
> the threadstate and exception aspect, I expect that is abstracted away
> in any decent framework. Otherwise I wouldn't expect people to have
> confidence in gevent based solutions, as evidenced in the following:
>
> http://tarekziade.wordpress.com/2011/07/12/firefox-sync-python/
>
>> to the amd64 ABI specification http://www.x86-64.org/documentation.html.
>> I'll try to read it later.
>
> Note that (speaking at least on behalf of myself) the switching code
> was generally written based using research on which registers were
> supposed to be solved. Often it is difficult to locate any resource
> that clearly states what registers have to be saved for a platform.
> So do not take the existing sets as being gospel. In some cases
> platforms have changed and required changes to the switching logic.
>
> Cheers,
> Richard.
>
> _______________________________________________
> Stackless mailing list
> [email protected]
> http://www.stackless.com/mailman/listinfo/stackless
--
Dipl. Phys. Anselm Kruis science + computing ag
Senior Solution Architect Ingolstädter Str. 22
email [email protected] 80807 München, Germany
phone +49 89 356386 874 fax 737 www.science-computing.de
--
Vorstand/Board of Management:
Dr. Bernd Finkbeiner, Dr. Roland Niemeier,
Dr. Arno Steitz, Dr. Ingrid Zech
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Philippe Miltin
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196
_______________________________________________
Stackless mailing list
[email protected]
http://www.stackless.com/mailman/listinfo/stackless