Hello,

finally I found some time for Stackless. We still have the problem with the AMD64 code, which does not compile on Linux.

Meanwhile there has been a little progress:
- I had access to an OS-X system and was able to successfully test my patch.
- I simplified the patch. It no longer depends on the -fomit-frame-pointer / -fno-omit-frame-pointer flags. The price is a little overhead if a frame pointer is used. Thanks to Alexey Borzenkov.
- I also improved the comments in the patch, because I got the impression that some people didn't fully understand its concept.

Now that the patch has been tested on Linux and OS-X, I kindly ask that it be applied.

Regards
  Anselm


Am 18.08.2011 15:18, schrieb Jeff Senn:
Anselm - can you have a look and see what you think of Alexey's version?
Compare with your earlier proposal?  What is your current best version for me to 
test on OS-X
(preferably w/o the darwin-specific compile option, which I can't tell if you
thought was necessary or not...)

FWIW: My problem with the 32-bit version does seem to have something to do
with the llvm version of gcc on Lion... putting a useless (but not
optimize-away-able) statement directly after the call to slp_switch that
refers to cst and cstprev causes it to work again... so I suspect either
some sort of "bug" in code generation -- or an optimization that breaks
the stack switching... if this sounds familiar to anyone (before I have
time to go rip apart and analyze the object code -- which will be awhile),
please let me know...


On Aug 17, 2011, at 6:09 PM, Alexey Borzenkov wrote:

On Fri, May 6, 2011 at 7:27 PM, Estevo<[email protected]>  wrote:
I got the following error trying to make release27-maint in a 64-bit environment:

gcc -pthread -c -fno-strict-aliasing -DSTACKLESS_FRHACK=0 -g -O2 -DNDEBUG -g
-fwrapv -O3 -Wall -Wstrict-prototypes  -I. -IInclude -I./Include -I./Stackless
-DPy_BUILD_CORE -fno-omit-frame-pointer -O2 -I. -o Stackless/core/slp_transfer.o
./Stackless/core/slp_transfer.c
./Stackless/core/slp_transfer.c: In function ‘slp_transfer’:
./Stackless/core/slp_transfer.c:153:1: error: bp cannot be used in asm here
make: *** [Stackless/core/slp_transfer.o] Erro 1

While working on greenlet and fixing switch-related problems, I found
that I cornered myself into the same issue with gcc (as opposed to
llvm-gcc used on OS X Lion). It seems that for whatever reason gcc
doesn't allow clobbering rbp when it's using it to reference the stack
frame.

Coincidentally I managed to find a way to correctly save rbp (as well
as csr and cw), that works and compiles both with and without
-fomit-frame-pointer. If anyone is interested, you can find my
slp_transfer for amd64 here:

https://bitbucket.org/snaury/greenlet/src/5d37cde9474c/platform/switch_amd64_unix.h

If anyone can verify it's 100% correct (though redundant), then you
can maybe use it in stackless too.

_______________________________________________
Stackless mailing list
[email protected]
http://www.stackless.com/mailman/listinfo/stackless





Index: Stackless/platf/switch_amd64_unix.h
===================================================================
--- Stackless/platf/switch_amd64_unix.h (revision 88911)
+++ Stackless/platf/switch_amd64_unix.h (working copy)
@@ -2,6 +2,8 @@
  * this is the internal transfer function.
  *
  * HISTORY
+ * 24-Oct-11 Anselm Kruis <a.kruis at science-computing.de>
+ *      Reworked based on the AMD64 ABI spec.
  * 26-Jul-10 Jeff Senn <senn at maya.com>
  *      Got this to work (rather than crash consistently)
  *      on OS-X 10.6 by adding more registers to save set.  
@@ -36,26 +38,95 @@
 #define STACK_MAGIC 0
 
 
-#define REGS_TO_SAVE "rdx", "rbx", "r12", "r13", "r14", "r15", "r9", "r8", "rdi", "rsi", "rcx", "rbp"
+/* 
+ * In order to switch the stack, we use the fact that the compiler 
+ * already knows how to preserve registers across function calls.
+ *
+ * The relevant AMD64 ABI specification precisely defines which registers
+ * must be preserved and which registers may be modified. 
+ * We use a gcc inline assembly feature to pretend that the inline 
+ * assembly block modifies the registers to be preserved. As a result, 
+ * the compiler emits code to preserve those registers.
+ * 
+ * The AMD64 ABI Draft 0.99.5, Figure 3.4 "Register usage"
+ * defines the following registers as "Preserved across 
+ * function calls":
+ * %rbx         callee-saved register; optionally used as base pointer
+ * %rsp         stack pointer
+ * %rbp         callee-saved register; optionally used as frame pointer
+ * %r12 - %r15  callee-saved registers
+ * %mxcsr       SSE2 control and status word
+ * %fcw         x87 control word
+ * 
+ * The compiler always preserves the %rsp register across a function call.
+ *
+ * Depending on the usage of a frame pointer, which is optional
+ * for the AMD64 ABI, the compiler already preserves the %rbp 
+ * register. Unfortunately, we must not add "rbp" to the clobber list if 
+ * rbp is used as a frame pointer (won't compile). Therefore we save
+ * rbp manually.
+ *
+ * For the other registers (except %mxcsr and %fcw) we tell the compiler 
+ * that we are going to clobber them. The compiler will then save the 
+ * registers for us. (However, the compiler gives no guarantee when it 
+ * will restore the registers.) And the compiler only preserves those 
+ * registers that must be preserved according to the calling convention. 
+ * It does not preserve any other register that may be modified during a 
+ * function call. Therefore specifying additional registers has no effect 
+ * at all. Take a look at the generated assembly code!
+ * 
+ * If we want more control, we need to preserve the
+ * registers explicitly, as is done for %mxcsr, %fcw and %rbp.
+ * 
+ */
 
-static int
+#if 1
+/* Registers marked as clobbered, minimum set according to the ABI spec. */
+#define REGS_CLOBBERED "rbx", "r12", "r13", "r14", "r15"
+#else
+/* Maximum possible clobber list. It gives the same assembly code as the minimum list.
+   If the ABI evolves, it might be necessary to add some of these registers. */
+#define REGS_CLOBBERED "memory", "rax", "rbx", "rcx", "rdx", "rsi", "rdi", \
+    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", \
+    "st(1)", "st(2)", "st(3)", "st(4)", "st(5)", "st(6)", "st(7)", \
+    "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7", \
+    "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15"
+#endif
+
+/* 
+ * You may want to make the function static to enable optimizations.
+ * However, the ABI spec does not apply to static functions. Therefore 
+ * I make slp_switch a regular global function.
+ */
+#if 0
+static
+#endif
+int
 slp_switch(void)
 {
     register long *stackref, stsizediff;
-    __asm__ volatile ("" : : : REGS_TO_SAVE);
+    void * rbp; int mxcsr; short x87cw;
+    __asm__ volatile (
+        "fstcw %0\n\t"
+        "stmxcsr %1\n\t"
+        "movq %%rbp, %2\n\t"
+        : "=m" (x87cw), "=m" (mxcsr), "=m" (rbp) : : REGS_CLOBBERED );
     __asm__ ("movq %%rsp, %0" : "=g" (stackref));
     {
         SLP_SAVE_STATE(stackref, stsizediff);
         __asm__ volatile (
-            "addq %0, %%rsp\n"
-            "addq %0, %%rbp\n"
+            "addq %0, %%rsp\n\t"
+            "addq %0, %%rbp\n\t"
             :
             : "r" (stsizediff)
             );
         SLP_RESTORE_STATE();
+        __asm__ volatile (
+            "movq %2, %%rbp\n\t"
            "ldmxcsr %1\n\t"
+            "fldcw %0\n\t"
+            : : "m" (x87cw), "m" (mxcsr), "m" (rbp));
         return 0;
     }
-    __asm__ volatile ("" : : : REGS_TO_SAVE);
 }
 
 #endif

