On Mon, Feb 02, 2009 at 09:10:01AM -0700, zooko wrote:
> I had another random thought -- could Python or something about the
> Python<->C interface or something about your use of SSE2 be
> misaligning the stack?

The x86-64 ABI specifies that the stack must be 16-byte aligned at the point of every call (so %rsp + 8 is a multiple of 16 upon function entry). It does seem possible that Python does not respect that in all cases, or that there is some case where using alloca throws things off. Since on x86-64 the worst that would usually happen is that things run a bit slower due to misaligned memory accesses, it is conceivable that such a bug would be missed.

I added

    assert(((uintptr_t)__builtin_frame_address(0)) % 16 == 0);

at the beginning of _addmul1 (and made sure NDEBUG was not defined, so that the assertion was active). The tests all ran without the assertion triggering, as did my encoding benchmark.

I looked at the assembly GCC 4.3 generates for Opteron and Core2 processors for addmul (with -O2 and with -O2 -fPIC). In each case it pushes four 64-bit registers onto the stack and does not touch the stack again until it returns, when it pops the callee-saved registers. So even if the stack were misaligned, it is hard for me to see how that would affect performance that much.

-Jack
_______________________________________________
tahoe-dev mailing list
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
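
For anyone who wants to repeat the alignment check outside of the real code, here is a minimal sketch of the same test as freestanding functions. The names is_aligned_16 and frame_is_16_byte_aligned are made up for illustration and are not part of the addmul code. It assumes GCC or Clang, since __builtin_frame_address is a compiler extension, and it only reports what the frame address looks like in the function where it is called.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the original code): returns nonzero
 * when the given address is a multiple of 16. */
int is_aligned_16(uintptr_t addr)
{
    return addr % 16 == 0;
}

/* Hypothetical stand-in for the check placed at the top of _addmul1.
 * noinline keeps the function's own frame, so the builtin reports the
 * frame address of this call rather than that of the caller.
 * Assumes GCC or Clang; __builtin_frame_address is not standard C. */
__attribute__((noinline))
int frame_is_16_byte_aligned(void)
{
    uintptr_t frame = (uintptr_t)__builtin_frame_address(0);
    return is_aligned_16(frame);
}
```

Calling assert(frame_is_16_byte_aligned()) from the code path of interest (with NDEBUG left undefined) reproduces the experiment. A passing assertion only shows that the frame was aligned on the calls that were actually exercised.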
