Re: [PATCH] S/390: Hardware transactional memory support
On Tue, 2013-08-13 at 12:09 -0700, Richard Henderson wrote: On 08/13/2013 11:26 AM, Jakub Jelinek wrote: On Fri, Aug 02, 2013 at 11:05:33AM -1000, Richard Henderson wrote: On 08/02/2013 04:45 AM, Andreas Krebbel wrote: ! XCFLAGS=${XCFLAGS} -mzarch -mhtm -msoft-float Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in floating point registers; similarly for the write accessors. So, would it be enough to compile just beginend.cc with -msoft-float and the rest normally? No. From what I understand of the s390 restriction, we can't touch the floating point registers after starting a transaction, at least until we're committed to restoring them all. Which means that we have to have everything along the path from htm_begin_success == false until a longjmp restores them. This path includes a call to std::operator new, so that makes it a non-starter. Better, I think, to pass the gtm_jmpbuf to htm_begin. Then we can do uint32_t ret = __builtin_tbegin_nofloat (NULL); if (!htm_begin_success(ret)) // restore fpu state from jb return ret; at which point we're back to normal and we can do whatever we want within the normal abi wrt the fpu. Can we instead move the successful path of the HTM fastpath to the _ITM_beginTransaction asm function, and just do the fallback and retry code in beginend.cc? So, add a tbegin (or xbegin) and the check of serial_lock to the asm function as in Andi's patch (http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00640.html), but keep the restart policy code (ie, the serial lock read/write locking for the wait, etc.) in beginend.cc? The latter could return a new code that tells the asm function to try the HTM fastpath again; we might also want to use a separate bit in the arguments to gtm_thread::begin_transaction to tell it whether the HTM fastpath failed before. This way, we get lower overhead when the HTM fastpath succeeds right away because we can do the minimally necessary thing in asm; we still can change the retry policies easily; and we can restore the floating point registers to a sane state for any libitm C/C++ code that we might call without having to do any restore inside of the C/C++ code. Thoughts? I'll put working on a draft of this for x86 on my list.
Re: [PATCH] S/390: Hardware transactional memory support
On 08/14/2013 05:44 AM, Torvald Riegel wrote: Can we instead move the successful path of the HTM fastpath to the _ITM_beginTransaction asm function, and just do the fallback and retry code in beginend.cc? Certainly, but this seems like a larger re-design, and I was thinking of a shorter-term solution. r~
Re: [PATCH] S/390: Hardware transactional memory support
On 02/08/13 23:05, Richard Henderson wrote: On 08/02/2013 04:45 AM, Andreas Krebbel wrote: ! XCFLAGS=${XCFLAGS} -mzarch -mhtm -msoft-float Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in floating point registers; similarly for the write accessors. Right. I've reverted that change on mainline and 4.8 branch. Bye, -Andreas-
Re: [PATCH] S/390: Hardware transactional memory support
On 14/08/13 17:33, Richard Henderson wrote: On 08/14/2013 05:44 AM, Torvald Riegel wrote: Can we instead move the successful path of the HTM fastpath to the _ITM_beginTransaction asm function, and just do the fallback and retry code in beginend.cc? Certainly, but this seems like a larger re-design, and I was thinking of a shorter-term solution. Before http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00820.html no FPR appears to be used as GPR save slot. So it should be safe for 4.8 branch. Skimming through the EC12 compiled libitm.so disassembly all FPR uses appear to be limited to the ITM_* accessor functions. I think 4.8 should be ok as is and we perhaps could go with the patches from Andi for mainline? Bye, -Andreas-
Re: [PATCH] S/390: Hardware transactional memory support
On Fri, Aug 02, 2013 at 11:05:33AM -1000, Richard Henderson wrote: On 08/02/2013 04:45 AM, Andreas Krebbel wrote: ! XCFLAGS=${XCFLAGS} -mzarch -mhtm -msoft-float Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in floating point registers; similarly for the write accessors. So, would it be enough to compile just beginend.cc with -msoft-float and the rest normally? Or what routines in particular must be prevented from using float registers implicitly? Jakub
Re: [PATCH] S/390: Hardware transactional memory support
On 08/13/2013 11:26 AM, Jakub Jelinek wrote: On Fri, Aug 02, 2013 at 11:05:33AM -1000, Richard Henderson wrote: On 08/02/2013 04:45 AM, Andreas Krebbel wrote: ! XCFLAGS=${XCFLAGS} -mzarch -mhtm -msoft-float Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in floating point registers; similarly for the write accessors. So, would it be enough to compile just beginend.cc with -msoft-float and the rest normally? No. From what I understand of the s390 restriction, we can't touch the floating point registers after starting a transaction, at least until we're committed to restoring them all. Which means that we have to have everything along the path from htm_begin_success == false until a longjmp restores them. This path includes a call to std::operator new, so that makes it a non-starter. Better, I think, to pass the gtm_jmpbuf to htm_begin. Then we can do uint32_t ret = __builtin_tbegin_nofloat (NULL); if (!htm_begin_success(ret)) // restore fpu state from jb return ret; at which point we're back to normal and we can do whatever we want within the normal abi wrt the fpu. r~
Re: [PATCH] S/390: Hardware transactional memory support
On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote: Index: libitm/config/s390/target.h === *** libitm/config/s390/target.h.orig --- libitm/config/s390/target.h *** [...] + static inline uint32_t + htm_begin () + { + return __builtin_tbegin_nofloat (NULL); + } I'm not quite sure I understand why the nofloat variant is sufficient; is this because we don't use FPRs in the abort/restart code of libitm (and the FPRs are known to be clobbered after starting a txn)? There is a bit in the properties passed to _ITM_beginTransaction which states whether the txn has floating point update code (pr_hasNoFloatUpdate). Would this be useful? (BTW, sorry for the late response. I don't read gcc-patches frequently, so please CC me on libitm-related things.)
Re: [PATCH] S/390: Hardware transactional memory support
On 02/08/13 13:31, Torvald Riegel wrote: On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote: Index: libitm/config/s390/target.h === *** libitm/config/s390/target.h.orig --- libitm/config/s390/target.h *** [...] + static inline uint32_t + htm_begin () + { + return __builtin_tbegin_nofloat (NULL); + } I'm not quite sure I understand why the nofloat variant is sufficient; is this because we don't use FPRs in the abort/restart code of libitm (and the FPRs are known to be clobbered after starting a txn)? Yes. To my understanding there are 2 potential issues when the FPRs are not saved. 1. The abort code will read corrupted FPR content for FPRs: - which are live across tbegin and - are updated in the TX body and - are read in the abort code The abort code will see the updated value from the TX body. Since libitm implements TX begins as function calls only call-saved registers can be live across a tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and get restored when doing the longjmp back into the user code. So this should be no problem. 2. The other problem is that the ABI might get violated when aborting into a callee which already returned and there are FPRs which have been modified afterwards. In that case the compiler would not have generated FPR save/restore instructions in the function prologue/epilogue of the callee. But in fact there are modified FPRs when exiting the second time. But again this should be no problem thanks to the setjmp/longjmp semantics of _ITM_beginTransaction and GTM_longjmp. If the TX body modified a call-saved FPR it will get restored on the abort path back from libitm to the user code. As long as libitm does not use FPRs itself this should be safe without having tbegin clobbering FPRs. There is a bit in the properties passed to _ITM_beginTransaction which states whether the txn has floating point update code (pr_hasNoFloatUpdate). Would this be useful? This probably would be useful in the ITM_beginTransaction / GTM_longjmp implementations which in that case could try to avoid the FPR save/restores. But Richard mentioned that these bits so far are never set by GCC since it is lacking the required infos from the register allocator. (BTW, sorry for the late response. I don't read gcc-patches frequently, so please CC me on libitm-related things.) No problem. I'll try to remember this and cc you next time. Bye, -Andreas-
Re: [PATCH] S/390: Hardware transactional memory support
On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote: Since libitm implements TX begins as function calls only call-saved registers can be live across a tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and get restored when doing the longjmp back into the user code. So this should be no problem. Except that the htm_begin() routines are declared static inline functions, so when they're inlined, you aren't protected by the semantics of a call anymore, are you? Peter
Re: [PATCH] S/390: Hardware transactional memory support
On 02/08/13 16:23, Peter Bergner wrote: On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote: Since libitm implements TX begins as function calls only call-saved registers can be live across a tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and get restored when doing the longjmp back into the user code. So this should be no problem. Except that the htm_begin() routines are declared static inline functions, so when they're inlined, you aren't protected by the semantics of a call anymore, are you? They get inlined into libitm code but not into the user code. As I understand it in the user code will always be a call to _ITM_beginTransaction. -Andreas- Peter
Re: [PATCH] S/390: Hardware transactional memory support
On Fri, 2013-08-02 at 16:26 +0200, Andreas Krebbel wrote: On 02/08/13 16:23, Peter Bergner wrote: On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote: Since libitm implements TX begins as function calls only call-saved registers can be live across a tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and get restored when doing the longjmp back into the user code. So this should be no problem. Except that the htm_begin() routines are declared static inline functions, so when they're inlined, you aren't protected by the semantics of a call anymore, are you? They get inlined into libitm code but not into the user code. As I understand it in the user code will always be a call to _ITM_beginTransaction. Sure, that protects the user code, but what about the libitm code? From your previous comment: As long as libitm does not use FPRs itself this should be safe without having tbegin clobbering FPRs. Is it a given that s390 doesn't use FPRs without explicit use of floating point types? I ask, because on POWER, we can and do generate floating point code without explicit use of double, float, etc. Maybe s390 is safe in that respect. Peter
Re: [PATCH] S/390: Hardware transactional memory support
On 02/08/13 16:36, Peter Bergner wrote: On Fri, 2013-08-02 at 16:26 +0200, Andreas Krebbel wrote: On 02/08/13 16:23, Peter Bergner wrote: As long as libitm does not use FPRs itself this should be safe without having tbegin clobbering FPRs. Is it a given that s390 doesn't use FPRs without explicit use of floating point types? I ask, because on POWER, we can and do generate floating point code without explicit use of double, float, etc. Maybe s390 is safe in that respect. S/390 used to be on the safe side regarding this but with mainline we started using FPRs as GPR save slots. As Richard suggested we should build libitm with -msoft-float in order to prevent this within libitm. Unfortunately I forgot about this in my S/390 libitm patch. So I'll commit the following: Index: libitm/configure.tgt === *** libitm/configure.tgt.orig 2013-07-30 07:42:29.0 + --- libitm/configure.tgt2013-08-02 14:43:18.641206151 + *** case ${target_cpu} in *** 109,115 ARCH=x86 ;; s390|s390x) ! XCFLAGS=${XCFLAGS} -mzarch -mhtm ARCH=s390 ;; --- 109,115 ARCH=x86 ;; s390|s390x) ! XCFLAGS=${XCFLAGS} -mzarch -mhtm -msoft-float ARCH=s390 ;;
Re: [PATCH] S/390: Hardware transactional memory support
On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote: On 02/08/13 13:31, Torvald Riegel wrote: On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote: Index: libitm/config/s390/target.h === *** libitm/config/s390/target.h.orig --- libitm/config/s390/target.h *** [...] + static inline uint32_t + htm_begin () + { + return __builtin_tbegin_nofloat (NULL); + } I'm not quite sure I understand why the nofloat variant is sufficient; is this because we don't use FPRs in the abort/restart code of libitm (and the FPRs are known to be clobbered after starting a txn)? Yes. To my understanding there are 2 potential issues when the FPRs are not saved. 1. The abort code will read corrupted FPR content for FPRs: - which are live across tbegin and - are updated in the TX body and - are read in the abort code The abort code will see the updated value from the TX body. Since libitm implements TX begins as function calls only call-saved registers can be live across a tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and get restored when doing the longjmp back into the user code. So this should be no problem. 2. The other problem is that the ABI might get violated when aborting into a callee which already returned and there are FPRs which have been modified afterwards. In that case the compiler would not have generated FPR save/restore instructions in the function prologue/epilogue of the callee. But in fact there are modified FPRs when exiting the second time. But again this should be no problem thanks to the setjmp/longjmp semantics of _ITM_beginTransaction and GTM_longjmp. If the TX body modified a call-saved FPR it will get restored on the abort path back from libitm to the user code. As long as libitm does not use FPRs itself this should be safe without having tbegin clobbering FPRs. And I suppose that even if it would use FPRs, it shouldn't ever access memory locations touched in user code. Thanks for the explanation. Some of this was obvious, sorry, but that's what happens when a mental context switch is incomplete :) Next time, I'd prefer a summary of such reasoning to be added as a comment in the source; it's just too easy to forget about all the subtleties... There is a bit in the properties passed to _ITM_beginTransaction which states whether the txn has floating point update code (pr_hasNoFloatUpdate). Would this be useful? This probably would be useful in the ITM_beginTransaction / GTM_longjmp implementations which in that case could try to avoid the FPR save/restores. But Richard mentioned that these bits so far are never set by GCC since it is lacking the required infos from the register allocator. Another thing that comes to mind is that it might be useful to put parts of the HTM fast path into the asm pieces of _ITM_beginTransaction. That way, one can use exactly the amount of SW setjmp one needs given what's easy to save/restore for the TM; this might decrease the startup costs of a txn somewhat. Andi Kleen posted a patch for how to do this for RTM a while ago (but it had some issues): http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01189.html Perhaps something along these lines (while avoiding the issues :) ) would be useful for s/390 and Power too. Torvald
Re: [PATCH] S/390: Hardware transactional memory support
On 08/02/2013 04:45 AM, Andreas Krebbel wrote: ! XCFLAGS=${XCFLAGS} -mzarch -mhtm -msoft-float Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in floating point registers; similarly for the write accessors. r~
Re: [PATCH] S/390: Hardware transactional memory support
On Tue, 2013-07-16 at 08:08 -0400, David Edelsohn wrote: Then I agree that the hunk should be reverted so that HTM does not imply POWER8. Ok, I'll bootstrap and regtest just to be sure there is no unintended fallout for such a small path before committing. Thanks. Peter