Re: [PATCH] S/390: Hardware transactional memory support

2013-08-14 Thread Torvald Riegel
On Tue, 2013-08-13 at 12:09 -0700, Richard Henderson wrote:
 On 08/13/2013 11:26 AM, Jakub Jelinek wrote:
  On Fri, Aug 02, 2013 at 11:05:33AM -1000, Richard Henderson wrote:
  On 08/02/2013 04:45 AM, Andreas Krebbel wrote:
  !   XCFLAGS=${XCFLAGS} -mzarch -mhtm -msoft-float
 
  Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in
  floating point registers; similarly for the write accessors.
  
  So, would it be enough to compile just beginend.cc with -msoft-float
  and the rest normally?
 
 No.
 
 From what I understand of the s390 restriction, we can't touch the
 floating point registers after starting a transaction, at least until
 we're committed to restoring them all.
 
 Which means that we have to have everything along the path from
 htm_begin_success == false until a longjmp restores them.  This path
 includes a call to std::operator new, so that makes it a non-starter.
 
 Better, I think, to pass the gtm_jmpbuf to htm_begin.  Then we can do
 
   uint32_t ret = __builtin_tbegin_nofloat (NULL);
   if (!htm_begin_success(ret))
 // restore fpu state from jb
   return ret;
 
 at which point we're back to normal and we can do whatever we want
 within the normal abi wrt the fpu.

Can we instead move the successful path of the HTM fastpath to the
_ITM_beginTransaction asm function, and just do the fallback and retry
code in beginend.cc?  So, add a tbegin (or xbegin) and the check of
serial_lock to the asm function as in Andi's patch
(http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00640.html), but keep the
restart policy code (ie, the serial lock read/write locking for the
wait, etc.) in beginend.cc?  The latter could return a new code that
tells the asm function to try the HTM fastpath again; we might also want
to use a separate bit in the arguments to gtm_thread::begin_transaction
to tell it whether the HTM fastpath failed before.

This way, we get lower overhead when the HTM fastpath succeeds right
away because we can do the minimally necessary thing in asm; we still
can change the retry policies easily; and we can restore the floating
point registers to a sane state for any libitm C/C++ code that we might
call without having to do any restore inside of the C/C++ code.

Thoughts?  I'll put working on a draft of this for x86 on my list.



Re: [PATCH] S/390: Hardware transactional memory support

2013-08-14 Thread Richard Henderson
On 08/14/2013 05:44 AM, Torvald Riegel wrote:
 Can we instead move the successful path of the HTM fastpath to the
 _ITM_beginTransaction asm function, and just do the fallback and retry
 code in beginend.cc?

Certainly, but this seems like a larger re-design, and I was thinking of
a shorter-term solution.


r~


Re: [PATCH] S/390: Hardware transactional memory support

2013-08-14 Thread Andreas Krebbel
On 02/08/13 23:05, Richard Henderson wrote:
 On 08/02/2013 04:45 AM, Andreas Krebbel wrote:
 !   XCFLAGS=${XCFLAGS} -mzarch -mhtm -msoft-float
 
 Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in
 floating point registers; similarly for the write accessors.

Right. I've reverted that change on mainline and 4.8 branch.

Bye,

-Andreas-




Re: [PATCH] S/390: Hardware transactional memory support

2013-08-14 Thread Andreas Krebbel
On 14/08/13 17:33, Richard Henderson wrote:
 On 08/14/2013 05:44 AM, Torvald Riegel wrote:
 Can we instead move the successful path of the HTM fastpath to the
 _ITM_beginTransaction asm function, and just do the fallback and retry
 code in beginend.cc?
 
 Certainly, but this seems like a larger re-design, and I was thinking of
 a shorter-term solution.
 

Before http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00820.html no FPR appears 
to be used as GPR save
slot. So it should be safe for 4.8 branch. Skimming through the EC12 compiled 
libitm.so disassembly
all FPR uses appear to be limited to the ITM_* accessor functions.  I think 4.8 
should be ok as is
and we perhaps could go with the patches from Andi for mainline?

Bye,

-Andreas-



Re: [PATCH] S/390: Hardware transactional memory support

2013-08-13 Thread Jakub Jelinek
On Fri, Aug 02, 2013 at 11:05:33AM -1000, Richard Henderson wrote:
 On 08/02/2013 04:45 AM, Andreas Krebbel wrote:
  !   XCFLAGS=${XCFLAGS} -mzarch -mhtm -msoft-float
 
 Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in
 floating point registers; similarly for the write accessors.

So, would it be enough to compile just beginend.cc with -msoft-float
and the rest normally?
Or what routines in particular must be prevented from using float registers
implicitly?

Jakub


Re: [PATCH] S/390: Hardware transactional memory support

2013-08-13 Thread Richard Henderson
On 08/13/2013 11:26 AM, Jakub Jelinek wrote:
 On Fri, Aug 02, 2013 at 11:05:33AM -1000, Richard Henderson wrote:
 On 08/02/2013 04:45 AM, Andreas Krebbel wrote:
 !   XCFLAGS=${XCFLAGS} -mzarch -mhtm -msoft-float

 Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in
 floating point registers; similarly for the write accessors.
 
 So, would it be enough to compile just beginend.cc with -msoft-float
 and the rest normally?

No.

From what I understand of the s390 restriction, we can't touch the
floating point registers after starting a transaction, at least until
we're committed to restoring them all.

Which means that we have to have everything along the path from
htm_begin_success == false until a longjmp restores them.  This path
includes a call to std::operator new, so that makes it a non-starter.

Better, I think, to pass the gtm_jmpbuf to htm_begin.  Then we can do

  uint32_t ret = __builtin_tbegin_nofloat (NULL);
  if (!htm_begin_success(ret))
// restore fpu state from jb
  return ret;

at which point we're back to normal and we can do whatever we want
within the normal abi wrt the fpu.


r~


Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Torvald Riegel
On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote:
 Index: libitm/config/s390/target.h
 ===
 *** libitm/config/s390/target.h.orig
 --- libitm/config/s390/target.h
 ***

[...]

 + static inline uint32_t
 + htm_begin ()
 + {
 +   return __builtin_tbegin_nofloat (NULL);
 + } 

I'm not quite sure I understand why the nofloat variant is sufficient;
is this because we don't use FPRs in the abort/restart code of libitm
(and the FPRs are known to be clobbered after starting a txn)?

There is a bit in the properties passed to _ITM_beginTransaction which
states whether the txn has floating point update code
(pr_hasNoFloatUpdate).  Would this be useful?

(BTW, sorry for the late response.  I don't read gcc-patches frequently,
so please CC me on libitm-related things.)



Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Andreas Krebbel
On 02/08/13 13:31, Torvald Riegel wrote:
 On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote:
 Index: libitm/config/s390/target.h
 ===
 *** libitm/config/s390/target.h.orig
 --- libitm/config/s390/target.h
 ***
 
 [...]
 
 + static inline uint32_t
 + htm_begin ()
 + {
 +   return __builtin_tbegin_nofloat (NULL);
 + } 
 
 I'm not quite sure I understand why the nofloat variant is sufficient;
 is this because we don't use FPRs in the abort/restart code of libitm
 (and the FPRs are known to be clobbered after starting a txn)?

Yes. To my understanding there are 2 potential issues when the FPRs are not 
saved.

1. The abort code will read corrupted FPR content for FPRs:
- which are live across tbegin and
- are updated in the TX body and
- are read in the abort code
The abort code will see the updated value from the TX body.

Since libitm implements TX begins as function calls only call-saved registers 
can be live across a
tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and get 
restored when doing
the longjmp back into the user code. So this should be no problem.

2. The other problem is that the ABI might get violated when aborting into a 
callee which already
returned and there are FPRs which have been modified afterwards. In that case 
the compiler would not
have generated FPR save/restore instructions in the function prologue/epilogue 
of the callee. But in
fact there are modified FPRs when exiting the second time.

But again this should be no problem thanks to the setjmp/longjmp semantics of 
_ITM_beginTransaction
and GTM_longjmp. If the TX body modified a call-saved FPR it will get restored 
on the abort path
back from libitm to the user code.


As long as libitm does not use FPRs itself this should be safe without having 
tbegin clobbering FPRs.

 There is a bit in the properties passed to _ITM_beginTransaction which
 states whether the txn has floating point update code
 (pr_hasNoFloatUpdate).  Would this be useful?

This probably would be useful in the ITM_beginTransaction / GTM_longjmp 
implementations which in
that case could try to avoid the FPR save/restores. But Richard mentioned that 
these bits so far are
never set by GCC since it is lacking the required infos from the register 
allocator.

 (BTW, sorry for the late response.  I don't read gcc-patches frequently,
 so please CC me on libitm-related things.)

No problem. I'll try to remember this and cc you next time.

Bye,

-Andreas-



Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Peter Bergner
On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote:
 Since libitm implements TX begins as function calls only call-saved registers 
 can be live across a
 tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and 
 get restored when doing
 the longjmp back into the user code. So this should be no problem.

Except that the htm_begin() routines are declared static inline functions,
so when they're inlined, you aren't protected by the semantics of a call
anymore, are you?

Peter





Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Andreas Krebbel
On 02/08/13 16:23, Peter Bergner wrote:
 On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote:
 Since libitm implements TX begins as function calls only call-saved 
 registers can be live across a
 tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and 
 get restored when doing
 the longjmp back into the user code. So this should be no problem.
 
 Except that the htm_begin() routines are declared static inline functions,
 so when they're inlined, you aren't protected by the semantics of a call
 anymore, are you?

They get inlined into libitm code but not into the user code. As I understand 
it in the user code
will always be a call to _ITM_beginTransaction.

-Andreas-

 
 Peter
 
 
 



Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Peter Bergner
On Fri, 2013-08-02 at 16:26 +0200, Andreas Krebbel wrote:
 On 02/08/13 16:23, Peter Bergner wrote:
  On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote:
  Since libitm implements TX begins as function calls only call-saved 
  registers can be live across a
  tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and 
  get restored when doing
  the longjmp back into the user code. So this should be no problem.
  
  Except that the htm_begin() routines are declared static inline functions,
  so when they're inlined, you aren't protected by the semantics of a call
  anymore, are you?
 
 They get inlined into libitm code but not into the user code. As I understand 
 it in the user code
 will always be a call to _ITM_beginTransaction.

Sure, that protects the user code, but what about the libitm code?
From your previous comment:

 As long as libitm does not use FPRs itself this should be safe without
 having tbegin clobbering FPRs.

Is it a given that s390 doesn't use FPRs without explicit use of
floating point types?  I ask, because on POWER, we can and do 
generate floating point code without explicit use of double,
float, etc.  Maybe s390 is safe in that respect.

Peter





Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Andreas Krebbel
On 02/08/13 16:36, Peter Bergner wrote:
 On Fri, 2013-08-02 at 16:26 +0200, Andreas Krebbel wrote:
 On 02/08/13 16:23, Peter Bergner wrote:
 As long as libitm does not use FPRs itself this should be safe without
 having tbegin clobbering FPRs.
 
 Is it a given that s390 doesn't use FPRs without explicit use of
 floating point types?  I ask, because on POWER, we can and do 
 generate floating point code without explicit use of double,
 float, etc.  Maybe s390 is safe in that respect.

S/390 used to be on the safe side regarding this but with mainline we started 
using FPRs as GPR save
slots. As Richard suggested we should build libitm with -msoft-float in order 
to prevent this within
libitm. Unfortunately I forgot about this in my S/390 libitm patch. So I'll 
commit the following:


Index: libitm/configure.tgt
===
*** libitm/configure.tgt.orig   2013-07-30 07:42:29.0 +
--- libitm/configure.tgt2013-08-02 14:43:18.641206151 +
*** case ${target_cpu} in
*** 109,115 
ARCH=x86
;;
s390|s390x)
!   XCFLAGS=${XCFLAGS} -mzarch -mhtm
ARCH=s390
;;

--- 109,115 
ARCH=x86
;;
s390|s390x)
!   XCFLAGS=${XCFLAGS} -mzarch -mhtm -msoft-float
ARCH=s390
;;



Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Torvald Riegel
On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote:
 On 02/08/13 13:31, Torvald Riegel wrote:
  On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote:
  Index: libitm/config/s390/target.h
  ===
  *** libitm/config/s390/target.h.orig
  --- libitm/config/s390/target.h
  ***
  
  [...]
  
  + static inline uint32_t
  + htm_begin ()
  + {
  +   return __builtin_tbegin_nofloat (NULL);
  + } 
  
  I'm not quite sure I understand why the nofloat variant is sufficient;
  is this because we don't use FPRs in the abort/restart code of libitm
  (and the FPRs are known to be clobbered after starting a txn)?
 
 Yes. To my understanding there are 2 potential issues when the FPRs are not 
 saved.
 
 1. The abort code will read corrupted FPR content for FPRs:
 - which are live across tbegin and
 - are updated in the TX body and
 - are read in the abort code
 The abort code will see the updated value from the TX body.
 
 Since libitm implements TX begins as function calls only call-saved registers 
 can be live across a
 tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and 
 get restored when doing
 the longjmp back into the user code. So this should be no problem.
 
 2. The other problem is that the ABI might get violated when aborting into a 
 callee which already
 returned and there are FPRs which have been modified afterwards. In that case 
 the compiler would not
 have generated FPR save/restore instructions in the function 
 prologue/epilogue of the callee. But in
 fact there are modified FPRs when exiting the second time.
 
 But again this should be no problem thanks to the setjmp/longjmp semantics of 
 _ITM_beginTransaction
 and GTM_longjmp. If the TX body modified a call-saved FPR it will get 
 restored on the abort path
 back from libitm to the user code.
 
 
 As long as libitm does not use FPRs itself this should be safe without having 
 tbegin clobbering FPRs.

And I suppose that even if it would use FPRs, it shouldn't ever access
memory locations touched in user code.

Thanks for the explanation.  Some of this was obvious, sorry, but
that's what happens when a mental context switch is incomplete :)  Next
time, I'd prefer a summary of such reasoning to be added as a comment in
the source; it's just too easy to forget about all the subtleties...

  There is a bit in the properties passed to _ITM_beginTransaction which
  states whether the txn has floating point update code
  (pr_hasNoFloatUpdate).  Would this be useful?
 
 This probably would be useful in the ITM_beginTransaction / GTM_longjmp 
 implementations which in
 that case could try to avoid the FPR save/restores. But Richard mentioned 
 that these bits so far are
 never set by GCC since it is lacking the required infos from the register 
 allocator.

Another thing that comes to mind is that it might be useful to put parts
of the HTM fast path into the asm pieces of _ITM_beginTransaction.  That
way, one can use exactly the amount of SW setjmp one needs given what's
easy to save/restore for the TM; this might decrease the startup costs
of a txn somewhat.  Andi Kleen posted a patch for how to do this for RTM
a while ago (but it had some issues):
http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01189.html

Perhaps something along these lines (while avoiding the issues :) )
would be useful for s/390 and Power too.


Torvald



Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Richard Henderson
On 08/02/2013 04:45 AM, Andreas Krebbel wrote:
 !   XCFLAGS=${XCFLAGS} -mzarch -mhtm -msoft-float

Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in
floating point registers; similarly for the write accessors.


r~


Re: [PATCH] S/390: Hardware transactional memory support

2013-07-16 Thread Peter Bergner
On Tue, 2013-07-16 at 08:08 -0400, David Edelsohn wrote:
 Then I agree that the hunk should be reverted so that HTM does not imply 
 POWER8.

Ok, I'll bootstrap and regtest just to be sure there is no unintended
fallout for such a small path before committing.  Thanks.

Peter