Re: [PATCH] Add faster HTM fastpath for libitm TSX v2
On Tue, 2013-01-29 at 20:19 +0100, Andi Kleen wrote: next time please reply to the points raised in a review if you disagree with them, and don't just ignore them. That speeds up the review. It was all in the previous email on the topic. This v2 patch did not incorporate all the changes that I requested, nor did you explicitly object to those you didn't incorporate. Maybe you don't care what's in libitm's comments, but I do. // See gtm_thread::begin_transaction. -uint32_t GTM::htm_fastpath = 0; +uint32_t GTM::htm_fastpath asm(__gtm_htm_fastpath) = 0; + +uint32_t *GTM::global_lock asm(__gtm_global_lock); Rearrange the serial lock's fields (see below). To my knowledge C++ classes have no guaranteed layout, so that's not safe because there is no guarantee where the vtable pointers are. it would be only with plain old structs. And gtm_rwlock doesn't have any virtual members, right? So it's like a plain old struct (ie, it's a POD type). + // pr_tryHTM can be set by an assembler fast path when it already tried + // a hardware transaction once. In this case we do one retry less. pr_tryHTM is already documented. And I asked you to simply reference this in a comment. What's the problem with that? I do not want people working on libitm to have to grep through the code for uses of pr_tryHTM and look for comments that might be related to it just because you don't want to put a simple reference into the comment at the declaration of pr_tryHTM. Can't you see that adding the reference is just plain useful for everybody else? Torvald
Re: [PATCH] Add faster HTM fastpath for libitm TSX v2
On Thu, 2013-01-24 at 19:30 -0800, Andi Kleen wrote: From: Andi Kleen a...@linux.intel.com The libitm TSX hardware transaction fast path currently does quite a bit of unnecessary work (saving registers etc.) before even trying to start a hardware transaction. This patch moves the initial attempt at a transaction early into the assembler stub. Complicated work like retries is still done in C. So this is essentially a fast path for the fast path. The assembler code doesn't understand the layout of serial_lock, but it needs to check that serial_lock is free. We export just the lock variable as a separate pointer for this. The assembler code controls the HTM fast path with a new pr_tryHTM flag. I had to reorganize the retry loop slightly so that the waiting for the lock to be free again is done first (because the first transaction is already done elsewhere). This makes the patch look larger than it really is. This just moves one basic block around. Passes bootstrap/testsuite on x86_64 v2: Use asm() to supply assembler name of namespaced variables Make comments more verbose. Andi, next time please reply to the points raised in a review if you disagree with them, and don't just ignore them. That speeds up the review. libitm/: 2013-01-24 Andi Kleen a...@linux.intel.com * beginend.cc (GTM::htm_fastpath): Add asm alias to __gtm_htm_fastpath (GTM::global_lock): Add. (begin_transaction): Check pr_tryHTM. Remove one iteration from HTM retry loop. Move lock free check to the beginning of loop body. * config/linux/rwlock.h (gtm_rwlock::get_lock_var): Add. * config/posix/rwlock.h (tm_rwlock::get_lock_var): Add. * config/x86/sjlj.S: Include config.h. (pr_tryHTM, pr_uninstrumentedCode, pr_hasNoAbort, a_runInstrumentedCode, a_runUninstrumentedCode, _XABORT_EXPLICIT, _XABORT_RETRY): Add defines. (_ITM_beginTransaction): Add HTM fast path. * libitm.h (pr_tryHTM): Add. * libitm_i.h (GTM::htm_fastpath): Add asm alias. (GTM::gtm_global_lock): Add. * method-serial.cc (method_group::init, fini): Initialize GTM::global_lock and GTM::htm_fastpath. --- libitm/beginend.cc | 46 +++-- libitm/config/linux/rwlock.h |5 + libitm/config/posix/rwlock.h |5 + libitm/config/x86/sjlj.S | 47 ++ libitm/libitm.h |7 +-- libitm/libitm_i.h|3 ++- libitm/method-serial.cc |2 ++ 7 files changed, 92 insertions(+), 23 deletions(-) diff --git a/libitm/beginend.cc b/libitm/beginend.cc index 4369946..118920b 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -55,7 +55,9 @@ static pthread_key_t thr_release_key; static pthread_once_t thr_release_once = PTHREAD_ONCE_INIT; // See gtm_thread::begin_transaction. -uint32_t GTM::htm_fastpath = 0; +uint32_t GTM::htm_fastpath asm(__gtm_htm_fastpath) = 0; + +uint32_t *GTM::global_lock asm(__gtm_global_lock); Rearrange the serial lock's fields (see below). /* Allocate a transaction structure. */ void * @@ -187,27 +189,13 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) // indeed in serial mode, and HW transactions should never need serial mode // for any internal changes (e.g., they never abort visibly to the STM code // and thus do not trigger the standard retry handling). Add a comment as second paragraph in the previous block of comments, briefly explaining when pr_tryHTM is set (ie, under which condition), and that this happens in the assembler implementation of _ITM_beginTransaction. - if (likely(htm_fastpath (prop pr_hasNoAbort))) + // pr_tryHTM can be set by an assembler fast path when it already tried + // a hardware transaction once. In this case we do one retry less. + if (likely(prop pr_tryHTM)) { - for (uint32_t t = htm_fastpath; t; t--) + // One transaction has been already tried by the assembler fast path. + for (uint32_t t = htm_fastpath - 1; t; t--) { - uint32_t ret = htm_begin(); - if (htm_begin_success(ret)) - { - // We are executing a transaction now. - // Monitor the writer flag in the serial-mode lock, and abort - // if there is an active or waiting serial-mode transaction. - if (unlikely(serial_lock.is_write_locked())) - htm_abort(); - else - // We do not need to set a_saveLiveVariables because of HTM. - return (prop pr_uninstrumentedCode) ? - a_runUninstrumentedCode : a_runInstrumentedCode; - } - // The transaction has aborted. Don't retry if it's unlikely that - // retrying the transaction will be successful. - if (!htm_abort_should_retry(ret)) - break; //
Re: [PATCH] Add faster HTM fastpath for libitm TSX v2
next time please reply to the points raised in a review if you disagree with them, and don't just ignore them. That speeds up the review. It was all in the previous email on the topic. // See gtm_thread::begin_transaction. -uint32_t GTM::htm_fastpath = 0; +uint32_t GTM::htm_fastpath asm(__gtm_htm_fastpath) = 0; + +uint32_t *GTM::global_lock asm(__gtm_global_lock); Rearrange the serial lock's fields (see below). To my knowledge C++ classes have no guaranteed layout, so that's not safe because there is no guarantee where the vtable pointers are. it would be only with plain old structs. + // pr_tryHTM can be set by an assembler fast path when it already tried + // a hardware transaction once. In this case we do one retry less. pr_tryHTM is already documented. -Andi
Re: [PATCH] Add faster HTM fastpath for libitm TSX v2
Hello! The libitm TSX hardware transaction fast path currently does quite a bit of unnecessary work (saving registers etc.) before even trying to start a hardware transaction. This patch moves the initial attempt at a transaction early into the assembler stub. Complicated work like retries is still done in C. So this is essentially a fast path for the fast path. The assembler code doesn't understand the layout of serial_lock, but it needs to check that serial_lock is free. We export just the lock variable as a separate pointer for this. Probably the attached (RFC) patch can be useful in this case. The patch allows to specify the label for xbegin, so it is possible to implement code like following (non-sensical) example: --cut here-- extern int lock1; extern int lock2; int test () { register unsigned int res asm (eax); __builtin_ia32_xbegin_label (__txn_abort); lock1 = 0; __txn_abort: if (res 0x10) lock2 = 1; return 0; } --cut here-- gcc -O2 -mrtm: test: xbegin .L2 movl$0, lock1(%rip) .L2: testb $16, %al je .L3 movl$1, lock2(%rip) .L3: xorl%eax, %eax ret Please note that the edge from __builtin_ia32_xbegin_label is not known to tree optimizers, so someone familiar with this part of the compiler should enhance/fix the patch. With the (fixed) patch, you can implement your assembly using plain C++, probably also using ifunc relocations. Uros. Index: i386.md === --- i386.md (revision 195460) +++ i386.md (working copy) @@ -18093,6 +18093,17 @@ DONE; }) +(define_expand xbegin_label + [(parallel +[(set (pc) + (if_then_else (ne (unspec [(const_int 0)] UNSPEC_XBEGIN_ABORT) + (const_int 0)) + (match_operand 1) + (pc))) + (set (match_operand:SI 0 register_operand) + (unspec_volatile:SI [(match_dup 0)] UNSPECV_XBEGIN))])] + TARGET_RTM) + (define_insn xbegin_1 [(set (pc) (if_then_else (ne (unspec [(const_int 0)] UNSPEC_XBEGIN_ABORT) Index: i386.c === --- i386.c (revision 195460) +++ i386.c (working copy) @@ -26570,6 +26570,7 @@ enum ix86_builtins /* RTM */ IX86_BUILTIN_XBEGIN, + IX86_BUILTIN_XBEGIN_LABEL, IX86_BUILTIN_XEND, IX86_BUILTIN_XABORT, IX86_BUILTIN_XTEST, @@ -26942,6 +26943,7 @@ static const struct builtin_description bdesc_spec /* RTM */ { OPTION_MASK_ISA_RTM, CODE_FOR_xbegin, __builtin_ia32_xbegin, IX86_BUILTIN_XBEGIN, UNKNOWN, (int) UNSIGNED_FTYPE_VOID }, + { OPTION_MASK_ISA_RTM, CODE_FOR_xbegin_label, __builtin_ia32_xbegin_label, IX86_BUILTIN_XBEGIN_LABEL, UNKNOWN, (int) VOID_FTYPE_PCVOID }, { OPTION_MASK_ISA_RTM, CODE_FOR_xend, __builtin_ia32_xend, IX86_BUILTIN_XEND, UNKNOWN, (int) VOID_FTYPE_VOID }, { OPTION_MASK_ISA_RTM, CODE_FOR_xtest, __builtin_ia32_xtest, IX86_BUILTIN_XTEST, UNKNOWN, (int) INT_FTYPE_VOID }, }; @@ -32239,6 +32241,17 @@ addcarryx: return target; +case IX86_BUILTIN_XBEGIN_LABEL: + arg0 = CALL_EXPR_ARG (exp, 0); + op0 = expand_normal (arg0); + if (GET_CODE (op0) != LABEL_REF) + { + error (the xbegin's argument must be a label); + return const0_rtx; + } + emit_jump_insn (gen_xbegin_label (gen_rtx_REG (SImode, AX_REG), op0)); + return 0; + case IX86_BUILTIN_XABORT: icode = CODE_FOR_xabort; arg0 = CALL_EXPR_ARG (exp, 0);
Re: [PATCH] Add faster HTM fastpath for libitm TSX v2
Probably the attached (RFC) patch can be useful in this case. The patch allows to specify the label for xbegin, so it is possible to implement code like following (non-sensical) example: It can be actually implemented using asm goto. I have some macros for this. And the tree optimizers should even support it. #define XBEGIN(label) \ asm volatile goto(.byte 0xc7,0xf8 ; .long %l0-1f\n1: ::: eax : label) #define XEND()asm volatile(.byte 0x0f,0x01,0xd5) #define XFAIL(label) label: asm volatile( ::: eax) #define XFAIL_STATUS(label, status) label: asm volatile( : =a (status)) But the assembler code is still needed for this because the non TSX path needs to save all registers (it's like a setjmp/longjmp), and that cannot be implemented in C. The goal of the assembler code was not to use the label, but to move the initial transaction before the setjmp code. -Andi
[PATCH] Add faster HTM fastpath for libitm TSX v2
From: Andi Kleen a...@linux.intel.com The libitm TSX hardware transaction fast path currently does quite a bit of unnecessary work (saving registers etc.) before even trying to start a hardware transaction. This patch moves the initial attempt at a transaction early into the assembler stub. Complicated work like retries is still done in C. So this is essentially a fast path for the fast path. The assembler code doesn't understand the layout of serial_lock, but it needs to check that serial_lock is free. We export just the lock variable as a separate pointer for this. The assembler code controls the HTM fast path with a new pr_tryHTM flag. I had to reorganize the retry loop slightly so that the waiting for the lock to be free again is done first (because the first transaction is already done elsewhere). This makes the patch look larger than it really is. This just moves one basic block around. Passes bootstrap/testsuite on x86_64 v2: Use asm() to supply assembler name of namespaced variables Make comments more verbose. libitm/: 2013-01-24 Andi Kleen a...@linux.intel.com * beginend.cc (GTM::htm_fastpath): Add asm alias to __gtm_htm_fastpath (GTM::global_lock): Add. (begin_transaction): Check pr_tryHTM. Remove one iteration from HTM retry loop. Move lock free check to the beginning of loop body. * config/linux/rwlock.h (gtm_rwlock::get_lock_var): Add. * config/posix/rwlock.h (tm_rwlock::get_lock_var): Add. * config/x86/sjlj.S: Include config.h. (pr_tryHTM, pr_uninstrumentedCode, pr_hasNoAbort, a_runInstrumentedCode, a_runUninstrumentedCode, _XABORT_EXPLICIT, _XABORT_RETRY): Add defines. (_ITM_beginTransaction): Add HTM fast path. * libitm.h (pr_tryHTM): Add. * libitm_i.h (GTM::htm_fastpath): Add asm alias. (GTM::gtm_global_lock): Add. * method-serial.cc (method_group::init, fini): Initialize GTM::global_lock and GTM::htm_fastpath. --- libitm/beginend.cc | 46 +++-- libitm/config/linux/rwlock.h |5 + libitm/config/posix/rwlock.h |5 + libitm/config/x86/sjlj.S | 47 ++ libitm/libitm.h |7 +-- libitm/libitm_i.h|3 ++- libitm/method-serial.cc |2 ++ 7 files changed, 92 insertions(+), 23 deletions(-) diff --git a/libitm/beginend.cc b/libitm/beginend.cc index 4369946..118920b 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -55,7 +55,9 @@ static pthread_key_t thr_release_key; static pthread_once_t thr_release_once = PTHREAD_ONCE_INIT; // See gtm_thread::begin_transaction. -uint32_t GTM::htm_fastpath = 0; +uint32_t GTM::htm_fastpath asm(__gtm_htm_fastpath) = 0; + +uint32_t *GTM::global_lock asm(__gtm_global_lock); /* Allocate a transaction structure. */ void * @@ -187,27 +189,13 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) // indeed in serial mode, and HW transactions should never need serial mode // for any internal changes (e.g., they never abort visibly to the STM code // and thus do not trigger the standard retry handling). - if (likely(htm_fastpath (prop pr_hasNoAbort))) + // pr_tryHTM can be set by an assembler fast path when it already tried + // a hardware transaction once. In this case we do one retry less. + if (likely(prop pr_tryHTM)) { - for (uint32_t t = htm_fastpath; t; t--) + // One transaction has been already tried by the assembler fast path. + for (uint32_t t = htm_fastpath - 1; t; t--) { - uint32_t ret = htm_begin(); - if (htm_begin_success(ret)) - { - // We are executing a transaction now. - // Monitor the writer flag in the serial-mode lock, and abort - // if there is an active or waiting serial-mode transaction. - if (unlikely(serial_lock.is_write_locked())) - htm_abort(); - else - // We do not need to set a_saveLiveVariables because of HTM. - return (prop pr_uninstrumentedCode) ? - a_runUninstrumentedCode : a_runInstrumentedCode; - } - // The transaction has aborted. Don't retry if it's unlikely that - // retrying the transaction will be successful. - if (!htm_abort_should_retry(ret)) - break; // Wait until any concurrent serial-mode transactions have finished. // This is an empty critical section, but won't be elided. if (serial_lock.is_write_locked()) @@ -225,6 +213,24 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) // we have retried so often that we should go serial to avoid // starvation. } + + uint32_t ret = htm_begin(); + if (htm_begin_success(ret)) + { +