Re: [PATCH] libitm/x86: Correct offsets of __private_tm and pointer_guard
On Wed, 2018-05-30 at 07:54 -0700, H.J. Lu wrote:
> In glibc, sysdeps/i386/nptl/tls.h has
>
> typedef struct
> {
>   void *tcb;            /* Pointer to the TCB.  Not necessarily the
>                            thread descriptor used by libpthread.  */
>   dtv_t *dtv;
>   void *self;           /* Pointer to the thread descriptor.  */
>   int multiple_threads;
>   uintptr_t sysinfo;
>   uintptr_t stack_guard;
>   uintptr_t pointer_guard;
>   int gscope_flag;
>   int __glibc_reserved1;
>   /* Reservation of some values for the TM ABI.  */
>   void *__private_tm[4];
>   /* GCC split stack support.  */
>   void *__private_ss;
> } tcbhead_t;
>
> and sysdeps/x86_64/nptl/tls.h has
>
> typedef struct
> {
>   void *tcb;            /* Pointer to the TCB.  Not necessarily the
>                            thread descriptor used by libpthread.  */
>   dtv_t *dtv;
>   void *self;           /* Pointer to the thread descriptor.  */
>   int multiple_threads;
>   int gscope_flag;
>   uintptr_t sysinfo;
>   uintptr_t stack_guard;
>   uintptr_t pointer_guard;
>   unsigned long int vgetcpu_cache[2];
>   int __glibc_reserved1;
>   int __glibc_unused1;
>   /* Reservation of some values for the TM ABI.  */
>   void *__private_tm[4];
>   /* GCC split stack support.  */
>   void *__private_ss;
>   long int __glibc_reserved2;
>   /* Must be kept even if it is no longer used by glibc since programs,
>      like AddressSanitizer, depend on the size of tcbhead_t.  */
>   __128bits __glibc_unused2[8][4] __attribute__ ((aligned (32)));
>
>   void *__padding[8];
> } tcbhead_t;
>
> The offsets of __private_tm are
>
> i386:   36 bytes
> x32:    48 bytes
> x86_64: 80 bytes
>
> and the offsets of pointer_guard are:
>
> i386:   24 bytes
> x32:    28 bytes
> x86_64: 48 bytes
>
> Update SEG_READ and SEG_WRITE to use the offset of __private_tm as base
> and correct the offset of pointer_guard for x32.
>
> Tested on i686, x86-64 and x32.  OK for trunk and release branches?

The patch itself looks okay to me, but the commit message is pretty
cryptic.  I can't ack for release branches.
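[The x86_64 offsets quoted above follow directly from the member layout; here is a minimal standalone cross-check on an LP64 host.  The struct mirrors the quoted tls.h layout, with `dtv_t *` replaced by `void *` (same size) and the members after `__private_ss` elided, since they do not affect these offsets.]

```cpp
#include <cstddef>
#include <cstdint>

// Mirror of the quoted x86_64 tcbhead_t layout, up to __private_ss.
// Assumes an LP64 ABI (8-byte pointers and longs), as on x86-64.
struct tcbhead_x86_64 {
  void *tcb;
  void *dtv;                        // dtv_t * in glibc; a pointer either way
  void *self;
  int multiple_threads;
  int gscope_flag;
  uintptr_t sysinfo;
  uintptr_t stack_guard;
  uintptr_t pointer_guard;
  unsigned long int vgetcpu_cache[2];
  int __glibc_reserved1;
  int __glibc_unused1;
  void *__private_tm[4];
  void *__private_ss;
};

// 3 pointers + 2 ints + 3 uintptr_t = 48 bytes before pointer_guard;
// vgetcpu_cache (16) + 2 ints (8) then put __private_tm at 80.
constexpr std::size_t pointer_guard_offset =
  offsetof(tcbhead_x86_64, pointer_guard);
constexpr std::size_t private_tm_offset =
  offsetof(tcbhead_x86_64, __private_tm);
```

The i386 and x32 numbers fall out the same way: with 4-byte pointers and the i386 member order, nine 4-byte members precede `__private_tm` (36 bytes) and six precede `pointer_guard` (24 bytes).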
Re: [PATCH 0/5] Make std::future::wait_* use std::chrono::steady_clock when required
On Sun, 2018-01-07 at 20:55 +0000, Mike Crowe wrote:
> This patch series was originally submitted back in September at
> https://gcc.gnu.org/ml/libstdc++/2017-09/msg00083.html which ended up
> as https://patchwork.ozlabs.org/cover/817379/ . The patches received
> no comments at all, which may mean that they are perfect or that they
> are so bad that no-one knew where to start with criticising them. It
> would be good to know which. :)
>
> The patches are unchanged from last time, but I have corrected a typo
> in one of the commit messages. The original cover message follows.
>
> This is a first attempt to make std::future::wait_until and
> std::future::wait_for make correct use of
> std::chrono::steady_clock/CLOCK_MONOTONIC. It also makes
> std::future::wait_until react to changes to CLOCK_REALTIME during the
> wait, but only when passed a std::chrono::system_clock time point.

I have comments on the design.

First, I don't think we should change
__atomic_futex_unsigned_base::_M_futex_wait_until, as there's a risk that
we'll change the behavior of existing applications that work as expected.
Instead, ISTM we should additionally expose the two options we have at the
level of futexes:
* Relative timeout using CLOCK_MONOTONIC
* Absolute timeout using CLOCK_REALTIME (which will fall back to the former
  on old kernels, which is fine I think).

Then we do the following translations from functions that programs would
call to the new futex functions:
1) wait_for is a loop in which we load the current time from the steady
clock, then call the relative futex wait, and if that returns for a
spurious reason (ie, neither timeout nor is the expected value present), we
reduce the prior relative amount by the difference between the time before
the futex wait and the current time.
2) wait_until using the steady clock is a loop similar to wait_for, just
that we additionally compute the initial relative timeout.
3) wait_until using the system clock is a loop that uses absolute-timeout
futex wait.
4) For wait_until using an unknown clock, I'd say that synching to the
system clock is the right approach.  Using wait_until indicates that the
programmer wanted to have a point in time, not a duration.

Does this work for you?  If so, could you provide a revised patch that uses
this approach and includes this approach in the documentation?  (Sorry for
the lack of comments in the current code.)
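[The wait_for loop in (1) can be sketched as follows.  `futex_wait_relative` and `wait_for_sketch` are hypothetical names, not libstdc++ internals; the stub returns immediately, which models exactly the spurious-return case the loop has to tolerate.]

```cpp
#include <atomic>
#include <chrono>

// Stand-in for a FUTEX_WAIT-style relative-timeout wait.  A real
// implementation would block; returning false here models a spurious
// return or timeout, which the caller must handle.
inline bool futex_wait_relative(const std::atomic<unsigned>*, unsigned,
                                std::chrono::steady_clock::duration)
{ return false; }

// wait_for as a loop: measure the steady clock around each wait and
// shrink the remaining relative timeout by the time already elapsed,
// so spurious wakeups do not extend the total wait.
template <typename Rep, typename Period>
bool wait_for_sketch(const std::atomic<unsigned>& val, unsigned old,
                     std::chrono::duration<Rep, Period> rel)
{
  auto remaining =
    std::chrono::duration_cast<std::chrono::steady_clock::duration>(rel);
  while (remaining > std::chrono::steady_clock::duration::zero())
    {
      if (val.load(std::memory_order_acquire) != old)
        return true;                       // expected value is gone
      auto start = std::chrono::steady_clock::now();
      futex_wait_relative(&val, old, remaining);
      remaining -= std::chrono::steady_clock::now() - start;
    }
  return val.load(std::memory_order_acquire) != old;
}
```

wait_until on the steady clock is the same loop with `remaining` initialized to `deadline - steady_clock::now()`.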
Re: [PATCH 2/5] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait
On Tue, 2018-01-09 at 17:54 +0000, Mike Crowe wrote:
> On Tuesday 09 January 2018 at 13:50:54 +0000, Jonathan Wakely wrote:
> > On 07/01/18 20:55 +0000, Mike Crowe wrote:
> > > The futex system call supports waiting for an absolute time if
> > > FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT. Doing so provides two
> > > benefits:
> > >
> > > 1. The call to gettimeofday is not required in order to calculate a
> > > relative timeout.
> > >
> > > 2. If someone changes the system clock during the wait then the futex
> > > timeout will correctly expire earlier or later. Currently that only
> > > happens if the clock is changed prior to the call to gettimeofday.
> > >
> > > According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
> > > v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25. There is
> > > no attempt to detect the kernel version and fall back to the previous
> > > method.
> >
> > I don't think we can require a specific kernel version just for this.
>
> What is the minimum kernel version that libstdc++ requires? Since it's
> already relying on NPTL it can't go back earlier than v2.6, but I suppose
> that's a while before v2.6.28.

I'm not aware of any choice regarding this, but Jonathan will know for
sure.  Generally, I think choosing a minimum kernel version might be
helpful, in particular if we want to optimize more often specifically for
Linux environments; this may become more worthwhile in the future, for
example when we look at new C++ features such as parallel algorithms, or
upcoming executors.  The gthreads abstraction is a nice goal, but we can
benefit a lot from knowing what the underlying platform really is.

Another option might be to require a minimum glibc version on Linux, and
build libstdc++ for that.  That would yield a minimum kernel version as
well, and we may be able to make use of other things in return, such as
syscall wrappers.

> > What happens if you use those bits on an older kernel, will there be
> > an ENOSYS error?
> > Because that would allow us to try the new form, and
> > fallback to the old.
>
> Glibc's nptl-init.c calls
>
> futex(.., FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME | FUTEX_PRIVATE_FLAG, ..)
>
> and sets __have_futex_clock_realtime based on whether it gets ENOSYS back
> or not, so it looks like it is possible to determine whether support is
> available.
>
> The only such check I can find in libstdc++ is in filesystem/std-ops.cc
> fs::do_copy_file, which can try sendfile(2) on each invocation but then
> automatically falls back to copying by stream. In that case, the overhead
> of the potentially-failing sendfile system call is small compared to
> copying the file.
>
> Doing such a check in _M_futex_wait_until means one system call if
> FUTEX_CLOCK_REALTIME is supported, but three system calls if it is not
> supported. If we found a way to cache the answer in a thread-safe and
> cheap manner then this could be avoided.
>
> Do you think it's worth trying to cache whether FUTEX_CLOCK_REALTIME is
> available?

It may be worth caching that, given how simple this can be: Just add a
global atomic variable whose initial value means trying the most recent
syscall op, and set it to some other value that indicates an older kernel.
Then check the value before attempting the syscall.  These can all be
relaxed atomic accesses because it's essentially just a best-effort
optimization.

Performance-wise, the trade-off is between an additional atomic load for
new kernels vs. one additional syscall for older kernels.  Given that we
need the fallback for old kernels anyway unless we select a minimum
version, I guess doing the caching makes sense.  The syscalls are on the
slow paths anyway.
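[The caching scheme described above fits in a few lines.  All names here are illustrative, not the actual libstdc++ code; `try_op` stands in for issuing the FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME call and returns false iff it failed with ENOSYS.]

```cpp
#include <atomic>

// Cached probe result: 0 = not probed yet (try the new op), 2 = ENOSYS
// seen (old kernel; use the fallback directly from now on).
static std::atomic<int> futex_clock_realtime_state{0};

// Relaxed accesses suffice: the flag is a best-effort hint, and the worst
// a stale read can cost is one extra (failing) syscall attempt.
template <typename TryOp, typename Fallback>
bool wait_with_cached_probe(TryOp try_op, Fallback fallback)
{
  if (futex_clock_realtime_state.load(std::memory_order_relaxed) != 2)
    {
      if (try_op())
        return true;
      // ENOSYS: remember it so we never pay the extra syscall again.
      futex_clock_realtime_state.store(2, std::memory_order_relaxed);
    }
  return fallback();
}
```

This is the one-atomic-load-vs-one-extra-syscall trade-off from the reply: new kernels pay a relaxed load per wait; old kernels pay the failing syscall exactly once per process.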
Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.
On Fri, 2017-02-03 at 13:44 +0000, Ramana Radhakrishnan wrote:
> __atomic_load on ARM appears to be ok as well
>
> except for
>
> __atomic_load_di which should really be the ldrexd / strexd loop but we
> could ameliorate that similar to your option 3b.

This uses just ldrexd now, and thus is not guaranteed to be atomic?

> On AArch64
>
> * <16 byte loads have always been fine. The architecture allows single
> copy atomic loads using single load instructions for all other sizes and
> memory models, so we are fine there.
>
> * we have gone through the libatomic locks from day one of the port for
> 16 byte loads. This has been a bit of a bugbear for a number of users
> within ARM who would really like to get performance without heavy weight
> locks for 16 byte atomic ops.

Would it be acceptable for those users to have loads that perform like CAS
loops, especially under contention?  Or are these users more concerned
about aarch64 not offering a true atomic 16-byte load?

> We could never switch this around on AArch64 to using the loop (or CAS)
> by default as this would be the reverse issue (i.e. old compilers
> calling libatomic locks and new code using the inlined instruction
> sequence).

I think there's an implicit assumption that whenever one uses code
generated with a particular compiler, the libatomic version being used at
runtime is at least as recent as that particular compiler.  In a mix of
old and new code, this would mean that the new libatomic has to be used.
However, AFAIK we haven't specified that explicitly yet.  (Richard
Henderson said we'd make similar implicit assumptions for libgcc, if I
understood him correctly.)  If that assumption holds, then switching from
locks in libatomic to inlined instructions is possible, provided that
libatomic switches too.
> My interest in this patch was piqued because if we were to switch
> aarch64 to not use the locks in libatomic and go to a CAS or the loop
> sequence I showed in my reply to Jakub, I don't believe we have an ABI
> issue as I would expect there to be a single copy of libatomic on the
> system and everyone would simply be using that.

Right, and it would have to be at least as recent as the most recent
compiler being used (eg, the system compiler on that system).

> However we would end up with the situation of generating stores for
> __atomic_loads as you described above.
>
> We could still in theory do that and gain the same bug for rodata and
> volatile atomic pointers on AArch64 - I don't see a way of avoiding that
> on aarch32 as we have a similar ABI issue to you.

Agreed.

> > The patch introduces a can_atomic_load_p, which checks that an entry
> > exists in optabs and the machine descriptions for an atomic load of the
> > particular mode.
> >
> > IIRC, arm and aarch64 had atomic load patterns defined whenever they had
> > a CAS defined.  It would be nice if you could double-check that.  If
> > that's the case, nothing should change with my patch because
> > can_atomic_load_p would always return true if a CAS could be issued.
> > If that's not the case, arm/aarch64 could be in the same boat as x86_64
> > with cmpxchg16b; then, using Option 3b might be a useful (intermediary)
> > solution for ARM as well (OTOH, compatibility with existing code and
> > __sync builtins might perhaps not be as much of a factor as it might be
> > on x86_64).
>
> On AArch64 IIRC only those instructions that are single copy atomic as
> per the architecture are allowed to be atomic loads. Thus we do not
> generate such loops anywhere.

I think the condition we'd have to check is rather that whenever there is
an atomic CAS available, there is also an atomic load available.
That is not quite the same as having only truly atomic loads available, because then a CAS could still be available regardless of the availability of a matching load.
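[For reference, the problematic emulation under discussion looks like this in outline (a sketch, not GCC's actual generated code).  The point is that a compare-exchange used as a load can issue a store:]

```cpp
#include <atomic>
#include <cstdint>

// A "load" emulated with compare-exchange, as pre-patch code generation
// effectively did for sizes without a true atomic load.  CAS(p, v, v)
// observes the current value, but when the comparison succeeds it
// *stores* v back.  That store is what faults on read-only-mapped
// memory, is wrong for volatile atomics, and makes concurrent readers
// contend with each other like writers.
uint64_t load_via_cas(std::atomic<uint64_t>* p)
{
  uint64_t expected = 0;
  // On mismatch, `expected` is updated to the observed value; on match,
  // 0 is stored back even though this is "just a load".
  p->compare_exchange_strong(expected, expected);
  return expected;
}
```

This is why can_atomic_load_p matters: a target with only a CAS can return the right value here, but it cannot do so without the side effects above.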
Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.
On Thu, 2017-02-02 at 13:58 +0100, Thomas Schwinge wrote:
> > The other failures I saw didn't seem atomics related
> > (eg, openacc)
>
> I suppose you're testing without nvptx offloading -- which failures do
> you see for OpenACC testing?  (There shouldn't be any for host fallback
> testing.)

Sorry, false alarm.  Those tests print out "FAILED", but they don't
actually fail.

> > I haven't compared it against a full bootstrap build and
> > make check of trunk.).
>
> I just happened to do that for x86_64-pc-linux-gnu, and I'm seeing the
> following additional changes; posting this just in case that's not
> expected to happen:
>
> -PASS: gcc.dg/atomic-compare-exchange-5.c (test for excess errors)
> -PASS: gcc.dg/atomic-compare-exchange-5.c execution test
> +UNSUPPORTED: gcc.dg/atomic-compare-exchange-5.c

Those changes are expected because we no longer assume that cmpxchg16b
availability also means that 16-byte atomics will be inlined.
Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.
On Thu, 2017-02-02 at 14:48 +0000, Ramana Radhakrishnan wrote:
> On 30/01/17 18:54, Torvald Riegel wrote:
> > This patch fixes the __atomic builtins to not implement supposedly
> > lock-free atomic loads based on just a compare-and-swap operation.
> >
> > If there is no hardware-backed atomic load for a certain memory
> > location, the current implementation can implement the load with a CAS
> > while claiming that the access is lock-free.  This is a bug in the cases
> > of volatile atomic loads and atomic loads to read-only-mapped memory; it
> > also creates a lot of contention in case of concurrent atomic loads,
> > which results in at least counter-intuitive performance because most
> > users probably understand "lock-free" to mean hardware-backed (and thus
> > "fast") instead of just in the progress-criteria sense.
> >
> > This patch implements option 3b of the choices described here:
> > https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html
>
> Will Deacon pointed me at this thread asking if something similar could
> be done on ARM.

It would be nice if someone more familiar with ARM could double-check that
ARM is not affected.  I guess ARM isn't, but that's based on me looking at
machine descriptions, which I hadn't ever done before working on this
patch...

The patch introduces a can_atomic_load_p, which checks that an entry
exists in optabs and the machine descriptions for an atomic load of the
particular mode.

IIRC, arm and aarch64 had atomic load patterns defined whenever they had a
CAS defined.  It would be nice if you could double-check that.  If that's
the case, nothing should change with my patch because can_atomic_load_p
would always return true if a CAS could be issued.
If that's not the case, arm/aarch64 could be in the same boat as x86_64
with cmpxchg16b; then, using Option 3b might be a useful (intermediary)
solution for ARM as well (OTOH, compatibility with existing code and
__sync builtins might perhaps not be as much of a factor as it might be on
x86_64).
The atomic load patterns you have could still be wrong, for example by
generating an LDXP/STXP loop or something else that can store on an atomic
load.  In that case, the problem is similar to not having a custom load
pattern, and the second case in the previous paragraph applies.

> On armv8-a we can implement an atomic load of 16 bytes using an LDXP /
> STXP loop as a 16 byte load isn't single copy atomic. On armv8.1-a we do
> have a CAS on 16 bytes.
>
> This was discussed last year here.
>
> https://gcc.gnu.org/ml/gcc/2016-06/msg00017.html
>
> and the consensus seemed to be that we couldn't really do a CAS on
> readonly data as we would have no way of avoiding a trap.

Yes, and I agree that not doing a CAS (or anything that can store on an
atomic load) is the right thing to do.

BTW, on the ARM targets, is it possible that the __sync builtins would use
LDXP/STXP on 16 bytes and the __atomic loads would fall back to using
libatomic and locks?  Or do you simply not provide 16-byte __sync
builtins?

> I'm sure I'm missing some piece of the puzzle, so I'd be interested
> in how you avoid this in this implementation on x86_64 with what I can
> see is a CAS.

If we'd start from scratch, we wouldn't use cmpxchg16b if we don't have a
16-byte atomic load.  However, we did do that, and there might be code out
there that does have inlined cmpxchg16b.  As discussed in the thread you
cited, changing GCC 7 back to libatomic *and* the use of locks would be an
effective ABI break.

Therefore, my patch (and Option 3b) makes a compromise and delegates to
libatomic, but libatomic gets custom code to actually keep using
cmpxchg16b.  That means that the actual synchronization doesn't change,
but we constrain the problem to libatomic and prevent application code
generated with GCC 7 from being directly affected.

Does this clarify the x86_64 situation?
Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.
On Mon, 2017-01-30 at 19:54 +0100, Torvald Riegel wrote:
> This patch fixes the __atomic builtins to not implement supposedly
> lock-free atomic loads based on just a compare-and-swap operation.

After an off-list OK by Jakub, I have committed this as r245098.  Jakub
will take care of the OpenMP side in a follow-up patch.
[PATCH] Fix __atomic to not implement atomic loads with CAS.
This patch fixes the __atomic builtins to not implement supposedly
lock-free atomic loads based on just a compare-and-swap operation.

If there is no hardware-backed atomic load for a certain memory location,
the current implementation can implement the load with a CAS while
claiming that the access is lock-free.  This is a bug in the cases of
volatile atomic loads and atomic loads to read-only-mapped memory; it also
creates a lot of contention in case of concurrent atomic loads, which
results in at least counter-intuitive performance because most users
probably understand "lock-free" to mean hardware-backed (and thus "fast")
instead of just in the progress-criteria sense.

This patch implements option 3b of the choices described here:
https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html

In a nutshell, this does change affected accesses to call libatomic
instead of inlining the CAS-based load emulation -- but libatomic will
continue to do the CAS-based approach.  Thus, there's no change in how the
accesses are actually performed (including compatibility with the __sync
builtins, which are not changed) -- but we do make it much easier to fix
this in the future, and we do ensure that less future code will have the
problematic code inlined into it (and thus unfixable).

Second, the return values of __atomic_always_lock_free and
__atomic_is_lock_free are changed to report false for the affected
accesses.  This aims at making users aware of the fact that they don't get
fast hardware-backed performance for the affected accesses.

This patch does not yet change how OpenMP atomics support is implemented.
Jakub said he would take care of that.
I suppose one approach would be to check can_atomic_load_p (introduced by
this patch) to decide in expand_omp_atomic whether to just use the
mutex-based approach; I think that conservatively, OpenMP atomics have to
assume that there could be an atomic load to a particular memory location
elsewhere in the program, so the choice between mutex-backed or not has to
be consistent for a particular memory location.

Thanks to Richard Henderson for providing the libatomic part of this
patch, and thanks to Andrew MacLeod for helping with the compiler parts.

I've tested this on an x86_64-linux bootstrap build and see no
regressions.  (With the exception of two OpenMP failures, which Jakub will
take care of.  The other failures I saw didn't seem atomics related (eg,
openacc); I haven't compared it against a full bootstrap build and make
check of trunk.)

AFAICT, in practice only x86 with -mcx16 (or where this is implicitly
implied) is affected.  The other architectures I looked at seemed to have
patterns for true hardware-backed atomic loads whenever they also had a
wider-than-word-sized CAS available.

I know we missed the stage3 deadline with this, unfortunately.  I think it
would be good to have this fix be part of GCC 7 though, because this would
ensure that less future code has the problematic load-via-CAS code
inlined.

Jakub: If this is OK for GCC 7, can you please take care of the OpenMP
bits and commit this?  Changelog entries are in the commit message.

If others could test on other hardware, this would also be appreciated.

commit 1db13cb386e673d5265bcaf2d70fc25dda22e5fd
Author: Torvald Riegel <trie...@redhat.com>
Date:   Fri Jan 27 17:40:44 2017 +0100

    Fix __atomic to not implement atomic loads with CAS.

    gcc/
	* builtins.c (fold_builtin_atomic_always_lock_free): Make
	"lock-free" conditional on existance of a fast atomic load.
	* optabs-query.c (can_atomic_load_p): New function.
	* optabs-query.h (can_atomic_load_p): Declare it.
	* optabs.c (expand_atomic_exchange): Always delegate to libatomic if
	no fast atomic load is available for the particular size of access.
	(expand_atomic_compare_and_swap): Likewise.
	(expand_atomic_load): Likewise.
	(expand_atomic_store): Likewise.
	(expand_atomic_fetch_op): Likewise.
	* testsuite/lib/target-supports.exp
	(check_effective_target_sync_int_128): Remove x86 because it provides
	no fast atomic load.
	(check_effective_target_sync_int_128_runtime): Likewise.

    libatomic/
	* acinclude.m4: Add #define FAST_ATOMIC_LDST_*.
	* auto-config.h.in: Regenerate.
	* config/x86/host-config.h (FAST_ATOMIC_LDST_16): Define to 0.
	(atomic_compare_exchange_n): New.
	* glfree.c (EXACT, LARGER): Change condition and add comments.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index bf68e31..0a0e8b9 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6157,8 +6157,9 @@ fold_builtin_atomic_always_lock_free (tree arg0, tree arg1)
 
   /* Check if a compare_and_swap pattern exists for the mode which represents
      the required size.  The pattern is not allowed to fail, so the existence
-     of the pattern indicates support is present.  */
-  if (can_compare_and_swap_p (mode, true))
+     of the pattern indicates support is present.  Al
[committed] libitm: Disable TSX on processors on which it may be broken.
This fix follows the same approach that glibc uses to disable TSX on
processors on which it is broken.  TSX can also be disabled through a
microcode update on these processors, but glibc consensus is that it
cannot be detected reliably whether the microcode update has been applied.
Thus, we just look for affected models/steppings.

Tested on x86_64-linux (but I don't have a machine with broken TSX
available).

libitm/ChangeLog

	* config/x86/target.h (htm_available): Add check for some processors
	on which TSX is broken.

commit ff5b36bbb136d73e673d8cbe234a4c070c3d72ee
Author: torvald
Date:   Wed Jan 18 20:22:02 2017 +0000

    libitm: Disable TSX on processors on which it may be broken.

    libitm/ChangeLog

	* config/x86/target.h (htm_available): Add check for some processors
	on which TSX is broken.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244594 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libitm/config/x86/target.h b/libitm/config/x86/target.h
index 8d0a0da..665c7d6 100644
--- a/libitm/config/x86/target.h
+++ b/libitm/config/x86/target.h
@@ -78,6 +78,28 @@ htm_available ()
   if (__get_cpuid_max (0, NULL) >= 7)
     {
       unsigned a, b, c, d;
+      /* TSX is broken on some processors.  This can be fixed by microcode,
+	 but we cannot reliably detect whether the microcode has been
+	 updated.  Therefore, do not report availability of TSX on these
+	 processors.  We use the same approach here as in glibc (see
+	 https://sourceware.org/ml/libc-alpha/2016-12/msg00470.html).  */
+      __cpuid (0, a, b, c, d);
+      if (b == 0x756e6547 && c == 0x6c65746e && d == 0x49656e69)
+	{
+	  __cpuid (1, a, b, c, d);
+	  if (((a >> 8) & 0x0f) == 0x06) // Family.
+	    {
+	      unsigned model = ((a >> 4) & 0x0f) // Model.
+		  + ((a >> 12) & 0xf0); // Extended model.
+	      unsigned stepping = a & 0x0f;
+	      if ((model == 0x3c)
+		  || (model == 0x45)
+		  || (model == 0x46)
+		  /* Xeon E7 v3 has correct TSX if stepping >= 4.  */
+		  || ((model == 0x3f) && (stepping < 4)))
+		return false;
+	    }
+	}
       __cpuid_count (7, 0, a, b, c, d);
       if (b & cpuid_rtm)
 	return true;
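[The model/stepping decode in the patch is pure arithmetic on CPUID.1:EAX and can be checked in isolation.  This sketch repeats the patch's extraction (bits [3:0] stepping, [7:4] model, [11:8] family, [19:16] extended model) and its deny-list; `tsx_blacklisted` is an illustrative name.]

```cpp
// Decode CPUID leaf-1 EAX exactly as the patch does and apply its
// deny-list of Haswell-era parts with broken TSX.
bool tsx_blacklisted(unsigned eax)
{
  if (((eax >> 8) & 0x0f) != 0x06)         // Intel family 6 only.
    return false;
  unsigned model = ((eax >> 4) & 0x0f)     // Model...
                   + ((eax >> 12) & 0xf0); // ...plus extended model.
  unsigned stepping = eax & 0x0f;
  return model == 0x3c || model == 0x45 || model == 0x46
         // Xeon E7 v3 (model 0x3f) has correct TSX from stepping 4 on.
         || (model == 0x3f && stepping < 4);
}
```

For example, EAX 0x306c3 decodes to model 0x3c (0xc plus extended model 0x30) and is blocked, while 0x306f4 decodes to model 0x3f at stepping 4 and is allowed.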
Re: [gofrontend-dev] Go patch committed: Avoid warning by using a local var for std::ofstream
On Wed, 2016-09-21 at 16:56 +0200, Florian Weimer wrote:
> Torvald, would it be possible to align mutexes internally on hppa, to
> avoid the 16-byte alignment of the entire struct (that is, store a
> pointer to the actual mutex object, which points to a sub-region of
> the struct which is suitably aligned)?

I think this should be possible.  The int used for the actual lock is
first in the structure, followed by two more ints that we could move to
the end of the structure (which is just padding currently).  That would
allow us to shift the lock member by 8 bytes on demand at runtime.
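[The suggestion can be sketched generically: reserve alignment slack inside the object and publish a pointer to a suitably aligned sub-region, instead of over-aligning the whole enclosing struct.  The types and sizes here are illustrative only, not the actual glibc pthread_mutex_t layout.]

```cpp
#include <cstddef>
#include <cstdint>

// The lock words that need 16-byte alignment (hypothetical layout).
struct lock_words { int lock; int pad1; int pad2; int pad3; };

struct mutex_wrapper {
  // Enough slack to place lock_words on a 16-byte boundary regardless
  // of where storage itself starts.
  unsigned char storage[sizeof(lock_words) + 15];
  lock_words *lw;   // aligned view into storage

  mutex_wrapper()
  {
    uintptr_t p = reinterpret_cast<uintptr_t>(storage);
    // Round up to the next 16-byte boundary inside the slack.
    lw = reinterpret_cast<lock_words *>((p + 15) & ~uintptr_t(15));
  }
};
```

The cost is the extra slack bytes and one pointer chase per lock operation; the benefit is that the enclosing struct keeps its natural alignment and ABI size.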
Re: [PATCH] Partially improve scalability of the unwinder (PR libgcc/71744)
On Thu, 2016-09-15 at 06:05 -0700, Ian Lance Taylor wrote:
> Jakub Jelinek writes:
>
> > 2016-09-15  Jakub Jelinek
> >
> > 	PR libgcc/71744
> > 	* unwind-dw2-fde.c (ATOMIC_FDE_FAST_PATH): Define if __register_frame*
> > 	is not the primary registry and atomics are available.
> > 	(any_objects_registered): New variable.
> > 	(__register_frame_info_bases, __register_frame_info_table_bases):
> > 	Atomically store 1 to any_objects_registered after registering first
> > 	unwind info.
> > 	(_Unwind_Find_FDE): Return early if any_objects_registered is 0.
>
> This is OK.

In glibc, we have a rule that we add sufficient code comments for glibc.
I'm not in the position to set rules for GCC, but I think this would help
here too.  Trying to explain in comments why a certain memory order is
used would have at least brought up the issue you mention below.

> > +#ifdef ATOMIC_FDE_FAST_PATH
> > +  /* For targets where unwind info is usually not registered through these
> > +     APIs anymore, avoid taking a global lock.  */
> > +  if (__builtin_expect (!__atomic_load_n (&any_objects_registered,
> > +					  __ATOMIC_ACQUIRE), 1))
> > +    return NULL;
> > +#endif
> > +
> >   init_object_mutex_once ();
> >   __gthread_mutex_lock (&object_mutex);
>
> I doubt it matters, but I don't think you need to use __ATOMIC_ACQUIRE
> in the atomic_load_n.  You could use __ATOMIC_RELAXED.  Acquiring the
> mutex is going to enforce cross-thread sequential consistency anyhow.

But then the release MO wouldn't be required either.  The
__gthread_mutex_lock call has no synchronizes-with relation with the
release MO stores the patch adds, if that was what you were assuming.
Note that in the C11 / C++11 memory model, lock acquisitions are not
considered to automatically also be acquire MO fences.  This may be the
case on many archs, but it's not something the memory model guarantees.
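[The fast path under discussion looks like this in outline, using standard C++ primitives (illustrative names; not the actual unwind-dw2-fde.c code).  The flag starts false and only ever transitions to true, so a reader that observes false can safely skip the global lock.]

```cpp
#include <atomic>
#include <mutex>

std::atomic<bool> any_objects_registered{false};
std::mutex object_mutex;
static int registered_count = 0;

void register_frame_sketch()
{
  {
    std::lock_guard<std::mutex> g(object_mutex);
    ++registered_count;          // stand-in for registering unwind info
  }
  // The patch uses a release-MO store here; whether a relaxed store/load
  // pair would also be correct is exactly the memory-order question
  // debated in the thread above.
  any_objects_registered.store(true, std::memory_order_release);
}

const void *find_fde_sketch()
{
  // Fast path: nothing was ever registered, so skip the global lock.
  if (!any_objects_registered.load(std::memory_order_acquire))
    return nullptr;
  std::lock_guard<std::mutex> g(object_mutex);
  return registered_count > 0 ? &registered_count : nullptr; // stand-in lookup
}
```

A reader that sees the flag as true then takes the mutex, and the mutex (not the flag) is what makes the registered data visible, which is why the strength of the flag's memory order is debatable.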
Re: [PATCH 3/4][PR 71931] Fix libitm tests
On Wed, 2016-08-24 at 20:08 +0100, Szabolcs Nagy wrote:
> Pass build time CC make var down to dejagnu so the sysroot
> is set correctly when gcc is built with --with-build-sysroot.
>
> libitm/
> 2016-08-24  Szabolcs Nagy
>
> 	PR testsuite/71931
> 	* configure.ac: Add AC_CONFIG_FILES.
> 	* configure: Regenerated.
> 	* testuite/Makefile.am: Add rule for libitm-test-support.exp.
> 	* testuite/Makefile.in: Regenerated.
> 	* testuite/libitm-test-support.exp.in: New.
> 	* testuite/lib/libitm.exp (libitm_init): Use BUILD_CC.

I don't know enough about the build system to really review this.  If a
similar patch has been ACKed and applied for libatomic (71931 states that
both are affected), then I guess this is OK?
Re: [PATCH] PR libitm/70456: Allocate aligned memory in gtm_thread operator new
On Sat, 2016-04-02 at 09:25 -0700, H.J. Lu wrote:
> On Wed, Mar 30, 2016 at 5:34 AM, H.J. Lu wrote:
> > Since GTM::gtm_thread has
> >
> >   gtm_thread *next_thread __attribute__((__aligned__(HW_CACHELINE_SIZE)));
> >
> > GTM::gtm_thread::operator new should allocate aligned memory.
> >
> > Tested on Linux/x86-64.  OK for trunk?
>
> This patch is better.  Tested on Linux/x86-64.  OK for trunk?

OK.
Re: [PING 2, PATCH] libitm: Introduce target macro TARGET_BEGIN_TRANSACTION_ATTRIBUTE.
On Wed, 2016-03-02 at 07:36 +0100, Dominik Vogt wrote:
> On Mon, Feb 01, 2016 at 02:18:48PM +0100, Dominik Vogt wrote:
> > The attached patch adds a target specific attribute via the
> > new target macro TARGET_BEGIN_TRANSACTION_ATTRIBUTE to the
> > function begin_transaction().  S/390 uses this to set the
> > soft-float target attribute which is needed to fix a crash with
> > -m31.
> >
> > As there seems to be no place in libitm to document internal macros like
> > USE_HTM_FASTPATH or the new macro, I've put the documentation in a
> > comment where the macro is used.
>
> Please note that we're waiting for approval of the libitm
> maintainer, not for approval of the s390 maintainer.

LGTM.  Sorry for missing this patch.  Please CC me on libitm patches if I
don't respond quickly.  Thanks.
Re: [PATCH] libitm: Fix HTM fastpath.
On Fri, 2016-01-08 at 12:07 +0100, Torvald Riegel wrote:
> This patch fixes a thinko in the HTM fastpath implementation.  In a
> nutshell, we also need to monitor the HTM fastpath control (ie,
> htm_fastpath) variable from within a HW transaction on the HTM fastpath,
> so that such a HW transaction will only execute if the HTM fastpath is
> still enabled.
>
> We move htm_fastpath into the serial lock so that a HW transaction only
> needs one cacheline of HTM capacity to monitor both htm_fastpath and
> check that no non-HW-transaction is currently running.
>
> Tested on x86_64-linux.
>
> 2016-01-08  Torvald Riegel  <trie...@redhat.com>
>
> 	* beginend.cc (GTM::gtm_thread::serial_lock): Put on cacheline
> 	boundary.
> 	(htm_fastpath): Remove.
> 	(gtm_thread::begin_transaction): Fix HTM fastpath.
> 	(_ITM_commitTransaction): Adapt.
> 	(_ITM_commitTransactionEH): Adapt.
> 	* libitm/config/linux/rwlock.h (gtm_rwlock): Add htm_fastpath member
> 	and accessors.
> 	* libitm/config/posix/rwlock.h (gtm_rwlock): Likewise.
> 	* libitm/config/posix/rwlock.cc (gtm_rwlock::gtm_rwlock): Adapt.
> 	* libitm/config/x86/sjlj.S (_ITM_beginTransaction): Fix HTM fastpath.
> 	* libitm/libitm_i.h (htm_fastpath): Remove declaration.
> 	* libitm/method-serial.cc (htm_mg): Adapt.
> 	(gtm_thread::serialirr_mode): Adapt.
> 	* libitm/query.cc (_ITM_inTransaction, _ITM_getTransactionId): Adapt.

I have committed the attached patch (a minor rebase compared to the prior
one) after offline approval by Jakub Jelinek.  He tested on PPC as well,
and the patch fixed the problem we saw there.

commit 51a5f38f0228ed7e4772bf1d0439f93ab4ffcf23
Author: torvald <torvald@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Jan 22 16:13:06 2016 +0000

    libitm: Fix HTM fastpath.

	* beginend.cc (GTM::gtm_thread::serial_lock): Put on cacheline
	boundary.
	(htm_fastpath): Remove.
	(gtm_thread::begin_transaction): Fix HTM fastpath.
	(_ITM_commitTransaction): Adapt.
	(_ITM_commitTransactionEH): Adapt.
* libitm/config/linux/rwlock.h (gtm_rwlock): Add htm_fastpath member and accessors. * libitm/config/posix/rwlock.h (gtm_rwlock): Likewise. * libitm/config/posix/rwlock.cc (gtm_rwlock::gtm_rwlock): Adapt. * libitm/config/x86/sjlj.S (_ITM_beginTransaction): Fix HTM fastpath. * libitm/libitm_i.h (htm_fastpath): Remove declaration. * libitm/method-serial.cc (htm_mg): Adapt. (gtm_thread::serialirr_mode): Adapt. * libitm/query.cc (_ITM_inTransaction, _ITM_getTransactionId): Adapt. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@232735 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/libitm/beginend.cc b/libitm/beginend.cc index 00d28f4..1a258ad 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -32,7 +32,11 @@ using namespace GTM; extern __thread gtm_thread_tls _gtm_thr_tls; #endif -gtm_rwlock GTM::gtm_thread::serial_lock; +// Put this at the start of a cacheline so that serial_lock's writers and +// htm_fastpath fields are on the same cacheline, so that HW transactions +// only have to pay one cacheline capacity to monitor both. +gtm_rwlock GTM::gtm_thread::serial_lock + __attribute__((aligned(HW_CACHELINE_SIZE))); gtm_thread *GTM::gtm_thread::list_of_threads = 0; unsigned GTM::gtm_thread::number_of_threads = 0; @@ -51,9 +55,6 @@ static pthread_mutex_t global_tid_lock = PTHREAD_MUTEX_INITIALIZER; static pthread_key_t thr_release_key; static pthread_once_t thr_release_once = PTHREAD_ONCE_INIT; -// See gtm_thread::begin_transaction. -uint32_t GTM::htm_fastpath = 0; - /* Allocate a transaction structure. */ void * GTM::gtm_thread::operator new (size_t s) @@ -173,9 +174,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) // lock's writer flag and thus abort if another thread is or becomes a // serial transaction. Therefore, if the fastpath is enabled, then a // transaction is not executing as a HW transaction iff the serial lock is - // write-locked. 
This allows us to use htm_fastpath and the serial lock's - // writer flag to reliable determine whether the current thread runs a HW - // transaction, and thus we do not need to maintain this information in + // write-locked. Also, HW transactions monitor the fastpath control + // variable, so that they will only execute if dispatch_htm is still the + // current method group. This allows us to use htm_fastpath and the serial + // lock's writers flag to reliable determine whether the current thread runs + // a HW transaction, and thus we do not need to maintain this information in // per-thread state. // If an uninstrumented code path is not available, we can still run // instrumented code from a HW transaction because the HTM fastpath kicks @@ -187,9 +190,14 @@ GTM::gtm_thread::begin
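The cacheline-packing idea in the patch above — aligning the serial lock so that its writers flag and the htm_fastpath word sit on the same line, keeping the HW transaction's monitoring cost to a single cacheline of HTM capacity — can be sketched in isolation. This is only an illustration: the struct, field names, and the 64-byte HW_CACHELINE_SIZE value are assumptions, not libitm's actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cacheline size; real code takes this from the target configuration.
constexpr std::size_t HW_CACHELINE_SIZE = 64;

// Sketch of a serial lock that co-locates the writers flag and the HTM
// fastpath control word, so a HW transaction that reads both only adds one
// cacheline to its read set (and thus to what the HTM must monitor).
struct alignas(HW_CACHELINE_SIZE) serial_lock_sketch
{
  uint32_t writers;       // nonzero while a serial (non-HTM) txn is running
  uint32_t htm_fastpath;  // nonzero while the HTM fastpath is enabled
  // ... further rwlock state would follow here ...
};

// Both fields must land within the first cacheline of the aligned object.
static_assert(offsetof(serial_lock_sketch, htm_fastpath) < HW_CACHELINE_SIZE,
              "writers and htm_fastpath share the first cacheline");

// What the fastpath checks from inside the started HW transaction: if either
// word disqualifies the fastpath, the transaction must abort/fall back.
inline bool htm_fastpath_ok(const serial_lock_sketch& l)
{
  return l.htm_fastpath != 0 && l.writers == 0;
}
```

Because both loads happen inside the hardware transaction, a later change to either word by another thread aborts the transaction — which is exactly the monitoring property the fix relies on.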
Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.
On Thu, 2016-01-21 at 11:00 +0100, Dominique d'Humières wrote: > Torvald, > > Now that I can bootstrap on darwin, I have found the following failure for > libitm.c++/libstdc++-safeexc.C > > /opt/gcc/work/libitm/testsuite/libitm.c++/libstdc++-safeexc.C:50:2: error: > unsafe function call 'std::underflow_error::underflow_error(const string&)' > within atomic transaction > throw T (what); > ^ Well, yes, that's my oversight. The previous fix disabled the support, so we now need to xfail or disable this test on Darwin; the same applies to AIX. Please ignore these failures for now — I'll work on a fix.
Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.
On Thu, 2016-01-21 at 11:00 +0100, Dominique d'Humières wrote: > Torvald, > > Now that I can bootstrap on darwin, I have found the following failure for > libitm.c++/libstdc++-safeexc.C > > /opt/gcc/work/libitm/testsuite/libitm.c++/libstdc++-safeexc.C:50:2: error: > unsafe function call 'std::underflow_error::underflow_error(const string&)' > within atomic transaction > throw T (what); > ^ Does this patch fix it (ie, mark the test unsupported)? commit 259c0cf27d0a88eecc90af1aa500f88f6108cb04 Author: Torvald Riegel <trie...@redhat.com> Date: Thu Jan 21 16:21:33 2016 +0100 libitm: Disable testing transaction-safe exceptions on Darwin and AIX. * testsuite/libitm.c++/libstdc++-safeexc.C: Not supported on darwin or AIX. diff --git a/libitm/testsuite/libitm.c++/libstdc++-safeexc.C b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C index 3e1655e..55ebd25 100644 --- a/libitm/testsuite/libitm.c++/libstdc++-safeexc.C +++ b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C @@ -2,7 +2,10 @@ // are indeed that. Thus, this also tests the transactional clones in // libstdc++ and libsupc++. -// { dg-do run } +// Not supported on Darwin nor AIX because those lack the support for +// weak references to undefined functions that we need in libstdc++ to make +// exceptions transaction-safe. +// { dg-do run { target { ! *-*-darwin* powerpc-ibm-aix* } } } #include #include
[PATCH] libitm: Disable testing transaction-safe exceptions on Darwin and AIX.
On Thu, 2016-01-21 at 10:06 -0800, Mike Stump wrote: > On Jan 21, 2016, at 9:29 AM, Dominique d'Humières <domi...@lps.ens.fr> wrote: > > // { dg-do run { target { ! { *-*-darwin* powerpc-ibm-aix* } } } } > > A comment to hint that this has something to do with weak undefined would be > nice. Here's the patch I prepared (which indeed includes a comment). OK for trunk? I'm not quite sure whether this qualifies as a regression, but having an additional test that now fails is one I guess. commit 0323fed14832e5744cbc63bcfeeb6728f7f13394 Author: Torvald Riegel <trie...@redhat.com> Date: Thu Jan 21 16:21:33 2016 +0100 libitm: Disable testing transaction-safe exceptions on Darwin and AIX. * testsuite/libitm.c++/libstdc++-safeexc.C: Not supported on darwin or AIX. diff --git a/libitm/testsuite/libitm.c++/libstdc++-safeexc.C b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C index 3e1655e..20e2e5e 100644 --- a/libitm/testsuite/libitm.c++/libstdc++-safeexc.C +++ b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C @@ -2,7 +2,10 @@ // are indeed that. Thus, this also tests the transactional clones in // libstdc++ and libsupc++. -// { dg-do run } +// Not supported on Darwin nor AIX because those lack the support for +// weak references to undefined functions that we need in libstdc++ to make +// exceptions transaction-safe. +// { dg-do run { target { ! { *-*-darwin* powerpc-ibm-aix* } } } } #include #include
Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.
On Thu, 2016-01-21 at 18:12 +, Pedro Alves wrote: > On 01/21/2016 06:06 PM, Mike Stump wrote: > > On Jan 21, 2016, at 9:29 AM, Dominique d'Humières> > wrote: > >> // { dg-do run { target { ! { *-*-darwin* powerpc-ibm-aix* } } } } > > > > A comment to hint that this has something to do with weak undefined would > > be nice. > > > > Or come up with some new "dg-require-effective-target weakref-whatnot", > making it > self-documenting. > It's just one test that needs this right now. If there should be more in the future, I agree that a dg-require-... might be nicer.
Re: [PATCH] libitm: Disable testing transaction-safe exceptions on Darwin and AIX.
On Thu, 2016-01-21 at 18:26 +, Jonathan Wakely wrote: > On 21/01/16 10:19 -0800, Mike Stump wrote: > >On Jan 21, 2016, at 10:15 AM, Torvald Riegel <trie...@redhat.com> wrote: > >> On Thu, 2016-01-21 at 10:06 -0800, Mike Stump wrote: > >>> On Jan 21, 2016, at 9:29 AM, Dominique d'Humières <domi...@lps.ens.fr> > >>> wrote: > >>>> // { dg-do run { target { ! { *-*-darwin* powerpc-ibm-aix* } } } } > >>> > >>> A comment to hint that this has something to do with weak undefined would > >>> be nice. > >> > >> Here's the patch I prepared (which indeed includes a comment). > >> > >> OK for trunk? I'm not quite sure whether this qualifies as a > >> regression, but having an additional test that now fails is one I guess. > >> > > > >A simple testsuite fixup like this is still ok. From a darwin, AIX > >perspective it is fine. If either the transaction or the libstdc++ people > >like it, I think we’re set. > > OK from the libstdc++ side. > Committed.
[PATCH][committed] libitm: Remove dead code.
I missed dead code when I removed the cacheline stuff. local_type_traits hasn't been updated either, apparently leading to bootstrap issues. So we just remove more dead code. Tested fine on x86_64-linux. Committed. commit c608b69c3c49c7d29033faf328fd4d117f31fd9f Author: Torvald Riegel <trie...@redhat.com> Date: Tue Jan 19 20:43:10 2016 +0100 libitm: Remove dead code. * local_type_traits: Remove file. * libitm_i.h: Don't include it anymore. (sized_integral): Remove. diff --git a/libitm/libitm_i.h b/libitm/libitm_i.h index 751b4ab..ae88ff0 100644 --- a/libitm/libitm_i.h +++ b/libitm/libitm_i.h @@ -36,7 +36,6 @@ #include #include #include -#include "local_type_traits" #include "local_atomic" /* Don't require libgcc_s.so for exceptions. */ @@ -49,13 +48,6 @@ namespace GTM HIDDEN { using namespace std; -// A helper template for accessing an unsigned integral of SIZE bytes. -template struct sized_integral { }; -template<> struct sized_integral<1> { typedef uint8_t type; }; -template<> struct sized_integral<2> { typedef uint16_t type; }; -template<> struct sized_integral<4> { typedef uint32_t type; }; -template<> struct sized_integral<8> { typedef uint64_t type; }; - typedef unsigned int gtm_word __attribute__((mode (word))); // These values are given to GTM_restart_transaction and indicate the diff --git a/libitm/local_type_traits b/libitm/local_type_traits deleted file mode 100644 index 131e8d2..000 --- a/libitm/local_type_traits +++ /dev/null @@ -1,1901 +0,0 @@ -// C++0x type_traits -*- C++ -*- - -// Copyright (C) 2007-2016 Free Software Foundation, Inc. -// -// This file is part of the GNU ISO C++ Library. This library is free -// software; you can redistribute it and/or modify it under the -// terms of the GNU General Public License as published by the -// Free Software Foundation; either version 3, or (at your option) -// any later version. 
- -// This library is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU General Public License for more details. - -// Under Section 7 of GPL version 3, you are granted additional -// permissions described in the GCC Runtime Library Exception, version -// 3.1, as published by the Free Software Foundation. - -// You should have received a copy of the GNU General Public License and -// a copy of the GCC Runtime Library Exception along with this program; -// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see -// <http://www.gnu.org/licenses/>. - -// -// -// This is a copy of the libstdc++ header, with the trivial modification -// of ignoring the c++config.h include. If and when the top-level build is -// fixed so that target libraries can be built using the newly built, we can -// delete this file. -// -// - -/** @file include/type_traits - * This is a Standard C++ Library header. - */ - -#ifndef _GLIBCXX_TYPE_TRAITS -#define _GLIBCXX_TYPE_TRAITS 1 - -// #pragma GCC system_header - -// #ifndef __GXX_EXPERIMENTAL_CXX0X__ -// # include -// #else -// #include - -namespace std // _GLIBCXX_VISIBILITY(default) -{ -// _GLIBCXX_BEGIN_NAMESPACE_VERSION - - /** - * @addtogroup metaprogramming - * @{ - */ - - /// integral_constant - template -struct integral_constant -{ - static constexpr _Tp value = __v; - typedef _Tp value_type; - typedef integral_constant<_Tp, __v> type; - constexpr operator value_type() { return value; } -}; - - /// typedef for true_type - typedef integral_constant<bool, true> true_type; - - /// typedef for false_type - typedef integral_constant<bool, false>false_type; - - template -constexpr _Tp integral_constant<_Tp, __v>::value; - - // Meta programming helper types. 
- - template<bool, typename, typename> -struct conditional; - - template -struct __or_; - - template<> -struct __or_<> -: public false_type -{ }; - - template -struct __or_<_B1> -: public _B1 -{ }; - - template -struct __or_<_B1, _B2> -: public conditional<_B1::value, _B1, _B2>::type -{ }; - - template -struct __or_<_B1, _B2, _B3, _Bn...> -: public conditional<_B1::value, _B1, __or_<_B2, _B3, _Bn...>>::type -{ }; - - template -struct __and_; - - template<> -struct __and_<> -: public true_type -{ }; - - template -struct __and_<_B1> -: public _B1 -{ }; - - template -struct __and_<_B1, _B2> -: public conditional<_B1::value, _B2, _B1>::type -{ };
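For reference, the sized_integral helper deleted in the patch above mapped a size in bytes to the unsigned integral type of that width via explicit specializations. The reconstruction below is an assumption: the template parameter was eaten by the archive's HTML filtering, so an unsigned parameter is guessed from the usage shown in the diff.

```cpp
#include <cstdint>

// Reconstruction of the removed helper: given a byte count as a template
// argument, expose the unsigned integer type of exactly that size.
template <unsigned S> struct sized_integral { };
template <> struct sized_integral<1> { typedef uint8_t  type; };
template <> struct sized_integral<2> { typedef uint16_t type; };
template <> struct sized_integral<4> { typedef uint32_t type; };
template <> struct sized_integral<8> { typedef uint64_t type; };
```

It became dead code once nothing in libitm selected integer types by size anymore, which is why the commit removes it together with the local_type_traits copy.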
Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.
On Sat, 2016-01-16 at 10:57 +0100, Dominique d'Humières wrote: > > Addressed these, fixed a problem with using GLIBCXX_WEAK_DEFINITION > > (which is only set on Darwin despite the generic-sounding name -- so > > just use __attribute__((weak)) directly), and also updated > > testsuite_abi.cc so that it knows about CXXABI_1.3.10. > > > > Approved by Jonathan Wakely. Committed as r232454. > This breaks bootstrap on darwin, see > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69310. Please give this patch a try. I've only tested it on x86_64-linux. Jon, okay from your side if Darwin testing succeeds? commit 6987f84f278d2cbf5b828a8c81c1be84b292b1af Author: Torvald Riegel <trie...@redhat.com> Date: Tue Jan 19 18:36:14 2016 +0100 libstdc: Use weak alias instead of just alias in TM support. PR libstdc++/69310 * src/c++11/cow-stdexcept.cc: Use weak alias instead of just alias to make Darwin happy. diff --git a/libstdc++-v3/src/c++11/cow-stdexcept.cc b/libstdc++-v3/src/c++11/cow-stdexcept.cc index a0f505c..a070747 100644 --- a/libstdc++-v3/src/c++11/cow-stdexcept.cc +++ b/libstdc++-v3/src/c++11/cow-stdexcept.cc @@ -364,7 +364,9 @@ _ZGTtNSt##NAME##C1EPKc (CLASS* that, const char* s) \ construct the COW string in the latter manually. Note that the \ exception classes will not be declared transaction_safe if the \ shared empty _Rep is disabled with --enable-fully-dynamic-string \ - (in which case _GLIBCXX_FULLY_DYNAMIC_STRING is nonzero). */ \ + (in which case _GLIBCXX_FULLY_DYNAMIC_STRING is nonzero). \ + The alias declarations are also declared weak because Darwin \ + doesn't support non-weak aliases. 
*/\ CLASS e("");\ _ITM_memcpyRnWt(that, , sizeof(CLASS));\ _txnal_cow_string_C1_for_exceptions(_txnal_##BASE##_get_msg(that), \ @@ -372,7 +374,7 @@ _ZGTtNSt##NAME##C1EPKc (CLASS* that, const char* s) \ } \ void \ _ZGTtNSt##NAME##C2EPKc (CLASS*, const char*)\ - __attribute__((alias ("_ZGTtNSt" #NAME "C1EPKc"))); \ + __attribute__((weak, alias ("_ZGTtNSt" #NAME "C1EPKc"))); \ void \ _ZGTtNSt##NAME##C1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE( \ CLASS* that, const std::__sso_string& s)\ @@ -393,7 +395,7 @@ _ZGTtNSt##NAME##D1Ev(CLASS* that) \ { _txnal_cow_string_D1(_txnal_##BASE##_get_msg(that)); } \ void \ _ZGTtNSt##NAME##D2Ev(CLASS*) \ -__attribute__((alias ("_ZGTtNSt" #NAME "D1Ev"))); \ +__attribute__((weak, alias ("_ZGTtNSt" #NAME "D1Ev"))); \ void \ _ZGTtNSt##NAME##D0Ev(CLASS* that) \ { \
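The alias mechanics being changed in the patch above can be demonstrated standalone. In the sketch below the function names ctor_c1/ctor_c2 are invented stand-ins for the mangled C1/C2 constructor symbols; the point is the attribute form — a plain `alias` is what Darwin rejects, while `weak, alias` is accepted (on ELF targets both link fine).

```cpp
#include <cassert>

// The "C1" (complete-object constructor) style definition...
extern "C" int ctor_c1(int x) { return x + 1; }

// ...and a "C2" (base-object constructor) entry point that is just a weak
// alias for it, mirroring the __attribute__((weak, alias(...))) form the
// patch switches to for Darwin's benefit.
extern "C" int ctor_c2(int x) __attribute__((weak, alias("ctor_c1")));
```

Both names resolve to the same code address, so callers of either symbol run the single shared definition.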
Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.
On Sun, 2016-01-17 at 18:30 -0500, David Edelsohn wrote: > On Sun, Jan 17, 2016 at 3:21 PM, Torvald Riegel <trie...@redhat.com> wrote: > > On Sat, 2016-01-16 at 15:38 -0500, David Edelsohn wrote: > >> On Sat, Jan 16, 2016 at 8:35 AM, Jakub Jelinek <ja...@redhat.com> wrote: > >> > On Sat, Jan 16, 2016 at 07:47:33AM -0500, David Edelsohn wrote: > >> >> stage1 libstdc++ builds just fine. the problem is stage2 configure > >> >> fails due to missing ITM_xxx symbols when configure tries to compile > >> >> and run conftest programs. > >> > > >> > On x86_64-linux, the _ITM_xxx symbols are undef weak ones and thus it is > >> > fine to load libstdc++ without libitm and libstdc++ doesn't depend on > >> > libitm. > >> > > >> > So, is AIX defining __GXX_WEAK__ or not? Perhaps some other macro or > >> > configure check needs to be used to determine if undefined weak symbols > >> > work the way libstdc++ needs them to. > >> > >> __GXX_WEAK__ appears to be defined by gcc/c-family/c-cppbuiltin.c > >> based on SUPPORTS_ONE_ONLY. gcc/defaults.h defines SUPPORTS_ONE_ONLY > >> if the target supports MAKE_DECL_ONE_ONLY and link-once semantics. > >> AIX weak correctly supports link-once semantics. AIX also supports > >> the definition of __GXX_WEAK__ in gcc/doc/cpp.texi, namely collapsing > >> symbols with vague linkage in multiple translation units. > >> > >> libstdc++/src/c++11/cow-stdexcept.cc appears to be using __GXX_WEAK__ > >> and __attribute__ ((weak)) for references to symbols that may not be > >> defined at link time or run time. AIX does not allow undefined symbol > >> errors by default. And the libstdc++ inference about the semantics of > >> __GXX_WEAK__ are different than the documentation. > >> > >> AIX supports MAKE_DECL_ONE_ONLY and the documented meaning of > >> __GXX_WEAK__. AIX does not support extension of the meaning to > >> additional SVR4 semantics not specified in the documentation. 
> > > > I see, so we might be assuming that __GXX_WEAK__ means more than it > > actually does (I'm saying "might" because personally, I don't know; your > > information supports this is the case, but the initial info I got was > > that __GXX_WEAK__ would mean we could have weak decls without > > definitions). > > I believe that libstdc++ must continue with the weak undefined > references to the symbols as designed, but protect them with a > different macro. For example, __GXX_WEAK_REF__ or __GXX_WEAK_UNDEF__ > defined in defaults.h based on configure test or simply overridden in > config/rs6000/aix.h. Or the macro could be local to libstdc++ and > overridden in config/os/aix/os_defines.h. OK. I'm currently testing the attached patch on x86_64-linux. David, if there are no objections from you and those CC'ed, could you give this one a try on AIX, please? * include/bits/c++config (_GLIBCXX_USE_WEAK_REF): New. (_GLIBCXX_TXN_SAFE, _GLIBCXX_TXN_SAFE_DYN): Use _GLIBCXX_USE_WEAK_REF and move after its definition. * config/os/aix/os_defines.h (_GLIBCXX_USE_WEAK_REF): Override. * src/c++11/cow-stdexcept.cc: Use _GLIBCXX_USE_WEAK_REF instead of __GXX_WEAK__, and only provide transactional clones if _GLIBCXX_USE_WEAK_REF is true. Don't provide stubs of libitm functions. commit a5a8819bce824815a94ef8d58f6d4123db92f1d4 Author: Torvald Riegel <trie...@redhat.com> Date: Mon Jan 18 14:42:21 2016 +0100 libstdc++: Fix usage of __GXX_WEAK__ in TM TS support. * include/bits/c++config (_GLIBCXX_USE_WEAK_REF): New. (_GLIBCXX_TXN_SAFE, _GLIBCXX_TXN_SAFE_DYN): Use _GLIBCXX_USE_WEAK_REF and move after its definition. * config/os/aix/os_defines.h (_GLIBCXX_USE_WEAK_REF): Override. * src/c++11/cow-stdexcept.cc: Use _GLIBCXX_USE_WEAK_REF instead of __GXX_WEAK__, and only provide transactional clones if _GLIBCXX_USE_WEAK_REF is true. Don't provide stubs of libitm functions. 
diff --git a/libstdc++-v3/config/os/aix/os_defines.h b/libstdc++-v3/config/os/aix/os_defines.h index d895471..0949446 100644 --- a/libstdc++-v3/config/os/aix/os_defines.h +++ b/libstdc++-v3/config/os/aix/os_defines.h @@ -48,4 +48,7 @@ #define __COMPATMATH__ #endif +// No support for referencing weak symbols without a definition. +#define _GLIBCXX_USE_WEAK_REF 0 + #endif diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config index 387a7bb..57024e4 100644 --- a/libstdc++-v3/include/bits/c++config +++ b/libstdc++-v3/include/bits/c++config @@ -483,20 +483,6 @@ namespace std #define _GLIBCXX_USE_ALLOCATOR_
Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.
On Mon, 2016-01-18 at 14:54 +0100, Torvald Riegel wrote: > On Sun, 2016-01-17 at 18:30 -0500, David Edelsohn wrote: > > On Sun, Jan 17, 2016 at 3:21 PM, Torvald Riegel <trie...@redhat.com> wrote: > > > On Sat, 2016-01-16 at 15:38 -0500, David Edelsohn wrote: > > >> On Sat, Jan 16, 2016 at 8:35 AM, Jakub Jelinek <ja...@redhat.com> wrote: > > >> > On Sat, Jan 16, 2016 at 07:47:33AM -0500, David Edelsohn wrote: > > >> >> stage1 libstdc++ builds just fine. the problem is stage2 configure > > >> >> fails due to missing ITM_xxx symbols when configure tries to compile > > >> >> and run conftest programs. > > >> > > > >> > On x86_64-linux, the _ITM_xxx symbols are undef weak ones and thus it > > >> > is > > >> > fine to load libstdc++ without libitm and libstdc++ doesn't depend on > > >> > libitm. > > >> > > > >> > So, is AIX defining __GXX_WEAK__ or not? Perhaps some other macro or > > >> > configure check needs to be used to determine if undefined weak symbols > > >> > work the way libstdc++ needs them to. > > >> > > >> __GXX_WEAK__ appears to be defined by gcc/c-family/c-cppbuiltin.c > > >> based on SUPPORTS_ONE_ONLY. gcc/defaults.h defines SUPPORTS_ONE_ONLY > > >> if the target supports MAKE_DECL_ONE_ONLY and link-once semantics. > > >> AIX weak correctly supports link-once semantics. AIX also supports > > >> the definition of __GXX_WEAK__ in gcc/doc/cpp.texi, namely collapsing > > >> symbols with vague linkage in multiple translation units. > > >> > > >> libstdc++/src/c++11/cow-stdexcept.cc appears to be using __GXX_WEAK__ > > >> and __attribute__ ((weak)) for references to symbols that may not be > > >> defined at link time or run time. AIX does not allow undefined symbol > > >> errors by default. And the libstdc++ inference about the semantics of > > >> __GXX_WEAK__ are different than the documentation. > > >> > > >> AIX supports MAKE_DECL_ONE_ONLY and the documented meaning of > > >> __GXX_WEAK__. 
AIX does not support extension of the meaning to > > >> additional SVR4 semantics not specified in the documentation. > > > > > > I see, so we might be assuming that __GXX_WEAK__ means more than it > > > actually does (I'm saying "might" because personally, I don't know; your > > > information supports this is the case, but the initial info I got was > > > that __GXX_WEAK__ would mean we could have weak decls without > > > definitions). > > > > I believe that libstdc++ must continue with the weak undefined > > references to the symbols as designed, but protect them with a > > different macro. For example, __GXX_WEAK_REF__ or __GXX_WEAK_UNDEF__ > > defined in defaults.h based on configure test or simply overridden in > > config/rs6000/aix.h. Or the macro could be local to libstdc++ and > > overridden in config/os/aix/os_defines.h. > > OK. I'm currently testing the attached patch on x86_64-linux. No regressions in the libstdc++ and libitm tests on x86_64-linux.
Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.
On Sat, 2016-01-16 at 15:38 -0500, David Edelsohn wrote: > On Sat, Jan 16, 2016 at 8:35 AM, Jakub Jelinek <ja...@redhat.com> wrote: > > On Sat, Jan 16, 2016 at 07:47:33AM -0500, David Edelsohn wrote: > >> stage1 libstdc++ builds just fine. the problem is stage2 configure > >> fails due to missing ITM_xxx symbols when configure tries to compile > >> and run conftest programs. > > > > On x86_64-linux, the _ITM_xxx symbols are undef weak ones and thus it is > > fine to load libstdc++ without libitm and libstdc++ doesn't depend on > > libitm. > > > > So, is AIX defining __GXX_WEAK__ or not? Perhaps some other macro or > > configure check needs to be used to determine if undefined weak symbols > > work the way libstdc++ needs them to. > > __GXX_WEAK__ appears to be defined by gcc/c-family/c-cppbuiltin.c > based on SUPPORTS_ONE_ONLY. gcc/defaults.h defines SUPPORTS_ONE_ONLY > if the target supports MAKE_DECL_ONE_ONLY and link-once semantics. > AIX weak correctly supports link-once semantics. AIX also supports > the definition of __GXX_WEAK__ in gcc/doc/cpp.texi, namely collapsing > symbols with vague linkage in multiple translation units. > > libstdc++/src/c++11/cow-stdexcept.cc appears to be using __GXX_WEAK__ > and __attribute__ ((weak)) for references to symbols that may not be > defined at link time or run time. AIX does not allow undefined symbol > errors by default. And the libstdc++ inference about the semantics of > __GXX_WEAK__ are different than the documentation. > > AIX supports MAKE_DECL_ONE_ONLY and the documented meaning of > __GXX_WEAK__. AIX does not support extension of the meaning to > additional SVR4 semantics not specified in the documentation. I see, so we might be assuming that __GXX_WEAK__ means more than it actually does (I'm saying "might" because personally, I don't know; your information supports this is the case, but the initial info I got was that __GXX_WEAK__ would mean we could have weak decls without definitions). 
The attached patch works around this by always defining stubs for the libitm functions, yet declaring them weak at the same time. If __GXX_WEAK__ is not defined, the transactional clones aren't built at all. This tests fine on x86_64-linux, and I suppose that it should work on AIX too (but I haven't tested). Is it harmless if gnu.pre lists symbols that we don't provide? Thoughts? commit 9008d4a610dccb5ec47f9c1e506492b8615a36fd Author: Torvald Riegel <trie...@redhat.com> Date: Sun Jan 17 19:21:13 2016 +0100 libstdc++: Fix usage of weak symbols in TM TS support. diff --git a/libstdc++-v3/src/c++11/cow-stdexcept.cc b/libstdc++-v3/src/c++11/cow-stdexcept.cc index e2d7e76..8ccc7f5 100644 --- a/libstdc++-v3/src/c++11/cow-stdexcept.cc +++ b/libstdc++-v3/src/c++11/cow-stdexcept.cc @@ -179,6 +179,13 @@ _GLIBCXX_END_NAMESPACE_VERSION // Furthermore, _Rep will always have been allocated or deallocated via // global new or delete, so nontransactional writes we do to _Rep cannot // interfere with transactional accesses. + +// We require support for weak symbols because we need to call libitm but must +// not depend on it. The exception classes will not be declared +// transaction-safe if there is no such support, so do not create +// transactional clones in that case at all. +#ifdef __GXX_WEAK__ + extern "C" { #ifndef _GLIBCXX_MANGLE_SIZE_T @@ -195,7 +202,6 @@ extern "C" { # define ITM_REGPARM #endif -#if __GXX_WEAK__ // Declare all libitm symbols we rely on, but make them weak so that we do // not depend on libitm. extern void* _ZGTtnaX (size_t sz) __attribute__((weak)); extern void _ZGTtdlPv (void* ptr) __attribute__((weak)); extern uint8_t _ITM_RU1(const uint8_t *p) ITM_REGPARM __attribute__((weak)); extern uint32_t _ITM_RU4(const uint32_t *p) ITM_REGPARM __attribute__((weak)); extern uint64_t _ITM_RU8(const uint64_t *p) ITM_REGPARM __attribute__((weak)); extern void _ITM_memcpyRtWn(void *, const void *, size_t) ITM_REGPARM __attribute__((weak)); extern void _ITM_memcpyRnWt(void *, const void *, size_t) ITM_REGPARM __attribute__((weak)); extern void _ITM_addUserCommitAction(void (*)(void *), uint64_t, void *) ITM_REGPARM __attribute__((weak)); -#else -// If there is no support for weak symbols, create dummies. The exceptions -// will not be declared transaction_safe in this case. 
+// Also create stubs because some targets (e.g., AIX) do not support weak +// symbols that do not have a definition. These stubs will not be used +// (unless users call the transactional clones directly or run transactions +// without using libitm as well) void* _ZGTtnaX (size_t) { return NULL; } void _ZGTtdlPv (void*) { } uint8_t _ITM_RU1(const uint8_t *) { return 0; } @@ -224,7 +231,6 @@ uint64_t _ITM_RU8(const uint64_t *) { return 0; } void _ITM_memcpyRtWn(void *, const void *, size_t) { } void _ITM_memcpyRnWt(void *, const void *, size_t) { } void _ITM_addUserCommitAction(void (*)(void *), uint64_t, void *) { }; -#endif } @@ -440,3 +446,5 @@ CTORDTOR(14overflow_error, std::overflow_error, runtime_error) CTORDTOR(15underflow_error, std::underflow_error, runtime_error) } + +#endif // __GXX_WEAK__
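The pattern at the heart of this thread — declaring a symbol weak with no definition anywhere, then testing at runtime whether the linker resolved it — looks roughly like this. The symbol name below is invented (a stand-in for an _ITM_* function); the behavior shown is the ELF one, which AIX's linker does not provide by default, hence the stubs in the patch above.

```cpp
// Weak declaration with no definition in any linked object: on ELF targets
// the linker resolves an unreferenced weak undefined symbol to address zero
// instead of reporting an undefined-symbol error.
extern "C" void example_itm_hook(void) __attribute__((weak));

// Safe runtime capability check: libstdc++ uses this shape to call into
// libitm only when libitm was actually loaded.
bool hook_available()
{
  return &example_itm_hook != nullptr;
}
```

Calling `example_itm_hook()` without the check would crash when the symbol is unresolved, which is why every such call site must be guarded.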
Re: [PATCH] libstdc++: Fix static_assert.
On Sun, 2016-01-17 at 17:03 +, Jonathan Wakely wrote: > On 17 January 2016 at 17:01, Jonathan Wakely wrote: > > On 16 January 2016 at 22:47, Torvald Riegel wrote: > >> This adds a missing string argument to a call to static_assert, thus not > >> making it depend on c++1z extensions. This fixes the build breakage on > >> mingw introduced in 232454. > >> > >> Tested on x86_64-linux. OK? > > > > OK. > > Actually please change the message, so it states the condition being > tested, not the negation. > > e.g. "Pointers are 32 bits or 64 bits wide" > > or state what must be true, "Pointers must be 32 bits or 64 bits wide" I committed the latter.
[PATCH][committed] libitm: Ensure proxy privatization safety.
This patch fixes privatization safety support by additionally ensuring proxy privatization safety. It fixes a correctness bug, but only in a simple way for now, which will lead to reduced scalability at higher thread counts. I plan to finish an optimization once I have enough time for this. Tested on x86_64-linux. Committed as r232469. * method-gl.cc (gl_wt_dispatch::trycommit): Ensure proxy privatization safety. * method-ml.cc (ml_wt_dispatch::trycommit): Likewise. * libitm/testsuite/libitm.c/priv-1.c: New. commit fded025d03371aa320eb7e0cb6bd3d772e387fdc Author: Torvald Riegel <trie...@redhat.com> Date: Fri Nov 27 22:59:07 2015 +0100 libitm: Ensure proxy privatization safety. * method-gl.cc (gl_wt_dispatch::trycommit): Ensure proxy privatization safety. * method-ml.cc (ml_wt_dispatch::trycommit): Likewise. * libitm/testsuite/libitm.c/priv-1.c: New. diff --git a/libitm/method-gl.cc b/libitm/method-gl.cc index b2e2bca..b51c802 100644 --- a/libitm/method-gl.cc +++ b/libitm/method-gl.cc @@ -291,12 +291,18 @@ public: // See begin_or_restart() for why we need release memory order here. v = gl_mg::clear_locked(v) + 1; o_gl_mg.orec.store(v, memory_order_release); - - // Need to ensure privatization safety. Every other transaction must - // have a snapshot time that is at least as high as our commit time - // (i.e., our commit must be visible to them). - priv_time = v; } + +// Need to ensure privatization safety. Every other transaction must have +// a snapshot time that is at least as high as our commit time (i.e., our +// commit must be visible to them). Because of proxy privatization, we +// must ensure that even if we are a read-only transaction. 
See +// ml_wt_dispatch::trycommit() for details: We can't get quite the same +// set of problems because we just use one orec and thus, for example, +// there cannot be concurrent writers -- but we can still get pending +// loads to privatized data when not ensuring privatization safety, which +// is problematic if the program unmaps the privatized memory. +priv_time = v; return true; } diff --git a/libitm/method-ml.cc b/libitm/method-ml.cc index 723643a..c1a6771 100644 --- a/libitm/method-ml.cc +++ b/libitm/method-ml.cc @@ -513,6 +513,21 @@ public: if (!tx->writelog.size()) { tx->readlog.clear(); +// We still need to ensure privatization safety, unfortunately. While +// we cannot have privatized anything by ourselves (because we are not +// an update transaction), we can have observed the commits of +// another update transaction that privatized something. Because any +// commit happens before ensuring privatization, our snapshot and +// commit can thus have happened before ensuring privatization safety +// for this commit/snapshot time. Therefore, before we can return to +// nontransactional code that might use the privatized data, we must +// ensure privatization safety for our snapshot time. +// This still seems to be better than not allowing use of the +// snapshot time before privatization safety has been ensured because +// we at least can run transactions such as this one, and in the +// meantime the transaction producing this commit time might have +// finished ensuring privatization safety for it. +priv_time = tx->shared_state.load(memory_order_relaxed); return true; } diff --git a/libitm/testsuite/libitm.c/priv-1.c b/libitm/testsuite/libitm.c/priv-1.c new file mode 100644 index 000..635d523 --- /dev/null +++ b/libitm/testsuite/libitm.c/priv-1.c @@ -0,0 +1,116 @@ +/* Quick stress test for proxy privatization. */ + +/* We need to use a TM method that has to enforce privatization safety + explicitly. 
*/ +/* { dg-set-target-env-var ITM_DEFAULT_METHOD "ml_wt" } */ + +#include +#include +#include + +/* Make them likely to be mapped to different orecs. */ +#define ALIGN __attribute__((aligned (256))) +/* Don't make these static to work around PR 68591. */ +int x ALIGN; +int *ptr ALIGN; +int *priv_ptr ALIGN; +int priv_value ALIGN; +int barrier ALIGN = 0; +const int iters = 100; + +static void arrive_and_wait (int expected_value) +{ + int now = __atomic_add_fetch (, 1, __ATOMIC_ACQ_REL); + while (now < expected_value) +__atomic_load (, , __ATOMIC_ACQUIRE); +} + +static void __attribute__((transaction_pure,noinline)) delay (int i) +{ + for (volatile int v = 0; v < i; v++); +} + +/* This tries to catch a case in which proxy privatization safety is not + ensured by privatization_user. Specifically, it's access to the value + of it's transactional snapshot of ptr must read from an uncommitted write + by writer; thus, writer must still be active but must have read ptr before + proxy can privatize *ptr by assig
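The privatization-safety rule the patch enforces — a committing transaction must not return to nontransactional code until every other thread's published snapshot time has reached its commit time — can be sketched independently of libitm's internals. All names below are illustrative, not libitm's real API; a value of ~0 stands for "not currently in a transaction".

```cpp
#include <atomic>
#include <vector>

// Each thread publishes the snapshot time its current transaction runs on.
struct txn_thread
{
  std::atomic<unsigned long> shared_state{~0ul};  // ~0 = not in a transaction
};

// After committing at commit_time, wait until no other thread still runs on
// an older snapshot; only then may nontransactional code safely reuse (or
// unmap) memory that this transaction privatized.
void ensure_privatization_safety(std::vector<txn_thread>& threads,
                                 unsigned long commit_time)
{
  for (auto& t : threads)
    while (t.shared_state.load(std::memory_order_acquire) < commit_time)
      ;  // spin; a real implementation would block or yield instead
}
```

The ml_wt change in the patch applies exactly this obligation to read-only transactions too: even though they privatize nothing themselves, their snapshot may have observed another transaction's privatizing commit.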
Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.
On Sat, 2016-01-16 at 10:57 +0100, Dominique d'Humières wrote: > > Addressed these, fixed a problem with using GLIBCXX_WEAK_DEFINITION > > (which is only set on Darwin despite the generic-sounding name -- so > > just use __attribute__((weak)) directly), and also updated > > testsuite_abi.cc so that it knows about CXXABI_1.3.10. > > > > Approved by Jonathan Wakely. Committed as r232454. > This breaks bootstrap on darwin, see > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69310. Can you (or anyone else with access to Darwin) let me know what Darwin actually supports that would be similar in effect to the alias we use right now? I suppose a weak alias would work as well because we define the C1 constructor anyway? If so, what's the proper macros / configury check to use for these? Is there any documentation for what darwin supports and needs in this space?
[PATCH] libstdc++: Fix static_assert.
This adds a missing string argument to a call to static_assert, thus not making it depend on c++1z extensions. This fixes the build breakage on mingw introduced in 232454. Tested on x86_64-linux. OK? commit 7659ab483954a15c8143f6b1b9d135159a2ecc67 Author: Torvald Riegel <trie...@redhat.com> Date: Sat Jan 16 23:40:26 2016 +0100 libstdc++: Fix static_assert. * src/c++11/cow-stdexcept.cc (txnal_read_ptr): Fix static_assert. diff --git a/libstdc++-v3/src/c++11/cow-stdexcept.cc b/libstdc++-v3/src/c++11/cow-stdexcept.cc index afc3f6c..375ab8b 100644 --- a/libstdc++-v3/src/c++11/cow-stdexcept.cc +++ b/libstdc++-v3/src/c++11/cow-stdexcept.cc @@ -278,8 +278,8 @@ _txnal_cow_string_C1_for_exceptions(void* that, const char* s, void *exc) static void* txnal_read_ptr(void* const * ptr) { static_assert(sizeof(uint64_t) == sizeof(void*) - || sizeof(uint32_t) == sizeof(void*)); - // FIXME make a true compile-time choice to prevent warnings. + || sizeof(uint32_t) == sizeof(void*), + "Pointers are neither 32 nor 64 bit wide"); #if __UINTPTR_MAX__ == __UINT64_MAX__ return (void*)_ITM_RU8((const uint64_t*)ptr); #else
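For context, the single-argument static_assert the patch removes is only valid from C++17 on; in the C++11/14 modes libstdc++ must build under, the message is mandatory. The sketch below shows the C++11-compatible form, plus one way a "true compile-time choice" (as the removed FIXME asks for) could look — the uint_for_size helper is illustrative, not the libstdc++ code.

```cpp
#include <cstddef>
#include <cstdint>

// C++11-compatible static_assert: the string message is required.
static_assert(sizeof(void*) == sizeof(uint64_t)
              || sizeof(void*) == sizeof(uint32_t),
              "Pointers must be 32 bits or 64 bits wide");

// A compile-time dispatch on pointer width, instead of testing
// __UINTPTR_MAX__ with the preprocessor as the current code does.
template <std::size_t N> struct uint_for_size;
template <> struct uint_for_size<4> { using type = uint32_t; };
template <> struct uint_for_size<8> { using type = uint64_t; };

using ptr_word = uint_for_size<sizeof(void*)>::type;
```

With the template approach, an unsupported pointer width fails to compile at the point of use, with no risk of the preprocessor and the static_assert disagreeing.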
Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.
On Sat, 2016-01-16 at 14:35 +0100, Jakub Jelinek wrote: > On Sat, Jan 16, 2016 at 07:47:33AM -0500, David Edelsohn wrote: > > stage1 libstdc++ builds just fine. the problem is stage2 configure > > fails due to missing ITM_xxx symbols when configure tries to compile > > and run conftest programs. > > On x86_64-linux, the _ITM_xxx symbols are undef weak ones and thus it is > fine to load libstdc++ without libitm and libstdc++ doesn't depend on > libitm. > > So, is AIX defining __GXX_WEAK__ or not? Perhaps some other macro or > configure check needs to be used to determine if undefined weak symbols > work the way libstdc++ needs them to. David, if you can tell me what AIX supports and whether it defines __GXX_WEAK__ with the semantics we assume here, I can see what a fix would be. As Jakub says, the point of everything below is to actually make it work when there's no TM support. Also, knowing the actual error that AIX fails with would be helpful. I have no access to AIX, so can't check. Thanks. > #if __GXX_WEAK__ > // Declare all libitm symbols we rely on, but make them weak so that we do > // not depend on libitm. > extern void* _ZGTtnaX (size_t sz) __attribute__((weak)); > extern void _ZGTtdlPv (void* ptr) __attribute__((weak)); > extern uint8_t _ITM_RU1(const uint8_t *p) > ITM_REGPARM __attribute__((weak)); > extern uint32_t _ITM_RU4(const uint32_t *p) > ITM_REGPARM __attribute__((weak)); > extern uint64_t _ITM_RU8(const uint64_t *p) > ITM_REGPARM __attribute__((weak)); > extern void _ITM_memcpyRtWn(void *, const void *, size_t) > ITM_REGPARM __attribute__((weak)); > extern void _ITM_memcpyRnWt(void *, const void *, size_t) > ITM_REGPARM __attribute__((weak)); > extern void _ITM_addUserCommitAction(void (*)(void *), uint64_t, void *) > ITM_REGPARM __attribute__((weak)); > > #else > // If there is no support for weak symbols, create dummies. The exceptions > // will not be declared transaction_safe in this case. 
> void* _ZGTtnaX (size_t) { return NULL; } > void _ZGTtdlPv (void*) { } > uint8_t _ITM_RU1(const uint8_t *) { return 0; } > uint32_t _ITM_RU4(const uint32_t *) { return 0; } > uint64_t _ITM_RU8(const uint64_t *) { return 0; } > void _ITM_memcpyRtWn(void *, const void *, size_t) { } > void _ITM_memcpyRnWt(void *, const void *, size_t) { } > void _ITM_addUserCommitAction(void (*)(void *), uint64_t, void *) { }; > #endif > > Jakub
Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.
On Thu, 2016-01-14 at 17:58 +, Jonathan Wakely wrote: > On 07/01/16 17:47 +0100, Torvald Riegel wrote: > >The attached patch makes some exceptions transaction-safe, as require by > >the Transactional Memory TS. I believe I addressed all feedback for the > >previous version of this patch (in particular, there are now more safety > >checks for preconditions for this implementation (eg, that the new > >allocator is used), all exceptions declared by the TM TS are now > >supported (with the exception of tx_exception -- should I add that in a > >follow-up patch?) > > Yes, that can be a separate patch, as it's adding a new type rather > than modifying the existing ones to add this TM magic. > > >Thus, the patch adds txnal clones as required. They are new exported > >symbols, but not visible to nontransactional code. The only changes to > >headers is transaction_safe[_dynamic] annotations where required by the > >TS, and a few friend declarations. The annotations are only enabled if > >a user compiles with -fgnu-tm. IOW, the changes are pretty much > >invisible when not using the TM TS. > > Thanks for adding all the clear comments as well. I'm sure we'll all > be grateful for those when we come to look back at the code. > > >There are also commented-out calls to _ITM_setAssociatedException in the > >code, which exist to show how we plan to support transaction > >cancellation through exceptions (which needs some more libitm support > >and bugfixes on the compiler side). > > > >Tested on x86_64-linux and x86-linux. > > > >OK? > > Yes, with some minor formatting changes noted below. Addressed these, fixed a problem with using GLIBCXX_WEAK_DEFINITION (which is only set on Darwin despite the generic-sounding name -- so just use __attribute__((weak)) directly), and also updated testsuite_abi.cc so that it knows about CXXABI_1.3.10. Approved by Jonathan Wakely. Committed as r232454. 
commit df44fb92fd161282cc6540053cd82177b7c02d51 Author: Torvald Riegel <trie...@redhat.com> Date: Fri Nov 13 01:00:52 2015 +0100 libstdc++: Make certain exceptions transaction_safe. diff --git a/libitm/testsuite/libitm.c++/libstdc++-safeexc.C b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C new file mode 100644 index 000..3e1655e --- /dev/null +++ b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C @@ -0,0 +1,89 @@ +// Tests that the exceptions declared by the TM TS (N4514) as transaction_safe +// are indeed that. Thus, this also tests the transactional clones in +// libstdc++ and libsupc++. + +// { dg-do run } + +#include +#include +#include +#include + +using namespace std; + +template <typename T> void thrower(const T& t) +{ + try +{ + atomic_commit + { + throw t; + } +} + catch (T ex) +{ + if (ex != t) abort (); +} +} + +template <typename T> void thrower1(const string& what) +{ + try +{ + atomic_commit + { + throw T (); + } +} + catch (T ex) +{ + if (what != ex.what()) abort (); +} +} + +template <typename T> void thrower2(const string& what) +{ + try +{ + atomic_commit + { + throw T (what); + } +} + catch (T ex) +{ + if (what != ex.what()) abort (); +} +} + + +int main () +{ + thrower (23); + thrower (23); + thrower (23); + thrower (23); + thrower (23); + thrower (23); + thrower (42); + thrower (42); + thrower (42); + thrower (42); + thrower (23.42); + thrower (23.42); + thrower (23.42); + thrower<void*> (0); + thrower<void**> (0); + thrower1 ("std::exception"); + thrower1 ("std::bad_exception"); + thrower2 ("test"); + thrower2 ("test"); + thrower2 ("test"); + thrower2 ("test"); + thrower2 ("test"); + thrower2 ("test"); + thrower2 ("test"); + thrower2 ("test"); + thrower2 ("test"); + return 0; } diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 index b76e8d5..1e25660 100644 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -2594,6 +2594,8 @@ AC_DEFUN([GLIBCXX_ENABLE_ALLOCATOR], [ ;; esac + GLIBCXX_CONDITIONAL(ENABLE_ALLOCATOR_NEW, + test 
$enable_libstdcxx_allocator_flag = new) AC_SUBST(ALLOCATOR_H) AC_SUBST(ALLOCATOR_NAME) ]) @@ -4344,6 +4346,34 @@ dnl AC_LANG_RESTORE ]) +dnl +dnl Check how size_t is mangled. Copied from libitm. +dnl +AC_DEFUN([GLIBCXX_CHECK_SIZE_T_MANGLING], [ + AC_CACHE_CHECK([how size_t is mangled], + glibcxx_cv_size_t_mangling, [ +AC_TRY_COMPILE([], [extern __SIZE_TYPE__ x; extern unsigned long x;], + [glibcxx_cv_size_t_mangling=m], [ + AC_TRY_COMPILE([], [extern __SIZE_TYPE__ x; extern unsigned int x;], + [glib
[PATCH][committed] libitm: Fix privatization safety interaction with serial mode.
This patch fixes the interaction of privatization safety and serial mode. Specifically, previously, serial mode could run concurrently with other threads still being in the quiescence-based implementation of ensuring privatization safety; this is a bug because serial mode threads can modify the list of transactions (which privatization safety iterates over) and can change the current method group (privatization safety interprets each transaction's shared_state in the context of the method group that the privatizing thread assumes to be still current). The fix delays making a transaction inactive until after it has ensured privatization safety. It becomes quasi-inactive before that (ie, it gets an always-most-recent snapshot) though. However, as a result of that, we now need to prevent deadlocks between transactions upgrading to serial mode and privatization safety: The upgrader must keep its snapshot most recent (or abort if it cannot) so that the still-active transactions that the upgrader waits for are not waiting in turn for the upgrader to update its snapshot. Tested on x86_64-linux. Committed as r232322. 2016-01-13 Torvald Riegel <trie...@redhat.com> * beginend.cc (gtm_thread::trycommit): Fix privatization safety. * config/linux/rwlock.cc (gtm_rwlock::write_lock_generic): Likewise. * config/posix/rwlock.cc (gtm_rwlock::write_lock_generic): Likewise. * dispatch.h (abi_dispatch::snapshot_most_recent): New. * method-gl.cc (gl_wt_dispatch::snapshot_most_recent): New. * method-ml.cc (ml_wt_dispatch::snapshot_most_recent): New. * method-serial.cc (serial_dispatch::snapshot_most_recent): New. (serialirr_dispatch::snapshot_most_recent): New. (serialirr_onwrite_dispatch::snapshot_most_recent): New. commit 25e850205e37009a76fb37b52f9ce61ab1616995 Author: Torvald Riegel <trie...@redhat.com> Date: Wed Jan 13 13:35:09 2016 +0100 libitm: Fix privatization safety interaction with serial mode. 
diff --git a/libitm/beginend.cc b/libitm/beginend.cc index c801dab..85fb4f1 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -568,8 +568,9 @@ GTM::gtm_thread::trycommit () gtm_word priv_time = 0; if (abi_disp()->trycommit (priv_time)) { - // The transaction is now inactive. Everything that we still have to do - // will not synchronize with other transactions anymore. + // The transaction is now finished but we will still access some shared + // data if we have to ensure privatization safety. + bool do_read_unlock = false; if (state & gtm_thread::STATE_SERIAL) { gtm_thread::serial_lock.write_unlock (); @@ -578,7 +579,27 @@ GTM::gtm_thread::trycommit () priv_time = 0; } else - gtm_thread::serial_lock.read_unlock (this); + { + // If we have to ensure privatization safety, we must not yet + // release the read lock and become inactive because (1) we still + // have to go through the list of all transactions, which can be + // modified by serial mode threads, and (2) we interpret each + // transactions' shared_state in the context of what we believe to + // be the current method group (and serial mode transactions can + // change the method group). Therefore, if we have to ensure + // privatization safety, delay becoming inactive but set a maximum + // snapshot time (we have committed and thus have an empty snapshot, + // so it will always be most recent). Use release MO so that this + // synchronizes with other threads observing our snapshot time. + if (priv_time) + { + do_read_unlock = true; + shared_state.store((~(typeof gtm_thread::shared_state)0) - 1, + memory_order_release); + } + else + gtm_thread::serial_lock.read_unlock (this); + } state = 0; // We can commit the undo log after dispatch-specific commit and after @@ -618,8 +639,11 @@ GTM::gtm_thread::trycommit () } } - // After ensuring privatization safety, we execute potentially - // privatizing actions (e.g., calling free()). User actions are first. 
+ // After ensuring privatization safety, we are now truly inactive and + // thus can release the read lock. We will also execute potentially + // privatizing actions (e.g., calling free()). User actions are first. + if (do_read_unlock) + gtm_thread::serial_lock.read_unlock (this); commit_user_actions (); commit_allocations (false, 0); diff --git a/libitm/config/linux/rwlock.cc b/libitm/config/linux/rwlock.cc index 46a775f..381a553 100644 --- a/libitm/config/linux/rwlock.cc +++ b/libitm/config/linux/rwlock.cc @@ -158,6 +158,19 @@ gtm_rwlock::write_lock_generic (gtm_thread *tx) while (it->shared_state.load (memory_order_relaxed) != ~(typeof it->shared_state)0) { + // If this is an upgrade, we have to break deadlocks with + // privatization safety. This may fail on our side, in
[PATCH][committed] libitm: Fix seq-cst MOs/fences in rwlock.
This fixes a few cases of memory_order_seq_cst atomic accesses that should have been memory_order_seq_cst fences instead. I think it's likely that it has worked so far nonetheless because the code we currently end up with is likely similar -- but this could change with more aggressive optimizations of concurrent code. Tested on x86_64-linux. Committed as r232353. 2016-01-13 Torvald Riegel <trie...@redhat.com> * beginend.cc (gtm_thread::trycommit): Fix seq_cst fences. * config/linux/rwlock.cc (gtm_rwlock::write_lock_generic): Likewise. (gtm_rwlock::write_unlock): Likewise. commit beef84ff7f719a1c6f498edb273be92185a38c26 Author: Torvald Riegel <trie...@redhat.com> Date: Wed Jan 13 21:36:48 2016 +0100 libitm: Fix seq-cst MOs/fences in rwlock. diff --git a/libitm/beginend.cc b/libitm/beginend.cc index 85fb4f1..00d28f4 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -619,8 +619,10 @@ GTM::gtm_thread::trycommit () // acquisitions). This ensures that if we read prior to other // reader transactions setting their shared_state to 0, then those // readers will observe our updates. We can reuse the seq_cst fence - // in serial_lock.read_unlock() however, so we don't need another - // one here. + // in serial_lock.read_unlock() if we performed that; if not, we + // issue the fence. + if (do_read_unlock) + atomic_thread_fence (memory_order_seq_cst); // TODO Don't just spin but also block using cond vars / futexes // here. Should probably be integrated with the serial lock code. for (gtm_thread *it = gtm_thread::list_of_threads; it != 0; diff --git a/libitm/config/linux/rwlock.cc b/libitm/config/linux/rwlock.cc index 381a553..b8d31ea 100644 --- a/libitm/config/linux/rwlock.cc +++ b/libitm/config/linux/rwlock.cc @@ -122,9 +122,10 @@ gtm_rwlock::read_lock (gtm_thread *tx) bool gtm_rwlock::write_lock_generic (gtm_thread *tx) { - // Try to acquire the write lock. + // Try to acquire the write lock. Relaxed MO is fine because of the + // additional fence below. 
int w = 0; - if (unlikely (!writers.compare_exchange_strong (w, 1))) + if (unlikely (!writers.compare_exchange_strong (w, 1, memory_order_relaxed))) { // If this is an upgrade, we must not wait for other writers or // upgrades. @@ -135,18 +136,20 @@ gtm_rwlock::write_lock_generic (gtm_thread *tx) // switch to contended mode. We need seq_cst memory order to make the // Dekker-style synchronization work. if (w != 2) - w = writers.exchange (2); + w = writers.exchange (2, memory_order_relaxed); while (w != 0) { futex_wait(&writers, 2); - w = writers.exchange (2); + w = writers.exchange (2, memory_order_relaxed); } } + // This fence is both required for the Dekker-like synchronization we do + // here and is the acquire MO required to make us synchronize-with prior + // writers. + atomic_thread_fence (memory_order_seq_cst); // We have acquired the writer side of the R/W lock. Now wait for any // readers that might still be active. - // We don't need an extra barrier here because the CAS and the xchg - // operations have full barrier semantics already. // TODO In the worst case, this requires one wait/wake pair for each // active reader. Reduce this! for (gtm_thread *it = gtm_thread::list_of_threads; it != 0; @@ -259,28 +262,24 @@ gtm_rwlock::read_unlock (gtm_thread *tx) void gtm_rwlock::write_unlock () { - // This needs to have seq_cst memory order. - if (writers.fetch_sub (1) == 2) + // Release MO so that we synchronize with subsequent writers. + if (writers.exchange (0, memory_order_release) == 2) { - // There might be waiting writers, so wake them. - writers.store (0, memory_order_relaxed); - if (futex_wake(&writers, 1) == 0) - { - // If we did not wake any waiting writers, we might indeed be the - // last writer (this can happen because write_lock_generic() - // exchanges 0 or 1 to 2 and thus might go to contended mode even if - // no other thread holds the write lock currently). Therefore, we - // have to wake up readers here as well. 
Execute a barrier after - the previous relaxed reset of writers (Dekker-style), and fall - through to the normal reader wake-up code. - atomic_thread_fence (memory_order_seq_cst); - } - else + // There might be waiting writers, so wake them. If we woke any thread, + // we assume it to indeed be a writer; waiting writers will never give + // up, so we can assume that they will take care of anything else such + // as waking readers. + if (futex_wake(&writers, 1) > 0) return; + // If we did not wake any waiting writers, we might indeed be the last + // writer (this can happen because write_lock_generic() exchanges 0 or 1 + // to 2 and thus might go to contended mode even if no other thread + // holds the write
[PATCH][committed] libitm: Remove dead code and data.
This removes code and data members that have not been used for quite a while now. The user-visible benefit is 8MB less space overhead if libitm is used. Tested on x86_64-linux and committed as r232275. 2016-01-12 Torvald Riegel <trie...@redhat.com> * libitm_i.h (gtm_mask_stack): Remove. * beginend.cc (gtm_stmlock_array, gtm_clock): Likewise. * stmlock.h: Remove file. * config/alpha/cacheline.h: Likewise. * config/generic/cacheline.h: Likewise. * config/powerpc/cacheline.h: Likewise. * config/sparc/cacheline.h: Likewise. * config/x86/cacheline.h: Likewise. commit fe0abed5782347922d4f9dba13b9a917fe9d5296 Author: Torvald Riegel <trie...@redhat.com> Date: Mon Jan 11 19:30:14 2016 +0100 libitm: Remove dead code and data. diff --git a/libitm/beginend.cc b/libitm/beginend.cc index 367edc8..c801dab 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -36,9 +36,6 @@ gtm_rwlock GTM::gtm_thread::serial_lock; gtm_thread *GTM::gtm_thread::list_of_threads = 0; unsigned GTM::gtm_thread::number_of_threads = 0; -gtm_stmlock GTM::gtm_stmlock_array[LOCK_ARRAY_SIZE]; -atomic GTM::gtm_clock; - /* ??? Move elsewhere when we figure out library initialization. */ uint64_t GTM::gtm_spin_count_var = 1000; diff --git a/libitm/config/alpha/cacheline.h b/libitm/config/alpha/cacheline.h deleted file mode 100644 index c8da46d..000 --- a/libitm/config/alpha/cacheline.h +++ /dev/null @@ -1,38 +0,0 @@ -/* Copyright (C) 2009-2016 Free Software Foundation, Inc. - Contributed by Richard Henderson <r...@redhat.com>. - - This file is part of the GNU Transactional Memory Library (libitm). - - Libitm is free software; you can redistribute it and/or modify it - under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 3 of the License, or - (at your option) any later version. 
- - Libitm is distributed in the hope that it will be useful, but WITHOUT ANY - WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS - FOR A PARTICULAR PURPOSE. See the GNU General Public License for - more details. - - Under Section 7 of GPL version 3, you are granted additional - permissions described in the GCC Runtime Library Exception, version - 3.1, as published by the Free Software Foundation. - - You should have received a copy of the GNU General Public License and - a copy of the GCC Runtime Library Exception along with this program; - see the files COPYING3 and COPYING.RUNTIME respectively. If not, see - <http://www.gnu.org/licenses/>. */ - -#ifndef LIBITM_ALPHA_CACHELINE_H -#define LIBITM_ALPHA_CACHELINE_H 1 - -// A cacheline is the smallest unit with which locks are associated. -// The current implementation of the _ITM_[RW] barriers assumes that -// all data types can fit (aligned) within a cachline, which means -// in practice sizeof(complex long double) is the smallest cacheline size. -// It ought to be small enough for efficient manipulation of the -// modification mask, below. -#define CACHELINE_SIZE 64 - -#include "config/generic/cacheline.h" - -#endif // LIBITM_ALPHA_CACHELINE_H diff --git a/libitm/config/generic/cacheline.h b/libitm/config/generic/cacheline.h deleted file mode 100644 index 8b9f927..000 --- a/libitm/config/generic/cacheline.h +++ /dev/null @@ -1,58 +0,0 @@ -/* Copyright (C) 2009-2016 Free Software Foundation, Inc. - Contributed by Richard Henderson <r...@redhat.com>. - - This file is part of the GNU Transactional Memory Library (libitm). - - Libitm is free software; you can redistribute it and/or modify it - under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 3 of the License, or - (at your option) any later version. 
- - Libitm is distributed in the hope that it will be useful, but WITHOUT ANY - WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS - FOR A PARTICULAR PURPOSE. See the GNU General Public License for - more details. - - Under Section 7 of GPL version 3, you are granted additional - permissions described in the GCC Runtime Library Exception, version - 3.1, as published by the Free Software Foundation. - - You should have received a copy of the GNU General Public License and - a copy of the GCC Runtime Library Exception along with this program; - see the files COPYING3 and COPYING.RUNTIME respectively. If not, see - <http://www.gnu.org/licenses/>. */ - -#ifndef LIBITM_CACHELINE_H -#define LIBITM_CACHELINE_H 1 - -namespace GTM HIDDEN { - -// A cacheline is the smallest unit with which locks are associated. -// The current implementation of the _ITM_[RW] barriers assumes that -// all data types can fit (aligned) within a cachline, which means -// in practice sizeof(complex long double) is the smallest cacheline size. -// It ought to be small enough for
[PATCH] libitm: Fix HTM fastpath.
This patch fixes a thinko in the HTM fastpath implementation. In a nutshell, we also need to monitor the HTM fastpath control (ie, htm_fastpath) variable from within a HW transaction on the HTM fastpath, so that such a HW transaction will only execute if the HTM fastpath is still enabled. We move htm_fastpath into the serial lock so that a HW transaction only needs one cacheline of HTM capacity to monitor both htm_fastpath and check that no non-HW-transaction is currently running. Tested on x86_64-linux. 2016-01-08 Torvald Riegel <trie...@redhat.com> * beginend.cc (GTM::gtm_thread::serial_lock): Put on cacheline boundary. (htm_fastpath): Remove. (gtm_thread::begin_transaction): Fix HTM fastpath. (_ITM_commitTransaction): Adapt. (_ITM_commitTransactionEH): Adapt. * libitm/config/linux/rwlock.h (gtm_rwlock): Add htm_fastpath member and accessors. * libitm/config/posix/rwlock.h (gtm_rwlock): Likewise. * libitm/config/posix/rwlock.cc (gtm_rwlock::gtm_rwlock): Adapt. * libitm/config/x86/sjlj.S (_ITM_beginTransaction): Fix HTM fastpath. * libitm/libitm_i.h (htm_fastpath): Remove declaration. * libitm/method-serial.cc (htm_mg): Adapt. (gtm_thread::serialirr_mode): Adapt. * libitm/query.cc (_ITM_inTransaction, _ITM_getTransactionId): Adapt. commit bfa6b5c51e110ca38c0ce5aa7234e7b719eee5a6 Author: Torvald Riegel <trie...@redhat.com> Date: Wed Jan 6 15:34:57 2016 +0100 libitm: Fix HTM fastpath. diff --git a/libitm/beginend.cc b/libitm/beginend.cc index 367edc8..015704b 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -32,7 +32,11 @@ using namespace GTM; extern __thread gtm_thread_tls _gtm_thr_tls; #endif -gtm_rwlock GTM::gtm_thread::serial_lock; +// Put this at the start of a cacheline so that serial_lock's writers and +// htm_fastpath fields are on the same cacheline, so that HW transactions +// only have to pay one cacheline capacity to monitor both. 
+gtm_rwlock GTM::gtm_thread::serial_lock + __attribute__((aligned(HW_CACHELINE_SIZE))); gtm_thread *GTM::gtm_thread::list_of_threads = 0; unsigned GTM::gtm_thread::number_of_threads = 0; @@ -54,9 +58,6 @@ static pthread_mutex_t global_tid_lock = PTHREAD_MUTEX_INITIALIZER; static pthread_key_t thr_release_key; static pthread_once_t thr_release_once = PTHREAD_ONCE_INIT; -// See gtm_thread::begin_transaction. -uint32_t GTM::htm_fastpath = 0; - /* Allocate a transaction structure. */ void * GTM::gtm_thread::operator new (size_t s) @@ -176,9 +177,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) // lock's writer flag and thus abort if another thread is or becomes a // serial transaction. Therefore, if the fastpath is enabled, then a // transaction is not executing as a HW transaction iff the serial lock is - // write-locked. This allows us to use htm_fastpath and the serial lock's - // writer flag to reliably determine whether the current thread runs a HW - // transaction, and thus we do not need to maintain this information in + // write-locked. Also, HW transactions monitor the fastpath control + // variable, so that they will only execute if dispatch_htm is still the + // current method group. This allows us to use htm_fastpath and the serial + // lock's writers flag to reliably determine whether the current thread runs + // a HW transaction, and thus we do not need to maintain this information in // per-thread state. // If an uninstrumented code path is not available, we can still run // instrumented code from a HW transaction because the HTM fastpath kicks @@ -190,9 +193,14 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) // for any internal changes (e.g., they never abort visibly to the STM code // and thus do not trigger the standard retry handling). 
#ifndef HTM_CUSTOM_FASTPATH - if (likely(htm_fastpath && (prop & pr_hasNoAbort))) + if (likely(serial_lock.get_htm_fastpath() && (prop & pr_hasNoAbort))) { - for (uint32_t t = htm_fastpath; t; t--) + // Note that the snapshot of htm_fastpath that we take here could be + // outdated, and a different method group than dispatch_htm may have + // been chosen in the meantime. Therefore, take care not to touch + // anything besides the serial lock, which is independent of method + // groups. + for (uint32_t t = serial_lock.get_htm_fastpath(); t; t--) { uint32_t ret = htm_begin(); if (htm_begin_success(ret)) @@ -200,9 +208,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) // We are executing a transaction now. // Monitor the writer flag in the serial-mode lock, and abort // if there is an active or waiting serial-mode transaction. + // Also checks that htm_fastpath is still nonzero and thus + // HW transactions
[PATCH v2] libstdc++: Make certain exceptions transaction_safe.
The attached patch makes some exceptions transaction-safe, as required by the Transactional Memory TS. I believe I addressed all feedback for the previous version of this patch (in particular, there are now more safety checks for preconditions for this implementation (eg, that the new allocator is used), all exceptions declared by the TM TS are now supported (with the exception of tx_exception -- should I add that in a follow-up patch?), and there is a test for the new functionality (as part of libitm's testsuite)). There are two things that complicate such support. First, it seems better to not rely on compiling libstdc++ code with -fgnu-tm for now (or at least we tried to avoid this so far). Therefore, the transactional clones in this patch are all manually instrumented (ie, the functions are C functions with names matching the mangled names of the respective C++ functions, and the _ZGTt prefix signaling that they are txnal clones). Second, exceptions still use a COW string internally, which cannot be made transaction-safe with just compiler support because of the reference counting implementation inside of COW strings, which uses atomics. One would need something custom for that nonetheless. Thus, the patch adds txnal clones as required. They are new exported symbols, but not visible to nontransactional code. The only changes to headers are transaction_safe[_dynamic] annotations where required by the TS, and a few friend declarations. The annotations are only enabled if a user compiles with -fgnu-tm. IOW, the changes are pretty much invisible when not using the TM TS. There are also commented-out calls to _ITM_setAssociatedException in the code, which exist to show how we plan to support transaction cancellation through exceptions (which needs some more libitm support and bugfixes on the compiler side). Tested on x86_64-linux and x86-linux. OK? 2016-01-07 Torvald Riegel <trie...@redhat.com> * include/bits/basic_string.h (basic_string): Declare friends. 
* include/bits/c++config (_GLIBCXX_TXN_SAFE, _GLIBCXX_TXN_SAFE_DYN, _GLIBCXX_USE_ALLOCATOR_NEW): New. * include/std/stdexcept (logic_error, domain_error, invalid_argument, length_error, out_of_range, runtime_error, range_error, underflow_error, overflow_error): Declare members as transaction-safe. (logic_error, runtime_error): Declare friend functions. * libsupc++/exception (exception, bad_exception): Declare members as transaction-safe. * src/c++11/cow-stdexcept.cc: Define transactional clones for the transaction-safe members of exceptions and helper functions. * libsupc++/eh_exception.cc: Adjust and define transactional clones. * config/abi/pre/gnu.ver (GLIBCXX_3.4.22) Add transactional clones. (CXXABI_1.3.10): New. * acinclude.m4 (GLIBCXX_CHECK_SIZE_T_MANGLING): New. (GLIBCXX_ENABLE_ALLOCATOR): Set ENABLE_ALLOCATOR_NEW. * configure.ac: Call GLIBCXX_CHECK_SIZE_T_MANGLING. * include/Makefile.am: Write ENABLE_ALLOCATOR_NEW to c++config.h. * include/Makefile.in: Regenerate. * config.h.in: Regenerate. * configure: Regenerate. commit d4eda24eca5bf5d8d94f1d56762a9efd50294750 Author: Torvald Riegel <trie...@redhat.com> Date: Fri Nov 13 01:00:52 2015 +0100 libstdc++: Make certain exceptions transaction_safe. diff --git a/libitm/testsuite/libitm.c++/libstdc++-safeexc.C b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C new file mode 100644 index 000..3e1655e --- /dev/null +++ b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C @@ -0,0 +1,89 @@ +// Tests that the exceptions declared by the TM TS (N4514) as transaction_safe +// are indeed that. Thus, this also tests the transactional clones in +// libstdc++ and libsupc++. 
+ +// { dg-do run } + +#include +#include +#include +#include + +using namespace std; + +template <typename T> void thrower(const T& t) +{ + try +{ + atomic_commit + { + throw t; + } +} + catch (T ex) +{ + if (ex != t) abort (); +} +} + +template <typename T> void thrower1(const string& what) +{ + try +{ + atomic_commit + { + throw T (); + } +} + catch (T ex) +{ + if (what != ex.what()) abort (); +} +} + +template <typename T> void thrower2(const string& what) +{ + try +{ + atomic_commit + { + throw T (what); + } +} + catch (T ex) +{ + if (what != ex.what()) abort (); +} +} + + +int main () +{ + thrower (23); + thrower (23); + thrower (23); + thrower (23); + thrower (23); + thrower (23); + thrower (42); + thrower (42); + thrower (42); + thrower (42); + thrower (23.42); + thrower (23.42); + thrower (23.42); + thrower<void*> (0); + thrower<void**> (0); + thrower1 ("std::exception"); + thrower1 ("std::bad_exception"); + thrower2 ("test"); + thrower2 ("test"); + thrower2 ("test")
Re: [PATCH][WIP] libstdc++: Make certain exceptions transaction_safe.
On Wed, 2015-12-16 at 21:05 +, Jonathan Wakely wrote: > Sorry for the delay finishing this review, some of the code kept > melting my brain ;-) I know what you mean :) Thanks for the review! > On 14/11/15 20:45 +0100, Torvald Riegel wrote: > >diff --git a/libstdc++-v3/config/abi/pre/gnu.ver > >b/libstdc++-v3/config/abi/pre/gnu.ver > >index 1b3184a..d902b03 100644 > >--- a/libstdc++-v3/config/abi/pre/gnu.ver > >+++ b/libstdc++-v3/config/abi/pre/gnu.ver > >@@ -1876,6 +1876,12 @@ GLIBCXX_3.4.22 { > > _ZNSt6thread6_StateD[012]Ev; > > > > _ZNSt6thread15_M_start_threadESt10unique_ptrINS_6_StateESt14default_deleteIS1_EEPFvvE; > > > >+# Support for the Transactional Memory TS (N4514) > >+_ZGTtNSt11logic_errorC1EPKc; > >+ > >_ZGTtNSt11logic_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE; > >+_ZGTtNKSt11logic_error4whatEv; > >+_ZGTtNSt11logic_errorD1Ev; > >+ > > } GLIBCXX_3.4.21; > > This is OK but ... > > > # Symbols in the support library (libsupc++) have their own tag. > >@@ -2107,6 +2113,12 @@ CXXABI_1.3.9 { > > # operator delete[](void*, std::size_t) > > _ZdaPv[jmy]; > > > >+# Support for the Transactional Memory TS (N4514) > >+_ZGTtNKSt9exceptionD1Ev; > >+_ZGTtNKSt9exception4whatEv; > >+_ZGTtNKSt13bad_exceptionD1Ev; > >+_ZGTtNKSt13bad_exception4whatEv; > >+ > > } CXXABI_1.3.8; > > That symbol version was already used in the gcc-5 release and so is > frozen, you'll need CXXABI_1.3.10 for these new symbols (similar to > https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00258.html so if > Catherine's already added that version you can just add them there). Fixed. 
> >diff --git a/libstdc++-v3/include/bits/c++config > >b/libstdc++-v3/include/bits/c++config > >index 723feb1..0e66bb0 100644 > >--- a/libstdc++-v3/include/bits/c++config > >+++ b/libstdc++-v3/include/bits/c++config > >@@ -481,6 +481,17 @@ namespace std > > # define _GLIBCXX_BEGIN_EXTERN_C extern "C" { > > # define _GLIBCXX_END_EXTERN_C } > > > >+// Conditionally enable annotations for the Transactional Memory TS on > >C++11. > >+#if __cplusplus >= 201103L && \ > >+ _GLIBCXX_USE_CXX11_ABI && _GLIBCXX_USE_DUAL_ABI && \ > > It's possible we can make this work for the old ABI too, but this is > OK for now. The old ABI always uses COW strings, but that's what the > code you've written deals with anyway. OK. I would have to write the wrappers for the new strings too then, and I wanted to avoid that for now. > >+ defined(__cpp_transactional_memory) && __cpp_transactional_memory >= > >201505L > > The defined(__cpp_transactional_memory) check is redundant, isn't it? > > Users aren't allowed to define it, so it will either be defined to an > integer value or undefined and evaluate to zero. Won't we get an undefined warning when using -Wundef if we remove the first part? That's what I wanted to avoid. > >+#define _GLIBCXX_TXN_SAFE transaction_safe > >+#define _GLIBCXX_TXN_SAFE_DYN transaction_safe_dynamic > >+#else > >+#define _GLIBCXX_TXN_SAFE > >+#define _GLIBCXX_TXN_SAFE_DYN > >+#endif > >+ > > #else // !__cplusplus > > # define _GLIBCXX_BEGIN_EXTERN_C > > # define _GLIBCXX_END_EXTERN_C > > > >@@ -44,7 +46,36 @@ std::exception::what() const _GLIBCXX_USE_NOEXCEPT > > } > > > > const char* > >-std::bad_exception::what() const _GLIBCXX_USE_NOEXCEPT > >+std::bad_exception::what() const _GLIBCXX_TXN_SAFE_DYN _GLIBCXX_USE_NOEXCEPT > > { > > return "std::bad_exception"; > > } > >+ > >+// Transactional clones for the destructors and what(). 
> >+// what() is effectively transaction_pure, but we do not want to annotate it > >+// as such; thus, we call exactly the respective nontransactional function. > >+extern "C" { > >+ > >+void > >+_ZGTtNKSt9exceptionD1Ev(const std::exception*) > >+{ } > >+ > >+const char* > >+_ZGTtNKSt9exception4whatEv(const std::exception* that) > >+{ > >+ return that->std::exception::what(); > >+} > > This makes a non-virtual call, is that correct? > > If users derive from std::exception and override what() they will > expect a call to what() to dispatch to their override in the derived > class, but IIUC in a transactional block they would call this > function, which would call the base what(), not their override. Yes. I added this comment to _ZGTtNKSt9exception4whatEv and also referenced it from a comment in _ZGTt
Re: [patch] libstdc++/68921 add timeout argument to futex(2)
On Tue, 2015-12-15 at 18:46 +, Jonathan Wakely wrote: > This fixes a missing argument to the futex syscall. > > Tested powerpc64le-linux. This needs to be fixed for gcc-5 and trunk. > OK. Thanks!
[COMMITTED] libitm: Use multiplicative hashing in the multi-lock TM method.
This changes the hash function used in the multi-lock TM method for mapping the address of a transactional access by an application to synchronization metadata. The previous hash function was very simple and just used a range of bits from the address; the new function uses multiplicative hashing (see the code comments for more background). The previous function was fast, but suffered from cases of pathological performance degradation because of power-of-two aliasing (eg, allocator arenas); to get okay performance, one had to use a substantial amount of synchronization metadata. The new hash function uses two 32b multiplications (and we might get rid of one of them by specializing the code more), but this doesn't seem to affect single-thread overheads much at least on mainstream x86. It makes the TM method scale better by suffering from fewer false conflicts, while still using 8x less memory for the synchronization metadata (512KB currently on 64b, 256KB on 32b). The space overhead can be further reduced if we think that's important (e.g., 32KB/16KB resulted in 10% less performance in prior tests). Tested on x86_64-linux with the libitm tests, and a quick custom test. In-depth performance testing in another implementation similar to libitm has been done before, see the URL cited in the code's comments. commit 7c6d5c7221b85fd82bcb8c59c90ae39b14883b98 Author: Torvald Riegel <trie...@redhat.com> Date: Thu Nov 26 16:52:04 2015 +0100 libitm: Use multiplicative hashing in the multi-lock TM method. * method-ml.cc (ml_mg): Use multiplicative instead of simple hashing. (ml_wt_dispatch::pre_write): Adapt. (ml_wt_dispatch::pre_load): Likewise. diff --git a/libitm/method-ml.cc b/libitm/method-ml.cc index 37cb08b..b4cabc8 100644 --- a/libitm/method-ml.cc +++ b/libitm/method-ml.cc @@ -69,23 +69,51 @@ struct ml_mg : public method_group atomic<gtm_word>* orecs __attribute__((aligned(HW_CACHELINE_SIZE))); char tailpadding[HW_CACHELINE_SIZE - sizeof(atomic<gtm_word>*)]; - // Location-to-orec mapping.
Stripes of 16B mapped to 2^19 orecs. - static const gtm_word L2O_ORECS = 1 << 19; - static const gtm_word L2O_SHIFT = 4; - static size_t get_orec(const void* addr) + // Location-to-orec mapping. Stripes of 32B mapped to 2^16 orecs using + // multiplicative hashing. See Section 5.2.2 of Torvald Riegel's PhD thesis + // for the background on this choice of hash function and parameters: + // http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-115596 + // We pick the Mult32 hash because it works well with fewer orecs (i.e., + // less space overhead and just 32b multiplication). + // We may want to check and potentially change these settings once we get + // better or just more benchmarks. + static const gtm_word L2O_ORECS_BITS = 16; + static const gtm_word L2O_ORECS = 1 << L2O_ORECS_BITS; + // An iterator over the orecs covering the region [addr,addr+len). + struct orec_iterator { -return ((uintptr_t)addr >> L2O_SHIFT) & (L2O_ORECS - 1); - } - static size_t get_next_orec(size_t orec) - { -return (orec + 1) & (L2O_ORECS - 1); - } - // Returns the next orec after the region. - static size_t get_orec_end(const void* addr, size_t len) - { -return (((uintptr_t)addr + len + (1 << L2O_SHIFT) - 1) >> L2O_SHIFT) -& (L2O_ORECS - 1); - } +static const gtm_word L2O_SHIFT = 5; +static const uint32_t L2O_MULT32 = 81007; +uint32_t mult; +size_t orec; +size_t orec_end; +orec_iterator (const void* addr, size_t len) +{ + uint32_t a = (uintptr_t) addr >> L2O_SHIFT; + uint32_t ae = ((uintptr_t) addr + len + (1 << L2O_SHIFT) - 1) + >> L2O_SHIFT; + mult = a * L2O_MULT32; + orec = mult >> (32 - L2O_ORECS_BITS); + // We can't really avoid this second multiplication unless we use a + // branch instead or know more about the alignment of addr. (We often + // know len at compile time because of instantiations of functions + // such as _ITM_RU* for accesses of specific lengths. 
+ orec_end = (ae * L2O_MULT32) >> (32 - L2O_ORECS_BITS); +} +size_t get() { return orec; } +void advance() +{ + // We cannot simply increment orec because L2O_MULT32 is larger than + // 1 << (32 - L2O_ORECS_BITS), and thus an increase of the stripe (i.e., + // addr >> L2O_SHIFT) could increase the resulting orec index by more + // than one; with the current parameters, we would roughly acquire a + // fourth more orecs than necessary for regions covering more than orec. + // Keeping mult around as extra state shouldn't matter much. + mult += L2O_MULT32; + orec = mult >> (32 - L2O_ORECS_BITS); +} +bool reached_end() { return orec == orec_end; } + }; virtual void init() { @@ -142,14 +170,13 @@ protected: gtm_word locked_by_tx = ml_mg::set_locked(tx); // Lock all ore
[COMMITTED] libitm: Fix recent changes to allocations log.
This fixes an oversight in the recent changes to the allocations log for supporting sized delete. Tested libitm on x86_64-linux. Committed as obvious. libitm/ * libitm_i.h (gtm_alloc_action): Remove union. * testsuite/libitm.c/alloc-1.c: New. commit 74c5fd924fe3e8d6bececce209d00bf0523bb4dc Author: Torvald Riegel <trie...@redhat.com> Date: Sun Nov 22 21:54:24 2015 +0100 libitm: Fix recent changes to allocations log. libitm/ * libitm_i.h (gtm_alloc_action): Remove union. * testsuite/libitm.c/alloc-1.c: New. diff --git a/libitm/libitm_i.h b/libitm/libitm_i.h index f01a1ab..4b72348 100644 --- a/libitm/libitm_i.h +++ b/libitm/libitm_i.h @@ -106,12 +106,10 @@ namespace GTM HIDDEN { // the template used inside gtm_thread can instantiate. struct gtm_alloc_action { - // Iff free_fn_sz is nonzero, it must be used instead of free_fn. - union - { -void (*free_fn)(void *); -void (*free_fn_sz)(void *, size_t); - }; + // Iff free_fn_sz is nonzero, it must be used instead of free_fn, and vice + // versa. + void (*free_fn)(void *); + void (*free_fn_sz)(void *, size_t); size_t sz; // If true, this is an allocation; we discard the log entry on outermost // commit, and deallocate on abort. If false, this is a deallocation and diff --git a/libitm/testsuite/libitm.c/alloc-1.c b/libitm/testsuite/libitm.c/alloc-1.c new file mode 100644 index 000..49faab5 --- /dev/null +++ b/libitm/testsuite/libitm.c/alloc-1.c @@ -0,0 +1,17 @@ +// Test that rolling back allocations works. +#include <stdlib.h> + +void __attribute((transaction_pure,noinline)) dont_optimize(void* p) +{ + *(volatile char *) p; +} + +int main() +{ + __transaction_atomic { +void *p = malloc (23); +dont_optimize (p); +__transaction_cancel; + } + return 0; +}
Re: [PATCH] Transactional Memory: Support __cxa_free_exception and fix exception handling.
On Thu, 2015-11-19 at 11:18 -0600, Peter Bergner wrote: > On Thu, 2015-11-19 at 09:35 -0600, Torvald Riegel wrote: > > Tested using the libitm testsuite on x86_64-linux. > > Tested on powerpc64le-linux with no regressions and I confirmed > the new eh-5.C test case passes. Thanks. Then I'll commit this if I hear no objections from rth or Jason.
[PATCH] Transactional Memory: Support __cxa_free_exception and fix exception handling.
The EH scheme that we had been using for TM / libitm doesn't work properly. We fail to handle throwing exceptions whose constructors may throw themselves. We also do not clean up properly in all situations when a transaction aborts while in the process of throwing an exception. This patch solves this particular problem by adding a transactional wrapper for __cxa_free_exception and changing the EH scheme in libitm. Follow-up patches will fix other issues that we have identified. Some of the changes to the libitm.texi ABI docs added in this patch already take this future work into account. Tested using the libitm testsuite on x86_64-linux. Are the gcc/ bits OK? gcc/cp/ * except.c (do_free_exception): Use transactional wrapper. libitm/ * testsuite/libitm.c++/eh-5.C: New. * libitm.h (_ITM_cxa_free_exception): New. * libitm.map (_ITM_cxa_free_exception): Add it. * libitm.texi: Update ABI docs. * libitm_i.h (gtm_transaction_cp::cxa_unthrown): Remove. (gtm_transaction_cp::cxa_uncaught_count): Add. (gtm_thread::cxa_unthrown): Remove. (gtm_thread::cxa_uncaught_count_ptr): Add. (gtm_thread::cxa_uncaught_count): Add. (gtm_thread::drop_references_allocations): Rename to... (gtm_thread::discard_allocation): ... this and adapt. (gtm_thread::init_cpp_exceptions): New. * beginend.cc (gtm_thread::gtm_thread): Adapt EH handling. (gtm_thread::begin_transaction): Likewise. (gtm_transaction_cp::save): Likewise. (gtm_thread::trycommit): Likewise. * eh_cpp.cc: Add overview comments. (__cxa_eh_globals, __cxa_get_globals, __cxa_free_exception): Declare. (free_any_exception, _ITM_cxa_free_exception): New. (gtm_thread::init_cpp_exceptions): Define. (_ITM_cxa_allocate_exception, _ITM_cxa_throw): Adapt. (_ITM_cxa_begin_catch, _ITM_cxa_end_catch): Likewise. (gtm_thread::revert_cpp_exceptions): Likewise. commit 23bd34e3c8028a12705a47d13a4c7aa36bfeca60 Author: Torvald Riegel <trie...@redhat.com> Date: Tue Nov 3 15:38:22 2015 +0100 Support __cxa_free_exception and fix exception handling.
gcc/cp/ * except.c (do_free_exception): Use transactional wrapper. libitm/ * testsuite/libitm.c++/eh-5.C: New. * libitm.h (_ITM_cxa_free_exception): New. * libitm.map (_ITM_cxa_free_exception): Add it. * libitm.texi: Update ABI docs. * libitm_i.h (gtm_transaction_cp::cxa_unthrown): Remove. (gtm_transaction_cp::cxa_uncaught_count): Add. (gtm_thread::cxa_unthrown): Remove. (gtm_thread::cxa_uncaught_count_ptr): Add. (gtm_thread::cxa_uncaught_count): Add. (gtm_thread::drop_references_allocations): Rename to... (gtm_thread::discard_allocation): ... this and adapt. (gtm_thread::init_cpp_exceptions): New. * beginend.cc (gtm_thread::gtm_thread): Adapt EH handling. (gtm_thread::begin_transaction): Likewise. (gtm_transaction_cp::save): Likewise. (gtm_thread::trycommit): Likewise. * eh_cpp.cc: Add overview comments. (__cxa_eh_globals, __cxa_get_globals, __cxa_free_exception): Declare. (free_any_exception, _ITM_cxa_free_exception): New. (gtm_thread::init_cpp_exceptions): Define. (_ITM_cxa_allocate_exception, _ITM_cxa_throw): Adapt. (_ITM_cxa_begin_catch, _ITM_cxa_end_catch): Likewise. (gtm_thread::revert_cpp_exceptions): Likewise. diff --git a/gcc/cp/except.c b/gcc/cp/except.c index 9b2450d..ad40436 100644 --- a/gcc/cp/except.c +++ b/gcc/cp/except.c @@ -662,6 +662,16 @@ do_free_exception (tree ptr) /* Declare void __cxa_free_exception (void *) throw(). 
*/ fn = declare_library_fn (fn, void_type_node, ptr_type_node, ECF_NOTHROW | ECF_LEAF); + + if (flag_tm) + { + tree fn2 = get_identifier ("_ITM_cxa_free_exception"); + if (!get_global_value_if_present (fn2, &fn2)) + fn2 = declare_library_fn (fn2, void_type_node, + ptr_type_node, + ECF_NOTHROW | ECF_LEAF | ECF_TM_PURE); + record_tm_replacement (fn, fn2); + } } return cp_build_function_call_nary (fn, tf_warning_or_error, ptr, NULL_TREE); diff --git a/libitm/beginend.cc b/libitm/beginend.cc index c3ed11b..86f7b39 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -132,6 +132,8 @@ GTM::gtm_thread::gtm_thread () number_of_threads_changed(number_of_threads - 1, number_of_threads); serial_lock.write_unlock (); + init_cpp_exceptions (); + if (pthread_once(&thread_release_once, thread_exit_init)) GTM_fatal("Initializing thread release TLS key failed."); // Any non-null value is sufficient to trigger destruction of this @@ -383,6 +385,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) #endif } + // Log the number of uncaught exceptions if we might have to roll back this + // state. + if (tx->cxa_unc
[PATCH][WIP] libstdc++: Make certain exceptions transaction_safe.
The attached patch makes some exceptions transaction-safe, as required by the Transactional Memory TS. It has some rough edges, but I'm hoping I can sort them out quickly using your feedback. It only supports logic_error and exception/bad_exception, but the other exceptions that the TM TS specifies as transaction-safe basically require the same support (notably, runtime_error is the same as logic_error, and their subclasses don't add anything). There are two things that complicate such support. First, it seems better not to rely on compiling libstdc++ code with -fgnu-tm for now (or at least we tried to avoid this so far). Therefore, the transactional clones in this patch are all manually instrumented (ie, the functions are C functions with names matching the mangled names of the respective C++ functions, and the _ZGTt prefix signaling that they are txnal clones). Second, exceptions still use a COW string internally, which cannot be made transaction-safe with just compiler support because of the reference counting implementation inside of COW strings, which uses atomics. One would need something custom for that nonetheless. Thus, the patch adds txnal clones as required. They are new exported symbols, but not visible to nontransactional code. The only changes to headers are transaction_safe[_dynamic] annotations where required by the TS, and a few friend declarations. The annotations are only enabled if a user compiles with -fgnu-tm. IOW, the changes are pretty much invisible when not using the TM TS. The patch doesn't include tests yet, but tests like this one seem to run successfully on x86_64: template <typename T> void thrower4(const string& what) { try { // Creating a temporary inside of the txn ICEs. T t(what); atomic_commit { _ITM_hackOrTxnProp (pr_atomicCancel); dontoptimize (t.what()); throw T (what); } } catch (T ex) { if (what != ex.what()) abort (); } } thrower4 ("test"); I can't yet test the destructors because of issues on the compiler side.
There are also commented-out calls to _ITM_setAssociatedException in the code, which exist to show how we plan to support transaction cancellation through exceptions (which needs some more libitm support and bugfixes on the compiler side). commit e81080a01ab0daf2949a400c1a2d5077d37ba515 Author: Torvald Riegel <trie...@redhat.com> Date: Fri Nov 13 01:00:52 2015 +0100 libstdc++: Make certain exceptions transaction_safe. diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index 1b3184a..d902b03 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -1876,6 +1876,12 @@ GLIBCXX_3.4.22 { _ZNSt6thread6_StateD[012]Ev; _ZNSt6thread15_M_start_threadESt10unique_ptrINS_6_StateESt14default_deleteIS1_EEPFvvE; +# Support for the Transactional Memory TS (N4514) +_ZGTtNSt11logic_errorC1EPKc; +_ZGTtNSt11logic_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE; +_ZGTtNKSt11logic_error4whatEv; +_ZGTtNSt11logic_errorD1Ev; + } GLIBCXX_3.4.21; # Symbols in the support library (libsupc++) have their own tag. @@ -2107,6 +2113,12 @@ CXXABI_1.3.9 { # operator delete[](void*, std::size_t) _ZdaPv[jmy]; +# Support for the Transactional Memory TS (N4514) +_ZGTtNKSt9exceptionD1Ev; +_ZGTtNKSt9exception4whatEv; +_ZGTtNKSt13bad_exceptionD1Ev; +_ZGTtNKSt13bad_exception4whatEv; + } CXXABI_1.3.8; # Symbols in the support library (libsupc++) supporting transactional memory. 
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index b3853cd..47d9f2f 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -4904,6 +4904,18 @@ _GLIBCXX_END_NAMESPACE_CXX11 int compare(size_type __pos, size_type __n1, const _CharT* __s, size_type __n2) const; + +# ifdef _GLIBCXX_TM_TS_INTERNAL + friend void + ::_txnal_cow_string_C1_for_exceptions(void* that, const char* s, + void* exc); + friend const char* + ::_txnal_cow_string_c_str(const void *that); + friend void + ::_txnal_cow_string_D1(void *that); + friend void + ::_txnal_cow_string_D1_commit(void *that); +# endif }; #endif // !_GLIBCXX_USE_CXX11_ABI diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config index 723feb1..0e66bb0 100644 --- a/libstdc++-v3/include/bits/c++config +++ b/libstdc++-v3/include/bits/c++config @@ -481,6 +481,17 @@ namespace std # define _GLIBCXX_BEGIN_EXTERN_C extern "C" { # define _GLIBCXX_END_EXTERN_C } +// Conditionally enable annotations for the Transactional Memory TS on C++11. +#if __cplusplus >= 201103L && \ + _GLIBCXX_USE_CXX11_ABI && _GLIBCXX_USE_DUAL_ABI && \ + defined(__cpp_transactional_memory) && __cpp_transactional_memor
Re: State of support for the ISO C++ Transactional Memory TS and remaining work
On Wed, 2015-11-11 at 15:04 +0000, Szabolcs Nagy wrote: > On 10/11/15 18:29, Torvald Riegel wrote: > > On Tue, 2015-11-10 at 17:26 +0000, Szabolcs Nagy wrote: > >> On 09/11/15 00:19, Torvald Riegel wrote: > >>> I've not yet created tests for the full list of functions specified as > >>> transaction-safe in the TS, but my understanding is that this list was > >>> created after someone from the ISO C++ TM study group looked at > >>> libstdc++'s implementation and investigated which functions might be feasible > >>> to be declared transaction-safe in it. > >>> > >> > >> is that list available somewhere? > > > > See the TM TS, N4514. > > > > i was looking at an older version, > things make more sense now. > > i think system() should not be transaction safe.. > > i wonder what's the plan for getting libc functions > instrumented (i assume that is needed unless hw > support is used). No specific plans so far. We'll wait and see, I guess. TM is still in a chicken-and-egg situation. > >> xmalloc > >> the runtime exits on memory allocation failure, > >> so it is not possible to use it safely. > >> (i think it should be possible to roll back the > >> transaction in case of internal allocation failure > >> and retry with a strategy that does not need dynamic > >> allocation). > > > > Not sure what you mean by "safely". Hardening against out-of-memory > > situations hasn't been considered to be of high priority so far, but I'd > > accept patches for that that don't increase complexity significantly and > > don't hamper performance. > > > > i consider this a library safety issue. > > (a library or runtime is not safe to use if it may terminate > the process in case of internal failures.) If it is truly a purely internal failure, then aborting might be the best thing one can do if there is no sensible way to try to recover from the error (ie, take a fail-fast approach). Out-of-memory errors are not purely internal failures.
I agree that it would be nice to have a fallback, but for some features there simply is none (eg, the program can't require rollback to be possible and yet not provide enough memory for this to be achievable). Given that these transactions have to be used from C programs too, there's not much libitm can do except perhaps call user-supplied handlers. > >> uint64_t GTM::gtm_spin_count_var = 1000; > >> i guess this was supposed to be tunable. > >> it seems libitm needs some knobs (strategy, retries, > >> spin counts), but there is no easy way to adapt these > >> for a target/runtime environment. > > > > Sure, more performance tuning knobs would be nice. > > > > my problem was with getting the knobs right at runtime. > > (i think this will need a solution to make tm practically > useful, there are settings that seem to be sensitive to > the properties of the underlying hw.. this also seems > to be a problem for glibc lock elision retry policies.) Yes, that applies to many tuning settings in lots of places. And certainly to TM implementations too :) > >> sys_futex0 > >> i'm not sure why this has arch specific implementations > >> for some targets but not others. (syscall is not in the > >> implementation reserved namespace). > > > > Are there archs that support libitm but don't have a definition of this > > one? > > > > i thought all targets were supported on linux > (the global lock based strategies should work) > i can prepare a sys_futex0 for arm and aarch64. arm and aarch64 should be supported according to configure.tgt. Also see the comment in config/linux/futex_bits.h if you want to change something there. I haven't tried arm at all so far.
Re: State of support for the ISO C++ Transactional Memory TS and remaining work
On Tue, 2015-11-10 at 17:26 +0000, Szabolcs Nagy wrote: > On 09/11/15 00:19, Torvald Riegel wrote: > > Hi, > > > > I'd like to summarize the current state of support for the TM TS, and > > outline the current plan for the work that remains to complete the > > support. > > > > I'm aware we're at the end of stage 1, but I'm confident we can still > > finish this work and hope to include it in GCC 6 because: > > (1) most of the support is already in GCC, and we have a big head start > > in the space of TM so it would be unfortunate to waste that by not > > delivering support for the TM TS, > > (2) this is a TS and support for it is considered experimental, > > (3) most of the affected code is in libitm or the compiler's TM passes, > > which have to be enabled explicitly by the user. > > > > Currently, we have complete support for the syntax and all necessary > > instrumentation except the exception handling bits listed below. libitm > > has a good set of STM and HTM-based algorithms. > > > > > > What is missing on the compiler side is essentially a change of how we > > support atomic_noexcept and atomic_cancel, in particular exception > > handling. Instead of just using a finally block as done currently, the > > compiler needs to build a catch clause so that it can actively intercept > > exceptions that escape an atomic_noexcept or atomic_cancel. For > > atomic_noexcept, the compiler needs to include a call to abort() in the > > catch clause. > > > > > > For atomic_cancel, it needs to call ITM_commitTransactionEH in the catch > > clause, and use NULL as exception argument. This can then be used by > > libitm to look at the currently being handled exception and (a) check > > whether the type supports transaction cancellation as specified by the TS > > and (b) pick out the allocations that belong to this exception and roll > > back everything else before rethrowing this exception.
> > > > For (a), it's probably best to place this check into libstdc++ > > (specifically, libsupc++ I suppose) because support for transaction > > cancellation is a property that library parts of the standard (or the > > TS) require, and that has to match the implementation in libstdc++. > > Attached is a patch by Jason that implements this check. This adds one > > symbol, which should be okay we hope. > > > > does that mean libitm will depend on libstdc++? No, weak references are used to avoid that. See libitm/eh_cpp.cc for example. > i think the tm runtime should not depend on c++, > so it is usable from c code. > > For (b), our plan is to track the additional allocations that happen > > during construction of the exception types that support > > cancellation (eg, creating the what() string for logic_error). There > > are several ways to do that, one of them being that we create custom > > transactional clones of those constructors that tell libitm that either > > such a constructor is currently running or explicitly list the > > allocations that have been made by the constructor; eventually, we would > > always (copy) construct into memory returned by cxa_allocate_exception, > > which then makes available the necessary undo information when such an > > exception is handled in libitm. > > > > The other big piece of missing support is making sure that the functions > > that are specified in the TS as transaction_safe are indeed that. I > > believe we do not need to actually add such annotations to any libstdc++ > > functions that are already transaction-safe and completely defined in > > headers -- those functions are implicitly transaction-safe, and we can > > thus let the compiler instrument them at the point of use inside of a > > transaction. > > > > If a supposedly transaction-safe function is not defined in a header, > > we'd need a transaction_safe annotation at the declaration.
Jason has > > implemented the TM TS feature test macro, so we can only add the > > annotation if the user has enabled support for the TM TS in the > > respective compilation process. > > sounds ugly: the function type (transaction-safety) > depends on the visibility of the definition.. I don't understand why that would be the case. The TS specifies whether a function has to be safe or not. We strive to implement that to support the TS. Nonetheless, the TS also has the notion of a function being implicitly transaction-safe (ie, if the compiler can check that it is, it can be used as-if annotated as transacton-safe even if it actually isn't). Also note that you can't overload based on tra
[PATCH] libitm: Support __cxa_free_exception and fix exception handling.
See the added overview comments in eh_cpp.cc. This is still lacking updated docs in the ABI spec, which I can add later. We don't need to change __cxa_tm_cleanup in libsupc++, but don't use the first two arguments of it anymore, which we may want to document too. Thoughts? commit 0a67dc5a13fd17a24fc667a251d000a73cd5159e Author: Torvald Riegel <trie...@redhat.com> Date: Tue Nov 3 15:38:22 2015 +0100 Support __cxa_free_exception and fix exception handling. diff --git a/libitm/beginend.cc b/libitm/beginend.cc index c3ed11b..86f7b39 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -132,6 +132,8 @@ GTM::gtm_thread::gtm_thread () number_of_threads_changed(number_of_threads - 1, number_of_threads); serial_lock.write_unlock (); + init_cpp_exceptions (); + if (pthread_once(_release_once, thread_exit_init)) GTM_fatal("Initializing thread release TLS key failed."); // Any non-null value is sufficient to trigger destruction of this @@ -383,6 +385,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) #endif } + // Log the number of uncaught exceptions if we might have to roll back this + // state. + if (tx->cxa_uncaught_count_ptr != 0) +tx->cxa_uncaught_count = *tx->cxa_uncaught_count_ptr; + // Run dispatch-specific restart code. Retry until we succeed. GTM::gtm_restart_reason rr; while ((rr = disp->begin_or_restart()) != NO_RESTART) @@ -411,7 +418,7 @@ GTM::gtm_transaction_cp::save(gtm_thread* tx) id = tx->id; prop = tx->prop; cxa_catch_count = tx->cxa_catch_count; - cxa_unthrown = tx->cxa_unthrown; + cxa_uncaught_count = tx->cxa_uncaught_count; disp = abi_disp(); nesting = tx->nesting; } @@ -583,7 +590,6 @@ GTM::gtm_thread::trycommit () undolog.commit (); // Reset further transaction state. cxa_catch_count = 0; - cxa_unthrown = NULL; restart_total = 0; // Ensure privatization safety, if necessary. 
diff --git a/libitm/eh_cpp.cc b/libitm/eh_cpp.cc index a86dbf1..e28c057 100644 --- a/libitm/eh_cpp.cc +++ b/libitm/eh_cpp.cc @@ -26,6 +26,50 @@ using namespace GTM; +/* Exceptions can exist in three phases: (1) after having been allocated by + __cxa_allocate_exception but before being handed off to __cxa_throw, + (2) when they are in flight, so between __cxa_throw and __cxa_begin_catch, + and (3) when they are being handled (between __cxa_begin_catch and + __cxa_end_catch). + + We can get aborts in all three phases, for example in (1) during + construction of the exception object, or in (2) during destructors called + while unwinding the stack. The transaction that created an exception + object can commit in phase (2) but not in phases (1) and (3) because both + throw expressions and catch clauses are properly nested wrt transactions. + + We handle phase (1) by dealing with exception objects similar to how we + deal with other (de)allocations, which also ensures that we can have more + than one exception object allocated at the same time (e.g., if the + throw expression itself throws an exception and thus calls + __cxa_allocate_exception). However, on the call to __cxa_begin_catch + we hand off the exception to the special handling of phase (3) and + remove the undo log entry of the allocation. Note that if the allocation + happened outside of this transaction, we do not need to do anything. + + When an exception reaches phase (2) due to a call to __cxa_throw, the count + of uncaught exceptions is incremented. We roll back this effect by saving + and restoring this number in the structure returned from __cxa_get_globals. + This also takes care of increments of this count when rethrowing an + exception. + + For phase (3), we keep track of the number of times __cxa_begin_catch + has been called without a matching call to __cxa_end_catch. 
This count + is then used by __cxa_tm_cleanup to roll back the exception handling state + by calling __cxa_end_catch for the exceptions that have not been finished + yet (without running destructors though because we roll back the memory + anyway). + Once an exception that was allocated in this transaction enters phase (3), + it does not need to be deallocated on abort anymore because the calls to + __cxa_end_catch will take care of that. + + We require all code executed by the transaction to be transaction_safe (or + transaction_pure, or to have wrappers) if the transaction is to be rolled + back. However, we take care to not require this for transactions that + just commit; this way, transactions that enter serial mode and then call + uninstrumented code continue to work. + */ + /* Everything from libstdc++ is weak, to avoid requiring that library to be linked into plain C applications using libitm.so. */ @@ -33,85 +77,138 @@ using namespace GTM; extern "C" { +struct __cxa_e
[PATCH v2] libitm: Support sized delete.
This patch supports the sized variants of operator delete. Some change compare to v1. Tested on x86_64-linux. commit df00a283f2e37bd3c69f37783fa81dde7ccf1f94 Author: Torvald Riegel <trie...@redhat.com> Date: Thu Oct 29 18:52:20 2015 +0100 Support sized delete. This adds transactional clones of the sized version of operator delete. diff --git a/libitm/alloc.cc b/libitm/alloc.cc index bb292da..7b8786c 100644 --- a/libitm/alloc.cc +++ b/libitm/alloc.cc @@ -29,26 +29,38 @@ namespace GTM HIDDEN { void gtm_thread::record_allocation (void *ptr, void (*free_fn)(void *)) { - uintptr_t iptr = (uintptr_t) ptr; - - gtm_alloc_action *a = this->alloc_actions.find(iptr); - if (a == 0) -a = this->alloc_actions.insert(iptr); + // We do not deallocate before outermost commit, so we should never have + // an existing log entry for a new allocation. + gtm_alloc_action *a = this->alloc_actions.insert((uintptr_t) ptr); a->free_fn = free_fn; + a->free_fn_sz = 0; a->allocated = true; } void gtm_thread::forget_allocation (void *ptr, void (*free_fn)(void *)) { - uintptr_t iptr = (uintptr_t) ptr; - - gtm_alloc_action *a = this->alloc_actions.find(iptr); - if (a == 0) -a = this->alloc_actions.insert(iptr); - + // We do not deallocate before outermost commit, so we should never have + // an existing log entry for a deallocation at the same address. We may + // have an existing entry for a matching allocation, but this is handled + // correctly because both are complementary in that only one of these will + // cause an action at commit or abort. + gtm_alloc_action *a = this->alloc_actions.insert((uintptr_t) ptr); a->free_fn = free_fn; + a->free_fn_sz = 0; + a->allocated = false; +} + +void +gtm_thread::forget_allocation (void *ptr, size_t sz, + void (*free_fn_sz)(void *, size_t)) +{ + // Same as forget_allocation but with a size. 
+ gtm_alloc_action *a = this->alloc_actions.insert((uintptr_t) ptr); + a->free_fn = 0; + a->free_fn_sz = free_fn_sz; + a->sz = sz; a->allocated = false; } @@ -67,31 +79,27 @@ commit_allocations_2 (uintptr_t key, gtm_alloc_action *a, void *data) if (cb_data->revert_p) { - // Roll back nested allocations. + // Roll back nested allocations, discard deallocations. if (a->allocated) - a->free_fn (ptr); + { + if (a->free_fn_sz != 0) + a->free_fn_sz (ptr, a->sz); + else + a->free_fn (ptr); + } } else { - if (a->allocated) - { - // Add nested allocations to parent transaction. - gtm_alloc_action* a_parent = cb_data->parent->insert(key); - *a_parent = *a; - } - else - { - // ??? We could eliminate a parent allocation that matches this - // memory release, if we had support for removing all accesses - // to this allocation from the transaction's undo and redo logs - // (otherwise, the parent transaction's undo or redo might write to - // data that is already shared again because of calling free()). - // We don't have this support currently, and the benefit of this - // optimization is unknown, so just add it to the parent. - gtm_alloc_action* a_parent; - a_parent = cb_data->parent->insert(key); - *a_parent = *a; - } + // Add allocations and deallocations to parent. + // ??? We could eliminate a (parent) allocation that matches this + // a deallocation, if we had support for removing all accesses + // to this allocation from the transaction's undo and redo logs + // (otherwise, the parent transaction's undo or redo might write to + // data that is already shared again because of calling free()). + // We don't have this support currently, and the benefit of this + // optimization is unknown, so just add it to the parent. 
+ gtm_alloc_action* a_parent = cb_data->parent->insert(key); + *a_parent = *a; } } @@ -99,10 +107,15 @@ static void commit_allocations_1 (uintptr_t key, gtm_alloc_action *a, void *cb_data) { void *ptr = (void *)key; - uintptr_t revert_p = (uintptr_t) cb_data; + bool revert_p = (bool) (uintptr_t) cb_data; - if (a->allocated == revert_p) -a->free_fn (ptr); + if (revert_p == a->allocated) +{ + if (a->free_fn_sz != 0) + a->free_fn_sz (ptr, a->sz); + else + a->free_fn (ptr); +} } /* Permanently commit allocated memory during transaction. diff --git a/libitm/alloc_cpp.cc b/libitm/alloc_cpp.cc index 8514618..13185a7 100644 --- a/libitm/alloc_cpp.cc +++ b/libitm/alloc_cpp.cc @@ -35,41 +35,50 @@ using namespace GTM; #define _ZnwX S(_Znw,MANGLE_SIZE_T) #define _ZnaX S(_Zna,MANGLE_SIZE_T) +#define _ZdlPvX S(_ZdlPv,MANGLE_SIZE_T) #define _ZnwXRKSt9nothrow_t S(S(_Znw,MANGLE_SIZE_T),RKSt9nothrow_t) #define _ZnaXRKSt9nothrow_t S(S(_Zna,MANGLE_SIZE_T),RKSt9nothrow_t) +#define _ZdlPvXRKSt9nothrow_t S(S(_ZdlPv,MANGLE_SIZE_T),RKSt9
State of support for the ISO C++ Transactional Memory TS and remaining work
Hi, I'd like to summarize the current state of support for the TM TS, and outline the current plan for the work that remains to complete the support. I'm aware we're at the end of stage 1, but I'm confident we can still finish this work and hope to include it in GCC 6 because: (1) most of the support is already in GCC, and we have a big head start in the space of TM, so it would be unfortunate to waste that by not delivering support for the TM TS, (2) this is a TS and support for it is considered experimental, (3) most of the affected code is in libitm or the compiler's TM passes, which have to be enabled explicitly by the user. Currently, we have complete support for the syntax and all necessary instrumentation except the exception handling bits listed below. libitm has a good set of STM and HTM-based algorithms. What is missing on the compiler side is essentially a change of how we support atomic_noexcept and atomic_cancel, in particular exception handling. Instead of just using a finally block as done currently, the compiler needs to build a catch clause so that it can actively intercept exceptions that escape an atomic_noexcept or atomic_cancel. For atomic_noexcept, the compiler needs to include a call to abort() in the catch clause. For atomic_cancel, it needs to call ITM_commitTransactionEH in the catch clause, and use NULL as the exception argument. This can then be used by libitm to look at the exception currently being handled and (a) check whether the type supports transaction cancellation as specified by the TS and (b) pick out the allocations that belong to this exception and roll back everything else before rethrowing this exception. For (a), it's probably best to place this check into libstdc++ (specifically, libsupc++ I suppose) because support for transaction cancellation is a property that library parts of the standard (or the TS) require, and that has to match the implementation in libstdc++. Attached is a patch by Jason that implements this check. 
This adds one symbol, which we hope should be okay. For (b), our plan is to track the additional allocations that happen during construction of the exception types that support cancellation (eg, creating the what() string for logic_error). There are several ways to do that, one of them being that we create custom transactional clones of those constructors that tell libitm either that such a constructor is currently running or explicitly list the allocations that have been made by the constructor; eventually, we would always (copy) construct into memory returned by cxa_allocate_exception, which then makes available the necessary undo information when such an exception is handled in libitm. The other big piece of missing support is making sure that the functions that are specified in the TS as transaction_safe are indeed that. I believe we do not need to actually add such annotations to any libstdc++ functions that are already transaction-safe and completely defined in headers -- those functions are implicitly transaction-safe, and we can thus let the compiler instrument them at the point of use inside of a transaction. If a supposedly transaction-safe function is not defined in a header, we'd need a transaction_safe annotation at the declaration. Jason has implemented the TM TS feature test macro, so we can only add the annotation if the user has enabled support for the TM TS in the respective compilation process. We also need to ensure that there is a transactional clone of the function. This will add symbols to libstdc++, but these all have their own special prefix in the mangled name. I'd like to get feedback on how to best trigger the instrumentation and make it a part of a libstdc++ build. (If that turns out to be too problematic, we could still fall back to writing transactional clones manually.) For the clones of the constructors of the types that support cancellation, I suppose manually written clones might be easier than automatic instrumentation. 
I've not yet created tests for the full list of functions specified as transaction-safe in the TS, but my understanding is that this list was created after someone from the ISO C++ TM study group looked at libstdc++'s implementation and investigated which functions might be feasible to be declared transaction-safe in it. I'm looking forward to your feedback. Thanks, Torvald // -*- C++ -*- GNU C++ atomic_cancel support. // Copyright (C) 2015 Free Software Foundation, Inc. // // This file is part of GCC. // // GCC is free software; you can redistribute it and/or modify // it under the terms of the GNU General Public License as published by // the Free Software Foundation; either version 3, or (at your option) // any later version. // // GCC is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the // GNU General Public License for more details. // //
Re: [PATCH] libitm: Support sized delete.
On Fri, 2015-10-30 at 10:19 -0700, Richard Henderson wrote: > > #define _ZnwX S(_Znw,MANGLE_SIZE_T) > > #define _ZnaX S(_Zna,MANGLE_SIZE_T) > > +#define _ZdlPvX S(_ZdlPv,MANGLE_SIZE_T) > > #define _ZnwXRKSt9nothrow_t S(S(_Znw,MANGLE_SIZE_T),RKSt9nothrow_t) > > #define _ZnaXRKSt9nothrow_t S(S(_Zna,MANGLE_SIZE_T),RKSt9nothrow_t) > > +#define _ZdlPvXRKSt9nothrow_t > > S(S(_ZdlPv,MANGLE_SIZE_T),RKSt9nothrow_t) These are symbols that are provided by libstdc++ if it is available (otherwise, we use dummy implementations, see the other parts of the patch)... > > #define _ZGTtnwX S(_ZGTtnw,MANGLE_SIZE_T) > > #define _ZGTtnaX S(_ZGTtna,MANGLE_SIZE_T) > > +#define _ZGTtdlPvX S(_ZGTtdlPv,MANGLE_SIZE_T) > > #define _ZGTtnwXRKSt9nothrow_t > > S(S(_ZGTtnw,MANGLE_SIZE_T),RKSt9nothrow_t) > > #define _ZGTtnaXRKSt9nothrow_t > > S(S(_ZGTtna,MANGLE_SIZE_T),RKSt9nothrow_t) > > +#define _ZGTtdlPvXRKSt9nothrow_t > > S(S(_ZGTtdlPv,MANGLE_SIZE_T),RKSt9nothrow_t) > > One more thing... Why are there 4 new symbols here... > > > +/* Wrap: operator delete(void* ptr, std::size_t sz) */ > > +void > > +_ZGTtdlPvX (void *ptr, size_t sz) > > +{ > > + if (ptr) > > +gtm_thr()->forget_allocation (ptr, sz, _ZdlPvX); > > +} > > + > > +/* Wrap: operator delete (void *ptr, std::size_t sz, const > > std::nothrow_t&) */ > > +void > > +_ZGTtdlPvXRKSt9nothrow_t (void *ptr, size_t sz, c_nothrow_p nt UNUSED) > > +{ > > + if (ptr) > > +gtm_thr()->forget_allocation (ptr, sz, delsz_opnt); > > +} > > + ... while those are the transactional wrappers that we actually want to provide and export from libitm. I'll probably revise the patch in a v2 though depending on how we decide to handle allocations of exception objects.
Re: [PATCH] libitm: Support sized delete.
On Thu, 2015-10-29 at 12:38 -0700, Richard Henderson wrote: > On 10/29/2015 11:19 AM, Torvald Riegel wrote: > > diff --git a/libitm/libitm.map b/libitm/libitm.map > > index 21bcfdf..7fc9a41 100644 > > --- a/libitm/libitm.map > > +++ b/libitm/libitm.map > > @@ -168,10 +168,12 @@ LIBITM_1.0 { > > _ZGTtnw?; > > _ZGTtna?; > > _ZGTtdlPv; > > + _ZGTtdlPv?; > > _ZGTtdaPv; > > _ZGTtnw?RKSt9nothrow_t; > > _ZGTtna?RKSt9nothrow_t; > > _ZGTtdlPvRKSt9nothrow_t; > > + _ZGTtdlPv?RKSt9nothrow_t; > > _ZGTtdaPvRKSt9nothrow_t; > > > > _ITM_cxa_allocate_exception; > > Everything looks good except for this part. The new symbols need to go into a > new symbol version. C.f. libatomic.map for the syntax. Ah, thanks. OK in the updated patch that's attached? I've also looked at the sized delete paper again, and we should be able to called the underlying unsized delete even if the compiler issued a call to the sized delete (ie, to answer my own question). However, it's not difficult or too much overhead to call the right version, so I'll just keep the patch the way it is. commit 1125e1b96cddf71b907ff382898239f09410d48e Author: Torvald Riegel <trie...@redhat.com> Date: Thu Oct 29 18:52:20 2015 +0100 Support sized delete. This adds transactional clones of the sized version of operator delete. 
diff --git a/libitm/alloc.cc b/libitm/alloc.cc index bb292da..a72848d 100644 --- a/libitm/alloc.cc +++ b/libitm/alloc.cc @@ -37,6 +37,7 @@ gtm_thread::record_allocation (void *ptr, void (*free_fn)(void *)) a->free_fn = free_fn; a->allocated = true; + a->sized_delete = false; } void @@ -50,6 +51,23 @@ gtm_thread::forget_allocation (void *ptr, void (*free_fn)(void *)) a->free_fn = free_fn; a->allocated = false; + a->sized_delete = false; +} + +void +gtm_thread::forget_allocation (void *ptr, size_t sz, + void (*free_fn_sz)(void *, size_t)) +{ + uintptr_t iptr = (uintptr_t) ptr; + + gtm_alloc_action *a = this->alloc_actions.find(iptr); + if (a == 0) +a = this->alloc_actions.insert(iptr); + + a->free_fn_sz = free_fn_sz; + a->allocated = false; + a->sized_delete = true; + a->sz = sz; } namespace { @@ -102,7 +120,12 @@ commit_allocations_1 (uintptr_t key, gtm_alloc_action *a, void *cb_data) uintptr_t revert_p = (uintptr_t) cb_data; if (a->allocated == revert_p) -a->free_fn (ptr); +{ + if (a->sized_delete) + a->free_fn_sz (ptr, a->sz); + else + a->free_fn (ptr); +} } /* Permanently commit allocated memory during transaction. 
diff --git a/libitm/alloc_cpp.cc b/libitm/alloc_cpp.cc index 8514618..13185a7 100644 --- a/libitm/alloc_cpp.cc +++ b/libitm/alloc_cpp.cc @@ -35,41 +35,50 @@ using namespace GTM; #define _ZnwX S(_Znw,MANGLE_SIZE_T) #define _ZnaX S(_Zna,MANGLE_SIZE_T) +#define _ZdlPvX S(_ZdlPv,MANGLE_SIZE_T) #define _ZnwXRKSt9nothrow_t S(S(_Znw,MANGLE_SIZE_T),RKSt9nothrow_t) #define _ZnaXRKSt9nothrow_t S(S(_Zna,MANGLE_SIZE_T),RKSt9nothrow_t) +#define _ZdlPvXRKSt9nothrow_t S(S(_ZdlPv,MANGLE_SIZE_T),RKSt9nothrow_t) #define _ZGTtnwX S(_ZGTtnw,MANGLE_SIZE_T) #define _ZGTtnaX S(_ZGTtna,MANGLE_SIZE_T) +#define _ZGTtdlPvX S(_ZGTtdlPv,MANGLE_SIZE_T) #define _ZGTtnwXRKSt9nothrow_t S(S(_ZGTtnw,MANGLE_SIZE_T),RKSt9nothrow_t) #define _ZGTtnaXRKSt9nothrow_t S(S(_ZGTtna,MANGLE_SIZE_T),RKSt9nothrow_t) +#define _ZGTtdlPvXRKSt9nothrow_t S(S(_ZGTtdlPv,MANGLE_SIZE_T),RKSt9nothrow_t) /* Everything from libstdc++ is weak, to avoid requiring that library to be linked into plain C applications using libitm.so. */ extern "C" { -extern void *_ZnwX (size_t) __attribute__((weak)); -extern void _ZdlPv (void *) __attribute__((weak)); -extern void *_ZnaX (size_t) __attribute__((weak)); -extern void _ZdaPv (void *) __attribute__((weak)); +extern void *_ZnwX (size_t) __attribute__((weak)); +extern void _ZdlPv (void *) __attribute__((weak)); +extern void _ZdlPvX (void *, size_t) __attribute__((weak)); +extern void *_ZnaX (size_t) __attribute__((weak)); +extern void _ZdaPv (void *) __attribute__((weak)); typedef const struct nothrow_t { } *c_nothrow_p; extern void *_ZnwXRKSt9nothrow_t (size_t, c_nothrow_p) __attribute__((weak)); extern void _ZdlPvRKSt9nothrow_t (void *, c_nothrow_p) __attribute__((weak)); +extern void _ZdlPvXRKSt9nothrow_t +(void *, size_t, c_nothrow_p) __attribute__((weak)); extern void *_ZnaXRKSt9nothrow_t (size_t, c_nothrow_p) __attribute__((weak)); extern void _ZdaPvRKSt9nothrow_t (void *, c_nothrow_p) __attribute__((weak)); #if !defined (HAVE_ELF_STYLE_WEAKREF) -void *_ZnwX (size_t) { return 
NULL; } -void _ZdlPv (void *) { return; } -void *_ZnaX (size_t) { return NULL; } -void _ZdaPv (void *) { return; } - -void *_ZnwXRKSt9nothrow_t (size_t, c_nothrow_p) { return NULL; } -void _ZdlPvRKSt
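Following Richard's pointer to libatomic.map, the sized-delete symbols end up in a new version node that inherits from the base version. A rough sketch of the resulting structure (the node name and the full symbol lists here are illustrative; the actual names are whatever the final commit uses):

```
LIBITM_1.0 {
  global:
    _ZGTtdlPv;                 /* existing unsized operator delete wrapper */
    _ZGTtdlPvRKSt9nothrow_t;
    /* ... other existing exports ... */
  local:
    *;
};

LIBITM_1.1 {
  global:
    _ZGTtdlPv?;                /* sized delete; ? matches the size_t mangling */
    _ZGTtdlPv?RKSt9nothrow_t;
} LIBITM_1.0;
```

Putting the new exports in their own node lets the dynamic linker distinguish binaries that need the sized-delete wrappers from those linked against the older library.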
[PATCH] libitm: Support sized delete.
This patch supports the sized variants of operator delete. Tested on x86_64-linux. Is this OK or is it completely sufficient to just call non-sized operator delete from libitm even if the program has a sized delete? (If the latter is the case, I would just remove the special handling for that and use non-sized delete under the covers.) 2015-10-29 Torvald Riegel <trie...@redhat.com> * alloc_cpp.cc (_ZdlPvX, _ZdlPvXRKSt9nothrow_t, _ZGTtdlPvX, _ZGTtdlPvXRKSt9nothrow_t, delsz_opnt): New. * libitm.map: Add _ZGTtdlPvX and _ZGTtdlPvXRKSt9nothrow_t. * libitm_i.h (gtm_alloc_action): Add free_fn_sz, sz, sized_delete members. (gtm_thread::forget_allocations): New overload with size_t argument. * alloc.cc (gtm_thread::forget_allocation): Define new overload and adapt existing one. (gtm_thread::record_allocation): Adapt. * testsuite/libitm.c++/newdelete.C: New. commit f0d6b07a62681ad3cd477cf3c75904ee22baf75e Author: Torvald Riegel <trie...@redhat.com> Date: Thu Oct 29 18:52:20 2015 +0100 Support sized delete. This adds transactional clones of the sized version of operator delete. 
diff --git a/libitm/alloc.cc b/libitm/alloc.cc index bb292da..a72848d 100644 --- a/libitm/alloc.cc +++ b/libitm/alloc.cc @@ -37,6 +37,7 @@ gtm_thread::record_allocation (void *ptr, void (*free_fn)(void *)) a->free_fn = free_fn; a->allocated = true; + a->sized_delete = false; } void @@ -50,6 +51,23 @@ gtm_thread::forget_allocation (void *ptr, void (*free_fn)(void *)) a->free_fn = free_fn; a->allocated = false; + a->sized_delete = false; +} + +void +gtm_thread::forget_allocation (void *ptr, size_t sz, + void (*free_fn_sz)(void *, size_t)) +{ + uintptr_t iptr = (uintptr_t) ptr; + + gtm_alloc_action *a = this->alloc_actions.find(iptr); + if (a == 0) +a = this->alloc_actions.insert(iptr); + + a->free_fn_sz = free_fn_sz; + a->allocated = false; + a->sized_delete = true; + a->sz = sz; } namespace { @@ -102,7 +120,12 @@ commit_allocations_1 (uintptr_t key, gtm_alloc_action *a, void *cb_data) uintptr_t revert_p = (uintptr_t) cb_data; if (a->allocated == revert_p) -a->free_fn (ptr); +{ + if (a->sized_delete) + a->free_fn_sz (ptr, a->sz); + else + a->free_fn (ptr); +} } /* Permanently commit allocated memory during transaction. 
diff --git a/libitm/alloc_cpp.cc b/libitm/alloc_cpp.cc index 8514618..13185a7 100644 --- a/libitm/alloc_cpp.cc +++ b/libitm/alloc_cpp.cc @@ -35,41 +35,50 @@ using namespace GTM; #define _ZnwX S(_Znw,MANGLE_SIZE_T) #define _ZnaX S(_Zna,MANGLE_SIZE_T) +#define _ZdlPvXS(_ZdlPv,MANGLE_SIZE_T) #define _ZnwXRKSt9nothrow_tS(S(_Znw,MANGLE_SIZE_T),RKSt9nothrow_t) #define _ZnaXRKSt9nothrow_tS(S(_Zna,MANGLE_SIZE_T),RKSt9nothrow_t) +#define _ZdlPvXRKSt9nothrow_t S(S(_ZdlPv,MANGLE_SIZE_T),RKSt9nothrow_t) #define _ZGTtnwX S(_ZGTtnw,MANGLE_SIZE_T) #define _ZGTtnaX S(_ZGTtna,MANGLE_SIZE_T) +#define _ZGTtdlPvX S(_ZGTtdlPv,MANGLE_SIZE_T) #define _ZGTtnwXRKSt9nothrow_t S(S(_ZGTtnw,MANGLE_SIZE_T),RKSt9nothrow_t) #define _ZGTtnaXRKSt9nothrow_t S(S(_ZGTtna,MANGLE_SIZE_T),RKSt9nothrow_t) +#define _ZGTtdlPvXRKSt9nothrow_t S(S(_ZGTtdlPv,MANGLE_SIZE_T),RKSt9nothrow_t) /* Everything from libstdc++ is weak, to avoid requiring that library to be linked into plain C applications using libitm.so. */ extern "C" { -extern void *_ZnwX (size_t) __attribute__((weak)); -extern void _ZdlPv (void *) __attribute__((weak)); -extern void *_ZnaX (size_t) __attribute__((weak)); -extern void _ZdaPv (void *) __attribute__((weak)); +extern void *_ZnwX (size_t) __attribute__((weak)); +extern void _ZdlPv (void *) __attribute__((weak)); +extern void _ZdlPvX (void *, size_t) __attribute__((weak)); +extern void *_ZnaX (size_t) __attribute__((weak)); +extern void _ZdaPv (void *) __attribute__((weak)); typedef const struct nothrow_t { } *c_nothrow_p; extern void *_ZnwXRKSt9nothrow_t (size_t, c_nothrow_p) __attribute__((weak)); extern void _ZdlPvRKSt9nothrow_t (void *, c_nothrow_p) __attribute__((weak)); +extern void _ZdlPvXRKSt9nothrow_t +(void *, size_t, c_nothrow_p) __attribute__((weak)); extern void *_ZnaXRKSt9nothrow_t (size_t, c_nothrow_p) __attribute__((weak)); extern void _ZdaPvRKSt9nothrow_t (void *, c_nothrow_p) __attribute__((weak)); #if !defined (HAVE_ELF_STYLE_WEAKREF) -void *_ZnwX (size_t) { return NULL; 
} -void _ZdlPv (void *) { return; } -void *_ZnaX (size_t) { return NULL; } -void _ZdaPv (void *) { return; } - -void *_ZnwXRKSt9nothrow_t (size_t, c_nothrow_p) { return NULL; } -void _ZdlPvRKSt9nothrow_t (void *, c_nothrow_p) { return; } -void *_ZnaXRKSt9nothrow_t (size_t, c_nothrow_p) { return NULL; } -void _ZdaPvRKSt9nothrow_t (vo
Re: [PATCH, rs6000] Add memory barriers to tbegin, tend, etc.
On Fri, 2015-10-09 at 11:52 -0500, Peter Bergner wrote: > On Fri, 2015-10-09 at 16:41 +0200, Torvald Riegel wrote: > > On Thu, 2015-09-03 at 16:58 -0500, Peter Bergner wrote: > >> +Note that the semantics of the above HTM builtins are required to mimic > >> the > >> +locking semantics used for critical sections. Builtins that are used to > >> +create a new transaction or restart a suspended transaction (specifically > >> any > >> +builtin that changes the state from non-transactional to transactional) > > > > Those builtins could be nested in transactions yet would need to provide > > the same semantics at least as far as the compiler is concerned, so > > maybe change "specifically" to "for example"? > > Ah, good point. I forgot about the nested example. In that case, how about > we just drop the text within the ()'s instead? I'm not sure we need it > given your suggested addition below. Sounds good to me. > > I haven't thought enough about txn suspend/resume/abort to make a > > proposal how their required semantics would be specified best. Maybe we > > should call this out explicitly, or maybe the sentences above are > > sufficient because they refer to HTM instructions generally. > > Well we do mention that we need acquire semantics not just for instructions > that start transactions, but also for instructions that resume transactions. > Similar for release and end/suspend, so I guess I'd lean towards what we have > is sufficient. OK. > > > Maybe the second "HTM instructions" should be "builtins" too, I don't > > know. > > Well the builtins expand into multiple hardware instructions and the > memory barriers are only attached to the HTM instructions and not the > other instructions that are generated by the builtins, so I think > HTM instructions is probably more correct here. But if you think > builtins is better, I can change it. No, it's exactly this aspect of the implementation that I didn't know about, and thus wasn't sure. Sounds all good. 
> Thanks for the review! I've attached the changes to the documentation below. > Is this better? Yes, thanks!
Re: [PATCH, rs6000] Add memory barriers to tbegin, tend, etc.
Sorry for the much delayed response. I've been sick and am slowly catching up. On Thu, 2015-09-03 at 16:58 -0500, Peter Bergner wrote: > On a glibc thread discussing this issue, Torvald also asked that I add > documention describing the memory consistency semantics the HTM instructions > should have, so I added a blurb about that. Torvald, is the text below > what you were looking for? > Index: gcc/doc/extend.texi > === > --- gcc/doc/extend.texi (revision 227308) > +++ gcc/doc/extend.texi (working copy) > @@ -16030,6 +16030,23 @@ > unsigned int __builtin_tsuspend (void) > @end smallexample > > +Note that the semantics of the above HTM builtins are required to mimic the > +locking semantics used for critical sections. Builtins that are used to > +create a new transaction or restart a suspended transaction (specifically any > +builtin that changes the state from non-transactional to transactional) Those builtins could be nested in transactions yet would need to provide the same semantics at least as far as the compiler is concerned, so maybe change "specifically" to "for example"? > must > +have lock acquisition like semantics while those builtins that end or suspend > +a transaction (ie, changes the state from transactional to non-transactional) > +must have lock release like semantics. I would be a bit more specific just to avoid confusion about how the locks are implemented or what specific semantics they have. What about adding this or something similar: "Specifically, this must mimic lock semantics as specified by C++11, for example: Lock acquisition is as-if an execution of __atomic_exchange_n(,1,__ATOMIC_ACQUIRE) that returns 0, and lock release is as-if an execution of __atomic_store(,0,__ATOMIC_RELEASE), with globallock being an implicit implementation-defined lock used for all transactions." > The HTM instructions associated with > +with the builtins inherently provide the correct acquisition and release > +semantics required. 
I would say "hardware barriers" or "hardware fences" instead of "semantics" to clarify that this is just one part. > However, the compiler must be prohibited from moving > +loads and stores across the HTM instructions. This has been accomplished > +by adding memory barriers to the associated HTM instructions. I think it would be good to make the requirement tighter, so that we do not promise too much. What about changing this to: "However, the compiler must also be prohibited from moving loads and stores across the builtins in a way that would violate their semantics. This has been accomplished by adding memory barriers to the associated HTM instructions (which is a conservative approach to provide acquire and release semantics)." I haven't thought enough about txn suspend/resume/abort to make a proposal how their required semantics would be specified best. Maybe we should call this out explicitly, or maybe the sentences above are sufficient because they refer to HTM instructions generally. Maybe the second "HTM instructions" should be "builtins" too, I don't know. > Earlier versions > +of the compiler did not treat the HTM instructions as memory barriers. > +A @code{__TM_FENCE__} macro has been added, which can be used to determine > +whether the current compiler treats HTM instructions as memory barriers or > not. "builtins" instead of "HTM instructions"? > +This allows the user to explicitly add memory barriers to their code when > +using an older version of the compiler. > + > The following set of built-in functions are available to gain access > to the HTM specific special purpose registers. Thanks for documenting the semantics!
Re: [gomp4.1] Doacross library implementation
On Thu, 2015-09-24 at 20:32 +0200, Jakub Jelinek wrote: > Torvald, can you please have a look at it, if I got all the atomics / memory > models right? More detailed comments below, but in general, I'd really suggest to add more code comments for the synchronization parts. In the end, the level of detail of documentation of libgomp is your decision, but, for example, the lack of comments in synchronization code in glibc has made maintaining this code and fixing issues in it very costly. It has also been hard to understand for many. My suggestion would be both to (1) document the high-level, abstract synchronization scheme and (2) how that scheme is implemented. The first point is important in my experience because typically, the high-level scheme and the actual thinking behind it (or, IOW, the intent of the original author) is much harder to reconstruct in case of concurrent code than it is for sequential code; you can't just simply follow the program along line by line, but have to consider interleavings. Even if the synchronization problem to solve is relatively straightforward as in this case (ie, one-directional waiting), it's IMO worth doing point (1). If it is simple, the high-level description will be simple, and it will assure others that one really has to just solve that and that the original author wasn't aware of any other issues. Regarding point (2), what we're doing in glibc now is basically to document how the specific things we do in the code make sure we implement the high-level scheme. So we'd say things like "this CAS here now ensures consensus among the threads A, B, C". For memory orders specifically, it helps to document why they are sufficient and necessary; this helps others understand the code, so that they don't need to go hunting through all of the code looking for other accesses to the same memory locations to be able to reconstruct the intended happens-before relations. I have some examples below. 
Also, given that you don't use explicit atomic types but just atomic operations, it's IMO a good idea to document which variables are supposed to be accessed atomically so it becomes easier to not violate the data-race-freedom requirement accidentally. > The testcase obviously is not a good benchmark, we'll need > some more realistic one. But obviously when asking for oversubscription, it > is quite expensive. The question is how to implement a non-busy waiting > fallback, whether we put some mutex and queue guarded by the mutex into the > same (or some other?) cache-line, or just use atomics to queue it and how to > make it cheap for the case where busy waiting is sufficient. Atomics and futexes are probably the best approach if you want performance; at least we need some efficient way for post() to figure out that there are indeed waiters, and we don't want to use a lock for that. What specific approach to use is a question of how much time we want to spend on this. It's hard to estimate how often we'd really need a blocking wait in practice, though. > I'd say > it should be sufficient to implement non-busy waiting in the flattened > variant. Sounds fine to me. > --- libgomp/ordered.c.jj 2015-09-18 18:36:42.0 +0200 > +++ libgomp/ordered.c 2015-09-24 18:20:28.286244397 +0200 > @@ -252,14 +254,146 @@ GOMP_ordered_end (void) > { > } > > +/* DOACROSS initialization. 
*/ > + > +#define MAX_COLLAPSED_BITS (__SIZEOF_LONG__ * __CHAR_BIT__) > + > +void > +gomp_doacross_init (unsigned ncounts, long *counts, long chunk_size) > +{ > + struct gomp_thread *thr = gomp_thread (); > + struct gomp_team *team = thr->ts.team; > + struct gomp_work_share *ws = thr->ts.work_share; > + unsigned int i, bits[MAX_COLLAPSED_BITS], num_bits = 0; > + unsigned long ent, num_ents, elt_sz, shift_sz; > + struct gomp_doacross_work_share *doacross; > + > + if (team == NULL || team->nthreads == 1) > +return; > + > + for (i = 0; i < ncounts; i++) > +{ > + /* If any count is 0, GOMP_doacross_{post,wait} can't be called. */ > + if (counts[i] == 0) > + return; > + > + if (num_bits <= MAX_COLLAPSED_BITS) > + { > + unsigned int this_bits; > + if (counts[i] == 1) > + this_bits = 1; > + else > + this_bits = __SIZEOF_LONG__ * __CHAR_BIT__ > + - __builtin_clzl (counts[i] - 1); > + if (num_bits + this_bits <= MAX_COLLAPSED_BITS) > + { > + bits[i] = this_bits; > + num_bits += this_bits; > + } > + else > + num_bits = MAX_COLLAPSED_BITS + 1; > + } > +} > + > + if (ws->sched == GFS_STATIC) > +num_ents = team->nthreads; > + else > +num_ents = (counts[0] - 1) / chunk_size + 1; > + if (num_bits <= MAX_COLLAPSED_BITS) > +{ > + elt_sz = sizeof (unsigned long); > + shift_sz = ncounts * sizeof (unsigned int); > +} > + else > +{ > + elt_sz = sizeof (unsigned long) * ncounts; > + shift_sz = 0; > +} > +
Re: RFC: PATCH for front end parts of C++ transactional memory TS
On Fri, 2015-10-02 at 14:13 -0400, Jason Merrill wrote: > The patch also doesn't attempt to do anything about the library. The > second patch sets transaction_safe on various built-ins, but without the > library support this just means references to undefined symbols. For some of the builtins, we already have functions in the libitm ABI: malloc and free, memcpy, memmove, memset. See libitm/libitm.h. Can you transform those the same as is already done for the respective C functions? Other builtins should probably just be transaction_pure: abort, I believe, and I guess the math functions too (although I haven't thought about FP exceptions at all so far). For other builtins, I guess we should either map them to the transactional variants of the respective C functions if we support those already (but the builtins might not change errno whereas the C functions might), or extend the libitm ABI with more _ITM_ functions. If we just need the builtins for libstdc++, just making the builtins tm_safe might be easier. Thoughts? > The > third patch is a beginning of libstdc++ support, which may or may not be > useful to Jonathan or Torvald in completing that support. I'd at least need guidance from Jon how to best integrate the annotations, and how to best ship transactional variants in libstdc++ (eg, do they go into a separate library because of TS vs. libstdc++?) I can add transactional versions for anything that the compiler doesn't take care of when provided with the source code. But I think those cases should be rather rare.
Re: [PATCH] [PING] [PR libitm/61164] Remove redefinition of glibc internal macro __always_inline
On Mon, 2015-08-17 at 13:16 +0200, Gleb Fotengauer-Malinovskiy wrote:
> On Sun, Aug 16, 2015 at 07:35:17PM +0200, Torvald Riegel wrote:
> > On Thu, 2015-06-11 at 14:36 +0300, Gleb Fotengauer-Malinovskiy wrote:
> > > On Fri, May 15, 2015 at 03:04:27PM +0200, Torvald Riegel wrote:
> > > > On Wed, 2015-05-06 at 17:54 +0300, Gleb Fotengauer-Malinovskiy wrote:
> > > > > 2015-05-06  Gleb Fotengauer-Malinovskiy  gle...@altlinux.org
> > > > >
> > > > > 	PR libitm/61164
> > > > > 	* local_atomic (__always_inline): Rename to...
> > > > > 	(__libitm_always_inline): ... this.
> > > > OK.  Thanks.
> > > You are welcome.  It seems still not applied, AFAICS.
> > I forgot to ask you at Cauldron whether you have completed a copyright
> > assignment agreement?
> No, I didn't.  I'm in progress.
> > I don't think this is small enough to be a trivial patch.
> A regular series of repeated changes, such as renaming a symbol, is not
> legally significant even if the symbol has to be renamed in many places.
> https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html#Legally-Significant

OK.  I committed this to trunk for you, with the (tiny change) marker in the ChangeLog as suggested on this GNU web page.
Re: [PATCH] [PING] [PR libitm/61164] Remove redefinition of glibc internal macro __always_inline
On Thu, 2015-06-11 at 14:36 +0300, Gleb Fotengauer-Malinovskiy wrote:
> On Fri, May 15, 2015 at 03:04:27PM +0200, Torvald Riegel wrote:
> > On Wed, 2015-05-06 at 17:54 +0300, Gleb Fotengauer-Malinovskiy wrote:
> > > 2015-05-06  Gleb Fotengauer-Malinovskiy  gle...@altlinux.org
> > >
> > > 	PR libitm/61164
> > > 	* local_atomic (__always_inline): Rename to...
> > > 	(__libitm_always_inline): ... this.
> > OK.  Thanks.
> You are welcome.  It seems still not applied, AFAICS.

I forgot to ask you at Cauldron whether you have completed a copyright assignment agreement?  I don't think this is small enough to be a trivial patch.
Re: [Patch libstdc++] Rewrite cpu/generic/atomic_word.h
On Fri, 2015-06-12 at 10:30 +0100, Ramana Radhakrishnan wrote:
> On Fri, Jun 12, 2015 at 10:06 AM, Jonathan Wakely jwak...@redhat.com wrote:
> > On 11/06/15 23:56 +0200, Torvald Riegel wrote:
> > > On Fri, 2015-05-22 at 12:37 +0100, Ramana Radhakrishnan wrote:
> > > > I don't think we can remove _GLIBCXX_READ_MEM_BARRIER and
> > > > _GLIBCXX_WRITE_MEM_BARRIER from atomic_word.h even though they are
> > > > superseded by the atomics as it is published in the documentation as
> > > > available macros.
> > > I see.  We should at least update the documentation of those, as the
> > > current one isn't a really portable specification.  If we can, I'd
> > > deprecate them.  Jonathan, what do you think?
> > Yes, I'm in favour of deprecating them.  They are GCC-specific anyway, so
> > there is no reason to prefer them to std::atomic_ or __atomic_ fences.
> I'll treat it as a follow-up.  Thanks.  Can I get an ack for this patch
> though?

The above comment was of a general nature, not specifically about this patch or meant as a precondition for this patch.  Sorry if that wasn't clear :)  To me, your patch looks good.
Re: [Patch libstdc++] Rewrite cpu/generic/atomic_word.h
On Tue, 2015-06-09 at 10:50 +0100, Ramana Radhakrishnan wrote:
> On 22/05/15 17:56, Torvald Riegel wrote:
> > On Fri, 2015-05-22 at 12:37 +0100, Ramana Radhakrishnan wrote:
> > > Hi,
> > >
> > > While writing atomic_word.h for the ARM backend to fix PR target/66200
> > > I thought it would make more sense to write it all up with atomic
> > > primitives instead of providing various fragile bits of inline
> > > assembler.  Thus this patch came about.  I intend to ask for a
> > > specialized version of this to be applied for the ARM backend in any
> > > case after testing completes.  If this goes in as a cleanup I expect
> > > all the ports to be deleting their atomic_word.h implementation in the
> > > various ports directories and I'll follow up with patches asking port
> > > maintainers to test and apply for each of the individual cases if this
> > > is deemed to be good enough.
> > Could you use C++11 atomics right away instead of adapting the wrappers?
> I spent some time trying to massage guard.cc into using C++11 atomics but
> it was just easier to write it with the builtins - I consider this to be
> a step improvement compared to where we are currently.

Fair enough.

> > > I don't think we can remove _GLIBCXX_READ_MEM_BARRIER and
> > > _GLIBCXX_WRITE_MEM_BARRIER from atomic_word.h even though they are
> > > superseded by the atomics as it is published in the documentation as
> > > available macros.

I see.  We should at least update the documentation of those, as the current one isn't a really portable specification.  If we can, I'd deprecate them.  Jonathan, what do you think?

> It was obvious that alpha, powerpc, aix, ia64 could just fall back to the
> standard implementations.  The cris and sparc implementations have
> different types for _Atomic_word from generic and the rest - my guess is
> sparc can remove the _GLIBCXX_READ_MEM_BARRIER and
> _GLIBCXX_WRITE_MEM_BARRIER but I have no way of testing this is sane.
> I'll leave that to the respective target maintainers for sparc and cris
> to deal with as appropriate.

LGTM.
Re: [Patch libstdc++] Rewrite cpu/generic/atomic_word.h
On Fri, 2015-05-22 at 12:37 +0100, Ramana Radhakrishnan wrote:
> Hi,
>
> While writing atomic_word.h for the ARM backend to fix PR target/66200 I
> thought it would make more sense to write it all up with atomic primitives
> instead of providing various fragile bits of inline assembler.  Thus this
> patch came about.  I intend to ask for a specialized version of this to be
> applied for the ARM backend in any case after testing completes.  If this
> goes in as a cleanup I expect all the ports to be deleting their
> atomic_word.h implementation in the various ports directories and I'll
> follow up with patches asking port maintainers to test and apply for each
> of the individual cases if this is deemed to be good enough.

Could you use C++11 atomics right away instead of adapting the wrappers?  I think the more non-C++11 atomic operations/wrappers we can remove the better.

> diff --git a/libstdc++-v3/config/cpu/generic/atomic_word.h b/libstdc++-v3/config/cpu/generic/atomic_word.h
> index 19038bb..bedce31 100644
> --- a/libstdc++-v3/config/cpu/generic/atomic_word.h
> +++ b/libstdc++-v3/config/cpu/generic/atomic_word.h
> @@ -29,19 +29,46 @@
>  #ifndef _GLIBCXX_ATOMIC_WORD_H
>  #define _GLIBCXX_ATOMIC_WORD_H 1
>
> +#include <bits/cxxabi_tweaks.h>
> +
>  typedef int _Atomic_word;
>
> -// Define these two macros using the appropriate memory barrier for the target.
> -// The commented out versions below are the defaults.
> -// See ia64/atomic_word.h for an alternative approach.
> +namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
> +{
> +  // Test the first byte of __g and ensure that no loads are hoisted across
> +  // the test.

That comment is not quite correct.  I'd just say that this is a memory_order_acquire load and a comparison.

> +  inline bool
> +  __test_and_acquire (__cxxabiv1::__guard *__g)
> +  {
> +    unsigned char __c;
> +    unsigned char *__p = reinterpret_cast<unsigned char *>(__g);
> +    __atomic_load (__p, &__c, __ATOMIC_ACQUIRE);
> +    return __c != 0;
> +  }
> +
> +  // Set the first byte of __g to 1 and ensure that no stores are sunk
> +  // across the store.
Likewise; I'd just say this is a memory_order_release store with 1 as value.

> +  inline void
> +  __set_and_release (__cxxabiv1::__guard *__g)
> +  {
> +    unsigned char *__p = reinterpret_cast<unsigned char *>(__g);
> +    unsigned char val = 1;
> +    __atomic_store (__p, &val, __ATOMIC_RELEASE);
> +  }
> +
> +#define _GLIBCXX_GUARD_TEST_AND_ACQUIRE(G) __gnu_cxx::__test_and_acquire (G)
> +#define _GLIBCXX_GUARD_SET_AND_RELEASE(G) __gnu_cxx::__set_and_release (G)
>
>  // This one prevents loads from being hoisted across the barrier;
>  // in other words, this is a Load-Load acquire barrier.

Likewise; I'd just say that this is an mo_acquire fence.

> -// This is necessary iff TARGET_RELAXED_ORDERING is defined in tm.h.
> -// #define _GLIBCXX_READ_MEM_BARRIER __asm __volatile ("":::"memory")
> +// This is necessary iff TARGET_RELAXED_ORDERING is defined in tm.h.
> +#define _GLIBCXX_READ_MEM_BARRIER __atomic_thread_fence (__ATOMIC_ACQUIRE)
>
>  // This one prevents stores from being sunk across the barrier; in other
>  // words, a Store-Store release barrier.

Likewise; mo_release fence.

> -// #define _GLIBCXX_WRITE_MEM_BARRIER __asm __volatile ("":::"memory")
> +#define _GLIBCXX_WRITE_MEM_BARRIER __atomic_thread_fence (__ATOMIC_RELEASE)
> +
> +}
>
>  #endif
Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.
On Fri, 2015-05-22 at 17:41 +0100, Matthew Wahab wrote: On 21/05/15 19:26, Torvald Riegel wrote: On Thu, 2015-05-21 at 16:45 +0100, Matthew Wahab wrote: On 19/05/15 20:20, Torvald Riegel wrote: On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote: Hello, On 15/05/15 17:22, Torvald Riegel wrote: This patch improves the documentation of the built-ins for atomic operations.

I think we're talking at cross-purposes and not really getting anywhere.  I've replied to some of your comments below, but it's mostly a restatement of points already made.

OK.  I have a few more comments below.

We seem to have different views about the purpose of the manual page.  I'm treating it as a description of the built-in functions provided by gcc to generate the code needed to implement the C++11 model.  That is, the built-ins are distinct from C++11 and their descriptions should be, as far as possible, independent of the methods used in the C++11 specification to describe the C++11 memory model.

OK.  But we'd need a *precise* specification of what they do if we'd want to make them separate from the C++11 memory model.  And we don't have that, would you agree?

There is a difference between the sort of description that is needed for a formal specification and the sort that would be needed for a programmer's manual.  The best example of this that I can think of is the Standard ML definition (http://sml-family.org).  That is a mathematical (so precise) definition that is invaluable if you want an unambiguous specification of the language.  But it's useless for anybody who just wants to use Standard ML to write programs.  For that, you need to go to the imprecise descriptions that are given in books about SML and in the documentation for SML compilers and libraries.  The problem with using the formal SML definition is the same as with using the formal C++11 definition: most of it is detail needed to make things in the formal specification come out the right way.
That detail, about things that are internal to the definition of the specification, makes it difficult to understand what is intended to be available for the user.

A relation like happens-before is user-facing.  It is how one reasons about ordering in a multi-threaded execution.  This isn't internal or for a corner-case like additional-synchronizes-with or one of the consistency rules.

The GCC manual seems to me to be aimed more at the people who want to use GCC to write code and I don't think that the patch makes much allowance for them.  I do think that more precise statements about the relationship to C++11 are useful to have.  It's the sort of constraint that ought to be documented somewhere.  But it seems to be more of interest to compiler writers or, at least, to users who are as knowledgeable as compiler writers.  A document targeting that group, such as the GCC internals or a GCC wiki-page, would seem to be a better place for the information.  (Another example of the distinction may be the Intel Itanium ABI documentation, which has a programmer's description of the synchronization primitives and a separate, formal description of their behaviour.)

For what it's worth, my view of how C++11, the __atomics and the machine code line up is that each is a distinct layer.  Each layer implements the requirements of the higher (more abstract) layer but is otherwise entirely independent.  That's why I think that a description of the __atomic built-ins, aimed at compiler users rather than writers and that doesn't expect knowledge of C++11, is desirable and possible.

I'm also concerned that the patch, by describing things in terms of formal C++11 concepts, makes it more difficult for people to know what the built-ins can be expected to do and so makes the built-ins more difficult to use [..]

I hadn't thought about that possible danger, but that would be right.
The way I would prefer to counter that is that we add a big fat warning to the __sync built-ins that we don't have a precise specification for them and that there are several corners of hand-waving and potentially further issues, and that this is another reason to prefer the __atomic built-ins.  PR 65697 etc. are enough indication for me that we indeed lack a proper specification.

Increasing uncertainty about the __sync built-ins wouldn't make people move to equally uncertain __atomic built-ins.  There's enough knowledge and use of the __sync builtins to make them a more comfortable choice than the C++11 atomics, and in the worst case it would push people to roll their own synchronization functions with assembler or system calls.

I don't buy that.  Sure, some people will be uncomfortable with anything.  But I don't see how "specified in C++11 and C11" is the same level of uncertainty as "we don't have a tight specification".  Users can pick
Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.
On Thu, 2015-05-21 at 16:45 +0100, Matthew Wahab wrote: On 19/05/15 20:20, Torvald Riegel wrote: On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote: Hello, On 15/05/15 17:22, Torvald Riegel wrote: This patch improves the documentation of the built-ins for atomic operations.

The memory model to memory order change does improve things but I think that the patch has some problems.  As it is now, it makes some of the descriptions quite difficult to understand and seems to assume more familiarity with details of the C++11 specification than might be expected.

I'd say that's a side effect of the C++11 memory model being the reference specification of the built-ins.

Generally, the memory order descriptions seem to be targeted towards language designers but don't provide for anybody trying to understand how to implement or to use the built-ins.

I agree that the current descriptions aren't a tutorial on the C++11 memory model.  However, given that the model is not GCC-specific, we aren't really in need to provide a tutorial, in the same way that we don't provide a C++ tutorial.  Users can pick the C++11 memory model educational material of their choice, and we need to document what's missing to apply the C++11 knowledge to the built-ins we provide.

We seem to have different views about the purpose of the manual page.  I'm treating it as a description of the built-in functions provided by gcc to generate the code needed to implement the C++11 model.  That is, the built-ins are distinct from C++11 and their descriptions should be, as far as possible, independent of the methods used in the C++11 specification to describe the C++11 memory model.

OK.  But we'd need a *precise* specification of what they do if we'd want to make them separate from the C++11 memory model.  And we don't have that, would you agree?  It's also not a trivial task, so I wouldn't be optimistic that someone would offer to write such a specification, and have it cross-checked.
I understand of course that the __atomics were added in order to support C++11 but that doesn't make them part of C++11 and, since __atomic functions can be made available when C11/C++11 may not be, it seems to make sense to try for stand-alone descriptions. The compiler can very well provide the C++11 *memory model* without creating any dependency on the other language/library pieces of C++11 or C11. Prior to C++11, multi-threaded executions were not defined by the standard, so we're not conflicting with anything in prior language standards, right? Another way to see this is to say that we just *copy* the C++11 memory model and use it as the memory model that specifies the behavior of the atomic built-ins. That additionally frees us from having to come up with and maintain our GCC-specific specification of atomics and a memory model. I'm also concerned that the patch, by describing things in terms of formal C++11 concepts, makes it more difficult for people to know what the built-ins can be expected to do and so make the built-in more difficult to use There is a danger that rather than take a risk with uncertainty about the behaviour of the __atomics, people will fall-back to the __sync functions simply because their expected behaviour is easier to work out. I hadn't thought about that possible danger, but that would be right. The way I would prefer to counter that is that we add a big fat warning to the __sync built-ins that we don't have a precise specification for them and that there are several corners of hand-waving and potentially further issues, and that this is another reason to prefer the __atomic built-ins. PR 65697 etc. are enough indication for me that we indeed lack a proper specification. I don't think that linking to external sites will help either, unless people already want to know C++11. Anybody who just wants to (e.g.) add a memory barrier will take one look at the __sync manual page and use the closest match from there instead. 
Well, "just wants to add a memory barrier" is the start of the problem.  In the same way that one needs to understand a hardware memory model to pick the right HW instruction(s), one needs to understand a programming language memory model to pick a fence and understand its semantics.  Note that none of this requires a tutorial of any kind.

I'm just suggesting that the manual should describe what behaviour should be expected of the code generated for the functions.  For the memory orders, that would mean describing what constraints need to be met by the generated code.

I'd bet that if one describes these constraints correctly, you'll get a large document -- even if one removes any introductory or explanatory parts that could make it a tutorial.  It's fairly straight-forward to describe several simple usage patterns of the atomics (e.g., seq-cst ones, simple acquire/release
Re: [PATCH] add self-tuning to x86 hardware fast path in libitm
On Mon, 2015-05-18 at 23:27 -0400, Nuno Diegues wrote: On Mon, May 18, 2015 at 5:29 PM, Torvald Riegel trie...@redhat.com wrote:

Are there better options for the utility function, or can we tune it to be less affected by varying txn length and likelihood of txnal vs. nontxnal code?  What are the things we want to avoid with the tuning?  I can think of:
* Not needing to wait for serial txns, or being aborted by a serial txn.
* Not retrying using HTM too much so that the retry time is larger than the scalability we actually gain by having other txns commit concurrently.

Yes, those are the key points we want to make sure that do not happen.

Anything else?  Did you consider other utility functions during your research?

The txnal vs nontxnal is indeed a completely different story.  To account for this we would need extra book-keeping to count only cycles spent inside txnal code.  So this would require every thread (or a sample of threads) to perform a rdtsc (or equivalent) on every begin/end call rather than the current approach of a single rdtsc per optimization round.  With this type of online optimization we found that the algorithm had to be very simple and cheap to execute.  RDTSC was a good finding to fit this, and it enabled us to obtain gains.  Other time sources failed to do so.  I do not have, out of the box, a good alternative to offer.  I suppose it would take some iterations of thinking/experimenting with, just like with any research problem :)

So let's iterate on this in parallel with the other changes that we need to get in place.  I'd prefer to have some more confidence that measuring txn throughput in epochs is the best way forward.  Here are a few thoughts:

Why do we actually need to measure succeeded transactions?  If a HW txn is just getting committed without interfering with anything else, is this really different from, say, two HW txns getting committed?  Aren't we really just interested in wasted work, or delayed txns?
That may help taking care of the nontxnal vs. txnal problem. Measuring committed txns during a time that might otherwise be spent by a serial txns could be useful to figure out how much other useful work a serial txn prevents. But we'd see that as well if we'd just go serial during the auto-tuning because then concurrent txns will have to wait; and in this case we could measure it in the slow path of those concurrent txns (ie, during waiting, where measurement overhead wouldn't matter as much). If a serial txn is running, concurrent txns (that wait) are free to sync and tell the serial how much cost it creates for the concurrent txns. There, txn length could matter, but we won't find out for real until after the concurrent ones have run (they could be pretty short, so we can't simply assume that the concurrent ones are as long as the serial one, so that simply the number of concurrent ones can be used to calculate delayed work). Also, note that the mitigation strategy for rdtsc short-comings that you mention in the paper is not applicable in general, specifically not in libitm. I suppose you mean the preemption of threads inflating the cycles measured? Yes, preemption and migration of threads (in case there's no global sync'ed TSC or similar) -- you mentioned in the paper that you pin threads to cores... This would be similarly a problem to any time source that tries to measure the amount of work performed; not sure how we can avoid it in general. Any thoughts? Not really as long as we keep depending on measuring time in a light-weight way. Measuring smaller time intervals could make it less likely that preemption happens during such an interval, though. Another issue is that we need to implement the tuning in a portable way. You currently make it depend on whether the assembler knows about RTM, whereas the rest of the code makes this more portable and defers to arch-specific files. I'd prefer if we could use the tuning on other archs too. 
But for that, we need to cleanly separate generic from arch-specific parts.  That affects magic numbers as well as things like rdtsc().

Yes, I refrained from adding new calls to the arch-specific files, to contain the changes mainly.  But that is possible and that's part of the feedback I was hoping to get.

OK.  Let me know if you want further input regarding this.

I'm wondering about whether it really makes sense to treat XABORT like conflicts and other abort reasons, instead of like capacity aborts.  Perhaps we need to differentiate between the explicit abort codes glibc uses, so that we can distinguish between cases where an abort is supposed to signal incompatibility with txnal execution and cases where it's just used for lock elision (see sysdeps/unix/sysv/linux/x86/hle.h in current glibc):

#define _ABORT_LOCK_BUSY	0xff
#define _ABORT_LOCK_IS_LOCKED	0xfe
#define _ABORT_NESTED_TRYLOCK	0xfd

I am
Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.
On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote: Hello, On 15/05/15 17:22, Torvald Riegel wrote: This patch improves the documentation of the built-ins for atomic operations.

The memory model to memory order change does improve things but I think that the patch has some problems.  As it is now, it makes some of the descriptions quite difficult to understand and seems to assume more familiarity with details of the C++11 specification than might be expected.

I'd say that's a side effect of the C++11 memory model being the reference specification of the built-ins.

Generally, the memory order descriptions seem to be targeted towards language designers but don't provide for anybody trying to understand how to implement or to use the built-ins.

I agree that the current descriptions aren't a tutorial on the C++11 memory model.  However, given that the model is not GCC-specific, we aren't really in need to provide a tutorial, in the same way that we don't provide a C++ tutorial.  Users can pick the C++11 memory model educational material of their choice, and we need to document what's missing to apply the C++11 knowledge to the built-ins we provide.  There are several resources for implementers, for example the mappings maintained by the Cambridge research group.  I guess it would be sufficient to have such material on the wiki.  Is there something specific that you'd like to see documented for implementers?

Adding a less formal, programmer's view to some of the descriptions would help.

Yes, probably.  However, I'm not aware of a good C++11 memory model tutorial that we could link to (OTOH, I haven't really looked for one).  I don't plan to write one, and doing so would certainly take some time and require *much* more text than what we have now.

That implies the descriptions would be more than just illustrative, but I'd suggest that would be appropriate for the GCC manual.
The danger I see there is that if we claim to define semantics instead of just illustrating them, then we need to do a really good job of defining/explaining them.  I worry that we may end up replicating what's in the standard or Batty et al.'s formalization of the memory model, and having to give a tutorial too.  I'm not sure anyone is volunteering to do that amount of work.

I'm also not sure about the use of C++11 terms in some of the descriptions.  In particular, using happens-before seems wrong because happens-before isn't described anywhere in the GCC manual and because it has a specific meaning in the C++11 specification that doesn't apply to the GCC built-ins (which C++11 doesn't know about).

I agree it's not described in the manual, but we're implementing C++11.  However, I don't see why happens-before semantics wouldn't apply to GCC's implementation of the built-ins; there may be cases where we guarantee more, but if one uses the builtins in a way allowed by the C++11 model, one certainly gets behavior and happens-before relationships as specified by C++11.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 6004681..5b2ded8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8853,19 +8853,19 @@ are not prevented from being speculated to before the barrier.
[...]
 If the data type size maps to one
-of the integral sizes that may have lock free support, the generic
-version uses the lock free built-in function.  Otherwise an
+of the integral sizes that may support lock-freedom, the generic
+version uses the lock-free built-in function.  Otherwise an
 external call is left to be resolved at run time.

= This is a slightly awkward sentence.  Maybe it could be replaced with something on the lines of "The generic function uses the lock-free built-in function when the data-type size makes that possible, otherwise an external call is left to be resolved at run-time."
= Changed to: "It uses the lock-free built-in function if the specific data type size makes that possible; otherwise, an external call is left to be resolved at run time."

-The memory models integrate both barriers to code motion as well as
-synchronization requirements with other threads.  They are listed here
-in approximately ascending order of strength.
+An atomic operation can both constrain code motion by the compiler and
+be mapped to a hardware instruction for synchronization between threads
+(e.g., a fence).
[...]

= This is a little unclear (and inaccurate, aarch64 can use two instructions for fences).  I also thought that atomic operations constrain code motion by the hardware.  Maybe break the link with the compiler and hardware: "An atomic operation can both constrain code motion and act as a synchronization point between threads." =

I removed "by the compiler" and used hardware instruction_s_.

 @table @code
 @item __ATOMIC_RELAXED
-No barriers or synchronization.
+Implies no inter-thread ordering constraints.  It may
Re: [PATCH] add self-tuning to x86 hardware fast path in libitm
On Wed, 2015-04-29 at 23:23 -0400, Nuno Diegues wrote: Hello, I have taken the chance to improve the patch by addressing the comments above in this thread. Namely: - to use a simple random generator managed inside the library only - removed floating point usage and replaced by fixed arithmetic - added some comments where relevant First of all, sorry for taking so long to review this. Thank you for the contribution. There are several high-level changes that I'd like to see (or be convinced of that those aren't necessary). I'll discuss these first. I'll have more detailed comments inlined in the patch. My major concern is about rdtsc being used. The relation to frequency adaption that Andi mentioned is one issue, but the bigger issue to me is that the runtime of transactions might vary a lot, and that the relation of txnal vs. nontxnal code executed in a period might vary a lot too. Are there better options for the utility function, or can we tune it to be less affected by varying txn length and likelihood of txnal vs. nontxnal code? What are the things we want to avoid with the tuning? I can think of: * Not needing to wait for serial txns, or being aborted by a serial txn. * Not retrying using HTM too much so that the retry time is larger than the scalability we actually gain by having other txns commit concurrently. Anything else? Did you consider other utility functions during your research? Also, note that the mitigation strategy for rdtsc short-comings that you mention in the paper is not applicable in general, specifically not in libitm. Another issue is that we need to implement the tuning in a portable way. You currently make it depend on whether the assembler knows about RTM, whereas the rest of the code makes this more portable and defers to arch-specific files. I'd prefer if we could use the tuning on other archs too. But for that, we need to cleanly separate generic from arch-specific parts. That affects magic numbers as well as things like rdtsc(). 
Generally, the patch needs to include more comments. Most importantly, we need to document why we implemented things the way we did, or do tuning the way we do. In cases where the intent isn't trivial to see nor the code is trivial, explain the intent or reference the place where you document the big picture. A good rule of thumb is adding comments whenever it's not simple to see why we use one of several possible implementation alternatives. The magic numbers being used are a good example. Make them constants and don't just put them directly in the code, and then add documentation why you chose this number and why it's okay to make that choice. If it isn't necessarily okay (eg, other archs, future systems), but you don't have a better solution right now, add something like ??? Not ideal because of XYZ.. If there's a source for some of the magic numbers, cite it (e.g., [28] in your paper might be one for the tuning thresholds, I guess). Reoptimizing only in a specific, fixed thread is insufficient in the general case. There may be a thread that only runs an initial txn and then never another one; if this happens to be the last thread registered, you'll never get any tuning. If we want the tuning to be effective, one of the threads *actively* running txns needs to tune eventually, always. Depending on that, you'll probably have to change the sync between the tuning thread and others. Serial mode may be a simple way to do that. It may be possible to only check for tuning being necessary when in serial mode. It could be possible that we end up trying HTM too often yet never go to serial mode; however, that seems unlikely to me (but I haven't thought thoroughly about it). Also, please remember that only data-race-free code is strictly correct C++ code (the only exception being the validated-loads special case in the STM implementations, for which C++ doesn't have support yet (but for which proposals exist)). Thus, use atomic operations accordingly. 
That might create another issue though in that you can't assume 64b atomic ops to be supported on all archs.  Maybe using serial mode for tuning doesn't trigger that issue though.

I'm wondering about whether it really makes sense to treat XABORT like conflicts and other abort reasons, instead of like capacity aborts.  Perhaps we need to differentiate between the explicit abort codes glibc uses, so that we can distinguish between cases where an abort is supposed to signal incompatibility with txnal execution and cases where it's just used for lock elision (see sysdeps/unix/sysv/linux/x86/hle.h in current glibc):

#define _ABORT_LOCK_BUSY	0xff
#define _ABORT_LOCK_IS_LOCKED	0xfe
#define _ABORT_NESTED_TRYLOCK	0xfd

Andi said that it would be nice to have an env var turning tuning on/off.  I agree in principle, yet would prefer to have the tuning be the default.  What about adding an htm_notune option in parse_default_method?  It would be even nicer to
Re: [PATCH] add self-tuning to x86 hardware fast path in libitm
On Mon, 2015-05-18 at 23:39 +0200, Andi Kleen wrote:
> > Are there better options for the utility function, or can we tune it to
> There is nothing better that isn't a lot slower.

Do you care to elaborate why? As is, I find this statement unconvincing; at the very least we need to document why we think that something time-based is the best option. Other tuning attempts have looked at rates of aborted, attempted, and committed txns, for example. Why do we measure the number of transactions in a whole period, and can't we get the same information by measuring smaller but more specific time intervals (e.g., how long we wait for a serial txn)?
Re: [PATCH] [PR libitm/61164] Remove redefinition of glibc internal macro __always_inline
On Wed, 2015-05-06 at 17:54 +0300, Gleb Fotengauer-Malinovskiy wrote:

2015-05-06  Gleb Fotengauer-Malinovskiy  gle...@altlinux.org

	PR libitm/61164
	* local_atomic (__always_inline): Rename to...
	(__libitm_always_inline): ... this.

OK. Thanks.
[PATCH] Fix memory orders description in atomic ops built-ins docs.
This patch improves the documentation of the built-ins for atomic operations. First, to be consistent with the C++11 standard and to make it easier for users to relate the built-ins to the C++11 specification, the patch changes 'memory model' to 'memory order'. The memory model is the whole thing, whereas memory orders are specific classes of ordering requirements that atomic operations are parametrized by. Second, the patch clarifies that the descriptions of the memory orders are for illustrative purposes only, and adds illustrative descriptions that are less incorrect and incomplete than what we had before. There is no way we can precisely summarize the C++ memory model in the docs, and there isn't really a need to either. OK for trunk?

2015-05-15  Torvald Riegel  trie...@redhat.com

	* doc/extend.texi (__atomic Builtins): Use 'memory order' instead
	of 'memory model' to align with C++11; fix description of memory
	orders; fix a few typos.

commit d29b0bbeecf2267085f4ecdbaa61a0f969665468
Author: Torvald Riegel trie...@redhat.com
Date: Fri May 15 18:14:40 2015 +0200

    Fix memory order description in atomic ops built-ins docs.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 6004681..5b2ded8 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -8853,19 +8853,19 @@ are not prevented from being speculated to before the barrier. @section Built-in Functions for Memory Model Aware Atomic Operations The following built-in functions approximately match the requirements -for C++11 concurrency and memory models. They are all +for the C++11 memory model. They are all identified by being prefixed with @samp{__atomic} and most are overloaded so that they work with multiple types. These functions are intended to replace the legacy @samp{__sync} -builtins. 
The main difference is that the memory order that is requested +is a parameter to the functions. New code should always use the @samp{__atomic} builtins rather than the @samp{__sync} builtins. Note that the @samp{__atomic} builtins assume that programs will -conform to the C++11 model for concurrency. In particular, they assume +conform to the C++11 memory model. In particular, they assume that programs are free of data races. See the C++11 standard for -detailed definitions. +detailed requirements. The @samp{__atomic} builtins can be used with any integral scalar or pointer type that is 1, 2, 4, or 8 bytes in length. 16-byte integral @@ -8875,136 +8875,140 @@ supported by the architecture. The four non-arithmetic functions (load, store, exchange, and compare_exchange) all have a generic version as well. This generic version works on any data type. If the data type size maps to one -of the integral sizes that may have lock free support, the generic -version uses the lock free built-in function. Otherwise an +of the integral sizes that may support lock-freedom, the generic +version uses the lock-free built-in function. Otherwise an external call is left to be resolved at run time. This external call is the same format with the addition of a @samp{size_t} parameter inserted as the first parameter indicating the size of the object being pointed to. All objects must be the same size. -There are 6 different memory models that can be specified. These map -to the C++11 memory models with the same names, see the C++11 standard +There are 6 different memory orders that can be specified. These map +to the C++11 memory orders with the same names, see the C++11 standard or the @uref{http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki on atomic synchronization} for detailed definitions. Individual -targets may also support additional memory models for use on specific +targets may also support additional memory orders for use on specific architectures. 
Refer to the target documentation for details of these. -The memory models integrate both barriers to code motion as well as -synchronization requirements with other threads. They are listed here -in approximately ascending order of strength. +An atomic operation can both constrain code motion by the compiler and +be mapped to a hardware instruction for synchronization between threads +(e.g., a fence). To which extent this happens is controlled by the +memory orders, which are listed here in approximately ascending order of +strength. The description of each memory order is only meant to roughly +illustrate the effects and is not a specification; see the C++11 +memory model for precise semantics. @table @code @item __ATOMIC_RELAXED -No barriers or synchronization. +Implies no inter-thread ordering constraints. @item __ATOMIC_CONSUME -Data dependency only for both barrier and synchronization with another -thread. +This is currently implemented using the stronger @code{__ATOMIC_ACQUIRE
Re: [patch] Fix shared_timed_mutex::try_lock_until() et al
There is a correctness issue related to mutex destruction. The added documentation is a good start, but I'd still add some more for the complicated pieces of reasoning. Details inline below.

On Tue, 2015-04-07 at 15:28 +0100, Jonathan Wakely wrote:

diff --git a/libstdc++-v3/include/std/shared_mutex b/libstdc++-v3/include/std/shared_mutex index ab1b45b..7391f11 100644 --- a/libstdc++-v3/include/std/shared_mutex +++ b/libstdc++-v3/include/std/shared_mutex @@ -268,7 +268,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #else // ! _GLIBCXX_USE_PTHREAD_RWLOCK_T +// Must use the same clock as condition_variable +typedef chrono::system_clock __clock_t; + #if _GTHREAD_USE_MUTEX_TIMEDLOCK +// Can't use std::timed_mutex with std::condition_variable, so define +// a new timed mutex type that derives from std::mutex. struct _Mutex : mutex, __timed_mutex_impl<_Mutex> { template<typename _Rep, typename _Period> @@ -285,16 +290,44 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION typedef mutex _Mutex; #endif -// Based on Howard Hinnant's reference implementation from N2406 - +// Based on Howard Hinnant's reference implementation from N2406. + +// The high bit of _M_state is the write-entered flag which is set to +// indicate a writer has taken the lock or is queuing to take the lock. +// The remaining bits are the count of reader locks. +// +// To take a reader lock block on gate1 while the write-entered flag is

I think this is easier to parse if you add a comma: "To take a reader lock, block ...". Same for other sentences below.

+// set or the maximum number of reader locks is held, then increment the +// reader lock count. +// To release decrement the count, then if the write-entered flag is set +// and the count is zero then signal gate2 to wake a queued writer, +// otherwise if the maximum number of reader locks was held signal gate1 +// to wake a reader. 
+// +// To take a writer lock block on gate1 while the write-entered flag is +// set, then set the write-entered flag to start queueing, then block on +// gate2 while the number of reader locks is non-zero. +// To release unset the write-entered flag and signal gate1 to wake all +// blocked readers and writers.

Perhaps it would also be useful to have a sentence on how writers and readers are prioritized (e.g., writers preferred if readers already hold the lock, all are equal when a writer unlocks, ...).

+ +// Only locked when accessing _M_state or waiting on condition variables. _Mutex _M_mut; +// Used to block while write-entered is set or reader count at maximum. condition_variable _M_gate1; +// Used to block queued writers while reader count is non-zero. condition_variable _M_gate2; +// The write-entered flag and reader count. unsigned _M_state; static constexpr unsigned _S_write_entered = 1U << (sizeof(unsigned)*__CHAR_BIT__ - 1); -static constexpr unsigned _M_n_readers = ~_S_write_entered; +static constexpr unsigned _S_n_readers = ~_S_write_entered;

Rename this to _S_max_readers or such?

+ +// Test whether the write-entered flag is set. _M_mut must be locked. +bool _M_write_entered() const { return _M_state & _S_write_entered; } + +// The number of reader locks currently held. _M_mut must be locked. +unsigned _M_readers() const { return _M_state & _S_n_readers; } public: shared_timed_mutex() : _M_state(0) {} @@ -313,11 +346,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION lock() { unique_lock<mutex> __lk(_M_mut); - while (_M_state & _S_write_entered) - _M_gate1.wait(__lk);

Add something like this:? // We first wait until we acquire the writer-part of the lock, and // then wait until no readers hold the lock anymore. 
+ _M_gate1.wait(__lk, [=]{ return !_M_write_entered(); }); _M_state |= _S_write_entered; - while (_M_state & _M_n_readers) - _M_gate2.wait(__lk); + _M_gate2.wait(__lk, [=]{ return _M_readers() == 0; }); } bool @@ -337,26 +368,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bool try_lock_for(const chrono::duration<_Rep, _Period>& __rel_time) { - unique_lock<_Mutex> __lk(_M_mut, __rel_time); - if (__lk.owns_lock() && _M_state == 0) - { - _M_state = _S_write_entered; - return true; - } - return false; + return try_lock_until(__clock_t::now() + __rel_time); } template<typename _Clock, typename _Duration> bool try_lock_until(const chrono::time_point<_Clock, _Duration>& __abs_time) { - unique_lock<_Mutex> __lk(_M_mut, __abs_time); - if (__lk.owns_lock() && _M_state == 0) + if (!_M_mut.try_lock_until(__abs_time)) + return false; I think a non-timed acquisition
Re: [PATCH][AArch64] Testcase fix for __ATOMIC_CONSUME
On Mon, 2015-02-09 at 09:10 -0800, Mike Stump wrote: On Feb 9, 2015, at 7:11 AM, Alex Velenko alex.vele...@arm.com wrote: The following patch makes atomic-op-consume.c XFAIL Is this patch ok? Ok. I’d shorten the comment above the xfail to be exceedingly short: /* PR59448 consume not implemented yet */ The reason is the brain can process this about 8x faster. Also, one can cut and paste the PR part into a web browser directly, or, if you have an electric bugzilla mode for emacs, it will pop right up. */ Given the discussions we had in ISO C++ SG1, it seems the only way to fix memory_order_consume is to deprecate it (or let it rot forever), and add something else to the standard. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4321.pdf IOW, I believe the promotion is here to stay. I'm not aware of any other implementation doing something else. Thus, XFAIL doesn't seem right to me.
[patch libstdc++] Add POSIX variant of shared_timed_mutex.
[Resend. Sorry for the noise on gcc@.]

This adds a POSIX-based implementation of shared_timed_mutex, using pthread_rwlock_* operations directly instead of implementing with mutexes and two condvars. This enables using an optimized pthread_rwlock_t on POSIX. Tested on x86_64-linux. OK?

2015-01-16  Torvald Riegel  trie...@redhat.com

	* include/std/shared_mutex (shared_timed_mutex): Add POSIX-based
	implementation.

commit e0a32ddb058d8b4dd563f89130d03bce220ace8c Author: Torvald Riegel trie...@redhat.com Date: Thu Jan 15 22:29:23 2015 +0100

    libstdc++: Add POSIX variant of shared_timed_mutex.

    * include/std/shared_mutex (shared_timed_mutex): Add POSIX-based implementation.

diff --git a/libstdc++-v3/include/std/shared_mutex b/libstdc++-v3/include/std/shared_mutex index 8bfede3..643768c 100644 --- a/libstdc++-v3/include/std/shared_mutex +++ b/libstdc++-v3/include/std/shared_mutex @@ -57,6 +57,188 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION /// shared_timed_mutex class shared_timed_mutex { +#if defined(__GTHREADS_CXX0X) +typedef chrono::system_clock __clock_t; + +pthread_rwlock_t _M_rwlock; + + public: +shared_timed_mutex() +{ + int __ret = pthread_rwlock_init(&_M_rwlock, NULL); + if (__ret == ENOMEM) + throw bad_alloc(); + else if (__ret == EAGAIN) + __throw_system_error(int(errc::resource_unavailable_try_again)); + else if (__ret == EPERM) + __throw_system_error(int(errc::operation_not_permitted)); + // Errors not handled: EBUSY, EINVAL + _GLIBCXX_DEBUG_ASSERT(__ret == 0); +} + +~shared_timed_mutex() +{ + int __ret __attribute__((unused)) = pthread_rwlock_destroy(&_M_rwlock); + // Errors not handled: EBUSY, EINVAL + _GLIBCXX_DEBUG_ASSERT(__ret == 0); +} + +shared_timed_mutex(const shared_timed_mutex&) = delete; +shared_timed_mutex& operator=(const shared_timed_mutex&) = delete; + +// Exclusive ownership + +void +lock() +{ + int __ret = pthread_rwlock_wrlock(&_M_rwlock); + if (__ret == EDEADLK) + __throw_system_error(int(errc::resource_deadlock_would_occur)); + // Errors not handled: EINVAL + 
_GLIBCXX_DEBUG_ASSERT(__ret == 0); +} + +bool +try_lock() +{ + int __ret = pthread_rwlock_trywrlock(&_M_rwlock); + if (__ret == EBUSY) return false; + // Errors not handled: EINVAL + _GLIBCXX_DEBUG_ASSERT(__ret == 0); + return true; +} + +template<typename _Rep, typename _Period> + bool + try_lock_for(const chrono::duration<_Rep, _Period>& __rel_time) + { + return try_lock_until(__clock_t::now() + __rel_time); + } + +template<typename _Duration> + bool + try_lock_until(const chrono::time_point<__clock_t, _Duration>& __atime) + { + auto __s = chrono::time_point_cast<chrono::seconds>(__atime); + auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s); + + __gthread_time_t __ts = + { + static_cast<std::time_t>(__s.time_since_epoch().count()), + static_cast<long>(__ns.count()) + }; + + int __ret = pthread_rwlock_timedwrlock(&_M_rwlock, &__ts); + // On self-deadlock, we just fail to acquire the lock. Technically, + // the program violated the precondition. + if (__ret == ETIMEDOUT || __ret == EDEADLK) + return false; + // Errors not handled: EINVAL + _GLIBCXX_DEBUG_ASSERT(__ret == 0); + return true; + } + +template<typename _Clock, typename _Duration> + bool + try_lock_until(const chrono::time_point<_Clock, _Duration>& __abs_time) + { + // DR 887 - Sync unknown clock to known clock. + const typename _Clock::time_point __c_entry = _Clock::now(); + const __clock_t::time_point __s_entry = __clock_t::now(); + const auto __delta = __abs_time - __c_entry; + const auto __s_atime = __s_entry + __delta; + return try_lock_until(__s_atime); + } + +void +unlock() +{ + int __ret __attribute__((unused)) = pthread_rwlock_unlock(&_M_rwlock); + // Errors not handled: EPERM, EBUSY, EINVAL + _GLIBCXX_DEBUG_ASSERT(__ret == 0); +} + +// Shared ownership + +void +lock_shared() +{ + int __ret = pthread_rwlock_rdlock(&_M_rwlock); + if (__ret == EDEADLK) + __throw_system_error(int(errc::resource_deadlock_would_occur)); + if (__ret == EAGAIN) + // Maximum number of read locks has been exceeded. 
+ __throw_system_error(int(errc::device_or_resource_busy)); + // Errors not handled: EINVAL + _GLIBCXX_DEBUG_ASSERT(__ret == 0); +} + +bool +try_lock_shared() +{ + int __ret = pthread_rwlock_tryrdlock(&_M_rwlock); + // If the maximum number of read locks has been exceeded, we just fail + // to acquire the lock. Unlike for lock(), we are not allowed to throw + // an exception. + if (__ret == EBUSY || __ret == EAGAIN) return false; + // Errors not handled: EINVAL + _GLIBCXX_DEBUG_ASSERT(__ret == 0); + return true; +} + +template<typename _Rep, typename _Period>
[patch libstdc++] Optimize synchronization in std::future if futexes are available.
This patch optimizes synchronization in std::future by using atomic operations and futexes if the latter are available, instead of the mutex/condvar combination in the existing code. That reduces the space overhead of futures as well as synchronization runtime overheads.

To do that, the patch introduces an abstraction layer for futexes that essentially extends an atomic-typed variable with operations that wait for the variable to (not) have a certain value. This waiting can then be implemented internally with a combination of spin-waiting (not implemented yet) and blocking using OS features such as futexes. This approach is similar to what the synchronic<T> proposal to ISO C++ (N4195) contains. The atomic-typed variable is an unsigned int because this is what Linux futexes currently support as futex variable. If futexes are not available, the implementation falls back to using a mutex and condvar.

This leads to very similar code for std::future compared to the existing code. The exception is that _State_baseV2::wait_for and _State_baseV2::wait_until may acquire the mutex twice if the future is not ready; this may lead to some additional contention if those functions actually have to wait for the future to become ready (ie, this is on the slow path).

It would be possible to optimize the space overhead in std::future further by merging _State_baseV2::_M_retrieved and _State_baseV2::_M_once into _State_baseV2::_M_status. This would add some complexity to the synchronization code, but not a lot. I'm happy to do this next week if people are interested in this.

Tested on x86_64-linux and the 30_threads/* tests. OK for trunk?

commit 4b07d1c0ab807fd0fbedacde1e9fec99a7d75b6d Author: Torvald Riegel trie...@redhat.com Date: Sun Nov 16 12:07:22 2014 +0100

    libstdc++: Optimize synchronization in std::future if futexes are available.

    * src/c++11/futex.cc: New file.
    * include/bits/atomic_futex.h: New file. 
* include/std/future (__future_base::_State_baseV2): Use atomic_futex_unsigned instead of mutex+condvar. * src/c++11/futex.cc: Likewise. * include/Makefile.am: Add atomic_futex.h. * include/Makefile.in: Likewise. * src/c++11/Makefile.am: Add futex.cc. * src/c++11/Makefile.in: Likewise. diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index 700da18..d8d155f 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -1759,6 +1759,11 @@ GLIBCXX_3.4.21 { _ZNKSt8time_getI[cw]St19istreambuf_iteratorI[cw]St11char_traitsI[cw]EEE3getES3_S3_RSt8ios_baseRSt12_Ios_IostateP2tmPK[cw]SC_; _ZNKSt8time_getI[cw]St19istreambuf_iteratorI[cw]St11char_traitsI[cw]EEE6do_getES3_S3_RSt8ios_baseRSt12_Ios_IostateP2tmcc; +extern C++ +{ + std::__atomic_futex_unsigned_base*; +}; + } GLIBCXX_3.4.20; diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am index 08e5d5f..4772950 100644 --- a/libstdc++-v3/include/Makefile.am +++ b/libstdc++-v3/include/Makefile.am @@ -83,6 +83,7 @@ bits_headers = \ ${bits_srcdir}/allocated_ptr.h \ ${bits_srcdir}/allocator.h \ ${bits_srcdir}/atomic_base.h \ + ${bits_srcdir}/atomic_futex.h \ ${bits_srcdir}/basic_ios.h \ ${bits_srcdir}/basic_ios.tcc \ ${bits_srcdir}/basic_string.h \ diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in index 3e5d82e..ebcaa96 100644 --- a/libstdc++-v3/include/Makefile.in +++ b/libstdc++-v3/include/Makefile.in @@ -351,6 +351,7 @@ bits_headers = \ ${bits_srcdir}/allocated_ptr.h \ ${bits_srcdir}/allocator.h \ ${bits_srcdir}/atomic_base.h \ + ${bits_srcdir}/atomic_futex.h \ ${bits_srcdir}/basic_ios.h \ ${bits_srcdir}/basic_ios.tcc \ ${bits_srcdir}/basic_string.h \ diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h new file mode 100644 index 000..9a418d8 --- /dev/null +++ b/libstdc++-v3/include/bits/atomic_futex.h @@ -0,0 +1,288 @@ +// -*- C++ -*- header. 
+ +// Copyright (C) 2015 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see
Re: [PATCH] PR59448 - Promote consume to acquire
On Wed, 2015-01-14 at 10:44 -0500, Andrew MacLeod wrote: I think this brings us to where we ought to be... at least almost :-) The latest version I have is n3337, which still specifies that atomic_clear can't be memory_order_acquire or memory_order_acq_rel. Has that been updated to specify that memory_order_consume is not allowed either? I think there was a request in at some point... I can add that if so. N3797 disallows mo_consume on atomic_flag::clear.
Re: [PATCH] PR59448 - Promote consume to acquire
On Tue, 2015-01-13 at 10:11 -0500, Andrew MacLeod wrote:

On 01/13/2015 09:59 AM, Richard Biener wrote:

On Tue, Jan 13, 2015 at 3:56 PM, Andrew MacLeod amacl...@redhat.com wrote:

Lengthy discussion: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

Basically we can generate incorrect code for an atomic consume operation in some circumstances. The general feeling seems to be that we should simply promote all consume operations to an acquire operation until there is a better definition/understanding of the consume model and how GCC can track it. I proposed a simple patch in the PR, and I have not seen or heard of any dissenting opinion. We should get this in before the end of stage 3, I think.

The problem with the patch in the PR is that the memory model is immediately promoted from consume to acquire. This happens *before* any of the memmodel checks are made. If a consume is illegally specified (such as in a compare_exchange), it gets promoted to acquire and the compiler doesn't report the error because it never sees the consume. This new patch simply makes the adjustment after any errors are checked on the originally specified model. It bootstraps on x86_64-unknown-linux-gnu and passes all regression testing. I also built an aarch64 compiler and it appears to issue the LDAR as specified in the PR, but anyone with a vested interest really ought to check it out with a real build to be sure. OK for trunk?

Why not patch get_memmodel? (not sure if that catches all cases)

Richard.

That was the original patch. The issue is that it promotes consume to acquire before any error checking gets to look at the model, so then we allow illegal specification of consume. (It actually triggers a failure in the testsuite.)

(This is the test: gcc/testsuite/gcc.dg/atomic-invalid.c)

The documentation of the atomic builtins also disallows mo_consume on atomic_exchange. However, I don't see any such requirement in C11 or C++14 (and I'd be surprised to see it in C++11). 
It would be surprising also because for other atomic read-modify-write operations (e.g., fetch_add), we don't make such a requirement in the builtins docs -- and atomic_exchange is just a read-modify-write with a noop, basically. Does anyone remember why this requirement for no consume on exchange was added, or sees a reason to keep it? If not, I think we should drop it. This would solve the testsuite failure for Andrew. Dropping it would prevent GCC from checking the consume-on-success / acquire-on-failure case for compare_exchange I mentioned previously, but I think that this is pretty harmless. I could imagine that, for some reason, either backends or libatomic do not implement consume on atomic_exchange just because the docs disallowed it -- but I haven't checked that.
Re: [PATCH] PR59448 - Promote consume to acquire
On Tue, 2015-01-13 at 09:56 -0500, Andrew MacLeod wrote: The problem with the patch in the PR is the memory model is immediately promoted from consume to acquire. This happens *before* any of the memmodel checks are made. If a consume is illegally specified (such as in a compare_exchange), it gets promoted to acquire and the compiler doesn't report the error because it never sees the consume. The only issue I can think of in compare_exchange is if the program specifies memory_order_consume for the success path but memory_order_acquire for the failure path, which is disallowed by the standard. However, I don't see a reason why the standard's requirement is anything but a performance check in our particular case. The only case we prevent the compiler from reporting is a consume-on-success / acquire-on-failure combination. But we upgrade the former to acquire, so we can't even cause libatomic (or similar) to issue too weak barriers due to libatomic relying on the standard's requirement. Thus, if there's no easy way to upgrade to acquire after the sanity checks, I think this isn't critical enough to hold up the patch from being committed. memory_order_consume is clearly a feature for experts.
[patch] [WIP] Optimize synchronization in std::future if futexes are available.
This WORK-IN-PROGRESS patch uses an atomic unsigned and futex operations to optimize the synchronization code in std::future. The current code uses a mutex/condvar combination, which is both slower (e.g., due to mutex contention, stronger ordering requirements for condvars, using an additional condvar-internal mutex, ...) and makes std::future fairly large.

It introduces an __atomic_futex_unsigned type, which provides basic atomic operations (load and store) on an atomic<unsigned> and additionally provides load_when_[not_]equal operations that do blocking waits on the atomic -- pretty much what futexes do. Such an __atomic_futex_unsigned is then

There are a few bits missing in this patch:
* A fallback implementation for platforms that don't provide futexes, in the form of a different implementation of __atomic_futex_unsigned. A mutex+condvar combination is what I'm aiming at; for std::future, this would lead to similar code and sizeof(std::future) as currently.
* More documentation of how the synchronization works. Jonathan has a patch in flight for that, so this should get merged.
* Integration with the on_thread_exit patch that Jonathan posted, which uses the current, lock-based synchronization scheme and thus needs to get adapted.
* Testing.

There are ways to optimize further, I suppose, for example by letting the __atomic_futex_unsigned take care of all current uses of call_once too. Let me know if I should do that. This would reduce the number of atomic ops a little in some cases, as well as reduce the space required for futures a little.

Comments?

commit 1543313a3b590e6422f5d547cabd6662e0a6f538 Author: Torvald Riegel trie...@redhat.com Date: Sun Nov 16 12:07:22 2014 +0100

    [WIP] Optimize synchronization in std::future if futexes are available. 
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am index e6edc73..81e6a8b 100644 --- a/libstdc++-v3/include/Makefile.am +++ b/libstdc++-v3/include/Makefile.am @@ -83,6 +83,7 @@ bits_headers = \ ${bits_srcdir}/allocated_ptr.h \ ${bits_srcdir}/allocator.h \ ${bits_srcdir}/atomic_base.h \ + ${bits_srcdir}/atomic_futex.h \ ${bits_srcdir}/basic_ios.h \ ${bits_srcdir}/basic_ios.tcc \ ${bits_srcdir}/basic_string.h \ diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in index 2ade448..2d72de3 100644 --- a/libstdc++-v3/include/Makefile.in +++ b/libstdc++-v3/include/Makefile.in @@ -350,6 +350,7 @@ bits_headers = \ ${bits_srcdir}/allocated_ptr.h \ ${bits_srcdir}/allocator.h \ ${bits_srcdir}/atomic_base.h \ + ${bits_srcdir}/atomic_futex.h \ ${bits_srcdir}/basic_ios.h \ ${bits_srcdir}/basic_ios.tcc \ ${bits_srcdir}/basic_string.h \ diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h new file mode 100644 index 000..4b5664d --- /dev/null +++ b/libstdc++-v3/include/bits/atomic_futex.h @@ -0,0 +1,175 @@ +// -*- C++ -*- header. + +// Copyright (C) 2014 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. 
+ +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. + +/** @file bits/atomic_futex.h + * This is an internal header file, included by other library headers. + * Do not attempt to use it directly. + */ + +#ifndef _GLIBCXX_ATOMIC_FUTEX_H +#define _GLIBCXX_ATOMIC_FUTEX_H 1 + +#pragma GCC system_header + +#include <bits/c++config.h> +#include <atomic> +#include <mutex> +#include <condition_variable> + +#ifndef _GLIBCXX_ALWAYS_INLINE +#define _GLIBCXX_ALWAYS_INLINE inline __attribute__((always_inline)) +#endif + +namespace std _GLIBCXX_VISIBILITY(default) +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + +//#ifdef _GLIBCXX_USE_FUTEX + struct __atomic_futex_unsigned_base + { +// Returns false iff a timeout occurred. +static bool +futex_wait_for(unsigned *addr, unsigned val, bool has_timeout, + chrono::seconds s, chrono::nanoseconds ns); + +// This is static because it can be executed after the object has been +// destroyed. +static void futex_notify_all(unsigned* addr); + }; + + template <unsigned _Waiter_bit = 0x8000
Re: [PATCH v3] PR libitm/61164: use always_inline consistently
On Mon, 2014-09-29 at 16:53 +0400, Gleb Fotengauer-Malinovskiy wrote:

-#undef __always_inline
-#define __always_inline __attribute__((always_inline))
+#define __libitm_always_inline inline __attribute__((always_inline))

The previous code seems to work in libstdc++. I believe that eventually, we'll want to use libstdc++-v3/include/bits/atomic_base.h (see the comment at the top of the file). Can we keep the diff between the two files small?
Re: [PATCH v3] PR libitm/61164: use always_inline consistently
On Mon, 2014-09-29 at 17:38 +0200, Jakub Jelinek wrote:
On Mon, Sep 29, 2014 at 05:35:24PM +0200, Torvald Riegel wrote:
On Mon, 2014-09-29 at 16:53 +0400, Gleb Fotengauer-Malinovskiy wrote:

-#undef __always_inline
-#define __always_inline __attribute__((always_inline))
+#define __libitm_always_inline inline __attribute__((always_inline))

The previous code seems to work in libstdc++. I believe that eventually, we'll want to use libstdc++-v3/include/bits/atomic_base.h (see the comment at the top of the file). Can we keep the diff between the two files small?

libstdc++-v3/include/bits/atomic_base.h uses the _GLIBCXX_ALWAYS_INLINE macro:

#ifndef _GLIBCXX_ALWAYS_INLINE
#define _GLIBCXX_ALWAYS_INLINE inline __attribute__((always_inline))
#endif

Ahh.. I missed that inline in there when I had a quick look. Sorry for the noise :)
[COMMITTED] Add myself as maintainer for libitm.
On Mon, 2014-03-03 at 09:24 -0800, Richard Henderson wrote: On 03/03/2014 04:48 AM, Torvald Riegel wrote: Should I add myself as maintainer for libitm? Yes. Committed as r210448. commit 4da9024845f11053b56aa1318469029b044ff6d1 Author: Torvald Riegel trie...@redhat.com Date: Thu May 15 00:31:03 2014 +0200 Add myself as maintainer for libitm. diff --git a/MAINTAINERS b/MAINTAINERS index ae3d330..8d094be 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -174,6 +174,7 @@ libgomp Richard Henderson r...@redhat.com libgomp Jakub Jelinek ja...@redhat.com libiberty DJ Delorie d...@redhat.com libiberty Ian Lance Taylor i...@airs.com +libitm Torvald Riegel trie...@redhat.com libffi testsuite Andreas Tobler andre...@gcc.gnu.org libobjc Nicola Pero nicola.p...@meta-innovation.com libobjc Andrew Pinski pins...@gmail.com
Re: [PATCH] Fix libitm futex handling on non-x86/ppc/sh/alpha targets
On Wed, 2014-03-26 at 22:19 +0100, Jakub Jelinek wrote: Hi! The sys_futex0 caller expects return values as returned by raw syscalls, i.e. value >= 0 means success, and negative values are errors (-errorval). But the syscall function returns value >= 0 on success, and -1 on error, with errno set to errorval. This means if e.g. the futex syscall fails with EAGAIN, and EPERM is 1, we get the GTM program terminated because we think futex returned -EPERM. Fixed thusly, bootstrapped/regtested on s390x-linux, ok for trunk/4.8? Looks good to me. Thanks.
Re: [PATCH, LIBITM] Backport libitm bug fixes to FSF 4.8
On Fri, 2014-02-28 at 19:32 -0600, Peter Bergner wrote: I'd like to ask for permission to backport the following two LIBITM bug fixes to the FSF 4.8 branch. Although these are not technically fixing regressions, they do fix the libitm.c/reentrant.c testsuite failure on s390 and powerpc (or at least it will when we finally get our power8 code backported to FSF 4.8). It also fixes a real bug on x86 that is latent because we don't currently have a test case that warms up the x86's RTM hardware enough such that its xbegin succeeds exposing the bug. I'd like this backport so that the 4.8 based distros won't need to carry this as an add-on patch. It should also be fairly safe as well, since the fixed code is limited to the arches (x86, s390 and powerpc) that define USE_HTM_FASTPATH, so all others definitely won't see a difference. Looks good to me. I'll note I CC'd some of the usual suspects interested in TM as well as the normal RMs, because LIBITM doesn't seem to have a maintainer or reviewer listed in the MAINTAINERS file. Is that an oversight or??? I'm reviewing all libitm patches that I'm aware of (but I don't read gcc-patches regularly). Should I add myself as maintainer for libitm? Does this come with any other responsibilities than reviewing patches? Torvald
Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.
On Mon, 2013-08-26 at 09:49 -0700, Richard Henderson wrote: On 08/22/2013 02:57 PM, Torvald Riegel wrote: On Thu, 2013-08-22 at 12:05 -0700, Richard Henderson wrote: On 08/22/2013 11:39 AM, Torvald Riegel wrote:
+  /* Store edi for future HTM fast path retries.  We use a stack slot
+     lower than the jmpbuf so that the jmpbuf's rip field will overlap
+     with the proper return address on the stack.  */
+  movl %edi, -64(%rsp)
You haven't allocated the stack frame here, and you're storing outside the redzone. This is invalid. Two possibilities: (1) always allocate the stack frame on entry to the function (adds two register additions to the htm fast path -- in the noise i'd think); (2) store the edi value in the non-htm path, with the pr_HTMRetryableAbort bit or'd in (adds an extra store to the non-htm path; probably noise). You'd want to mask out that bit when you reload it. Oops. Picked fix (2). Better now? Move the andl of edi down into the HAVE_AS_RTM block, above the orl of HTMRetriedAfterAbort. Ok with that change. Committed as r202101.
Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.
On Fri, 2013-08-30 at 16:49 +0200, Rainer Orth wrote: Torvald Riegel trie...@redhat.com writes: On Mon, 2013-08-26 at 09:49 -0700, Richard Henderson wrote: On 08/22/2013 02:57 PM, Torvald Riegel wrote: On Thu, 2013-08-22 at 12:05 -0700, Richard Henderson wrote: On 08/22/2013 11:39 AM, Torvald Riegel wrote:
+  /* Store edi for future HTM fast path retries.  We use a stack slot
+     lower than the jmpbuf so that the jmpbuf's rip field will overlap
+     with the proper return address on the stack.  */
+  movl %edi, -64(%rsp)
You haven't allocated the stack frame here, and you're storing outside the redzone. This is invalid. Two possibilities: (1) always allocate the stack frame on entry to the function (adds two register additions to the htm fast path -- in the noise i'd think); (2) store the edi value in the non-htm path, with the pr_HTMRetryableAbort bit or'd in (adds an extra store to the non-htm path; probably noise). You'd want to mask out that bit when you reload it. Oops. Picked fix (2). Better now? Move the andl of edi down into the HAVE_AS_RTM block, above the orl of HTMRetriedAfterAbort. Ok with that change. Committed as r202101. The patch has broken Solaris/x86 bootstrap: Sorry about that. I'll try to remember to test on non-Linux / non-futex systems too next time. I committed the attached patch as r202116. (Please note that I'll be offline for two weeks starting in about an hour (yes, bad timing), so I hope I didn't miss anything else...) Torvald

commit 29006efcecde6ec0366f6c88a662d30b9efb931f
Author: torvald torvald@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Fri Aug 30 17:13:05 2013 +0000

    libitm: Fix wrong initialization order introduced with r202101.

    * config/posix/rwlock.cc: Fix initialization order.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@202116 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libitm/config/posix/rwlock.cc b/libitm/config/posix/rwlock.cc
index 488e9c2..61b6ad9 100644
--- a/libitm/config/posix/rwlock.cc
+++ b/libitm/config/posix/rwlock.cc
@@ -30,11 +30,11 @@ namespace GTM HIDDEN {
 // ??? Move this back to the header file when constexpr is implemented.
 gtm_rwlock::gtm_rwlock()
-  : mutex (PTHREAD_MUTEX_INITIALIZER),
+  : summary (0),
+    mutex (PTHREAD_MUTEX_INITIALIZER),
     c_readers (PTHREAD_COND_INITIALIZER),
     c_writers (PTHREAD_COND_INITIALIZER),
     c_confirmed_writers (PTHREAD_COND_INITIALIZER),
-    summary (0),
     a_readers (0),
     w_readers (0),
     w_writers (0)
Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.
Attached is an updated patch. Changes explained below. On Wed, 2013-08-21 at 10:50 -0700, Richard Henderson wrote: -#if defined(USE_HTM_FASTPATH) && !defined(HTM_CUSTOM_FASTPATH) +#ifdef USE_HTM_FASTPATH // HTM fastpath. Only chosen in the absence of transaction_cancel to allow // using an uninstrumented code path. // The fastpath is enabled only by dispatch_htm's method group, which uses @@ -187,6 +187,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) // indeed in serial mode, and HW transactions should never need serial mode // for any internal changes (e.g., they never abort visibly to the STM code // and thus do not trigger the standard retry handling). +#ifndef HTM_CUSTOM_FASTPATH if (likely(htm_fastpath && (prop & pr_hasNoAbort))) { for (uint32_t t = htm_fastpath; t; t--) @@ -237,6 +238,49 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) } } } +#else + // If we have a custom HTM fastpath in ITM_beginTransaction, we implement + // just the retry policy here. We communicate with the custom fastpath Don't you want this logic arranged as #ifdef HTM_CUSTOM_FASTPATH ... your new code #elif defined(USE_HTM_FASTPATH) ... existing htm code #endif It seems you didn't feel strongly about it (and neither do I), so I kept this as is. The current code arrangement has the advantage that the general HTM fastpath comes first, and second the stuff for the custom, more complex fastpath variants. - subq $56, %rsp - cfi_def_cfa_offset(64) + subq $64, %rsp + cfi_def_cfa_offset(72) You now have an abi-misaligned stack. Since the return address is pushed by the call, an aligned stack frame allocation is (N + 8) % 16 == 0. Fixed, and added more suggestions by Richard. We also discussed splitting the xbegin into two paths, one for the first HTM abort -- where we'd save the jmpbuf -- and one for subsequent aborts -- where we wouldn't need to save the jmpbuf. 
But given that we only save the jmpbuf when either going to non-HTM or after an HTM abort, the overheads didn't seem to matter enough to justify the added complexity.
+  // Accessed from assembly language, thus the asm specifier on
+  // the name, avoiding complex name mangling.
+#ifdef __USER_LABEL_PREFIX__
+#define UPFX1(t) UPFX(t)
+#define UPFX(t) #t
+  static gtm_rwlock serial_lock
+    __asm__(UPFX1(__USER_LABEL_PREFIX__) "gtm_serial_lock");
+#else
+  static gtm_rwlock serial_lock
+    __asm__("gtm_serial_lock");
+#endif
Now that we have 3 copies of this, we should simplify things. E.g.
#ifdef __USER_LABEL_PREFIX__
# define UPFX1(X) #X
# define UPFX UPFX1(__USER_LABEL_PREFIX__)
#else
# define UPFX
#endif
static gtm_rwlock serial_lock __asm__(UPFX "gtm_serial_lock");
Fixed (including the one additional indirection that was necessary ;) ). OK for trunk?

commit 428aa3302c5674326585f178409f734222edfa8b
Author: Torvald Riegel trie...@redhat.com
Date:   Wed Aug 21 11:40:54 2013 +0200

    Add custom HTM fast path for RTM on x86_64.

    * libitm_i.h (gtm_thread): Assign an asm name to serial_lock.
    (htm_fastpath): Assign an asm name.
    * libitm.h (_ITM_codeProperties): Add non-ABI flags used by custom HTM fast paths.
    (_ITM_actions): Likewise.
    * config/x86/target.h (HTM_CUSTOM_FASTPATH): Enable custom fastpath on x86_64.
    * config/x86/sjlj.S (_ITM_beginTransaction): Add custom HTM fast path.
    * config/posix/rwlock.h (gtm_rwlock): Update comments. Move summary field to the start of the structure.
    * config/linux/rwlock.h (gtm_rwlock): Update comments.
    * beginend.cc (gtm_thread::begin_transaction): Add retry policy handling for custom HTM fast paths. 
diff --git a/libitm/beginend.cc b/libitm/beginend.cc index a3bf549..bd7b19e 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -165,7 +165,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) if (unlikely(prop & pr_undoLogCode)) GTM_fatal("pr_undoLogCode not supported"); -#if defined(USE_HTM_FASTPATH) && !defined(HTM_CUSTOM_FASTPATH) +#ifdef USE_HTM_FASTPATH // HTM fastpath. Only chosen in the absence of transaction_cancel to allow // using an uninstrumented code path. // The fastpath is enabled only by dispatch_htm's method group, which uses @@ -187,6 +187,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) // indeed in serial mode, and HW transactions should never need serial mode // for any internal changes (e.g., they never abort visibly to the STM code // and thus do not trigger the standard retry handling). +#ifndef HTM_CUSTOM_FASTPATH if (likely(htm_fastpath && (prop & pr_hasNoAbort))) { for (uint32_t t = htm_fastpath; t; t--) @@ -237,6 +238,49 @@ GTM::gtm_thread::begin_transaction
Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.
On Thu, 2013-08-22 at 12:05 -0700, Richard Henderson wrote: On 08/22/2013 11:39 AM, Torvald Riegel wrote:
+  /* Store edi for future HTM fast path retries.  We use a stack slot
+     lower than the jmpbuf so that the jmpbuf's rip field will overlap
+     with the proper return address on the stack.  */
+  movl %edi, -64(%rsp)
You haven't allocated the stack frame here, and you're storing outside the redzone. This is invalid. Two possibilities: (1) always allocate the stack frame on entry to the function (adds two register additions to the htm fast path -- in the noise i'd think); (2) store the edi value in the non-htm path, with the pr_HTMRetryableAbort bit or'd in (adds an extra store to the non-htm path; probably noise). You'd want to mask out that bit when you reload it. Oops. Picked fix (2). Better now?

commit 42a3defc64d28dc2c96f5bbdf8547e5b7bd9a40b
Author: Torvald Riegel trie...@redhat.com
Date:   Wed Aug 21 11:40:54 2013 +0200

    Add custom HTM fast path for RTM on x86_64.

    * libitm_i.h (gtm_thread): Assign an asm name to serial_lock.
    (htm_fastpath): Assign an asm name.
    * libitm.h (_ITM_codeProperties): Add non-ABI flags used by custom HTM fast paths.
    (_ITM_actions): Likewise.
    * config/x86/target.h (HTM_CUSTOM_FASTPATH): Enable custom fastpath on x86_64.
    * config/x86/sjlj.S (_ITM_beginTransaction): Add custom HTM fast path.
    * config/posix/rwlock.h (gtm_rwlock): Update comments. Move summary field to the start of the structure.
    * config/linux/rwlock.h (gtm_rwlock): Update comments.
    * beginend.cc (gtm_thread::begin_transaction): Add retry policy handling for custom HTM fast paths.

diff --git a/libitm/beginend.cc b/libitm/beginend.cc index a3bf549..bd7b19e 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -165,7 +165,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) if (unlikely(prop & pr_undoLogCode)) GTM_fatal("pr_undoLogCode not supported"); -#if defined(USE_HTM_FASTPATH) && !defined(HTM_CUSTOM_FASTPATH) +#ifdef USE_HTM_FASTPATH // HTM fastpath. 
Only chosen in the absence of transaction_cancel to allow // using an uninstrumented code path. // The fastpath is enabled only by dispatch_htm's method group, which uses @@ -187,6 +187,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) // indeed in serial mode, and HW transactions should never need serial mode // for any internal changes (e.g., they never abort visibly to the STM code // and thus do not trigger the standard retry handling). +#ifndef HTM_CUSTOM_FASTPATH if (likely(htm_fastpath && (prop & pr_hasNoAbort))) { for (uint32_t t = htm_fastpath; t; t--) @@ -237,6 +238,49 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) } } } +#else + // If we have a custom HTM fastpath in ITM_beginTransaction, we implement + // just the retry policy here. We communicate with the custom fastpath + // through additional property bits and return codes, and either transfer + // control back to the custom fastpath or run the fallback mechanism. The + // fastpath synchronization algorithm itself is the same. + // pr_HTMRetryableAbort states that a HW transaction started by the custom + // HTM fastpath aborted, and that we thus have to decide whether to retry + // the fastpath (returning a_tryHTMFastPath) or just proceed with the + // fallback method. + if (likely(htm_fastpath && (prop & pr_HTMRetryableAbort))) +{ + tx = gtm_thr(); + if (unlikely(tx == NULL)) +{ + // See below. + tx = new gtm_thread(); + set_gtm_thr(tx); +} + // If this is the first abort, reset the retry count. We abuse + // restart_total for the retry count, which is fine because our only + // other fallback will use serial transactions, which don't use + // restart_total but will reset it when committing. + if (!(prop & pr_HTMRetriedAfterAbort)) + tx->restart_total = htm_fastpath; + + if (--tx->restart_total > 0) + { + // Wait until any concurrent serial-mode transactions have finished. + // Essentially the same code as above. 
+ if (serial_lock.is_write_locked()) + { + if (tx->nesting > 0) + goto stop_custom_htm_fastpath; + serial_lock.read_lock(tx); + serial_lock.read_unlock(tx); + } + // Let ITM_beginTransaction retry the custom HTM fastpath. + return a_tryHTMFastPath; + } +} + stop_custom_htm_fastpath: +#endif #endif tx = gtm_thr(); diff --git a/libitm/config/linux/rwlock.h b/libitm/config/linux/rwlock.h index 428299f..c761edf 100644 --- a/libitm/config/linux/rwlock.h +++ b/libitm/config/linux/rwlock.h @@ -39,6 +39,11 @@ struct gtm_thread
[PATCH] libitm: Add custom HTM fast path for RTM on x86_64.
This patch adds a custom HTM fast path for RTM on x86_64, which moves the core HTM fast path bits from gtm_thread::begin_transaction into the x86-specific ITM_beginTransaction implementation. It extends/changes the previous patch by Andi: http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01228.html The custom fast path decreases the overheads of using HW transactions. gtm_thread::begin_transaction remains responsible for handling the retry policy after aborts of HW transactions, including when to switch to the fallback execution method. Right now, the C++ retry code isn't aware of the specific abort reason but just counts the number of retries for a particular transaction; it might make sense to add this in the future. Tested on Haswell with microbenchmarks and STAMP Vacation. OK for trunk? (Please take a closer look at the asm pieces of this.) (I've seen failures for STAMP Genome during my recent tests, but those happen also with just ITM_DEFAULT_METHOD=serialirr and a single thread, and AFAICT they don't seem to be related to the changes in _ITM_beginTransaction. I'll have a look...) Andreas and Peter: Is this sufficient as a proof of concept for custom fast paths on your architectures, or would you like to see any changes? Torvald commit 9329bd4504d13d415542d93418157d588b599b4e Author: Torvald Riegel trie...@redhat.com Date: Wed Aug 21 11:40:54 2013 +0200 Add custom HTM fast path for RTM on x86_64. * libitm_i.h (gtm_thread): Assign an asm name to serial_lock. (htm_fastpath): Assign an asm name. * libitm.h (_ITM_codeProperties): Add non-ABI flags used by custom HTM fast paths. (_ITM_actions): Likewise. * config/x86/target.h (HTM_CUSTOM_FASTPATH): Enable custom fastpath on x86_64. * config/x86/sjlj.S (_ITM_beginTransaction): Add custom HTM fast path. * config/posix/rwlock.h (gtm_rwlock): Update comments. Move summary field to the start of the structure. * config/linux/rwlock.h (gtm_rwlock): Update comments. 
* beginend.cc (gtm_thread::begin_transaction): Add retry policy handling for custom HTM fast paths. diff --git a/libitm/beginend.cc b/libitm/beginend.cc index a3bf549..bd7b19e 100644 --- a/libitm/beginend.cc +++ b/libitm/beginend.cc @@ -165,7 +165,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) if (unlikely(prop & pr_undoLogCode)) GTM_fatal("pr_undoLogCode not supported"); -#if defined(USE_HTM_FASTPATH) && !defined(HTM_CUSTOM_FASTPATH) +#ifdef USE_HTM_FASTPATH // HTM fastpath. Only chosen in the absence of transaction_cancel to allow // using an uninstrumented code path. // The fastpath is enabled only by dispatch_htm's method group, which uses @@ -187,6 +187,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) // indeed in serial mode, and HW transactions should never need serial mode // for any internal changes (e.g., they never abort visibly to the STM code // and thus do not trigger the standard retry handling). +#ifndef HTM_CUSTOM_FASTPATH if (likely(htm_fastpath && (prop & pr_hasNoAbort))) { for (uint32_t t = htm_fastpath; t; t--) @@ -237,6 +238,49 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb) } } } +#else + // If we have a custom HTM fastpath in ITM_beginTransaction, we implement + // just the retry policy here. We communicate with the custom fastpath + // through additional property bits and return codes, and either transfer + // control back to the custom fastpath or run the fallback mechanism. The + // fastpath synchronization algorithm itself is the same. + // pr_HTMRetryableAbort states that a HW transaction started by the custom + // HTM fastpath aborted, and that we thus have to decide whether to retry + // the fastpath (returning a_tryHTMFastPath) or just proceed with the + // fallback method. + if (likely(htm_fastpath && (prop & pr_HTMRetryableAbort))) +{ + tx = gtm_thr(); + if (unlikely(tx == NULL)) +{ + // See below. 
+ tx = new gtm_thread(); + set_gtm_thr(tx); +} + // If this is the first abort, reset the retry count. We abuse + // restart_total for the retry count, which is fine because our only + // other fallback will use serial transactions, which don't use + // restart_total but will reset it when committing. + if (!(prop & pr_HTMRetriedAfterAbort)) + tx->restart_total = htm_fastpath; + + if (--tx->restart_total > 0) + { + // Wait until any concurrent serial-mode transactions have finished. + // Essentially the same code as above. + if (serial_lock.is_write_locked()) + { + if (tx->nesting > 0) + goto stop_custom_htm_fastpath; + serial_lock.read_lock(tx); + serial_lock.read_unlock(tx); + } + // Let
Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.
On Wed, 2013-08-21 at 10:14 -0700, Andi Kleen wrote: Torvald Riegel trie...@redhat.com writes: +#endif leaq 8(%rsp), %rax - subq $56, %rsp - cfi_def_cfa_offset(64) + subq $64, %rsp + cfi_def_cfa_offset(72) I don't see why you did this change and the addq change below. I need to store edi (i.e., the properties of the transaction passed to the function by the compiler) on the stack, so these two changes create additional room for it. (I wasn't sure about alignment requirements, so I just used 8 bytes for it.) The rest seems reasonable to me, although I haven't tried to untangle the full dependencies between C++ and asm code for retries. If anyone has any suggestions for how to improve the comments, let me know. It would be likely cleaner to just keep the retries fully in C++ like the original patch did. There's no advantage of going back to assembler. That's true for x86, but it seems that for s390, we can't easily put the xbegin/tbegin into the C++ code because of floating point register save/restore issues. The added complexity on the x86 side seemed to be a reasonable price for having a general HTM fast path retry handling on the C++ side. Torvald
Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.
On Wed, 2013-08-21 at 19:41 +0200, Andi Kleen wrote: That's true for x86, but it seems that for s390, we can't easily put the xbegin/tbegin into the C++ code because of floating point register save/restore issues. The added complexity on the x86 side seemed to be a reasonable price for having a general HTM fast path retry handling on the C++ side. I don't see much point in trying to unify these fast paths for very different implementations with different properties. It would not surprise me if the highly tuned end result is different. It's kind of like trying to write a portable high performance memcpy(). I agree that highly tuned implementations might eventually be pretty different from each other. Nonetheless, I'm currently not aware of anybody volunteering to really work on highly tuned implementations, in particular regarding the retry policies. Once we get there -- and it would be great if somebody would work on that!, I'm completely fine with separating those bits out. Until then, I think it's preferable to keep things as unified as possible as long as this doesn't come with unreasonable complexity or runtime overheads. Torvald
Re: [PATCH] S/390: Hardware transactional memory support
On Tue, 2013-08-13 at 12:09 -0700, Richard Henderson wrote: On 08/13/2013 11:26 AM, Jakub Jelinek wrote: On Fri, Aug 02, 2013 at 11:05:33AM -1000, Richard Henderson wrote: On 08/02/2013 04:45 AM, Andreas Krebbel wrote: ! XCFLAGS=${XCFLAGS} -mzarch -mhtm -msoft-float Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in floating point registers; similarly for the write accessors. So, would it be enough to compile just beginend.cc with -msoft-float and the rest normally? No. From what I understand of the s390 restriction, we can't touch the floating point registers after starting a transaction, at least until we're committed to restoring them all. Which means that we have to have everything along the path from htm_begin_success == false until a longjmp restores them. This path includes a call to std::operator new, so that makes it a non-starter. Better, I think, to pass the gtm_jmpbuf to htm_begin. Then we can do uint32_t ret = __builtin_tbegin_nofloat (NULL); if (!htm_begin_success(ret)) // restore fpu state from jb return ret; at which point we're back to normal and we can do whatever we want within the normal abi wrt the fpu. Can we instead move the successful path of the HTM fastpath to the _ITM_beginTransaction asm function, and just do the fallback and retry code in beginend.cc? So, add a tbegin (or xbegin) and the check of serial_lock to the asm function as in Andi's patch (http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00640.html), but keep the restart policy code (ie, the serial lock read/write locking for the wait, etc.) in beginend.cc? The latter could return a new code that tells the asm function to try the HTM fastpath again; we might also want to use a separate bit in the arguments to gtm_thread::begin_transaction to tell it whether the HTM fastpath failed before. 
This way, we get lower overhead when the HTM fastpath succeeds right away because we can do the minimally necessary thing in asm; we still can change the retry policies easily; and we can restore the floating point registers to a sane state for any libitm C/C++ code that we might call without having to do any restore inside of the C/C++ code. Thoughts? I'll put working on a draft of this for x86 on my list.
Re: [PATCH] S/390: Hardware transactional memory support
On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote: Index: libitm/config/s390/target.h === *** libitm/config/s390/target.h.orig --- libitm/config/s390/target.h *** [...] + static inline uint32_t + htm_begin () + { + return __builtin_tbegin_nofloat (NULL); + } I'm not quite sure I understand why the nofloat variant is sufficient; is this because we don't use FPRs in the abort/restart code of libitm (and the FPRs are known to be clobbered after starting a txn)? There is a bit in the properties passed to _ITM_beginTransaction which states whether the txn has floating point update code (pr_hasNoFloatUpdate). Would this be useful? (BTW, sorry for the late response. I don't read gcc-patches frequently, so please CC me on libitm-related things.)
Re: [PATCH][4.8 backport] S/390: Transactional Execution support
On Thu, 2013-08-01 at 10:55 +0200, Andreas Krebbel wrote: Torvald, could you please consider adding backports of the following patches to the 4.8 branch? 19/06/13 [patch] libitm: Fix handling of reentrancy in the HTM fastpath http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01132.html 20/06/13 Re: [patch] libitm: Fix handling of reentrancy in the HTM fastpath http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01188.html I believe Jakub already backported these two; at least he told me he was planning to, IIRC. Torvald
Re: [PATCH] S/390: Hardware transactional memory support
On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote: On 02/08/13 13:31, Torvald Riegel wrote: On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote: Index: libitm/config/s390/target.h === *** libitm/config/s390/target.h.orig --- libitm/config/s390/target.h *** [...] + static inline uint32_t + htm_begin () + { + return __builtin_tbegin_nofloat (NULL); + } I'm not quite sure I understand why the nofloat variant is sufficient; is this because we don't use FPRs in the abort/restart code of libitm (and the FPRs are known to be clobbered after starting a txn)? Yes. To my understanding there are 2 potential issues when the FPRs are not saved. 1. The abort code will read corrupted FPR content for FPRs: - which are live across tbegin and - are updated in the TX body and - are read in the abort code The abort code will see the updated value from the TX body. Since libitm implements TX begins as function calls only call-saved registers can be live across a tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and get restored when doing the longjmp back into the user code. So this should be no problem. 2. The other problem is that the ABI might get violated when aborting into a callee which already returned and there are FPRs which have been modified afterwards. In that case the compiler would not have generated FPR save/restore instructions in the function prologue/epilogue of the callee. But in fact there are modified FPRs when exiting the second time. But again this should be no problem thanks to the setjmp/longjmp semantics of _ITM_beginTransaction and GTM_longjmp. If the TX body modified a call-saved FPR it will get restored on the abort path back from libitm to the user code. As long as libitm does not use FPRs itself this should be safe without having tbegin clobbering FPRs. And I suppose that even if it would use FPRs, it shouldn't ever access memory locations touched in user code. Thanks for the explanation. 
Some of this was obvious, sorry, but that's what happens when a mental context switch is incomplete :) Next time, I'd prefer a summary of such reasoning to be added as a comment in the source; it's just too easy to forget about all the subtleties... There is a bit in the properties passed to _ITM_beginTransaction which states whether the txn has floating point update code (pr_hasNoFloatUpdate). Would this be useful? This probably would be useful in the ITM_beginTransaction / GTM_longjmp implementations which in that case could try to avoid the FPR save/restores. But Richard mentioned that these bits so far are never set by GCC since it is lacking the required infos from the register allocator. Another thing that comes to mind is that it might be useful to put parts of the HTM fast path into the asm pieces of _ITM_beginTransaction. That way, one can use exactly the amount of SW setjmp one needs given what's easy to save/restore for the TM; this might decrease the startup costs of a txn somewhat. Andi Kleen posted a patch for how to do this for RTM a while ago (but it had some issues): http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01189.html Perhaps something along these lines (while avoiding the issues :) ) would be useful for s/390 and Power too. Torvald
Re: [patch] libitm: Fix handling of reentrancy in the HTM fastpath
On Wed, 2013-06-19 at 22:13 -0500, Peter Bergner wrote: On Thu, 2013-06-20 at 00:51 +0200, Torvald Riegel wrote: On Wed, 2013-06-19 at 14:43 -0500, Peter Bergner wrote: I'm having trouble seeing why/when _ITM_inTransaction() is returning something other than inIrrevocableTransaction. I'll see if I can determine why and will report back. Ok, we return outsideTransaction because the nesting level (tx-nesting) is zero. That's a second bug in libitm, sorry. Can you try with the attached patch additionally to the previous one? Thanks! Ok, with this patch, plus adding a powerpc implementation of htm_transaction_active(), reentrant.c now executes correctly on both HTM and non-HTM hardware for me. Thanks for all of your help with this! I'd still like to hear from Andreas, whether the reentrant.c test case with both patches, now works on S390. I'll note unlike your x86 htm_transaction_active() implementation, my implementation has to check for htm_available() first before executing the htm instruction which tells me whether I'm in transaction state or not, otherwise I'll get a SIGILL on non-HTM hardware. Unfortunately, my htm_available() call uses getauxval() to query the AUXV for a hwcap bit. Is there a place I can stash the result of the first call, so that subsequent calls use the cached value? Normally, I could use a static var, but I doubt that is what we want to do in static inline functions. You're right, that was missing for x86 as well. Please see the updated second patch that is attached. It additionally checks htm_fastpath to see whether we are actually using the HTM. This variable is initialized to the value that htm_init() returns. This should do the right thing I suppose. Torvald commit c8352d4d9fa5cfa3453a61c581956835de9753e5 Author: Torvald Riegel trie...@redhat.com Date: Thu Jun 20 00:46:59 2013 +0200 libitm: Handle HTM fastpath in status query functions. 
diff --git a/libitm/config/x86/target.h b/libitm/config/x86/target.h index 77b627f..063c09e 100644 --- a/libitm/config/x86/target.h +++ b/libitm/config/x86/target.h @@ -125,6 +125,13 @@ htm_abort_should_retry (uint32_t begin_ret) { return begin_ret & _XABORT_RETRY; } + +/* Returns true iff a hardware transaction is currently being executed. */ +static inline bool +htm_transaction_active () +{ + return _xtest() != 0; +} #endif diff --git a/libitm/query.cc b/libitm/query.cc index 5707321..39a35b3 100644 --- a/libitm/query.cc +++ b/libitm/query.cc @@ -43,6 +43,15 @@ _ITM_libraryVersion (void) _ITM_howExecuting ITM_REGPARM _ITM_inTransaction (void) { +#if defined(USE_HTM_FASTPATH) + // If we use the HTM fastpath, we cannot reliably detect whether we are + // in a transaction because this function can be called outside of + // a transaction and thus we can't deduce this by looking at just the serial + // lock. This function isn't used in practice currently, so the easiest + // way to handle it is to just abort. + if (htm_fastpath && htm_transaction_active()) +htm_abort(); +#endif struct gtm_thread *tx = gtm_thr(); if (tx && (tx->nesting > 0)) { @@ -58,6 +67,11 @@ _ITM_inTransaction (void) _ITM_transactionId_t ITM_REGPARM _ITM_getTransactionId (void) { +#if defined(USE_HTM_FASTPATH) + // See ITM_inTransaction. + if (htm_fastpath && htm_transaction_active()) +htm_abort(); +#endif struct gtm_thread *tx = gtm_thr(); return (tx && (tx->nesting > 0)) ? tx->id : _ITM_noTransactionId; }
[patch] libitm: Fix handling of reentrancy in the HTM fastpath
(Re-sending to the proper list.  Sorry for the noise at gcc@.)

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57643

The HTM fastpath didn't handle a situation in which a relaxed
transaction executed unsafe code that in turn starts a transaction; it
simply tried to wait for the other transaction, not checking whether the
current thread started the other transaction.  We fix this by doing this
check, and if we have the lock, we just continue with the fallback
serial-mode path instead of using a HW transaction.

The current code won't do the check before starting a HW transaction,
but this way we can keep the HTM fastpath unchanged; also, this
particular reentrancy is probably infrequent in practice, so I suppose
the small slowdown shouldn't matter much.  Also, I first thought about
trying to use the HTM in the reentrancy case, but this doesn't make any
sense because other transactions can't run anyway, and we should really
just finish the serial-mode transaction as fast as possible.

Peter and/or Andreas: Could you please check that this fixes the bug you
see on Power/s390?  Thanks.

Torvald

commit 185af84e365e1bae31aea5afd6e67e81f3c32c72
Author: Torvald Riegel <trie...@redhat.com>
Date:   Wed Jun 19 16:42:24 2013 +0200

    libitm: Fix handling of reentrancy in the HTM fastpath.

    PR libitm/57643

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index 93e702e..a3bf549 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -197,6 +197,8 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
 	  // We are executing a transaction now.
 	  // Monitor the writer flag in the serial-mode lock, and abort
 	  // if there is an active or waiting serial-mode transaction.
+	  // Note that this can also happen due to an enclosing
+	  // serial-mode transaction; we handle this case below.
 	  if (unlikely(serial_lock.is_write_locked()))
 	    htm_abort();
 	  else
@@ -219,6 +221,14 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
 	      tx = new gtm_thread();
 	      set_gtm_thr(tx);
 	    }
+	  // Check whether there is an enclosing serial-mode transaction;
+	  // if so, we just continue as a nested transaction and don't
+	  // try to use the HTM fastpath.  This case can happen when an
+	  // outermost relaxed transaction calls unsafe code that starts
+	  // a transaction.
+	  if (tx->nesting > 0)
+	    break;
+	  // Another thread is running a serial-mode transaction.  Wait.
 	  serial_lock.read_lock(tx);
 	  serial_lock.read_unlock(tx);
 	  // TODO We should probably reset the retry count t here, unless
Re: [patch] libitm: Fix handling of reentrancy in the HTM fastpath
On Wed, 2013-06-19 at 14:43 -0500, Peter Bergner wrote:
> On Wed, 2013-06-19 at 10:57 -0500, Peter Bergner wrote:
> > On Wed, 2013-06-19 at 10:49 -0500, Peter Bergner wrote:
> > > This is due to the following in _ITM_inTransaction():
> > >
> > > 47	  if (tx && (tx->nesting > 0))
> > > (gdb) p tx
> > > $2 = (GTM::gtm_thread *) 0x10901bf0
> > > (gdb) p tx->nesting
> > > $3 = 1
> > > (gdb) step
> > > 49	      if (tx->state & gtm_thread::STATE_IRREVOCABLE)
> > > (gdb) p tx->state
> > > $4 = 3
> > > (gdb) p gtm_thread::STATE_IRREVOCABLE
> > > $5 = 2
> > > (gdb) step
> > > 50	        return inIrrevocableTransaction;
> >
> > Bah, ignore this.  It's a different call that is returning something
> > other than inIrrevocableTransaction.
>
> Unfortunately, gdb is having problems inside hw txns and I'm having
> trouble seeing why/when _ITM_inTransaction() is returning something
> other than inIrrevocableTransaction.  I'll see if I can determine why
> and will report back.

Ok, we return outsideTransaction because the nesting level (tx->nesting)
is zero.  That's a second bug in libitm, sorry.  Can you try with the
attached patch additionally to the previous one?  Thanks!

Torvald

commit 02dde6bb91107792fb0cb9f5c4785d25b6aa0e3c
Author: Torvald Riegel <trie...@redhat.com>
Date:   Thu Jun 20 00:46:59 2013 +0200

    libitm: Handle HTM fastpath in status query functions.

diff --git a/libitm/config/x86/target.h b/libitm/config/x86/target.h
index 77b627f..063c09e 100644
--- a/libitm/config/x86/target.h
+++ b/libitm/config/x86/target.h
@@ -125,6 +125,13 @@ htm_abort_should_retry (uint32_t begin_ret)
 {
   return begin_ret & _XABORT_RETRY;
 }
+
+/* Returns true iff a hardware transaction is currently being executed.  */
+static inline bool
+htm_transaction_active ()
+{
+  return _xtest() != 0;
+}
 #endif

diff --git a/libitm/query.cc b/libitm/query.cc
index 5707321..0ac3eda 100644
--- a/libitm/query.cc
+++ b/libitm/query.cc
@@ -43,6 +43,15 @@ _ITM_libraryVersion (void)
 _ITM_howExecuting ITM_REGPARM
 _ITM_inTransaction (void)
 {
+#if defined(USE_HTM_FASTPATH)
+  // If we use the HTM fastpath, we cannot reliably detect whether we are
+  // in a transaction because this function can be called outside of
+  // a transaction and thus we can't deduce this by looking at just the serial
+  // lock.  This function isn't used in practice currently, so the easiest
+  // way to handle it is to just abort.
+  if (htm_transaction_active())
+    htm_abort();
+#endif
   struct gtm_thread *tx = gtm_thr();
   if (tx && (tx->nesting > 0))
     {
@@ -58,6 +67,11 @@ _ITM_inTransaction (void)
 _ITM_transactionId_t ITM_REGPARM
 _ITM_getTransactionId (void)
 {
+#if defined(USE_HTM_FASTPATH)
+  // See _ITM_inTransaction.
+  if (htm_transaction_active())
+    htm_abort();
+#endif
   struct gtm_thread *tx = gtm_thr();
   return (tx && (tx->nesting > 0)) ? tx->id : _ITM_noTransactionId;
 }
Re: [PATCH] Add faster HTM fastpath for libitm TSX v2
On Tue, 2013-01-29 at 20:19 +0100, Andi Kleen wrote:
> > next time please reply to the points raised in a review if you
> > disagree with them, and don't just ignore them.  That speeds up the
> > review.
>
> It was all in the previous email on the topic.

This v2 patch did not incorporate all the changes that I requested, nor
did you explicitly object to those you didn't incorporate.  Maybe you
don't care what's in libitm's comments, but I do.

>  // See gtm_thread::begin_transaction.
> -uint32_t GTM::htm_fastpath = 0;
> +uint32_t GTM::htm_fastpath asm("__gtm_htm_fastpath") = 0;
> +
> +uint32_t *GTM::global_lock asm("__gtm_global_lock");

Rearrange the serial lock's fields (see below).

> To my knowledge C++ classes have no guaranteed layout, so that's not
> safe because there is no guarantee where the vtable pointers are.  It
> would be only with plain old structs.

And gtm_rwlock doesn't have any virtual members, right?  So it's like a
plain old struct (ie, it's a POD type).

> +  // pr_tryHTM can be set by an assembler fast path when it already tried
> +  // a hardware transaction once.  In this case we do one retry less.

pr_tryHTM is already documented.  And I asked you to simply reference
this in a comment.  What's the problem with that?  I do not want people
working on libitm to have to grep through the code for uses of pr_tryHTM
and look for comments that might be related to it just because you don't
want to put a simple reference into the comment at the declaration of
pr_tryHTM.  Can't you see that adding the reference is just plain useful
for everybody else?

Torvald
Re: [PATCH] Add faster HTM fastpath for libitm TSX
On Thu, 2013-01-24 at 17:45 +0100, Andi Kleen wrote:
> > > +  uint32_t *__gtm_global_lock;
> >
> > Same, and rearrange the serial lock's fields (see below).
>
> My understanding is that C++ makes no guarantees of class layout.
> That is why I ended up not relying on the layout.

gtm_rwlock is a POD type, so you get the guarantees that C would
provide.

Torvald
Re: [PATCH] Add faster HTM fastpath for libitm TSX v2
On Thu, 2013-01-24 at 19:30 -0800, Andi Kleen wrote:
> From: Andi Kleen <a...@linux.intel.com>
>
> The libitm TSX hardware transaction fast path currently does quite a
> bit of unnecessary work (saving registers etc.) before even trying to
> start a hardware transaction.  This patch moves the initial attempt at
> a transaction early into the assembler stub.  Complicated work like
> retries is still done in C.  So this is essentially a fast path for
> the fast path.
>
> The assembler code doesn't understand the layout of serial_lock, but
> it needs to check that serial_lock is free.  We export just the lock
> variable as a separate pointer for this.
>
> The assembler code controls the HTM fast path with a new pr_tryHTM
> flag.
>
> I had to reorganize the retry loop slightly so that the waiting for
> the lock to be free again is done first (because the first transaction
> is already done elsewhere).  This makes the patch look larger than it
> really is.  This just moves one basic block around.
>
> Passes bootstrap/testsuite on x86_64
>
> v2: Use asm() to supply assembler name of namespaced variables
>     Make comments more verbose.

Andi, next time please reply to the points raised in a review if you
disagree with them, and don't just ignore them.  That speeds up the
review.

> libitm/:
> 2013-01-24  Andi Kleen  <a...@linux.intel.com>
>
> 	* beginend.cc (GTM::htm_fastpath): Add asm alias to
> 	__gtm_htm_fastpath.
> 	(GTM::global_lock): Add.
> 	(begin_transaction): Check pr_tryHTM.  Remove one iteration from
> 	HTM retry loop.  Move lock free check to the beginning of loop body.
> 	* config/linux/rwlock.h (gtm_rwlock::get_lock_var): Add.
> 	* config/posix/rwlock.h (tm_rwlock::get_lock_var): Add.
> 	* config/x86/sjlj.S: Include config.h.
> 	(pr_tryHTM, pr_uninstrumentedCode, pr_hasNoAbort,
> 	a_runInstrumentedCode, a_runUninstrumentedCode, _XABORT_EXPLICIT,
> 	_XABORT_RETRY): Add defines.
> 	(_ITM_beginTransaction): Add HTM fast path.
> 	* libitm.h (pr_tryHTM): Add.
> 	* libitm_i.h (GTM::htm_fastpath): Add asm alias.
> 	(GTM::gtm_global_lock): Add.
> 	* method-serial.cc (method_group::init, fini): Initialize
> 	GTM::global_lock and GTM::htm_fastpath.
> ---
>  libitm/beginend.cc           | 46 +++++--
>  libitm/config/linux/rwlock.h |  5 +
>  libitm/config/posix/rwlock.h |  5 +
>  libitm/config/x86/sjlj.S     | 47 ++++++
>  libitm/libitm.h              |  7 +--
>  libitm/libitm_i.h            |  3 ++-
>  libitm/method-serial.cc      |  2 ++
>  7 files changed, 92 insertions(+), 23 deletions(-)
>
> diff --git a/libitm/beginend.cc b/libitm/beginend.cc
> index 4369946..118920b 100644
> --- a/libitm/beginend.cc
> +++ b/libitm/beginend.cc
> @@ -55,7 +55,9 @@ static pthread_key_t thr_release_key;
>  static pthread_once_t thr_release_once = PTHREAD_ONCE_INIT;
>
>  // See gtm_thread::begin_transaction.
> -uint32_t GTM::htm_fastpath = 0;
> +uint32_t GTM::htm_fastpath asm("__gtm_htm_fastpath") = 0;
> +
> +uint32_t *GTM::global_lock asm("__gtm_global_lock");

Rearrange the serial lock's fields (see below).

>  /* Allocate a transaction structure.  */
>  void *
> @@ -187,27 +189,13 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
>    // indeed in serial mode, and HW transactions should never need serial mode
>    // for any internal changes (e.g., they never abort visibly to the STM code
>    // and thus do not trigger the standard retry handling).

Add a comment as second paragraph in the previous block of comments,
briefly explaining when pr_tryHTM is set (ie, under which condition),
and that this happens in the assembler implementation of
_ITM_beginTransaction.

> -  if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
> +  // pr_tryHTM can be set by an assembler fast path when it already tried
> +  // a hardware transaction once.  In this case we do one retry less.
> +  if (likely(prop & pr_tryHTM))
>      {
> -      for (uint32_t t = htm_fastpath; t; t--)
> +      // One transaction has been already tried by the assembler fast path.
> +      for (uint32_t t = htm_fastpath - 1; t; t--)
>  	{
> -	  uint32_t ret = htm_begin();
> -	  if (htm_begin_success(ret))
> -	    {
> -	      // We are executing a transaction now.
> -	      // Monitor the writer flag in the serial-mode lock, and abort
> -	      // if there is an active or waiting serial-mode transaction.
> -	      if (unlikely(serial_lock.is_write_locked()))
> -		htm_abort();
> -	      else
> -		// We do not need to set a_saveLiveVariables because of HTM.
> -		return (prop & pr_uninstrumentedCode) ?
> -		    a_runUninstrumentedCode : a_runInstrumentedCode;
> -	    }
> -	  // The transaction has aborted.  Don't retry if it's unlikely that
> -	  // retrying the transaction will be successful.
> -	  if (!htm_abort_should_retry(ret))
> -	    break;
> //
Re: [PATCH] Add faster HTM fastpath for libitm TSX
On Sat, 2013-01-12 at 13:03 -0800, Andi Kleen wrote:
> From: Andi Kleen <a...@linux.intel.com>
>
> The libitm TSX hardware transaction fast path currently does quite a
> bit of unnecessary work (saving registers etc.) before even trying to
> start a hardware transaction.  This patch moves the initial attempt at
> a transaction early into the assembler stub.  Complicated work like
> retries is still done in C.  So this is essentially a fast path for
> the fast path.
>
> The assembler code doesn't understand the layout of serial_lock, but
> it needs to check that serial_lock is free.  We export just the lock
> variable as a separate pointer for this.
>
> The assembler code controls the HTM fast path with a new pr_tryHTM
> flag.
>
> I had to reorganize the retry loop slightly so that the waiting for
> the lock to be free again is done first (because the first transaction
> is already done elsewhere).  This makes the patch look larger than it
> really is.  This just moves one basic block around.
>
> Passes bootstrap/testsuite on x86_64
>
> libitm/:
> 2013-01-12  Andi Kleen  <a...@linux.intel.com>
>
> 	* beginend.cc (GTM::htm_fastpath): Remove.
> 	(__gtm_htm_fastpath, __gtm_global_lock): Add.
> 	(begin_transaction): Check pr_tryHTM.  Remove one iteration from
> 	HTM retry loop.  Move lock free check to the beginning of loop body.
> 	(_ITM_commitTransaction): Check __gtm_htm_fastpath instead of
> 	htm_fastpath.
> 	(_ITM_commitTransactionEH): Ditto.
> 	* config/linux/rwlock.h (gtm_rwlock::get_lock_var): Add.
> 	* config/posix/rwlock.h (tm_rwlock::get_lock_var): Add.
> 	* config/x86/sjlj.S: Include config.h.
> 	(pr_tryHTM, pr_uninstrumentedCode, pr_hasNoAbort,
> 	a_runInstrumentedCode, a_runUninstrumentedCode, _XABORT_EXPLICIT,
> 	_XABORT_RETRY): Add defines.
> 	(_ITM_beginTransaction): Add HTM fast path.
> 	* libitm.h (pr_tryHTM): Add.
> 	* libitm_i.h (GTM::htm_fastpath): Remove.
> 	(__gtm_htm_fastpath, __gtm_global_lock): Add.
> 	* method-serial.cc (method_group::init, fini): Initialize
> 	__gtm_global_lock and __gtm_htm_fastpath.
> 	(gtm_thread::serialirr_mode): Check __gtm_htm_fastpath.
> ---
>  libitm/beginend.cc           | 50 +++---
>  libitm/config/linux/rwlock.h |  5 +
>  libitm/config/posix/rwlock.h |  5 +
>  libitm/config/x86/sjlj.S     | 47 +++
>  libitm/libitm.h              |  3 ++-
>  libitm/libitm_i.h            |  7 +++--
>  libitm/method-serial.cc      |  8 +--
>  7 files changed, 96 insertions(+), 29 deletions(-)
>
> diff --git a/libitm/beginend.cc b/libitm/beginend.cc
> index 4369946..96ee1b3 100644
> --- a/libitm/beginend.cc
> +++ b/libitm/beginend.cc
> @@ -55,7 +55,11 @@ static pthread_key_t thr_release_key;
>  static pthread_once_t thr_release_once = PTHREAD_ONCE_INIT;
>
>  // See gtm_thread::begin_transaction.
> -uint32_t GTM::htm_fastpath = 0;
> +extern "C"
> +{
> +  uint32_t __gtm_htm_fastpath = 0;

Please keep this as-is and rather use the same approach as for
gtm_thread::begin_transaction.

> +  uint32_t *__gtm_global_lock;

Same, and rearrange the serial lock's fields (see below).

> +};
>
>  /* Allocate a transaction structure.  */
>  void *
> @@ -187,27 +191,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
>    // indeed in serial mode, and HW transactions should never need serial mode
>    // for any internal changes (e.g., they never abort visibly to the STM code
>    // and thus do not trigger the standard retry handling).

Add a comment as second paragraph in the previous block of comments,
briefly explaining when pr_tryHTM is set (ie, under which condition),
and that this happens in the assembler implementation of
_ITM_beginTransaction.

> -  if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
> +  if (likely(prop & pr_tryHTM))
>      {
> -      for (uint32_t t = htm_fastpath; t; t--)
> +      // One transaction has been already tried by the assembler fast path.
> +      for (uint32_t t = __gtm_htm_fastpath - 1; t; t--)
>  	{
> -	  uint32_t ret = htm_begin();
> -	  if (htm_begin_success(ret))
> -	    {
> -	      // We are executing a transaction now.
> -	      // Monitor the writer flag in the serial-mode lock, and abort
> -	      // if there is an active or waiting serial-mode transaction.
> -	      if (unlikely(serial_lock.is_write_locked()))
> -		htm_abort();
> -	      else
> -		// We do not need to set a_saveLiveVariables because of HTM.
> -		return (prop & pr_uninstrumentedCode) ?
> -		    a_runUninstrumentedCode : a_runInstrumentedCode;
> -	    }
> -	  // The transaction has aborted.  Don't retry if it's unlikely that
> -	  // retrying the transaction will be successful.
> -	  if (!htm_abort_should_retry(ret))
> -	    break;
>  	  // Wait until any concurrent serial-mode transactions have finished.
>  	  // This is an empty critical
Re: [PATCH] Make some asan builtins tm_pure (PR sanitizer/55508)
On Mon, 2012-12-17 at 13:52 +0400, Dmitry Vyukov wrote:

resend in plain text

On Mon, Dec 17, 2012 at 1:50 PM, Dmitry Vyukov <dvyu...@google.com> wrote:

On Fri, Dec 14, 2012 at 5:43 PM, Torvald Riegel <trie...@redhat.com> wrote:

On Thu, 2012-12-13 at 10:02 +0100, Jakub Jelinek wrote:

On Thu, Dec 13, 2012 at 10:38:13AM +0400, Dmitry Vyukov wrote:

On Wed, Dec 12, 2012 at 11:50 PM, Jakub Jelinek <ja...@redhat.com> wrote:

Various TM tests ICE when built with -fgnu-tm -fsanitize=address.  The
problem is that the asan.c pass adds calls to builtins that weren't
there before, and TM is upset about it.  The __asan_report* functions
are all like abort: in a correctly written program they shouldn't have a
user-visible effect, in a bad program they will terminate the process,
but in any case it doesn't matter how many times they are retried as
part of a transaction; there is no state to roll back on transaction
cancellation.  __asan_handle_no_return, while not being noreturn, just
marks the stack as unprotected, so again in a correctly written
application no effect; in a bad app it might result in some issues being
undetected, but still, it can be done many times and isn't irreversible.

I was only loosely following the tm-languages discussions.  Does the GCC
TM model guarantee strong consistency for all memory accesses?  I mean,
there are TM implementations that allow transient inconsistencies,

Will leave this to Torvald.

This has two parts: (1) how TM fits into the C++11/C11 memory model, and
(2) which guarantees the compiler and the TM runtime library agree on at
the level of the TM ABI that we use in GCC.

Regarding the first part, all transactions provide guarantees similar to
a global lock.  Specifically, there are virtual transaction start/end
events that take part in sequenced-before (similar to acquisition and
release of the global lock), which are then guaranteed to execute
(without transactions interleaving with each other) in a global total
order (let's call this order TO).  TO then contributes to
happens-before.
On the ABI level, the TM runtime is allowed to execute speculatively as
long as (1) it does not expose any speculative execution to
nontransactional code and (2) speculation doesn't violate the language's
as-if rules (i.e., no visible side effects other than those of the
abstract machine; no signals due to seg faults, etc.).  This means that,
for example, the TM will not access data at a wider granularity than
nontransactional code that conforms to the C++11 memory model would do.
Second, transactions can have a tentative position in TO, but they will
only expose final TO choices to nontxnal code.

So, each transaction will execute code that would be valid when executed
in isolation -- but it is possible that several noncommitted
transactions are in flight concurrently that may conflict with each
other.  The TM ensures that such speculative execution is safe and not
visible to a race-free program.  So, when a transaction commits at a
certain position in TO, it will make sure that all other active
transactions reach consensus on TO (up to this position) before it
returns to the execution of nontxnal code.  Those active transactions
will either see that they would still be valid at a new TO position
(after the committing txn), or they abort and then signal that they
agree to this TO.  That means that the nontxnal code will not be
affected by any speculative execution, even if the prior txn privatized
some data (this is called privatization safety in TM jargon).

The sort-of counterpart to privatization safety is publication safety
(ie, transactions can safely make some data shared).  While
privatization safety is handled by the TM runtime, publication safety is
ensured by the compiler by not allowing any dangerous load/load
reordering, basically.  Publication safety is not yet ensured always,
but Aldy is working on this.

that are detected later and trx is restarted.  Can't asan trigger false
positives in this case?

I can't imagine any.
I also don't see any, because all loads/stores of all transactions, even those with a tentative TO position, would be a valid execution. For malloc/free in transactions, AFAIR, ASAN hooks into malloc/free directly, so when libitm handles those including rollback of an allocation, there shouldn't be false positives. However, there might be false negatives in case of free() because deallocations are deferred until transaction commit (they can't be rolled back). Could we tell ASAN about free() directly using a call added by the TM instrumentation pass? That would at least help catch the false negatives for all free()-like functions that the compiler knows about. I do not think