Re: [PATCH] libitm/x86: Correct offsets of __private_tm and pointer_guard

2018-06-12 Thread Torvald Riegel
On Wed, 2018-05-30 at 07:54 -0700, H.J. Lu wrote:
> In glibc, sysdeps/i386/nptl/tls.h has
> 
> typedef struct
> {
>   void *tcb;    /* Pointer to the TCB.  Not necessarily the
>                    thread descriptor used by libpthread.  */
>   dtv_t *dtv;
>   void *self;   /* Pointer to the thread descriptor.  */
>   int multiple_threads;
>   uintptr_t sysinfo;
>   uintptr_t stack_guard;
>   uintptr_t pointer_guard;
>   int gscope_flag;
>   int __glibc_reserved1;
>   /* Reservation of some values for the TM ABI.  */
>   void *__private_tm[4];
>   /* GCC split stack support.  */
>   void *__private_ss;
> } tcbhead_t;
> 
> and sysdeps/x86_64/nptl/tls.h has
> 
> typedef struct
> {
>   void *tcb;    /* Pointer to the TCB.  Not necessarily the
>                    thread descriptor used by libpthread.  */
>   dtv_t *dtv;
>   void *self;   /* Pointer to the thread descriptor.  */
>   int multiple_threads;
>   int gscope_flag;
>   uintptr_t sysinfo;
>   uintptr_t stack_guard;
>   uintptr_t pointer_guard;
>   unsigned long int vgetcpu_cache[2];
>   int __glibc_reserved1;
>   int __glibc_unused1;
>   /* Reservation of some values for the TM ABI.  */
>   void *__private_tm[4];
>   /* GCC split stack support.  */
>   void *__private_ss;
>   long int __glibc_reserved2;
>   /* Must be kept even if it is no longer used by glibc since programs,
>  like AddressSanitizer, depend on the size of tcbhead_t.  */
>   __128bits __glibc_unused2[8][4] __attribute__ ((aligned (32)));
> 
>   void *__padding[8];
> } tcbhead_t;
> 
> The offsets of __private_tm are
> 
> i386:   36 bytes
> x32:    48 bytes
> x86_64: 80 bytes
> 
> and the offsets of pointer_guard are:
> 
> i386:   24 bytes
> x32:    28 bytes
> x86_64: 48 bytes
> 
> Update SEG_READ and SEG_WRITE to use the offset of __private_tm as base
> and correct the offset of pointer_guard for x32.
> 
> Tested on i686, x86-64 and x32.  OK for trunk and release branches?

The patch itself looks okay to me, but the commit message is pretty
cryptic.  I can't ack for release branches.




Re: [PATCH 0/5] Make std::future::wait_* use std::chrono::steady_clock when required

2018-01-13 Thread Torvald Riegel
On Sun, 2018-01-07 at 20:55 +, Mike Crowe wrote:
> This patch series was originally submitted back in September at
> https://gcc.gnu.org/ml/libstdc++/2017-09/msg00083.html which ended up
> as https://patchwork.ozlabs.org/cover/817379/ . The patches received
> no comments at all, which may mean that they are perfect or that they
> are so bad that no-one knew where to start with criticising them. It
> would be good to know which. :)
> 
> The patches are unchanged from last time, but I have corrected a typo
> in one of the commit messages. The original cover message follows.
> 
> This is a first attempt to make std::future::wait_until and
> std::future::wait_for make correct use of
> std::chrono::steady_clock/CLOCK_MONOTONIC. It also makes
> std::future::wait_until react to changes to CLOCK_REALTIME during the
> wait, but only when passed a std::chrono::system_clock time point.

I have comments on the design.

First, I don't think we should change
__atomic_futex_unsigned_base::_M_futex_wait_until, as there's a risk
that we'll change the behavior of existing applications that currently
work as expected.
Instead, ISTM we should additionally expose the two options we have at
the level of futexes:
* Relative timeout using CLOCK_MONOTONIC
* Absolute timeout using CLOCK_REALTIME (which will fall back to the
former on old kernels, which is fine I think).

Then we do the following translations from functions that programs would
call to the new futex functions:

1) wait_for is a loop in which we load the current time from the steady
clock, then call the relative futex wait; if that returns for a spurious
reason (i.e., neither a timeout nor because the expected value is
present), we reduce the remaining relative timeout by the time that
elapsed across the futex wait (see the sketch after this list).

2) wait_until using the steady clock is a loop similar to wait_for, just
that we additionally compute the initial relative timeout.

3) wait_until using the system clock is a loop that uses
absolute-timeout futex wait.

4) For wait_until using an unknown clock, I'd say that synching to the
system clock is the right approach.  Using wait_until indicates that the
programmer wanted to have a point in time, not a duration.
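
As a rough illustration of case 1, here is a compile-only sketch;
futex_wait_relative and futex_result are hypothetical placeholders for
whatever relative-timeout futex wrapper we end up exposing, not existing
libstdc++ interfaces:

#include <chrono>

enum class futex_result { value_changed, timeout, spurious };

// Assumed helper: waits on *addr while it still equals 'expected', for at
// most 'rel', and may return spuriously.
futex_result futex_wait_relative (int *addr, int expected,
                                  std::chrono::steady_clock::duration rel);

// Case 1: wait_for as a loop that re-reads the steady clock and shrinks the
// remaining relative timeout after every spurious return.
bool
wait_for_sketch (int *addr, int expected,
                 std::chrono::steady_clock::duration rel)
{
  using clock = std::chrono::steady_clock;
  auto remaining = rel;
  while (remaining > clock::duration::zero ())
    {
      auto start = clock::now ();
      switch (futex_wait_relative (addr, expected, remaining))
        {
        case futex_result::value_changed:
          return true;                         // no need to keep waiting
        case futex_result::timeout:
          return false;                        // the full timeout elapsed
        case futex_result::spurious:
          remaining -= clock::now () - start;  // charge the time already spent
          break;
        }
    }
  return false;
}

wait_until on the steady clock (case 2) would just compute the initial
relative timeout from the deadline, and case 3 would call the
absolute-timeout futex wait instead of this loop.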

Does this work for you?

If so, could you provide a revised patch that uses this approach and
includes this approach in the documentation?
(Sorry for the lack of comments in the current code).



Re: [PATCH 2/5] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait

2018-01-12 Thread Torvald Riegel
On Tue, 2018-01-09 at 17:54 +, Mike Crowe wrote:
> On Tuesday 09 January 2018 at 13:50:54 +, Jonathan Wakely wrote:
> > On 07/01/18 20:55 +, Mike Crowe wrote:
> > > The futex system call supports waiting for an absolute time if
> > > FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT. Doing so provides two
> > > benefits:
> > > 
> > > 1. The call to gettimeofday is not required in order to calculate a
> > >   relative timeout.
> > > 
> > > 2. If someone changes the system clock during the wait then the futex
> > >   timeout will correctly expire earlier or later. Currently that only
> > >   happens if the clock is changed prior to the call to gettimeofday.
> > > 
> > > According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
> > > v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25. There is
> > > no attempt to detect the kernel version and fall back to the previous
> > > method.
> > 
> > I don't think we can require a specific kernel version just for this.
> 
> What is the minimum kernel version that libstdc++ requires? Since it's
> already relying on NPTL it can't go back earlier than v2.6, but I suppose
> that's a while before v2.6.28.

I'm not aware of any choice regarding this, but Jonathan will know for
sure.

Generally, I think choosing a minimum kernel version might be helpful, in
particular if we want to optimize more often specifically for Linux
environments; this may become more worthwhile in the future, for example
when we look at new C++ features such as parallel algorithms, or
upcoming executors.
The gthreads abstraction may be a nice goal, but we can benefit a lot
from knowing what the underlying platform really is.

Another option might be to require a minimum glibc version on Linux, and
build libstdc++ for that.  That would yield a minimum kernel version as
well, and we may be able to make use of other things in return, such as
syscall wrappers.

> > What happens if you use those bits on an older kernel, will there be
> > an ENOSYS error? Because that would allow us to try the new form, and
> > fallback to the old.
> 
> Glibc's nptl-init.c calls
> 
>  futex(.., FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME | FUTEX_PRIVATE_FLAG, ..)
> 
> and sets __have_futex_clock_realtime based on whether it gets ENOSYS back
> or not so it looks like it is possible to determine whether support is
> available.
> 
> The only such check I can find in libstdc++ is in filesystem/std-ops.cc
> fs::do_copy_file which can try sendfile(2) on each invocation but then
> automatically falls back to copying by steam. In that case, the overhead of
> the potentially-failing sendfile system call is small compared to copying
> the file.
> 
> Doing such a check in _M_futex_wait_until means one system call if
> FUTEX_CLOCK_REALTIME is supported, but three system calls if it is not
> supported. If we found a way to cache the answer in a thread-safe and cheap
> manner then this could be avoided.
> 
> Do you think it's worth trying to cache whether FUTEX_CLOCK_REALTIME is
> available?

It may be worth caching that, given how simple this can be:  Just add a
global atomic variable whose initial value means trying the most recent
syscall op, and set that to some other value that indicates an older
kernel.  Then check the value before attempting the syscall.  Can all be
relaxed atomic accesses because it's essentially just a best-effort
optimization.

Performance-wise, the trade-off is between an additional atomic load for
new kernels vs. one additional syscall for older kernels.  Given that we
need the fallback for old kernels anyway unless we select a minimum
version, I guess doing the caching makes sense.  The syscalls are on the
slow paths anyway.
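
For illustration, a minimal sketch of that caching (hypothetical helper
names, not the actual libstdc++ code): a single relaxed atomic flag
remembers whether FUTEX_CLOCK_REALTIME came back with ENOSYS once, so later
waits go straight to the relative-timeout fallback.

#include <atomic>
#include <cerrno>
#include <ctime>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

static std::atomic<bool> no_futex_clock_realtime{false};

static long
futex_wait_abs_realtime (int *addr, int expected,
                         const struct timespec *abs_time)
{
  if (!no_futex_clock_realtime.load (std::memory_order_relaxed))
    {
      long rc = syscall (SYS_futex, addr,
                         FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME
                         | FUTEX_PRIVATE_FLAG,
                         expected, abs_time, nullptr, FUTEX_BITSET_MATCH_ANY);
      if (rc == 0 || errno != ENOSYS)
        return rc;
      // Old kernel: remember that and use the fallback from now on.
      no_futex_clock_realtime.store (true, std::memory_order_relaxed);
    }
  // Fallback for old kernels: convert the absolute timeout to a relative
  // one and use the plain FUTEX_WAIT operation.
  struct timespec now, rel;
  clock_gettime (CLOCK_REALTIME, &now);
  rel.tv_sec = abs_time->tv_sec - now.tv_sec;
  rel.tv_nsec = abs_time->tv_nsec - now.tv_nsec;
  if (rel.tv_nsec < 0)
    {
      --rel.tv_sec;
      rel.tv_nsec += 1000000000;
    }
  if (rel.tv_sec < 0)
    {
      errno = ETIMEDOUT;
      return -1;
    }
  return syscall (SYS_futex, addr, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
                  expected, &rel);
}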



Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-03 Thread Torvald Riegel
On Fri, 2017-02-03 at 13:44 +, Ramana Radhakrishnan wrote:
> __atomic_load on ARM appears to be ok as well
> 
> except for
> 
> __atomic_load_di which should really be the ldrexd / strexd loop but we 
> could ameliorate that similar to your option 3b.

This uses just ldrexd now, and thus is not guaranteed to be atomic?

> On AArch64
> 
> * <16 byte loads have always been fine. The architecture allows single 
> copy atomic loads using single load instructions for all other sizes and 
> memory models, so we are fine there.
> 
> * we have gone through the libatomic locks from day one of the port for 
> 16 byte loads.  This has been a bit of a bugbear for a number of users 
> within ARM who would really like to get performance without heavy weight 
> locks for 16 byte atomic ops.

Would it be acceptable for those users to have loads that perform like
CAS loops, especially under contention?  Or are these users more
concerned about aarch64 not offering a true atomic 16-byte load?

> We could never switch this around on AArch64 to using the loop (or CAS) 
> by default as this would be the reverse issue (i.e. old compilers 
> calling libatomic locks and new code using the inlined instruction 
> sequence).

I think there's an implicit assumption that whenever one uses code
generated with a particular compiler, the libatomic version being used
at runtime is at least as recent as that particular compiler.  In a mix
of old and new code, this would mean that new libatomic has to be used.
However, AFAIK we haven't specified that explicitly yet.  (Richard
Henderson said we'd make similar implicit assumptions for libgcc, if I
understood him correctly.)

If that assumption holds, then switching from locks in libatomic to
inlined instructions is possible, provided that libatomic switches too.

> 
> My interest in this patch was piqued because if we were to switch 
> aarch64 to not use the locks in libatomic and go to a CAS or the loop 
> sequence I showed in my reply to Jakub, I don't believe we have an ABI 
> issue as I would expect there to be a single copy of libatomic on the 
> system and everyone would simply be using that.

Right, and it would have to be at least as recent as the most recent
compiler being used (eg, the system compiler on that system).

> However we would end up with the situation of generating stores for 
> __atomic_loads as you described above.
> 
> We could still in theory do that and gain the same bug for rodata and 
> volatile atomic pointers on AArch64 - I don't see a way of avoiding that 
> on aarch32 as we have a similar ABI issue to you.

Agreed.

> 
> > The patch introduces a can_atomic_load_p, which checks that an entry
> > exists in optabs and the machine descriptions for an atomic load of the
> > particular mode.
> >
> > IIRC, arm and aarch64 had atomic load patterns defined whenever they had
> > a CAS defined.  It would be nice if you could double-check that.  If
> > that's the case, nothing should change with my patch because
> > can_atomic_load_p would always return true if a CAS could be issued.
> > If that's not the case, arm/aarch64 could be in the same boat as x86_64
> > with cmpxchg16b; then, using Option 3b might be a useful (intermediary)
> > solution for ARM as well (OTOH, compatibility with existing code and
> > __sync builtins might perhaps not be as much of a factor as it might be
> > on x86_64).
> 
> On AArch64 IIRC only those instructions that are single copy atomic as 
> per the architecture are allowed to be atomic loads. Thus we do not 
> generate such loops anywhere.

I think the condition we'd have to check is rather that whenever there
is an atomic CAS available, there is also an atomic load available.
That is not quite the same as having only truly atomic loads available,
because then a CAS could still be available regardless of the
availability of a matching load.



Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-02 Thread Torvald Riegel
On Thu, 2017-02-02 at 13:58 +0100, Thomas Schwinge wrote:
> > The other failures I saw didn't seem atomics related
> > (eg, openacc)
> 
> I suppose you're testing without nvptx offloading -- which failures do
> you see for OpenACC testing?  (There shouldn't be any for host fallback
> testing.)

Sorry, false alarm.  Those tests print out "FAILED", but they don't
actually fail.

> > I haven't compared it against a full bootstrap build and
> > make check of trunk.).
> 
> I just happened to do that for x86_64-pc-linux-gnu, and I'm seeing the
> following additional changes; posting this just in case that's not
> expected to happen:
> 
> -PASS: gcc.dg/atomic-compare-exchange-5.c (test for excess errors)
> -PASS: gcc.dg/atomic-compare-exchange-5.c execution test
> +UNSUPPORTED: gcc.dg/atomic-compare-exchange-5.c

Those changes are expected because we no longer expect that
cmpxchg16b availability also means that 16-byte atomics will be inlined.



Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-02 Thread Torvald Riegel
On Thu, 2017-02-02 at 14:48 +, Ramana Radhakrishnan wrote:
> On 30/01/17 18:54, Torvald Riegel wrote:
> > This patch fixes the __atomic builtins to not implement supposedly
> > lock-free atomic loads based on just a compare-and-swap operation.
> >
> > If there is no hardware-backed atomic load for a certain memory
> > location, the current implementation can implement the load with a CAS
> > while claiming that the access is lock-free.  This is a bug in the cases
> > of volatile atomic loads and atomic loads to read-only-mapped memory; it
> > also creates a lot of contention in case of concurrent atomic loads,
> > which results in at least counter-intuitive performance because most
> > users probably understand "lock-free" to mean hardware-backed (and thus
> > "fast") instead of just in the progress-criteria sense.
> >
> > This patch implements option 3b of the choices described here:
> > https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html
> 
> 
> Will Deacon pointed me at this thread asking if something similar could 
> be done on ARM.

It would be nice if someone more familiar with ARM could double-check
that ARM is not affected.  I guess ARM isn't, but that's based on me
looking at machine descriptions, which I hadn't ever done before working
on this patch...

The patch introduces a can_atomic_load_p, which checks that an entry
exists in optabs and the machine descriptions for an atomic load of the
particular mode.

IIRC, arm and aarch64 had atomic load patterns defined whenever they had
a CAS defined.  It would be nice if you could double-check that.  If
that's the case, nothing should change with my patch because
can_atomic_load_p would always return true if a CAS could be issued.
If that's not the case, arm/aarch64 could be in the same boat as x86_64
with cmpxchg16b; then, using Option 3b might be a useful (intermediary)
solution for ARM as well (OTOH, compatibility with existing code and
__sync builtins might perhaps not be as much of a factor as it might be
on x86_64).

The atomic load patterns you have could still be wrong, for example by
generating a LDXP/STXP loop or something else that can store on an
atomic load.  In that case, the problem is similar as not having a
custom load pattern, and the second case in the previous paragraph
applies.

> On armv8-a we can implement an atomic load of 16 bytes using an LDXP / 
> STXP loop as a 16 byte load isn't single copy atomic. On armv8.1-a we do
> have a CAS on 16 bytes.
> 
> This was discussed last year here.
> 
> https://gcc.gnu.org/ml/gcc/2016-06/msg00017.html
> 
> and the consensus seemed to be that we couldn't really do a CAS on 
> readonly data as we would have no way of avoiding a trap.

Yes, and I agree that not doing a CAS (or anything that can store on
atomic load) is the right thing to do.

BTW, on the ARM targets, is it possible that the __sync builtins would
use LDXP/STXP on 16 byte and the __atomic loads would fall back to using
libatomic and locks?  Or do you simply not provide 16-byte __sync
builtins?

> I'm sure I'm missing something piece of the puzzle, so I'd be interested 
> in how you avoid this in this implementation on x86_64 with what I can 
> see is a CAS.

If we'd start from scratch, we wouldn't use cmpxchg16b if we don't have
a 16-byte atomic load.  However, we did do that, and there might be code
out there that does have inlined cmpxchg16b.  As discussed in the thread
you cited, changing GCC 7 back to libatomics *and* the use of locks
would be an effective ABI break.  Therefore, my patch (and Option 3b)
makes a compromise and delegates to libatomic but libatomic gets custom
code to actually keep using cmpxchg16b.  That means that the actual
synchronization doesn't change, but we constrain this problem to
libatomic and prevent application code generated with GCC 7 from being
directly affected.

Does this clarify the x86_64 situation?



Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-01 Thread Torvald Riegel
On Mon, 2017-01-30 at 19:54 +0100, Torvald Riegel wrote:
> This patch fixes the __atomic builtins to not implement supposedly
> lock-free atomic loads based on just a compare-and-swap operation.

After an off-list OK by Jakub, I have committed this as r245098.
Jakub will take care of the OpenMP side in a follow-up patch.



[PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-01-30 Thread Torvald Riegel
This patch fixes the __atomic builtins to not implement supposedly
lock-free atomic loads based on just a compare-and-swap operation.

If there is no hardware-backed atomic load for a certain memory
location, the current implementation can implement the load with a CAS
while claiming that the access is lock-free.  This is a bug in the cases
of volatile atomic loads and atomic loads to read-only-mapped memory; it
also creates a lot of contention in case of concurrent atomic loads,
which results in at least counter-intuitive performance because most
users probably understand "lock-free" to mean hardware-backed (and thus
"fast") instead of just in the progress-criteria sense.

This patch implements option 3b of the choices described here:
https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html

In a nutshell, this does change affected accesses to call libatomic
instead of inlining the CAS-based load emulation -- but libatomic will
continue to do the CAS-based approach.  Thus, there's no change in how
the accesses are actually performed (including compatibility with the
__sync builtins, which are not changed) -- but we do make it much easier
to fix this in the future, and we do ensure that less future code will
have the problematic code inlined into it (and thus unfixable).

Second, the return of __atomic_always_lock_free and
__atomic_is_lock_free are changed to report false for the affected
accesses.  This aims at making users aware of the fact that they don't
get fast hardware-backed performance for the affected accesses.
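
For illustration, a small program showing the user-visible effect described
above (sketch only; the exact answers depend on the target, on -mcx16, and
on the libatomic loaded at run time, and linking may need -latomic):

#include <cstdio>

int
main ()
{
  static __int128 x = 0;
  std::printf ("always_lock_free(16): %d\n",
               (int) __atomic_always_lock_free (sizeof (__int128), nullptr));
  std::printf ("is_lock_free(16):     %d\n",
               (int) __atomic_is_lock_free (sizeof (__int128), &x));
  __int128 v;
  __atomic_load (&x, &v, __ATOMIC_SEQ_CST);   // may go through libatomic
  return (int) v;
}

With this patch, both queries are expected to report 0 on x86_64 with
-mcx16 because there is no fast 16-byte atomic load, and the load itself is
routed to libatomic.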

This patch does not yet change how OpenMP atomics support is
implemented.  Jakub said he would take care of that.  I suppose one
approach would be to check can_atomic_load_p (introduced by this patch)
to decide in expand_omp_atomic whether to just use the mutex-based
approach; I think that conservatively, OpenMP atomics have to assume
that there could be an atomic load to a particular memory location
elsewhere in the program, so the choice between mutex-backed or not has
to be consistent for a particular memory location.

Thanks to Richard Henderson for providing the libatomic part of this
patch, and thanks to Andrew MacLeod for helping with the compiler parts.

I've tested this on an x86_64-linux bootstrap build and see no
regressions.  (With the exception of two OpenMP failures, which Jakub
will take care of.  The other failures I saw didn't seem atomics related
(eg, openacc); I haven't compared it against a full bootstrap build and
make check of trunk.).

AFAICT, in practice only x86 with -mcx16 (or where this is implicitly
implied) is affected.  The other architectures I looked at seemed to
have patterns for true hardware-backed atomic loads whenever they also
had a wider-than-word-sized CAS available.

I know we missed the stage3 deadline with this, unfortunately.  I think
it would be good to have this fix be part of GCC 7 though, because this
would ensure that less future code has the problematic load-via-CAS code
inlined.

Jakub: If this is OK for GCC 7, can you please take care of the OpenMP
bits and commit this?  Changelog entries are in the commit message.

If others could test on other hardware, this would also be appreciated.
commit 1db13cb386e673d5265bcaf2d70fc25dda22e5fd
Author: Torvald Riegel <trie...@redhat.com>
Date:   Fri Jan 27 17:40:44 2017 +0100

Fix __atomic to not implement atomic loads with CAS.

gcc/
	* builtins.c (fold_builtin_atomic_always_lock_free): Make "lock-free"
	conditional on existance of a fast atomic load.
	* optabs-query.c (can_atomic_load_p): New function.
	* optabs-query.h (can_atomic_load_p): Declare it.
	* optabs.c (expand_atomic_exchange): Always delegate to libatomic if
	no fast atomic load is available for the particular size of access.
	(expand_atomic_compare_and_swap): Likewise.
	(expand_atomic_load): Likewise.
	(expand_atomic_store): Likewise.
	(expand_atomic_fetch_op): Likewise.
	* testsuite/lib/target-supports.exp
	(check_effective_target_sync_int_128): Remove x86 because it provides
	no fast atomic load.
	(check_effective_target_sync_int_128_runtime): Likewise.

libatomic/
	* acinclude.m4: Add #define FAST_ATOMIC_LDST_*.
	* auto-config.h.in: Regenerate.
	* config/x86/host-config.h (FAST_ATOMIC_LDST_16): Define to 0.
	(atomic_compare_exchange_n): New.
	* glfree.c (EXACT, LARGER): Change condition and add comments.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index bf68e31..0a0e8b9 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6157,8 +6157,9 @@ fold_builtin_atomic_always_lock_free (tree arg0, tree arg1)
 
   /* Check if a compare_and_swap pattern exists for the mode which represents
  the required size.  The pattern is not allowed to fail, so the existence
- of the pattern indicates support is present.  */
-  if (can_compare_and_swap_p (mode, true))
+ of the pattern indicates support is present.  Al

[committed] libitm: Disable TSX on processors on which it may be broken.

2017-01-18 Thread Torvald Riegel
This fix follows the same approach that glibc uses to disable TSX on
processors on which it is broken.  TSX can also be disabled through a
microcode update on these processors, but glibc consensus is that it
cannot be detected reliably whether the microcode update has been
applied.  Thus, we just look for affected models/steppings.

Tested on x86_64-linux (but I don't have a machine with broken TSX
available).

libitm/ChangeLog

* config/x86/target.h (htm_available): Add check for some processors
on which TSX is broken.

commit ff5b36bbb136d73e673d8cbe234a4c070c3d72ee
Author: torvald 
Date:   Wed Jan 18 20:22:02 2017 +

libitm: Disable TSX on processors on which it may be broken.

	libitm/ChangeLog

	* config/x86/target.h (htm_available): Add check for some processors
	on which TSX is broken.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244594 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libitm/config/x86/target.h b/libitm/config/x86/target.h
index 8d0a0da..665c7d6 100644
--- a/libitm/config/x86/target.h
+++ b/libitm/config/x86/target.h
@@ -78,6 +78,28 @@ htm_available ()
   if (__get_cpuid_max (0, NULL) >= 7)
 {
   unsigned a, b, c, d;
+  /* TSX is broken on some processors.  This can be fixed by microcode,
+	 but we cannot reliably detect whether the microcode has been
+	 updated.  Therefore, do not report availability of TSX on these
+	 processors.  We use the same approach here as in glibc (see
+	 https://sourceware.org/ml/libc-alpha/2016-12/msg00470.html).  */
+  __cpuid (0, a, b, c, d);
+  if (b == 0x756e6547 && c == 0x6c65746e && d == 0x49656e69)
+	{
+	  __cpuid (1, a, b, c, d);
+	  if (((a >> 8) & 0x0f) == 0x06) // Family.
+	{
+	  unsigned model = ((a >> 4) & 0x0f) // Model.
+		  + ((a >> 12) & 0xf0); // Extended model.
+	  unsigned stepping = a & 0x0f;
+	  if ((model == 0x3c)
+		  || (model == 0x45)
+		  || (model == 0x46)
+		  /* Xeon E7 v3 has correct TSX if stepping >= 4.  */
+		  || ((model == 0x3f) && (stepping < 4)))
+		return false;
+	}
+	}
   __cpuid_count (7, 0, a, b, c, d);
   if (b & cpuid_rtm)
 	return true;


Re: [gofrontend-dev] Go patch committed: Avoid warning by using a local var for std::ofstream

2016-09-22 Thread Torvald Riegel
On Wed, 2016-09-21 at 16:56 +0200, Florian Weimer wrote:
> Torvald, would it be possible to align mutexes internally on hppa, to
> avoid the 16-byte alignment of the entire struct (that is, store a
> pointer to the actual mutex object, which points to a sub-region of
> the struct which is suitably aligned)?

I think this should be possible.  The int used for the actual lock is
first in the structure, followed by two more int's that we could move to
the end of the structure (which is just padding currently).  That would
allow us to shift the lock member by 8 bytes on demand at runtime.
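
A rough sketch of the general idea (hypothetical layout, not glibc's actual
pthread types): keep enough slack inside the object so that a suitably
aligned lock word can be picked at initialization time and remembered
through a pointer, instead of requiring the whole struct to be 16-byte
aligned.

#include <cstdint>

struct lock_storage
{
  // Lock word plus the two other ints, plus slack so that a 16-byte-aligned
  // position always exists inside 'bytes'.
  unsigned char bytes[3 * sizeof (int) + 16];
  int *lock;                    // points at the aligned lock word in bytes[]
};

static void
lock_storage_init (lock_storage *s)
{
  std::uintptr_t p = reinterpret_cast<std::uintptr_t> (s->bytes);
  p = (p + 15) & ~static_cast<std::uintptr_t> (15);  // round up to 16 bytes
  s->lock = reinterpret_cast<int *> (p);
  *s->lock = 0;
}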



Re: [PATCH] Partially improve scalability of the unwinder (PR libgcc/71744)

2016-09-16 Thread Torvald Riegel
On Thu, 2016-09-15 at 06:05 -0700, Ian Lance Taylor wrote:
> Jakub Jelinek  writes:
> 
> > 2016-09-15  Jakub Jelinek  
> >
> > PR libgcc/71744
> > * unwind-dw2-fde.c (ATOMIC_FDE_FAST_PATH): Define if __register_frame*
> > is not the primary registry and atomics are available.
> > (any_objects_registered): New variable.
> > (__register_frame_info_bases, __register_frame_info_table_bases):
> > Atomically store 1 to any_objects_registered after registering first
> > unwind info.
> > (_Unwind_Find_FDE): Return early if any_objects_registered is 0.
> 
> This is OK.

In glibc, we have a rule that we add sufficient code comments for glibc.
I'm not in the position to set rules for GCC, but I think this would
help here too.  Trying to explain in comments why a certain memory order
is used would have at least brought up the issue you mention below.

> > +#ifdef ATOMIC_FDE_FAST_PATH
> > +  /* For targets where unwind info is usually not registered through these
> > + APIs anymore, avoid taking a global lock.  */
> > +  if (__builtin_expect (!__atomic_load_n (&any_objects_registered,
> > + __ATOMIC_ACQUIRE), 1))
> > +return NULL;
> > +#endif
> > +
> >   init_object_mutex_once ();
> >   __gthread_mutex_lock (&object_mutex);
> 
> I doubt it matters, but I don't think you need to use __ATOMIC_ACQUIRE
> in the atomic_load_n.  You could use __ATOMIC_RELAXED.  Acquiring the
> mutex is going to enforce cross-thread sequential consistency anyhow.

But then the release MO wouldn't be required either.  The
__gthread_mutex_lock call has no synchronizes-with relation with the
release MO stores the patch adds, if that was what you were assuming.

Note that in the C11 / C++11 memory model, lock acquisitions are not
considered to automatically also be acquire MO fences.  This may be the
case on many archs, but it's not something the memory model guarantees.
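
For reference, the acquire/release pairing in the patch looks roughly like
this (an illustrative C++-style sketch; the actual patch uses the
__atomic_* builtins on any_objects_registered in unwind-dw2-fde.c):

#include <atomic>

static std::atomic<int> any_objects_registered{0};

void
register_frame_info_sketch (/* object to register */)
{
  // ... insert the object into the registry, under the object mutex ...
  // Publish it: this release store pairs with the acquire load below, so a
  // reader that observes 1 and then takes the slow path also observes the
  // registered object.
  any_objects_registered.store (1, std::memory_order_release);
}

const void *
find_fde_sketch (void *pc)
{
  // Fast path: if nothing was ever registered through these APIs, skip the
  // global mutex entirely.
  if (!any_objects_registered.load (std::memory_order_acquire))
    return nullptr;
  // Slow path: lock the object mutex and search the registry, as before.
  (void) pc;
  return nullptr;               // placeholder for the actual lookup
}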




Re: [PATCH 3/4][PR 71931] Fix libitm tests

2016-09-06 Thread Torvald Riegel
On Wed, 2016-08-24 at 20:08 +0100, Szabolcs Nagy wrote:
> Pass build time CC make var down to dejagnu so the sysroot
> is set correctly when gcc is built with --with-build-sysroot.
> 
> libitm/
> 2016-08-24  Szabolcs Nagy  
> 
>   PR testsuite/71931
>   * configure.ac: Add AC_CONFIG_FILES.
>   * configure: Regenerated.
>   * testsuite/Makefile.am: Add rule for libitm-test-support.exp.
>   * testsuite/Makefile.in: Regenerated.
>   * testsuite/libitm-test-support.exp.in: New.
>   * testsuite/lib/libitm.exp (libitm_init): Use BUILD_CC.
> 

I don't know enough about the build system to really review this.  If a
similar patch has been ACKed and applied for libatomic (71931 states
that both are affected), then I guess this is OK?



Re: [PATCH] PR libitm/70456: Allocate aligned memory in gtm_thread operator new

2016-04-19 Thread Torvald Riegel
On Sat, 2016-04-02 at 09:25 -0700, H.J. Lu wrote:
> On Wed, Mar 30, 2016 at 5:34 AM, H.J. Lu  wrote:
> > Since GTM::gtm_thread has
> >
> > gtm_thread *next_thread __attribute__((__aligned__(HW_CACHELINE_SIZE)));
> >
> > GTM::gtm_thread::operator new should allocate aligned memory.
> >
> > Tested on Linux/x86-64.  OK for trunk.
> >
> >
> 
> This patch is better.  Tested on Linux/x86-64.  OK for trunk?

OK.



Re: [PING 2, PATCH] libitm: Introduce target macro TARGET_BEGIN_TRANSACTION_ATTRIBUTE.

2016-03-02 Thread Torvald Riegel
On Wed, 2016-03-02 at 07:36 +0100, Dominik Vogt wrote:
> On Mon, Feb 01, 2016 at 02:18:48PM +0100, Dominik Vogt wrote:
> > The attached patch adds the a target specific attribute via the
> > new target macro TARGET_BEGIN_TRANSACTION_ATTRIBUTE to the
> > function begin_transaction().  S/390 uses this to set the
> > soft-float target attribute which is needed to fix a crash with
> > -m31.
> > 
> > As there seems to be no place in libitm to document internal macros like
> > USE_HTM_FASTPATH or the new macro, I've put the documentation in a
> > comment where the macro is used.
> 
> Please note that we're waiting for approval of the libitm
> maintainer, not for approval of the s390 maintainer.

LGTM.  Sorry for missing this patch.  Please CC me on libitm patches if
I don't respond quickly.

Thanks.



Re: [PATCH] libitm: Fix HTM fastpath.

2016-01-22 Thread Torvald Riegel
On Fri, 2016-01-08 at 12:07 +0100, Torvald Riegel wrote:
> This patch fixes a thinko in the HTM fastpath implementation.  In a
> nutshell, we also need to monitor the HTM fastpath control (ie,
> htm_fastpath) variable from within a HW transaction on the HTM fastpath,
> so that such a HW transaction will only execute if the HTM fastpath is
> still enabled.
> 
> We move htm_fastpath into the serial lock so that a HW transaction only
> needs one cacheline of HTM capacity to monitor both htm_fastpath and
> check that no non-HW-transaction is currently running.
> 
> Tested on x86_64-linux.
> 
> 2016-01-08  Torvald Riegel  <trie...@redhat.com>
> 
>   * beginend.cc (GTM::gtm_thread::serial_lock): Put on cacheline
>   boundary.
>   (htm_fastpath): Remove.
>   (gtm_thread::begin_transaction): Fix HTM fastpath.
>   (_ITM_commitTransaction): Adapt.
>   (_ITM_commitTransactionEH): Adapt.
>   * libitm/config/linux/rwlock.h (gtm_rwlock): Add htm_fastpath member
>   and accessors.
>   * libitm/config/posix/rwlock.h (gtm_rwlock): Likewise.
>   * libitm/config/posix/rwlock.cc (gtm_rwlock::gtm_rwlock): Adapt.
>   * libitm/config/x86/sjlj.S (_ITM_beginTransaction): Fix HTM fastpath.
>   * libitm/libitm_i.h (htm_fastpath): Remove declaration.
>   * libitm/method-serial.cc (htm_mg): Adapt.
>   (gtm_thread::serialirr_mode): Adapt.
>   * libitm/query.cc (_ITM_inTransaction, _ITM_getTransactionId): Adapt.

I have committed the attached patch (a minor rebase compared to the
prior one) after offline approval by Jakub Jelinek.  He tested on PPC as
well, and the patch fixed the problem we saw there.
commit 51a5f38f0228ed7e4772bf1d0439f93ab4ffcf23
Author: torvald <torvald@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Jan 22 16:13:06 2016 +

libitm: Fix HTM fastpath.

	* beginend.cc (GTM::gtm_thread::serial_lock): Put on cacheline
	boundary.
	(htm_fastpath): Remove.
	(gtm_thread::begin_transaction): Fix HTM fastpath.
	(_ITM_commitTransaction): Adapt.
	(_ITM_commitTransactionEH): Adapt.
	* libitm/config/linux/rwlock.h (gtm_rwlock): Add htm_fastpath member
	and accessors.
	* libitm/config/posix/rwlock.h (gtm_rwlock): Likewise.
	* libitm/config/posix/rwlock.cc (gtm_rwlock::gtm_rwlock): Adapt.
	* libitm/config/x86/sjlj.S (_ITM_beginTransaction): Fix HTM fastpath.
	* libitm/libitm_i.h (htm_fastpath): Remove declaration.
	* libitm/method-serial.cc (htm_mg): Adapt.
	(gtm_thread::serialirr_mode): Adapt.
	* libitm/query.cc (_ITM_inTransaction, _ITM_getTransactionId): Adapt.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@232735 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index 00d28f4..1a258ad 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -32,7 +32,11 @@ using namespace GTM;
 extern __thread gtm_thread_tls _gtm_thr_tls;
 #endif
 
-gtm_rwlock GTM::gtm_thread::serial_lock;
+// Put this at the start of a cacheline so that serial_lock's writers and
+// htm_fastpath fields are on the same cacheline, so that HW transactions
+// only have to pay one cacheline capacity to monitor both.
+gtm_rwlock GTM::gtm_thread::serial_lock
+  __attribute__((aligned(HW_CACHELINE_SIZE)));
 gtm_thread *GTM::gtm_thread::list_of_threads = 0;
 unsigned GTM::gtm_thread::number_of_threads = 0;
 
@@ -51,9 +55,6 @@ static pthread_mutex_t global_tid_lock = PTHREAD_MUTEX_INITIALIZER;
 static pthread_key_t thr_release_key;
 static pthread_once_t thr_release_once = PTHREAD_ONCE_INIT;
 
-// See gtm_thread::begin_transaction.
-uint32_t GTM::htm_fastpath = 0;
-
 /* Allocate a transaction structure.  */
 void *
 GTM::gtm_thread::operator new (size_t s)
@@ -173,9 +174,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
   // lock's writer flag and thus abort if another thread is or becomes a
   // serial transaction.  Therefore, if the fastpath is enabled, then a
   // transaction is not executing as a HW transaction iff the serial lock is
-  // write-locked.  This allows us to use htm_fastpath and the serial lock's
-  // writer flag to reliable determine whether the current thread runs a HW
-  // transaction, and thus we do not need to maintain this information in
+  // write-locked.  Also, HW transactions monitor the fastpath control
+  // variable, so that they will only execute if dispatch_htm is still the
+  // current method group.  This allows us to use htm_fastpath and the serial
+  // lock's writers flag to reliable determine whether the current thread runs
+  // a HW transaction, and thus we do not need to maintain this information in
   // per-thread state.
   // If an uninstrumented code path is not available, we can still run
   // instrumented code from a HW transaction because the HTM fastpath kicks
@@ -187,9 +190,14 @@ GTM::gtm_thread::begin

Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-21 Thread Torvald Riegel
On Thu, 2016-01-21 at 11:00 +0100, Dominique d'Humières wrote:
> Torvald,
> 
> Now that I can bootstrap on darwin, I have found the following failure for 
> libitm.c++/libstdc++-safeexc.C
> 
> /opt/gcc/work/libitm/testsuite/libitm.c++/libstdc++-safeexc.C:50:2: error: 
> unsafe function call 'std::underflow_error::underflow_error(const string&)' 
> within atomic transaction
>   throw T (what);
>   ^

Well, yes, that's my oversight.  The previous fix disabled the support,
so we need to now xfail or disable this test on Darwin.  Same for AIX.
Ignore these failures for now.  I'll work on a fix.



Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-21 Thread Torvald Riegel
On Thu, 2016-01-21 at 11:00 +0100, Dominique d'Humières wrote:
> Torvald,
> 
> Now that I can bootstrap on darwin, I have found the following failure for 
> libitm.c++/libstdc++-safeexc.C
> 
> /opt/gcc/work/libitm/testsuite/libitm.c++/libstdc++-safeexc.C:50:2: error: 
> unsafe function call 'std::underflow_error::underflow_error(const string&)' 
> within atomic transaction
>   throw T (what);
>   ^

Does this patch fix it (ie, mark the test unsupported)?
commit 259c0cf27d0a88eecc90af1aa500f88f6108cb04
Author: Torvald Riegel <trie...@redhat.com>
Date:   Thu Jan 21 16:21:33 2016 +0100

libitm: Disable testing transaction-safe exceptions on Darwin and AIX.

	* testsuite/libitm.c++/libstdc++-safeexc.C: Not supported on darwin
	or AIX.

diff --git a/libitm/testsuite/libitm.c++/libstdc++-safeexc.C b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C
index 3e1655e..55ebd25 100644
--- a/libitm/testsuite/libitm.c++/libstdc++-safeexc.C
+++ b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C
@@ -2,7 +2,10 @@
 // are indeed that.  Thus, this also tests the transactional clones in
 // libstdc++ and libsupc++.
 
-// { dg-do run }
+// Not supported on Darwin nor AIX because those lack the support for
+// weak references to undefined functions that we need in libstdc++ to make
+// exceptions transaction-safe.
+// { dg-do run { target { ! *-*-darwin* powerpc-ibm-aix* } } }
 
 #include 
 #include 


[PATCH] libitm: Disable testing transaction-safe exceptions on Darwin and AIX.

2016-01-21 Thread Torvald Riegel
On Thu, 2016-01-21 at 10:06 -0800, Mike Stump wrote:
> On Jan 21, 2016, at 9:29 AM, Dominique d'Humières <domi...@lps.ens.fr> wrote:
> > // { dg-do run { target { ! { *-*-darwin* powerpc-ibm-aix* } } } }
> 
> A comment to hint that this has something to do with weak undefined would be 
> nice.

Here's the patch I prepared (which indeed includes a comment).

OK for trunk?  I'm not quite sure whether this qualifies as a
regression, but having an additional test that now fails is one I guess.
commit 0323fed14832e5744cbc63bcfeeb6728f7f13394
Author: Torvald Riegel <trie...@redhat.com>
Date:   Thu Jan 21 16:21:33 2016 +0100

libitm: Disable testing transaction-safe exceptions on Darwin and AIX.

	* testsuite/libitm.c++/libstdc++-safeexc.C: Not supported on darwin
	or AIX.

diff --git a/libitm/testsuite/libitm.c++/libstdc++-safeexc.C b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C
index 3e1655e..20e2e5e 100644
--- a/libitm/testsuite/libitm.c++/libstdc++-safeexc.C
+++ b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C
@@ -2,7 +2,10 @@
 // are indeed that.  Thus, this also tests the transactional clones in
 // libstdc++ and libsupc++.
 
-// { dg-do run }
+// Not supported on Darwin nor AIX because those lack the support for
+// weak references to undefined functions that we need in libstdc++ to make
+// exceptions transaction-safe.
+// { dg-do run { target { ! { *-*-darwin* powerpc-ibm-aix* } } } }
 
 #include 
 #include 


Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-21 Thread Torvald Riegel
On Thu, 2016-01-21 at 18:12 +, Pedro Alves wrote:
> On 01/21/2016 06:06 PM, Mike Stump wrote:
> > On Jan 21, 2016, at 9:29 AM, Dominique d'Humières  
> > wrote:
> >> // { dg-do run { target { ! { *-*-darwin* powerpc-ibm-aix* } } } }
> > 
> > A comment to hint that this has something to do with weak undefined would 
> > be nice.
> > 
> 
> Or come up with some new "dg-require-effective-target weakref-whatnot", 
> making it
> self-documenting.
> 

It's just one test that needs this right now.  If there should be more
in the future, I agree that a dg-require-... might be nicer.



Re: [PATCH] libitm: Disable testing transaction-safe exceptions on Darwin and AIX.

2016-01-21 Thread Torvald Riegel
On Thu, 2016-01-21 at 18:26 +, Jonathan Wakely wrote:
> On 21/01/16 10:19 -0800, Mike Stump wrote:
> >On Jan 21, 2016, at 10:15 AM, Torvald Riegel <trie...@redhat.com> wrote:
> >> On Thu, 2016-01-21 at 10:06 -0800, Mike Stump wrote:
> >>> On Jan 21, 2016, at 9:29 AM, Dominique d'Humières <domi...@lps.ens.fr> 
> >>> wrote:
> >>>> // { dg-do run { target { ! { *-*-darwin* powerpc-ibm-aix* } } } }
> >>>
> >>> A comment to hint that this has something to do with weak undefined would 
> >>> be nice.
> >>
> >> Here's the patch I prepared (which indeed includes a comment).
> >>
> >> OK for trunk?  I'm not quite sure whether this qualifies as a
> >> regression, but having an additional test that now fails is one I guess.
> >> 
> >
> >A simple testsuite fixup like this is still ok.  From a darwin, AIX 
> >perspective it is fine.  If either the transaction or the libstdc++ people 
> >like it, I think we’re set.
> 
> OK from the libstdc++ side.
> 

Committed.



[PATCH][committed] libitm: Remove dead code.

2016-01-19 Thread Torvald Riegel
I missed dead code when I removed the cacheline stuff.
local_type_traits hasn't been updated either, apparently leading to
bootstrap issues.  So we just remove more dead code.

Tested fine on x86_64-linux.  Committed.
commit c608b69c3c49c7d29033faf328fd4d117f31fd9f
Author: Torvald Riegel <trie...@redhat.com>
Date:   Tue Jan 19 20:43:10 2016 +0100

libitm: Remove dead code.

	* local_type_traits: Remove file.
	* libitm_i.h: Don't include it anymore.
	(sized_integral): Remove.

diff --git a/libitm/libitm_i.h b/libitm/libitm_i.h
index 751b4ab..ae88ff0 100644
--- a/libitm/libitm_i.h
+++ b/libitm/libitm_i.h
@@ -36,7 +36,6 @@
 #include 
 #include 
 #include 
-#include "local_type_traits"
 #include "local_atomic"
 
 /* Don't require libgcc_s.so for exceptions.  */
@@ -49,13 +48,6 @@ namespace GTM HIDDEN {
 
 using namespace std;
 
-// A helper template for accessing an unsigned integral of SIZE bytes.
-template struct sized_integral { };
-template<> struct sized_integral<1> { typedef uint8_t type; };
-template<> struct sized_integral<2> { typedef uint16_t type; };
-template<> struct sized_integral<4> { typedef uint32_t type; };
-template<> struct sized_integral<8> { typedef uint64_t type; };
-
 typedef unsigned int gtm_word __attribute__((mode (word)));
 
 // These values are given to GTM_restart_transaction and indicate the
diff --git a/libitm/local_type_traits b/libitm/local_type_traits
deleted file mode 100644
index 131e8d2..000
--- a/libitm/local_type_traits
+++ /dev/null
@@ -1,1901 +0,0 @@
-// C++0x type_traits -*- C++ -*-
-
-// Copyright (C) 2007-2016 Free Software Foundation, Inc.
-//
-// This file is part of the GNU ISO C++ Library.  This library is free
-// software; you can redistribute it and/or modify it under the
-// terms of the GNU General Public License as published by the
-// Free Software Foundation; either version 3, or (at your option)
-// any later version.
-
-// This library is distributed in the hope that it will be useful,
-// but WITHOUT ANY WARRANTY; without even the implied warranty of
-// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-// GNU General Public License for more details.
-
-// Under Section 7 of GPL version 3, you are granted additional
-// permissions described in the GCC Runtime Library Exception, version
-// 3.1, as published by the Free Software Foundation.
-
-// You should have received a copy of the GNU General Public License and
-// a copy of the GCC Runtime Library Exception along with this program;
-// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-// <http://www.gnu.org/licenses/>.
-
-// 
-//
-// This is a copy of the libstdc++ header, with the trivial modification
-// of ignoring the c++config.h include.  If and when the top-level build is
-// fixed so that target libraries can be built using the newly built, we can
-// delete this file.
-//
-// 
-
-/** @file include/type_traits
- *  This is a Standard C++ Library header.
- */
-
-#ifndef _GLIBCXX_TYPE_TRAITS
-#define _GLIBCXX_TYPE_TRAITS 1
-
-// #pragma GCC system_header
-
-// #ifndef __GXX_EXPERIMENTAL_CXX0X__
-// # include 
-// #else
-// #include 
-
-namespace std // _GLIBCXX_VISIBILITY(default)
-{
-// _GLIBCXX_BEGIN_NAMESPACE_VERSION
-
-  /**
-   * @addtogroup metaprogramming
-   * @{
-   */
-
-  /// integral_constant
-  template
-struct integral_constant
-{
-  static constexpr _Tp  value = __v;
-  typedef _Tp   value_type;
-  typedef integral_constant<_Tp, __v>   type;
-  constexpr operator value_type() { return value; }
-};
-  
-  /// typedef for true_type
-  typedef integral_constant<bool, true> true_type;
-
-  /// typedef for false_type
-  typedef integral_constant<bool, false>false_type;
-
-  template
-constexpr _Tp integral_constant<_Tp, __v>::value;
-
-  // Meta programming helper types.
-
-  template<bool, typename, typename>
-struct conditional;
-
-  template
-struct __or_;
-
-  template<>
-struct __or_<>
-: public false_type
-{ };
-
-  template
-struct __or_<_B1>
-: public _B1
-{ };
-
-  template
-struct __or_<_B1, _B2>
-: public conditional<_B1::value, _B1, _B2>::type
-{ };
-
-  template
-struct __or_<_B1, _B2, _B3, _Bn...>
-: public conditional<_B1::value, _B1, __or_<_B2, _B3, _Bn...>>::type
-{ };
-
-  template
-struct __and_;
-
-  template<>
-struct __and_<>
-: public true_type
-{ };
-
-  template
-struct __and_<_B1>
-: public _B1
-{ };
-
-  template
-struct __and_<_B1, _B2>
-: public conditional<_B1::value, _B2, _B1>::type
-{ };

Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-19 Thread Torvald Riegel
On Sat, 2016-01-16 at 10:57 +0100, Dominique d'Humières wrote:
> > Addressed these, fixed a problem with using GLIBCXX_WEAK_DEFINITION
> > (which is only set on Darwin despite the generic-sounding name -- so
> > just use __attribute__((weak)) directly), and also updated
> > testsuite_abi.cc so that it knows about CXXABI_1.3.10.
> >
> > Approved by Jonathan Wakely.  Committed as r232454.
> This breaks bootstrap on darwin, see 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69310.

Please give this patch a try.  I've only tested it on x86_64-linux.
Jon, okay from your side if Darwin testing succeeds?
commit 6987f84f278d2cbf5b828a8c81c1be84b292b1af
Author: Torvald Riegel <trie...@redhat.com>
Date:   Tue Jan 19 18:36:14 2016 +0100

libstdc: Use weak alias instead of just alias in TM support.

	PR libstdc++/69310
	* src/c++11/cow-stdexcept.cc: Use weak alias instead of just alias
	to make Darwin happy.

diff --git a/libstdc++-v3/src/c++11/cow-stdexcept.cc b/libstdc++-v3/src/c++11/cow-stdexcept.cc
index a0f505c..a070747 100644
--- a/libstdc++-v3/src/c++11/cow-stdexcept.cc
+++ b/libstdc++-v3/src/c++11/cow-stdexcept.cc
@@ -364,7 +364,9 @@ _ZGTtNSt##NAME##C1EPKc (CLASS* that, const char* s)			\
  construct the COW string in the latter manually.  Note that the	\
  exception classes will not be declared transaction_safe if the	\
  shared empty _Rep is disabled with --enable-fully-dynamic-string	\
- (in which case _GLIBCXX_FULLY_DYNAMIC_STRING is nonzero).  */	\
+ (in which case _GLIBCXX_FULLY_DYNAMIC_STRING is nonzero).		\
+ The alias declarations are also declared weak because Darwin	\
+ doesn't support non-weak aliases.  */\
   CLASS e("");\
   _ITM_memcpyRnWt(that, &e, sizeof(CLASS));\
   _txnal_cow_string_C1_for_exceptions(_txnal_##BASE##_get_msg(that),	\
@@ -372,7 +374,7 @@ _ZGTtNSt##NAME##C1EPKc (CLASS* that, const char* s)			\
 }	\
 void	\
 _ZGTtNSt##NAME##C2EPKc (CLASS*, const char*)\
-  __attribute__((alias ("_ZGTtNSt" #NAME "C1EPKc")));			\
+  __attribute__((weak, alias ("_ZGTtNSt" #NAME "C1EPKc")));		\
 void	\
 _ZGTtNSt##NAME##C1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE( \
 CLASS* that, const std::__sso_string& s)\
@@ -393,7 +395,7 @@ _ZGTtNSt##NAME##D1Ev(CLASS* that)	\
 { _txnal_cow_string_D1(_txnal_##BASE##_get_msg(that)); }		\
 void	\
 _ZGTtNSt##NAME##D2Ev(CLASS*)		\
-__attribute__((alias ("_ZGTtNSt" #NAME "D1Ev")));			\
+__attribute__((weak, alias ("_ZGTtNSt" #NAME "D1Ev")));			\
 void	\
 _ZGTtNSt##NAME##D0Ev(CLASS* that)	\
 {	\


Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-18 Thread Torvald Riegel
On Sun, 2016-01-17 at 18:30 -0500, David Edelsohn wrote:
> On Sun, Jan 17, 2016 at 3:21 PM, Torvald Riegel <trie...@redhat.com> wrote:
> > On Sat, 2016-01-16 at 15:38 -0500, David Edelsohn wrote:
> >> On Sat, Jan 16, 2016 at 8:35 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> >> > On Sat, Jan 16, 2016 at 07:47:33AM -0500, David Edelsohn wrote:
> >> >> stage1 libstdc++ builds just fine.  the problem is stage2 configure
> >> >> fails due to missing ITM_xxx symbols when configure tries to compile
> >> >> and run conftest programs.
> >> >
> >> > On x86_64-linux, the _ITM_xxx symbols are undef weak ones and thus it is
> >> > fine to load libstdc++ without libitm and libstdc++ doesn't depend on
> >> > libitm.
> >> >
> >> > So, is AIX defining __GXX_WEAK__ or not?  Perhaps some other macro or
> >> > configure check needs to be used to determine if undefined weak symbols
> >> > work the way libstdc++ needs them to.
> >>
> >> __GXX_WEAK__ appears to be defined by gcc/c-family/c-cppbuiltin.c
> >> based on  SUPPORTS_ONE_ONLY.  gcc/defaults.h defines SUPPORTS_ONE_ONLY
> >> if the target supports MAKE_DECL_ONE_ONLY and link-once semantics.
> >> AIX weak correctly supports link-once semantics.  AIX also supports
> >> the definition of __GXX_WEAK__ in gcc/doc/cpp.texi, namely collapsing
> >> symbols with vague linkage in multiple translation units.
> >>
> >> libstdc++/src/c++11/cow-stdexcept.cc appears to be using __GXX_WEAK__
> >> and __attribute__ ((weak)) for references to symbols that may not be
> >> defined at link time or run time.  AIX does not allow undefined symbol
> >> errors by default.  And the libstdc++ inference about the semantics of
> >> __GXX_WEAK__ are different than the documentation.
> >>
> >> AIX supports MAKE_DECL_ONE_ONLY and the documented meaning of
> >> __GXX_WEAK__.  AIX does not support extension of the meaning to
> >> additional SVR4 semantics not specified in the documentation.
> >
> > I see, so we might be assuming that __GXX_WEAK__ means more than it
> > actually does (I'm saying "might" because personally, I don't know; your
> > information supports this is the case, but the initial info I got was
> > that __GXX_WEAK__ would mean we could have weak decls without
> > definitions).
> 
> I believe that libstdc++ must continue with the weak undefined
> references to the symbols as designed, but protect them with a
> different macro.  For example, __GXX_WEAK_REF__ or __GXX_WEAK_UNDEF__
> defined in defaults.h based on configure test or simply overridden in
> config/rs6000/aix.h.  Or the macro could be local to libstdc++ and
> overridden in config/os/aix/os_defines.h.

OK.  I'm currently testing the attached patch on x86_64-linux.  David,
if there are no objections from you and those CC'ed, could you give this
one a try on AIX, please?

* include/bits/c++config (_GLIBCXX_USE_WEAK_REF): New.
(_GLIBCXX_TXN_SAFE, _GLIBCXX_TXN_SAFE_DYN): Use _GLIBCXX_USE_WEAK_REF
    and move after its definition.
* config/os/aix/os_defines.h (_GLIBCXX_USE_WEAK_REF): Override.
* src/c++11/cow-stdexcept.cc: Use _GLIBCXX_USE_WEAK_REF instead of
__GXX_WEAK__, and only provide transactional clones if
_GLIBCXX_USE_WEAK_REF is true.  Don't provide stubs of libitm
functions.

commit a5a8819bce824815a94ef8d58f6d4123db92f1d4
Author: Torvald Riegel <trie...@redhat.com>
Date:   Mon Jan 18 14:42:21 2016 +0100

libstdc++: Fix usage of __GXX_WEAK__ in TM TS support.

	* include/bits/c++config (_GLIBCXX_USE_WEAK_REF): New.
	(_GLIBCXX_TXN_SAFE, _GLIBCXX_TXN_SAFE_DYN): Use	_GLIBCXX_USE_WEAK_REF
	and move after its definition.
	* config/os/aix/os_defines.h (_GLIBCXX_USE_WEAK_REF): Override.
	* src/c++11/cow-stdexcept.cc: Use _GLIBCXX_USE_WEAK_REF instead of
	__GXX_WEAK__, and only provide transactional clones if
	_GLIBCXX_USE_WEAK_REF is true.  Don't provide stubs of libitm
	functions.

diff --git a/libstdc++-v3/config/os/aix/os_defines.h b/libstdc++-v3/config/os/aix/os_defines.h
index d895471..0949446 100644
--- a/libstdc++-v3/config/os/aix/os_defines.h
+++ b/libstdc++-v3/config/os/aix/os_defines.h
@@ -48,4 +48,7 @@
 #define __COMPATMATH__
 #endif
 
+// No support for referencing weak symbols without a definition.
+#define _GLIBCXX_USE_WEAK_REF 0
+
 #endif
diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 387a7bb..57024e4 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -483,20 +483,6 @@ namespace std
 
 #define _GLIBCXX_USE_ALLOCATOR_

Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-18 Thread Torvald Riegel
On Mon, 2016-01-18 at 14:54 +0100, Torvald Riegel wrote:
> On Sun, 2016-01-17 at 18:30 -0500, David Edelsohn wrote:
> > On Sun, Jan 17, 2016 at 3:21 PM, Torvald Riegel <trie...@redhat.com> wrote:
> > > On Sat, 2016-01-16 at 15:38 -0500, David Edelsohn wrote:
> > >> On Sat, Jan 16, 2016 at 8:35 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> > >> > On Sat, Jan 16, 2016 at 07:47:33AM -0500, David Edelsohn wrote:
> > >> >> stage1 libstdc++ builds just fine.  the problem is stage2 configure
> > >> >> fails due to missing ITM_xxx symbols when configure tries to compile
> > >> >> and run conftest programs.
> > >> >
> > >> > On x86_64-linux, the _ITM_xxx symbols are undef weak ones and thus it 
> > >> > is
> > >> > fine to load libstdc++ without libitm and libstdc++ doesn't depend on
> > >> > libitm.
> > >> >
> > >> > So, is AIX defining __GXX_WEAK__ or not?  Perhaps some other macro or
> > >> > configure check needs to be used to determine if undefined weak symbols
> > >> > work the way libstdc++ needs them to.
> > >>
> > >> __GXX_WEAK__ appears to be defined by gcc/c-family/c-cppbuiltin.c
> > >> based on  SUPPORTS_ONE_ONLY.  gcc/defaults.h defines SUPPORTS_ONE_ONLY
> > >> if the target supports MAKE_DECL_ONE_ONLY and link-once semantics.
> > >> AIX weak correctly supports link-once semantics.  AIX also supports
> > >> the definition of __GXX_WEAK__ in gcc/doc/cpp.texi, namely collapsing
> > >> symbols with vague linkage in multiple translation units.
> > >>
> > >> libstdc++/src/c++11/cow-stdexcept.cc appears to be using __GXX_WEAK__
> > >> and __attribute__ ((weak)) for references to symbols that may not be
> > >> defined at link time or run time.  AIX does not allow undefined symbol
> > >> errors by default.  And the libstdc++ inference about the semantics of
> > >> __GXX_WEAK__ are different than the documentation.
> > >>
> > >> AIX supports MAKE_DECL_ONE_ONLY and the documented meaning of
> > >> __GXX_WEAK__.  AIX does not support extension of the meaning to
> > >> additional SVR4 semantics not specified in the documentation.
> > >
> > > I see, so we might be assuming that __GXX_WEAK__ means more than it
> > > actually does (I'm saying "might" because personally, I don't know; your
> > > information supports this is the case, but the initial info I got was
> > > that __GXX_WEAK__ would mean we could have weak decls without
> > > definitions).
> > 
> > I believe that libstdc++ must continue with the weak undefined
> > references to the symbols as designed, but protect them with a
> > different macro.  For example, __GXX_WEAK_REF__ or __GXX_WEAK_UNDEF__
> > defined in defaults.h based on configure test or simply overridden in
> > config/rs6000/aix.h.  Or the macro could be local to libstdc++ and
> > overridden in config/os/aix/os_defines.h.
> 
> OK.  I'm currently testing the attached patch on x86_64-linux.

No regressions in the libstdc++ and libitm tests on x86_64-linux.



Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-17 Thread Torvald Riegel
On Sat, 2016-01-16 at 15:38 -0500, David Edelsohn wrote:
> On Sat, Jan 16, 2016 at 8:35 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> > On Sat, Jan 16, 2016 at 07:47:33AM -0500, David Edelsohn wrote:
> >> stage1 libstdc++ builds just fine.  the problem is stage2 configure
> >> fails due to missing ITM_xxx symbols when configure tries to compile
> >> and run conftest programs.
> >
> > On x86_64-linux, the _ITM_xxx symbols are undef weak ones and thus it is
> > fine to load libstdc++ without libitm and libstdc++ doesn't depend on
> > libitm.
> >
> > So, is AIX defining __GXX_WEAK__ or not?  Perhaps some other macro or
> > configure check needs to be used to determine if undefined weak symbols
> > work the way libstdc++ needs them to.
> 
> __GXX_WEAK__ appears to be defined by gcc/c-family/c-cppbuiltin.c
> based on  SUPPORTS_ONE_ONLY.  gcc/defaults.h defines SUPPORTS_ONE_ONLY
> if the target supports MAKE_DECL_ONE_ONLY and link-once semantics.
> AIX weak correctly supports link-once semantics.  AIX also supports
> the definition of __GXX_WEAK__ in gcc/doc/cpp.texi, namely collapsing
> symbols with vague linkage in multiple translation units.
> 
> libstdc++/src/c++11/cow-stdexcept.cc appears to be using __GXX_WEAK__
> and __attribute__ ((weak)) for references to symbols that may not be
> defined at link time or run time.  AIX does not allow undefined symbol
> errors by default.  And the libstdc++ inference about the semantics of
> __GXX_WEAK__ are different than the documentation.
> 
> AIX supports MAKE_DECL_ONE_ONLY and the documented meaning of
> __GXX_WEAK__.  AIX does not support extension of the meaning to
> additional SVR4 semantics not specified in the documentation.

I see, so we might be assuming that __GXX_WEAK__ means more than it
actually does (I'm saying "might" because personally, I don't know; your
information supports that this is the case, but the initial info I got was
that __GXX_WEAK__ would mean we could have weak decls without
definitions).

The attached patch works around this by always defining stubs for the
libitm functions, yet declaring them weak at the same time.  If
__GXX_WEAK__ is not supplied, the transactional clones aren't built at
all.  This tests fine on x86_64-linux, and I suppose that it should work
on AIX too (but I haven't tested).  Is it harmless if gnu.pre lists
symbols that we don't provide?

Thoughts?
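
For illustration only (a generic sketch, not part of the patch): the semantics
the undef-weak scheme relies on can be shown with one of the declarations from
cow-stdexcept.cc.  On ELF targets an undefined weak reference simply resolves
to a null address, whereas the AIX linker rejects the undefined reference,
which is why the weak stub definitions are kept unconditionally.

#include <stdint.h>

/* Same shape as the declaration in cow-stdexcept.cc.  */
extern uint64_t _ITM_RU8 (const uint64_t *) __attribute__ ((weak));

static int itm_linked (void)
{
  /* ELF: null when libitm is absent; the clones are only reached through
     libitm, so this case never actually calls the symbol.  AIX: the link
     fails unless a definition (now the weak stub) exists.  */
  return &_ITM_RU8 != 0;
}
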
commit 9008d4a610dccb5ec47f9c1e506492b8615a36fd
Author: Torvald Riegel <trie...@redhat.com>
Date:   Sun Jan 17 19:21:13 2016 +0100

libstdc++: Fix usage of weak symbols in TM TS support.

diff --git a/libstdc++-v3/src/c++11/cow-stdexcept.cc b/libstdc++-v3/src/c++11/cow-stdexcept.cc
index e2d7e76..8ccc7f5 100644
--- a/libstdc++-v3/src/c++11/cow-stdexcept.cc
+++ b/libstdc++-v3/src/c++11/cow-stdexcept.cc
@@ -179,6 +179,13 @@ _GLIBCXX_END_NAMESPACE_VERSION
 // Furthermore, _Rep will always have been allocated or deallocated via
 // global new or delete, so nontransactional writes we do to _Rep cannot
 // interfere with transactional accesses.
+
+// We require support for weak symbols because we need to call libitm but must
+// not depend on it.  The exception classes will not be declared
+// transaction-safe if there is no such support, so do not create
+// transactional clones in that case at all.
+#ifdef __GXX_WEAK__
+
 extern "C" {
 
 #ifndef _GLIBCXX_MANGLE_SIZE_T
@@ -195,7 +202,6 @@ extern "C" {
 # define ITM_REGPARM
 #endif
 
-#if __GXX_WEAK__
 // Declare all libitm symbols we rely on, but make them weak so that we do
 // not depend on libitm.
 extern void* _ZGTtnaX (size_t sz) __attribute__((weak));
@@ -213,9 +219,10 @@ extern void _ITM_memcpyRnWt(void *, const void *, size_t)
 extern void _ITM_addUserCommitAction(void (*)(void *), uint64_t, void *)
   ITM_REGPARM __attribute__((weak));
 
-#else
-// If there is no support for weak symbols, create dummies.  The exceptions
-// will not be declared transaction_safe in this case.
+// Also create stubs because some targets (e.g., AIX) do not support weak
+// symbols that do not have a definition.  These stubs will not be used
+// (unless users call the transactional clones directly or run transactions
+// without using libitm as well)
 void* _ZGTtnaX (size_t) { return NULL; }
 void _ZGTtdlPv (void*) { }
 uint8_t _ITM_RU1(const uint8_t *) { return 0; }
@@ -224,7 +231,6 @@ uint64_t _ITM_RU8(const uint64_t *) { return 0; }
 void _ITM_memcpyRtWn(void *, const void *, size_t) { }
 void _ITM_memcpyRnWt(void *, const void *, size_t) { }
 void _ITM_addUserCommitAction(void (*)(void *), uint64_t, void *) { };
-#endif
 
 }
 
@@ -440,3 +446,5 @@ CTORDTOR(14overflow_error, std::overflow_error, runtime_error)
 CTORDTOR(15underflow_error, std::underflow_error, runtime_error)
 
 }
+
+#endif  // __GXX_WEAK__


Re: [PATCH] libstdc++: Fix static_assert.

2016-01-17 Thread Torvald Riegel
On Sun, 2016-01-17 at 17:03 +, Jonathan Wakely wrote:
> On 17 January 2016 at 17:01, Jonathan Wakely wrote:
> > On 16 January 2016 at 22:47, Torvald Riegel wrote:
> >> This adds a missing string argument to a call to static_assert, thus not
> >> making it depend on c++1z extensions.  This fixes the build breakage on
> >> mingw introduced in 232454.
> >>
> >> Tested on x86_64-linux.  OK?
> >
> > OK.
> 
> Actually please change the message, so it states the condition being
> tested, not the negation.
> 
> e.g. "Pointers are 32 bits or 64 bits wide"
> 
> or state what must be true, "Pointers must be 32 bits or 64 bits wide"

I committed the latter.



[PATCH][committed] libitm: Ensure proxy privatization safety.

2016-01-16 Thread Torvald Riegel
This patch fixes privatization safety support by additionally ensuring
proxy privatization safety.  It fixes a correctness bug, but in a
simple way for now, which will lead to less scalability with higher
thread counts.  I plan to finish an optimization once I have enough time
for this.
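
For readers unfamiliar with the term, here is a minimal sketch of the proxy
privatization pattern that the new test stresses (illustrative names, compile
with -fgnu-tm; this is not the test itself):

#include <stdlib.h>

static int *shared_ptr;   /* the only shared reference to the object */

static void privatizer (void)
{
  int *mine;
  __transaction_atomic { mine = shared_ptr; shared_ptr = 0; }
  /* *mine is now private.  Without privatization safety, a concurrent reader
     transaction with an older snapshot could still be loading from it.  */
  free (mine);
}

static void reader (int *sum)
{
  __transaction_atomic { if (shared_ptr) *sum += *shared_ptr; }
}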

Tested on x86_64-linux.  Committed as r232469

* method-gl.cc (gl_wt_dispatch::trycommit): Ensure proxy privatization
safety.
* method-ml.cc (ml_wt_dispatch::trycommit): Likewise.
* libitm/testsuite/libitm.c/priv-1.c: New.

commit fded025d03371aa320eb7e0cb6bd3d772e387fdc
Author: Torvald Riegel <trie...@redhat.com>
Date:   Fri Nov 27 22:59:07 2015 +0100

libitm: Ensure proxy privatization safety.

	* method-gl.cc (gl_wt_dispatch::trycommit): Ensure proxy privatization
	safety.
	* method-ml.cc (ml_wt_dispatch::trycommit): Likewise.
	* libitm/testsuite/libitm.c/priv-1.c: New.

diff --git a/libitm/method-gl.cc b/libitm/method-gl.cc
index b2e2bca..b51c802 100644
--- a/libitm/method-gl.cc
+++ b/libitm/method-gl.cc
@@ -291,12 +291,18 @@ public:
 // See begin_or_restart() for why we need release memory order here.
 	v = gl_mg::clear_locked(v) + 1;
 	o_gl_mg.orec.store(v, memory_order_release);
-
-	// Need to ensure privatization safety. Every other transaction must
-	// have a snapshot time that is at least as high as our commit time
-	// (i.e., our commit must be visible to them).
-	priv_time = v;
   }
+
+// Need to ensure privatization safety. Every other transaction must have
+// a snapshot time that is at least as high as our commit time (i.e., our
+// commit must be visible to them).  Because of proxy privatization, we
+// must ensure that even if we are a read-only transaction.  See
+// ml_wt_dispatch::trycommit() for details: We can't get quite the same
+// set of problems because we just use one orec and thus, for example,
+// there cannot be concurrent writers -- but we can still get pending
+// loads to privatized data when not ensuring privatization safety, which
+// is problematic if the program unmaps the privatized memory.
+priv_time = v;
 return true;
   }
 
diff --git a/libitm/method-ml.cc b/libitm/method-ml.cc
index 723643a..c1a6771 100644
--- a/libitm/method-ml.cc
+++ b/libitm/method-ml.cc
@@ -513,6 +513,21 @@ public:
 if (!tx->writelog.size())
   {
 tx->readlog.clear();
+// We still need to ensure privatization safety, unfortunately.  While
+// we cannot have privatized anything by ourselves (because we are not
+// an update transaction), we can have observed the commits of
+// another update transaction that privatized something.  Because any
+// commit happens before ensuring privatization, our snapshot and
+// commit can thus have happened before ensuring privatization safety
+// for this commit/snapshot time.  Therefore, before we can return to
+// nontransactional code that might use the privatized data, we must
+// ensure privatization safety for our snapshot time.
+// This still seems to be better than not allowing use of the
+// snapshot time before privatization safety has been ensured because
+// we at least can run transactions such as this one, and in the
+// meantime the transaction producing this commit time might have
+// finished ensuring privatization safety for it.
+priv_time = tx->shared_state.load(memory_order_relaxed);
 return true;
   }
 
diff --git a/libitm/testsuite/libitm.c/priv-1.c b/libitm/testsuite/libitm.c/priv-1.c
new file mode 100644
index 000..635d523
--- /dev/null
+++ b/libitm/testsuite/libitm.c/priv-1.c
@@ -0,0 +1,116 @@
+/* Quick stress test for proxy privatization.  */
+
+/* We need to use a TM method that has to enforce privatization safety
+   explicitly.  */
+/* { dg-set-target-env-var ITM_DEFAULT_METHOD "ml_wt" } */
+
+#include 
+#include 
+#include 
+
+/* Make them likely to be mapped to different orecs.  */
+#define ALIGN __attribute__((aligned (256)))
+/* Don't make these static to work around PR 68591.  */
+int x ALIGN;
+int *ptr ALIGN;
+int *priv_ptr ALIGN;
+int priv_value ALIGN;
+int barrier ALIGN = 0;
+const int iters = 100;
+
+static void arrive_and_wait (int expected_value)
+{
+  int now = __atomic_add_fetch (&barrier, 1, __ATOMIC_ACQ_REL);
+  while (now < expected_value)
+    __atomic_load (&barrier, &now, __ATOMIC_ACQUIRE);
+}
+
+static void __attribute__((transaction_pure,noinline)) delay (int i)
+{
+  for (volatile int v = 0; v < i; v++);
+}
+
+/* This tries to catch a case in which proxy privatization safety is not
+   ensured by privatization_user.  Specifically, it's access to the value
+   of it's transactional snapshot of ptr must read from an uncommitted write
+   by writer; thus, writer must still be active but must have read ptr before
+   proxy can privatize *ptr by assig

Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-16 Thread Torvald Riegel
On Sat, 2016-01-16 at 10:57 +0100, Dominique d'Humières wrote:
> > Addressed these, fixed a problem with using GLIBCXX_WEAK_DEFINITION
> > (which is only set on Darwin despite the generic-sounding name -- so
> > just use __attribute__((weak)) directly), and also updated
> > testsuite_abi.cc so that it knows about CXXABI_1.3.10.
> >
> > Approved by Jonathan Wakely.  Committed as r232454.
> This breaks bootstrap on darwin, see 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69310.

Can you (or anyone else with access to Darwin) let me know what Darwin
actually supports that would be similar in effect to the alias we use
right now?  I suppose a weak alias would work as well because we define
the C1 constructor anyway?  If so, what's the proper macros / configury
check to use for these?

Is there any documentation for what darwin supports and needs in this
space?



[PATCH] libstdc++: Fix static_assert.

2016-01-16 Thread Torvald Riegel
This adds a missing string argument to a call to static_assert, thus not
making it depend on c++1z extensions.  This fixes the build breakage on
mingw introduced in 232454.

Tested on x86_64-linux.  OK?
commit 7659ab483954a15c8143f6b1b9d135159a2ecc67
Author: Torvald Riegel <trie...@redhat.com>
Date:   Sat Jan 16 23:40:26 2016 +0100

libstdc++: Fix static_assert.

	* src/c++11/cow-stdexcept.cc (txnal_read_ptr): Fix static_assert.

diff --git a/libstdc++-v3/src/c++11/cow-stdexcept.cc b/libstdc++-v3/src/c++11/cow-stdexcept.cc
index afc3f6c..375ab8b 100644
--- a/libstdc++-v3/src/c++11/cow-stdexcept.cc
+++ b/libstdc++-v3/src/c++11/cow-stdexcept.cc
@@ -278,8 +278,8 @@ _txnal_cow_string_C1_for_exceptions(void* that, const char* s, void *exc)
 static void* txnal_read_ptr(void* const * ptr)
 {
   static_assert(sizeof(uint64_t) == sizeof(void*)
-		|| sizeof(uint32_t) == sizeof(void*));
-  // FIXME make a true compile-time choice to prevent warnings.
+		|| sizeof(uint32_t) == sizeof(void*),
+		"Pointers are neither 32 nor 64 bit wide");
 #if __UINTPTR_MAX__ == __UINT64_MAX__
   return (void*)_ITM_RU8((const uint64_t*)ptr);
 #else


Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-16 Thread Torvald Riegel
On Sat, 2016-01-16 at 14:35 +0100, Jakub Jelinek wrote:
> On Sat, Jan 16, 2016 at 07:47:33AM -0500, David Edelsohn wrote:
> > stage1 libstdc++ builds just fine.  the problem is stage2 configure
> > fails due to missing ITM_xxx symbols when configure tries to compile
> > and run conftest programs.
> 
> On x86_64-linux, the _ITM_xxx symbols are undef weak ones and thus it is
> fine to load libstdc++ without libitm and libstdc++ doesn't depend on
> libitm.
> 
> So, is AIX defining __GXX_WEAK__ or not?  Perhaps some other macro or
> configure check needs to be used to determine if undefined weak symbols
> work the way libstdc++ needs them to.

David, if you can tell me what AIX supports and whether it defines
__GXX_WEAK__ with the semantics we assume here, I can see what a fix
would be.  As Jakub says, the point of all what's below is to actually
make it work when there's no TM support.

Also, knowing the actual error that AIX fails with would be helpful.  I
have no access to AIX, so can't check.  Thanks.

> #if __GXX_WEAK__
> // Declare all libitm symbols we rely on, but make them weak so that we do
> // not depend on libitm.
> extern void* _ZGTtnaX (size_t sz) __attribute__((weak));
> extern void _ZGTtdlPv (void* ptr) __attribute__((weak));
> extern uint8_t _ITM_RU1(const uint8_t *p)
>   ITM_REGPARM __attribute__((weak));
> extern uint32_t _ITM_RU4(const uint32_t *p)
>   ITM_REGPARM __attribute__((weak));
> extern uint64_t _ITM_RU8(const uint64_t *p)
>   ITM_REGPARM __attribute__((weak));
> extern void _ITM_memcpyRtWn(void *, const void *, size_t)
>   ITM_REGPARM __attribute__((weak));
> extern void _ITM_memcpyRnWt(void *, const void *, size_t)
>   ITM_REGPARM __attribute__((weak));
> extern void _ITM_addUserCommitAction(void (*)(void *), uint64_t, void *)
>   ITM_REGPARM __attribute__((weak));
> 
> #else
> // If there is no support for weak symbols, create dummies.  The exceptions
> // will not be declared transaction_safe in this case.
> void* _ZGTtnaX (size_t) { return NULL; }
> void _ZGTtdlPv (void*) { }
> uint8_t _ITM_RU1(const uint8_t *) { return 0; }
> uint32_t _ITM_RU4(const uint32_t *) { return 0; }
> uint64_t _ITM_RU8(const uint64_t *) { return 0; }
> void _ITM_memcpyRtWn(void *, const void *, size_t) { }
> void _ITM_memcpyRnWt(void *, const void *, size_t) { }
> void _ITM_addUserCommitAction(void (*)(void *), uint64_t, void *) { };
> #endif
> 
>   Jakub





Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-15 Thread Torvald Riegel
On Thu, 2016-01-14 at 17:58 +, Jonathan Wakely wrote:
> On 07/01/16 17:47 +0100, Torvald Riegel wrote:
> >The attached patch makes some exceptions transaction-safe, as require by
> >the Transactional Memory TS.  I believe I addressed all feedback for the
> >previous version of this patch (in particular, there are now more safety
> >checks for preconditions for this implementation (eg, that the new
> >allocator is used), all exceptions declared by the TM TS are now
> >supported (with the exception of tx_exception -- should I add that in a
> >follow-up patch?)
> 
> Yes, that can be a separate patch, as it's adding a new type rather
> than modifying the existing ones to add this TM magic.
> 
> >Thus, the patch adds txnal clones as required.  They are new exported
> >symbols, but not visible to nontransactional code.  The only changes to
> >headers are transaction_safe[_dynamic] annotations where required by the
> >TS, and a few friend declarations.  The annotations are only enabled if
> >a user compiles with -fgnu-tm.  IOW, the changes are pretty much
> >invisible when not using the TM TS.
> 
> Thanks for adding all the clear comments as well. I'm sure we'll all
> be grateful for those when we come to look back at the code.
> 
> >There are also commented-out calls to _ITM_setAssociatedException in the
> >code, which exist to show how we plan to support transaction
> >cancellation through exceptions (which needs some more libitm support
> >and bugfixes on the compiler side).
> >
> >Tested on x86_64-linux and x86-linux.
> >
> >OK?
> 
> Yes, with some minor formatting changes noted below.

Addressed these, fixed a problem with using GLIBCXX_WEAK_DEFINITION
(which is only set on Darwin despite the generic-sounding name -- so
just use __attribute__((weak)) directly), and also updated
testsuite_abi.cc so that it knows about CXXABI_1.3.10.

Approved by Jonathan Wakely.  Committed as r232454.


commit df44fb92fd161282cc6540053cd82177b7c02d51
Author: Torvald Riegel <trie...@redhat.com>
Date:   Fri Nov 13 01:00:52 2015 +0100

libstdc++: Make certain exceptions transaction_safe.

diff --git a/libitm/testsuite/libitm.c++/libstdc++-safeexc.C b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C
new file mode 100644
index 000..3e1655e
--- /dev/null
+++ b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C
@@ -0,0 +1,89 @@
+// Tests that the exceptions declared by the TM TS (N4514) as transaction_safe
+// are indeed that.  Thus, this also tests the transactional clones in
+// libstdc++ and libsupc++.
+
+// { dg-do run }
+
+#include 
+#include 
+#include 
+#include 
+
+using namespace std;
+
+template<typename T> void thrower(const T& t)
+{
+  try
+{
+  atomic_commit
+  {
+	throw t;
+  }
+}
+  catch (T ex)
+{
+  if (ex != t) abort ();
+}
+}
+
+template<typename T> void thrower1(const string& what)
+{
+  try
+{
+  atomic_commit
+  {
+	throw T ();
+  }
+}
+  catch (T ex)
+{
+  if (what != ex.what()) abort ();
+}
+}
+
+template<typename T> void thrower2(const string& what)
+{
+  try
+{
+  atomic_commit
+  {
+	throw T (what);
+  }
+}
+  catch (T ex)
+{
+  if (what != ex.what()) abort ();
+}
+}
+
+
+int main ()
+{
+  thrower (23);
+  thrower (23);
+  thrower (23);
+  thrower (23);
+  thrower (23);
+  thrower (23);
+  thrower (42);
+  thrower (42);
+  thrower (42);
+  thrower (42);
+  thrower (23.42);
+  thrower (23.42);
+  thrower (23.42);
+  thrower<void*> (0);
+  thrower<void**> (0);
+  thrower1 ("std::exception");
+  thrower1 ("std::bad_exception");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  return 0;
+}
diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index b76e8d5..1e25660 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -2594,6 +2594,8 @@ AC_DEFUN([GLIBCXX_ENABLE_ALLOCATOR], [
   ;;
   esac
 
+  GLIBCXX_CONDITIONAL(ENABLE_ALLOCATOR_NEW,
+		  test $enable_libstdcxx_allocator_flag = new)
   AC_SUBST(ALLOCATOR_H)
   AC_SUBST(ALLOCATOR_NAME)
 ])
@@ -4344,6 +4346,34 @@ dnl
   AC_LANG_RESTORE
 ])
 
+dnl
+dnl Check how size_t is mangled.  Copied from libitm.
+dnl
+AC_DEFUN([GLIBCXX_CHECK_SIZE_T_MANGLING], [
+  AC_CACHE_CHECK([how size_t is mangled],
+ glibcxx_cv_size_t_mangling, [
+AC_TRY_COMPILE([], [extern __SIZE_TYPE__ x; extern unsigned long x;],
+   [glibcxx_cv_size_t_mangling=m], [
+  AC_TRY_COMPILE([], [extern __SIZE_TYPE__ x; extern unsigned int x;],
+ [glib

[PATCH][committed] libitm: Fix privatization safety interaction with serial mode.

2016-01-13 Thread Torvald Riegel
This patch fixes the interaction of privatization safety and serial
mode.  Specifically, previously serial mode was able to happen
concurrently with other threads still being in the quiescence-based
implementation of ensuring privatization safety; this is a bug because
serial mode threads can modify the list of transactions (which
privatization safety iterates over) and can change the current method
group (priv safety depends on shared_state to have the context of the
method group that the privatizing thread assumes to be still current).

The fix delays making a transaction inactive until after it has ensured
privatization safety.  It becomes quasi-inactive before that (ie, it
gets an always-most-recent snapshot) though.  However, as a result of
that, we now need to prevent deadlocks between transactions upgrading to
serial mode and privatization safety: The upgrader must keep its
snapshot most recent (or abort if it cannot) so that the still-active
transactions that the upgrader waits for are not waiting in turn for the
upgrader to update its snapshot.

Tested on x86_64-linux.  Committed as r232322.
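
In condensed form, the resulting control flow in gtm_thread::trycommit() looks
roughly like this (a summary sketch of the patch below, not literal code;
wait_for_snapshots_at_least() is a made-up placeholder for the loop over
gtm_thread::list_of_threads):

if (priv_time != 0)
  {
    // Quasi-inactive: stay on the thread list, but publish a snapshot time
    // that is always most recent, so no other transaction ever waits for us.
    shared_state.store (~(gtm_word)0 - 1, memory_order_release);
    wait_for_snapshots_at_least (priv_time);   // ensure privatization safety
    serial_lock.read_unlock (this);            // only now truly inactive
  }
else
  serial_lock.read_unlock (this);
// User commit actions and free() calls that may touch privatized memory run
// only after this point.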

2016-01-13  Torvald Riegel  <trie...@redhat.com>

* beginend.cc (gtm_thread::trycommit): Fix privatization safety.
* config/linux/rwlock.cc (gtm_rwlock::write_lock_generic): Likewise.
* config/posix/rwlock.cc (gtm_rwlock::write_lock_generic): Likewise.
* dispatch.h (abi_dispatch::snapshot_most_recent): New.
* method-gl.cc (gl_wt_dispatch::snapshot_most_recent): New.
* method-ml.cc (ml_wt_dispatch::snapshot_most_recent): New.
* method-serial.cc (serial_dispatch::snapshot_most_recent): New.
(serialirr_dispatch::snapshot_most_recent): New.
(serialirr_onwrite_dispatch::snapshot_most_recent): New.
commit 25e850205e37009a76fb37b52f9ce61ab1616995
Author: Torvald Riegel <trie...@redhat.com>
Date:   Wed Jan 13 13:35:09 2016 +0100

libitm: Fix privatization safety interaction with serial mode.

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index c801dab..85fb4f1 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -568,8 +568,9 @@ GTM::gtm_thread::trycommit ()
   gtm_word priv_time = 0;
   if (abi_disp()->trycommit (priv_time))
 {
-  // The transaction is now inactive. Everything that we still have to do
-  // will not synchronize with other transactions anymore.
+  // The transaction is now finished but we will still access some shared
+  // data if we have to ensure privatization safety.
+  bool do_read_unlock = false;
   if (state & gtm_thread::STATE_SERIAL)
 {
   gtm_thread::serial_lock.write_unlock ();
@@ -578,7 +579,27 @@ GTM::gtm_thread::trycommit ()
   priv_time = 0;
 }
   else
-	gtm_thread::serial_lock.read_unlock (this);
+	{
+	  // If we have to ensure privatization safety, we must not yet
+	  // release the read lock and become inactive because (1) we still
+	  // have to go through the list of all transactions, which can be
+	  // modified by serial mode threads, and (2) we interpret each
+	  // transactions' shared_state in the context of what we believe to
+	  // be the current method group (and serial mode transactions can
+	  // change the method group).  Therefore, if we have to ensure
+	  // privatization safety, delay becoming inactive but set a maximum
+	  // snapshot time (we have committed and thus have an empty snapshot,
+	  // so it will always be most recent).  Use release MO so that this
+	  // synchronizes with other threads observing our snapshot time.
+	  if (priv_time)
+	{
+	  do_read_unlock = true;
+	  shared_state.store((~(typeof gtm_thread::shared_state)0) - 1,
+		  memory_order_release);
+	}
+	  else
+	gtm_thread::serial_lock.read_unlock (this);
+	}
   state = 0;
 
   // We can commit the undo log after dispatch-specific commit and after
@@ -618,8 +639,11 @@ GTM::gtm_thread::trycommit ()
 	}
 	}
 
-  // After ensuring privatization safety, we execute potentially
-  // privatizing actions (e.g., calling free()). User actions are first.
+  // After ensuring privatization safety, we are now truly inactive and
+  // thus can release the read lock.  We will also execute potentially
+  // privatizing actions (e.g., calling free()).  User actions are first.
+  if (do_read_unlock)
+	gtm_thread::serial_lock.read_unlock (this);
   commit_user_actions ();
   commit_allocations (false, 0);
 
diff --git a/libitm/config/linux/rwlock.cc b/libitm/config/linux/rwlock.cc
index 46a775f..381a553 100644
--- a/libitm/config/linux/rwlock.cc
+++ b/libitm/config/linux/rwlock.cc
@@ -158,6 +158,19 @@ gtm_rwlock::write_lock_generic (gtm_thread *tx)
   while (it->shared_state.load (memory_order_relaxed)
   != ~(typeof it->shared_state)0)
 	{
+	  // If this is an upgrade, we have to break deadlocks with
+	  // privatization safety.  This may fail on our side, in 

[PATCH][committed] libitm: Fix seq-cst MOs/fences in rwlock.

2016-01-13 Thread Torvald Riegel
This fixes a few cases of memory_order_seq_cst atomic accesses that
should have been memory_order_seq_cst fences instead.  I think it's
likely that it has worked so far nonetheless because the code we
currently end up with is likely similar -- but this could change with
more aggressive optimizations of concurrent code.
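
As a generic illustration of the difference (not libitm's code): in the C++11
model, a seq_cst store or RMW on one variable does not by itself order a
subsequent relaxed load of another variable, which is what Dekker-style
synchronization needs; an explicit seq_cst fence between the two operations
does provide that ordering.

#include <atomic>

std::atomic<int> flag_a(0), flag_b(0);

// One side of a Dekker-style protocol: publish our flag, fence, then check
// the other side's flag.  With the fences in place, at least one of two
// concurrent parties is guaranteed to observe the other's flag.
bool a_enters ()
{
  flag_a.store (1, std::memory_order_relaxed);
  std::atomic_thread_fence (std::memory_order_seq_cst);
  return flag_b.load (std::memory_order_relaxed) == 0;
}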

Tested on x86_64-linux.  Committed as r232353.

2016-01-13  Torvald Riegel  <trie...@redhat.com>

* beginend.cc (gtm_thread::trycommit): Fix seq_cst fences.
* config/linux/rwlock.cc (gtm_rwlock::write_lock_generic): Likewise.
(gtm_rwlock::write_unlock): Likewise.
commit beef84ff7f719a1c6f498edb273be92185a38c26
Author: Torvald Riegel <trie...@redhat.com>
Date:   Wed Jan 13 21:36:48 2016 +0100

libitm: Fix seq-cst MOs/fences in rwlock.

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index 85fb4f1..00d28f4 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -619,8 +619,10 @@ GTM::gtm_thread::trycommit ()
   // acquisitions).  This ensures that if we read prior to other
   // reader transactions setting their shared_state to 0, then those
   // readers will observe our updates.  We can reuse the seq_cst fence
-  // in serial_lock.read_unlock() however, so we don't need another
-  // one here.
+  // in serial_lock.read_unlock() if we performed that; if not, we
+	  // issue the fence.
+	  if (do_read_unlock)
+	atomic_thread_fence (memory_order_seq_cst);
 	  // TODO Don't just spin but also block using cond vars / futexes
 	  // here. Should probably be integrated with the serial lock code.
 	  for (gtm_thread *it = gtm_thread::list_of_threads; it != 0;
diff --git a/libitm/config/linux/rwlock.cc b/libitm/config/linux/rwlock.cc
index 381a553..b8d31ea 100644
--- a/libitm/config/linux/rwlock.cc
+++ b/libitm/config/linux/rwlock.cc
@@ -122,9 +122,10 @@ gtm_rwlock::read_lock (gtm_thread *tx)
 bool
 gtm_rwlock::write_lock_generic (gtm_thread *tx)
 {
-  // Try to acquire the write lock.
+  // Try to acquire the write lock.  Relaxed MO is fine because of the
+  // additional fence below.
   int w = 0;
-  if (unlikely (!writers.compare_exchange_strong (w, 1)))
+  if (unlikely (!writers.compare_exchange_strong (w, 1, memory_order_relaxed)))
 {
   // If this is an upgrade, we must not wait for other writers or
   // upgrades.
@@ -135,18 +136,20 @@ gtm_rwlock::write_lock_generic (gtm_thread *tx)
   // switch to contended mode.  We need seq_cst memory order to make the
   // Dekker-style synchronization work.
   if (w != 2)
-	w = writers.exchange (2);
+	w = writers.exchange (2, memory_order_relaxed);
   while (w != 0)
 	{
	  futex_wait(&writers, 2);
-	  w = writers.exchange (2);
+	  w = writers.exchange (2, memory_order_relaxed);
 	}
 }
+  // This fence is both required for the Dekker-like synchronization we do
+  // here and is the acquire MO required to make us synchronize-with prior
+  // writers.
+  atomic_thread_fence (memory_order_seq_cst);
 
   // We have acquired the writer side of the R/W lock. Now wait for any
   // readers that might still be active.
-  // We don't need an extra barrier here because the CAS and the xchg
-  // operations have full barrier semantics already.
   // TODO In the worst case, this requires one wait/wake pair for each
   // active reader. Reduce this!
   for (gtm_thread *it = gtm_thread::list_of_threads; it != 0;
@@ -259,28 +262,24 @@ gtm_rwlock::read_unlock (gtm_thread *tx)
 void
 gtm_rwlock::write_unlock ()
 {
-  // This needs to have seq_cst memory order.
-  if (writers.fetch_sub (1) == 2)
+  // Release MO so that we synchronize with subsequent writers.
+  if (writers.exchange (0, memory_order_release) == 2)
 {
-  // There might be waiting writers, so wake them.
-  writers.store (0, memory_order_relaxed);
-  if (futex_wake(&writers, 1) == 0)
-	{
-	  // If we did not wake any waiting writers, we might indeed be the
-	  // last writer (this can happen because write_lock_generic()
-	  // exchanges 0 or 1 to 2 and thus might go to contended mode even if
-	  // no other thread holds the write lock currently). Therefore, we
-	  // have to wake up readers here as well.  Execute a barrier after
-	  // the previous relaxed reset of writers (Dekker-style), and fall
-	  // through to the normal reader wake-up code.
-	  atomic_thread_fence (memory_order_seq_cst);
-	}
-  else
+  // There might be waiting writers, so wake them.  If we woke any thread,
+  // we assume it to indeed be a writer; waiting writers will never give
+  // up, so we can assume that they will take care of anything else such
+  // as waking readers.
+  if (futex_wake(&writers, 1) > 0)
 	return;
+  // If we did not wake any waiting writers, we might indeed be the last
+  // writer (this can happen because write_lock_generic() exchanges 0 or 1
+  // to 2 and thus might go to contended mode even if no other thread
+  // holds the write

[PATCH][committed] libitm: Remove dead code and data.

2016-01-12 Thread Torvald Riegel
This removes code and data members that have not been used for quite a
while now.  The user-visible benefit is 8MB less space overhead if
libitm is used.

Tested on x86_64-linux and committed as r232275.


2016-01-12  Torvald Riegel  <trie...@redhat.com>

* libitm_i.h (gtm_mask_stack): Remove.
* beginend.cc (gtm_stmlock_array, gtm_clock): Likewise.
* stmlock.h: Remove file.
* config/alpha/cacheline.h: Likewise.
* config/generic/cacheline.h: Likewise.
* config/powerpc/cacheline.h: Likewise.
* config/sparc/cacheline.h: Likewise.
* config/x86/cacheline.h: Likewise.

commit fe0abed5782347922d4f9dba13b9a917fe9d5296
Author: Torvald Riegel <trie...@redhat.com>
Date:   Mon Jan 11 19:30:14 2016 +0100

libitm: Remove dead code and data.

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index 367edc8..c801dab 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -36,9 +36,6 @@ gtm_rwlock GTM::gtm_thread::serial_lock;
 gtm_thread *GTM::gtm_thread::list_of_threads = 0;
 unsigned GTM::gtm_thread::number_of_threads = 0;
 
-gtm_stmlock GTM::gtm_stmlock_array[LOCK_ARRAY_SIZE];
-atomic<gtm_word> GTM::gtm_clock;
-
 /* ??? Move elsewhere when we figure out library initialization.  */
 uint64_t GTM::gtm_spin_count_var = 1000;
 
diff --git a/libitm/config/alpha/cacheline.h b/libitm/config/alpha/cacheline.h
deleted file mode 100644
index c8da46d..000
--- a/libitm/config/alpha/cacheline.h
+++ /dev/null
@@ -1,38 +0,0 @@
-/* Copyright (C) 2009-2016 Free Software Foundation, Inc.
-   Contributed by Richard Henderson <r...@redhat.com>.
-
-   This file is part of the GNU Transactional Memory Library (libitm).
-
-   Libitm is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3 of the License, or
-   (at your option) any later version.
-
-   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-   more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#ifndef LIBITM_ALPHA_CACHELINE_H
-#define LIBITM_ALPHA_CACHELINE_H 1
-
-// A cacheline is the smallest unit with which locks are associated.
-// The current implementation of the _ITM_[RW] barriers assumes that
-// all data types can fit (aligned) within a cachline, which means
-// in practice sizeof(complex long double) is the smallest cacheline size.
-// It ought to be small enough for efficient manipulation of the
-// modification mask, below.
-#define CACHELINE_SIZE 64
-
-#include "config/generic/cacheline.h"
-
-#endif // LIBITM_ALPHA_CACHELINE_H
diff --git a/libitm/config/generic/cacheline.h b/libitm/config/generic/cacheline.h
deleted file mode 100644
index 8b9f927..000
--- a/libitm/config/generic/cacheline.h
+++ /dev/null
@@ -1,58 +0,0 @@
-/* Copyright (C) 2009-2016 Free Software Foundation, Inc.
-   Contributed by Richard Henderson <r...@redhat.com>.
-
-   This file is part of the GNU Transactional Memory Library (libitm).
-
-   Libitm is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3 of the License, or
-   (at your option) any later version.
-
-   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-   more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#ifndef LIBITM_CACHELINE_H
-#define LIBITM_CACHELINE_H 1
-
-namespace GTM HIDDEN {
-
-// A cacheline is the smallest unit with which locks are associated.
-// The current implementation of the _ITM_[RW] barriers assumes that
-// all data types can fit (aligned) within a cachline, which means
-// in practice sizeof(complex long double) is the smallest cacheline size.
-// It ought to be small enough for 

[PATCH] libitm: Fix HTM fastpath.

2016-01-08 Thread Torvald Riegel
This patch fixes a thinko in the HTM fastpath implementation.  In a
nutshell, we also need to monitor the HTM fastpath control (ie,
htm_fastpath) variable from within a HW transaction on the HTM fastpath,
so that such a HW transaction will only execute if the HTM fastpath is
still enabled.

We move htm_fastpath into the serial lock so that a HW transaction only
needs one cacheline of HTM capacity both to monitor htm_fastpath and to
check that no non-HW transaction is currently running.
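
Roughly, the fastpath begin sequence now amounts to the following (a condensed
sketch based on the patch; get_htm_fastpath() is the new accessor, and
is_write_locked() stands in for the corresponding rwlock query, so treat the
exact names as assumptions):

for (uint32_t t = serial_lock.get_htm_fastpath (); t > 0; t--)
  {
    uint32_t ret = htm_begin ();
    if (htm_begin_success (ret))
      {
        // We are speculating.  Both fields sit on one cacheline, so
        // monitoring them costs a single line of HTM capacity.
        if (serial_lock.get_htm_fastpath () == 0
            || serial_lock.is_write_locked ())
          htm_abort ();   // method group changed or a serial txn is active
        else
          break;          // run the uninstrumented code path
      }
    // retry / fallback handling elided
  }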

Tested on x86_64-linux.

2016-01-08  Torvald Riegel  <trie...@redhat.com>

* beginend.cc (GTM::gtm_thread::serial_lock): Put on cacheline
boundary.
(htm_fastpath): Remove.
(gtm_thread::begin_transaction): Fix HTM fastpath.
(_ITM_commitTransaction): Adapt.
(_ITM_commitTransactionEH): Adapt.
* libitm/config/linux/rwlock.h (gtm_rwlock): Add htm_fastpath member
and accessors.
* libitm/config/posix/rwlock.h (gtm_rwlock): Likewise.
* libitm/config/posix/rwlock.cc (gtm_rwlock::gtm_rwlock): Adapt.
* libitm/config/x86/sjlj.S (_ITM_beginTransaction): Fix HTM fastpath.
* libitm/libitm_i.h (htm_fastpath): Remove declaration.
* libitm/method-serial.cc (htm_mg): Adapt.
(gtm_thread::serialirr_mode): Adapt.
* libitm/query.cc (_ITM_inTransaction, _ITM_getTransactionId): Adapt.
commit bfa6b5c51e110ca38c0ce5aa7234e7b719eee5a6
Author: Torvald Riegel <trie...@redhat.com>
Date:   Wed Jan 6 15:34:57 2016 +0100

libitm: Fix HTM fastpath.

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index 367edc8..015704b 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -32,7 +32,11 @@ using namespace GTM;
 extern __thread gtm_thread_tls _gtm_thr_tls;
 #endif
 
-gtm_rwlock GTM::gtm_thread::serial_lock;
+// Put this at the start of a cacheline so that serial_lock's writers and
+// htm_fastpath fields are on the same cacheline, so that HW transactions
+// only have to pay one cacheline capacity to monitor both.
+gtm_rwlock GTM::gtm_thread::serial_lock
+  __attribute__((aligned(HW_CACHELINE_SIZE)));
 gtm_thread *GTM::gtm_thread::list_of_threads = 0;
 unsigned GTM::gtm_thread::number_of_threads = 0;
 
@@ -54,9 +58,6 @@ static pthread_mutex_t global_tid_lock = PTHREAD_MUTEX_INITIALIZER;
 static pthread_key_t thr_release_key;
 static pthread_once_t thr_release_once = PTHREAD_ONCE_INIT;
 
-// See gtm_thread::begin_transaction.
-uint32_t GTM::htm_fastpath = 0;
-
 /* Allocate a transaction structure.  */
 void *
 GTM::gtm_thread::operator new (size_t s)
@@ -176,9 +177,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
   // lock's writer flag and thus abort if another thread is or becomes a
   // serial transaction.  Therefore, if the fastpath is enabled, then a
   // transaction is not executing as a HW transaction iff the serial lock is
-  // write-locked.  This allows us to use htm_fastpath and the serial lock's
-  // writer flag to reliable determine whether the current thread runs a HW
-  // transaction, and thus we do not need to maintain this information in
+  // write-locked.  Also, HW transactions monitor the fastpath control
+  // variable, so that they will only execute if dispatch_htm is still the
+  // current method group.  This allows us to use htm_fastpath and the serial
+  // lock's writers flag to reliable determine whether the current thread runs
+  // a HW transaction, and thus we do not need to maintain this information in
   // per-thread state.
   // If an uninstrumented code path is not available, we can still run
   // instrumented code from a HW transaction because the HTM fastpath kicks
@@ -190,9 +193,14 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
   // for any internal changes (e.g., they never abort visibly to the STM code
   // and thus do not trigger the standard retry handling).
 #ifndef HTM_CUSTOM_FASTPATH
-  if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
+  if (likely(serial_lock.get_htm_fastpath() && (prop & pr_hasNoAbort)))
 {
-  for (uint32_t t = htm_fastpath; t; t--)
+  // Note that the snapshot of htm_fastpath that we take here could be
+  // outdated, and a different method group than dispatch_htm may have
+  // been chosen in the meantime.  Therefore, take care not not touch
+  // anything besides the serial lock, which is independent of method
+  // groups.
+  for (uint32_t t = serial_lock.get_htm_fastpath(); t; t--)
 	{
 	  uint32_t ret = htm_begin();
 	  if (htm_begin_success(ret))
@@ -200,9 +208,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
 	  // We are executing a transaction now.
 	  // Monitor the writer flag in the serial-mode lock, and abort
 	  // if there is an active or waiting serial-mode transaction.
+	  // Also checks that htm_fastpath is still nonzero and thus
+	  // HW transactions

[PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-07 Thread Torvald Riegel
The attached patch makes some exceptions transaction-safe, as required by
the Transactional Memory TS.  I believe I addressed all feedback for the
previous version of this patch (in particular, there are now more safety
checks for preconditions for this implementation (eg, that the new
allocator is used), all exceptions declared by the TM TS are now
supported (with the exception of tx_exception -- should I add that in a
follow-up patch?), and there is a test for the new functionality (as
part of libitm's testsuite)).

There are two things that complicate such support.  First, it seems
better to not rely on compiling libstdc++ code with -fgnu-tm for now (or at least we
tried to avoid this so far).  Therefore, the transactional clones in
this patch are all manually instrumented (ie, the functions are C
functions with names matching the mangled names of the respective C++
functions, and the _ZGTt prefix signaling that they are txnal clones).

Second, exceptions still use a COW string internally, which cannot be
made transaction-safe with just compiler support because of the
reference counting implementation inside of COW strings, which uses
atomics.  One would need something custom for that nonetheless.

Thus, the patch adds txnal clones as required.  They are new exported
symbols, but not visible to nontransactional code.  The only changes to
headers is transaction_safe[_dynamic] annotations where required by the
TS, and a few friend declarations.  The annotations are only enabled if
a user compiles with -fgnu-tm.  IOW, the changes are pretty much
invisible when not using the TM TS.
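
For illustration, this is the shape of such a clone (a sketch using
std::exception::what(), whose clone really is this trivial; the logic_error
and runtime_error clones additionally have to read the COW string through the
_ITM_RU* barriers):

#include <exception>

// The clone is an extern "C" definition whose name is the mangled name of the
// original member function with the _ZGTt prefix prepended.
extern "C" const char *
_ZGTtNKSt9exception4whatEv (const std::exception *that)
{
  // No transactional accesses are needed here, so just forward to the
  // nontransactional implementation.
  return that->std::exception::what ();
}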

There are also commented-out calls to _ITM_setAssociatedException in the
code, which exist to show how we plan to support transaction
cancellation through exceptions (which needs some more libitm support
and bugfixes on the compiler side).

Tested on x86_64-linux and x86-linux.

OK?

2016-01-07  Torvald Riegel  <trie...@redhat.com>

* include/bits/basic_string.h (basic_string): Declare friends.
* include/bits/c++config (_GLIBCXX_TXN_SAFE,
_GLIBCXX_TXN_SAFE_DYN, _GLIBCXX_USE_ALLOCATOR_NEW): New.
* include/std/stdexcept (logic_error, domain_error, invalid_argument,
length_error, out_of_range, runtime_error, range_error,
underflow_error, overflow_error): Declare members as transaction-safe.
(logic_error, runtime_error): Declare friend functions.
* libsupc++/exception (exception, bad_exception): Declare members as
transaction-safe.
* src/c++11/cow-stdexcept.cc: Define transactional clones for the
transaction-safe members of exceptions and helper functions.
* libsupc++/eh_exception.cc: Adjust and define transactional clones.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.22) Add transactional clones.
(CXXABI_1.3.10): New.
* acinclude.m4 (GLIBCXX_CHECK_SIZE_T_MANGLING): New.
(GLIBCXX_ENABLE_ALLOCATOR): Set ENABLE_ALLOCATOR_NEW.
* configure.ac: Call GLIBCXX_CHECK_SIZE_T_MANGLING.
* include/Makefile.am: Write ENABLE_ALLOCATOR_NEW to c++config.h.
* include/Makefile.in: Regenerate.
* config.h.in: Regenerate.
* configure: Regenerate.

commit d4eda24eca5bf5d8d94f1d56762a9efd50294750
Author: Torvald Riegel <trie...@redhat.com>
Date:   Fri Nov 13 01:00:52 2015 +0100

libstdc++: Make certain exceptions transaction_safe.

diff --git a/libitm/testsuite/libitm.c++/libstdc++-safeexc.C b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C
new file mode 100644
index 000..3e1655e
--- /dev/null
+++ b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C
@@ -0,0 +1,89 @@
+// Tests that the exceptions declared by the TM TS (N4514) as transaction_safe
+// are indeed that.  Thus, this also tests the transactional clones in
+// libstdc++ and libsupc++.
+
+// { dg-do run }
+
+#include 
+#include 
+#include 
+#include 
+
+using namespace std;
+
+template<typename T> void thrower(const T& t)
+{
+  try
+{
+  atomic_commit
+  {
+	throw t;
+  }
+}
+  catch (T ex)
+{
+  if (ex != t) abort ();
+}
+}
+
+template<typename T> void thrower1(const string& what)
+{
+  try
+{
+  atomic_commit
+  {
+	throw T ();
+  }
+}
+  catch (T ex)
+{
+  if (what != ex.what()) abort ();
+}
+}
+
+template<typename T> void thrower2(const string& what)
+{
+  try
+{
+  atomic_commit
+  {
+	throw T (what);
+  }
+}
+  catch (T ex)
+{
+  if (what != ex.what()) abort ();
+}
+}
+
+
+int main ()
+{
+  thrower (23);
+  thrower (23);
+  thrower (23);
+  thrower (23);
+  thrower (23);
+  thrower (23);
+  thrower (42);
+  thrower (42);
+  thrower (42);
+  thrower (42);
+  thrower (23.42);
+  thrower (23.42);
+  thrower (23.42);
+  thrower<void*> (0);
+  thrower<void**> (0);
+  thrower1 ("std::exception");
+  thrower1 ("std::bad_exception");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test")

Re: [PATCH][WIP] libstdc++: Make certain exceptions transaction_safe.

2015-12-23 Thread Torvald Riegel
On Wed, 2015-12-16 at 21:05 +, Jonathan Wakely wrote:
> Sorry for the delay finishing this review, some of the code kept
> melting my brain ;-)

I know what you mean :)  Thanks for the review!

> On 14/11/15 20:45 +0100, Torvald Riegel wrote:
> >diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
> >b/libstdc++-v3/config/abi/pre/gnu.ver
> >index 1b3184a..d902b03 100644
> >--- a/libstdc++-v3/config/abi/pre/gnu.ver
> >+++ b/libstdc++-v3/config/abi/pre/gnu.ver
> >@@ -1876,6 +1876,12 @@ GLIBCXX_3.4.22 {
> > _ZNSt6thread6_StateD[012]Ev;
> > 
> > _ZNSt6thread15_M_start_threadESt10unique_ptrINS_6_StateESt14default_deleteIS1_EEPFvvE;
> > 
> >+# Support for the Transactional Memory TS (N4514)
> >+_ZGTtNSt11logic_errorC1EPKc;
> >+
> >_ZGTtNSt11logic_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;
> >+_ZGTtNKSt11logic_error4whatEv;
> >+_ZGTtNSt11logic_errorD1Ev;
> >+
> > } GLIBCXX_3.4.21;
> 
> This is OK but ...
> 
> > # Symbols in the support library (libsupc++) have their own tag.
> >@@ -2107,6 +2113,12 @@ CXXABI_1.3.9 {
> > # operator delete[](void*, std::size_t)
> > _ZdaPv[jmy];
> > 
> >+# Support for the Transactional Memory TS (N4514)
> >+_ZGTtNKSt9exceptionD1Ev;
> >+_ZGTtNKSt9exception4whatEv;
> >+_ZGTtNKSt13bad_exceptionD1Ev;
> >+_ZGTtNKSt13bad_exception4whatEv;
> >+
> > } CXXABI_1.3.8;
> 
> That symbol version was already used in the gcc-5 release and so is
> frozen, you'll need CXXABI_1.3.10 for these new symbols (similar to
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00258.html so if
> Catherine's already added that version you can just add them there).

Fixed.

> >diff --git a/libstdc++-v3/include/bits/c++config 
> >b/libstdc++-v3/include/bits/c++config
> >index 723feb1..0e66bb0 100644
> >--- a/libstdc++-v3/include/bits/c++config
> >+++ b/libstdc++-v3/include/bits/c++config
> >@@ -481,6 +481,17 @@ namespace std
> > # define _GLIBCXX_BEGIN_EXTERN_C extern "C" {
> > # define _GLIBCXX_END_EXTERN_C }
> > 
> >+// Conditionally enable annotations for the Transactional Memory TS on 
> >C++11.
> >+#if __cplusplus >= 201103L && \
> >+  _GLIBCXX_USE_CXX11_ABI && _GLIBCXX_USE_DUAL_ABI && \
> 
> It's possible we can make this work for the old ABI too, but this is
> OK for now. The old ABI always uses COW strings, but that's what the
> code you've written deals with anyway.

OK.  I would have to write the wrappers for the new strings too then,
and I wanted to avoid that for now.

> >+  defined(__cpp_transactional_memory) && __cpp_transactional_memory >= 
> >201505L
> 
> The defined(__cpp_transactional_memory) check is redundant, isn't it?
> 
> Users aren't allowed to define it, so it will either be defined to an
> integer value or undefined and evaluate to zero.

Won't we get an undefined warning when using -Wundef if we remove the
first part?  That's what I wanted to avoid.
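
For reference, a minimal illustration of the difference -Wundef makes (not
taken from the patch):

#if __cpp_transactional_memory >= 201505L
/* warns under -Wundef when the macro is not defined */
#endif
#if defined(__cpp_transactional_memory) && __cpp_transactional_memory >= 201505L
/* no warning: the right-hand operand is not evaluated when defined() is false */
#endif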

> >+#define _GLIBCXX_TXN_SAFE transaction_safe
> >+#define _GLIBCXX_TXN_SAFE_DYN transaction_safe_dynamic
> >+#else
> >+#define _GLIBCXX_TXN_SAFE
> >+#define _GLIBCXX_TXN_SAFE_DYN
> >+#endif
> >+
> > #else // !__cplusplus
> > # define _GLIBCXX_BEGIN_EXTERN_C
> > # define _GLIBCXX_END_EXTERN_C
> 
> 
> >@@ -44,7 +46,36 @@ std::exception::what() const _GLIBCXX_USE_NOEXCEPT
> > }
> > 
> > const char* 
> >-std::bad_exception::what() const _GLIBCXX_USE_NOEXCEPT
> >+std::bad_exception::what() const _GLIBCXX_TXN_SAFE_DYN _GLIBCXX_USE_NOEXCEPT
> > {
> >   return "std::bad_exception";
> > }
> >+
> >+// Transactional clones for the destructors and what().
> >+// what() is effectively transaction_pure, but we do not want to annotate it
> >+// as such; thus, we call exactly the respective nontransactional function.
> >+extern "C" {
> >+
> >+void
> >+_ZGTtNKSt9exceptionD1Ev(const std::exception*)
> >+{ }
> >+
> >+const char*
> >+_ZGTtNKSt9exception4whatEv(const std::exception* that)
> >+{
> >+  return that->std::exception::what();
> >+}
> 
> This makes a non-virtual call, is that correct?
> 
> If users derive from std::exception and override what() they will
> expect a call to what() to dispatch to their override in the derived
> class, but IIUC in a transactional block they would call this
> function, which would call the base what(), not their override.

Yes.  I added this comment to _ZGTtNKSt9exception4whatEv and also
referenced it from a comment in _ZGTt

Re: [patch] libstdc++/68921 add timeout argument to futex(2)

2015-12-15 Thread Torvald Riegel
On Tue, 2015-12-15 at 18:46 +, Jonathan Wakely wrote:
> This fixes a missing argument to the futex syscall.
> 
> Tested powerpc64le-linux. This needs to be fixed for gcc-5 and trunk.
> 

OK.  Thanks!



[COMMITTED] libitm: Use multiplicative hashing in the multi-lock TM method.

2015-11-26 Thread Torvald Riegel
This changes the hash function used in the multi-lock TM method for
mapping the address of a transactional access by an application to
synchronization metadata.  The previous hash function was very simple
and just used a range of bits from the address; the new function uses
multiplicative hashing (see the code comments for more background).  The
previous function was fast, but suffered from cases of pathological
performance degradation because of power-of-two aliasing (eg, allocator
arenas); to get okay performance, one had to use a substantial amount of
synchronization metadata.  The new hash function uses two 32b
multiplications (and we might get rid of one of them by specializing the
code more), but this doesn't seem to affect single-thread overheads much
at least on mainstream x86.  It makes the TM method scale better by
suffering from fewer false conflicts, while still using 8x less memory
for the synchronization metadata (512KB currently on 64b, 256KB on 32b).
The space overhead can be further reduced if we think that's important
(e.g., 32KB/16KB resulted in 10% less performance in prior tests).
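
As a worked illustration of the new mapping (parameters copied from the patch
below; the aliasing example itself is mine):

#include <stdint.h>
#include <stdio.h>

/* As in the patch: 32-byte stripes, 2^16 orecs, Mult32 multiplier 81007.  */
#define L2O_SHIFT       5
#define L2O_ORECS_BITS  16
#define L2O_MULT32      81007u

static uint32_t orec_of (uintptr_t addr)
{
  uint32_t a = (uint32_t) (addr >> L2O_SHIFT);
  return (a * L2O_MULT32) >> (32 - L2O_ORECS_BITS);
}

int main (void)
{
  /* Two addresses 8 MiB apart alias under the old shift-and-mask scheme
     (2^19 orecs times 16-byte stripes = 8 MiB) but get distinct orecs here.  */
  printf ("%u %u\n", orec_of (0x1000), orec_of (0x1000 + (8u << 20)));
  return 0;
}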

Tested on x86_64-linux with the libitm tests, and a quick custom test.
In-depth performance testing in another implementation similar to libitm
has been done before, see the URL cited in the code's comments.
commit 7c6d5c7221b85fd82bcb8c59c90ae39b14883b98
Author: Torvald Riegel <trie...@redhat.com>
Date:   Thu Nov 26 16:52:04 2015 +0100

libitm: Use multiplicative hashing in the multi-lock TM method.

	* method-ml.cc (ml_mg): Use multiplicative instead of simple hashing.
	(ml_wt_dispatch::pre_write): Adapt.
	(ml_wt_dispatch::pre_load): Likewise.

diff --git a/libitm/method-ml.cc b/libitm/method-ml.cc
index 37cb08b..b4cabc8 100644
--- a/libitm/method-ml.cc
+++ b/libitm/method-ml.cc
@@ -69,23 +69,51 @@ struct ml_mg : public method_group
  atomic<gtm_word>* orecs __attribute__((aligned(HW_CACHELINE_SIZE)));
  char tailpadding[HW_CACHELINE_SIZE - sizeof(atomic<gtm_word>*)];
 
-  // Location-to-orec mapping.  Stripes of 16B mapped to 2^19 orecs.
-  static const gtm_word L2O_ORECS = 1 << 19;
-  static const gtm_word L2O_SHIFT = 4;
-  static size_t get_orec(const void* addr)
+  // Location-to-orec mapping.  Stripes of 32B mapped to 2^16 orecs using
+  // multiplicative hashing.  See Section 5.2.2 of Torvald Riegel's PhD thesis
+  // for the background on this choice of hash function and parameters:
+  // http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-115596
+  // We pick the Mult32 hash because it works well with fewer orecs (i.e.,
+  // less space overhead and just 32b multiplication).
+  // We may want to check and potentially change these settings once we get
+  // better or just more benchmarks.
+  static const gtm_word L2O_ORECS_BITS = 16;
+  static const gtm_word L2O_ORECS = 1 << L2O_ORECS_BITS;
+  // An iterator over the orecs covering the region [addr,addr+len).
+  struct orec_iterator
   {
-return ((uintptr_t)addr >> L2O_SHIFT) & (L2O_ORECS - 1);
-  }
-  static size_t get_next_orec(size_t orec)
-  {
-return (orec + 1) & (L2O_ORECS - 1);
-  }
-  // Returns the next orec after the region.
-  static size_t get_orec_end(const void* addr, size_t len)
-  {
-return (((uintptr_t)addr + len + (1 << L2O_SHIFT) - 1) >> L2O_SHIFT)
-& (L2O_ORECS - 1);
-  }
+static const gtm_word L2O_SHIFT = 5;
+static const uint32_t L2O_MULT32 = 81007;
+uint32_t mult;
+size_t orec;
+size_t orec_end;
+orec_iterator (const void* addr, size_t len)
+{
+  uint32_t a = (uintptr_t) addr >> L2O_SHIFT;
+  uint32_t ae = ((uintptr_t) addr + len + (1 << L2O_SHIFT) - 1)
+	  >> L2O_SHIFT;
+  mult = a * L2O_MULT32;
+  orec = mult >> (32 - L2O_ORECS_BITS);
+  // We can't really avoid this second multiplication unless we use a
+  // branch instead or know more about the alignment of addr.  (We often
+  // know len at compile time because of instantiations of functions
+  // such as _ITM_RU* for accesses of specific lengths.
+  orec_end = (ae * L2O_MULT32) >> (32 - L2O_ORECS_BITS);
+}
+size_t get() { return orec; }
+void advance()
+{
+  // We cannot simply increment orec because L2O_MULT32 is larger than
+  // 1 << (32 - L2O_ORECS_BITS), and thus an increase of the stripe (i.e.,
+  // addr >> L2O_SHIFT) could increase the resulting orec index by more
+  // than one; with the current parameters, we would roughly acquire a
+  // fourth more orecs than necessary for regions covering more than orec.
+  // Keeping mult around as extra state shouldn't matter much.
+  mult += L2O_MULT32;
+  orec = mult >> (32 - L2O_ORECS_BITS);
+}
+bool reached_end() { return orec == orec_end; }
+  };
 
   virtual void init()
   {
@@ -142,14 +170,13 @@ protected:
 gtm_word locked_by_tx = ml_mg::set_locked(tx);
 
 // Lock all ore

[COMMITTED] libitm: Fix recent changes to allocations log.

2015-11-22 Thread Torvald Riegel
This fixes an oversight in the recent changes to the allocations log for
supporting sized delete.
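
My reading of the oversight (an assumption on my part, not stated in the log):
with the anonymous union, free_fn and free_fn_sz shared storage, so logging a
plain deallocation also made free_fn_sz appear nonzero, and the "iff nonzero"
rule in the comment could not actually distinguish the two cases:

#include <stddef.h>

/* Sketch of the problematic layout that the patch removes.  */
union old_layout {
  void (*free_fn)(void *);
  void (*free_fn_sz)(void *, size_t);   /* aliases free_fn, so testing
                                           free_fn_sz != 0 is meaningless */
};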

Tested libitm on x86_64-linux.  Committed as obvious.

libitm/
* libitm_i.h (gtm_alloc_action): Remove union.
* testsuite/libitm.c/alloc-1.c: New.

commit 74c5fd924fe3e8d6bececce209d00bf0523bb4dc
Author: Torvald Riegel <trie...@redhat.com>
Date:   Sun Nov 22 21:54:24 2015 +0100

libitm: Fix recent changes to allocations log.

	libitm/
	* libitm_i.h (gtm_alloc_action): Remove union.
	* testsuite/libitm.c/alloc-1.c: New.

diff --git a/libitm/libitm_i.h b/libitm/libitm_i.h
index f01a1ab..4b72348 100644
--- a/libitm/libitm_i.h
+++ b/libitm/libitm_i.h
@@ -106,12 +106,10 @@ namespace GTM HIDDEN {
 // the template used inside gtm_thread can instantiate.
 struct gtm_alloc_action
 {
-  // Iff free_fn_sz is nonzero, it must be used instead of free_fn.
-  union
-  {
-void (*free_fn)(void *);
-void (*free_fn_sz)(void *, size_t);
-  };
+  // Iff free_fn_sz is nonzero, it must be used instead of free_fn, and vice
+  // versa.
+  void (*free_fn)(void *);
+  void (*free_fn_sz)(void *, size_t);
   size_t sz;
   // If true, this is an allocation; we discard the log entry on outermost
   // commit, and deallocate on abort.  If false, this is a deallocation and
diff --git a/libitm/testsuite/libitm.c/alloc-1.c b/libitm/testsuite/libitm.c/alloc-1.c
new file mode 100644
index 000..49faab5
--- /dev/null
+++ b/libitm/testsuite/libitm.c/alloc-1.c
@@ -0,0 +1,17 @@
+// Test that rolling back allocations works.
+#include <stdlib.h>
+
+void __attribute((transaction_pure,noinline)) dont_optimize(void* p)
+{
+  *(volatile char *) p;
+}
+
+int main()
+{
+  __transaction_atomic {
+void *p = malloc (23);
+dont_optimize (p);
+__transaction_cancel;
+  }
+  return 0;
+}


Re: [PATCH] Transactional Memory: Support __cxa_free_exception and fix exception handling.

2015-11-19 Thread Torvald Riegel
On Thu, 2015-11-19 at 11:18 -0600, Peter Bergner wrote:
> On Thu, 2015-11-19 at 09:35 -0600, Torvald Riegel wrote:
> > Tested using the libitm testsuite on x86_64-linux.
> 
> Tested on powerpc64le-linux with no regressions and I confirmed
> the new eh-5.C test case passes.

Thanks.  Then I'll commit this if I hear no objections from rth or
Jason.



[PATCH] Transactional Memory: Support __cxa_free_exception and fix exception handling.

2015-11-18 Thread Torvald Riegel
The EH scheme that we had been using for TM / libitm doesn't work
properly.  We fail to handle throwing exceptions whose constructors may
throw themselves.  We also do not clean up properly in all situations
when a transaction aborts while in the process of throwing an
exception.
This patch solves this particular problem by adding a transactional
wrapper for __cxa_free_exception and changing the EH scheme in libitm.

Follow-up patches will fix other issues that we have identified.  Some
of the changes to the libitm.texi ABI docs added in this patch already
take this future work into account.
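
A small self-contained illustration of the failing scenario described above
(not the new eh-5.C test): the copy constructor of the thrown object throws
after __cxa_allocate_exception has already succeeded, so the ABI has to release
the exception storage with __cxa_free_exception; when this happens inside a
transaction, the compiler now emits a call to the _ITM_cxa_free_exception
wrapper instead.

struct throws_on_copy
{
  throws_on_copy () {}
  throws_on_copy (const throws_on_copy &) { throw 42; }
};

int main ()
{
  throws_on_copy obj;
  try
    {
      throw obj;   // copying obj into the exception object throws an int;
                   // the already-allocated exception storage must be freed
    }
  catch (int)
    {
    }
  return 0;
}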

Tested using the libitm testsuite on x86_64-linux.

Are the gcc/ bits OK?

gcc/cp/
* except.c (do_free_exception): Use transactional wrapper.

libitm/
* testsuite/libitm.c++/eh-5.C: New.
* libitm.h (_ITM_cxa_free_exception): New.
* libitm.map (_ITM_cxa_free_exception): Add it.
* libitm.texi: Update ABI docs.
* libitm_i.h (gtm_transaction_cp::cxa_unthrown): Remove.
(gtm_transaction_cp::cxa_uncaught_count): Add.
(gtm_thread::cxa_unthrown): Remove.
(gtm_thread::cxa_uncaught_count_ptr): Add.
(gtm_thread::cxa_uncaught_count): Add.
(gtm_thread::drop_references_allocations): Rename to...
(gtm_thread::discard_allocation): ... this and adapt.
(gtm_thread::init_cpp_exceptions): New.
* beginend.cc (gtm_thread::gtm_thread): Adapt EH handling.
(gtm_thread::begin_transaction): Likewise.
(gtm_transaction_cp::save): Likewise.
(gtm_thread::trycommit): Likewise.
* eh_cpp.cc: Add overview comments.
(__cxa_eh_globals, __cxa_get_globals, __cxa_free_exception): Declare.
(free_any_exception, _ITM_cxa_free_exception): New.
(gtm_thread::init_cpp_exceptions): Define.
(_ITM_cxa_allocate_exception, _ITM_cxa_throw): Adapt.
(_ITM_cxa_begin_catch, _ITM_cxa_end_catch): Likewise.
(gtm_thread::revert_cpp_exceptions): Likewise.

commit 23bd34e3c8028a12705a47d13a4c7aa36bfeca60
Author: Torvald Riegel <trie...@redhat.com>
Date:   Tue Nov 3 15:38:22 2015 +0100

Support __cxa_free_exception and fix exception handling.

	gcc/cp/
	* except.c (do_free_exception): Use transactional wrapper.

	libitm/
	* testsuite/libitm.c++/eh-5.C: New.
	* libitm.h (_ITM_cxa_free_exception): New.
	* libitm.map (_ITM_cxa_free_exception): Add it.
	* libitm.texi: Update ABI docs.
	* libitm_i.h (gtm_transaction_cp::cxa_unthrown): Remove.
	(gtm_transaction_cp::cxa_uncaught_count): Add.
	(gtm_thread::cxa_unthrown): Remove.
	(gtm_thread::cxa_uncaught_count_ptr): Add.
	(gtm_thread::cxa_uncaught_count): Add.
	(gtm_thread::drop_references_allocations): Rename to...
	(gtm_thread::discard_allocation): ... this and adapt.
	(gtm_thread::init_cpp_exceptions): New.
	* beginend.cc (gtm_thread::gtm_thread): Adapt EH handling.
	(gtm_thread::begin_transaction): Likewise.
	(gtm_transaction_cp::save): Likewise.
	(gtm_thread::trycommit): Likewise.
	* eh_cpp.cc: Add overview comments.
	(__cxa_eh_globals, __cxa_get_globals, __cxa_free_exception): Declare.
	(free_any_exception, _ITM_cxa_free_exception): New.
	(gtm_thread::init_cpp_exceptions): Define.
	(_ITM_cxa_allocate_exception, _ITM_cxa_throw): Adapt.
	(_ITM_cxa_begin_catch, _ITM_cxa_end_catch): Likewise.
	(gtm_thread::revert_cpp_exceptions): Likewise.

diff --git a/gcc/cp/except.c b/gcc/cp/except.c
index 9b2450d..ad40436 100644
--- a/gcc/cp/except.c
+++ b/gcc/cp/except.c
@@ -662,6 +662,16 @@ do_free_exception (tree ptr)
   /* Declare void __cxa_free_exception (void *) throw().  */
   fn = declare_library_fn (fn, void_type_node, ptr_type_node,
 			   ECF_NOTHROW | ECF_LEAF);
+
+  if (flag_tm)
+	{
+	  tree fn2 = get_identifier ("_ITM_cxa_free_exception");
+	  if (!get_global_value_if_present (fn2, &fn2))
+	fn2 = declare_library_fn (fn2, void_type_node,
+  ptr_type_node,
+  ECF_NOTHROW | ECF_LEAF | ECF_TM_PURE);
+	  record_tm_replacement (fn, fn2);
+	}
 }
 
   return cp_build_function_call_nary (fn, tf_warning_or_error, ptr, NULL_TREE);
diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index c3ed11b..86f7b39 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -132,6 +132,8 @@ GTM::gtm_thread::gtm_thread ()
   number_of_threads_changed(number_of_threads - 1, number_of_threads);
   serial_lock.write_unlock ();
 
+  init_cpp_exceptions ();
+
   if (pthread_once(&thread_release_once, thread_exit_init))
 GTM_fatal("Initializing thread release TLS key failed.");
   // Any non-null value is sufficient to trigger destruction of this
@@ -383,6 +385,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
 #endif
 }
 
+  // Log the number of uncaught exceptions if we might have to roll back this
+  // state.
+  if (tx->cxa_uncaught_count_ptr != 0)
+    tx->cxa_uncaught_count = *tx->cxa_uncaught_count_ptr;

[PATCH][WIP] libstdc++: Make certain exceptions transaction_safe.

2015-11-14 Thread Torvald Riegel
The attached patch makes some exceptions transaction-safe, as required by
the Transactional Memory TS.  It has some rough edges, but I'm hoping I
can sort them out quickly using your feedback.  It only supports
logic_error and exception/bad_exception, but the other exceptions that
the TM TS specifies as transaction-safe basically require the same
support (notably, runtime_error is the same as logic_error, and their
subclasses don't add anything).

There are two things that complicate such support.  First, it seems
better not to rely on compiling libstdc++ code with -fgnu-tm for now (or at least we
tried to avoid this so far).  Therefore, the transactional clones in
this patch are all manually instrumented (ie, the functions are C
functions with names matching the mangled names of the respective C++
functions, and the _ZGTt prefix signaling that they are txnal clones).
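
To illustrate the naming scheme with a made-up free function (a sketch only,
not one of the clones in this patch): int f (int) mangles to _Z1fi, and its
transactional clone is a plain definition whose assembler name replaces the
_Z prefix with _ZGTt:

extern "C" int
_ZGTt1fi (int i)
{
  /* Manually instrumented body; accesses to shared memory would go through
     the _ITM_* read/write barriers instead of plain loads and stores.  */
  return i;
}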

Second, exceptions still use a COW string internally, which cannot be
made transaction-safe with just compiler support because of the
reference counting implementation inside of COW strings, which uses
atomics.  One would need something custom for that nonetheless.
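
Roughly, the problematic kind of operation looks like this (illustration
only; the names are hypothetical, not libstdc++'s actual members):

struct _Rep_sketch { int _M_refcount; };

void
add_reference (_Rep_sketch *rep)
{
  /* An atomic RMW on shared state: the TM instrumentation cannot log and
     undo this on abort the way it can for ordinary loads and stores, which
     is why the txnal clones treat the internal string specially.  */
  __atomic_fetch_add (&rep->_M_refcount, 1, __ATOMIC_ACQ_REL);
}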

Thus, the patch adds txnal clones as required.  They are new exported
symbols, but not visible to nontransactional code.  The only changes to
headers are transaction_safe[_dynamic] annotations where required by the
TS, and a few friend declarations.  The annotations are only enabled if
a user compiles with -fgnu-tm.  IOW, the changes are pretty much
invisible when not using the TM TS.

The patch doesn't include tests yet, but tests like this one seem to run
successfully on x86_64:

template<typename T> void thrower4(const string& what)
{
  try
{
  // Creating a temporary inside of the txn ICEs.
  T t(what);
  atomic_commit
  {
_ITM_hackOrTxnProp (pr_atomicCancel);
dontoptimize (t.what());
throw T (what);
  }
}
  catch (T ex)
{
  if (what != ex.what()) abort ();
}
}
thrower4<logic_error> ("test");

I can't yet test the destructors because of issues on the compiler side.
There are also commented-out calls to _ITM_setAssociatedException in the
code, which exist to show how we plan to support transaction
cancellation through exceptions (which needs some more libitm support
and bugfixes on the compiler side).
commit e81080a01ab0daf2949a400c1a2d5077d37ba515
Author: Torvald Riegel <trie...@redhat.com>
Date:   Fri Nov 13 01:00:52 2015 +0100

libstdc++: Make certain exceptions transaction_safe.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 1b3184a..d902b03 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1876,6 +1876,12 @@ GLIBCXX_3.4.22 {
 _ZNSt6thread6_StateD[012]Ev;
 _ZNSt6thread15_M_start_threadESt10unique_ptrINS_6_StateESt14default_deleteIS1_EEPFvvE;
 
+# Support for the Transactional Memory TS (N4514)
+_ZGTtNSt11logic_errorC1EPKc;
+_ZGTtNSt11logic_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;
+_ZGTtNKSt11logic_error4whatEv;
+_ZGTtNSt11logic_errorD1Ev;
+
 } GLIBCXX_3.4.21;
 
 # Symbols in the support library (libsupc++) have their own tag.
@@ -2107,6 +2113,12 @@ CXXABI_1.3.9 {
 # operator delete[](void*, std::size_t)
 _ZdaPv[jmy];
 
+# Support for the Transactional Memory TS (N4514)
+_ZGTtNKSt9exceptionD1Ev;
+_ZGTtNKSt9exception4whatEv;
+_ZGTtNKSt13bad_exceptionD1Ev;
+_ZGTtNKSt13bad_exception4whatEv;
+
 } CXXABI_1.3.8;
 
 # Symbols in the support library (libsupc++) supporting transactional memory.
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index b3853cd..47d9f2f 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -4904,6 +4904,18 @@ _GLIBCXX_END_NAMESPACE_CXX11
   int
   compare(size_type __pos, size_type __n1, const _CharT* __s,
 	  size_type __n2) const;
+
+# ifdef _GLIBCXX_TM_TS_INTERNAL
+  friend void
+  ::_txnal_cow_string_C1_for_exceptions(void* that, const char* s,
+	void* exc);
+  friend const char*
+  ::_txnal_cow_string_c_str(const void *that);
+  friend void
+  ::_txnal_cow_string_D1(void *that);
+  friend void
+  ::_txnal_cow_string_D1_commit(void *that);
+# endif
   };
 #endif  // !_GLIBCXX_USE_CXX11_ABI
 
diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 723feb1..0e66bb0 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -481,6 +481,17 @@ namespace std
 # define _GLIBCXX_BEGIN_EXTERN_C extern "C" {
 # define _GLIBCXX_END_EXTERN_C }
 
+// Conditionally enable annotations for the Transactional Memory TS on C++11.
+#if __cplusplus >= 201103L && \
+  _GLIBCXX_USE_CXX11_ABI && _GLIBCXX_USE_DUAL_ABI && \
+  defined(__cpp_transactional_memory) && __cpp_transactional_memor

Re: State of support for the ISO C++ Transactional Memory TS and remanining work

2015-11-12 Thread Torvald Riegel
On Wed, 2015-11-11 at 15:04 +, Szabolcs Nagy wrote:
> On 10/11/15 18:29, Torvald Riegel wrote:
> > On Tue, 2015-11-10 at 17:26 +, Szabolcs Nagy wrote:
> >> On 09/11/15 00:19, Torvald Riegel wrote:
> >>> I've not yet created tests for the full list of functions specified as
> >>> transaction-safe in the TS, but my understanding is that this list was
> >>> created after someone from the ISO C++ TM study group looked at libstdc
> >>> ++'s implementation and investigated which functions might be feasible
> >>> to be declared transaction-safe in it.
> >>>
> >>
> >> is that list available somewhere?
> >
> > See the TM TS, N4514.
> >
> 
> i was looking at an older version,
> things make more sense now.
> 
> i think system() should not be transaction safe..
> 
> i wonder what's the plan for getting libc functions
> instrumented (i assume that is needed unless hw
> support is used).

No specific plans so far.  We'll wait and see, I guess.  TM is still in
a chicken-and-egg situation.

> >> xmalloc
> >> the runtime exits on memory allocation failure,
> >> so it is not possible to use it safely.
> >> (i think it should be possible to roll back the
> >> transaction in case of internal allocation failure
> >> and retry with a strategy that does not need dynamic
> >> allocation).
> >
> > Not sure what you mean by "safely".  Hardening against out-of-memory
> > situations hasn't been considered to be of high priority so far, but I'd
> > accept patches for that that don't increase complexity significantly and
> > don't hamper performance.
> >
> 
> i consider this a library safety issue.
> 
> (a library or runtime is not safe to use if it may terminate
> the process in case of internal failures.)

If it is truly a purely internal failure, then aborting might be the
best thing one can do if there is no sensible way to try to recover from
the error (ie, take a fail-fast approach).
Out-of-memory errors are not purely internal failures.  I agree that it
would be nice to have a fallback, but for some features there simply is
none (eg, the program can't require rollback to be possible and yet not
provide enough memory for this to be achievable).  Given that these
transactions have to be used from C programs too, there's not much
libitm can do except perhaps call user-supplied handlers.

> >> uint64_t GTM::gtm_spin_count_var = 1000;
> >> i guess this was supposed to be tunable.
> >> it seems libitm needs some knobs (strategy, retries,
> >> spin counts), but there is no easy way to adapt these
> >> for a target/runtime environment.
> >
> > Sure, more performance tuning knobs would be nice.
> >
> 
> my problem was with getting the knobs right at runtime.
> 
> (i think this will need a solution to make tm practically
> useful, there are settings that seem to be sensitive to
> the properties of the underlying hw.. this also seems
> to be a problem for glibc lock elision retry policies.)

Yes, that applies to many tuning settings in lots of places.  And
certainly to TM implementations too :)

> >> sys_futex0
> >> i'm not sure why this has arch specific implementations
> >> for some targets but not others. (syscall is not in the
> >> implementation reserved namespace).
> >
> > Are there archs that support libitm but don't have a definition of this
> > one?
> >
> 
> i thought all targets were supported on linux
> (the global lock based strategies should work)
> i can prepare a sys_futex0 for arm and aarch64.

arm and aarch64 should be supported according to configure.tgt.  Also
see the comment in config/linux/futex_bits.h if you want to change
something there.  I haven't tried arm at all so far.
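
For illustration, a generic fallback would be something along these lines
(sketch only, and it uses the syscall() wrapper that the per-arch versions
avoid precisely because "syscall" is not a reserved name; the real
sys_futex0 signature may differ):

#include <unistd.h>
#include <sys/syscall.h>

static inline long
sys_futex0 (int *addr, long op, long val)
{
  return syscall (SYS_futex, addr, op, val, (void *) 0);
}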



Re: State of support for the ISO C++ Transactional Memory TS and remanining work

2015-11-10 Thread Torvald Riegel
On Tue, 2015-11-10 at 17:26 +, Szabolcs Nagy wrote:
> On 09/11/15 00:19, Torvald Riegel wrote:
> > Hi,
> >
> > I'd like to summarize the current state of support for the TM TS, and
> > outline the current plan for the work that remains to complete the
> > support.
> >
> > I'm aware we're at the end of stage 1, but I'm confident we can still
> > finish this work and hope to include it in GCC 6 because:
> > (1) most of the support is already in GCC, and we have a big head start
> > in the space of TM so it would be unfortunate to waste that by not
> > delivering support for the TM TS,
> > (2) this is a TS and support for it is considered experimental,
> > (3) most of the affected code is in libitm or the compiler's TM passes,
> > which has to be enabled explicitly by the user.
> >
> > Currently, we have complete support for the syntax and all necessary
> > instrumentation except the exception handling bits listed below.  libitm
> > has a good set of STM and HTM-based algorithms.
> >
> >
> > What is missing on the compiler side is essentially a change of how we
> > support atomic_noexcept and atomic_cancel, in particular exception
> > handling.  Instead of just using a finally block as done currently, the
> > compiler needs to build a catch clause so that it can actively intercept
> > exceptions that escape an atomic_noexcept or atomic_cancel.  For
> > atomic_noexcept, the compiler needs to include a call to abort() in the
> > catch clause.
> >
> >
> > For atomic_cancel, it needs to call ITM_commitTransactionEH in the catch
> > clause, and use NULL as exception argument.  This can then be used by
> > libitm to look at the currently being handled exception and (a) check
> > whether the type supports transaction cancellation as specified by the TS
> > and (b) pick out the allocations that belong to this exception and roll
> > back everything else before rethrowing this exception.
> >
> > For (a), it's probably best to place this check into libstdc++
> > (specifically, libsupc++ I suppose) because support for transaction
> > cancellation is a property that library parts of the standard (or the
> > TS) require, and that has to match the implementation in libstdc++.
> > Attached is a patch by Jason that implements this check.  This adds one
> > symbol, which should be okay we hope.
> >
> 
> does that mean libitm will depend on libstdc++?

No, weak references are used to avoid that.  See libitm/eh_cpp.cc for
example.
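
The pattern is roughly this (sketch only, with a made-up helper name):

extern "C" void __cxa_free_exception (void *) __attribute__((weak));

static void
maybe_free_exception (void *exc)
{
  /* The weak reference resolves to null when libstdc++/libsupc++ is not
     linked in, so plain C programs never pull in the C++ runtime.  */
  if (__cxa_free_exception != 0)
    __cxa_free_exception (exc);
}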

> i think the tm runtime should not depend on c++,
> so it is usable from c code.
> 
> > For (b), our plan is to track the additional allocations that happen
> > during construction of the exception types that support
> > cancellation (eg, creating the what() string for logic_error).  There
> > are several ways to do that, one of them being that we create custom
> > transactional clones of those constructors that tell libitm that either
> > such a constructor is currently running or explicitly list the
> > allocations that have been made by the constructor; eventually, we would
> > always (copy) construct into memory returned by cxa_allocate_exception,
> > which then makes available the necessary undo information when such an
> > exception is handled in libitm.
> >
> >
> > The other big piece of missing support is making sure that the functions
> > that are specified in the TS as transaction_safe are indeed that.  I
> > believe we do not need to actually add such annotations to any libstdc++
> > functions that are already transaction-safe and completely defined in
> > headers -- those functions are implicitly transaction-safe, and we can
> > thus let the compiler instrument them at the point of use inside of a
> > transaction.
> >
> > If a supposedly transaction-safe function is not defined in a header,
> > we'd need a transaction_safe annotation at the declaration.  Jason has
> > implemented the TM TS feature test macro, so we can only add the
> > annotation if the user has enabled support for the TM TS in the
> > respective compilation process.
> 
> sounds ugly: the function type (transaction-safety)
> depends on the visibility of the definition..

I don't understand why that would be the case.  The TS specifies whether
a function has to be safe or not.  We strive to implement that to
support the TS.  Nonetheless, the TS also has the notion of a function
being implicitly transaction-safe (ie, if the compiler can check that it
is, it can be used as-if annotated as transaction-safe even if it
actually isn't).
Also note that you can't overload based on transaction-safety.

[PATCH] libitm: Support __cxa_free_exception and fix exception handling.

2015-11-08 Thread Torvald Riegel
See the added overview comments in eh_cpp.cc.
This is still lacking updated docs in the ABI spec, which I can add
later.  We don't need to change __cxa_tm_cleanup in libsupc++, but don't
use the first two arguments of it anymore, which we may want to document
too.

Thoughts?
commit 0a67dc5a13fd17a24fc667a251d000a73cd5159e
Author: Torvald Riegel <trie...@redhat.com>
Date:   Tue Nov 3 15:38:22 2015 +0100

Support __cxa_free_exception and fix exception handling.

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index c3ed11b..86f7b39 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -132,6 +132,8 @@ GTM::gtm_thread::gtm_thread ()
   number_of_threads_changed(number_of_threads - 1, number_of_threads);
   serial_lock.write_unlock ();
 
+  init_cpp_exceptions ();
+
   if (pthread_once(&thread_release_once, thread_exit_init))
 GTM_fatal("Initializing thread release TLS key failed.");
   // Any non-null value is sufficient to trigger destruction of this
@@ -383,6 +385,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
 #endif
 }
 
+  // Log the number of uncaught exceptions if we might have to roll back this
+  // state.
+  if (tx->cxa_uncaught_count_ptr != 0)
+tx->cxa_uncaught_count = *tx->cxa_uncaught_count_ptr;
+
   // Run dispatch-specific restart code. Retry until we succeed.
   GTM::gtm_restart_reason rr;
   while ((rr = disp->begin_or_restart()) != NO_RESTART)
@@ -411,7 +418,7 @@ GTM::gtm_transaction_cp::save(gtm_thread* tx)
   id = tx->id;
   prop = tx->prop;
   cxa_catch_count = tx->cxa_catch_count;
-  cxa_unthrown = tx->cxa_unthrown;
+  cxa_uncaught_count = tx->cxa_uncaught_count;
   disp = abi_disp();
   nesting = tx->nesting;
 }
@@ -583,7 +590,6 @@ GTM::gtm_thread::trycommit ()
   undolog.commit ();
   // Reset further transaction state.
   cxa_catch_count = 0;
-  cxa_unthrown = NULL;
   restart_total = 0;
 
   // Ensure privatization safety, if necessary.
diff --git a/libitm/eh_cpp.cc b/libitm/eh_cpp.cc
index a86dbf1..e28c057 100644
--- a/libitm/eh_cpp.cc
+++ b/libitm/eh_cpp.cc
@@ -26,6 +26,50 @@
 
 using namespace GTM;
 
+/* Exceptions can exist in three phases: (1) after having been allocated by
+   __cxa_allocate_exception but before being handed off to __cxa_throw,
+   (2) when they are in flight, so between __cxa_throw and __cxa_begin_catch,
+   and (3) when they are being handled (between __cxa_begin_catch and
+   __cxa_end_catch).
+
+   We can get aborts in all three phases, for example in (1) during
+   construction of the exception object, or in (2) during destructors called
+   while unwinding the stack.  The transaction that created an exception
+   object can commit in phase (2) but not in phases (1) and (3) because both
+   throw expressions and catch clauses are properly nested wrt transactions.
+
+   We handle phase (1) by dealing with exception objects similar to how we
+   deal with other (de)allocations, which also ensures that we can have more
+   than one exception object allocated at the same time (e.g., if the
+   throw expression itself throws an exception and thus calls
+   __cxa_allocate_exception).  However, on the call to __cxa_begin_catch
+   we hand off the exception to the special handling of phase (3) and
+   remove the undo log entry of the allocation.  Note that if the allocation
+   happened outside of this transaction, we do not need to do anything.
+
+   When an exception reaches phase (2) due to a call to __cxa_throw, the count
+   of uncaught exceptions is incremented.  We roll back this effect by saving
+   and restoring this number in the structure returned from __cxa_get_globals.
+   This also takes care of increments of this count when rethrowing an
+   exception.
+
+   For phase (3), we keep track of the number of times __cxa_begin_catch
+   has been called without a matching call to __cxa_end_catch.  This count
+   is then used by __cxa_tm_cleanup to roll back the exception handling state
+   by calling __cxa_end_catch for the exceptions that have not been finished
+   yet (without running destructors though because we roll back the memory
+   anyway).
+   Once an exception that was allocated in this transaction enters phase (3),
+   it does not need to be deallocated on abort anymore because the calls to
+   __cxa_end_catch will take care of that.
+
+   We require all code executed by the transaction to be transaction_safe (or
+   transaction_pure, or to have wrappers) if the transaction is to be rolled
+   back.  However, we take care to not require this for transactions that
+   just commit; this way, transactions that enter serial mode and then call
+   uninstrumented code continue to work.
+   */
+
 /* Everything from libstdc++ is weak, to avoid requiring that library
to be linked into plain C applications using libitm.so.  */
 
@@ -33,85 +77,138 @@ using namespace GTM;
 
 extern "C" {
 
+struct __cxa_e

[PATCH v2] libitm: Support sized delete.

2015-11-08 Thread Torvald Riegel
This patch supports the sized variants of operator delete.
Some changes compared to v1.
Tested on x86_64-linux.
commit df00a283f2e37bd3c69f37783fa81dde7ccf1f94
Author: Torvald Riegel <trie...@redhat.com>
Date:   Thu Oct 29 18:52:20 2015 +0100

Support sized delete.

This adds transactional clones of the sized version of operator delete.

diff --git a/libitm/alloc.cc b/libitm/alloc.cc
index bb292da..7b8786c 100644
--- a/libitm/alloc.cc
+++ b/libitm/alloc.cc
@@ -29,26 +29,38 @@ namespace GTM HIDDEN {
 void
 gtm_thread::record_allocation (void *ptr, void (*free_fn)(void *))
 {
-  uintptr_t iptr = (uintptr_t) ptr;
-
-  gtm_alloc_action *a = this->alloc_actions.find(iptr);
-  if (a == 0)
-a = this->alloc_actions.insert(iptr);
+  // We do not deallocate before outermost commit, so we should never have
+  // an existing log entry for a new allocation.
+  gtm_alloc_action *a = this->alloc_actions.insert((uintptr_t) ptr);
 
   a->free_fn = free_fn;
+  a->free_fn_sz = 0;
   a->allocated = true;
 }
 
 void
 gtm_thread::forget_allocation (void *ptr, void (*free_fn)(void *))
 {
-  uintptr_t iptr = (uintptr_t) ptr;
-
-  gtm_alloc_action *a = this->alloc_actions.find(iptr);
-  if (a == 0)
-a = this->alloc_actions.insert(iptr);
-
+  // We do not deallocate before outermost commit, so we should never have
+  // an existing log entry for a deallocation at the same address.  We may
+  // have an existing entry for a matching allocation, but this is handled
+  // correctly because both are complementary in that only one of these will
+  // cause an action at commit or abort.
+  gtm_alloc_action *a = this->alloc_actions.insert((uintptr_t) ptr);
   a->free_fn = free_fn;
+  a->free_fn_sz = 0;
+  a->allocated = false;
+}
+
+void
+gtm_thread::forget_allocation (void *ptr, size_t sz,
+			   void (*free_fn_sz)(void *, size_t))
+{
+  // Same as forget_allocation but with a size.
+  gtm_alloc_action *a = this->alloc_actions.insert((uintptr_t) ptr);
+  a->free_fn = 0;
+  a->free_fn_sz = free_fn_sz;
+  a->sz = sz;
   a->allocated = false;
 }
 
@@ -67,31 +79,27 @@ commit_allocations_2 (uintptr_t key, gtm_alloc_action *a, void *data)
 
   if (cb_data->revert_p)
 {
-  // Roll back nested allocations.
+  // Roll back nested allocations, discard deallocations.
   if (a->allocated)
-	a->free_fn (ptr);
+	{
+	  if (a->free_fn_sz != 0)
+	a->free_fn_sz (ptr, a->sz);
+	  else
+	a->free_fn (ptr);
+	}
 }
   else
 {
-  if (a->allocated)
-	{
-	  // Add nested allocations to parent transaction.
-	  gtm_alloc_action* a_parent = cb_data->parent->insert(key);
-	  *a_parent = *a;
-	}
-  else
-	{
-	  // ??? We could eliminate a parent allocation that matches this
-	  // memory release, if we had support for removing all accesses
-	  // to this allocation from the transaction's undo and redo logs
-	  // (otherwise, the parent transaction's undo or redo might write to
-	  // data that is already shared again because of calling free()).
-	  // We don't have this support currently, and the benefit of this
-	  // optimization is unknown, so just add it to the parent.
-	  gtm_alloc_action* a_parent;
-	  a_parent = cb_data->parent->insert(key);
-	  *a_parent = *a;
-	}
+  // Add allocations and deallocations to parent.
+  // ??? We could eliminate a (parent) allocation that matches this
+  // a deallocation, if we had support for removing all accesses
+  // to this allocation from the transaction's undo and redo logs
+  // (otherwise, the parent transaction's undo or redo might write to
+  // data that is already shared again because of calling free()).
+  // We don't have this support currently, and the benefit of this
+  // optimization is unknown, so just add it to the parent.
+  gtm_alloc_action* a_parent = cb_data->parent->insert(key);
+  *a_parent = *a;
 }
 }
 
@@ -99,10 +107,15 @@ static void
 commit_allocations_1 (uintptr_t key, gtm_alloc_action *a, void *cb_data)
 {
   void *ptr = (void *)key;
-  uintptr_t revert_p = (uintptr_t) cb_data;
+  bool revert_p = (bool) (uintptr_t) cb_data;
 
-  if (a->allocated == revert_p)
-a->free_fn (ptr);
+  if (revert_p == a->allocated)
+{
+  if (a->free_fn_sz != 0)
+	a->free_fn_sz (ptr, a->sz);
+  else
+	a->free_fn (ptr);
+}
 }
 
 /* Permanently commit allocated memory during transaction.
diff --git a/libitm/alloc_cpp.cc b/libitm/alloc_cpp.cc
index 8514618..13185a7 100644
--- a/libitm/alloc_cpp.cc
+++ b/libitm/alloc_cpp.cc
@@ -35,41 +35,50 @@ using namespace GTM;
 
 #define _ZnwX			S(_Znw,MANGLE_SIZE_T)
 #define _ZnaX			S(_Zna,MANGLE_SIZE_T)
+#define _ZdlPvX			S(_ZdlPv,MANGLE_SIZE_T)
 #define _ZnwXRKSt9nothrow_t	S(S(_Znw,MANGLE_SIZE_T),RKSt9nothrow_t)
 #define _ZnaXRKSt9nothrow_t	S(S(_Zna,MANGLE_SIZE_T),RKSt9nothrow_t)
+#define _ZdlPvXRKSt9nothrow_t	S(S(_ZdlPv,MANGLE_SIZE_T),RKSt9nothrow_t)

State of support for the ISO C++ Transactional Memory TS and remanining work

2015-11-08 Thread Torvald Riegel
Hi,

I'd like to summarize the current state of support for the TM TS, and
outline the current plan for the work that remains to complete the
support.

I'm aware we're at the end of stage 1, but I'm confident we can still
finish this work and hope to include it in GCC 6 because:
(1) most of the support is already in GCC, and we have a big head start
in the space of TM so it would be unfortunate to waste that by not
delivering support for the TM TS,
(2) this is a TS and support for it is considered experimental,
(3) most of the affected code is in libitm or the compiler's TM passes,
which has to be enabled explicitly by the user.

Currently, we have complete support for the syntax and all necessary
instrumentation except the exception handling bits listed below.  libitm
has a good set of STM and HTM-based algorithms.


What is missing on the compiler side is essentially a change of how we
support atomic_noexcept and atomic_cancel, in particular exception
handling.  Instead of just using a finally block as done currently, the
compiler needs to build a catch clause so that it can actively intercept
exceptions that escape an atomic_noexcept or atomic_cancel.  For
atomic_noexcept, the compiler needs to include a call to abort() in the
catch clause.


For atomic_cancel, it needs to call ITM_commitTransactionEH in the catch
clause, and use NULL as exception argument.  This can then be used by
libitm to look at the currently being handled exception and (a) check
whether the type supports transaction cancellation as specified by the TS
and (b) pick out the allocations that belong to this exception and roll
back everything else before rethrowing this exception.

For (a), it's probably best to place this check into libstdc++
(specifically, libsupc++ I suppose) because support for transaction
cancellation is a property that library parts of the standard (or the
TS) require, and that has to match the implementation in libstdc++.
Attached is a patch by Jason that implements this check.  This adds one
symbol, which should be okay we hope.

For (b), our plan is to track the additional allocations that happen
during construction of the exception types that support
cancellation (eg, creating the what() string for logic_error).  There
are several ways to do that, one of them being that we create custom
transactional clones of those constructors that tell libitm that either
such a constructor is currently running or explicitly list the
allocations that have been made by the constructor; eventually, we would
always (copy) construct into memory returned by cxa_allocate_exception,
which then makes available the necessary undo information when such an
exception is handled in libitm.


The other big piece of missing support is making sure that the functions
that are specified in the TS as transaction_safe are indeed that.  I
believe we do not need to actually add such annotations to any libstdc++
functions that are already transaction-safe and completely defined in
headers -- those functions are implicitly transaction-safe, and we can
thus let the compiler instrument them at the point of use inside of a
transaction.

If a supposedly transaction-safe function is not defined in a header,
we'd need a transaction_safe annotation at the declaration.  Jason has
implemented the TM TS feature test macro, so we can only add the
annotation if the user has enabled support for the TM TS in the
respective compilation process.
We also need to ensure that there is a transactional clone of the function.
This will add symbols to libstdc++, but these all have their own special
prefix in the mangled name.  I'd like to get feedback on how to best
trigger the instrumentation and make it part of a libstdc++ build.
(If that turns out to be too problematic, we could still fall back to
writing transactional clones manually.)
For the clones of the constructors of the types that support
cancellation, I suppose manually written clones might be easier than
automatic instrumentation.
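
For the header annotations mentioned above, what I have in mind is roughly
the following sketch (the macro name and exact placement are placeholders,
not the final libstdc++ spelling):

#ifdef __cpp_transactional_memory
# define _GLIBCXX_TXN_SAFE_SKETCH transaction_safe
#else
# define _GLIBCXX_TXN_SAFE_SKETCH
#endif

struct example_error_sketch
{
  /* Declared transaction_safe only when the user compiles with -fgnu-tm
     and the TM TS feature test macro is thus defined.  */
  const char* what_string () const _GLIBCXX_TXN_SAFE_SKETCH;
};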

I've not yet created tests for the full list of functions specified as
transaction-safe in the TS, but my understanding is that this list was
created after someone from the ISO C++ TM study group looked at libstdc
++'s implementation and investigated which functions might be feasible
to be declared transaction-safe in it.

I'm looking forward to your feedback.

Thanks,

Torvald
// -*- C++ -*- GNU C++ atomic_cancel support.
// Copyright (C) 2015 Free Software Foundation, Inc.
//
// This file is part of GCC.
//
// GCC is free software; you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation; either version 3, or (at your option)
// any later version.
//
// GCC is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.
//
// 

Re: [PATCH] libitm: Support sized delete.

2015-11-03 Thread Torvald Riegel
On Fri, 2015-10-30 at 10:19 -0700, Richard Henderson wrote:
> >  #define _ZnwX  S(_Znw,MANGLE_SIZE_T)
> >  #define _ZnaX  S(_Zna,MANGLE_SIZE_T)
> > +#define _ZdlPvX S(_ZdlPv,MANGLE_SIZE_T)
> >  #define _ZnwXRKSt9nothrow_t S(S(_Znw,MANGLE_SIZE_T),RKSt9nothrow_t)
> >  #define _ZnaXRKSt9nothrow_t S(S(_Zna,MANGLE_SIZE_T),RKSt9nothrow_t)
> > +#define _ZdlPvXRKSt9nothrow_t  
> > S(S(_ZdlPv,MANGLE_SIZE_T),RKSt9nothrow_t)

These are symbols that are provided by libstdc++ if it is available
(otherwise, we use dummy implementations, see the other parts of the
patch)...
  
> >  #define _ZGTtnwX   S(_ZGTtnw,MANGLE_SIZE_T)
> >  #define _ZGTtnaX   S(_ZGTtna,MANGLE_SIZE_T)
> > +#define _ZGTtdlPvX S(_ZGTtdlPv,MANGLE_SIZE_T)
> >  #define _ZGTtnwXRKSt9nothrow_t 
> > S(S(_ZGTtnw,MANGLE_SIZE_T),RKSt9nothrow_t)
> >  #define _ZGTtnaXRKSt9nothrow_t 
> > S(S(_ZGTtna,MANGLE_SIZE_T),RKSt9nothrow_t)
> > +#define _ZGTtdlPvXRKSt9nothrow_t 
> > S(S(_ZGTtdlPv,MANGLE_SIZE_T),RKSt9nothrow_t)
> 
> One more thing... Why are there 4 new symbols here...
> 
> > +/* Wrap: operator delete(void* ptr, std::size_t sz)  */
> > +void
> > +_ZGTtdlPvX (void *ptr, size_t sz)
> > +{
> > +  if (ptr)
> > +gtm_thr()->forget_allocation (ptr, sz, _ZdlPvX);
> > +}
> > +
> > +/* Wrap: operator delete (void *ptr, std::size_t sz, const 
> > std::nothrow_t&)  */
> > +void
> > +_ZGTtdlPvXRKSt9nothrow_t (void *ptr, size_t sz, c_nothrow_p nt UNUSED)
> > +{
> > +  if (ptr)
> > +gtm_thr()->forget_allocation (ptr, sz, delsz_opnt);
> > +}
> > +

... while those are the transactional wrappers that we actually want to
provide and export from libitm.

I'll probably revise the patch in a v2 though depending on how we decide
to handle allocations of exception objects.



Re: [PATCH] libitm: Support sized delete.

2015-10-30 Thread Torvald Riegel
On Thu, 2015-10-29 at 12:38 -0700, Richard Henderson wrote:
> On 10/29/2015 11:19 AM, Torvald Riegel wrote:
> > diff --git a/libitm/libitm.map b/libitm/libitm.map
> > index 21bcfdf..7fc9a41 100644
> > --- a/libitm/libitm.map
> > +++ b/libitm/libitm.map
> > @@ -168,10 +168,12 @@ LIBITM_1.0 {
> > _ZGTtnw?;
> > _ZGTtna?;
> > _ZGTtdlPv;
> > +   _ZGTtdlPv?;
> > _ZGTtdaPv;
> > _ZGTtnw?RKSt9nothrow_t;
> > _ZGTtna?RKSt9nothrow_t;
> > _ZGTtdlPvRKSt9nothrow_t;
> > +   _ZGTtdlPv?RKSt9nothrow_t;
> > _ZGTtdaPvRKSt9nothrow_t;
> >  
> > _ITM_cxa_allocate_exception;
> 
> Everything looks good except for this part.  The new symbols need to go into a
> new symbol version.  C.f. libatomic.map for the syntax.

Ah, thanks.  OK in the updated patch that's attached?

I've also looked at the sized delete paper again, and we should be able
to call the underlying unsized delete even if the compiler issued a
call to the sized delete (ie, to answer my own question).  However, it's
not difficult or too much overhead to call the right version, so I'll
just keep the patch the way it is.
commit 1125e1b96cddf71b907ff382898239f09410d48e
Author: Torvald Riegel <trie...@redhat.com>
Date:   Thu Oct 29 18:52:20 2015 +0100

Support sized delete.

This adds transactional clones of the sized version of operator delete.

diff --git a/libitm/alloc.cc b/libitm/alloc.cc
index bb292da..a72848d 100644
--- a/libitm/alloc.cc
+++ b/libitm/alloc.cc
@@ -37,6 +37,7 @@ gtm_thread::record_allocation (void *ptr, void (*free_fn)(void *))
 
   a->free_fn = free_fn;
   a->allocated = true;
+  a->sized_delete = false;
 }
 
 void
@@ -50,6 +51,23 @@ gtm_thread::forget_allocation (void *ptr, void (*free_fn)(void *))
 
   a->free_fn = free_fn;
   a->allocated = false;
+  a->sized_delete = false;
+}
+
+void
+gtm_thread::forget_allocation (void *ptr, size_t sz,
+			   void (*free_fn_sz)(void *, size_t))
+{
+  uintptr_t iptr = (uintptr_t) ptr;
+
+  gtm_alloc_action *a = this->alloc_actions.find(iptr);
+  if (a == 0)
+a = this->alloc_actions.insert(iptr);
+
+  a->free_fn_sz = free_fn_sz;
+  a->allocated = false;
+  a->sized_delete = true;
+  a->sz = sz;
 }
 
 namespace {
@@ -102,7 +120,12 @@ commit_allocations_1 (uintptr_t key, gtm_alloc_action *a, void *cb_data)
   uintptr_t revert_p = (uintptr_t) cb_data;
 
   if (a->allocated == revert_p)
-a->free_fn (ptr);
+{
+  if (a->sized_delete)
+	a->free_fn_sz (ptr, a->sz);
+  else
+	a->free_fn (ptr);
+}
 }
 
 /* Permanently commit allocated memory during transaction.
diff --git a/libitm/alloc_cpp.cc b/libitm/alloc_cpp.cc
index 8514618..13185a7 100644
--- a/libitm/alloc_cpp.cc
+++ b/libitm/alloc_cpp.cc
@@ -35,41 +35,50 @@ using namespace GTM;
 
 #define _ZnwX			S(_Znw,MANGLE_SIZE_T)
 #define _ZnaX			S(_Zna,MANGLE_SIZE_T)
+#define _ZdlPvX			S(_ZdlPv,MANGLE_SIZE_T)
 #define _ZnwXRKSt9nothrow_t	S(S(_Znw,MANGLE_SIZE_T),RKSt9nothrow_t)
 #define _ZnaXRKSt9nothrow_t	S(S(_Zna,MANGLE_SIZE_T),RKSt9nothrow_t)
+#define _ZdlPvXRKSt9nothrow_t	S(S(_ZdlPv,MANGLE_SIZE_T),RKSt9nothrow_t)
 
 #define _ZGTtnwX		S(_ZGTtnw,MANGLE_SIZE_T)
 #define _ZGTtnaX		S(_ZGTtna,MANGLE_SIZE_T)
+#define _ZGTtdlPvX		S(_ZGTtdlPv,MANGLE_SIZE_T)
 #define _ZGTtnwXRKSt9nothrow_t	S(S(_ZGTtnw,MANGLE_SIZE_T),RKSt9nothrow_t)
 #define _ZGTtnaXRKSt9nothrow_t	S(S(_ZGTtna,MANGLE_SIZE_T),RKSt9nothrow_t)
+#define _ZGTtdlPvXRKSt9nothrow_t S(S(_ZGTtdlPv,MANGLE_SIZE_T),RKSt9nothrow_t)
 
 /* Everything from libstdc++ is weak, to avoid requiring that library
to be linked into plain C applications using libitm.so.  */
 
 extern "C" {
 
-extern void *_ZnwX (size_t) __attribute__((weak));
-extern void _ZdlPv (void *) __attribute__((weak));
-extern void *_ZnaX (size_t) __attribute__((weak));
-extern void _ZdaPv (void *) __attribute__((weak));
+extern void *_ZnwX  (size_t) __attribute__((weak));
+extern void _ZdlPv  (void *) __attribute__((weak));
+extern void _ZdlPvX (void *, size_t) __attribute__((weak));
+extern void *_ZnaX  (size_t) __attribute__((weak));
+extern void _ZdaPv  (void *) __attribute__((weak));
 
 typedef const struct nothrow_t { } *c_nothrow_p;
 
 extern void *_ZnwXRKSt9nothrow_t (size_t, c_nothrow_p) __attribute__((weak));
 extern void _ZdlPvRKSt9nothrow_t (void *, c_nothrow_p) __attribute__((weak));
+extern void _ZdlPvXRKSt9nothrow_t
+(void *, size_t, c_nothrow_p) __attribute__((weak));
 extern void *_ZnaXRKSt9nothrow_t (size_t, c_nothrow_p) __attribute__((weak));
 extern void _ZdaPvRKSt9nothrow_t (void *, c_nothrow_p) __attribute__((weak));
 
 #if !defined (HAVE_ELF_STYLE_WEAKREF) 
-void *_ZnwX (size_t) { return NULL; }
-void _ZdlPv (void *) { return; }
-void *_ZnaX (size_t) { return NULL; }
-void _ZdaPv (void *) { return; }
-
-void *_ZnwXRKSt9nothrow_t (size_t, c_nothrow_p) { return NULL; }
-void _ZdlPvRKSt9nothrow_t (void *, c_nothrow_p) { return; }

[PATCH] libitm: Support sized delete.

2015-10-29 Thread Torvald Riegel
This patch supports the sized variants of operator delete.
Tested on x86_64-linux.

Is this OK or is it completely sufficient to just call non-sized
operator delete from libitm even if the program has a sized delete?  (If
the latter is the case, I would just remove the special handling for
that and use non-sized delete under the covers.)
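
For reference, these are the operators being wrapped (C++14 sized
deallocation); the X / MANGLE_SIZE_T macros in the patch produce the size_t
suffix of the mangled names:

  void operator delete (void *ptr) noexcept;                   // _ZdlPv
  void operator delete (void *ptr, std::size_t sz) noexcept;   // _ZdlPvm or _ZdlPvj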

2015-10-29  Torvald Riegel  <trie...@redhat.com>

* alloc_cpp.cc (_ZdlPvX, _ZdlPvXRKSt9nothrow_t, _ZGTtdlPvX,
_ZGTtdlPvXRKSt9nothrow_t, delsz_opnt): New.
* libitm.map: Add _ZGTtdlPvX and _ZGTtdlPvXRKSt9nothrow_t.
* libitm_i.h (gtm_alloc_action): Add free_fn_sz, sz, sized_delete
members.
(gtm_thread::forget_allocations): New overload with size_t argument.
* alloc.c (gtm_thread::forget_allocation): Define new overload and
adapt existing one.
(gtm_thread::record_allocation): Adapt.
* testsuite/libitm.c++/newdelete.C: New.

commit f0d6b07a62681ad3cd477cf3c75904ee22baf75e
Author: Torvald Riegel <trie...@redhat.com>
Date:   Thu Oct 29 18:52:20 2015 +0100

Support sized delete.

This adds transactional clones of the sized version of operator delete.

diff --git a/libitm/alloc.cc b/libitm/alloc.cc
index bb292da..a72848d 100644
--- a/libitm/alloc.cc
+++ b/libitm/alloc.cc
@@ -37,6 +37,7 @@ gtm_thread::record_allocation (void *ptr, void 
(*free_fn)(void *))
 
   a->free_fn = free_fn;
   a->allocated = true;
+  a->sized_delete = false;
 }
 
 void
@@ -50,6 +51,23 @@ gtm_thread::forget_allocation (void *ptr, void 
(*free_fn)(void *))
 
   a->free_fn = free_fn;
   a->allocated = false;
+  a->sized_delete = false;
+}
+
+void
+gtm_thread::forget_allocation (void *ptr, size_t sz,
+  void (*free_fn_sz)(void *, size_t))
+{
+  uintptr_t iptr = (uintptr_t) ptr;
+
+  gtm_alloc_action *a = this->alloc_actions.find(iptr);
+  if (a == 0)
+a = this->alloc_actions.insert(iptr);
+
+  a->free_fn_sz = free_fn_sz;
+  a->allocated = false;
+  a->sized_delete = true;
+  a->sz = sz;
 }
 
 namespace {
@@ -102,7 +120,12 @@ commit_allocations_1 (uintptr_t key, gtm_alloc_action *a, 
void *cb_data)
   uintptr_t revert_p = (uintptr_t) cb_data;
 
   if (a->allocated == revert_p)
-a->free_fn (ptr);
+{
+  if (a->sized_delete)
+   a->free_fn_sz (ptr, a->sz);
+  else
+   a->free_fn (ptr);
+}
 }
 
 /* Permanently commit allocated memory during transaction.
diff --git a/libitm/alloc_cpp.cc b/libitm/alloc_cpp.cc
index 8514618..13185a7 100644
--- a/libitm/alloc_cpp.cc
+++ b/libitm/alloc_cpp.cc
@@ -35,41 +35,50 @@ using namespace GTM;
 
 #define _ZnwX  S(_Znw,MANGLE_SIZE_T)
 #define _ZnaX  S(_Zna,MANGLE_SIZE_T)
+#define _ZdlPvX S(_ZdlPv,MANGLE_SIZE_T)
 #define _ZnwXRKSt9nothrow_t S(S(_Znw,MANGLE_SIZE_T),RKSt9nothrow_t)
 #define _ZnaXRKSt9nothrow_t S(S(_Zna,MANGLE_SIZE_T),RKSt9nothrow_t)
+#define _ZdlPvXRKSt9nothrow_t  S(S(_ZdlPv,MANGLE_SIZE_T),RKSt9nothrow_t)
 
 #define _ZGTtnwX   S(_ZGTtnw,MANGLE_SIZE_T)
 #define _ZGTtnaX   S(_ZGTtna,MANGLE_SIZE_T)
+#define _ZGTtdlPvX S(_ZGTtdlPv,MANGLE_SIZE_T)
 #define _ZGTtnwXRKSt9nothrow_t S(S(_ZGTtnw,MANGLE_SIZE_T),RKSt9nothrow_t)
 #define _ZGTtnaXRKSt9nothrow_t S(S(_ZGTtna,MANGLE_SIZE_T),RKSt9nothrow_t)
+#define _ZGTtdlPvXRKSt9nothrow_t S(S(_ZGTtdlPv,MANGLE_SIZE_T),RKSt9nothrow_t)
 
 /* Everything from libstdc++ is weak, to avoid requiring that library
to be linked into plain C applications using libitm.so.  */
 
 extern "C" {
 
-extern void *_ZnwX (size_t) __attribute__((weak));
-extern void _ZdlPv (void *) __attribute__((weak));
-extern void *_ZnaX (size_t) __attribute__((weak));
-extern void _ZdaPv (void *) __attribute__((weak));
+extern void *_ZnwX  (size_t) __attribute__((weak));
+extern void _ZdlPv  (void *) __attribute__((weak));
+extern void _ZdlPvX (void *, size_t) __attribute__((weak));
+extern void *_ZnaX  (size_t) __attribute__((weak));
+extern void _ZdaPv  (void *) __attribute__((weak));
 
 typedef const struct nothrow_t { } *c_nothrow_p;
 
 extern void *_ZnwXRKSt9nothrow_t (size_t, c_nothrow_p) __attribute__((weak));
 extern void _ZdlPvRKSt9nothrow_t (void *, c_nothrow_p) __attribute__((weak));
+extern void _ZdlPvXRKSt9nothrow_t
+(void *, size_t, c_nothrow_p) __attribute__((weak));
 extern void *_ZnaXRKSt9nothrow_t (size_t, c_nothrow_p) __attribute__((weak));
 extern void _ZdaPvRKSt9nothrow_t (void *, c_nothrow_p) __attribute__((weak));
 
 #if !defined (HAVE_ELF_STYLE_WEAKREF) 
-void *_ZnwX (size_t) { return NULL; }
-void _ZdlPv (void *) { return; }
-void *_ZnaX (size_t) { return NULL; }
-void _ZdaPv (void *) { return; }
-
-void *_ZnwXRKSt9nothrow_t (size_t, c_nothrow_p) { return NULL; }
-void _ZdlPvRKSt9nothrow_t (void *, c_nothrow_p) { return; }
-void *_ZnaXRKSt9nothrow_t (size_t, c_nothrow_p) { return NULL; }
-void _ZdaPvRKSt9nothrow_t (void *, c_nothrow_p) { return; }

Re: [PATCH, rs6000] Add memory barriers to tbegin, tend, etc.

2015-10-09 Thread Torvald Riegel
On Fri, 2015-10-09 at 11:52 -0500, Peter Bergner wrote:
> On Fri, 2015-10-09 at 16:41 +0200, Torvald Riegel wrote:
> > On Thu, 2015-09-03 at 16:58 -0500, Peter Bergner wrote:
> >> +Note that the semantics of the above HTM builtins are required to mimic 
> >> the
> >> +locking semantics used for critical sections.  Builtins that are used to
> >> +create a new transaction or restart a suspended transaction (specifically 
> >> any
> >> +builtin that changes the state from non-transactional to transactional)
> > 
> > Those builtins could be nested in transactions yet would need to provide
> > the same semantics at least as far as the compiler is concerned, so
> > maybe change "specifically" to "for example"?
> 
> Ah, good point.  I forgot about the nested example.  In that case, how about
> we just drop the text within the ()'s instead?  I'm not sure we need it
> given your suggested addition below.

Sounds good to me.

> > I haven't thought enough about txn suspend/resume/abort to make a
> > proposal how their required semantics would be specified best.  Maybe we
> > should call this out explicitly, or maybe the sentences above are
> > sufficient because they refer to HTM instructions generally. 
> 
> Well we do mention that we need acquire semantics not just for instructions
> that start transactions, but also for instructions that resume transactions.
> Similar for release and end/suspend, so I guess I'd lean towards what we have
> is sufficient.

OK.

> 
> > Maybe the second "HTM instructions" should be "builtins" too, I don't
> > know.
> 
> Well the builtins expand into multiple hardware instructions and the
> memory barriers are only attached to the HTM instructions and not the
> other instructions that are generated by the builtins, so I think
> HTM instructions is probably more correct here.  But if you think
> builtins is better, I can change it.

No, it's exactly this aspect of the implementation that I didn't know
about, and thus wasn't sure.  Sounds all good.


> Thanks for the review!  I've attached the changes to the documentation below.
> Is this better?

Yes, thanks!



Re: [PATCH, rs6000] Add memory barriers to tbegin, tend, etc.

2015-10-09 Thread Torvald Riegel
Sorry for the much delayed response.  I've been sick and am slowly
catching up.

On Thu, 2015-09-03 at 16:58 -0500, Peter Bergner wrote:
> On a glibc thread discussing this issue, Torvald also asked that I add
> documention describing the memory consistency semantics the HTM instructions
> should have, so I added a blurb about that.  Torvald, is the text below
> what you were looking for?

> Index: gcc/doc/extend.texi
> ===
> --- gcc/doc/extend.texi   (revision 227308)
> +++ gcc/doc/extend.texi   (working copy)
> @@ -16030,6 +16030,23 @@
>  unsigned int __builtin_tsuspend (void)
>  @end smallexample
>  
> +Note that the semantics of the above HTM builtins are required to mimic the
> +locking semantics used for critical sections.  Builtins that are used to
> +create a new transaction or restart a suspended transaction (specifically any
> +builtin that changes the state from non-transactional to transactional)

Those builtins could be nested in transactions yet would need to provide
the same semantics at least as far as the compiler is concerned, so
maybe change "specifically" to "for example"?

> must
> +have lock acquisition like semantics while those builtins that end or suspend
> +a transaction (ie, changes the state from transactional to non-transactional)
> +must have lock release like semantics.

I would be a bit more specific just to avoid confusion about how the
locks are implemented or what specific semantics they have.  What about
adding this or something similar:

"Specifically, this must mimic lock semantics as specified by C++11, for
example: Lock acquisition is as-if an execution of
__atomic_exchange_n(&globallock,1,__ATOMIC_ACQUIRE) that returns 0, and
lock release is as-if an execution of
__atomic_store_n(&globallock,0,__ATOMIC_RELEASE), with globallock being an
implicit implementation-defined lock used for all transactions."
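
In code, the as-if model is simply this (sketch; globallock stands for the
implicit, implementation-internal lock word mentioned above):

  /* Acquisition: */
  while (__atomic_exchange_n (&globallock, 1, __ATOMIC_ACQUIRE) != 0)
    ;  /* spin */
  /* ... transaction body ... */
  /* Release: */
  __atomic_store_n (&globallock, 0, __ATOMIC_RELEASE);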

> The HTM instructions associated with
> +with the builtins inherently provide the correct acquisition and release
> +semantics required.

I would say "hardware barriers" or "hardware fences" instead of
"semantics" to clarify that this is just one part.

> However, the compiler must be prohibited from moving
> +loads and stores across the HTM instructions.  This has been accomplished
> +by adding memory barriers to the associated HTM instructions.

I think it would be good to make the requirement tighter, so that we do
not promise too much.  What about changing this to:

"However, the compiler must also be prohibited from moving loads and
stores across the builtins in a way that would violate their semantics.
This has been accomplished by adding memory barriers to the associated
HTM instructions (which is a conservative approach to provide acquire
and release semantics)."
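
To make the hazard concrete (sketch using the builtins under discussion;
shared_data is just an example global, not from your patch):

extern int shared_data;

void
example (void)
{
  shared_data = 1;   /* must not sink below the tbegin: otherwise the store
                        would become transactional and be rolled back on an
                        abort, although it was meant to be unconditional.  */
  if (__builtin_tbegin (0))
    {
      /* transaction body */
      __builtin_tend (0);
    }
}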

I haven't thought enough about txn suspend/resume/abort to make a
proposal how their required semantics would be specified best.  Maybe we
should call this out explicitly, or maybe the sentences above are
sufficient because they refer to HTM instructions generally. 

Maybe the second "HTM instructions" should be "builtins" too, I don't
know.

> Earlier versions
> +of the compiler did not treat the HTM instructions as memory barriers.
> +A @code{__TM_FENCE__} macro has been added, which can be used to determine
> +whether the current compiler treats HTM instructions as memory barriers or 
> not.

"builtins" instead of "HTM instructions"?

> +This allows the user to explicitly add memory barriers to their code when
> +using an older version of the compiler.
> +
>  The following set of built-in functions are available to gain access
>  to the HTM specific special purpose registers.

Thanks for documenting the semantics!



Re: [gomp4.1] Doacross library implementation

2015-10-08 Thread Torvald Riegel
On Thu, 2015-09-24 at 20:32 +0200, Jakub Jelinek wrote:
> Torvald, can you please have a look at it, if I got all the atomics / memory
> models right?

More detailed comments below, but in general, I'd really suggest to add
more code comments for the synchronization parts.  In the end, the level
of detail of documentation of libgomp is your decision, but, for
example, the lack of comments in synchronization code in glibc has made
maintaining this code and fixing issues in it very costly.  It has also
been hard to understand for many.

My suggestion would be both to (1) document the high-level, abstract
synchronization scheme and (2) how that scheme is implemented.  The
first point is important in my experience because typically, the
high-level scheme and the actual thinking behind it (or, IOW, the intent
of the original author) is much harder to reconstruct in case of
concurrent code than it is for sequential code; you can't just simply
follow the program along line by line, but have to consider
interleavings.

Even if the synchronization problem to solve is relatively
straight-forward as in thise case (ie, one-directional waiting), it's
worth IMO to do point (1).  If it is simple, the high-level description
will be simple, and it will assure others that one really has to just
solve that and that the original author wasn't aware of any other
issues.

Regarding point (2), what we're doing in glibc now is basically to
document how the specific things we do in the code make sure we
implement the high-level scheme.  So we'd say things like "this CAS here
now ensures consensus among the threads A, B, C".  For memory orders
specifically, it helps to document why they are sufficient and
necessary; this helps others understand the code, so that they don't
need to go hunting through all of the code looking for other accesses to
the same memory locations to be able to reconstruct the intended
happens-before relations.  I have some examples below.
Also, given that you don't use explicit atomic types but just atomic
operations, it's IMO a good idea to document which variables are
supposed to be accessed atomically so it becomes easier to not violate
the data-race-freedom requirement accidentally.
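
As a concrete example of the kind of comment I mean (illustration only,
with hypothetical names, not taken from your patch):

  /* Writer: all stores to this iteration's data happen before the release
     store below; it pairs with the acquire load in the waiter to establish
     the happens-before edge the algorithm relies on.  */
  __atomic_store_n (&posted[ent], 1, __ATOMIC_RELEASE);

  /* Waiter: the acquire load synchronizes with the release store above, so
     once the loop exits it is safe to read the data written before the
     post.  */
  while (__atomic_load_n (&posted[ent], __ATOMIC_ACQUIRE) == 0)
    ;  /* spin; see below for the non-busy-waiting question */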

> The testcase obviously is not a good benchmark, we'll need
> some more realistic one.  But obviously when asking for oversubscription, it
> is quite expensive.  The question is how to implement a non-busy waiting
> fallback, whether we put some mutex and queue guarded by the mutex into the
> same (or some other?) cache-line, or just use atomics to queue it and how to
> make it cheap for the case where busy waiting is sufficient.

Atomics and futexes are probably the best approach if you want
performance; at least we need some efficient way for post() to figure
out that there are indeed waiters, and we don't want to use a lock for
that.
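
One common pattern (sketch only, not a patch; futex_wait/futex_wake stand
for libgomp's existing futex wrappers or equivalents) is to reserve a
waiters bit next to the value, so that post() only issues the futex_wake
syscall when somebody is actually blocked:

#define WAITERS_BIT (1U << 31)

static void
post_value (unsigned *addr, unsigned val)
{
  unsigned old = __atomic_exchange_n (addr, val, __ATOMIC_RELEASE);
  if (old & WAITERS_BIT)
    futex_wake ((int *) addr, __INT_MAX__);   /* wake all waiters */
}

static void
wait_for_value (unsigned *addr, unsigned val)
{
  unsigned cur;
  while (((cur = __atomic_load_n (addr, __ATOMIC_ACQUIRE)) & ~WAITERS_BIT) < val)
    {
      /* Announce the waiter, then sleep only if the word is still unchanged;
         otherwise futex_wait returns immediately and we recheck.  */
      __atomic_fetch_or (addr, WAITERS_BIT, __ATOMIC_RELAXED);
      futex_wait ((int *) addr, (int) (cur | WAITERS_BIT));
    }
}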

What specific approach to use is a question of how much time we want to
spend on this.  It's hard to estimate how often we'd really need a
blocking wait in practice, though.

> I'd say
> it should be sufficient to implement non-busy waiting in the flattened
> variant.

Sounds fine to me.

> --- libgomp/ordered.c.jj  2015-09-18 18:36:42.0 +0200
> +++ libgomp/ordered.c 2015-09-24 18:20:28.286244397 +0200
> @@ -252,14 +254,146 @@ GOMP_ordered_end (void)
>  {
>  }
>  
> +/* DOACROSS initialization.  */
> +
> +#define MAX_COLLAPSED_BITS (__SIZEOF_LONG__ * __CHAR_BIT__)
> +
> +void
> +gomp_doacross_init (unsigned ncounts, long *counts, long chunk_size)
> +{
> +  struct gomp_thread *thr = gomp_thread ();
> +  struct gomp_team *team = thr->ts.team;
> +  struct gomp_work_share *ws = thr->ts.work_share;
> +  unsigned int i, bits[MAX_COLLAPSED_BITS], num_bits = 0;
> +  unsigned long ent, num_ents, elt_sz, shift_sz;
> +  struct gomp_doacross_work_share *doacross;
> +
> +  if (team == NULL || team->nthreads == 1)
> +return;
> +
> +  for (i = 0; i < ncounts; i++)
> +{
> +  /* If any count is 0, GOMP_doacross_{post,wait} can't be called.  */
> +  if (counts[i] == 0)
> + return;
> +
> +  if (num_bits <= MAX_COLLAPSED_BITS)
> + {
> +   unsigned int this_bits;
> +   if (counts[i] == 1)
> + this_bits = 1;
> +   else
> + this_bits = __SIZEOF_LONG__ * __CHAR_BIT__
> + - __builtin_clzl (counts[i] - 1);
> +   if (num_bits + this_bits <= MAX_COLLAPSED_BITS)
> + {
> +   bits[i] = this_bits;
> +   num_bits += this_bits;
> + }
> +   else
> + num_bits = MAX_COLLAPSED_BITS + 1;
> + }
> +}
> +
> +  if (ws->sched == GFS_STATIC)
> +num_ents = team->nthreads;
> +  else
> +num_ents = (counts[0] - 1) / chunk_size + 1;
> +  if (num_bits <= MAX_COLLAPSED_BITS)
> +{
> +  elt_sz = sizeof (unsigned long);
> +  shift_sz = ncounts * sizeof (unsigned int);
> +}
> +  else
> +{
> +  elt_sz = sizeof (unsigned long) * ncounts;
> +  shift_sz = 0;
> +}
> +  

Re: RFC: PATCH for front end parts of C++ transactional memory TS

2015-10-02 Thread Torvald Riegel
On Fri, 2015-10-02 at 14:13 -0400, Jason Merrill wrote:
> The patch also doesn't attempt to do anything about the library.  The 
> second patch sets transaction_safe on various built-ins, but without the 
> library support this just means references to undefined symbols.

For some of the builtins, we already have functions in the libitm ABI:
malloc and free, memcpy, memmove, memset.  See libitm/libitm.h.  Can you
transform those same as is already done for the respective C functions?

Other builtins should probably just be transaction_pure: abort, I believe,
and I guess the math functions too (although I haven't thought about FP
exceptions at all so far).

For other builtins, I guess we should either map them to the
transactional variants of the respective C functions if we support those
already (but the builtins might not change errno whether the C functions
might do), or extend the libitm ABI with more _ITM_ functions.
If we just need the builtins for libstdc++, just making the builtins
tm_safe might be easier.

Thoughts?

> The 
> third patch is a beginning of libstdc++ support, which may or may not be 
> useful to Jonathan or Torvald in completing that support.

I'd at least need guidance from Jon how to best integrate the
annotations, and how to best ship transaction variants in libstdc++ (eg,
do they go into a separate library because of TS vs. libstdc++?)
I can add transactional versions for anything that the compiler doesn't
take care of when provided with the source code.  But I think those
cases should be rather rare.



Re: [PATCH] [PING] [PR libitm/61164] Remove redefinition of glibc internal macro __always_inline

2015-08-21 Thread Torvald Riegel
On Mon, 2015-08-17 at 13:16 +0200, Gleb Fotengauer-Malinovskiy wrote:
 On Sun, Aug 16, 2015 at 07:35:17PM +0200, Torvald Riegel wrote:
  On Thu, 2015-06-11 at 14:36 +0300, Gleb Fotengauer-Malinovskiy wrote:
   On Fri, May 15, 2015 at 03:04:27PM +0200, Torvald Riegel wrote:
On Wed, 2015-05-06 at 17:54 +0300, Gleb Fotengauer-Malinovskiy wrote:
 2015-05-06  Gleb Fotengauer-Malinovskiy  gle...@altlinux.org
 
   PR libitm/61164
   * local_atomic (__always_inline): Rename to...
   (__libitm_always_inline): ... this.

OK.  Thanks.
   
   You are welcome.
   
   It seems still not applied, AFAICS.
  
  I forgot to ask you at Cauldron whether you have completed a copyright
  assignment agreement?
 
 No, I didn't. I'm in progress.
 
   I don't think this is small enough to be a
  trivial patch.
 
 A regular series of repeated changes, such as renaming a symbol, is not
 legally significant even if the symbol has to be renamed in many places.
 https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html#Legally-Significant

OK.  I committed this to trunk for you, with the (tiny change) marker
in the ChangeLog as suggested on this GNU web page.





Re: [PATCH] [PING] [PR libitm/61164] Remove redefinition of glibc internal macro __always_inline

2015-08-16 Thread Torvald Riegel
On Thu, 2015-06-11 at 14:36 +0300, Gleb Fotengauer-Malinovskiy wrote:
 On Fri, May 15, 2015 at 03:04:27PM +0200, Torvald Riegel wrote:
  On Wed, 2015-05-06 at 17:54 +0300, Gleb Fotengauer-Malinovskiy wrote:
   2015-05-06  Gleb Fotengauer-Malinovskiy  gle...@altlinux.org
   
 PR libitm/61164
 * local_atomic (__always_inline): Rename to...
 (__libitm_always_inline): ... this.
  
  OK.  Thanks.
 
 You are welcome.
 
 It seems still not applied, AFAICS.

I forgot to ask you at Cauldron whether you have completed a copyright
assignment agreement?  I don't think this is small enough to be a
trivial patch.



Re: [Patch libstdc++] Rewrite cpu/generic/atomic_word.h

2015-06-12 Thread Torvald Riegel
On Fri, 2015-06-12 at 10:30 +0100, Ramana Radhakrishnan wrote:
 On Fri, Jun 12, 2015 at 10:06 AM, Jonathan Wakely jwak...@redhat.com wrote:
  On 11/06/15 23:56 +0200, Torvald Riegel wrote:
 
   On Fri, 2015-05-22 at 12:37 +0100, Ramana Radhakrishnan wrote:
  I don't think we can remove _GLIBCXX_READ_MEM_BARRIER and
  _GLIBCXX_WRITE_MEM_BARRIER from atomic_word.h even though they are
  superseded by the atomics as it is published in the documentation as
  available macros.
 
 
  I see.  We should at least update the documentation of those, as the
  current one isn't a really portable specification.  If we can, I'd
  deprecate them.  Jonathan, what do you think?
 
 
  Yes, I'm in favour of deprecating them. They are GCC-specific anyway,
  so there is no reason to prefer them to std::atomic_ or __atomic_
  fences.
 
 I'll treat it as a follow-up.

Thanks.

 Can I get an ack for this patch though ?

The above comment was of a general nature, not specifically about this
patch or meant as a precondition for this patch.  Sorry if that wasn't
clear :)
To me, your patch looks good.



Re: [Patch libstdc++] Rewrite cpu/generic/atomic_word.h

2015-06-11 Thread Torvald Riegel
On Tue, 2015-06-09 at 10:50 +0100, Ramana Radhakrishnan wrote:
 
 On 22/05/15 17:56, Torvald Riegel wrote:
  On Fri, 2015-05-22 at 12:37 +0100, Ramana Radhakrishnan wrote:
  Hi,
 
  While writing atomic_word.h for the ARM backend to fix PR target/66200
  I
  thought it would make more sense to write it all up with atomic
  primitives instead of providing various fragile bits of inline
  assembler. Thus this patch came about. I intend to ask for a
  specialized version of this to be applied for the ARM backend in any
  case after testing completes.
 
  If this goes in as a cleanup I expect all the ports to be deleting
  their
  atomic_word.h implementation in the various ports directories and
  I'll
  follow up with patches asking port maintainers to test and apply for
  each of the individual cases if this is deemed to be good enough.
 
  Could you use C++11 atomics right away instead of adapting the wrappers?
 
 
 I spent some time trying to massage guard.cc into using C++11 atomics 
 but it was just easier to write it with the builtins - I consider this 
 to be a step improvement compared to where we are currently.

Fair enough.

 I don't think we can remove _GLIBCXX_READ_MEM_BARRIER and 
 _GLIBCXX_WRITE_MEM_BARRIER from atomic_word.h even though they are 
 superseded by the atomics as it is published in the documentation as 
 available macros.

I see.  We should at least update the documentation of those, as the
current one isn't a really portable specification.  If we can, I'd
deprecate them.  Jonathan, what do you think?

 It was obvious that alpha, powerpc, aix, ia64 could just fall back to 
 the standard implementations. The cris and sparc implementations have 
 different types for _Atomic_word from generic and the rest - my guess is 
 sparc can remove the _GLIBCXX_READ_MEM_BARRIER and 
 _GLIBCXX_WRITE_MEM_BARRIER but I have no way of testing this is sane.
 
 I'll leave that to the respective target maintainers for sparc and cris 
 to deal as appropriate.

LGTM.



Re: [Patch libstdc++] Rewrite cpu/generic/atomic_word.h

2015-05-22 Thread Torvald Riegel
On Fri, 2015-05-22 at 12:37 +0100, Ramana Radhakrishnan wrote:
 Hi,
 
 While writing atomic_word.h for the ARM backend to fix PR target/66200
 I 
 thought it would make more sense to write it all up with atomic 
 primitives instead of providing various fragile bits of inline 
 assembler. Thus this patch came about. I intend to ask for a 
 specialized version of this to be applied for the ARM backend in any 
 case after testing completes.
 
 If this goes in as a cleanup I expect all the ports to be deleting
 their 
 atomic_word.h implementation in the various ports directories and
 I'll 
 follow up with patches asking port maintainers to test and apply for 
 each of the individual cases if this is deemed to be good enough.

Could you use C++11 atomics right away instead of adapting the wrappers?

I think the more non-C++11 atomic operations/wrappers we can remove the
better.

 diff --git a/libstdc++-v3/config/cpu/generic/atomic_word.h b/libstdc
 ++-v3/config/cpu/generic/atomic_word.h
 index 19038bb..bedce31 100644
 --- a/libstdc++-v3/config/cpu/generic/atomic_word.h
 +++ b/libstdc++-v3/config/cpu/generic/atomic_word.h
 @@ -29,19 +29,46 @@
  #ifndef _GLIBCXX_ATOMIC_WORD_H
  #define _GLIBCXX_ATOMIC_WORD_H 1
  
  +#include <bits/cxxabi_tweaks.h>
 +
  typedef int _Atomic_word;
  
 -// Define these two macros using the appropriate memory barrier for
 the target.
 -// The commented out versions below are the defaults.
 -// See ia64/atomic_word.h for an alternative approach.
 +
 +namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 +{
 +  // Test the first byte of __g and ensure that no loads are hoisted
 across
 +  // the test.

That comment is not quite correct.  I'd just say that this is a
memory_order_acquire load and a comparison.

 +  inline bool
 +  __test_and_acquire (__cxxabiv1::__guard *__g)
 +  {
 +unsigned char __c;
  +unsigned char *__p = reinterpret_cast<unsigned char *>(__g);
  +__atomic_load (__p, &__c,  __ATOMIC_ACQUIRE);
 +return __c != 0;
 +  }
 +
 +  // Set the first byte of __g to 1 and ensure that no stores are
 sunk
 +  // across the store.

Likewise; I'd just say this is a memory_order_release store with 1 as
value.

 +  inline void
 +  __set_and_release (__cxxabiv1::__guard *__g)
 +  {
  +unsigned char *__p = reinterpret_cast<unsigned char *>(__g);
  +unsigned char val = 1;
  +__atomic_store (__p, &val, __ATOMIC_RELEASE);
 +  }
 +
 +#define _GLIBCXX_GUARD_TEST_AND_ACQUIRE(G)
 __gnu_cxx::__test_and_acquire (G)
 +#define _GLIBCXX_GUARD_SET_AND_RELEASE(G)
 __gnu_cxx::__set_and_release (G)
  
  // This one prevents loads from being hoisted across the barrier;
  // in other words, this is a Load-Load acquire barrier.

Likewise; I'd just say that this is an mo_acquire fence.

 -// This is necessary iff TARGET_RELAXED_ORDERING is defined in
 tm.h.  
  -// #define _GLIBCXX_READ_MEM_BARRIER __asm __volatile ("":::"memory")
 +// This is necessary iff TARGET_RELAXED_ORDERING is defined in tm.h.
 +#define _GLIBCXX_READ_MEM_BARRIER __atomic_thread_fence
 (__ATOMIC_ACQUIRE)
  
  // This one prevents stores from being sunk across the barrier; in
 other
  // words, a Store-Store release barrier.

Likewise; mo_release fence.

 -// #define _GLIBCXX_WRITE_MEM_BARRIER __asm __volatile
  ("":::"memory")
 +#define _GLIBCXX_WRITE_MEM_BARRIER __atomic_thread_fence
 (__ATOMIC_RELEASE)
 +
 +}
  
  #endif 



Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.

2015-05-22 Thread Torvald Riegel
On Fri, 2015-05-22 at 17:41 +0100, Matthew Wahab wrote:
 On 21/05/15 19:26, Torvald Riegel wrote:
  On Thu, 2015-05-21 at 16:45 +0100, Matthew Wahab wrote:
  On 19/05/15 20:20, Torvald Riegel wrote:
  On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote:
  Hello,
 
  On 15/05/15 17:22, Torvald Riegel wrote:
  This patch improves the documentation of the built-ins for atomic
  operations.
 
 
 I think we're talking at cross-purposes and not really getting anywhere. I've 
 replied 
 to some of your comments below, but it's mostly a restatement of points 
 already made.

OK.  I have a few more comments below.

  We seem to have different views about the purpose of the manual page. I'm 
  treating it
  as a description of the built-in functions provided by gcc to generate the 
  code
  needed to implement the C++11 model. That is, the built-ins are distinct 
  from C++11
  and their descriptions should be, as far as possible, independent of the 
  methods used
  in the C++11 specification to describe the C++11 memory model.
 
  OK.  But we'd need a *precise* specification of what they do if we'd
  want to make them separate from the C++11 memory model.  And we don't
  have that, would you agree?
 
 There is a difference between the sort of description that is needed for a 
 formal 
 specification and the sort that would be needed for a programmers manual. The 
 best 
 example of this that I can think of is the Standard ML definition 
 (http://sml-family.org). That is a mathematical (so precise) definition that 
 is 
 invaluable if you want an unambiguous specification of the language. But its 
 useless 
 for anybody who just wants to use Standard ML to write programs. For that, 
 you need 
 go to the imprecise descriptions that are given in books about SML and in the 
 documentation for SML compilers and libraries.
 
 The problem with using the formal SML definition is the same as with using 
 the formal 
 C++11 definition: most of it is detail needed to make things in the formal 
 specification come out the right way. That detail, about things that are 
 internal to 
 the definition of the specification, makes it difficult to understand what is 
 intended to be available for the user.

A relation like happens-before is user-facing.  It is how one reasons
about ordering in a multi-threaded execution.  This isn't internal or
for a corner-case like additional-synchronizes-with or one of the
consistency rules.

 The GCC manual seems to me to be aimed more at the people who want to use GCC 
 to 
 write code and I don't think that the patch makes much allowance for them. I 
 do think 
 that more precise statements about the relationship to C++11 are useful to 
 have. Its 
 the sort of constraint that ought to be documented somewhere. But it seems to 
 be more 
 of interest to compiler writers or, at least, to users who are as 
 knowledgeable as 
 compiler writers. A document targeting that group, such as the GCC internals 
 or a GCC 
 wiki-page, would seem to be a better place for the information.
 
 (Another example of the distinction may be the Intel Itanium ABI 
 documentation which 
 has a programmers description of the synchronization primitives and a 
 separate, 
 formal description of their behaviour.)
 
 For what it's worth, my view of how C++11, the __atomics and the machine code 
 line up 
 is that each is a distinct layer. Each layer implements the requirements of 
 the 
 higher (more abstract) layer but is otherwise entirely independent. That's 
 why I 
 think that a description of the __atomic built-in, aimed at compiler users 
 rather 
 than writers and that doesn't expect knowledge of C++11 is desirable and 
 possible.
 
  I'm also concerned that the patch, by describing things in terms of formal 
  C++11
  concepts, makes it more difficult for people to know what the built-ins 
  can be
  expected to do and so make the built-in more difficult to use[..]
 
  I hadn't thought about that possible danger, but that would be right.
  The way I would prefer to counter that is that we add a big fat warning
  to the __sync built-ins that we don't have a precise specification for
  them and that there are several corners of hand-waving and potentially
  further issues, and that this is another reason to prefer the __atomic
  built-ins.  PR 65697 etc. are enough indication for me that we indeed
  lack a proper specification.
 
 Increasing uncertainty about the __sync built-ins wouldn't make people move 
 to 
 equally uncertain __atomic built-ins. There's enough knowledge and use of the 
 __sync 
 builtins to make them a more comfortable choice than the C++11 atomics and in 
 the 
 worst case it would push people to roll their own synchronization functions 
 with 
 assembler or system calls.

I don't buy that.  Sure, some people will be uncomfortable with
anything.  But I don't see how "specified in C++11 and C11" is the same
level of uncertainty as "we don't have a tight specification".  Users
can pick

Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.

2015-05-21 Thread Torvald Riegel
On Thu, 2015-05-21 at 16:45 +0100, Matthew Wahab wrote:
 On 19/05/15 20:20, Torvald Riegel wrote:
  On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote:
  Hello,
 
  On 15/05/15 17:22, Torvald Riegel wrote:
  This patch improves the documentation of the built-ins for atomic
  operations.
 
  The memory model to memory order change does improve things but I 
  think that
  the patch has some problems. As it is now, it makes some of the 
  descriptions
  quite difficult to understand and seems to assume more familiarity with 
  details
  of the C++11 specification than might be expected.
 
  I'd say that's a side effect of the C++11 memory model being the
  reference specification of the built-ins.
 
  Generally, the memory order descriptions seem to be targeted towards 
  language
  designers but don't provide for anybody trying to understand how to 
  implement or
  to use the built-ins.
 
  I agree that the current descriptions aren't a tutorial on the C++11
  memory model.  However, given that the model is not GCC-specific, we
  aren't really in a need to provide a tutorial, in the same way that we
  don't provide a C++ tutorial.  Users can pick the C++11 memory model
  educational material of their choice, and we need to document what's
  missing to apply the C++11 knowledge to the built-ins we provide.
 
 
 We seem to have different views about the purpose of the manual page. I'm 
 treating it 
 as a description of the built-in functions provided by gcc to generate the 
 code 
 needed to implement the C++11 model. That is, the built-ins are distinct from 
 C++11 
 and their descriptions should be, as far as possible, independent of the 
 methods used 
 in the C++11 specification to describe the C++11 memory model.

OK.  But we'd need a *precise* specification of what they do if we'd
want to make them separate from the C++11 memory model.  And we don't
have that, would you agree?

It's also not a trivial task, so I wouldn't be optimistic that someone
would offer to write such a specification, and have it cross-checked.

 I understand of course that the __atomics were added in order to support 
 C++11 but 
 that doesn't make them part of C++11 and, since __atomic functions can be 
 made 
 available when C11/C++11 may not be, it seems to make sense to try for 
 stand-alone 
 descriptions.

The compiler can very well provide the C++11 *memory model* without
creating any dependency on the other language/library pieces of C++11 or
C11.  Prior to C++11, multi-threaded executions were not defined by the
standard, so we're not conflicting with anything in prior language
standards, right?

Another way to see this is to say that we just *copy* the C++11 memory
model and use it as the memory model that specifies the behavior of the
atomic built-ins.  That additionally frees us from having to come up
with and maintain our GCC-specific specification of atomics and a memory
model.

 I'm also concerned that the patch, by describing things in terms of formal 
 C++11 
 concepts, makes it more difficult for people to know what the built-ins can 
 be 
 expected to do and so make the built-in more difficult to use There is a 
 danger that 
 rather than take a risk with uncertainty about the behaviour of the 
 __atomics, people 
 will fall-back to the __sync functions simply because their expected 
 behaviour is 
 easier to work out.

I hadn't thought about that possible danger, but that would be right.
The way I would prefer to counter that is that we add a big fat warning
to the __sync built-ins that we don't have a precise specification for
them and that there are several corners of hand-waving and potentially
further issues, and that this is another reason to prefer the __atomic
built-ins.  PR 65697 etc. are enough indication for me that we indeed
lack a proper specification.

 I don't think that linking to external sites will help either, unless people 
 already 
 want to know C++11. Anybody who just wants to (e.g.) add a memory barrier 
 will take 
 one look at the __sync manual page and use the closest match from there 
 instead.

Well, "just wants to add a memory barrier" is the start of the
problem.  The same way one needs to understand a hardware memory model
to pick the right HW instruction(s), one needs to understand a
programming language memory model to pick a fence and understand its
semantics.
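
As a minimal illustration of that point (my sketch, not from the patch;
'data' and 'ready' are made-up names), a release fence paired with an
acquire fence around relaxed accesses is one such pattern:

int data;
int ready;

void producer(void)
{
  data = 42;
  __atomic_thread_fence(__ATOMIC_RELEASE);
  __atomic_store_n(&ready, 1, __ATOMIC_RELAXED);
}

int consumer(void)
{
  while (!__atomic_load_n(&ready, __ATOMIC_RELAXED))
    ;  /* spin until the flag is observed */
  __atomic_thread_fence(__ATOMIC_ACQUIRE);
  return data;  /* guaranteed to read 42 once 'ready' was seen as 1 */
}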

 Note that none of this requires a tutorial of any kind. I'm just suggesting 
 that the 
 manual should describe what behaviour should be expected of the code 
 generated for 
 the functions. For the memory orders, that would mean describing what 
 constraints 
 need to be met by the generated code.

I'd bet that if one describes these constraints correctly, you'll get a
large document -- even if one removes any introductory or explanatory
parts that could make it a tutorial.

It's fairly straight-forward to describe several simple usage patterns
of the atomics (e.g., seq-cst ones, simple acquire/release

Re: [PATCH] add self-tuning to x86 hardware fast path in libitm

2015-05-19 Thread Torvald Riegel
On Mon, 2015-05-18 at 23:27 -0400, Nuno Diegues wrote:
 On Mon, May 18, 2015 at 5:29 PM, Torvald Riegel trie...@redhat.com wrote:
 
  Are there better options for the utility function, or can we tune it to
  be less affected by varying txn length and likelihood of txnal vs.
  nontxnal code?  What are the things we want to avoid with the tuning?  I
  can think of:
  * Not needing to wait for serial txns, or being aborted by a serial txn.
  * Not retrying using HTM too much so that the retry time is larger than
  the scalability we actually gain by having other txns commit
  concurrently.
 
 
 Yes, those are the key points we want to make sure that do not happen.
 
 
 
  Anything else?  Did you consider other utility functions during your
  research?
 
 
 The txnal vs nontxnal is indeed a completely different story. To account for
 this we would need extra book-keeping to count only cycles spent inside
 txnal code. So this would require every thread (or a sample of threads) to
 perform a rdtsc (or equivalent) on every begin/end call rather than the
 current approach of a single rdtsc per optimization round.
 
 With this type of online optimization we found that the algorithm had to be
 very simple and cheap to execute. RDTSC was a good finding to fit this, and
 it enabled us to obtain gains. Other time sources failed to do so.
 
 I do not have, out of the box, a good alternative to offer. I suppose it would
 take some iterations of thinking/experimenting with, just like with any 
 research
 problem :)

So let's iterate on this in parallel with the other changes that we need
to get in place.  I'd prefer to have some more confidence that measuring
txn throughput in epochs is the best way forward.

Here are a few thoughts:

Why do we actually need to measure succeeded transactions?  If a HW txn
is just getting committed without interfering with anything else, is
this really different from, say, two HW txns getting committed?  Aren't
we really just interested in wasted work, or delayed txns?  That may
help taking care of the nontxnal vs. txnal problem.

Measuring committed txns during a time that might otherwise be spent by
a serial txns could be useful to figure out how much other useful work a
serial txn prevents.  But we'd see that as well if we'd just go serial
during the auto-tuning because then concurrent txns will have to wait;
and in this case we could measure it in the slow path of those
concurrent txns (ie, during waiting, where measurement overhead wouldn't
matter as much).

If a serial txn is running, concurrent txns (that wait) are free to sync
and tell the serial how much cost it creates for the concurrent txns.
There, txn length could matter, but we won't find out for real until
after the concurrent ones have run (they could be pretty short, so we
can't simply assume that the concurrent ones are as long as the serial
one, so that simply the number of concurrent ones can be used to
calculate delayed work).
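
As a rough sketch of what measuring in the slow path could look like (my
illustration only, assuming x86 and GCC's __rdtsc() from <x86intrin.h>;
wait_for_serial_txn and wasted_cycles are made-up names, not part of the
patch):

#include <x86intrin.h>

static unsigned long long wasted_cycles;   /* hypothetical per-thread counter */

static void wait_for_serial_txn(void)      /* stand-in for the real wait loop */
{
  /* spin or block until the serial transaction has finished */
}

static void slow_path(void)
{
  unsigned long long start = __rdtsc();
  wait_for_serial_txn();
  /* Only the time spent waiting is accounted for, so the cost of the two
     rdtsc calls is hidden by the wait itself.  */
  wasted_cycles += __rdtsc() - start;
}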

 
  Also, note that the mitigation strategy for rdtsc
  short-comings that you mention in the paper is not applicable in
  general, specifically not in libitm.
 
 
 I suppose you mean the preemption of threads inflating the cycles measured?

Yes, preemption and migration of threads (in case there's no global
sync'ed TSC or similar) -- you mentioned in the paper that you pin
threads to cores...

 This would be similarly a problem to any time source that tries to measure the
 amount of work performed; not sure how we can avoid it in general. Any 
 thoughts?

Not really as long as we keep depending on measuring time in a
light-weight way.  Measuring smaller time intervals could make it less
likely that preemption happens during such an interval, though.

 
  Another issue is that we need to implement the tuning in a portable way.
  You currently make it depend on whether the assembler knows about RTM,
  whereas the rest of the code makes this more portable and defers to
  arch-specific files.  I'd prefer if we could use the tuning on other
  archs too.  But for that, we need to cleanly separate generic from
  arch-specific parts.  That affects magic numbers as well as things like
  rdtsc().
 
 
 Yes, I refrained from adding new calls to the arch-specific files, to
 contain the
 changes mainly. But that is possible and that's part of the feedback I
 was hoping
 to get.

OK.  Let me know if you want further input regarding this.

  I'm wondering about whether it really makes sense to treat XABORT like
  conflicts and other abort reasons, instead of like capacity aborts.
  Perhaps we need to differentiate between the explicit abort codes glibc
  uses, so that we can distinguish between cases where an abort is
  supposed to signal incompatibility with txnal execution and cases where
  it's just used for lock elision (see sysdeps/unix/sysv/linux/x86/hle.h
  in current glibc):
  #define _ABORT_LOCK_BUSY        0xff
  #define _ABORT_LOCK_IS_LOCKED   0xfe
  #define _ABORT_NESTED_TRYLOCK   0xfd
 
 
 I am

Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.

2015-05-19 Thread Torvald Riegel
On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote:
 Hello,
 
 On 15/05/15 17:22, Torvald Riegel wrote:
  This patch improves the documentation of the built-ins for atomic
  operations.
 
 The memory model to memory order change does improve things but I think 
 that
 the patch has some problems. As it is now, it makes some of the descriptions
 quite difficult to understand and seems to assume more familiarity with 
 details
 of the C++11 specification than might be expected.

I'd say that's a side effect of the C++11 memory model being the
reference specification of the built-ins.

 Generally, the memory order descriptions seem to be targeted towards language
 designers but don't provide for anybody trying to understand how to implement 
 or
 to use the built-ins.

I agree that the current descriptions aren't a tutorial on the C++11
memory model.  However, given that the model is not GCC-specific, we
aren't really in a need to provide a tutorial, in the same way that we
don't provide a C++ tutorial.  Users can pick the C++11 memory model
educational material of their choice, and we need to document what's
missing to apply the C++11 knowledge to the built-ins we provide.

There are several resources for implementers, for example the mappings
maintained by the Cambridge research group.  I guess it would be
sufficient to have such material on the wiki.  Is there something
specific that you'd like to see documented for implementers?

 Adding a less formal, programmers view to some of the
 descriptions would help.

Yes, probably.  However, I'm not aware of a good C++11 memory model
tutorial that we could link to (OTOH, I haven't really looked for one).
I don't plan to write one, and doing so would certainly take some time
and require *much* more text than what we have now.

 That implies the descriptions would be more than just
 illustrative, but I'd suggest that would be appropriate for the GCC manual.

The danger I see there is that if we claim to define semantics instead
of just illustrating them, then we need to do a really good job of
defining/explaining them.  I worry that we may end up replicating what's
in the standard or Batty et al.'s formalization of the memory model, and
having to give a tutorial too.  I'm not sure anyone is volunteering to
do that amount of work.

 I'm also not sure about the use of C++11 terms in some of the
 descriptions. In particular, using happens-before seems wrong because
 happens-before isn't described anywhere in the GCC manual and because it has a
 specific meaning in the C++11 specification that doesn't apply to the GCC
 built-ins (which C++11 doesn't know about).

I agree it's not described in the manual, but we're implementing C++11.
However, I don't see why happens-before semantics wouldn't apply to
GCC's implementation of the built-ins; there may be cases where we
guarantee more, but if one uses the builtins in way allowed by the C++11
model, one certainly gets behavior and happens-before relationships as
specified by C++11.
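
For illustration only (my sketch, not part of the patch; the variable names
are made up), the following use of the built-ins gives exactly the
happens-before relationship that C++11 specifies for a release store read by
an acquire load:

int payload;
int flag;

void writer(void)
{
  payload = 1;
  __atomic_store_n(&flag, 1, __ATOMIC_RELEASE);
}

int reader(void)
{
  if (__atomic_load_n(&flag, __ATOMIC_ACQUIRE))
    return payload;  /* the store to 'payload' happens before this load */
  return 0;
}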


 diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
 index 6004681..5b2ded8 100644
 --- a/gcc/doc/extend.texi
 +++ b/gcc/doc/extend.texi
 @@ -8853,19 +8853,19 @@ are not prevented from being speculated to before the 
 barrier.
 
   [...]  If the data type size maps to one
 -of the integral sizes that may have lock free support, the generic
 -version uses the lock free built-in function.  Otherwise an
 +of the integral sizes that may support lock-freedom, the generic
 +version uses the lock-free built-in function.  Otherwise an
   external call is left to be resolved at run time.
 
 =
 This is a slightly awkward sentence. Maybe it could be replaced with something
 on the lines of The generic function uses the lock-free built-in function 
 when
 the data-type size makes that possible, otherwise an external call is left to 
 be
 resolved at run-time.
 =

Changed to:
It uses the lock-free built-in function if the specific data type size
makes that possible; otherwise, an external call is left to be resolved
at run time.

 
 -The memory models integrate both barriers to code motion as well as
 -synchronization requirements with other threads.  They are listed here
 -in approximately ascending order of strength.
 +An atomic operation can both constrain code motion by the compiler and
 +be mapped to a hardware instruction for synchronization between threads
 +(e.g., a fence).  [...]
 
 =
 This is a little unclear (and inaccurate, aarch64 can use two instructions
 for fences). I also thought that atomic operations constrain code motion by 
 the
 hardware. Maybe break the link with the compiler and hardware: An atomic
 operation can both constrain code motion and act as a synchronization point
 between threads.
 =

I removed "by the compiler" and used "hardware instruction_s_".

 
   @table  @code
   @item __ATOMIC_RELAXED
 -No barriers or synchronization.
 +Implies no inter-thread ordering constraints.
 
 
 It may

Re: [PATCH] add self-tuning to x86 hardware fast path in libitm

2015-05-18 Thread Torvald Riegel
On Wed, 2015-04-29 at 23:23 -0400, Nuno Diegues wrote: 
 Hello,
 
 I have taken the chance to improve the patch by addressing the
 comments above in this thread.
 Namely:
  - to use a simple random generator managed inside the library only
  - removed floating point usage and replaced by fixed arithmetic
  - added some comments where relevant

First of all, sorry for taking so long to review this.  Thank you for
the contribution.

There are several high-level changes that I'd like to see (or be
convinced of that those aren't necessary).  I'll discuss these first.
I'll have more detailed comments inlined in the patch.


My major concern is about rdtsc being used.  The relation to frequency
adaption that Andi mentioned is one issue, but the bigger issue to me is
that the runtime of transactions might vary a lot, and that the relation
of txnal vs. nontxnal code executed in a period might vary a lot too.

Are there better options for the utility function, or can we tune it to
be less affected by varying txn length and likelihood of txnal vs.
nontxnal code?  What are the things we want to avoid with the tuning?  I
can think of:
* Not needing to wait for serial txns, or being aborted by a serial txn.
* Not retrying using HTM too much so that the retry time is larger than
the scalability we actually gain by having other txns commit
concurrently.

Anything else?  Did you consider other utility functions during your
research?  Also, note that the mitigation strategy for rdtsc
short-comings that you mention in the paper is not applicable in
general, specifically not in libitm.


Another issue is that we need to implement the tuning in a portable way.
You currently make it depend on whether the assembler knows about RTM,
whereas the rest of the code makes this more portable and defers to
arch-specific files.  I'd prefer if we could use the tuning on other
archs too.  But for that, we need to cleanly separate generic from
arch-specific parts.  That affects magic numbers as well as things like
rdtsc().


Generally, the patch needs to include more comments.  Most importantly,
we need to document why we implemented things the way we did, or do
tuning the way we do.  In cases where the intent isn't trivial to see
nor the code is trivial, explain the intent or reference the place where
you document the big picture.
A good rule of thumb is adding comments whenever it's not simple to see
why we use one of several possible implementation alternatives.

The magic numbers being used are a good example.  Make them constants
and don't just put them directly in the code, and then add documentation
why you chose this number and why it's okay to make that choice.  If it
isn't necessarily okay (eg, other archs, future systems), but you don't
have a better solution right now, add something like ??? Not ideal
because of XYZ..  If there's a source for some of the magic numbers,
cite it (e.g., [28] in your paper might be one for the tuning
thresholds, I guess).


Reoptimizing only in a specific, fixed thread is insufficient in the
general case.  There may be a thread that only runs an initial txn and
then never another one; if this happens to be the last thread
registered, you'll never get any tuning.  If we want the tuning to be
effective, one of the threads *actively* running txns needs to tune
eventually, always.

Depending on that, you'll probably have to change the sync between the
tuning thread and others.  Serial mode may be a simple way to do that.
It may be possible to only check for tuning being necessary when in
serial mode.  It could be possible that we end up trying HTM too often
yet never go to serial mode; however, that seems unlikely to me (but I
haven't thought thoroughly about it).

Also, please remember that only data-race-free code is strictly correct
C++ code (the only exception being the validated-loads special case in
the STM implementations, for which C++ doesn't have support yet (but for
which proposals exist)).  Thus, use atomic operations accordingly.  That
might create another issue though in that you can't assume 64b atomic
ops to be supported on all archs.  Maybe using serial mode for tuning
doesn't trigger that issue though.


I'm wondering about whether it really makes sense to treat XABORT like
conflicts and other abort reasons, instead of like capacity aborts.
Perhaps we need to differentiate between the explicit abort codes glibc
uses, so that we can distinguish between cases where an abort is
supposed to signal incompatibility with txnal execution and cases where
it's just used for lock elision (see sysdeps/unix/sysv/linux/x86/hle.h
in current glibc):
#define _ABORT_LOCK_BUSY        0xff
#define _ABORT_LOCK_IS_LOCKED   0xfe
#define _ABORT_NESTED_TRYLOCK   0xfd


Andi said that it would be nice to have an env var turning tuning
on/off.  I agree in principle, yet would prefer to have the tuning be
the default.  What about adding an htm_notune option in
parse_default_method?  It would be even nicer to 

Re: [PATCH] add self-tuning to x86 hardware fast path in libitm

2015-05-18 Thread Torvald Riegel
On Mon, 2015-05-18 at 23:39 +0200, Andi Kleen wrote:
  Are there better options for the utility function, or can we tune it to
 
 There is nothing better that isn't a lot slower.

Do you care to elaborate why?  As-is, I find this statement to not be
convincing; at the very least we need to document why we think that
something time-based is the best option.

Other tuning attempts have looked at rates of aborted, attempted, and
committed txns, for example.  Why do we measure nb. of transactions in a
whole period, and can't get the same info through measuring smaller but
more specific time intervals (e.g., how long we wait for a serial txn)?




Re: [PATCH] [PR libitm/61164] Remove redefinition of glibc internal macro __always_inline

2015-05-15 Thread Torvald Riegel
On Wed, 2015-05-06 at 17:54 +0300, Gleb Fotengauer-Malinovskiy wrote:
 2015-05-06  Gleb Fotengauer-Malinovskiy  gle...@altlinux.org
 
   PR libitm/61164
   * local_atomic (__always_inline): Rename to...
   (__libitm_always_inline): ... this.

OK.  Thanks.



[PATCH] Fix memory orders description in atomic ops built-ins docs.

2015-05-15 Thread Torvald Riegel
This patch improves the documentation of the built-ins for atomic
operations.

First, to be consistent with the C++11 standard and make it easier for
users to relate the built-ins to the C++11 specification, the patch
changes 'memory model' to 'memory order'.  The memory model is the whole
thing, whereas memory orders are specific classes ordering requirements
that an atomic operations are parametrized by.

Second, the patch clarifies that the descriptions of the memory orders
are for illustrative purposes only, and adds illustrative descriptions
that are less incorrect and incomplete than what we had before.  There
is no way we can precisely summarize the C++ memory model in the docs,
and there isn't really a need either.
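
To give a tiny, purely illustrative example (not part of the patch): the
requested memory order is simply the last argument of each built-in, e.g.

int x;

int get_relaxed(void)
{
  return __atomic_load_n(&x, __ATOMIC_RELAXED);   /* no inter-thread ordering */
}

int bump(void)
{
  /* same variable, stronger order requested for a read-modify-write */
  return __atomic_fetch_add(&x, 1, __ATOMIC_ACQ_REL);
}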

OK for trunk?


2015-05-15  Torvald Riegel  trie...@redhat.com

* doc/extend.texi (__atomic Builtins): Use 'memory order' instead of
'memory model' to align with C++11; fix description of memory orders;
fix a few typos.

commit d29b0bbeecf2267085f4ecdbaa61a0f969665468
Author: Torvald Riegel trie...@redhat.com
Date:   Fri May 15 18:14:40 2015 +0200

Fix memory order description in atomic ops built-ins docs.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 6004681..5b2ded8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8853,19 +8853,19 @@ are not prevented from being speculated to before the barrier.
 @section Built-in Functions for Memory Model Aware Atomic Operations
 
 The following built-in functions approximately match the requirements
-for C++11 concurrency and memory models.  They are all
+for the C++11 memory model.  They are all
 identified by being prefixed with @samp{__atomic} and most are
 overloaded so that they work with multiple types.
 
 These functions are intended to replace the legacy @samp{__sync}
-builtins.  The main difference is that the memory model to be used is a
-parameter to the functions.  New code should always use the
+builtins.  The main difference is that the memory order that is requested
+is a parameter to the functions.  New code should always use the
 @samp{__atomic} builtins rather than the @samp{__sync} builtins.
 
 Note that the @samp{__atomic} builtins assume that programs will
-conform to the C++11 model for concurrency.  In particular, they assume
+conform to the C++11 memory model.  In particular, they assume
 that programs are free of data races.  See the C++11 standard for
-detailed definitions.
+detailed requirements.
 
 The @samp{__atomic} builtins can be used with any integral scalar or
 pointer type that is 1, 2, 4, or 8 bytes in length.  16-byte integral
@@ -8875,136 +8875,140 @@ supported by the architecture.
 The four non-arithmetic functions (load, store, exchange, and 
 compare_exchange) all have a generic version as well.  This generic
 version works on any data type.  If the data type size maps to one
-of the integral sizes that may have lock free support, the generic
-version uses the lock free built-in function.  Otherwise an
+of the integral sizes that may support lock-freedom, the generic
+version uses the lock-free built-in function.  Otherwise an
 external call is left to be resolved at run time.  This external call is
 the same format with the addition of a @samp{size_t} parameter inserted
 as the first parameter indicating the size of the object being pointed to.
 All objects must be the same size.
 
-There are 6 different memory models that can be specified.  These map
-to the C++11 memory models with the same names, see the C++11 standard
+There are 6 different memory orders that can be specified.  These map
+to the C++11 memory orders with the same names, see the C++11 standard
 or the @uref{http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync,GCC wiki
 on atomic synchronization} for detailed definitions.  Individual
-targets may also support additional memory models for use on specific
+targets may also support additional memory orders for use on specific
 architectures.  Refer to the target documentation for details of
 these.
 
-The memory models integrate both barriers to code motion as well as
-synchronization requirements with other threads.  They are listed here
-in approximately ascending order of strength.
+An atomic operation can both constrain code motion by the compiler and
+be mapped to a hardware instruction for synchronization between threads
+(e.g., a fence).  To which extent this happens is controlled by the
+memory orders, which are listed here in approximately ascending order of
+strength.  The description of each memory order is only meant to roughly
+illustrate the effects and is not a specification; see the C++11
+memory model for precise semantics.
 
 @table  @code
 @item __ATOMIC_RELAXED
-No barriers or synchronization.
+Implies no inter-thread ordering constraints.
 @item __ATOMIC_CONSUME
-Data dependency only for both barrier and synchronization with another
-thread.
+This is currently implemented using the stronger @code{__ATOMIC_ACQUIRE

Re: [patch] Fix shared_timed_mutex::try_lock_until() et al

2015-04-08 Thread Torvald Riegel
There is a correctness issue related to mutex destruction.  The added
documentation is a good start, but I'd still add some more for the
complicated pieces of reasoning.  Details inline below.

On Tue, 2015-04-07 at 15:28 +0100, Jonathan Wakely wrote:
 diff --git a/libstdc++-v3/include/std/shared_mutex b/libstdc
 ++-v3/include/std/shared_mutex
 index ab1b45b..7391f11 100644
 --- a/libstdc++-v3/include/std/shared_mutex
 +++ b/libstdc++-v3/include/std/shared_mutex
 @@ -268,7 +268,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  
  #else // ! _GLIBCXX_USE_PTHREAD_RWLOCK_T
  
 +// Must use the same clock as condition_variable
 +typedef chrono::system_clock   __clock_t;
 +
  #if _GTHREAD_USE_MUTEX_TIMEDLOCK
 +// Can't use std::timed_mutex with std::condition_variable, so
 define
 +// a new timed mutex type that derives from std::mutex.
   struct _Mutex : mutex, __timed_mutex_impl<_Mutex>
  {
 template<typename _Rep, typename _Period>
 @@ -285,16 +290,44 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  typedef mutex _Mutex;
  #endif
  
 -// Based on Howard Hinnant's reference implementation from N2406
 -
 +// Based on Howard Hinnant's reference implementation from N2406.
 +
 +// The high bit of _M_state is the write-entered flag which is
 set to
 +// indicate a writer has taken the lock or is queuing to take the
 lock.
 +// The remaining bits are the count of reader locks.
 +//
 +// To take a reader lock block on gate1 while the write-entered
 flag is

I think this is easier to parse if you add a comma: "To take a reader
lock, block ...".  Same for other sentences below.

 +// set or the maximum number of reader locks is held, then
 increment the
 +// reader lock count.
 +// To release decrement the count, then if the write-entered flag
 is set
 +// and the count is zero then signal gate2 to wake a queued
 writer,
 +// otherwise if the maximum number of reader locks was held
 signal gate1
 +// to wake a reader.
 +//
 +// To take a writer lock block on gate1 while the write-entered
 flag is
 +// set, then set the write-entered flag to start queueing, then
 block on
 +// gate2 while the number of reader locks is non-zero.
 +// To release unset the write-entered flag and signal gate1 to
 wake all
 +// blocked readers and writers.

Perhaps it would also be useful to have a sentence on how writers and
readers are prioritized (e.g., writers preferred if readers already hold
the lock, all are equal when writer unlocks, ...).

 +
 +// Only locked when accessing _M_state or waiting on condition
 variables.
  _Mutex _M_mut;
 +// Used to block while write-entered is set or reader count at
 maximum.
  condition_variable _M_gate1;
 +// Used to block queued writers while reader count is non-zero.
  condition_variable _M_gate2;
 +// The write-entered flag and reader count.
  unsigned   _M_state;
  
  static constexpr unsigned _S_write_entered
= 1U << (sizeof(unsigned)*__CHAR_BIT__ - 1);
 -static constexpr unsigned _M_n_readers = ~_S_write_entered;
 +static constexpr unsigned _S_n_readers = ~_S_write_entered;

Rename this to _S_max_readers or such?

 +
 +// Test whether the write-entered flag is set. _M_mut must be
 locked.
  +bool _M_write_entered() const { return _M_state &
 _S_write_entered; }
 +
 +// The number of reader locks currently held. _M_mut must be
 locked.
  +unsigned _M_readers() const { return _M_state & _S_n_readers; }
  
public:
  shared_timed_mutex() : _M_state(0) {}
 @@ -313,11 +346,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  lock()
  {
   unique_lock<mutex> __lk(_M_mut);
 -  while (_M_state  _S_write_entered)
 -   _M_gate1.wait(__lk);

Add something like this:?

// We first wait until we acquire the writer-part of the lock, and
// then wait until no readers hold the lock anymore.

 +  _M_gate1.wait(__lk, [=]{ return !_M_write_entered(); });
_M_state |= _S_write_entered;
  -  while (_M_state & _M_n_readers)
 -   _M_gate2.wait(__lk);
 +  _M_gate2.wait(__lk, [=]{ return _M_readers() == 0; });
  }
  
  bool
 @@ -337,26 +368,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
bool
   try_lock_for(const chrono::duration<_Rep, _Period>& __rel_time)
{
  -   unique_lock<_Mutex> __lk(_M_mut, __rel_time);
 -   if (__lk.owns_lock()  _M_state == 0)
 - {
 -   _M_state = _S_write_entered;
 -   return true;
 - }
 -   return false;
 +   return try_lock_until(__clock_t::now() + __rel_time);
}
  
   template<typename _Clock, typename _Duration>
bool
   try_lock_until(const chrono::time_point<_Clock, _Duration>&
 __abs_time)
{
  -   unique_lock<_Mutex> __lk(_M_mut, __abs_time);
 -   if (__lk.owns_lock()  _M_state == 0)
 +   if (!_M_mut.try_lock_until(__abs_time))
 + return false;

I think a non-timed acquisition 

Re: [PATCH][AArch64] Testcase fix for __ATOMIC_CONSUME

2015-02-11 Thread Torvald Riegel
On Mon, 2015-02-09 at 09:10 -0800, Mike Stump wrote:
 On Feb 9, 2015, at 7:11 AM, Alex Velenko alex.vele...@arm.com wrote:
  The following patch makes atomic-op-consume.c XFAIL
  
  Is this patch ok?
 
 Ok.
 
 I’d shorten the comment above the xfail to be exceedingly short:
 
   /* PR59448 consume not implemented yet */
 
 The reason is the brain can process this about 8x faster.  Also, one can cut 
 and paste the PR part into a web browser directly, or, if you have an 
 electric bugzilla mode for emacs, it will pop right up. */

Given the discussions we had in ISO C++ SG1, it seems the only way to
fix memory_order_consume is to deprecate it (or let it rot forever), and
add something else to the standard.  See
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4321.pdf

IOW, I believe the promotion is here to stay.  I'm not aware of any
other implementation doing something else.

Thus, XFAIL doesn't seem right to me.



[patch libstdc++] Add POSIX variant of shared_timed_mutex.

2015-01-16 Thread Torvald Riegel
[Resend.  Sorry for the noise on gcc@.]

This adds a POSIX-based implementation of shared_timed_mutex, using
pthread_rwlock_* operations directly instead of implementing with
mutexes and two condvars.  This enables using an optimized
pthread_rwlock_t on POSIX.

Tested on x86_64-linux.

OK?

2015-01-16  Torvald Riegel  trie...@redhat.com

* include/std/shared_mutex (shared_timed_mutex): Add POSIX-based
implementation.


commit e0a32ddb058d8b4dd563f89130d03bce220ace8c
Author: Torvald Riegel trie...@redhat.com
Date:   Thu Jan 15 22:29:23 2015 +0100

libstdc++: Add POSIX variant of shared_timed_mutex.

	* include/std/shared_mutex (shared_timed_mutex): Add POSIX-based
	implementation.

diff --git a/libstdc++-v3/include/std/shared_mutex b/libstdc++-v3/include/std/shared_mutex
index 8bfede3..643768c 100644
--- a/libstdc++-v3/include/std/shared_mutex
+++ b/libstdc++-v3/include/std/shared_mutex
@@ -57,6 +57,188 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// shared_timed_mutex
   class shared_timed_mutex
   {
+#if defined(__GTHREADS_CXX0X)
+typedef chrono::system_clock	__clock_t;
+
+pthread_rwlock_t			_M_rwlock;
+
+  public:
+shared_timed_mutex()
+{
+  int __ret = pthread_rwlock_init(&_M_rwlock, NULL);
+  if (__ret == ENOMEM)
+	throw bad_alloc();
+  else if (__ret == EAGAIN)
+	__throw_system_error(int(errc::resource_unavailable_try_again));
+  else if (__ret == EPERM)
+	__throw_system_error(int(errc::operation_not_permitted));
+  // Errors not handled: EBUSY, EINVAL
+  _GLIBCXX_DEBUG_ASSERT(__ret == 0);
+}
+
+~shared_timed_mutex()
+{
+  int __ret __attribute((unused)) = pthread_rwlock_destroy(&_M_rwlock);
+  // Errors not handled: EBUSY, EINVAL
+  _GLIBCXX_DEBUG_ASSERT(__ret == 0);
+}
+
+shared_timed_mutex(const shared_timed_mutex&) = delete;
+shared_timed_mutex& operator=(const shared_timed_mutex&) = delete;
+
+// Exclusive ownership
+
+void
+lock()
+{
+  int __ret = pthread_rwlock_wrlock(&_M_rwlock);
+  if (__ret == EDEADLK)
+	__throw_system_error(int(errc::resource_deadlock_would_occur));
+  // Errors not handled: EINVAL
+  _GLIBCXX_DEBUG_ASSERT(__ret == 0);
+}
+
+bool
+try_lock()
+{
+  int __ret = pthread_rwlock_trywrlock(&_M_rwlock);
+  if (__ret == EBUSY) return false;
+  // Errors not handled: EINVAL
+  _GLIBCXX_DEBUG_ASSERT(__ret == 0);
+  return true;
+}
+
+template<typename _Rep, typename _Period>
+  bool
+  try_lock_for(const chrono::duration<_Rep, _Period>& __rel_time)
+  {
+	return try_lock_until(__clock_t::now() + __rel_time);
+  }
+
+template<typename _Duration>
+  bool
+  try_lock_until(const chrono::time_point<__clock_t, _Duration>& __atime)
+  {
+	auto __s = chrono::time_point_cast<chrono::seconds>(__atime);
+	auto __ns = chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
+
+	__gthread_time_t __ts =
+	  {
+	    static_cast<std::time_t>(__s.time_since_epoch().count()),
+	    static_cast<long>(__ns.count())
+	  };
+
+	int __ret = pthread_rwlock_timedwrlock(&_M_rwlock, &__ts);
+	// On self-deadlock, we just fail to acquire the lock.  Technically,
+	// the program violated the precondition.
+	if (__ret == ETIMEDOUT || __ret == EDEADLK)
+	  return false;
+	// Errors not handled: EINVAL
+	_GLIBCXX_DEBUG_ASSERT(__ret == 0);
+	return true;
+  }
+
+template<typename _Clock, typename _Duration>
+  bool
+  try_lock_until(const chrono::time_point<_Clock, _Duration>& __abs_time)
+  {
+	// DR 887 - Sync unknown clock to known clock.
+	const typename _Clock::time_point __c_entry = _Clock::now();
+	const __clock_t::time_point __s_entry = __clock_t::now();
+	const auto __delta = __abs_time - __c_entry;
+	const auto __s_atime = __s_entry + __delta;
+	return try_lock_until(__s_atime);
+  }
+
+void
+unlock()
+{
+  int __ret __attribute((unused)) = pthread_rwlock_unlock(&_M_rwlock);
+  // Errors not handled: EPERM, EBUSY, EINVAL
+  _GLIBCXX_DEBUG_ASSERT(__ret == 0);
+}
+
+// Shared ownership
+
+void
+lock_shared()
+{
+  int __ret = pthread_rwlock_rdlock(&_M_rwlock);
+  if (__ret == EDEADLK)
+	__throw_system_error(int(errc::resource_deadlock_would_occur));
+  if (__ret == EAGAIN)
+	// Maximum number of read locks has been exceeded.
+	__throw_system_error(int(errc::device_or_resource_busy));
+  // Errors not handled: EINVAL
+  _GLIBCXX_DEBUG_ASSERT(__ret == 0);
+}
+
+bool
+try_lock_shared()
+{
+  int __ret = pthread_rwlock_tryrdlock(&_M_rwlock);
+  // If the maximum number of read locks has been exceeded, we just fail
+  // to acquire the lock.  Unlike for lock(), we are not allowed to throw
+  // an exception.
+  if (__ret == EBUSY || __ret == EAGAIN) return false;
+  // Errors not handled: EINVAL
+  _GLIBCXX_DEBUG_ASSERT(__ret == 0);
+  return true;
+}
+
+template<typename _Rep, typename _Period>

[patch libstdc++] Optimize synchronization in std::future if futexes are available.

2015-01-16 Thread Torvald Riegel
This patch optimizes synchronization in std::future by using atomic
operations and futexes if the latter are available, instead of the
mutex/condvar combination in the existing code.  That reduces the space
overhead for futures as well as synchronization runtime overheads.

To do that, the patch introduces an abstraction layer for futexes that
essentially extends an atomic-typed variable with operations that wait
for the variable to (not) have a certain value.  This waiting can then
be implemented internally with a combination of spin-waiting (not
implemented yet) and blocking using OS features such as futexes.  This
approach is similar to what the synchronic<T> proposal to ISO C++
(N4195) contains.
The atomic-typed variable is an unsigned int because this is what Linux
futexes currently support as futex variable.
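
As a rough sketch of the idea (my illustration, not the patch itself; the
type name and the exact signatures are made up, and it assumes Linux
futexes), blocking until the atomic unsigned no longer holds a given value
could look like this:

#include <climits>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

struct my_futex_uint      // hypothetical stand-in for __atomic_futex_unsigned
{
  unsigned _M_data;

  // Block until the value is no longer __old; acquire semantics on the load.
  unsigned _M_load_when_not_equal(unsigned __old)
  {
    unsigned __val;
    while ((__val = __atomic_load_n(&_M_data, __ATOMIC_ACQUIRE)) == __old)
      // The kernel rechecks that _M_data still equals __old before sleeping,
      // which avoids losing a wake-up from the store below.
      syscall(SYS_futex, &_M_data, FUTEX_WAIT, __old, nullptr, nullptr, 0);
    return __val;
  }

  // Store a new value with release semantics and wake all waiters.
  void _M_store_and_wake(unsigned __new)
  {
    __atomic_store_n(&_M_data, __new, __ATOMIC_RELEASE);
    syscall(SYS_futex, &_M_data, FUTEX_WAKE, INT_MAX, nullptr, nullptr, 0);
  }
};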

If futexes are not available, the implementation falls back to using a
mutex and condvar.  This leads to very similar code for std::future
compared to the existing code.  The exception is that
_State_baseV2::wait_for and _State_baseV2::wait_until may acquire the
mutex twice if the future is not ready; this may lead to some additional
contention if those functions actually have to wait for the future to
become ready (ie, this is on the slow path).

It would be possible to optimize the space overhead in std::future
further by merging _State_baseV2::_M_retrieved and
_State_baseV2::_M_once into _State_baseV2::_M_status.  This would add
some complexity to the synchronization code, but not a lot.
I'm happy to do this next week if people are interested in this.

Tested on x86_64-linux and the 30_threads/* tests.

OK for trunk?
commit 4b07d1c0ab807fd0fbedacde1e9fec99a7d75b6d
Author: Torvald Riegel trie...@redhat.com
Date:   Sun Nov 16 12:07:22 2014 +0100

libstdc++: Optimize synchronization in std::future if futexes are available.

	* src/c++11/futex.cc: New file.
	* include/bits/atomic_futex.h: New file.
	* include/std/future (__future_base::_State_baseV2): Use
	atomic_futex_unsigned instead of mutex+condvar.
	* src/c++11/futex.cc: Likewise.
	* include/Makefile.am: Add atomic_futex.h.
	* include/Makefile.in: Likewise.
	* src/c++11/Makefile.am: Add futex.cc.
	* src/c++11/Makefile.in: Likewise.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 700da18..d8d155f 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1759,6 +1759,11 @@ GLIBCXX_3.4.21 {
 _ZNKSt8time_getI[cw]St19istreambuf_iteratorI[cw]St11char_traitsI[cw]EEE3getES3_S3_RSt8ios_baseRSt12_Ios_IostateP2tmPK[cw]SC_;
 _ZNKSt8time_getI[cw]St19istreambuf_iteratorI[cw]St11char_traitsI[cw]EEE6do_getES3_S3_RSt8ios_baseRSt12_Ios_IostateP2tmcc;
 
+extern "C++"
+{
+  std::__atomic_futex_unsigned_base*;
+};
+
 } GLIBCXX_3.4.20;
 
 
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 08e5d5f..4772950 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -83,6 +83,7 @@ bits_headers = \
 	${bits_srcdir}/allocated_ptr.h \
 	${bits_srcdir}/allocator.h \
 	${bits_srcdir}/atomic_base.h \
+	${bits_srcdir}/atomic_futex.h \
 	${bits_srcdir}/basic_ios.h \
 	${bits_srcdir}/basic_ios.tcc \
 	${bits_srcdir}/basic_string.h \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 3e5d82e..ebcaa96 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -351,6 +351,7 @@ bits_headers = \
 	${bits_srcdir}/allocated_ptr.h \
 	${bits_srcdir}/allocator.h \
 	${bits_srcdir}/atomic_base.h \
+	${bits_srcdir}/atomic_futex.h \
 	${bits_srcdir}/basic_ios.h \
 	${bits_srcdir}/basic_ios.tcc \
 	${bits_srcdir}/basic_string.h \
diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
new file mode 100644
index 000..9a418d8
--- /dev/null
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -0,0 +1,288 @@
+// -*- C++ -*- header.
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see

Re: [PATCH] PR59448 - Promote consume to acquire

2015-01-14 Thread Torvald Riegel
On Wed, 2015-01-14 at 10:44 -0500, Andrew MacLeod wrote:
 I think this brings us to where we ought to be...   at least almost :-)
 The latest version I have  is n3337, which still specifies that 
 atomic_clear can't be memory_order_acquire or memory_order_acq_rel. Has 
 that been updated to specify that memory_order_consume is not allowed 
 either?  I think there was a request in at some point...   I can add 
 that if so.

N3797 disallows mo_consume on atomic_flag::clear.



Re: [PATCH] PR59448 - Promote consume to acquire

2015-01-13 Thread Torvald Riegel
On Tue, 2015-01-13 at 10:11 -0500, Andrew MacLeod wrote:
 On 01/13/2015 09:59 AM, Richard Biener wrote:
  On Tue, Jan 13, 2015 at 3:56 PM, Andrew MacLeod amacl...@redhat.com wrote:
  Lengthy discussion : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448
 
  Basically we can generate incorrect code for an atomic consume operation in
  some circumstances.  The general feeling seems to be that we should simply
  promote all consume operations to an acquire operation until there is a
  better definition/understanding of the consume model and how GCC can track
  it.
 
  I proposed a simple patch in the PR, and I have not seen or heard of any
  dissenting opinion.   We should get this in before the end of stage 3 I
  think.
 
  The problem with the patch in the PR is the  memory model is immediately
  promoted from consume to acquire.   This happens *before* any of the
  memmodel checks are made.  If a consume is illegally specified (such as in 
  a
  compare_exchange), it gets promoted to acquire and the compiler doesn't
  report the error because it never sees the consume.
 
  This new patch simply makes the adjustment after any errors are checked on
  the originally specified model.   It bootstraps on x86_64-unknown-linux-gnu
  and passes all regression testing.
  I also built an aarch64 compiler and it appears to issue the LDAR as
  specified in the PR, but anyone with a vested interest really ought to 
  check
  it out with a real build to be sure.
 
  OK for trunk?
  Why not patch get_memmodel?  (not sure if that catches all cases)
 
  Richard.
 
 
 That was the original patch.
 
 The issue is that it  promotes consume to acquire before any error 
 checking gets to look at the model, so then we allow illegal 
 specification of consume. (It actually triggers a failure in the testsuite)

(This is this test: gcc/testsuite/gcc.dg/atomic-invalid.c)

The documentation of the atomic builtins also disallows mo_consume on
atomic_exchange.

However, I don't see any such requirement in C11 or C++14 (and I'd be
surprised to see it in C++11).  It would be surprising also because for
other atomic read-modify-write operations (eg, fetch_add), we don't make
such a requirement in the builtins docs -- and atomic_exchange is just a
read-modify-write with a noop, basically.

Does anyone remember why this requirement for no consume on exchange was
added, or sees a reason to keep it?  If not, I think we should drop it.
This would solve the testsuite failure for Andrew.  Dropping it would
prevent GCC from checking the consume-on-success / acquire-on-failure
case for compare_exchange I mentioned previously, but I think that this
is pretty harmless.
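
Concretely, the only thing that check catches is a combination like the
following (my example, not from the patch); with consume promoted to
acquire it is harmless anyway:

int v;

int try_bump(int expected)
{
  /* consume on success, acquire on failure: disallowed by the standard,
     but after promotion both orders end up being acquire anyway */
  __atomic_compare_exchange_n(&v, &expected, expected + 1, /*weak=*/0,
                              __ATOMIC_CONSUME, __ATOMIC_ACQUIRE);
  return expected;
}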

I could imagine that, for some reason, either backends or libatomic do
not implement consume on atomic_exchange just because the docs
disallowed it -- but I haven't checked that.




Re: [PATCH] PR59448 - Promote consume to acquire

2015-01-13 Thread Torvald Riegel
On Tue, 2015-01-13 at 09:56 -0500, Andrew MacLeod wrote:
 The problem with the patch in the PR is the  memory model is immediately 
 promoted from consume to acquire.   This happens *before* any of the 
 memmodel checks are made.  If a consume is illegally specified (such as 
 in a compare_exchange), it gets promoted to acquire and the compiler 
 doesn't report the error because it never sees the consume.

The only issue I can think of in compare_exchange is if the program
specifies memory_order_consume for the success path but
memory_order_acquire for the failure path, which is disallowed by the
standard.

However, I don't see a reason why the standard's requirement is anything
but a performance check in our particular case.  The only case we
prevent the compiler from reporting is a consume-on-success /
acquire-on-failure combination.  But we upgrade the former to acquire,
so we can't even cause libatomic (or similar) to issue too weak barriers
due to libatomic relying on the standard's requirement.

Thus, if there's no easy way to upgrade to acquire after the sanity
checks, I think this isn't critical enough to hold up the patch from
being committed.  memory_order_consume is clearly a feature for experts.



[patch] [WIP] Optimize synchronization in std::future if futexes are available.

2014-11-16 Thread Torvald Riegel
This WORK-IN-PROGRESS patch uses an atomic unsigned and futex operations
to optimize the synchronization code in std::future.  The current code
uses a mutex/condvar combination, which is both slower (e.g., due to
mutex contention, stronger ordering requirements for condvars, using an
additional condvar-internal mutex, ...) and makes std::future fairly
large.

It introduces an __atomic_futex_unsigned type, which provides basic
atomic operations (load and store) on an atomic<unsigned> and
additionally provides load_when_[not_]equal operations that do blocking
waits on the atomic -- pretty much what futexes do.  Such an
__atomic_futex_unsigned is then used to implement the synchronization
state in std::future.
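
A rough sketch of the load_when_not_equal idea on top of the raw futex
syscall (my own illustration, Linux-only; the patch wraps this behind
__atomic_futex_unsigned and adds the timeout handling and the non-futex
fallback):

  #include <atomic>
  #include <linux/futex.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  unsigned
  load_when_not_equal (std::atomic<unsigned> &a, unsigned val)
  {
    unsigned i = a.load (std::memory_order_acquire);
    while (i == val)
      {
        // FUTEX_WAIT returns immediately if *addr != val, so there is no
        // window for a lost wake-up between the load and the wait.  The
        // cast relies on std::atomic<unsigned> having the layout of a
        // plain unsigned, as libstdc++'s implementation does.
        syscall (SYS_futex, reinterpret_cast<unsigned *> (&a),
                 FUTEX_WAIT, val, nullptr, nullptr, 0);
        i = a.load (std::memory_order_acquire);
      }
    return i;
  }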

There are a few bits missing in this patch:
* A fallback implementation for platforms that don't provide futexes, in
the form of a different implementation of __atomic_futex_unsigned.  A
mutex+condvar combination is what I'm aiming at; for std::future, this
would lead to similar code and sizeof(std::future) as currently.
* More documentation of how the synchronization works.  Jonathan has a
patch in flight for that, so this should get merged.
* Integration with the on_thread_exit patch that Jonathan posted, which
uses the current, lock-based synchronization scheme and thus needs to
get adapted.
* Testing.

There are ways to optimize further I suppose, for example by letting the
__atomic_futex_unsigned take care of all current uses of call_once too.
Let me know if I should do that.  This would reduce the number of atomic
ops a little in some cases, as well as reduce space required for futures
a little.

Comments?
commit 1543313a3b590e6422f5d547cabd6662e0a6f538
Author: Torvald Riegel trie...@redhat.com
Date:   Sun Nov 16 12:07:22 2014 +0100

[WIP] Optimize synchronization in std::future if futexes are available.

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index e6edc73..81e6a8b 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -83,6 +83,7 @@ bits_headers = \
 	${bits_srcdir}/allocated_ptr.h \
 	${bits_srcdir}/allocator.h \
 	${bits_srcdir}/atomic_base.h \
+	${bits_srcdir}/atomic_futex.h \
 	${bits_srcdir}/basic_ios.h \
 	${bits_srcdir}/basic_ios.tcc \
 	${bits_srcdir}/basic_string.h \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 2ade448..2d72de3 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -350,6 +350,7 @@ bits_headers = \
 	${bits_srcdir}/allocated_ptr.h \
 	${bits_srcdir}/allocator.h \
 	${bits_srcdir}/atomic_base.h \
+	${bits_srcdir}/atomic_futex.h \
 	${bits_srcdir}/basic_ios.h \
 	${bits_srcdir}/basic_ios.tcc \
 	${bits_srcdir}/basic_string.h \
diff --git a/libstdc++-v3/include/bits/atomic_futex.h b/libstdc++-v3/include/bits/atomic_futex.h
new file mode 100644
index 000..4b5664d
--- /dev/null
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -0,0 +1,175 @@
+// -*- C++ -*- header.
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file bits/atomic_futex.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly.
+ */
+
+#ifndef _GLIBCXX_ATOMIC_FUTEX_H
+#define _GLIBCXX_ATOMIC_FUTEX_H 1
+
+#pragma GCC system_header
+
+#include <bits/c++config.h>
+#include <atomic>
+#include <mutex>
+#include <condition_variable>
+
+#ifndef _GLIBCXX_ALWAYS_INLINE
+#define _GLIBCXX_ALWAYS_INLINE inline __attribute__((always_inline))
+#endif
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+//#ifdef _GLIBCXX_USE_FUTEX
+  struct __atomic_futex_unsigned_base
+  {
+// Returns false iff a timeout occurred.
+static bool
+futex_wait_for(unsigned *addr, unsigned val, bool has_timeout,
+	chrono::seconds s, chrono::nanoseconds ns);
+
+// This is static because it can be executed after the object has been
+// destroyed.
+static void futex_notify_all(unsigned* addr);
+  };
+
+  template unsigned _Waiter_bit = 0x8000

Re: [PATCH v3] PR libitm/61164: use always_inline consistently

2014-09-29 Thread Torvald Riegel
On Mon, 2014-09-29 at 16:53 +0400, Gleb Fotengauer-Malinovskiy wrote:
 -#undef  __always_inline
 -#define __always_inline __attribute__((always_inline))
 +#define __libitm_always_inline inline __attribute__((always_inline))

The previous code seems to work in libstdc++.  I believe that
eventually, we'll want to use libstdc++-v3/include/bits/atomic_base.h
(see the comment at the top of the file).  Can we keep the diff between
the two files small?



Re: [PATCH v3] PR libitm/61164: use always_inline consistently

2014-09-29 Thread Torvald Riegel
On Mon, 2014-09-29 at 17:38 +0200, Jakub Jelinek wrote:
 On Mon, Sep 29, 2014 at 05:35:24PM +0200, Torvald Riegel wrote:
  On Mon, 2014-09-29 at 16:53 +0400, Gleb Fotengauer-Malinovskiy wrote:
   -#undef  __always_inline
   -#define __always_inline __attribute__((always_inline))
   +#define __libitm_always_inline inline __attribute__((always_inline))
  
  The previous code seems to work in libstdc++.  I believe that
  eventually, we'll want to use libstdc++-v3/include/bits/atomic_base.h
  (see the comment at the top of the file).  Can we keep the diff between
  the two files small?
 
 libstdc++-v3/include/bits/atomic_base.h uses
 _GLIBCXX_ALWAYS_INLINE macro:
 
 #ifndef _GLIBCXX_ALWAYS_INLINE
 #define _GLIBCXX_ALWAYS_INLINE inline __attribute__((always_inline))
 #endif

Ahh.. I missed that inline in there when I had a quick look.  Sorry
for the noise :)



[COMMITTED] Add myself as maintainer for libitm.

2014-05-14 Thread Torvald Riegel
On Mon, 2014-03-03 at 09:24 -0800, Richard Henderson wrote:
 On 03/03/2014 04:48 AM, Torvald Riegel wrote:
  Should I add myself as maintainer for libitm?
 
 Yes.

Committed as r210448.


commit 4da9024845f11053b56aa1318469029b044ff6d1
Author: Torvald Riegel trie...@redhat.com
Date:   Thu May 15 00:31:03 2014 +0200

Add myself as maintainer for libitm.

diff --git a/MAINTAINERS b/MAINTAINERS
index ae3d330..8d094be 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -174,6 +174,7 @@ libgomp			Richard Henderson	r...@redhat.com
 libgomp			Jakub Jelinek		ja...@redhat.com
 libiberty		DJ Delorie		d...@redhat.com
 libiberty		Ian Lance Taylor	i...@airs.com
+libitm			Torvald Riegel		trie...@redhat.com
 libffi testsuite	Andreas Tobler		andre...@gcc.gnu.org
 libobjc			Nicola Pero		nicola.p...@meta-innovation.com
 libobjc			Andrew Pinski		pins...@gmail.com


Re: [PATCH] Fix libitm futex handling on non-x86/ppc/sh/alpha targets

2014-03-26 Thread Torvald Riegel
On Wed, 2014-03-26 at 22:19 +0100, Jakub Jelinek wrote:
 Hi!
 
 The sys_futex0 caller expects return values as returned by raw syscalls,
 i.e. value >= 0 success, negative value are errors -errorval.
 But, the syscall function returns value >= 0 on success, and -1 on error,
 with errno set to errorval.  This means if e.g. futex syscall fails with
 EAGAIN, and EPERM is 1, we get GTM program terminated because we think
 futex returned -EPERM.
 
 Fixed thusly, bootstrapped/regtested on s390x-linux, ok for trunk/4.8?

Looks good to me.  Thanks.
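
For reference, a minimal sketch of the convention mismatch being fixed (my
illustration, not Jakub's actual patch): convert syscall()'s -1/errno
result back into the raw -errorval convention the callers expect.

  #include <errno.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static long
  sys_futex0 (int *addr, long op, long val)
  {
    /* syscall() returns -1 and sets errno; the callers want -errorval.  */
    long res = syscall (SYS_futex, addr, op, val, 0);
    return res == -1 ? -errno : res;
  }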




Re: [PATCH, LIBITM] Backport libitm bug fixes to FSF 4.8

2014-03-03 Thread Torvald Riegel
On Fri, 2014-02-28 at 19:32 -0600, Peter Bergner wrote:
 I'd like to ask for permission to backport the following two LIBITM bug
 fixes to the FSF 4.8 branch.  Although these are not technically fixing
 regressions, they do fix the libitm.c/reentrant.c testsuite failure on
 s390 and powerpc (or at least it will when we finally get our power8
 code backported to FSF 4.8).  It also fixes a real bug on x86 that is
 latent because we don't currently have a test case that warms up the
 x86's RTM hardware enough such that its xbegin succeeds exposing the
 bug.  I'd like this backport so that the 4.8 based distros won't need
 to carry this as an add-on patch.
 
 It should also be fairly safe as well, since the fixed code is limited
 to the arches (x86, s390 and powerpc) that define USE_HTM_FASTPATH,
 so all others definitely won't see a difference.

Looks good to me.

 I'll note I CC'd some of the usual suspects interested in TM as well
 as the normal RMs, because LIBITM doesn't seem to have a maintainer
 or reviewer listed in the MAINTAINERS file.  Is that an oversight or???

I'm reviewing all libitm patches that I'm aware of (but I don't read
gcc-patches regularly).  Should I add myself as maintainer for libitm?
Does this come with any other responsibilities than reviewing patches?

Torvald



Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-30 Thread Torvald Riegel
On Mon, 2013-08-26 at 09:49 -0700, Richard Henderson wrote:
 On 08/22/2013 02:57 PM, Torvald Riegel wrote:
  On Thu, 2013-08-22 at 12:05 -0700, Richard Henderson wrote:
  On 08/22/2013 11:39 AM, Torvald Riegel wrote:
  + /* Store edi for future HTM fast path retries.  We use a stack slot
  +lower than the jmpbuf so that the jmpbuf's rip field will overlap
  +with the proper return address on the stack.  */
  + movl%edi, -64(%rsp)
 
  You havn't allocated the stack frame here, and you're storing
  outside the redzone.  This is invalid.
 
  Two possibilities:
 
   (1) always allocate the stack frame on entry to
   the function (adds two register additions to
   the htm fast path -- in the noise i'd think)
 
   (2) store the edi value in the non-htm path, with
   the pr_HTMRetryableAbort bit or'd in.  (adds an
   extra store to the non-htm path; probably noise).
   You'd want to mask out that bit when you reload it.
  
  Oops.  Picked fix (2).  Better now?
  
 
 Move the andl of edi down into the HAVE_AS_RTM block, above the orl of
 HTMRetriedAfterAbort.  Ok with that change.

Committed as r202101.



Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-30 Thread Torvald Riegel
On Fri, 2013-08-30 at 16:49 +0200, Rainer Orth wrote:
 Torvald Riegel trie...@redhat.com writes:
 
  On Mon, 2013-08-26 at 09:49 -0700, Richard Henderson wrote:
  On 08/22/2013 02:57 PM, Torvald Riegel wrote:
   On Thu, 2013-08-22 at 12:05 -0700, Richard Henderson wrote:
   On 08/22/2013 11:39 AM, Torvald Riegel wrote:
   +  /* Store edi for future HTM fast path retries.  We use a stack 
   slot
   + lower than the jmpbuf so that the jmpbuf's rip field will 
   overlap
   + with the proper return address on the stack.  */
   +  movl%edi, -64(%rsp)
  
   You havn't allocated the stack frame here, and you're storing
   outside the redzone.  This is invalid.
  
   Two possibilities:
  
(1) always allocate the stack frame on entry to
the function (adds two register additions to
the htm fast path -- in the noise i'd think)
  
(2) store the edi value in the non-htm path, with
the pr_HTMRetryableAbort bit or'd in.  (adds an
extra store to the non-htm path; probably noise).
You'd want to mask out that bit when you reload it.
   
   Oops.  Picked fix (2).  Better now?
   
  
  Move the andl of edi down into the HAVE_AS_RTM block, above the orl of
  HTMRetriedAfterAbort.  Ok with that change.
 
  Committed as r202101.
 
 
 The patch has broken Solaris/x86 bootstrap:

Sorry about that.  I'll try to remember to test on non-Linux / non-futex
systems too next time.  I committed the attached patch as r202116.
(Please note that I'll be offline for two weeks starting in about an
hour (yes, bad timing), so I hope I didn't miss anything else...)

Torvald
commit 29006efcecde6ec0366f6c88a662d30b9efb931f
Author: torvald torvald@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Fri Aug 30 17:13:05 2013 +

libitm: Fix wrong initialization order introduced with r202101.

* config/posix/rwlock.cc: Fix initialization order.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@202116 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libitm/config/posix/rwlock.cc b/libitm/config/posix/rwlock.cc
index 488e9c2..61b6ad9 100644
--- a/libitm/config/posix/rwlock.cc
+++ b/libitm/config/posix/rwlock.cc
@@ -30,11 +30,11 @@ namespace GTM HIDDEN {
 // ??? Move this back to the header file when constexpr is implemented.
 
 gtm_rwlock::gtm_rwlock()
-  : mutex (PTHREAD_MUTEX_INITIALIZER),
+  : summary (0),
+mutex (PTHREAD_MUTEX_INITIALIZER),
 c_readers (PTHREAD_COND_INITIALIZER),
 c_writers (PTHREAD_COND_INITIALIZER),
 c_confirmed_writers (PTHREAD_COND_INITIALIZER),
-summary (0),
 a_readers (0),
 w_readers (0),
 w_writers (0)


Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-22 Thread Torvald Riegel
Attached is an updated patch.  Changes explained below.

On Wed, 2013-08-21 at 10:50 -0700, Richard Henderson wrote:
   -#if defined(USE_HTM_FASTPATH) && !defined(HTM_CUSTOM_FASTPATH)
  +#ifdef USE_HTM_FASTPATH
 // HTM fastpath.  Only chosen in the absence of transaction_cancel to 
  allow
 // using an uninstrumented code path.
 // The fastpath is enabled only by dispatch_htm's method group, which 
  uses
  @@ -187,6 +187,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, 
  const gtm_jmpbuf *jb)
 // indeed in serial mode, and HW transactions should never need serial 
  mode
 // for any internal changes (e.g., they never abort visibly to the STM 
  code
 // and thus do not trigger the standard retry handling).
  +#ifndef HTM_CUSTOM_FASTPATH
  if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
   {
 for (uint32_t t = htm_fastpath; t; t--)
  @@ -237,6 +238,49 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, 
  const gtm_jmpbuf *jb)
  }
  }
   }
  +#else
  +  // If we have a custom HTM fastpath in ITM_beginTransaction, we implement
  +  // just the retry policy here.  We communicate with the custom fastpath
 
 Don't you want this logic arranged as
 
 #ifdef HTM_CUSTOM_FASTPATH
  ... your new code
 #elif defined(USE_HTM_FASTPATH)
  ... existing htm code
 #endif

It seems you didn't feel strongly about it (and neither do I), so I kept
this as is.  The current code arrangement has the advantage that the
general HTM fastpath comes first, and second the stuff for the custom,
more complex fastpath variants.

  -   subq$56, %rsp
  -   cfi_def_cfa_offset(64)
  +   subq$64, %rsp
  +   cfi_def_cfa_offset(72)
 
 You now have an abi-misaligned stack.  Since the return address is pushed by
 the call, an aligned stack frame allocation is (N + 8) % 16 == 0.
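
(For concreteness -- my arithmetic, not from the thread: the original
subq $56 gives (56 + 8) % 16 == 0, i.e. an aligned frame, while
subq $64 gives (64 + 8) % 16 == 8, hence the misalignment pointed out
here.)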

Fixed, and added more suggestions by Richard.  We also discussed
splitting the xbegin into two paths, one for the first HTM abort --
where we'd save the jmpbuf -- and one for subsequent aborts -- where we
wouldn't need to save the jmpbuf.  But given that we only save the
jmpbuf when either going to non-HTM or after an HTM abort, the overheads
didn't seem to matter enough to justify the added complexity.

  +  // Accessed from assembly language, thus the asm specifier on
  +  // the name, avoiding complex name mangling.
  +#ifdef __USER_LABEL_PREFIX__
  +#define UPFX1(t) UPFX(t)
  +#define UPFX(t) #t
  +  static gtm_rwlock serial_lock
   +__asm__(UPFX1(__USER_LABEL_PREFIX__) "gtm_serial_lock");
  +#else
  +  static gtm_rwlock serial_lock
   +__asm__("gtm_serial_lock");
  +#endif
 
 Now that we have 3 copies of this, we should simplify things.  E.g.
 
 #ifdef __USER_LABEL_PREFIX__
 # define UPFX1(X) #X
 # define UPFX UPFX1(__USER_LABEL_PREFIX__)
 #else
 # define UPFX
 #endif
 
  static gtm_rwlock serial_lock __asm__(UPFX "gtm_serial_lock");

Fixed (including the one additional indirection that was necessary ;) ).

OK for trunk?
commit 428aa3302c5674326585f178409f734222edfa8b
Author: Torvald Riegel trie...@redhat.com
Date:   Wed Aug 21 11:40:54 2013 +0200

Add custom HTM fast path for RTM on x86_64.

* libitm_i.h (gtm_thread): Assign an asm name to serial_lock.
(htm_fastpath): Assign an asm name.
* libitm.h (_ITM_codeProperties): Add non-ABI flags used by custom
HTM fast paths.
(_ITM_actions): Likewise.
* config/x86/target.h (HTM_CUSTOM_FASTPATH): Enable custom fastpath on
x86_64.
* config/x86/sjlj.S (_ITM_beginTransaction): Add custom HTM fast path.
* config/posix/rwlock.h (gtm_rwlock): Update comments.  Move summary
field to the start of the structure.
* config/linux/rwlock.h (gtm_rwlock): Update comments.
* beginend.cc (gtm_thread::begin_transaction): Add retry policy
handling for custom HTM fast paths.

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index a3bf549..bd7b19e 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -165,7 +165,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
gtm_jmpbuf *jb)
   if (unlikely(prop & pr_undoLogCode))
 GTM_fatal("pr_undoLogCode not supported");
 
-#if defined(USE_HTM_FASTPATH) && !defined(HTM_CUSTOM_FASTPATH)
+#ifdef USE_HTM_FASTPATH
   // HTM fastpath.  Only chosen in the absence of transaction_cancel to allow
   // using an uninstrumented code path.
   // The fastpath is enabled only by dispatch_htm's method group, which uses
@@ -187,6 +187,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
gtm_jmpbuf *jb)
   // indeed in serial mode, and HW transactions should never need serial mode
   // for any internal changes (e.g., they never abort visibly to the STM code
   // and thus do not trigger the standard retry handling).
+#ifndef HTM_CUSTOM_FASTPATH
   if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
 {
   for (uint32_t t = htm_fastpath; t; t--)
@@ -237,6 +238,49 @@ GTM::gtm_thread::begin_transaction

Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-22 Thread Torvald Riegel
On Thu, 2013-08-22 at 12:05 -0700, Richard Henderson wrote:
 On 08/22/2013 11:39 AM, Torvald Riegel wrote:
  +   /* Store edi for future HTM fast path retries.  We use a stack slot
  +  lower than the jmpbuf so that the jmpbuf's rip field will overlap
  +  with the proper return address on the stack.  */
  +   movl%edi, -64(%rsp)
 
 You havn't allocated the stack frame here, and you're storing
 outside the redzone.  This is invalid.
 
 Two possibilities:
 
  (1) always allocate the stack frame on entry to
  the function (adds two register additions to
  the htm fast path -- in the noise i'd think)
 
  (2) store the edi value in the non-htm path, with
  the pr_HTMRetryableAbort bit or'd in.  (adds an
  extra store to the non-htm path; probably noise).
  You'd want to mask out that bit when you reload it.

Oops.  Picked fix (2).  Better now?
commit 42a3defc64d28dc2c96f5bbdf8547e5b7bd9a40b
Author: Torvald Riegel trie...@redhat.com
Date:   Wed Aug 21 11:40:54 2013 +0200

Add custom HTM fast path for RTM on x86_64.

* libitm_i.h (gtm_thread): Assign an asm name to serial_lock.
(htm_fastpath): Assign an asm name.
* libitm.h (_ITM_codeProperties): Add non-ABI flags used by custom
HTM fast paths.
(_ITM_actions): Likewise.
* config/x86/target.h (HTM_CUSTOM_FASTPATH): Enable custom fastpath on
x86_64.
* config/x86/sjlj.S (_ITM_beginTransaction): Add custom HTM fast path.
* config/posix/rwlock.h (gtm_rwlock): Update comments.  Move summary
field to the start of the structure.
* config/linux/rwlock.h (gtm_rwlock): Update comments.
* beginend.cc (gtm_thread::begin_transaction): Add retry policy
handling for custom HTM fast paths.

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index a3bf549..bd7b19e 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -165,7 +165,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
gtm_jmpbuf *jb)
   if (unlikely(prop & pr_undoLogCode))
 GTM_fatal("pr_undoLogCode not supported");
 
-#if defined(USE_HTM_FASTPATH) && !defined(HTM_CUSTOM_FASTPATH)
+#ifdef USE_HTM_FASTPATH
   // HTM fastpath.  Only chosen in the absence of transaction_cancel to allow
   // using an uninstrumented code path.
   // The fastpath is enabled only by dispatch_htm's method group, which uses
@@ -187,6 +187,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
gtm_jmpbuf *jb)
   // indeed in serial mode, and HW transactions should never need serial mode
   // for any internal changes (e.g., they never abort visibly to the STM code
   // and thus do not trigger the standard retry handling).
+#ifndef HTM_CUSTOM_FASTPATH
   if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
 {
   for (uint32_t t = htm_fastpath; t; t--)
@@ -237,6 +238,49 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
gtm_jmpbuf *jb)
}
}
 }
+#else
+  // If we have a custom HTM fastpath in ITM_beginTransaction, we implement
+  // just the retry policy here.  We communicate with the custom fastpath
+  // through additional property bits and return codes, and either transfer
+  // control back to the custom fastpath or run the fallback mechanism.  The
+  // fastpath synchronization algorithm itself is the same.
+  // pr_HTMRetryableAbort states that a HW transaction started by the custom
+  // HTM fastpath aborted, and that we thus have to decide whether to retry
+  // the fastpath (returning a_tryHTMFastPath) or just proceed with the
+  // fallback method.
+  if (likely(htm_fastpath && (prop & pr_HTMRetryableAbort)))
+{
+  tx = gtm_thr();
+  if (unlikely(tx == NULL))
+{
+  // See below.
+  tx = new gtm_thread();
+  set_gtm_thr(tx);
+}
+  // If this is the first abort, reset the retry count.  We abuse
+  // restart_total for the retry count, which is fine because our only
+  // other fallback will use serial transactions, which don't use
+  // restart_total but will reset it when committing.
+  if (!(prop & pr_HTMRetriedAfterAbort))
+   tx->restart_total = htm_fastpath;
+
+  if (--tx->restart_total > 0)
+   {
+ // Wait until any concurrent serial-mode transactions have finished.
+ // Essentially the same code as above.
+ if (serial_lock.is_write_locked())
+   {
+ if (tx->nesting > 0)
+   goto stop_custom_htm_fastpath;
+ serial_lock.read_lock(tx);
+ serial_lock.read_unlock(tx);
+   }
+ // Let ITM_beginTransaction retry the custom HTM fastpath.
+ return a_tryHTMFastPath;
+   }
+}
+ stop_custom_htm_fastpath:
+#endif
 #endif
 
   tx = gtm_thr();
diff --git a/libitm/config/linux/rwlock.h b/libitm/config/linux/rwlock.h
index 428299f..c761edf 100644
--- a/libitm/config/linux/rwlock.h
+++ b/libitm/config/linux/rwlock.h
@@ -39,6 +39,11 @@ struct gtm_thread

[PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-21 Thread Torvald Riegel
This patch adds a custom HTM fast path for RTM on x86_64, which moves
the core HTM fast path bits from gtm_thread::begin_transaction into the
x86-specific ITM_beginTransaction implementation.  It extends/changes
the previous patch by Andi:
http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01228.html

The custom fast path decreases the overheads of using HW transactions.
gtm_thread::begin_transaction remains responsible for handling the retry
policy after aborts of HW transactions, including when to switch to the
fallback execution method.  Right now, the C++ retry code isn't aware of
the specific abort reason but just counts the number of retries for a
particular transaction; it might make sense to add this in the future.

Tested on Haswell with microbenchmarks and STAMP Vacation.  OK for
trunk?  (Please take a closer look at the asm pieces of this.)

(I've seen failures for STAMP Genome during my recent tests, but those
happen also with just ITM_DEFAULT_METHOD=serialirr and a single thread,
and AFAICT they don't seem to be related to the changes in
_ITM_beginTransaction.  I'll have a look...)

Andreas and Peter: Is this sufficient as a proof of concept for custom
fast paths on your architectures, or would you like to see any changes?

Torvald
commit 9329bd4504d13d415542d93418157d588b599b4e
Author: Torvald Riegel trie...@redhat.com
Date:   Wed Aug 21 11:40:54 2013 +0200

Add custom HTM fast path for RTM on x86_64.

* libitm_i.h (gtm_thread): Assign an asm name to serial_lock.
(htm_fastpath): Assign an asm name.
* libitm.h (_ITM_codeProperties): Add non-ABI flags used by custom
HTM fast paths.
(_ITM_actions): Likewise.
* config/x86/target.h (HTM_CUSTOM_FASTPATH): Enable custom fastpath on
x86_64.
* config/x86/sjlj.S (_ITM_beginTransaction): Add custom HTM fast path.
* config/posix/rwlock.h (gtm_rwlock): Update comments.  Move summary
field to the start of the structure.
* config/linux/rwlock.h (gtm_rwlock): Update comments.
* beginend.cc (gtm_thread::begin_transaction): Add retry policy
handling for custom HTM fast paths.

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index a3bf549..bd7b19e 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -165,7 +165,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
gtm_jmpbuf *jb)
   if (unlikely(prop & pr_undoLogCode))
 GTM_fatal("pr_undoLogCode not supported");
 
-#if defined(USE_HTM_FASTPATH) && !defined(HTM_CUSTOM_FASTPATH)
+#ifdef USE_HTM_FASTPATH
   // HTM fastpath.  Only chosen in the absence of transaction_cancel to allow
   // using an uninstrumented code path.
   // The fastpath is enabled only by dispatch_htm's method group, which uses
@@ -187,6 +187,7 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
gtm_jmpbuf *jb)
   // indeed in serial mode, and HW transactions should never need serial mode
   // for any internal changes (e.g., they never abort visibly to the STM code
   // and thus do not trigger the standard retry handling).
+#ifndef HTM_CUSTOM_FASTPATH
   if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
 {
   for (uint32_t t = htm_fastpath; t; t--)
@@ -237,6 +238,49 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
gtm_jmpbuf *jb)
}
}
 }
+#else
+  // If we have a custom HTM fastpath in ITM_beginTransaction, we implement
+  // just the retry policy here.  We communicate with the custom fastpath
+  // through additional property bits and return codes, and either transfer
+  // control back to the custom fastpath or run the fallback mechanism.  The
+  // fastpath synchronization algorithm itself is the same.
+  // pr_HTMRetryableAbort states that a HW transaction started by the custom
+  // HTM fastpath aborted, and that we thus have to decide whether to retry
+  // the fastpath (returning a_tryHTMFastPath) or just proceed with the
+  // fallback method.
+  if (likely(htm_fastpath && (prop & pr_HTMRetryableAbort)))
+{
+  tx = gtm_thr();
+  if (unlikely(tx == NULL))
+{
+  // See below.
+  tx = new gtm_thread();
+  set_gtm_thr(tx);
+}
+  // If this is the first abort, reset the retry count.  We abuse
+  // restart_total for the retry count, which is fine because our only
+  // other fallback will use serial transactions, which don't use
+  // restart_total but will reset it when committing.
+  if (!(prop & pr_HTMRetriedAfterAbort))
+   tx->restart_total = htm_fastpath;
+
+  if (--tx->restart_total > 0)
+   {
+ // Wait until any concurrent serial-mode transactions have finished.
+ // Essentially the same code as above.
+ if (serial_lock.is_write_locked())
+   {
+ if (tx->nesting > 0)
+   goto stop_custom_htm_fastpath;
+ serial_lock.read_lock(tx);
+ serial_lock.read_unlock(tx);
+   }
+ // Let

Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-21 Thread Torvald Riegel
On Wed, 2013-08-21 at 10:14 -0700, Andi Kleen wrote:
 Torvald Riegel trie...@redhat.com writes:
  +#endif
  leaq8(%rsp), %rax
  -   subq$56, %rsp
  -   cfi_def_cfa_offset(64)
  +   subq$64, %rsp
  +   cfi_def_cfa_offset(72)
 
 I don't see why you did this change and the addq change below.

I need to store edi (ie, the properties of the transaction passed to the
function by the compiler) on the stack, so these two changes create
additional room for it.  (I wasn't sure about alignment requirements, so
I just used 8 bytes for it.)

 The rest seems reasonable to me, although I haven't tried to untangle
 the full dependencies between C++ and asm code for retries.

If anyone has any suggestions for how to improve the comments, let me
know.

 It would be likely cleaner to just keep the retries fully
 in C++ like the original patch did. There's no advantage
 of going back to assembler.

That's true for x86, but it seems that for s390, we can't easily put the
xbegin/tbegin into the C++ code because of floating point register
save/restore issues.  The added complexity on the x86 side seemed to be
a reasonable price for having a general HTM fast path retry handling on
the C++ side.

Torvald



Re: [PATCH] libitm: Add custom HTM fast path for RTM on x86_64.

2013-08-21 Thread Torvald Riegel
On Wed, 2013-08-21 at 19:41 +0200, Andi Kleen wrote:
  That's true for x86, but it seems that for s390, we can't easily put the
  xbegin/tbegin into the C++ code because of floating point register
  save/restore issues.  The added complexity on the x86 side seemed to be
  a reasonable price for having a general HTM fast path retry handling on
  the C++ side.
 
 I don't see much point in trying to unify these fast paths for very
 different implementations with different properties. It would
 not surprise me if the highly tuned end result is different.
 
 It's kind of like trying to write a portable high performance memcpy().

I agree that highly tuned implementations might eventually be pretty
different from each other.  Nonetheless, I'm currently not aware of
anybody volunteering to really work on highly tuned implementations, in
particular regarding the retry policies.  Once we get there -- and it
would be great if somebody would work on that!, I'm completely fine with
separating those bits out.  Until then, I think it's preferable to keep
things as unified as possible as long as this doesn't come with
unreasonable complexity or runtime overheads.

Torvald



Re: [PATCH] S/390: Hardware transactional memory support

2013-08-14 Thread Torvald Riegel
On Tue, 2013-08-13 at 12:09 -0700, Richard Henderson wrote:
 On 08/13/2013 11:26 AM, Jakub Jelinek wrote:
  On Fri, Aug 02, 2013 at 11:05:33AM -1000, Richard Henderson wrote:
  On 08/02/2013 04:45 AM, Andreas Krebbel wrote:
 !   XCFLAGS="${XCFLAGS} -mzarch -mhtm -msoft-float"
 
  Not good, since _ITM_R{F,D,E,CF,CD,CE} should return values in
  floating point registers; similarly for the write accessors.
  
  So, would it be enough to compile just beginend.cc with -msoft-float
  and the rest normally?
 
 No.
 
 From what I understand of the s390 restriction, we can't touch the
 floating point registers after starting a transaction, at least until
 we're committed to restoring them all.
 
 Which means that we have to have everything along the path from
 htm_begin_success == false until a longjmp restores them.  This path
 includes a call to std::operator new, so that makes it a non-starter.
 
 Better, I think, to pass the gtm_jmpbuf to htm_begin.  Then we can do
 
   uint32_t ret = __builtin_tbegin_nofloat (NULL);
   if (!htm_begin_success(ret))
 // restore fpu state from jb
   return ret;
 
 at which point we're back to normal and we can do whatever we want
 within the normal abi wrt the fpu.

Can we instead move the successful path of the HTM fastpath to the
_ITM_beginTransaction asm function, and just do the fallback and retry
code in beginend.cc?  So, add a tbegin (or xbegin) and the check of
serial_lock to the asm function as in Andi's patch
(http://gcc.gnu.org/ml/gcc-patches/2013-01/msg00640.html), but keep the
restart policy code (ie, the serial lock read/write locking for the
wait, etc.) in beginend.cc?  The latter could return a new code that
tells the asm function to try the HTM fastpath again; we might also want
to use a separate bit in the arguments to gtm_thread::begin_transaction
to tell it whether the HTM fastpath failed before.

This way, we get lower overhead when the HTM fastpath succeeds right
away because we can do the minimally necessary thing in asm; we still
can change the retry policies easily; and we can restore the floating
point registers to a sane state for any libitm C/C++ code that we might
call without having to do any restore inside of the C/C++ code.

Thoughts?  I'll put working on a draft of this for x86 on my list.



Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Torvald Riegel
On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote:
 Index: libitm/config/s390/target.h
 ===
 *** libitm/config/s390/target.h.orig
 --- libitm/config/s390/target.h
 ***

[...]

 + static inline uint32_t
 + htm_begin ()
 + {
 +   return __builtin_tbegin_nofloat (NULL);
 + } 

I'm not quite sure I understand why the nofloat variant is sufficient;
is this because we don't use FPRs in the abort/restart code of libitm
(and the FPRs are known to be clobbered after starting a txn)?

There is a bit in the properties passed to _ITM_beginTransaction which
states whether the txn has floating point update code
(pr_hasNoFloatUpdate).  Would this be useful?

(BTW, sorry for the late response.  I don't read gcc-patches frequently,
so please CC me on libitm-related things.)



Re: [PATCH][4.8 backport] S/390: Transactional Execution support

2013-08-02 Thread Torvald Riegel
On Thu, 2013-08-01 at 10:55 +0200, Andreas Krebbel wrote:
 Torvald, could you please consider adding backports of the following
 patches to the 4.8 branch?
 
 19/06/13 [patch] libitm: Fix handling of reentrancy in the HTM fastpath
 http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01132.html
 
 20/06/13 Re: [patch] libitm: Fix handling of reentrancy in the HTM fastpath
 http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01188.html

I believe Jakub already backported these two; at least he told me he was
planning to, IIRC.

Torvald



Re: [PATCH] S/390: Hardware transactional memory support

2013-08-02 Thread Torvald Riegel
On Fri, 2013-08-02 at 15:16 +0200, Andreas Krebbel wrote:
 On 02/08/13 13:31, Torvald Riegel wrote:
  On Fri, 2013-06-21 at 12:23 +0200, Andreas Krebbel wrote:
  Index: libitm/config/s390/target.h
  ===
  *** libitm/config/s390/target.h.orig
  --- libitm/config/s390/target.h
  ***
  
  [...]
  
  + static inline uint32_t
  + htm_begin ()
  + {
  +   return __builtin_tbegin_nofloat (NULL);
  + } 
  
  I'm not quite sure I understand why the nofloat variant is sufficient;
  is this because we don't use FPRs in the abort/restart code of libitm
  (and the FPRs are known to be clobbered after starting a txn)?
 
 Yes. To my understanding there are 2 potential issues when the FPRs are not 
 saved.
 
 1. The abort code will read corrupted FPR content for FPRs:
 - which are live across tbegin and
 - are updated in the TX body and
 - are read in the abort code
 The abort code will see the updated value from the TX body.
 
 Since libitm implements TX begins as function calls only call-saved registers 
 can be live across a
 tbegin. But all the call-saved FPRs are saved in _ITM_beginTransaction and 
 get restored when doing
 the longjmp back into the user code. So this should be no problem.
 
 2. The other problem is that the ABI might get violated when aborting into a 
 callee which already
 returned and there are FPRs which have been modified afterwards. In that case 
 the compiler would not
 have generated FPR save/restore instructions in the function 
 prologue/epilogue of the callee. But in
 fact there are modified FPRs when exiting the second time.
 
 But again this should be no problem thanks to the setjmp/longjmp semantics of 
 _ITM_beginTransaction
 and GTM_longjmp. If the TX body modified a call-saved FPR it will get 
 restored on the abort path
 back from libitm to the user code.
 
 
 As long as libitm does not use FPRs itself this should be safe without having 
 tbegin clobbering FPRs.

And I suppose that even if it would use FPRs, it shouldn't ever access
memory locations touched in user code.

Thanks for the explanation.  Some of this was obvious, sorry, but
that's what happens when a mental context switch is incomplete :)  Next
time, I'd prefer a summary of such reasoning to be added as a comment in
the source; it's just too easy to forget about all the subtleties...

  There is a bit in the properties passed to _ITM_beginTransaction which
  states whether the txn has floating point update code
  (pr_hasNoFloatUpdate).  Would this be useful?
 
 This probably would be useful in the ITM_beginTransaction / GTM_longjmp 
 implementations which in
 that case could try to avoid the FPR save/restores. But Richard mentioned 
 that these bits so far are
 never set by GCC since it is lacking the required infos from the register 
 allocator.

Another thing that comes to mind is that it might be useful to put parts
of the HTM fast path into the asm pieces of _ITM_beginTransaction.  That
way, one can use exactly the amount of SW setjmp one needs given what's
easy to save/restore for the TM; this might decrease the startup costs
of a txn somewhat.  Andi Kleen posted a patch for how to do this for RTM
a while ago (but it had some issues):
http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01189.html

Perhaps something along these lines (while avoiding the issues :) )
would be useful for s/390 and Power too.


Torvald



Re: [patch] libitm: Fix handling of reentrancy in the HTM fastpath

2013-06-20 Thread Torvald Riegel
On Wed, 2013-06-19 at 22:13 -0500, Peter Bergner wrote:
 On Thu, 2013-06-20 at 00:51 +0200, Torvald Riegel wrote:
  On Wed, 2013-06-19 at 14:43 -0500, Peter Bergner wrote: 
   I'm having trouble seeing why/when _ITM_inTransaction() is
   returning something other than inIrrevocableTransaction.  I'll see if I 
   can
   determine why and will report back.
   
   Ok, we return outsideTransaction because the nesting level (tx->nesting)
   is zero.
  
  That's a second bug in libitm, sorry.  Can you try with the attached
  patch additionally to the previous one?  Thanks!
 
 Ok, with this patch, plus adding a powerpc implementation of
 htm_transaction_active(), reentrant.c now executes correctly
 on both HTM and non-HTM hardware for me.  Thanks for all of
 your help with this!
 
 I'd still like to hear from Andreas, whether the reentrant.c test case
 with both patches, now works on S390.
 
 I'll note unlike your x86 htm_transaction_active() implementation,
 my implementation has to check for htm_available() first before
 executing the htm instruction which tells me whether I'm in transaction
 state or not, otherwise I'll get a SIGILL on non-HTM hardware.
 Unfortunately, my htm_available() call uses getauxval() to query
 the AUXV for a hwcap bit.  Is there a place I can stash the result
 of the first call, so that subsequent calls use the cached value?
 Normally, I could use a static var, but I doubt that is what we want
 to do in static inline functions.

You're right, that was missing for x86 as well.  Please see the updated
second patch that is attached.  It additionally checks htm_fastpath to
see whether we are actually using the HTM.  This variable is initialized
to the value that htm_init() returns.  This should do the right thing I
suppose.
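
(A sketch of what that means for the Power side -- mine, not the actual
libitm code: htm_init() does the AUXV probe once and its result ends up
cached, so the hot path only tests a variable; the real htm_fastpath also
encodes the retry count.  PPC_FEATURE2_HAS_HTM is the glibc hwcap name and
the exact macro may differ with other headers.)

  #include <stdint.h>
  #include <sys/auxv.h>

  static uint32_t htm_hwcap_htm;   // written once at library init time

  void
  htm_init_example ()
  {
    htm_hwcap_htm = (getauxval (AT_HWCAP2) & PPC_FEATURE2_HAS_HTM) ? 1 : 0;
  }

  static inline bool
  htm_available_cached ()
  {
    return htm_hwcap_htm != 0;   // no further getauxval() calls needed
  }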

Torvald
commit c8352d4d9fa5cfa3453a61c581956835de9753e5
Author: Torvald Riegel trie...@redhat.com
Date:   Thu Jun 20 00:46:59 2013 +0200

libitm: Handle HTM fastpath in status query functions.

diff --git a/libitm/config/x86/target.h b/libitm/config/x86/target.h
index 77b627f..063c09e 100644
--- a/libitm/config/x86/target.h
+++ b/libitm/config/x86/target.h
@@ -125,6 +125,13 @@ htm_abort_should_retry (uint32_t begin_ret)
 {
   return begin_ret & _XABORT_RETRY;
 }
+
+/* Returns true iff a hardware transaction is currently being executed.  */
+static inline bool
+htm_transaction_active ()
+{
+  return _xtest() != 0;
+}
 #endif
 
 
diff --git a/libitm/query.cc b/libitm/query.cc
index 5707321..39a35b3 100644
--- a/libitm/query.cc
+++ b/libitm/query.cc
@@ -43,6 +43,15 @@ _ITM_libraryVersion (void)
 _ITM_howExecuting ITM_REGPARM
 _ITM_inTransaction (void)
 {
+#if defined(USE_HTM_FASTPATH)
+  // If we use the HTM fastpath, we cannot reliably detect whether we are
+  // in a transaction because this function can be called outside of
+  // a transaction and thus we can't deduce this by looking at just the serial
+  // lock.  This function isn't used in practice currently, so the easiest
+  // way to handle it is to just abort.
+  if (htm_fastpath && htm_transaction_active())
+htm_abort();
+#endif
   struct gtm_thread *tx = gtm_thr();
   if (tx && (tx->nesting > 0))
 {
@@ -58,6 +67,11 @@ _ITM_inTransaction (void)
 _ITM_transactionId_t ITM_REGPARM
 _ITM_getTransactionId (void)
 {
+#if defined(USE_HTM_FASTPATH)
+  // See ITM_inTransaction.
+  if (htm_fastpath && htm_transaction_active())
+htm_abort();
+#endif
   struct gtm_thread *tx = gtm_thr();
   return (tx && (tx->nesting > 0)) ? tx->id : _ITM_noTransactionId;
 }


[patch] libitm: Fix handling of reentrancy in the HTM fastpath

2013-06-19 Thread Torvald Riegel
(Re-sending to the proper list. Sorry for the noise at gcc@.)

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57643

The HTM fastpath didn't handle a situation in which a relaxed
transaction executed unsafe code that in turn starts a transaction; it
simply tried to wait for the other transaction, not checking whether
the current thread started the other transaction.

We fix this by doing this check, and if we have the lock, we just
continue with the fallback serial-mode path instead of using a HW
transaction.  The current code won't do the check before starting a HW
transaction, but this way we can keep the HTM fastpath unchanged; also,
this particular reentrancy is probably infrequent in practice, so I
suppose the small slowdown shouldn't matter much.

Also, I first thought about trying to use the HTM in the reentrancy
case, but this doesn't make any sense because other transactions can't
run anyway, and we should really just finish the serial-mode transaction
as fast as possible.

Peter and/or Andreas: Could you please check that this fixes the bug you
see on Power/s390?  Thanks.

Torvald

commit 185af84e365e1bae31aea5afd6e67e81f3c32c72
Author: Torvald Riegel trie...@redhat.com
Date:   Wed Jun 19 16:42:24 2013 +0200

libitm: Fix handling of reentrancy in the HTM fastpath.

PR libitm/57643

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index 93e702e..a3bf549 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -197,6 +197,8 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
gtm_jmpbuf *jb)
  // We are executing a transaction now.
  // Monitor the writer flag in the serial-mode lock, and abort
  // if there is an active or waiting serial-mode transaction.
+ // Note that this can also happen due to an enclosing
+ // serial-mode transaction; we handle this case below.
  if (unlikely(serial_lock.is_write_locked()))
htm_abort();
  else
@@ -219,6 +221,14 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const 
gtm_jmpbuf *jb)
  tx = new gtm_thread();
  set_gtm_thr(tx);
}
+ // Check whether there is an enclosing serial-mode transaction;
+ // if so, we just continue as a nested transaction and don't
+ // try to use the HTM fastpath.  This case can happen when an
+ // outermost relaxed transaction calls unsafe code that starts
+ // a transaction.
+ if (tx->nesting > 0)
+   break;
+ // Another thread is running a serial-mode transaction.  Wait.
  serial_lock.read_lock(tx);
  serial_lock.read_unlock(tx);
  // TODO We should probably reset the retry count t here, unless


Re: [patch] libitm: Fix handling of reentrancy in the HTM fastpath

2013-06-19 Thread Torvald Riegel
On Wed, 2013-06-19 at 14:43 -0500, Peter Bergner wrote: 
 On Wed, 2013-06-19 at 10:57 -0500, Peter Bergner wrote:
  On Wed, 2013-06-19 at 10:49 -0500, Peter Bergner wrote:
   This is due to the following in _ITM_inTransaction():
   
    47        if (tx && (tx->nesting > 0))
    (gdb) p tx
    $2 = (GTM::gtm_thread *) 0x10901bf0
    (gdb) p tx->nesting
    $3 = 1
    (gdb) step
    49        if (tx->state & gtm_thread::STATE_IRREVOCABLE)
    (gdb) p tx->state
    $4 = 3
    (gdb) p gtm_thread::STATE_IRREVOCABLE
    $5 = 2
    (gdb) step
    50          return inIrrevocableTransaction;
  
  Bah, ignore this.  It's a different call that is returning something other
  than inIrrevocableTransaction.  Unfortunately, gdb is having problems inside
  hw txns and I'm having trouble seeing why/when _ITM_inTransaction() is
  returning something other than inIrrevocableTransaction.  I'll see if I can
  determine why and will report back.
 
 Ok, we return outsideTransaction because the nesting level (tx->nesting)
 is zero.

That's a second bug in libitm, sorry.  Can you try with the attached
patch additionally to the previous one?  Thanks!

Torvald


commit 02dde6bb91107792fb0cb9f5c4785d25b6aa0e3c
Author: Torvald Riegel trie...@redhat.com
Date:   Thu Jun 20 00:46:59 2013 +0200

libitm: Handle HTM fastpath in status query functions.

diff --git a/libitm/config/x86/target.h b/libitm/config/x86/target.h
index 77b627f..063c09e 100644
--- a/libitm/config/x86/target.h
+++ b/libitm/config/x86/target.h
@@ -125,6 +125,13 @@ htm_abort_should_retry (uint32_t begin_ret)
 {
   return begin_ret & _XABORT_RETRY;
 }
+
+/* Returns true iff a hardware transaction is currently being executed.  */
+static inline bool
+htm_transaction_active ()
+{
+  return _xtest() != 0;
+}
 #endif
 
 
diff --git a/libitm/query.cc b/libitm/query.cc
index 5707321..0ac3eda 100644
--- a/libitm/query.cc
+++ b/libitm/query.cc
@@ -43,6 +43,15 @@ _ITM_libraryVersion (void)
 _ITM_howExecuting ITM_REGPARM
 _ITM_inTransaction (void)
 {
+#if defined(USE_HTM_FASTPATH)
+  // If we use the HTM fastpath, we cannot reliably detect whether we are
+  // in a transaction because this function can be called outside of
+  // a transaction and thus we can't deduce this by looking at just the serial
+  // lock.  This function isn't used in practice currently, so the easiest
+  // way to handle it is to just abort.
+  if (htm_transaction_active())
+htm_abort();
+#endif
   struct gtm_thread *tx = gtm_thr();
   if (tx && (tx->nesting > 0))
 {
@@ -58,6 +67,11 @@ _ITM_inTransaction (void)
 _ITM_transactionId_t ITM_REGPARM
 _ITM_getTransactionId (void)
 {
+#if defined(USE_HTM_FASTPATH)
+  // See ITM_inTransaction.
+  if (htm_transaction_active())
+htm_abort();
+#endif
   struct gtm_thread *tx = gtm_thr();
   return (tx && (tx->nesting > 0)) ? tx->id : _ITM_noTransactionId;
 }


Re: [PATCH] Add faster HTM fastpath for libitm TSX v2

2013-01-30 Thread Torvald Riegel
On Tue, 2013-01-29 at 20:19 +0100, Andi Kleen wrote:
  next time please reply to the points raised in a review if you disagree
  with them, and don't just ignore them.  That speeds up the review.
 
 It was all in the previous email on the topic.

This v2 patch did not incorporate all the changes that I requested, nor
did you explicitly object to those you didn't incorporate.  Maybe you
don't care what's in libitm's comments, but I do.

// See gtm_thread::begin_transaction.
   -uint32_t GTM::htm_fastpath = 0;
   +uint32_t GTM::htm_fastpath asm(__gtm_htm_fastpath) = 0;
   +
   +uint32_t *GTM::global_lock asm(__gtm_global_lock);
  
  Rearrange the serial lock's fields (see below).
 
 To my knowledge C++ classes have no guaranteed layout,
 so that's not safe because there is no guarantee where
 the vtable pointers are. it would be only with plain old structs.

And gtm_rwlock doesn't have any virtual members, right?  So it's like a
plain old struct (ie, it's a POD type).

 +  // pr_tryHTM can be set by an assembler fast path when it already
 tried
 +  // a hardware transaction once. In this case we do one retry less.
 
 pr_tryHTM is already documented.

And I asked you to simply reference this in a comment.  What's the
problem with that?  I do not want people working on libitm to have to
grep through the code for uses of pr_tryHTM and look for comments that
might be related to it just because you don't want to put a simple
reference into the comment at the declaration of pr_tryHTM.  Can't you
see that adding the reference is just plain useful for everybody else?


Torvald



Re: [PATCH] Add faster HTM fastpath for libitm TSX

2013-01-29 Thread Torvald Riegel
On Thu, 2013-01-24 at 17:45 +0100, Andi Kleen wrote:
  
   +  uint32_t *__gtm_global_lock;
  
  Same, and rearrange the serial lock's fields (see below).
 
 My understanding is that C++ makes no guarantees of class layout.
 That is why i ended up not relying on the layout.

gtm_rwlock is a POD type, so you get the guarantees that C would
provide.
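
(A small way to make the layout assumption explicit -- my sketch, not part
of the patch: assert that the lock type is standard-layout, which is what
lets assembler code rely on the address of the exported lock word.)

  #include <type_traits>

  // Hypothetical stand-in for gtm_rwlock's data members, for illustration.
  struct rwlock_like
  {
    unsigned int summary;
    unsigned int a_readers, w_readers, w_writers;
  };

  static_assert (std::is_standard_layout<rwlock_like>::value,
                 "assembler code assumes a C-compatible layout");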

Torvald



Re: [PATCH] Add faster HTM fastpath for libitm TSX v2

2013-01-29 Thread Torvald Riegel
On Thu, 2013-01-24 at 19:30 -0800, Andi Kleen wrote:
 From: Andi Kleen a...@linux.intel.com
 
 The libitm TSX hardware transaction fast path currently does quite a bit of
 unnecessary work (saving registers etc.) before even trying to start
 a hardware transaction. This patch moves the initial attempt at a
 transaction early into the assembler stub. Complicated work like retries
 is still done in C. So this is essentially a fast path for the fast
 path.
 
 The assembler code doesn't understand the layout of serial_lock, but
 it needs to check that serial_lock is free. We export just the lock
 variable as a separate pointer for this.
 
 The assembler code controls the HTM fast path with a new pr_tryHTM flag.
 
 I had to reorganize the retry loop slightly so that the waiting for the
 lock to be free again is done first (because the first transaction
 is already done elsewhere). This makes the patch look larger than
 it really is. This just moves one basic block around.
 
 Passes bootstrap/testsuite on x86_64
 
 v2: Use asm() to supply assembler name of namespaced variables
 Make comments more verbose.

Andi,

next time please reply to the points raised in a review if you disagree
with them, and don't just ignore them.  That speeds up the review.

 libitm/:
 2013-01-24  Andi Kleen  a...@linux.intel.com
 
   * beginend.cc (GTM::htm_fastpath): Add asm alias to
   __gtm_htm_fastpath
   (GTM::global_lock): Add.
   (begin_transaction): Check pr_tryHTM. Remove one iteration
   from HTM retry loop. Move lock free check to the beginning of
   loop body.
   * config/linux/rwlock.h (gtm_rwlock::get_lock_var): Add.
   * config/posix/rwlock.h (tm_rwlock::get_lock_var): Add.
   * config/x86/sjlj.S: Include config.h.
   (pr_tryHTM, pr_uninstrumentedCode, pr_hasNoAbort,
a_runInstrumentedCode, a_runUninstrumentedCode,
_XABORT_EXPLICIT, _XABORT_RETRY): Add defines.
   (_ITM_beginTransaction): Add HTM fast path.
   * libitm.h (pr_tryHTM): Add.
   * libitm_i.h (GTM::htm_fastpath): Add asm alias.
   (GTM::gtm_global_lock): Add.
   * method-serial.cc (method_group::init, fini): Initialize
   GTM::global_lock and GTM::htm_fastpath.
 ---
  libitm/beginend.cc   |   46 +++--
  libitm/config/linux/rwlock.h |5 +
  libitm/config/posix/rwlock.h |5 +
  libitm/config/x86/sjlj.S |   47 
 ++
  libitm/libitm.h  |7 +--
  libitm/libitm_i.h|3 ++-
  libitm/method-serial.cc  |2 ++
  7 files changed, 92 insertions(+), 23 deletions(-)
 
 diff --git a/libitm/beginend.cc b/libitm/beginend.cc
 index 4369946..118920b 100644
 --- a/libitm/beginend.cc
 +++ b/libitm/beginend.cc
 @@ -55,7 +55,9 @@ static pthread_key_t thr_release_key;
  static pthread_once_t thr_release_once = PTHREAD_ONCE_INIT;
  
  // See gtm_thread::begin_transaction.
 -uint32_t GTM::htm_fastpath = 0;
  +uint32_t GTM::htm_fastpath asm("__gtm_htm_fastpath") = 0;
  +
  +uint32_t *GTM::global_lock asm("__gtm_global_lock");

Rearrange the serial lock's fields (see below).

  /* Allocate a transaction structure.  */
  void *
 @@ -187,27 +189,13 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, 
 const gtm_jmpbuf *jb)
// indeed in serial mode, and HW transactions should never need serial mode
// for any internal changes (e.g., they never abort visibly to the STM code
// and thus do not trigger the standard retry handling).

Add a comment as second paragraph in the previous block of comments,
briefly explaining when pr_tryHTM is set (ie, under which condition),
and that this happens in the assembler implementation of
_ITM_beginTransaction.

  -  if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
  +  // pr_tryHTM can be set by an assembler fast path when it already tried
  +  // a hardware transaction once. In this case we do one retry less.
  +  if (likely(prop & pr_tryHTM))
  {
 -  for (uint32_t t = htm_fastpath; t; t--)
 +  // One transaction has been already tried by the assembler fast path.
 +  for (uint32_t t = htm_fastpath - 1; t; t--)
   {
 -   uint32_t ret = htm_begin();
 -   if (htm_begin_success(ret))
 - {
 -   // We are executing a transaction now.
 -   // Monitor the writer flag in the serial-mode lock, and abort
 -   // if there is an active or waiting serial-mode transaction.
 -   if (unlikely(serial_lock.is_write_locked()))
 - htm_abort();
 -   else
 - // We do not need to set a_saveLiveVariables because of HTM.
  - return (prop & pr_uninstrumentedCode) ?
 - a_runUninstrumentedCode : a_runInstrumentedCode;
 - }
 -   // The transaction has aborted.  Don't retry if it's unlikely that
 -   // retrying the transaction will be successful.
 -   if (!htm_abort_should_retry(ret))
 - break;
 // 

Re: [PATCH] Add faster HTM fastpath for libitm TSX

2013-01-24 Thread Torvald Riegel
On Sat, 2013-01-12 at 13:03 -0800, Andi Kleen wrote:
 From: Andi Kleen a...@linux.intel.com
 
 The libitm TSX hardware transaction fast path currently does quite a bit of
 unnecessary work (saving registers etc.) before even trying to start
 a hardware transaction. This patch moves the initial attempt at a
 transaction early into the assembler stub. Complicated work like retries
 is still done in C. So this is essentially a fast path for the fast
 path.
 
 The assembler code doesn't understand the layout of serial_lock, but
 it needs to check that serial_lock is free. We export just the lock
 variable as a separate pointer for this.
 
 The assembler code controls the HTM fast path with a new pr_tryHTM flag.
 
 I had to reorganize the retry loop slightly so that the waiting for the
 lock to be free again is done first (because the first transaction
 is already done elsewhere). This makes the patch look larger than
 it really is. This just moves one basic block around.
 
 Passes bootstrap/testsuite on x86_64
 
 libitm/:
 2013-01-12  Andi Kleen  a...@linux.intel.com
 
   * beginend.cc (GTM::htm_fastpath): Remove.
   (__gtm_htm_fastpath, __gtm_global_lock): Add.
   (begin_transaction): Check pr_tryHTM. Remove one iteration
   from HTM retry loop. Move lock free check to the beginning of
   loop body.
   (_ITM_commitTransaction): Check __gtm_htm_fastpath instead of
   htm_fastpath.
   (_ITM_commitTransactionEH): dito.
   * config/linux/rwlock.h (gtm_rwlock::get_lock_var): Add.
   * config/posix/rwlock.h (tm_rwlock::get_lock_var): Add.
   * config/x86/sjlj.S: Include config.h.
   (pr_tryHTM, pr_uninstrumentedCode, pr_hasNoAbort,
a_runInstrumentedCode, a_runUninstrumentedCode,
_XABORT_EXPLICIT, _XABORT_RETRY): Add defines.
   (_ITM_beginTransaction): Add HTM fast path.
   * libitm.h (pr_tryHTM): Add.
   * libitm_i.h (GTM::htm_fastpath): Remove.
   (__gtm_htm_fastpath, __gtm_global_lock): Add.
   * method-serial.cc (method_group::init, fini): Initialize
   __gtm_global_lock and __gtm_htm_fastpath.
   (gtm_thread::serialirr_mode): Check __gtm_htm_fastpath.
 ---
  libitm/beginend.cc   |   50 
 +++---
  libitm/config/linux/rwlock.h |5 +
  libitm/config/posix/rwlock.h |5 +
  libitm/config/x86/sjlj.S |   47 +++
  libitm/libitm.h  |3 ++-
  libitm/libitm_i.h|7 +++---
  libitm/method-serial.cc  |8 ---
  7 files changed, 96 insertions(+), 29 deletions(-)
 
 diff --git a/libitm/beginend.cc b/libitm/beginend.cc
 index 4369946..96ee1b3 100644
 --- a/libitm/beginend.cc
 +++ b/libitm/beginend.cc
 @@ -55,7 +55,11 @@ static pthread_key_t thr_release_key;
  static pthread_once_t thr_release_once = PTHREAD_ONCE_INIT;
  
  // See gtm_thread::begin_transaction.
 -uint32_t GTM::htm_fastpath = 0;
 +extern "C"
 +{
 +  uint32_t __gtm_htm_fastpath = 0;

Please keep this as-is and rather use the same approach as for
gtm_thread::begin_transaction.
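
For illustration, a minimal sketch of that approach -- pinning the assembler-level name with an asm specifier while keeping the symbol in its C++ scope, the way gtm_thread::begin_transaction is exposed to the assembler stub.  The names below are assumptions for the sketch, not libitm's actual declarations:

#include <stdint.h>

namespace GTM
{
  // C++ code keeps using GTM::htm_fastpath; the assembler and linker see
  // the fixed label given by the asm specifier instead of a mangled name.
  extern uint32_t htm_fastpath __asm__ ("GTM_htm_fastpath");
}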

 +  uint32_t *__gtm_global_lock;

Same, and rearrange the serial lock's fields (see below).
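
A rough sketch of the kind of rearrangement meant here (assumed field names and layout, not the real gtm_rwlock): the word the HTM fast path has to poll goes first, so the assembler can test it at offset 0 through the lock object itself instead of a separately exported pointer.

#include <atomic>

class gtm_rwlock_sketch
{
public:
  // Polled by the HTM fast path in _ITM_beginTransaction; offset 0.
  std::atomic<int> writers;
  // Only the slow path touches the remaining fields.
  std::atomic<int> writer_readers;
  std::atomic<int> readers;
};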

 +};
  
  /* Allocate a transaction structure.  */
  void *
 @@ -187,27 +191,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
// indeed in serial mode, and HW transactions should never need serial mode
// for any internal changes (e.g., they never abort visibly to the STM code
// and thus do not trigger the standard retry handling).

Add a comment as second paragraph in the previous block of comments,
briefly explaining when pr_tryHTM is set (ie, under which condition),
and that this happens in the assembler implementation of
_ITM_beginTransaction.
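
Possible wording for such a comment, inferred from the patch description (an assumption about the intent, not committed text):

// pr_tryHTM is set by the assembler implementation of _ITM_beginTransaction
// when the HTM fast path is enabled and one hardware transaction has
// already been attempted there; the retry loop below therefore starts at
// htm_fastpath - 1.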

 -  if (likely(htm_fastpath && (prop & pr_hasNoAbort)))
 +  if (likely(prop & pr_tryHTM))
  {
 -  for (uint32_t t = htm_fastpath; t; t--)
 +  // One transaction has been already tried by the assembler fast path.
 +  for (uint32_t t = __gtm_htm_fastpath - 1; t; t--)
   {
 -   uint32_t ret = htm_begin();
 -   if (htm_begin_success(ret))
 - {
 -   // We are executing a transaction now.
 -   // Monitor the writer flag in the serial-mode lock, and abort
 -   // if there is an active or waiting serial-mode transaction.
 -   if (unlikely(serial_lock.is_write_locked()))
 - htm_abort();
 -   else
 - // We do not need to set a_saveLiveVariables because of HTM.
 - return (prop & pr_uninstrumentedCode) ?
 - a_runUninstrumentedCode : a_runInstrumentedCode;
 - }
 -   // The transaction has aborted.  Don't retry if it's unlikely that
 -   // retrying the transaction will be successful.
 -   if (!htm_abort_should_retry(ret))
 - break;
 // Wait until any concurrent serial-mode transactions have finished.
 // This is an empty critical 

Re: [PATCH] Make some asan builtins tm_pure (PR sanitizer/55508)

2013-01-24 Thread Torvald Riegel
On Mon, 2012-12-17 at 13:52 +0400, Dmitry Vyukov wrote:
 resend in plain text
 
 On Mon, Dec 17, 2012 at 1:50 PM, Dmitry Vyukov dvyu...@google.com wrote:
 
  On Fri, Dec 14, 2012 at 5:43 PM, Torvald Riegel trie...@redhat.com wrote:
   On Thu, 2012-12-13 at 10:02 +0100, Jakub Jelinek wrote:
   On Thu, Dec 13, 2012 at 10:38:13AM +0400, Dmitry Vyukov wrote:
On Wed, Dec 12, 2012 at 11:50 PM, Jakub Jelinek ja...@redhat.com 
wrote:
 Various TM tests ICE when built with -fgnu-tm -fsanitize=address.
 The problem is that asan.c pass adds calls to builtins that weren't 
 there
 before and TM is upset about it.  The __asan_report* are all like
 abort, in correctly written program they shouldn't have a user 
 visible
 effect, in bad program they will terminate the process, but in any 
 case
 it doesn't matter how many times they are retried as part of a 
 transaction,
 there is no state to roll back on transaction cancellation.
 __asan_handle_no_return, while not being noreturn, just marks the 
 stack as
 unprotected, so again in correctly written application no effect, in 
 bad app
 might result in some issues being undetected, but still, it can be 
 done many
 times and isn't irreversible.
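
For readers unfamiliar with the attribute, a small sketch of what "tm_pure" means at the source level (illustrative only, compiled with -fgnu-tm; the patch itself marks the builtins inside the compiler rather than via the attribute):

extern "C" void report_and_die (const char *msg)
  __attribute__ ((transaction_pure, noreturn));

void checked_store (int *p)
{
  __transaction_atomic
  {
    if (!p)
      report_and_die ("null pointer");  // nothing for the TM to roll back
    *p = 42;
  }
}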
   
I was only loosely following tm-languages discussions. Does gcc tm
model guarantee strong consistency for all memory accesses? I mean
there are tm implementations that allow transient inconsistencies,
  
   Will leave this to Torvald.
  
   This has two parts: (1) how TM fits into the C++11/C11 memory model, and
   (2) which guarantees the compiler and the TM runtime library agree on at
   the level of the TM ABI that we use in GCC.
  
   Regarding the first part, all transactions provide guarantees similar to
   a global lock.  Specifically, there are virtual transaction start/end
   events that take part in sequenced-before (similar to acquisition and
   release of the global lock), which are then guaranteed to execute
   (without transactions interleaving with each other) in a global total
   order (let's call this order TO).  TO then contributes to
   happens-before.
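
As a rough mental model of this paragraph (a sketch, not how libitm is implemented): each atomic transaction behaves as if it acquired and released one process-wide mutex, and those as-if lock operations provide the sequenced-before/happens-before edges described above.

#include <mutex>

std::mutex tm_as_if_global_lock;   // illustrative stand-in for the TO order

void transaction_as_if ()
{
  std::lock_guard<std::mutex> g (tm_as_if_global_lock);
  // ... body of a __transaction_atomic block, as if under the global lock ...
}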
  
   On the ABI level, the TM runtime is allowed to execute speculatively as
   long as (1) it does not expose any speculative execution to
   nontransactional code and (2) speculation doesn't violate the language's
   as-if rules (i.e., no visible side effects other than the abstract
   machine; no signals due to seg faults etc.).  This means that, for
   example, the TM will not access data at a wider granularity than what
   nontransactional code that conforms to the C++11 memory model would do.
   Second, transactions can have a tentative position in TO, but they will
   only expose final TO choices to nontxnal code.  So, each transaction
   will execute code that would be valid when executed in isolation -- but
   it is possible that several noncommitted transactions are
   in flight concurrently that may conflict with each other.  The TM
   ensures that such speculative execution is safe and not visible to a
   race-free program.  So, when a transaction commits at a certain position
   in TO, it will make sure that all other active transactions reach
   consensus on TO (up to this position) before it returns to the execution
   of nontxnal code.  Those active transactions will either see that they
   would be still valid at a new TO position (after the committing txn), or
   they abort and then signal that they agree to this TO.  That means that
   the nontxnal code will not be affected by any speculative execution,
   even if the prior txn privatized some data (this is called privatization
   safety in TM jargon).
   The sort-of counterpart to privatization safety is publication safety
   (ie, transactions can safely make some data shared).  While
   privatization safety is handled by the TM runtime, publication safety is
   ensured by the compiler by not allowing any dangerous load/load
   reordering, basically.  Publication safety is not yet always ensured,
   but Aldy is working on this.
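
A sketch of the classic privatization pattern these guarantees protect (illustrative code assuming -fgnu-tm; the names are made up): after the transaction commits, *n is private to this thread, so the nontransactional accesses must not observe speculative writes from other, still-running transactions.

struct node { int data; node *next; };
node *shared_list;              // accessed transactionally by other threads

void consume_one ()
{
  node *n;
  __transaction_atomic
  {
    n = shared_list;
    if (n)
      shared_list = n->next;    // unlink: privatizes *n
  }
  if (n)
    {
      n->data = 0;              // nontxnal use; relies on privatization safety
      delete n;
    }
}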
  
that are detected later and trx is restarted. Can't asan trigger false
positives in this case?
  
   I can't imagine any.
  
   I also don't see any, because all loads/stores of all transactions, even
   those with a tentative TO position, would be a valid execution.  For
   malloc/free in transactions, AFAIR, ASAN hooks into malloc/free
   directly, so when libitm handles those including rollback of an
   allocation, there shouldn't be false positives.  However, there might be
   false negatives in case of free() because deallocations are deferred
   until transaction commit (they can't be rolled back).  Could we tell
   ASAN about free() directly using a call added by the TM instrumentation
   pass?  That would at least help catch the false negatives for all
   free()-like functions that the compiler knows about.
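
Purely as a sketch of that idea (hypothetical hook names, not libitm code; __asan_poison_memory_region/__asan_unpoison_memory_region are the real ASan interface): when a transactional free() is deferred until commit, the block could be poisoned right away so ASan still flags uses between the "logical" free and the commit, and unpoisoned again if the transaction aborts and the deferred free is dropped.

#include <sanitizer/asan_interface.h>
#include <stddef.h>

void itm_note_deferred_free (void *ptr, size_t size)    // hypothetical hook
{
  // Mark the block unaddressable now; the real free happens at commit.
  __asan_poison_memory_region (ptr, size);
}

void itm_cancel_deferred_free (void *ptr, size_t size)  // hypothetical hook
{
  // The transaction aborted, so the deferred free is dropped.
  __asan_unpoison_memory_region (ptr, size);
}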
 
  I do not think
