Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-10-13 Thread Tom Lane
Noah Misch  writes:
> On Wed, Oct 09, 2019 at 01:15:29PM -0400, Tom Lane wrote:
>> * I still think that the added configure test is a waste of build cycles.
>> It'd be sufficient to test "#ifdef HAVE__BUILTIN_CONSTANT_P" where you
>> are testing HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P, because our previous
>> buildfarm go-round with this showed that all supported compilers
>> interpret "i" this way.

> xlc does not interpret "i" that way:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet=2019-09-14%2016%3A42%3A32=config

Hm, how'd I miss that while looking at the buildfarm results?  Anyway,
you're clearly right on this one --- objection withdrawn.

>> * I really dislike building the asm calls with macros as you've done
>> here.

> For a macro local to one C file, I think readability is the relevant metric.
> In particular, it would be wrong to add arguments to make these more like
> header file macros.  I think the macros make the code somewhat more readable,
> and you think they make the code less readable.  I have removed the macros.

Personally I definitely find this way more readable, though I agree
beauty is in the eye of the beholder.

I've checked that this patch set compiles and passes regression tests
on my old Apple machines, so it's good to go as far as I can see.

regards, tom lane




Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-10-12 Thread Noah Misch
On Wed, Oct 09, 2019 at 01:15:29PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Mon, Oct 07, 2019 at 03:06:35PM -0400, Tom Lane wrote:
> >> This still fails on Apple's compilers. ...
> 
> > Thanks for testing.  That error boils down to "need to use some other
> > register".  The second operand of addi is one of the ppc instruction 
> > operands
> > that can hold a constant zero or a register number[1], so the proper
> > constraint is "b".  I've made it so and added a comment.
> 
> Ah-hah.  This version does compile and pass check-world for me.
> 
> > I should probably
> > update s_lock.h, too, in a later patch.  I don't know how it has
> > mostly-avoided this failure mode, but its choice of constraint could explain
> > https://postgr.es/m/flat/36E70B06-2C5C-11D8-A096-0005024EF27F%40ifrance.com
> 
> Indeed.  It's a bit astonishing that more people haven't hit that.
> This should be back-patched.

I may as well do that first, so there's no time when s_lock.h disagrees with
arch-ppc.h about the constraint to use.  I'm attaching that patch, too.

> * I still think that the added configure test is a waste of build cycles.
> It'd be sufficient to test "#ifdef HAVE__BUILTIN_CONSTANT_P" where you
> are testing HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P, because our previous
> buildfarm go-round with this showed that all supported compilers
> interpret "i" this way.

xlc does not interpret "i" that way:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet=2019-09-14%2016%3A42%3A32=config

> * I really dislike building the asm calls with macros as you've done
> here.  The macros violate project style, and are not remotely general-
> purpose, because they have hard-wired references to variables that are
> not in their argument lists.  While that could be fixed with more
> arguments, I don't think that the approach is readable or maintainable
> --- it's impossible for example to understand the register constraints
> without looking simultaneously at the calls and the macro definition.
> And, as we've seen in this "b" issue, the interactions between the chosen
> instruction types and the constraints are subtle enough to make me wonder
> whether you won't need even more arguments to allow some of the other
> constraints to be variable.  I think it'd be far better just to write out
> the asm in-line and accept the not-very-large amount of code duplication
> you'd get.

For a macro local to one C file, I think readability is the relevant metric.
In particular, it would be wrong to add arguments to make these more like
header file macros.  I think the macros make the code somewhat more readable,
and you think they make the code less readable.  I have removed the macros.

> * src/tools/pginclude/headerscheck needs the same adjustment as you
> made in cpluspluscheck.

Done.
diff --git a/configure b/configure
index 74627a7..02a905b 100755
--- a/configure
+++ b/configure
@@ -14518,6 +14518,46 @@ $as_echo "$pgac_cv_have_ppc_mutex_hint" >&6; }
 $as_echo "#define HAVE_PPC_LWARX_MUTEX_HINT 1" >>confdefs.h
 
 fi
+# Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether 
__builtin_constant_p(x) implies \"i\"(x) acceptance" >&5
+$as_echo_n "checking whether __builtin_constant_p(x) implies \"i\"(x) 
acceptance... " >&6; }
+if ${pgac_cv_have_i_constraint__builtin_constant_p+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+static inline int
+ addi(int ra, int si)
+ {
+ int res = 0;
+ if (__builtin_constant_p(si))
+ __asm__ __volatile__(
+ " addi %0,%1,%2\n" : "=r"(res) : "b"(ra), "i"(si));
+ return res;
+ }
+ int test_adds(int x) { return addi(3, x) + addi(x, 5); }
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_have_i_constraint__builtin_constant_p=yes
+else
+  pgac_cv_have_i_constraint__builtin_constant_p=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$pgac_cv_have_i_constraint__builtin_constant_p" >&5
+$as_echo "$pgac_cv_have_i_constraint__builtin_constant_p" >&6; }
+if test x"$pgac_cv_have_i_constraint__builtin_constant_p" = xyes ; then
+
+$as_echo "#define HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P 1" >>confdefs.h
+
+fi
   ;;
 esac
 
diff --git a/configure.in b/configure.in
index 0d16c1a..88d3a59 100644
--- a/configure.in
+++ b/configure.in
@@ -1539,6 +1539,26 @@ case $host_cpu in
 if test x"$pgac_cv_have_ppc_mutex_hint" = xyes ; then
AC_DEFINE(HAVE_PPC_LWARX_MUTEX_HINT, 1, [Define to 1 if the assembler 
supports PPC's LWARX mutex hint bit.])
 fi
+# Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+AC_CACHE_CHECK([whether __builtin_constant_p(x) implies "i"(x) acceptance],
+   

Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-10-09 Thread Tom Lane
Noah Misch  writes:
> On Mon, Oct 07, 2019 at 03:06:35PM -0400, Tom Lane wrote:
>> This still fails on Apple's compilers. ...

> Thanks for testing.  That error boils down to "need to use some other
> register".  The second operand of addi is one of the ppc instruction operands
> that can hold a constant zero or a register number[1], so the proper
> constraint is "b".  I've made it so and added a comment.

Ah-hah.  This version does compile and pass check-world for me.

> I should probably
> update s_lock.h, too, in a later patch.  I don't know how it has
> mostly-avoided this failure mode, but its choice of constraint could explain
> https://postgr.es/m/flat/36E70B06-2C5C-11D8-A096-0005024EF27F%40ifrance.com

Indeed.  It's a bit astonishing that more people haven't hit that.
This should be back-patched.


Now that the patch passes mechanical checks, I took a closer look,
and there are some things I don't like:

* I still think that the added configure test is a waste of build cycles.
It'd be sufficient to test "#ifdef HAVE__BUILTIN_CONSTANT_P" where you
are testing HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P, because our previous
buildfarm go-round with this showed that all supported compilers
interpret "i" this way.

* I really dislike building the asm calls with macros as you've done
here.  The macros violate project style, and are not remotely general-
purpose, because they have hard-wired references to variables that are
not in their argument lists.  While that could be fixed with more
arguments, I don't think that the approach is readable or maintainable
--- it's impossible for example to understand the register constraints
without looking simultaneously at the calls and the macro definition.
And, as we've seen in this "b" issue, the interactions between the chosen
instruction types and the constraints are subtle enough to make me wonder
whether you won't need even more arguments to allow some of the other
constraints to be variable.  I think it'd be far better just to write out
the asm in-line and accept the not-very-large amount of code duplication
you'd get.

* src/tools/pginclude/headerscheck needs the same adjustment as you
made in cpluspluscheck.

regards, tom lane




Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-10-09 Thread Noah Misch
On Mon, Oct 07, 2019 at 03:06:35PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > [ fetch-add-gcc-xlc-unify-v2.patch ]
> 
> This still fails on Apple's compilers.  The first failure I get is
> 
> ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith 
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
> -Wformat-security -fno-strict-aliasing -fwrapv -g -O2 -I../../../src/include 
> -I/usr/local/include -isysroot /Developer/SDKs/MacOSX10.5.sdk-c -o 
> nodeHashjoin.o nodeHashjoin.c
> /var/tmp//ccXUM8ep.s:449:Parameter error: r0 not allowed for parameter 2 
> (code as 0 not r0)
> 
> Line 449 of the assembly file is the addi in
> 
> LM87:
> sync
> lwarx   r0,0,r2
> addir11,r0,1
> stwcx.  r11,0,r2
> bne $-12
> isync
> 
> which I suppose comes out of PG_PPC_FETCH_ADD.  I find this idea of
> constructing assembly code by string-pasting pretty unreadable and am not
> tempted to try to debug it, but I don't immediately see why this doesn't
> work when the existing s_lock.h code does.  I think that the assembler
> error message is probably misleading: while it seems to be saying to
> s/r0/0/ in the addi, gcc consistently uses "rN" syntax for the second
> parameter elsewhere.  I do note that gcc never generates r0 as addi's
> second parameter in several files I checked through, so maybe what it
> means is "you need to use some other register"?  (Which would imply that
> the constraint for this asm argument is too loose.)

Thanks for testing.  That error boils down to "need to use some other
register".  The second operand of addi is one of the ppc instruction operands
that can hold a constant zero or a register number[1], so the proper
constraint is "b".  I've made it so and added a comment.  I should probably
update s_lock.h, too, in a later patch.  I don't know how it has
mostly-avoided this failure mode, but its choice of constraint could explain
https://postgr.es/m/flat/36E70B06-2C5C-11D8-A096-0005024EF27F%40ifrance.com

> I'm also wondering why this isn't following s_lock.h's lead as to
> USE_PPC_LWARX_MUTEX_HINT and USE_PPC_LWSYNC.

Most or all of today's pg_atomic_compare_exchange_u32() usage does not have
the property that the mutex hint would signal.

pg_atomic_compare_exchange_u32() specifies "Full barrier semantics", which
lwsync does not provide.  We might want to relax the specification to make
lwsync acceptable, but that would be a separate, architecture-independent
project.  (generic-gcc.h:pg_atomic_compare_exchange_u32_impl() speculates
along those lines, writing "FIXME: we can probably use a lower consistency
model".)


[1] "Notice that addi and addis use the value 0, not the contents of GPR 0, if
RA=0." -- https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
diff --git a/configure b/configure
index 7a6bfc2..d6ecb33 100755
--- a/configure
+++ b/configure
@@ -14650,6 +14650,46 @@ $as_echo "$pgac_cv_have_ppc_mutex_hint" >&6; }
 $as_echo "#define HAVE_PPC_LWARX_MUTEX_HINT 1" >>confdefs.h
 
 fi
+# Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether 
__builtin_constant_p(x) implies \"i\"(x) acceptance" >&5
+$as_echo_n "checking whether __builtin_constant_p(x) implies \"i\"(x) 
acceptance... " >&6; }
+if ${pgac_cv_have_i_constraint__builtin_constant_p+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+static inline int
+ addi(int ra, int si)
+ {
+ int res = 0;
+ if (__builtin_constant_p(si))
+ __asm__ __volatile__(
+ " addi %0,%1,%2\n" : "=r"(res) : "r"(ra), "i"(si));
+ return res;
+ }
+ int test_adds(int x) { return addi(3, x) + addi(x, 5); }
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_have_i_constraint__builtin_constant_p=yes
+else
+  pgac_cv_have_i_constraint__builtin_constant_p=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$pgac_cv_have_i_constraint__builtin_constant_p" >&5
+$as_echo "$pgac_cv_have_i_constraint__builtin_constant_p" >&6; }
+if test x"$pgac_cv_have_i_constraint__builtin_constant_p" = xyes ; then
+
+$as_echo "#define HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P 1" >>confdefs.h
+
+fi
   ;;
 esac
 
diff --git a/configure.in b/configure.in
index dde3eec..510bd76 100644
--- a/configure.in
+++ b/configure.in
@@ -1541,6 +1541,26 @@ case $host_cpu in
 if test x"$pgac_cv_have_ppc_mutex_hint" = xyes ; then
AC_DEFINE(HAVE_PPC_LWARX_MUTEX_HINT, 1, [Define to 1 if the assembler 
supports PPC's LWARX mutex hint bit.])
 fi
+# Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+AC_CACHE_CHECK([whether __builtin_constant_p(x) implies "i"(x) acceptance],
+   

Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-10-07 Thread Tom Lane
Noah Misch  writes:
> [ fetch-add-gcc-xlc-unify-v2.patch ]

This still fails on Apple's compilers.  The first failure I get is

ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv -g -O2 -I../../../src/include 
-I/usr/local/include -isysroot /Developer/SDKs/MacOSX10.5.sdk-c -o 
nodeHashjoin.o nodeHashjoin.c
/var/tmp//ccXUM8ep.s:449:Parameter error: r0 not allowed for parameter 2 (code 
as 0 not r0)

Line 449 of the assembly file is the addi in

LM87:
sync
lwarx   r0,0,r2
addir11,r0,1
stwcx.  r11,0,r2
bne $-12
isync

which I suppose comes out of PG_PPC_FETCH_ADD.  I find this idea of
constructing assembly code by string-pasting pretty unreadable and am not
tempted to try to debug it, but I don't immediately see why this doesn't
work when the existing s_lock.h code does.  I think that the assembler
error message is probably misleading: while it seems to be saying to
s/r0/0/ in the addi, gcc consistently uses "rN" syntax for the second
parameter elsewhere.  I do note that gcc never generates r0 as addi's
second parameter in several files I checked through, so maybe what it
means is "you need to use some other register"?  (Which would imply that
the constraint for this asm argument is too loose.)

I'm also wondering why this isn't following s_lock.h's lead as to
USE_PPC_LWARX_MUTEX_HINT and USE_PPC_LWSYNC.

regards, tom lane




Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-10-05 Thread Noah Misch
On Sat, Aug 31, 2019 at 03:30:26PM -0700, Noah Misch wrote:
> On Sat, Aug 31, 2019 at 02:27:55PM -0400, Tom Lane wrote:
> > Noah Misch  writes:
> > > Done.  fetch-add-variable-test-v1.patch just adds tests for non-constant
> > > addends and 16-bit edge cases.  Today's implementation handles those,
> > > PostgreSQL doesn't use them, and I might easily have broken them.
> > > fetch-add-xlc-asm-v1.patch moves xlc builds from the __fetch_and_add()
> > > intrinsic to inline asm.  fetch-add-gcc-xlc-unify-v1.patch moves 
> > > fetch_add to
> > > inline asm for all other ppc compilers.  gcc-7.2.0 generates equivalent 
> > > code
> > > before and after.  I plan to keep the third patch HEAD-only, 
> > > back-patching the
> > > other two.  I tested with xlc v12 and v13.
> > 
> > Hm, no objection to the first two patches, but I don't understand
> > why the third patch goes to so much effort just to use "addi" rather
> > than (one assumes) "li" then "add"?  It doesn't seem likely that
> > that's buying much.
> 
> Changing an addi to li+add may not show up on benchmarks, but I can't claim
> it's immaterial.  I shouldn't unify the code if that makes the compiled code
> materially worse than what the gcc intrinsics produce today, hence the
> nontrivial (~50 line) bits to match the intrinsics' capabilities.

The first two patches have worked so far, but fetch-add-gcc-xlc-unify-v1.patch
broke older gcc: https://postgr.es/m/flat/7517.1568470...@sss.pgh.pa.us.  That
side thread settled on putting pg_atomic_compare_exchange_u{32,64}_impl
implementations in arch-ppc.h.  Attached fetch-add-gcc-xlc-unify-v2.patch does
so.  For compare_exchange, the generated code doesn't match gcc intrinsics
exactly; code comments discuss this.  Compared to v1, its other change is to
extract common __asm__ templates into macros instead of repeating them four
times (32/64bit, variable/const).
diff --git a/configure b/configure
index 7a6bfc2..d6ecb33 100755
--- a/configure
+++ b/configure
@@ -14650,6 +14650,46 @@ $as_echo "$pgac_cv_have_ppc_mutex_hint" >&6; }
 $as_echo "#define HAVE_PPC_LWARX_MUTEX_HINT 1" >>confdefs.h
 
 fi
+# Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether 
__builtin_constant_p(x) implies \"i\"(x) acceptance" >&5
+$as_echo_n "checking whether __builtin_constant_p(x) implies \"i\"(x) 
acceptance... " >&6; }
+if ${pgac_cv_have_i_constraint__builtin_constant_p+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+static inline int
+ addi(int ra, int si)
+ {
+ int res = 0;
+ if (__builtin_constant_p(si))
+ __asm__ __volatile__(
+ " addi %0,%1,%2\n" : "=r"(res) : "r"(ra), "i"(si));
+ return res;
+ }
+ int test_adds(int x) { return addi(3, x) + addi(x, 5); }
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_have_i_constraint__builtin_constant_p=yes
+else
+  pgac_cv_have_i_constraint__builtin_constant_p=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$pgac_cv_have_i_constraint__builtin_constant_p" >&5
+$as_echo "$pgac_cv_have_i_constraint__builtin_constant_p" >&6; }
+if test x"$pgac_cv_have_i_constraint__builtin_constant_p" = xyes ; then
+
+$as_echo "#define HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P 1" >>confdefs.h
+
+fi
   ;;
 esac
 
diff --git a/configure.in b/configure.in
index dde3eec..510bd76 100644
--- a/configure.in
+++ b/configure.in
@@ -1541,6 +1541,26 @@ case $host_cpu in
 if test x"$pgac_cv_have_ppc_mutex_hint" = xyes ; then
AC_DEFINE(HAVE_PPC_LWARX_MUTEX_HINT, 1, [Define to 1 if the assembler 
supports PPC's LWARX mutex hint bit.])
 fi
+# Check if compiler accepts "i"(x) when __builtin_constant_p(x).
+AC_CACHE_CHECK([whether __builtin_constant_p(x) implies "i"(x) acceptance],
+   [pgac_cv_have_i_constraint__builtin_constant_p],
+[AC_COMPILE_IFELSE([AC_LANG_PROGRAM(
+[static inline int
+ addi(int ra, int si)
+ {
+ int res = 0;
+ if (__builtin_constant_p(si))
+ __asm__ __volatile__(
+ " addi %0,%1,%2\n" : "=r"(res) : "r"(ra), "i"(si));
+ return res;
+ }
+ int test_adds(int x) { return addi(3, x) + addi(x, 5); }], [])],
+[pgac_cv_have_i_constraint__builtin_constant_p=yes],
+[pgac_cv_have_i_constraint__builtin_constant_p=no])])
+if test x"$pgac_cv_have_i_constraint__builtin_constant_p" = xyes ; then
+  AC_DEFINE(HAVE_I_CONSTRAINT__BUILTIN_CONSTANT_P, 1,
+[Define to 1 if __builtin_constant_p(x) implies "i"(x) 
acceptance.])
+fi
   ;;
 esac
 
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 512213a..9be84fc 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -332,6 +332,9 @@
 /* 

Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-08-31 Thread Noah Misch
On Sat, Aug 31, 2019 at 02:27:55PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > Done.  fetch-add-variable-test-v1.patch just adds tests for non-constant
> > addends and 16-bit edge cases.  Today's implementation handles those,
> > PostgreSQL doesn't use them, and I might easily have broken them.
> > fetch-add-xlc-asm-v1.patch moves xlc builds from the __fetch_and_add()
> > intrinsic to inline asm.  fetch-add-gcc-xlc-unify-v1.patch moves fetch_add 
> > to
> > inline asm for all other ppc compilers.  gcc-7.2.0 generates equivalent code
> > before and after.  I plan to keep the third patch HEAD-only, back-patching 
> > the
> > other two.  I tested with xlc v12 and v13.
> 
> Hm, no objection to the first two patches, but I don't understand
> why the third patch goes to so much effort just to use "addi" rather
> than (one assumes) "li" then "add"?  It doesn't seem likely that
> that's buying much.

Changing an addi to li+add may not show up on benchmarks, but I can't claim
it's immaterial.  I shouldn't unify the code if that makes the compiled code
materially worse than what the gcc intrinsics produce today, hence the
nontrivial (~50 line) bits to match the intrinsics' capabilities.




Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-08-31 Thread Tom Lane
Noah Misch  writes:
> Done.  fetch-add-variable-test-v1.patch just adds tests for non-constant
> addends and 16-bit edge cases.  Today's implementation handles those,
> PostgreSQL doesn't use them, and I might easily have broken them.
> fetch-add-xlc-asm-v1.patch moves xlc builds from the __fetch_and_add()
> intrinsic to inline asm.  fetch-add-gcc-xlc-unify-v1.patch moves fetch_add to
> inline asm for all other ppc compilers.  gcc-7.2.0 generates equivalent code
> before and after.  I plan to keep the third patch HEAD-only, back-patching the
> other two.  I tested with xlc v12 and v13.

Hm, no objection to the first two patches, but I don't understand
why the third patch goes to so much effort just to use "addi" rather
than (one assumes) "li" then "add"?  It doesn't seem likely that
that's buying much.

regards, tom lane




Re: [HACKERS] Deadlock in XLogInsert at AIX

2019-08-31 Thread Noah Misch
On Wed, Jan 17, 2018 at 12:36:31AM -0800, Noah Misch wrote:
> On Tue, Jan 16, 2018 at 08:50:24AM -0800, Andres Freund wrote:
> > On 2018-01-16 16:12:11 +0900, Michael Paquier wrote:
> > > On Fri, Feb 03, 2017 at 12:26:50AM +, Noah Misch wrote:
> > > > Since this emits double syncs with older xlc, I recommend instead 
> > > > replacing
> > > > the whole thing with inline asm.  As I opined in the last message of the
> > > > thread you linked above, the intrinsics provide little value as 
> > > > abstractions
> > > > if one checks the generated code to deduce how to use them.  Now that 
> > > > the
> > > > generated code is xlc-version-dependent, the port is better off with
> > > > compiler-independent asm like we have for ppc in s_lock.h.
> > > 
> > > Could it be cleaner to just use __xlc_ver__ to avoid double syncs on
> > > past versions? I think that it would make the code more understandable
> > > than just listing directly the instructions.
> > 
> > Given the quality of the intrinsics on AIX, see past commits and the
> > comment in the code quoted above, I think we're much better of doing
> > this via inline asm.
> 
> For me, verifiability is the crucial benefit of inline asm.  Anyone with an
> architecture manual can thoroughly review an inline asm implementation.  Given
> intrinsics and __xlc_ver__ conditionals, the same level of review requires
> access to every xlc version.

> > > As Heikki is not around these days, Noah, could you provide a new
> > > version of the patch? This bug has been around for some time now, it
> > > would be nice to move on.. 
> 
> Not soon.

Done.  fetch-add-variable-test-v1.patch just adds tests for non-constant
addends and 16-bit edge cases.  Today's implementation handles those,
PostgreSQL doesn't use them, and I might easily have broken them.
fetch-add-xlc-asm-v1.patch moves xlc builds from the __fetch_and_add()
intrinsic to inline asm.  fetch-add-gcc-xlc-unify-v1.patch moves fetch_add to
inline asm for all other ppc compilers.  gcc-7.2.0 generates equivalent code
before and after.  I plan to keep the third patch HEAD-only, back-patching the
other two.  I tested with xlc v12 and v13.
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index 7f03b7e..b00b76f 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -687,7 +687,7 @@ test_atomic_uint32(void)
if (pg_atomic_read_u32() != 3)
elog(ERROR, "atomic_read_u32() #2 wrong");
 
-   if (pg_atomic_fetch_add_u32(, 1) != 3)
+   if (pg_atomic_fetch_add_u32(, pg_atomic_read_u32() - 2) != 3)
elog(ERROR, "atomic_fetch_add_u32() #1 wrong");
 
if (pg_atomic_fetch_sub_u32(, 1) != 4)
@@ -712,6 +712,20 @@ test_atomic_uint32(void)
if (pg_atomic_fetch_add_u32(, INT_MAX) != INT_MAX)
elog(ERROR, "pg_atomic_add_fetch_u32() #3 wrong");
 
+   pg_atomic_fetch_add_u32(, 2);   /* wrap to 0 */
+
+   if (pg_atomic_fetch_add_u32(, PG_INT16_MAX) != 0)
+   elog(ERROR, "pg_atomic_fetch_add_u32() #3 wrong");
+
+   if (pg_atomic_fetch_add_u32(, PG_INT16_MAX + 1) != PG_INT16_MAX)
+   elog(ERROR, "pg_atomic_fetch_add_u32() #4 wrong");
+
+   if (pg_atomic_fetch_add_u32(, PG_INT16_MIN) != 2 * PG_INT16_MAX + 1)
+   elog(ERROR, "pg_atomic_fetch_add_u32() #5 wrong");
+
+   if (pg_atomic_fetch_add_u32(, PG_INT16_MIN - 1) != PG_INT16_MAX)
+   elog(ERROR, "pg_atomic_fetch_add_u32() #6 wrong");
+
pg_atomic_fetch_add_u32(, 1);   /* top up to UINT_MAX */
 
if (pg_atomic_read_u32() != UINT_MAX)
@@ -787,7 +801,7 @@ test_atomic_uint64(void)
if (pg_atomic_read_u64() != 3)
elog(ERROR, "atomic_read_u64() #2 wrong");
 
-   if (pg_atomic_fetch_add_u64(, 1) != 3)
+   if (pg_atomic_fetch_add_u64(, pg_atomic_read_u64() - 2) != 3)
elog(ERROR, "atomic_fetch_add_u64() #1 wrong");
 
if (pg_atomic_fetch_sub_u64(, 1) != 4)
diff --git a/src/include/port/atomics/generic-xlc.h 
b/src/include/port/atomics/generic-xlc.h
index 5ddccff..8b5c732 100644
--- a/src/include/port/atomics/generic-xlc.h
+++ b/src/include/port/atomics/generic-xlc.h
@@ -73,11 +73,27 @@ pg_atomic_compare_exchange_u32_impl(volatile 
pg_atomic_uint32 *ptr,
 static inline uint32
 pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
 {
+   uint32 _t;
+   uint32 res;
+
/*
-* __fetch_and_add() emits a leading "sync" and trailing "isync", 
thereby
-* providing sequential consistency.  This is undocumented.
+* xlc has a no-longer-documented __fetch_and_add() intrinsic.  In xlc
+* 12.01.., it emits a leading "sync" and trailing "isync".  In
+* xlc 13.01.0003.0004, it emits neither.  Hence, using the intrinsic
+* would add redundant syncs on xlc 12.
 */
-   return __fetch_and_add((volatile int *)>value, add_);
+   __asm__ __volatile__(
+   " 

Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-23 Thread Robert Haas
On Mon, Jan 22, 2018 at 4:23 PM, Tom Lane  wrote:
>> If nobody is willing to put in the effort to keep AIX supported under
>> XLC, then we should just update the documentation to say that it isn't
>> supported.  Our support for that platform is pretty marginal anyway if
>> we're only supporting it up through 6.1; that was released in 2007 and
>> went out of support in Q2 of last year.
>
> Huh?  We have somebody just upthread saying that they're working out
> issues on newer AIX.  We should at least give them time to finish
> that research before making any decisions.

Well, I certainly have no problem with that in theory.  That's why my
sentence started with the word "if".

On a practical level, this thread has been dead for almost a year, and
has revived slightly because Michael bumped it, but nobody has really
made a firm commitment to do any specific thing to move this issue
forward.  I'm happy if they do, but it might not happen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-22 Thread Michael Paquier
On Mon, Jan 22, 2018 at 04:23:55PM -0500, Tom Lane wrote:
> Robert Haas  writes:
>> I agree with Noah.  It's true that having unfixed bugs isn't
>> particularly good, but it doesn't justify activating a CommitFest
>> entry under someone else's name.
> 
> Agreed.  The CF app is not a bug tracker.

OK, thanks for the feedback. Based on that I am deleting this entry from
the CF. And my apologies for the noise.

>> If nobody is willing to put in the effort to keep AIX supported under
>> XLC, then we should just update the documentation to say that it isn't
>> supported.  Our support for that platform is pretty marginal anyway if
>> we're only supporting it up through 6.1; that was released in 2007 and
>> went out of support in Q2 of last year.
> 
> Huh?  We have somebody just upthread saying that they're working out
> issues on newer AIX.  We should at least give them time to finish
> that research before making any decisions.

Yes, Tony stated so, still he has not published any patches
yet. Regarding the potential documentation impact, please let me suggest
just to mention that support of XLC up to 12.1 is supported, per the
buildfarm this is rather stable and does not face the same issues as
what we are seeing with XLC 13.1. Is there any point to wait for the
docs to not say that now? As far as I know, if support for XLC 13.1
moves on in the future we can update the docs so as the support range is
increased later on.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-22 Thread Tom Lane
Robert Haas  writes:
> I agree with Noah.  It's true that having unfixed bugs isn't
> particularly good, but it doesn't justify activating a CommitFest
> entry under someone else's name.

Agreed.  The CF app is not a bug tracker.

> If nobody is willing to put in the effort to keep AIX supported under
> XLC, then we should just update the documentation to say that it isn't
> supported.  Our support for that platform is pretty marginal anyway if
> we're only supporting it up through 6.1; that was released in 2007 and
> went out of support in Q2 of last year.

Huh?  We have somebody just upthread saying that they're working out
issues on newer AIX.  We should at least give them time to finish
that research before making any decisions.

regards, tom lane



Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-20 Thread Michael Paquier
On Wed, Jan 17, 2018 at 12:36:31AM -0800, Noah Misch wrote:
> For me, verifiability is the crucial benefit of inline asm.  Anyone with an
> architecture manual can thoroughly review an inline asm implementation.  Given
> intrinsics and __xlc_ver__ conditionals, the same level of review requires
> access to every xlc version.

Okay.

> The most recent patch version is Returned with Feedback.  As a matter of
> procedure, I discourage creating commitfest entries as a tool to solicit new
> patch versions.  If I were the author of a RwF patch, I would dislike finding
> a commitfest entry that I did not create with myself listed as author.

Per my understanding, this is a bug, and we don't want to lose track of
them.

> If you do choose to proceed, the entry should be Waiting on Author.

Right.

> Note that fixing this bug is just the start of accepting XLC 13.1 as a
> compiler of PostgreSQL.  If we get a buildfarm member with a few dozen clean
> runs (blocked by, at a minimum, fixing this and the inlining bug), we'll have
> something.  Until then, support for XLC 13.1 is an anti-feature.

Per my understanding of this thread, this is a bug. My point is that the
documentation states that AIX is supported from 4.3.3 to 6.1, however
there are no restrictions related to the compiler, hence I would have
thought that the docs imply XLC 13.1 as a supported compiler. And IBM
states that XLC 13.1 is supported from AIX 6.1:
https://www-01.ibm.com/support/docview.wss?uid=swg21326972

True that the docs tell as well to look at the buildfarm animals, which
don't use XLC 13.1 if you don't look at them closely. Perhaps an
explicit mention of the compiler compatibilities in the docs would help
making the support range clear to anybody then. I would expect more
people to look at the docs than the buildfarm internal contents.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-17 Thread Bernd Helmle
Am Dienstag, den 16.01.2018, 08:25 + schrieb REIX, Tony:
> I've been able to compare PostgreSQL compiled with XLC vs GCC 7.1
> and, using times outputs provided by PostgreSQL tests, XLC seems to
> provide at least 8% more speed. We also plan to run professional
> performance tests in order to compare PostgreSQL 10.1 on AIX vs
> Linux/Power. I saw some 2017 performance slides, made with older
> versions of PostgreSQL and XLC, that show bad PostgreSQL performance
> on AIX vs Linux/Power, and I cannot believe it. We plan to
> investigate this.

I assume you are referring to the attached graph i've showed on
PGConf.US 2017 ?

The numbers we've got on that E850 machine (pgbench SELECT-only, scale
1000) weren't really good in comparison to Linux on the same machine.

We tried many options to make the performance better, overall the graph
shows the best performance from Linux *and* AIX with gcc. XL C We used
some knobs to get the best out on AIX:

export OBJECT_MODE=64; gcc -m64
ldedit -b forkpolicy:cor -b textpsize:64K -b datapsize:64K -b
stackpsize:64K postgres
export MALLOCOPTIONS=multiheap:16,considersize,pool,no_mallinfo
schedo -p -o vpm_fold_policy=4

There are many other things you can tune on AIX, but they didn't seem
to give the improvement we'd like to see.


run_bench.sh
Description: application/shellscript


Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-17 Thread Noah Misch
On Tue, Jan 16, 2018 at 08:50:24AM -0800, Andres Freund wrote:
> On 2018-01-16 16:12:11 +0900, Michael Paquier wrote:
> > On Fri, Feb 03, 2017 at 12:26:50AM +, Noah Misch wrote:
> > > Since this emits double syncs with older xlc, I recommend instead 
> > > replacing
> > > the whole thing with inline asm.  As I opined in the last message of the
> > > thread you linked above, the intrinsics provide little value as 
> > > abstractions
> > > if one checks the generated code to deduce how to use them.  Now that the
> > > generated code is xlc-version-dependent, the port is better off with
> > > compiler-independent asm like we have for ppc in s_lock.h.
> > 
> > Could it be cleaner to just use __xlc_ver__ to avoid double syncs on
> > past versions? I think that it would make the code more understandable
> > than just listing directly the instructions.
> 
> Given the quality of the intrinsics on AIX, see past commits and the
> comment in the code quoted above, I think we're much better of doing
> this via inline asm.

For me, verifiability is the crucial benefit of inline asm.  Anyone with an
architecture manual can thoroughly review an inline asm implementation.  Given
intrinsics and __xlc_ver__ conditionals, the same level of review requires
access to every xlc version.

> > As there have been other
> > bug reports from Tony Reix who has been working on AIX with XLC 13.1 and
> > that this thread got lost in the wild, I have added an entry in the next
> > CF:
> > https://commitfest.postgresql.org/17/1484/

The most recent patch version is Returned with Feedback.  As a matter of
procedure, I discourage creating commitfest entries as a tool to solicit new
patch versions.  If I were the author of a RwF patch, I would dislike finding
a commitfest entry that I did not create with myself listed as author.

If you do choose to proceed, the entry should be Waiting on Author.

> > As Heikki is not around these days, Noah, could you provide a new
> > version of the patch? This bug has been around for some time now, it
> > would be nice to move on.. 

Not soon.

Note that fixing this bug is just the start of accepting XLC 13.1 as a
compiler of PostgreSQL.  If we get a buildfarm member with a few dozen clean
runs (blocked by, at a minimum, fixing this and the inlining bug), we'll have
something.  Until then, support for XLC 13.1 is an anti-feature.

nm



Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-16 Thread Andres Freund
On 2018-01-16 16:12:11 +0900, Michael Paquier wrote:
> On Fri, Feb 03, 2017 at 12:26:50AM +, Noah Misch wrote:
> > On Wed, Feb 01, 2017 at 02:39:25PM +0200, Heikki Linnakangas wrote:
> >> @@ -73,11 +73,19 @@ pg_atomic_compare_exchange_u32_impl(volatile 
> >> pg_atomic_uint32 *ptr,
> >>  static inline uint32
> >>  pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
> >>  {
> >> +  uint32  ret;
> >> +
> >>/*
> >> -   * __fetch_and_add() emits a leading "sync" and trailing "isync", 
> >> thereby
> >> -   * providing sequential consistency.  This is undocumented.
> >> +   * Use __sync() before and __isync() after, like in compare-exchange
> >> +   * above.
> >> */
> >> -  return __fetch_and_add((volatile int *)>value, add_);
> >> +  __sync();
> >> +
> >> +  ret = __fetch_and_add((volatile int *)>value, add_);
> >> +
> >> +  __isync();
> >> +
> >> +  return ret;
> >>  }
> > 
> > Since this emits double syncs with older xlc, I recommend instead replacing
> > the whole thing with inline asm.  As I opined in the last message of the
> > thread you linked above, the intrinsics provide little value as abstractions
> > if one checks the generated code to deduce how to use them.  Now that the
> > generated code is xlc-version-dependent, the port is better off with
> > compiler-independent asm like we have for ppc in s_lock.h.
> 
> Could it be cleaner to just use __xlc_ver__ to avoid double syncs on
> past versions? I think that it would make the code more understandable
> than just listing directly the instructions.

Given the quality of the intrinsics on AIX, see past commits and the
comment in the code quoted above, I think we're much better of doing
this via inline asm.

Greetings,

Andres Freund



RE:[HACKERS] Deadlock in XLogInsert at AIX

2018-01-16 Thread REIX, Tony
Thanks Noah !

Hummm You have a big machine, more powerful than mine. However, it seems that 
you do not see the random failure I see.


Cordialement,

Tony Reix

ATOS / Bull SAS
ATOS Expert
IBM Coop Architect & Technical Leader
Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.net


De : Noah Misch [n...@leadboat.com]
Envoyé : mardi 16 janvier 2018 17:19
À : REIX, Tony
Cc : Michael Paquier; Heikki Linnakangas; Konstantin Knizhnik; PostgreSQL 
Hackers; Bernd Helmle; OLIVA, PASCAL; EMPEREUR-MOT, SYLVIE
Objet : Re: [HACKERS] Deadlock in XLogInsert at AIX

On Tue, Jan 16, 2018 at 01:50:29PM +, REIX, Tony wrote:
> And, on BuildFarm, I do not see any details about the logical/physical 
> configuration of the AIX VMs, like hornet.
> Being able to run real concurrent parallel stress programs, thus required 
> multi-physical-CPU VM, would help.

It has 48 virtual CPUs.  Here's the prtconf output:

System Model: IBM,8231-E2C
Machine Serial Number: 104C0CT
Processor Type: PowerPC_POWER7
Processor Implementation Mode: POWER 7
Processor Version: PV_7_Compat
Number Of Processors: 12
Processor Clock Speed: 3720 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 1 10-4C0CT
Memory Size: 127488 MB
Good Memory Size: 127488 MB
Platform Firmware level: AL740_100
Firmware Version: IBM,AL740_100
Console Login: enable
Auto Restart: true
Full Core: false
NX Crypto Acceleration: Not Capable

Network Information
Host Name: power-aix
IP Address: 140.211.15.154
Sub Netmask: 255.255.255.0
Gateway: 140.211.15.1
Name Server: 140.211.166.130
Domain Name: osuosl.org

Paging Space Information
Total Paging Space: 12288MB
Percent Used: 1%

Volume Groups Information
==
Active VGs
==
homevg:
PV_NAME   PV STATE  TOTAL PPs   FREE PPsFREE DISTRIBUTION
hdisk1active558 183 00..00..00..71..112
hdisk2active558 0   00..00..00..00..00
hdisk3active558 0   00..00..00..00..00
hdisk4active558 0   00..00..00..00..00
==

rootvg:
PV_NAME   PV STATE  TOTAL PPs   FREE PPsFREE DISTRIBUTION
hdisk0active558 510 
111..93..83..111..112
hdisk5active558 514 
111..97..83..111..112
==

INSTALLED RESOURCE LIST

The following resources are installed on the machine.
+/- = Added or deleted from Resource List.
*   = Diagnostic support not available.

  Model Architecture: chrp
  Model Implementation: Multiple Processor, PCI bus

+ sys0 System Object
+ sysplanar0   System Planar
* vio0 Virtual I/O Bus
* vsa2 U78AB.001.WZSHZY0-P1-T2 LPAR Virtual Serial Adapter
* vty2 U78AB.001.WZSHZY0-P1-T2-L0  Asynchronous Terminal
* vsa1 U78AB.001.WZSHZY0-P1-T1 LPAR Virtual Serial Adapter
* vty1 U78AB.001.WZSHZY0-P1-T1-L0  Asynchronous Terminal
* vsa0 U8231.E2C.104C0CT-V1-C0 LPAR Virtual Serial Adapter
* vty0 U8231.E2C.104C0CT-V1-C0-L0  Asynchronous Terminal
* pci8 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci7 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci6 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci5 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci4 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci10U78AB.001.WZSHZY0-P1-C2 PCI Bus
+ cor0 U78AB.001.WZSHZY0-P1-C2-T1  GXT145 Graphics Adapter
* pci3 U78AB.001.WZSHZY0-P1PCI Express Bus
+ ent0 U78AB.001.WZSHZY0-P1-C7-T1  4-Port Gigabit Ethernet 
PCI-Express Adapter (e414571614102004)
+ ent1 U78AB.001.WZSHZY0-P1-C7-T2  4-Port Gigabit Ethernet 
PCI-Express Adapter (e414571614102004)
+ ent2 U78AB.001.WZSHZY0-P1-C7-T3  4-Port Gigabit Ethernet 
PCI-Express Adapter (e414571614102004)
+ ent3 U78AB.001.WZSHZY0-P1-C7-T4  4-Port Gigabit Ethernet 
PCI-Express Adapter (e414571614102004)
* pci2 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci1 U78AB.001.WZSHZY0-P1PCI Express Bus
* pci9 U78AB.001.WZSHZY0-P1PCI Bus
+ usbhc0   U78AB.001.WZSHZY0-P1USB Host Controller (33103500)
+ usbhc1   U78AB.001.WZSHZY0-P1USB Host Controller (33103500)
+ usbhc2   U78AB.001.WZSHZY0-P1U

Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-16 Thread Andrew Dunstan


On 01/16/2018 08:50 AM, REIX, Tony wrote:
> Hi Michael
>
> You said:
>
>> Setting up a buildfarm member with the combination of compiler and
>> environment where you are seeing the failures would be the best answer
>> in my opinion:
>> https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto
>>
>> This does not require special knowledge of PostgreSQL internals, and the
>> in-core testing framework has improved the last couple of years to allow
>> for more advanced tests. I do use it as well for some tests on my own
>> modules (company stuff). The buildfarm code has also followed the pace,
>> which really helps a lot, thanks to Andrew Dunstan.
>>
>> Developers and committers are more pro-active if they can see automated
>> tests failing in the central community place. And buildfarm animals
>> usually don't stay red for more than a couple of days.
> Hu I quickly read this HowTo and I did not find any explanation about the 
> "protocole"
> used for exchanging data between my VM and the PostgreSQL BuildFarm.
> My machine is behind firewalls and have restricted access to the outside.
> Either I'll see when that does not work... or I can get some information 
> about which port
> (or anything else) I have to ask to be opened, if needed.
> Anyway, I'll read it in depth now and I'll try to implement it.
>
>
>


Communication is only done via outbound port 443 (https). There are no
passwords required and no inbound connections, ever. Uploads are signed
using a shared secret. Communication can can be via a proxy. If you need
the client to use a proxy with git that's a bit more complex, but possible.

Ping me if you need help setting this up.

cheers

andrew

-- 

Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE:[HACKERS] Deadlock in XLogInsert at AIX

2018-01-16 Thread REIX, Tony
Hi Michael

You said:

> Setting up a buildfarm member with the combination of compiler and
> environment where you are seeing the failures would be the best answer
> in my opinion:
> https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto
> 
> This does not require special knowledge of PostgreSQL internals, and the
> in-core testing framework has improved the last couple of years to allow
> for more advanced tests. I do use it as well for some tests on my own
> modules (company stuff). The buildfarm code has also followed the pace,
> which really helps a lot, thanks to Andrew Dunstan.
> 
> Developers and committers are more pro-active if they can see automated
> tests failing in the central community place. And buildfarm animals
> usually don't stay red for more than a couple of days.

Hu I quickly read this HowTo and I did not find any explanation about the 
"protocole"
used for exchanging data between my VM and the PostgreSQL BuildFarm.
My machine is behind firewalls and have restricted access to the outside.
Either I'll see when that does not work... or I can get some information about 
which port
(or anything else) I have to ask to be opened, if needed.
Anyway, I'll read it in depth now and I'll try to implement it.


About the random error we see, I guess that I may see it, though PostgreSQL 
BuildFarm AIX VMs do not see it,
because I'm using a not-too-small VM, using variable Physical Processing units 
(CPUs) since my VM is uncapped
(I may use all Physical CPU if available): up to 4 physical processors and up 
to 8 virtual processors.
And, on BuildFarm, I do not see any details about the logical/physical 
configuration of the AIX VMs, like hornet.
Being able to run real concurrent parallel stress programs, thus required 
multi-physical-CPU VM, would help.

Regards,

Tony



Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-16 Thread Michael Paquier
On Tue, Jan 16, 2018 at 08:25:51AM +, REIX, Tony wrote:
> My team and my company (ATOS/Bull) are involved in improving the
> quality of PostgreSQL on AIX.

Cool to hear that!

> We have AIX 6.1, 7.1, and 7.2 Power8 systems, with several
> logical/physical processors. And I plan to have a more powerful (more
> processors) machine for running PostgreSQL stress tests. 
> A DB-expert colleague has started to write some new not-too-complex
> stress tests that we'd like to submit to PostgreSQL project later.
> For now, using latest versions of XLC 12 (12.1.0.19) and 13 (13.1.3.4
> with a patch), we have only (on AIX 6.1 and 7.2) one remaining random
> failure (dealing with src/bin/pgbench/t/001_pgbench.pl test), for
> PostgreSQL 9.6.6 and 10.1 . And, on AIX 7.1, we have one more
> remaining failure that may be due to some other dependent
> software. Investigating. 
> XLC 13.1.3.4 shows an issue with -O2 and I have a work-around that
> fixes it in ./src/backend/parser/gram.c . We have opened a PMR
> (defect) against XLC. Note that our tests are now executed without the 
> PG_FORCE_DISABLE_INLINE "inline" trick in src/include/port/aix.h that
> suppresses the inlining of routines on AIX. I think that older
> versions of XLC have shown issues that have now disappeared (or, at
> least, many of them). 
> I've been able to compare PostgreSQL compiled with XLC vs GCC 7.1 and,
> using times outputs provided by PostgreSQL tests, XLC seems to provide
> at least 8% more speed. We also plan to run professional performance
> tests in order to compare PostgreSQL 10.1 on AIX vs Linux/Power. I saw
> some 2017 performance slides, made with older versions of PostgreSQL
> and XLC, that show bad PostgreSQL performance on AIX vs Linux/Power,
> and I cannot believe it. We plan to investigate this.

That's interesting investigation. The community is always interested in
such stories. You could have material for a conference talk.

> Though I have very very little skills about PostgreSQL (I'm porting
> too now GCC Go on AIX), we can help, at least by
> compiling/testing/investigating/stressing in a different AIX
> environment than the AIX ones (32/64bit, XLC/GCC) you have in your
> BuildFarm. 

Setting up a buildfarm member with the combination of compiler and
environment where you are seeing the failures would be the best answer
in my opinion:
https://wiki.postgresql.org/wiki/PostgreSQL_Buildfarm_Howto

This does not require special knowledge of PostgreSQL internals, and the
in-core testing framework has improved the last couple of years to allow
for more advanced tests. I do use it as well for some tests on my own
modules (company stuff). The buildfarm code has also followed the pace,
which really helps a lot, thanks to Andrew Dunstan.

Developers and committers are more pro-active if they can see automated
tests failing in the central community place. And buildfarm animals
usually don't stay red for more than a couple of days.
--
Michael


signature.asc
Description: PGP signature


RE:[HACKERS] Deadlock in XLogInsert at AIX

2018-01-16 Thread REIX, Tony
Hi Michael,

My team and my company (ATOS/Bull) are involved in improving the quality of 
PostgreSQL on AIX.

We have AIX 6.1, 7.1, and 7.2 Power8 systems, with several logical/physical 
processors.
And I plan to have a more powerful (more processors) machine for running 
PostgreSQL stress tests.
A DB-expert colleague has started to write some new not-too-complex stress 
tests that we'd like to submit to PostgreSQL project later.
For now, using latest versions of XLC 12 (12.1.0.19) and 13 (13.1.3.4 with a 
patch), we have only (on AIX 6.1 and 7.2) one remaining random failure (dealing 
with src/bin/pgbench/t/001_pgbench.pl test), for PostgreSQL 9.6.6 and 10.1 . 
And, on AIX 7.1, we have one more remaining failure that may be due to some 
other dependent software. Investigating.
XLC 13.1.3.4 shows an issue with -O2 and I have a work-around that fixes it in 
./src/backend/parser/gram.c . We have opened a PMR (defect) against XLC.
Note that our tests are now executed without the PG_FORCE_DISABLE_INLINE 
"inline" trick in src/include/port/aix.h that suppresses the inlining of 
routines on AIX. I think that older versions of XLC have shown issues that have 
now disappeared (or, at least, many of them).
I've been able to compare PostgreSQL compiled with XLC vs GCC 7.1 and, using 
times outputs provided by PostgreSQL tests, XLC seems to provide at least 8% 
more speed. We also plan to run professional performance tests in order to 
compare PostgreSQL 10.1 on AIX vs Linux/Power. I saw some 2017 performance 
slides, made with older versions of PostgreSQL and XLC, that show bad 
PostgreSQL performance on AIX vs Linux/Power, and I cannot believe it. We plan 
to investigate this.

Though I have very very little skills about PostgreSQL (I'm porting too now GCC 
Go on AIX), we can help, at least by compiling/testing/investigating/stressing 
in a different AIX environment than the AIX ones (32/64bit, XLC/GCC) you have 
in your BuildFarm.
Let me know how we can help.

Regards,

Cordialement,

Tony Reix

ATOS / Bull SAS
ATOS Expert
IBM Coop Architect & Technical Leader
Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.net


De : Michael Paquier [michael.paqu...@gmail.com]
Envoyé : mardi 16 janvier 2018 08:12
À : Noah Misch
Cc : Heikki Linnakangas; Konstantin Knizhnik; PostgreSQL Hackers; Bernd Helmle
Objet : Re: [HACKERS] Deadlock in XLogInsert at AIX

On Fri, Feb 03, 2017 at 12:26:50AM +, Noah Misch wrote:
> On Wed, Feb 01, 2017 at 02:39:25PM +0200, Heikki Linnakangas wrote:
>> @@ -73,11 +73,19 @@ pg_atomic_compare_exchange_u32_impl(volatile 
>> pg_atomic_uint32 *ptr,
>>  static inline uint32
>>  pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
>>  {
>> +uint32  ret;
>> +
>>  /*
>> - * __fetch_and_add() emits a leading "sync" and trailing "isync", 
>> thereby
>> - * providing sequential consistency.  This is undocumented.
>> + * Use __sync() before and __isync() after, like in compare-exchange
>> + * above.
>>   */
>> -return __fetch_and_add((volatile int *)>value, add_);
>> +__sync();
>> +
>> +ret = __fetch_and_add((volatile int *)>value, add_);
>> +
>> +__isync();
>> +
>> +return ret;
>>  }
>
> Since this emits double syncs with older xlc, I recommend instead replacing
> the whole thing with inline asm.  As I opined in the last message of the
> thread you linked above, the intrinsics provide little value as abstractions
> if one checks the generated code to deduce how to use them.  Now that the
> generated code is xlc-version-dependent, the port is better off with
> compiler-independent asm like we have for ppc in s_lock.h.

Could it be cleaner to just use __xlc_ver__ to avoid double syncs on
past versions? I think that it would make the code more understandable
than just listing directly the instructions. As there have been other
bug reports from Tony Reix who has been working on AIX with XLC 13.1 and
that this thread got lost in the wild, I have added an entry in the next
CF:
https://commitfest.postgresql.org/17/1484/

As Heikki is not around these days, Noah, could you provide a new
version of the patch? This bug has been around for some time now, it
would be nice to move on.. I think I could have written patches myself,
but I don't have an AIX machine at hand. Of course not with XLC 13.1.
--
Michael



Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-15 Thread Michael Paquier
On Fri, Feb 03, 2017 at 12:26:50AM +, Noah Misch wrote:
> On Wed, Feb 01, 2017 at 02:39:25PM +0200, Heikki Linnakangas wrote:
>> @@ -73,11 +73,19 @@ pg_atomic_compare_exchange_u32_impl(volatile 
>> pg_atomic_uint32 *ptr,
>>  static inline uint32
>>  pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
>>  {
>> +uint32  ret;
>> +
>>  /*
>> - * __fetch_and_add() emits a leading "sync" and trailing "isync", 
>> thereby
>> - * providing sequential consistency.  This is undocumented.
>> + * Use __sync() before and __isync() after, like in compare-exchange
>> + * above.
>>   */
>> -return __fetch_and_add((volatile int *)>value, add_);
>> +__sync();
>> +
>> +ret = __fetch_and_add((volatile int *)>value, add_);
>> +
>> +__isync();
>> +
>> +return ret;
>>  }
> 
> Since this emits double syncs with older xlc, I recommend instead replacing
> the whole thing with inline asm.  As I opined in the last message of the
> thread you linked above, the intrinsics provide little value as abstractions
> if one checks the generated code to deduce how to use them.  Now that the
> generated code is xlc-version-dependent, the port is better off with
> compiler-independent asm like we have for ppc in s_lock.h.

Could it be cleaner to just use __xlc_ver__ to avoid double syncs on
past versions? I think that it would make the code more understandable
than just listing directly the instructions. As there have been other
bug reports from Tony Reix who has been working on AIX with XLC 13.1 and
that this thread got lost in the wild, I have added an entry in the next
CF:
https://commitfest.postgresql.org/17/1484/

As Heikki is not around these days, Noah, could you provide a new
version of the patch? This bug has been around for some time now, it
would be nice to move on.. I think I could have written patches myself,
but I don't have an AIX machine at hand. Of course not with XLC 13.1.
--
Michael


signature.asc
Description: PGP signature