Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-02 Thread Noah Misch
On Wed, Feb 01, 2017 at 02:39:25PM +0200, Heikki Linnakangas wrote: > On 02/01/2017 01:07 PM, Konstantin Knizhnik wrote: > >Attached please find my patch for XLC/AIX. > >The most critical fix is adding __sync to pg_atomic_fetch_add_u32_impl. > >The comment in this file says that: > > > > *

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-02 Thread Konstantin Knizhnik
Hi Tony, On 02.02.2017 17:10, REIX, Tony wrote: Hi Konstantin I've discussed the "zombie/exit" issue with our expert here. - He does not think that AIX has anything special here - If the process is marked in ps, this is because the flag SEXIT is set, thus the process is blocked somewhere

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-02 Thread Konstantin Knizhnik
On 02.02.2017 18:20, REIX, Tony wrote: Hi Konstantin I have an issue with pgbench. Any idea ? The pgbench -s option specifies the scale factor. Scale 1000 corresponds to 100 million rows and requires about 16 GB on disk. # mkdir /tmp/PGS # chown pgstbf.staff /tmp/PGS # su pgstbf $
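The relation between the pgbench scale factor and table size can be worked out as follows. This is only a minimal sketch: pgbench creates 100,000 rows in pgbench_accounts per unit of scale, so scale 1000 gives 100 million rows. The helper name pgbench_account_rows is hypothetical and is not part of pgbench.

```c
#include <assert.h>
#include <stdint.h>

/* pgbench creates 100,000 pgbench_accounts rows per unit of scale factor. */
#define PGBENCH_ACCOUNTS_PER_SCALE 100000

/*
 * Hypothetical helper (not part of pgbench): number of pgbench_accounts
 * rows created by "pgbench -i -s <scale>".
 */
static int64_t
pgbench_account_rows(int scale)
{
    assert(scale > 0);
    /* Widen before multiplying so large scale factors do not overflow int. */
    return (int64_t) scale * PGBENCH_ACCOUNTS_PER_SCALE;
}
```

Calling pgbench_account_rows(1000) yields 100000000, that is, 100 million rows.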

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-02 Thread REIX, Tony
Hi Konstantin I have an issue with pgbench. Any idea ? # mkdir /tmp/PGS # chown pgstbf.staff /tmp/PGS # su pgstbf $ /opt/freeware/bin/initdb -D /tmp/PGS The files belonging to this database system will be owned by user "pgstbf". This user must also own the server process. The database

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-02 Thread REIX, Tony
Hi Konstantin I've discussed the "zombie/exit" issue with our expert here. - He does not think that AIX has anything special here - If the process is marked in ps, this is because the flag SEXIT is set, thus the process is blocked somewhere in the kexitx() syscall, waiting for something. -

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-02 Thread Konstantin Knizhnik
Last update on the issue with the deadlock in XLogInsert. After almost one day of running, pgbench is once again not working normally :( There is no deadlock, there are no core files, and there are no error messages in the log. But TPS is almost zero: progress: 57446.0 s, 1.1 tps, lat 3840.265 ms stddev NaNQ

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Konstantin Knizhnik
On 02/01/2017 08:28 PM, Heikki Linnakangas wrote: > > But if there's no pressing reason to change it, let's leave it alone. It's > not related to the problem at hand, right? > Yes, I agree with you: we should better leave it as it is. -- Konstantin Knizhnik Postgres Professional:

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Konstantin Knizhnik
On 02/01/2017 08:30 PM, REIX, Tony wrote: > > Hi Konstantin, > > Please run: /opt/IBM/xlc/13.1.3/bin/xlc -qversion so that I know your exact > XLC v13 version. > IBM XL C/C++ for AIX, V13.1.3 (5725-C72, 5765-J07) > I'm building on Power7 and not giving any architecture flag to XLC. > > I'm not

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread REIX, Tony
Hi Konstantin, Please run: /opt/IBM/xlc/13.1.3/bin/xlc -qversion so that I know your exact XLC v13 version. I'm building on Power7 and not giving any architecture flag to XLC. I'm not using -qalign=natural. Thus, by default, XLC uses -qalign=power, which is close to natural, as explained at:

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Heikki Linnakangas
On 02/01/2017 04:12 PM, Konstantin Knizhnik wrote: On 01.02.2017 15:39, Heikki Linnakangas wrote: On 02/01/2017 01:07 PM, Konstantin Knizhnik wrote: Attached please find my patch for XLC/AIX. The most critical fix is adding __sync to pg_atomic_fetch_add_u32_impl. The comment in this file says

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread REIX, Tony
Hi Konstantin XLC. I'm on AIX 7.1 for now. I'm using this version of XLC v13: # xlc -qversion IBM XL C/C++ for AIX, V13.1.3 (5725-C72, 5765-J07) Version: 13.01.0003.0003 With this version, I have (at least, since I tested with "check" and not "check-world" at that time) 2 failing tests:

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Konstantin Knizhnik
On 01.02.2017 15:39, Heikki Linnakangas wrote: In summary, I came up with the attached. It's essentially your patch, with tweaks for the above-mentioned things. I don't have a powerpc system to test on, so there are probably some silly typos there. Attached please find a fixed version of

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Konstantin Knizhnik
Hi Tony, On 01.02.2017 18:42, REIX, Tony wrote: Hi Konstantin XLC. I'm on AIX 7.1 for now. I'm using this version of XLC v13: # xlc -qversion IBM XL C/C++ for AIX, V13.1.3 (5725-C72, 5765-J07) Version: 13.01.0003.0003 With this version, I have (at least, since I tested with "check"

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread REIX, Tony
Hi, I'm now working on the port of PostgreSQL to AIX. (RPMs can be found, as free open-source work, at http://bullfreeware.com/ . http://bullfreeware.com/search.php?package=postgresql ) I was not aware of any issue with XLC v12 on AIX for atomic operations. (XLC v13 generates at least 2

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Konstantin Knizhnik
Hi, We are using version 13.1.3 of XLC. All tests pass. Please note that it is a synchronization bug which can be reproduced only under heavy load. Our server has 64 cores and it is necessary to run pgbench with 100 connections for several minutes to reproduce the problem. So may be

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Konstantin Knizhnik
On 01.02.2017 15:39, Heikki Linnakangas wrote: On 02/01/2017 01:07 PM, Konstantin Knizhnik wrote: Attached please find my patch for XLC/AIX. The most critical fix is adding __sync to pg_atomic_fetch_add_u32_impl. The comment in this file says that: * __fetch_and_add() emits a leading

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Heikki Linnakangas
On 02/01/2017 01:07 PM, Konstantin Knizhnik wrote: Attached please find my patch for XLC/AIX. The most critical fix is adding __sync to pg_atomic_fetch_add_u32_impl. The comment in this file says that: * __fetch_and_add() emits a leading "sync" and trailing "isync", thereby *

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Konstantin Knizhnik
Attached please find my patch for XLC/AIX. The most critical fix is adding __sync to pg_atomic_fetch_add_u32_impl. The comment in this file says that: * __fetch_and_add() emits a leading "sync" and trailing "isync", thereby * providing sequential consistency. This is undocumented.
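The property at stake in the quoted comment, a fetch-and-add that behaves as a full barrier, can be illustrated with portable C11 atomics. This is only a sketch under stated assumptions: it is not the XLC __fetch_and_add intrinsic and not PostgreSQL's pg_atomic_fetch_add_u32_impl. The names demo_atomic_u32 and demo_fetch_add_u32 are hypothetical; the point is that sequential consistency is requested explicitly instead of relying on undocumented compiler behavior.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit atomic variable; not pg_atomic_uint32. */
typedef struct
{
    _Atomic uint32_t value;
} demo_atomic_u32;

/*
 * Atomically add "add" to *ptr and return the value held before the
 * addition. memory_order_seq_cst requests sequentially consistent
 * ordering, so the compiler must emit whatever barriers the target
 * needs around the read-modify-write.
 */
static uint32_t
demo_fetch_add_u32(demo_atomic_u32 *ptr, int32_t add)
{
    assert(ptr != NULL);
    return atomic_fetch_add_explicit(&ptr->value, (uint32_t) add,
                                     memory_order_seq_cst);
}
```

A negative "add" works through unsigned wraparound, so demo_fetch_add_u32(&counter, -1) decrements the counter.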

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Heikki Linnakangas
Oh, you were one step ahead of me, I didn't understand it on first read of your email. Need more coffee.. On 01/31/2017 05:03 PM, Konstantin Knizhnik wrote: I inspected the code of pg_atomic_compare_exchange_u32_impl and didn't find a sync in the prologue: (dbx) listi pg_atomic_compare_exchange_u32_impl >

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-02-01 Thread Heikki Linnakangas
On 01/31/2017 05:03 PM, Konstantin Knizhnik wrote: One more assertion failure: ExceptionalCondition(conditionName = "!(OldPageRqstPtr <= XLogCtl->InitializedUpTo)", errorType = "FailedAssertion", fileName = "xlog.c", lineNumber = 1887), line 54 in "assert.c" (dbx) p OldPageRqstPtr

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-01-31 Thread Konstantin Knizhnik
One more assertion failure: ExceptionalCondition(conditionName = "!(OldPageRqstPtr <= XLogCtl->InitializedUpTo)", errorType = "FailedAssertion", fileName = "xlog.c", lineNumber = 1887), line 54 in "assert.c" (dbx) p OldPageRqstPtr 153551667200 (dbx) p XLogCtl->InitializedUpTo 153551667200

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-01-31 Thread Konstantin Knizhnik
On 30.01.2017 19:21, Heikki Linnakangas wrote: On 01/24/2017 04:47 PM, Konstantin Knizhnik wrote: Interesting.. What should happen here is that for the backend's own insertion slot, the "insertingat" value should be greater than the requested flush point ('upto' variable). That's because

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-01-30 Thread Heikki Linnakangas
On 01/24/2017 04:47 PM, Konstantin Knizhnik wrote: As I already mentioned, we built Postgres with LOCK_DEBUG , so we can inspect lock owner. Backend is waiting for itself! Now please look at two frames in this stack trace marked with red. XLogInsertRecord is setting WALInsert locks at the

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-01-30 Thread Bernd Helmle
Hi Konstantin, We observed exactly the same issues on a customer system with the same environment and PostgreSQL 9.5.5. Additionally, we've tested on Linux with XL/C 12 and 13 with exactly the same deadlock behavior. So we assumed that this is somehow a compiler issue. On Tuesday, the

Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-01-24 Thread Konstantin Knizhnik
More information about the problem - the Postgres log contains several records: 2017-01-24 19:15:20.272 MSK [19270462] LOG: request to flush past end of generated WAL; request 6/AAEBE000, currpos 6/AAEBC2B0 and they correspond to the time when the deadlock happens. There is the following comment in
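To read the two positions in that log record: PostgreSQL prints a WAL location (LSN) as two hexadecimal halves, the high and the low 32 bits of a 64-bit byte position. The following is a minimal sketch of decoding them; the helper name demo_parse_lsn is hypothetical and is not a PostgreSQL function. Under this reading, the requested flush point 6/AAEBE000 lies 0x1D50 (7504) bytes beyond currpos 6/AAEBC2B0.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a PostgreSQL function): parse an LSN printed
 * as "hi/lo" in hexadecimal into a single 64-bit byte position.
 * Returns 1 on success, 0 if the text does not match the format.
 */
static int
demo_parse_lsn(const char *text, uint64_t *lsn)
{
    uint32_t hi;
    uint32_t lo;

    assert(text != NULL && lsn != NULL);
    if (sscanf(text, "%" SCNx32 "/%" SCNx32, &hi, &lo) != 2)
        return 0;
    /* The high half occupies the upper 32 bits of the position. */
    *lsn = ((uint64_t) hi << 32) | lo;
    return 1;
}
```

Parsing both values from the log line and subtracting currpos from the request gives the distance by which the flush request overshoots the generated WAL.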

[HACKERS] Deadlock in XLogInsert at AIX

2017-01-24 Thread Konstantin Knizhnik
Hi Hackers, We are running Postgres on AIX and encountered two strange problems: active zombie processes and a deadlock in the XLOG writer. I will explain the first problem in a separate mail; for now I am mostly concerned about the deadlock. It is irregularly reproduced with standard pgbench launched with 100