Re: Server crash on RHEL 9/s390x platform against PG16
On Sat, Oct 21, 2023 at 5:17 AM Andres Freund wrote:

> Hi,
>
> On 2023-09-12 15:27:21 +0530, Suraj Kharage wrote:
> > [edb@9428da9d2137 postgres]$ cat /etc/redhat-release
> > AlmaLinux release 9.2 (Turquoise Kodkod)
> > [edb@9428da9d2137 postgres]$ lscpu
> > Architecture: s390x
> > CPU op-mode(s): 32-bit, 64-bit
> > Address sizes: 39 bits
>
> Can you provide the rest of the lscpu output? There have been issues with
> Z14 vs Z15:
> https://github.com/llvm/llvm-project/issues/53009
>
> You're apparently not hitting that, but given that fact, you either are on
> a slightly older CPU, or you have applied a patch to work around it. Because
> otherwise your build instructions below would hit that problem, I think.
>
> > physical, 48 bits virtual
> > Byte Order: Big Endian
> >
> > *Configure command:*
> > ./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd --with-llvm --with-perl --with-python --with-tcl --with-openssl --enable-nls --with-libxml --with-libxslt --with-systemd --with-libcurl --without-icu --enable-debug --enable-cassert --with-pgport=5414
>
> Hm, based on "--with-libcurl" this isn't upstream postgres, correct? Have
> you verified the issue reproduces on upstream postgres?

Yes, I can reproduce this on upstream postgres master and the v16 branch.

Here are the details:

./configure --prefix=/home/edb/postgres/ --with-zstd --with-llvm --with-perl --with-python --with-tcl --with-openssl --enable-nls --with-libxml --with-libxslt --with-systemd --without-icu --enable-debug --enable-cassert --with-pgport=5414 CFLAGS="-g -O0"

[edb@9428da9d2137 postgres]$ cat /etc/redhat-release
AlmaLinux release 9.2 (Turquoise Kodkod)

[edb@9428da9d2137 edbas]$ lscpu
Architecture:         s390x
CPU op-mode(s):       32-bit, 64-bit
Address sizes:        39 bits physical, 48 bits virtual
Byte Order:           Big Endian
CPU(s):               9
On-line CPU(s) list:  0-8
Vendor ID:            GenuineIntel
Model name:           Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
CPU family:           6
Model:                158
Thread(s) per core:   1
Core(s) per socket:   1
Socket(s):            9
Stepping:             10
BogoMIPS:             5200.00
Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase bmi1 avx2 bmi2 erms xsaveopt arat
Caches (sum of all):
  L1d:                288 KiB (9 instances)
  L1i:                288 KiB (9 instances)
  L2:                 2.3 MiB (9 instances)
  L3:                 108 MiB (9 instances)
Vulnerabilities:
  Itlb multihit:      KVM: Mitigation: VMX unsupported
  L1tf:               Mitigation; PTE Inversion
  Mds:                Vulnerable; SMT Host state unknown
  Meltdown:           Vulnerable
  Mmio stale data:    Vulnerable
  Spec store bypass:  Vulnerable
  Spectre v1:         Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
  Spectre v2:         Vulnerable, STIBP: disabled
  Srbds:              Unknown: Dependent on hypervisor status
  Tsx async abort:    Not affected

[edb@9428da9d2137 postgres]$ clang --version
clang version 15.0.7 (Red Hat 15.0.7-2.el9)
Target: s390x-ibm-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

[edb@9428da9d2137 postgres]$ rpm -qa | grep llvm
llvm-libs-15.0.7-1.el9.s390x
llvm-15.0.7-1.el9.s390x
llvm-test-15.0.7-1.el9.s390x
llvm-static-15.0.7-1.el9.s390x
llvm-devel-15.0.7-1.el9.s390x

Please let me know if any further information is required.

> >
> > *Test case:*
> > CREATE TABLE rm32044_t1
> > (
> >     pkey integer,
> >     val text
> > );
> > CREATE TABLE rm32044_t2
> > (
> >     pkey integer,
> >     label text,
> >     hidden boolean
> > );
> > CREATE TABLE rm32044_t3
> > (
> >     pkey integer,
> >     val integer
> > );
> > CREATE TABLE rm32044_t4
> > (
> >     pkey integer
> > );
> > insert into rm32044_t1 values ( 1 , 'row1');
> > insert into rm32044_t1 values ( 2 , 'row2');
> > insert into rm32044_t2 values ( 1 , 'hidden', true);
> > insert into rm32044_t2 values ( 2 , 'visible', false);
> > insert into rm32044_t3 values (1 , 1);
> > insert into rm32044_t3 values (2 , 1);
> >
> > postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
> > server closed the connection unexpectedly
> >     This probably means the server terminated abnormally
> >     before or while processing the request.
> > The connection to the server was lost. Attempting reset: Failed.
> >
Re: Server crash on RHEL 9/s390x platform against PG16
Hi,

On 2023-09-12 15:27:21 +0530, Suraj Kharage wrote:
> [edb@9428da9d2137 postgres]$ cat /etc/redhat-release
> AlmaLinux release 9.2 (Turquoise Kodkod)
> [edb@9428da9d2137 postgres]$ lscpu
> Architecture: s390x
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 39 bits

Can you provide the rest of the lscpu output? There have been issues with Z14
vs Z15:
https://github.com/llvm/llvm-project/issues/53009

You're apparently not hitting that, but given that fact, you either are on a
slightly older CPU, or you have applied a patch to work around it. Because
otherwise your build instructions below would hit that problem, I think.

> physical, 48 bits virtual
> Byte Order: Big Endian
>
> *Configure command:*
> ./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd --with-llvm --with-perl --with-python --with-tcl --with-openssl --enable-nls --with-libxml --with-libxslt --with-systemd --with-libcurl --without-icu --enable-debug --enable-cassert --with-pgport=5414

Hm, based on "--with-libcurl" this isn't upstream postgres, correct? Have you
verified the issue reproduces on upstream postgres?

>
> *Test case:*
> CREATE TABLE rm32044_t1
> (
>     pkey integer,
>     val text
> );
> CREATE TABLE rm32044_t2
> (
>     pkey integer,
>     label text,
>     hidden boolean
> );
> CREATE TABLE rm32044_t3
> (
>     pkey integer,
>     val integer
> );
> CREATE TABLE rm32044_t4
> (
>     pkey integer
> );
> insert into rm32044_t1 values ( 1 , 'row1');
> insert into rm32044_t1 values ( 2 , 'row2');
> insert into rm32044_t2 values ( 1 , 'hidden', true);
> insert into rm32044_t2 values ( 2 , 'visible', false);
> insert into rm32044_t3 values (1 , 1);
> insert into rm32044_t3 values (2 , 1);
>
> postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> The connection to the server was lost. Attempting reset: Failed.

I tried this on both master and 16, without hitting this issue. If you can
reproduce the issue on upstream postgres, can you share more about your
configuration?

Greetings,

Andres Freund
Re: Server crash on RHEL 9/s390x platform against PG16
On Sun, Oct 8, 2023 at 10:55 PM Suraj Kharage <suraj.khar...@enterprisedb.com> wrote:
> It looks like an issue with JIT. If I disable the JIT then the above query
> runs successfully.
>
> postgres=# set jit to off;
> SET
> postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
>  pkey | val  | pkey |  label  | hidden | pkey | val | pkey
> ------+------+------+---------+--------+------+-----+------
>     1 | row1 |    1 | hidden  | t      |    1 |   1 |
>     1 | row1 |    1 | hidden  | t      |    2 |   1 |
>     2 | row2 |    2 | visible | f      |    1 |   1 |
>     2 | row2 |    2 | visible | f      |    2 |   1 |
> (4 rows)
>
> Any idea on this?

No, but I found a few previous threads complaining about JIT not working on s390x.

https://www.postgresql.org/message-id/4106722.1616177...@sss.pgh.pa.us
https://www.postgresql.org/message-id/3ba50664-56a2-bcf4-2b24-05a3e0a75...@enterprisedb.com
https://www.postgresql.org/message-id/20200715091509.GA3354074%40msg.df7cb.de

The most interesting email I found in those threads was this one:

http://postgr.es/m/3358505.1594912...@sss.pgh.pa.us

The backtrace there is different from the one you posted here in significant
ways, but it seems like both that case and this one involve a null pointer
showing up for a non-null pass-by-reference datum. That doesn't seem like a
whole lot to go on, but maybe somebody who understands the JIT stuff better
than I do will have an idea.

--
Robert Haas
EDB: http://www.enterprisedb.com
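
To make the phrase "a null pointer showing up for a non-null pass-by-reference datum" concrete, here is a minimal, self-contained sketch in C. It is only a simplified model and does not use PostgreSQL headers: the `Datum` typedef, the `ModelAttr` struct, and the function `first_null_byref_attr` are hypothetical stand-ins invented for illustration. The idea is that a by-value attribute (integer, boolean) may legitimately hold 0, and an SQL NULL attribute is never read, but a non-null by-reference attribute (text) whose datum is 0 would be dereferenced as a pointer, which matches the `val = 0`, `i = 3` state shown by gdb in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a datum: wide enough to hold a pointer.
 * Illustrative only, not the real PostgreSQL definition. */
typedef uintptr_t Datum;

/* Hypothetical per-attribute metadata for this model. */
typedef struct ModelAttr {
    bool byval; /* true: the value lives in the Datum itself (e.g. integer) */
} ModelAttr;

/*
 * Return the 0-based index of the first attribute that is marked non-null
 * and pass-by-reference but whose Datum is a null pointer, or -1 if there
 * is no such attribute.
 */
int first_null_byref_attr(const ModelAttr *attrs, const Datum *values,
                          const bool *isnull, int natts)
{
    for (int i = 0; i < natts; i++) {
        if (isnull[i])
            continue;   /* SQL NULL: the Datum is never looked at */
        if (attrs[i].byval)
            continue;   /* by-value: 0 is a legitimate value */
        if (values[i] == 0)
            return i;   /* would be dereferenced as a pointer */
    }
    return -1;
}
```

Applied to a model of the eight-column row from this thread (integer, text, integer, text, boolean, integer, integer, integer, with the last column NULL), and the datum values gdb printed for indexes 1 to 3, the function reports index 3, the `label` column. The other array entries in such a model are assumed values, since the thread does not show them.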
Re: Server crash on RHEL 9/s390x platform against PG16
Here is the clang version:

[edb@9428da9d2137]$ clang --version
clang version 15.0.7 (Red Hat 15.0.7-2.el9)
Target: s390x-ibm-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Let me know if any further information is needed.

On Mon, Oct 9, 2023 at 8:21 AM Suraj Kharage wrote:

> It looks like an issue with JIT. If I disable the JIT then the above query
> runs successfully.
>
> postgres=# set jit to off;
> SET
> postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
>  pkey | val  | pkey |  label  | hidden | pkey | val | pkey
> ------+------+------+---------+--------+------+-----+------
>     1 | row1 |    1 | hidden  | t      |    1 |   1 |
>     1 | row1 |    1 | hidden  | t      |    2 |   1 |
>     2 | row2 |    2 | visible | f      |    1 |   1 |
>     2 | row2 |    2 | visible | f      |    2 |   1 |
> (4 rows)
>
> Any idea on this?
>
> On Mon, Sep 18, 2023 at 11:20 AM Suraj Kharage <suraj.khar...@enterprisedb.com> wrote:
>
>> Few more details on this:
>>
>> (gdb) p val
>> $1 = 0
>> (gdb) p i
>> $2 = 3
>> (gdb) f 3
>> #3  0x01a1ef70 in ExecCopySlotMinimalTuple (slot=0x202e4f8) at ../../../../src/include/executor/tuptable.h:472
>> 472 return slot->tts_ops->copy_minimal_tuple(slot);
>> (gdb) p *slot
>> $3 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 8, tts_ops = 0x1b6dcc8 , tts_tupleDescriptor = 0x202e0e8, tts_values = 0x202e540, tts_isnull = 0x202e580, tts_mcxt = 0x1f54550, tts_tid = {ip_blkid = {bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, tts_tableOid = 0}
>> (gdb) p *slot->tts_tupleDescriptor
>> $2 = {natts = 8, tdtypeid = 2249, tdtypmod = -1, tdrefcount = -1, constr = 0x0, attrs = 0x202cd28}
>>
>> (gdb) p slot.tts_values[3]
>> $4 = 0
>> (gdb) p slot.tts_values[2]
>> $5 = 1
>> (gdb) p slot.tts_values[1]
>> $6 = 34027556
>>
>> As per the result slot, it has a 0 value for the attribute at index 3
>> (column label).
>> I'm testing this in a docker container and facing some issues with gdb,
>> hence I was not able to debug it further.
>>
>> Here is the explain plan:
>>
>> postgres=# explain (verbose, costs off) SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
>>                          QUERY PLAN
>> -------------------------------------------------------------
>>  Incremental Sort
>>    Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
>>    Sort Key: rm32044_t1.pkey, rm32044_t2.label, rm32044_t2.hidden
>>    Presorted Key: rm32044_t1.pkey
>>    ->  Merge Left Join
>>          Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
>>          Merge Cond: (rm32044_t1.pkey = rm32044_t2.pkey)
>>          ->  Sort
>>                Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
>>                Sort Key: rm32044_t1.pkey
>>                ->  Nested Loop
>>                      Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
>>                      ->  Merge Left Join
>>                            Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
>>                            Merge Cond: (rm32044_t3.pkey = rm32044_t4.pkey)
>>                            ->  Sort
>>                                  Output: rm32044_t3.pkey, rm32044_t3.val
>>                                  Sort Key: rm32044_t3.pkey
>>                                  ->  Seq Scan on public.rm32044_t3
>>                                        Output: rm32044_t3.pkey, rm32044_t3.val
>>                            ->  Sort
>>                                  Output: rm32044_t4.pkey
>>                                  Sort Key: rm32044_t4.pkey
>>                                  ->  Seq Scan on public.rm32044_t4
>>                                        Output: rm32044_t4.pkey
>>                      ->  Materialize
>>                            Output: rm32044_t1.pkey, rm32044_t1.val
>>                            ->  Seq Scan on public.rm32044_t1
>>                                  Output: rm32044_t1.pkey, rm32044_t1.val
>>          ->  Sort
>>                Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
>>                Sort Key: rm32044_t2.pkey
>>                ->  Seq Scan on public.rm32044_t2
>>                      Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
>> (34 rows)
>>
>> It seems like while building the innerslot for merge join, the value for
>> attnum 1 is
Re: Server crash on RHEL 9/s390x platform against PG16
It looks like an issue with JIT. If I disable the JIT then the above query runs successfully.

postgres=# set jit to off;
SET
postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
 pkey | val  | pkey |  label  | hidden | pkey | val | pkey
------+------+------+---------+--------+------+-----+------
    1 | row1 |    1 | hidden  | t      |    1 |   1 |
    1 | row1 |    1 | hidden  | t      |    2 |   1 |
    2 | row2 |    2 | visible | f      |    1 |   1 |
    2 | row2 |    2 | visible | f      |    2 |   1 |
(4 rows)

Any idea on this?

On Mon, Sep 18, 2023 at 11:20 AM Suraj Kharage <suraj.khar...@enterprisedb.com> wrote:

> Few more details on this:
>
> (gdb) p val
> $1 = 0
> (gdb) p i
> $2 = 3
> (gdb) f 3
> #3  0x01a1ef70 in ExecCopySlotMinimalTuple (slot=0x202e4f8) at ../../../../src/include/executor/tuptable.h:472
> 472 return slot->tts_ops->copy_minimal_tuple(slot);
> (gdb) p *slot
> $3 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 8, tts_ops = 0x1b6dcc8 , tts_tupleDescriptor = 0x202e0e8, tts_values = 0x202e540, tts_isnull = 0x202e580, tts_mcxt = 0x1f54550, tts_tid = {ip_blkid = {bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, tts_tableOid = 0}
> (gdb) p *slot->tts_tupleDescriptor
> $2 = {natts = 8, tdtypeid = 2249, tdtypmod = -1, tdrefcount = -1, constr = 0x0, attrs = 0x202cd28}
>
> (gdb) p slot.tts_values[3]
> $4 = 0
> (gdb) p slot.tts_values[2]
> $5 = 1
> (gdb) p slot.tts_values[1]
> $6 = 34027556
>
> As per the result slot, it has a 0 value for the attribute at index 3
> (column label).
> I'm testing this in a docker container and facing some issues with gdb,
> hence I was not able to debug it further.
>
> Here is the explain plan:
>
> postgres=# explain (verbose, costs off) SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
>                          QUERY PLAN
> -------------------------------------------------------------
>  Incremental Sort
>    Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
>    Sort Key: rm32044_t1.pkey, rm32044_t2.label, rm32044_t2.hidden
>    Presorted Key: rm32044_t1.pkey
>    ->  Merge Left Join
>          Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
>          Merge Cond: (rm32044_t1.pkey = rm32044_t2.pkey)
>          ->  Sort
>                Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
>                Sort Key: rm32044_t1.pkey
>                ->  Nested Loop
>                      Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
>                      ->  Merge Left Join
>                            Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
>                            Merge Cond: (rm32044_t3.pkey = rm32044_t4.pkey)
>                            ->  Sort
>                                  Output: rm32044_t3.pkey, rm32044_t3.val
>                                  Sort Key: rm32044_t3.pkey
>                                  ->  Seq Scan on public.rm32044_t3
>                                        Output: rm32044_t3.pkey, rm32044_t3.val
>                            ->  Sort
>                                  Output: rm32044_t4.pkey
>                                  Sort Key: rm32044_t4.pkey
>                                  ->  Seq Scan on public.rm32044_t4
>                                        Output: rm32044_t4.pkey
>                      ->  Materialize
>                            Output: rm32044_t1.pkey, rm32044_t1.val
>                            ->  Seq Scan on public.rm32044_t1
>                                  Output: rm32044_t1.pkey, rm32044_t1.val
>          ->  Sort
>                Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
>                Sort Key: rm32044_t2.pkey
>                ->  Seq Scan on public.rm32044_t2
>                      Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
> (34 rows)
>
> It seems like while building the innerslot for merge join, the value for
> attnum 1 is not getting fetched correctly.
>
> On Tue, Sep 12, 2023 at 3:27 PM Suraj Kharage <suraj.khar...@enterprisedb.com> wrote:
>
>> Hi,
>>
>> Found server crash on RHEL 9/s390x platform with below test case -
>>
>> *Machine details:*
>>
>> [edb@9428da9d2137 postgres]$ cat /etc/redhat-release
>> AlmaLinux release 9.2 (Turquoise Kodkod)
>> [edb@9428da9d2137 postgres]$ lscpu
>> Architecture: s390x
>> CPU op-mode(s):
Re: Server crash on RHEL 9/s390x platform against PG16
Few more details on this:

(gdb) p val
$1 = 0
(gdb) p i
$2 = 3
(gdb) f 3
#3  0x01a1ef70 in ExecCopySlotMinimalTuple (slot=0x202e4f8) at ../../../../src/include/executor/tuptable.h:472
472 return slot->tts_ops->copy_minimal_tuple(slot);
(gdb) p *slot
$3 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 8, tts_ops = 0x1b6dcc8 , tts_tupleDescriptor = 0x202e0e8, tts_values = 0x202e540, tts_isnull = 0x202e580, tts_mcxt = 0x1f54550, tts_tid = {ip_blkid = {bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, tts_tableOid = 0}
(gdb) p *slot->tts_tupleDescriptor
$2 = {natts = 8, tdtypeid = 2249, tdtypmod = -1, tdrefcount = -1, constr = 0x0, attrs = 0x202cd28}

(gdb) p slot.tts_values[3]
$4 = 0
(gdb) p slot.tts_values[2]
$5 = 1
(gdb) p slot.tts_values[1]
$6 = 34027556

As per the result slot, it has a 0 value for the attribute at index 3 (column label).
I'm testing this in a docker container and facing some issues with gdb, hence I was not able to debug it further.

Here is the explain plan:

postgres=# explain (verbose, costs off) SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
                         QUERY PLAN
-------------------------------------------------------------
 Incremental Sort
   Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
   Sort Key: rm32044_t1.pkey, rm32044_t2.label, rm32044_t2.hidden
   Presorted Key: rm32044_t1.pkey
   ->  Merge Left Join
         Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
         Merge Cond: (rm32044_t1.pkey = rm32044_t2.pkey)
         ->  Sort
               Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
               Sort Key: rm32044_t1.pkey
               ->  Nested Loop
                     Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
                     ->  Merge Left Join
                           Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
                           Merge Cond: (rm32044_t3.pkey = rm32044_t4.pkey)
                           ->  Sort
                                 Output: rm32044_t3.pkey, rm32044_t3.val
                                 Sort Key: rm32044_t3.pkey
                                 ->  Seq Scan on public.rm32044_t3
                                       Output: rm32044_t3.pkey, rm32044_t3.val
                           ->  Sort
                                 Output: rm32044_t4.pkey
                                 Sort Key: rm32044_t4.pkey
                                 ->  Seq Scan on public.rm32044_t4
                                       Output: rm32044_t4.pkey
                     ->  Materialize
                           Output: rm32044_t1.pkey, rm32044_t1.val
                           ->  Seq Scan on public.rm32044_t1
                                 Output: rm32044_t1.pkey, rm32044_t1.val
         ->  Sort
               Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
               Sort Key: rm32044_t2.pkey
               ->  Seq Scan on public.rm32044_t2
                     Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
(34 rows)

It seems like while building the innerslot for merge join, the value for attnum 1 is not getting fetched correctly.

On Tue, Sep 12, 2023 at 3:27 PM Suraj Kharage <suraj.khar...@enterprisedb.com> wrote:

> Hi,
>
> Found server crash on RHEL 9/s390x platform with below test case -
>
> *Machine details:*
>
> [edb@9428da9d2137 postgres]$ cat /etc/redhat-release
> AlmaLinux release 9.2 (Turquoise Kodkod)
> [edb@9428da9d2137 postgres]$ lscpu
> Architecture: s390x
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 39 bits physical, 48 bits virtual
> Byte Order: Big Endian
>
> *Configure command:*
> ./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd --with-llvm --with-perl --with-python --with-tcl --with-openssl --enable-nls --with-libxml --with-libxslt --with-systemd --with-libcurl --without-icu --enable-debug --enable-cassert --with-pgport=5414
>
> *Test case:*
> CREATE TABLE rm32044_t1
> (
>     pkey integer,
>     val text
> );
> CREATE TABLE rm32044_t2
> (
>     pkey integer,
>     label text,
>     hidden boolean
> );
> CREATE TABLE rm32044_t3
> (
>     pkey integer,
>     val integer
> );
> CREATE TABLE rm32044_t4
> (
>     pkey integer
> );
> insert into rm32044_t1 values ( 1 , 'row1');
> insert into rm32044_t1 values ( 2 , 'row2');
> insert into rm32044_t2 values ( 1 , 'hidden', true);
> insert into rm32044_t2 values ( 2 , 'visible', false);
> insert into rm32044_t3 values (1 , 1);
Server crash on RHEL 9/s390x platform against PG16
Hi,

I found a server crash on the RHEL 9/s390x platform with the test case below.

*Machine details:*

[edb@9428da9d2137 postgres]$ cat /etc/redhat-release
AlmaLinux release 9.2 (Turquoise Kodkod)
[edb@9428da9d2137 postgres]$ lscpu
Architecture:    s390x
CPU op-mode(s):  32-bit, 64-bit
Address sizes:   39 bits physical, 48 bits virtual
Byte Order:      Big Endian

*Configure command:*

./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd --with-llvm --with-perl --with-python --with-tcl --with-openssl --enable-nls --with-libxml --with-libxslt --with-systemd --with-libcurl --without-icu --enable-debug --enable-cassert --with-pgport=5414

*Test case:*

CREATE TABLE rm32044_t1
(
    pkey integer,
    val text
);
CREATE TABLE rm32044_t2
(
    pkey integer,
    label text,
    hidden boolean
);
CREATE TABLE rm32044_t3
(
    pkey integer,
    val integer
);
CREATE TABLE rm32044_t4
(
    pkey integer
);
insert into rm32044_t1 values ( 1 , 'row1');
insert into rm32044_t1 values ( 2 , 'row2');
insert into rm32044_t2 values ( 1 , 'hidden', true);
insert into rm32044_t2 values ( 2 , 'visible', false);
insert into rm32044_t3 values (1 , 1);
insert into rm32044_t3 values (2 , 1);

postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.

*backtrace:*

[edb@9428da9d2137 postgres]$ gdb bin/postgres data/qemu_postgres_20230911-140628_65620.core
Core was generated by `postgres: edb postgres [local] SELECT '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x010a8366 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x1ba3d10, values=values@entry=0x1ba4168, isnull=isnull@entry=0x1ba41a8) at heaptuple.c:227
227 VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
[Current thread is 1 (LWP 65597)]
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-60.el9.s390x libcap-2.48-8.el9.s390x libedit-3.1-37.20210216cvs.el9.s390x libffi-3.4.2-7.el9.s390x libgcc-11.3.1-4.3.el9.alma.s390x libgcrypt-1.10.0-10.el9_2.s390x libgpg-error-1.42-5.el9.s390x libstdc++-11.3.1-4.3.el9.alma.s390x libxml2-2.9.13-3.el9_2.1.s390x libzstd-1.5.1-2.el9.s390x llvm-libs-15.0.7-1.el9.s390x lz4-libs-1.9.3-5.el9.s390x ncurses-libs-6.2-8.20210508.el9.s390x openssl-libs-3.0.7-17.el9_2.s390x systemd-libs-252-14.el9_2.3.s390x xz-libs-5.2.5-8.el9_0.s390x
(gdb) bt
#0  0x010a8366 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x1ba3d10, values=values@entry=0x1ba4168, isnull=isnull@entry=0x1ba41a8) at heaptuple.c:227
#1  0x010a9bb0 in heap_form_minimal_tuple (tupleDescriptor=0x1ba3d10, values=0x1ba4168, isnull=0x1ba41a8) at heaptuple.c:1484
#2  0x016553fa in ExecCopySlotMinimalTuple (slot=) at ../../../../src/include/executor/tuptable.h:472
#3  tuplesort_puttupleslot (state=state@entry=0x1be4d18, slot=slot@entry=0x1ba4120) at tuplesortvariants.c:610
#4  0x012dc0e0 in ExecIncrementalSort (pstate=0x1acb4d8) at nodeIncrementalSort.c:716
#5  0x012b32c6 in ExecProcNode (node=0x1acb4d8) at ../../../src/include/executor/executor.h:273
#6  ExecutePlan (execute_once=, dest=0x1ade698, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0x1acb4d8, estate=0x1acb258) at execMain.c:1670
#7  standard_ExecutorRun (queryDesc=0x19ad338, direction=, count=0, execute_once=) at execMain.c:365
#8  0x014a6ae2 in PortalRunSelect (portal=portal@entry=0x1a63558, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x1ade698) at pquery.c:924
#9  0x014a84e0 in PortalRun (portal=portal@entry=0x1a63558, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x1ade698, altdest=0x1ade698, qc=0x40007ff7b0) at pquery.c:768
#10 0x014a3c1c in exec_simple_query (query_string=0x19ea0e8 "SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;") at postgres.c:1274
#11 0x014a57aa in PostgresMain (dbname=, username=) at postgres.c:4637
#12 0x013fdaf6 in BackendRun (port=0x1a132c0, port=0x1a132c0) at postmaster.c:4464
#13 BackendStartup (port=0x1a132c0) at postmaster.c:4192
#14 ServerLoop () at postmaster.c:1782
#15 0x013fec34 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x19a59a0) at postmaster.c:1466
#16 0x01096faa in main (argc=, argv=0x19a59a0) at main.c:198
(gdb) p val
$1 = 0

Does anybody have any idea about this?

--
--

Thanks & Regards,