Re: Server crash on RHEL 9/s390x platform against PG16

2023-10-22 Thread Suraj Kharage
On Sat, Oct 21, 2023 at 5:17 AM Andres Freund  wrote:

> Hi,
>
> On 2023-09-12 15:27:21 +0530, Suraj Kharage wrote:
> > *[edb@9428da9d2137 postgres]$ cat /etc/redhat-release AlmaLinux release 9.2
> > (Turquoise Kodkod)[edb@9428da9d2137 postgres]$ lscpuArchitecture:
> > s390x  CPU op-mode(s):   32-bit, 64-bit  Address sizes:39 bits
>
> Can you provide the rest of the lscpu output?  There have been issues with
> Z14 vs Z15:
> https://github.com/llvm/llvm-project/issues/53009
>
> You're apparently not hitting that, but given that fact, you either are on
> a slightly older CPU, or you have applied a patch to work around it. Because
> otherwise your build instructions below would hit that problem, I think.
>
>
> > physical, 48 bits virtual  Byte Order:   Big Endian*
> > *Configure command:*
> > ./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd --with-llvm
> > --with-perl --with-python --with-tcl --with-openssl --enable-nls
> > --with-libxml --with-libxslt --with-systemd --with-libcurl --without-icu
> > --enable-debug --enable-cassert --with-pgport=5414
>
> Hm, based on "--with-libcurl" this isn't upstream postgres, correct? Have
> you verified the issue reproduces on upstream postgres?
>

Yes, I can reproduce this on upstream postgres master and v16 branch.

Here are details:

./configure --prefix=/home/edb/postgres/ --with-zstd --with-llvm
--with-perl --with-python --with-tcl --with-openssl --enable-nls
--with-libxml --with-libxslt --with-systemd --without-icu --enable-debug
--enable-cassert --with-pgport=5414 CFLAGS="-g -O0"



[edb@9428da9d2137 postgres]$ cat /etc/redhat-release
AlmaLinux release 9.2 (Turquoise Kodkod)

[edb@9428da9d2137 edbas]$ lscpu
Architecture:           s390x
  CPU op-mode(s):       32-bit, 64-bit
  Address sizes:        39 bits physical, 48 bits virtual
  Byte Order:           Big Endian
CPU(s):                 9
  On-line CPU(s) list:  0-8
Vendor ID:              GenuineIntel
  Model name:           Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
CPU family:             6
Model:                  158
Thread(s) per core:     1
Core(s) per socket:     1
Socket(s):              9
Stepping:               10
BogoMIPS:               5200.00
Flags:                  fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
                        pge mca cmov pat pse36 clflush mmx fxsr sse sse2
                        ss ht pbe syscall nx pdpe1gb lm constant_tsc
                        rep_good nopl xtopology nonstop_tsc cpuid pni
                        pclmulqdq dtes64 ds_cpl ssse3 sdbg fma cx16 xtpr
                        pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c
                        rdrand hypervisor lahf_lm abm 3dnowprefetch
                        fsgsbase bmi1 avx2 bmi2 erms xsaveopt arat
Caches (sum of all):
  L1d:                  288 KiB (9 instances)
  L1i:                  288 KiB (9 instances)
  L2:                   2.3 MiB (9 instances)
  L3:                   108 MiB (9 instances)
Vulnerabilities:
  Itlb multihit:        KVM: Mitigation: VMX unsupported
  L1tf:                 Mitigation; PTE Inversion
  Mds:                  Vulnerable; SMT Host state unknown
  Meltdown:             Vulnerable
  Mmio stale data:      Vulnerable
  Spec store bypass:    Vulnerable
  Spectre v1:           Vulnerable: __user pointer sanitization and
                        usercopy barriers only; no swapgs barriers
  Spectre v2:           Vulnerable, STIBP: disabled
  Srbds:                Unknown: Dependent on hypervisor status
  Tsx async abort:      Not affected

[edb@9428da9d2137 postgres]$ clang --version
clang version 15.0.7 (Red Hat 15.0.7-2.el9)
Target: s390x-ibm-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

[edb@9428da9d2137 postgres]$ rpm -qa | grep llvm
llvm-libs-15.0.7-1.el9.s390x
llvm-15.0.7-1.el9.s390x
llvm-test-15.0.7-1.el9.s390x
llvm-static-15.0.7-1.el9.s390x
llvm-devel-15.0.7-1.el9.s390x

Please let me know if any further information is required.



Re: Server crash on RHEL 9/s390x platform against PG16

2023-10-20 Thread Andres Freund
Hi,

On 2023-09-12 15:27:21 +0530, Suraj Kharage wrote:
> *[edb@9428da9d2137 postgres]$ cat /etc/redhat-release AlmaLinux release 9.2
> (Turquoise Kodkod)[edb@9428da9d2137 postgres]$ lscpuArchitecture:
> s390x  CPU op-mode(s):   32-bit, 64-bit  Address sizes:39 bits

Can you provide the rest of the lscpu output?  There have been issues with Z14
vs Z15:
https://github.com/llvm/llvm-project/issues/53009

You're apparently not hitting that, but given that fact, you either are on a
slightly older CPU, or you have applied a patch to work around it. Because
otherwise your build instructions below would hit that problem, I think.


> physical, 48 bits virtual  Byte Order:   Big Endian*
> *Configure command:*
> ./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd --with-llvm
> --with-perl --with-python --with-tcl --with-openssl --enable-nls
> --with-libxml --with-libxslt --with-systemd --with-libcurl --without-icu
> --enable-debug --enable-cassert --with-pgport=5414

Hm, based on "--with-libcurl" this isn't upstream postgres, correct? Have you
verified the issue reproduces on upstream postgres?

> 
> *Test case:*
> CREATE TABLE rm32044_t1
> (
> pkey   integer,
> val  text
> );
> CREATE TABLE rm32044_t2
> (
> pkey   integer,
> label  text,
> hidden boolean
> );
> CREATE TABLE rm32044_t3
> (
> pkey integer,
> val integer
> );
> CREATE TABLE rm32044_t4
> (
> pkey integer
> );
> insert into rm32044_t1 values ( 1 , 'row1');
> insert into rm32044_t1 values ( 2 , 'row2');
> insert into rm32044_t2 values ( 1 , 'hidden', true);
> insert into rm32044_t2 values ( 2 , 'visible', false);
> insert into rm32044_t3 values (1 , 1);
> insert into rm32044_t3 values (2 , 1);
> 
> postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey
> = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey =
> rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;

> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> The connection to the server was lost. Attempting reset: Failed.

I tried this on both master and 16, without hitting this issue.

If you can reproduce the issue on upstream postgres, can you share more about
your configuration?

Greetings,

Andres Freund




Re: Server crash on RHEL 9/s390x platform against PG16

2023-10-16 Thread Robert Haas
On Sun, Oct 8, 2023 at 10:55 PM Suraj Kharage <
suraj.khar...@enterprisedb.com> wrote:

> It looks like an issue with JIT. If I disable the JIT then the above query
> runs successfully.
>
> postgres=# set jit to off;
> SET
> postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON
> rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON
> rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
>  pkey | val  | pkey |  label  | hidden | pkey | val | pkey
> ------+------+------+---------+--------+------+-----+------
>     1 | row1 |    1 | hidden  | t      |    1 |   1 |
>     1 | row1 |    1 | hidden  | t      |    2 |   1 |
>     2 | row2 |    2 | visible | f      |    1 |   1 |
>     2 | row2 |    2 | visible | f      |    2 |   1 |
> (4 rows)
>
> Any idea on this?
>

No, but I found a few previous threads complaining about JIT not working on
s390x.

https://www.postgresql.org/message-id/4106722.1616177...@sss.pgh.pa.us
https://www.postgresql.org/message-id/3ba50664-56a2-bcf4-2b24-05a3e0a75...@enterprisedb.com
https://www.postgresql.org/message-id/20200715091509.GA3354074%40msg.df7cb.de

The most interesting email I found in those threads was this one:

http://postgr.es/m/3358505.1594912...@sss.pgh.pa.us

The backtrace there is different from the one you posted here in
significant ways, but it seems like both that case and this one involve a
null pointer showing up for a non-null pass-by-reference datum. That
doesn't seem like a whole lot to go on, but maybe somebody who understands
the JIT stuff better than I do will have an idea.
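
As a rough model of that condition, here is a minimal Python sketch. It is not PostgreSQL code: `find_bad_datums` and the `by_reference` flags are hypothetical names for this illustration, and the datum values are the ones gdb printed in the 2023-09-17 message (entry 0 was not printed there, so it is a placeholder).

```python
def find_bad_datums(values, isnull, by_reference):
    """Return indexes of attributes that are marked non-null and passed by
    reference but whose datum is 0, i.e. a NULL pointer."""
    bad = []
    for i, (value, null, by_ref) in enumerate(zip(values, isnull, by_reference)):
        if by_ref and not null and value == 0:
            bad.append(i)
    return bad


# First four attributes of the result slot. Entries 1..3 are the gdb values
# from the 2023-09-17 message; entry 0 is a placeholder (assumption).
values = [1, 34027556, 1, 0]
# Assumed non-null: the crashing line is only reached for non-null attributes.
isnull = [False, False, False, False]
# integer, text, integer, text: of these, only text is passed by reference.
by_reference = [False, True, False, True]

print(find_bad_datums(values, isnull, by_reference))  # prints [3]
```

Under these assumptions the only flagged entry is index 3, the text column `label`, which matches the `p i` output at the crash site.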

-- 
Robert Haas
EDB: http://www.enterprisedb.com


Re: Server crash on RHEL 9/s390x platform against PG16

2023-10-12 Thread Suraj Kharage
Here is the clang version:

[edb@9428da9d2137]$ clang --version
clang version 15.0.7 (Red Hat 15.0.7-2.el9)
Target: s390x-ibm-linux-gnu
Thread model: posix
InstalledDir: /usr/bin


Let me know if any further information is needed.


Re: Server crash on RHEL 9/s390x platform against PG16

2023-10-08 Thread Suraj Kharage
It looks like an issue with JIT. If I disable the JIT then the above query
runs successfully.

postgres=# set jit to off;
SET
postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey
= rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey =
rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
 pkey | val  | pkey |  label  | hidden | pkey | val | pkey
------+------+------+---------+--------+------+-----+------
    1 | row1 |    1 | hidden  | t      |    1 |   1 |
    1 | row1 |    1 | hidden  | t      |    2 |   1 |
    2 | row2 |    2 | visible | f      |    1 |   1 |
    2 | row2 |    2 | visible | f      |    2 |   1 |
(4 rows)

Any idea on this?


Re: Server crash on RHEL 9/s390x platform against PG16

2023-09-17 Thread Suraj Kharage
Few more details on this:

(gdb) p val
$1 = 0
(gdb) p i
$2 = 3
(gdb) f 3
#3  0x01a1ef70 in ExecCopySlotMinimalTuple (slot=0x202e4f8) at
../../../../src/include/executor/tuptable.h:472
472 return slot->tts_ops->copy_minimal_tuple(slot);
(gdb) p *slot
$3 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 8, tts_ops =
0x1b6dcc8 , tts_tupleDescriptor = 0x202e0e8, tts_values =
0x202e540, tts_isnull = 0x202e580, tts_mcxt = 0x1f54550, tts_tid =
{ip_blkid = {bi_hi = 65535,
  bi_lo = 65535}, ip_posid = 0}, tts_tableOid = 0}
(gdb) p *slot->tts_tupleDescriptor
$2 = {natts = 8, tdtypeid = 2249, tdtypmod = -1, tdrefcount = -1, constr =
0x0, attrs = 0x202cd28}

(gdb) p slot.tts_values[3]
$4 = 0
(gdb) p slot.tts_values[2]
$5 = 1
(gdb) p slot.tts_values[1]
$6 = 34027556


As per the result slot, the attribute at index 3 (column label) has the
value 0.
I'm testing this in a Docker container and am facing some issues with gdb,
hence I could not debug it further.

Here is the explain plan:

postgres=# explain (verbose, costs off) SELECT * FROM rm32044_t1 LEFT JOIN
rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN
rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by
rm32044_t1.pkey,label,hidden;

                              QUERY PLAN
-----------------------------------------------------------------------
 Incremental Sort
   Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
   Sort Key: rm32044_t1.pkey, rm32044_t2.label, rm32044_t2.hidden
   Presorted Key: rm32044_t1.pkey
   ->  Merge Left Join
         Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
         Merge Cond: (rm32044_t1.pkey = rm32044_t2.pkey)
         ->  Sort
               Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
               Sort Key: rm32044_t1.pkey
               ->  Nested Loop
                     Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
                     ->  Merge Left Join
                           Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
                           Merge Cond: (rm32044_t3.pkey = rm32044_t4.pkey)
                           ->  Sort
                                 Output: rm32044_t3.pkey, rm32044_t3.val
                                 Sort Key: rm32044_t3.pkey
                                 ->  Seq Scan on public.rm32044_t3
                                       Output: rm32044_t3.pkey, rm32044_t3.val
                           ->  Sort
                                 Output: rm32044_t4.pkey
                                 Sort Key: rm32044_t4.pkey
                                 ->  Seq Scan on public.rm32044_t4
                                       Output: rm32044_t4.pkey
                     ->  Materialize
                           Output: rm32044_t1.pkey, rm32044_t1.val
                           ->  Seq Scan on public.rm32044_t1
                                 Output: rm32044_t1.pkey, rm32044_t1.val
         ->  Sort
               Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
               Sort Key: rm32044_t2.pkey
               ->  Seq Scan on public.rm32044_t2
                     Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
(34 rows)


It seems like while building the innerslot for merge join, the value for
attnum 1 is not getting fetched correctly.
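
To line up the gdb indexes with the plan, here is a minimal Python sketch (not PostgreSQL code). It pairs the Output lists from the plan with the column types from the test case; `describe_attribute` and the list names are made up for this illustration, and it assumes that both the gdb index and "attnum 1" are zero-based.

```python
# Output list of the top Incremental Sort node (result slot), in order,
# paired with the column types from the CREATE TABLE statements.
RESULT_SLOT = [
    ("rm32044_t1.pkey", "integer"),
    ("rm32044_t1.val", "text"),
    ("rm32044_t2.pkey", "integer"),
    ("rm32044_t2.label", "text"),
    ("rm32044_t2.hidden", "boolean"),
    ("rm32044_t3.pkey", "integer"),
    ("rm32044_t3.val", "integer"),
    ("rm32044_t4.pkey", "integer"),
]

# Output list of the Sort over rm32044_t2, the inner side of the top merge join.
INNER_SLOT = [
    ("rm32044_t2.pkey", "integer"),
    ("rm32044_t2.label", "text"),
    ("rm32044_t2.hidden", "boolean"),
]

# text is a variable-length type, so its datum is a pointer to the value.
PASS_BY_REFERENCE = {"text"}


def describe_attribute(slot, index):
    """Return (column name, type, passed by reference?) for a zero-based index."""
    name, type_name = slot[index]
    return name, type_name, type_name in PASS_BY_REFERENCE


print(describe_attribute(RESULT_SLOT, 3))  # index from "p i" in gdb
print(describe_attribute(INNER_SLOT, 1))   # "attnum 1" of the inner slot
```

Under those assumptions both lookups resolve to rm32044_t2.label, a text column, so a datum of 0 there would be a NULL pointer rather than a valid value.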


Server crash on RHEL 9/s390x platform against PG16

2023-09-12 Thread Suraj Kharage
Hi,

I found a server crash on the RHEL 9/s390x platform with the test case below.

*Machine details:*
[edb@9428da9d2137 postgres]$ cat /etc/redhat-release
AlmaLinux release 9.2 (Turquoise Kodkod)
[edb@9428da9d2137 postgres]$ lscpu
Architecture:     s390x
  CPU op-mode(s): 32-bit, 64-bit
  Address sizes:  39 bits physical, 48 bits virtual
  Byte Order:     Big Endian
*Configure command:*
./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd --with-llvm
--with-perl --with-python --with-tcl --with-openssl --enable-nls
--with-libxml --with-libxslt --with-systemd --with-libcurl --without-icu
--enable-debug --enable-cassert --with-pgport=5414


*Test case:*
CREATE TABLE rm32044_t1
(
pkey   integer,
val  text
);
CREATE TABLE rm32044_t2
(
pkey   integer,
label  text,
hidden boolean
);
CREATE TABLE rm32044_t3
(
pkey integer,
val integer
);
CREATE TABLE rm32044_t4
(
pkey integer
);
insert into rm32044_t1 values ( 1 , 'row1');
insert into rm32044_t1 values ( 2 , 'row2');
insert into rm32044_t2 values ( 1 , 'hidden', true);
insert into rm32044_t2 values ( 2 , 'visible', false);
insert into rm32044_t3 values (1 , 1);
insert into rm32044_t3 values (2 , 1);

postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey
= rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey =
rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.

*backtrace:*
[edb@9428da9d2137 postgres]$ gdb bin/postgres
data/qemu_postgres_20230911-140628_65620.core
Core was generated by `postgres: edb postgres [local] SELECT  '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x010a8366 in heap_compute_data_size
(tupleDesc=tupleDesc@entry=0x1ba3d10,
values=values@entry=0x1ba4168, isnull=isnull@entry=0x1ba41a8) at
heaptuple.c:227
227 VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
[Current thread is 1 (LWP 65597)]
Missing separate debuginfos, use: dnf debuginfo-install
glibc-2.34-60.el9.s390x libcap-2.48-8.el9.s390x
libedit-3.1-37.20210216cvs.el9.s390x libffi-3.4.2-7.el9.s390x
libgcc-11.3.1-4.3.el9.alma.s390x libgcrypt-1.10.0-10.el9_2.s390x
libgpg-error-1.42-5.el9.s390x libstdc++-11.3.1-4.3.el9.alma.s390x
libxml2-2.9.13-3.el9_2.1.s390x libzstd-1.5.1-2.el9.s390x
llvm-libs-15.0.7-1.el9.s390x lz4-libs-1.9.3-5.el9.s390x
ncurses-libs-6.2-8.20210508.el9.s390x openssl-libs-3.0.7-17.el9_2.s390x
systemd-libs-252-14.el9_2.3.s390x xz-libs-5.2.5-8.el9_0.s390x
(gdb) bt
#0  0x010a8366 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x1ba3d10,
    values=values@entry=0x1ba4168, isnull=isnull@entry=0x1ba41a8)
    at heaptuple.c:227
#1  0x010a9bb0 in heap_form_minimal_tuple (tupleDescriptor=0x1ba3d10,
    values=0x1ba4168, isnull=0x1ba41a8) at heaptuple.c:1484
#2  0x016553fa in ExecCopySlotMinimalTuple (slot=)
    at ../../../../src/include/executor/tuptable.h:472
#3  tuplesort_puttupleslot (state=state@entry=0x1be4d18,
    slot=slot@entry=0x1ba4120) at tuplesortvariants.c:610
#4  0x012dc0e0 in ExecIncrementalSort (pstate=0x1acb4d8)
    at nodeIncrementalSort.c:716
#5  0x012b32c6 in ExecProcNode (node=0x1acb4d8)
    at ../../../src/include/executor/executor.h:273
#6  ExecutePlan (execute_once=, dest=0x1ade698, direction=, numberTuples=0,
    sendTuples=, operation=CMD_SELECT, use_parallel_mode=,
    planstate=0x1acb4d8, estate=0x1acb258) at execMain.c:1670
#7  standard_ExecutorRun (queryDesc=0x19ad338, direction=, count=0,
    execute_once=) at execMain.c:365
#8  0x014a6ae2 in PortalRunSelect (portal=portal@entry=0x1a63558,
    forward=forward@entry=true, count=0, count@entry=9223372036854775807,
    dest=dest@entry=0x1ade698) at pquery.c:924
#9  0x014a84e0 in PortalRun (portal=portal@entry=0x1a63558,
    count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
    run_once=run_once@entry=true, dest=dest@entry=0x1ade698,
    altdest=0x1ade698, qc=0x40007ff7b0) at pquery.c:768
#10 0x014a3c1c in exec_simple_query (
    query_string=0x19ea0e8 "SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;")
    at postgres.c:1274
#11 0x014a57aa in PostgresMain (dbname=, username=) at postgres.c:4637
#12 0x013fdaf6 in BackendRun (port=0x1a132c0, port=0x1a132c0)
    at postmaster.c:4464
#13 BackendStartup (port=0x1a132c0) at postmaster.c:4192
#14 ServerLoop () at postmaster.c:1782
#15 0x013fec34 in PostmasterMain (argc=argc@entry=3,
    argv=argv@entry=0x19a59a0) at postmaster.c:1466
#16 0x01096faa in main (argc=, argv=0x19a59a0) at main.c:198

(gdb) p val
$1 = 0

Does anybody have any idea about this?

-- 
--

Thanks & Regards,