Re: Segmentation Fault PG 14

2022-11-08 Thread Tom Lane
Willian Colognesi  writes:
> It looks like we can confirm that disabling JIT fixed the problem: since
> yesterday, when I disabled JIT, the database has not restarted again,
> whereas before it was restarting at least once per hour.

Hmm.  I now recall that we had a previous report of problems with
JIT on aarch64/Focal:

https://www.postgresql.org/message-id/flat/20220303150428.GA26036%40depesz.com

That was LLVM 9 not LLVM 10, but since we never identified the exact
issue, there's no real strong reason to suppose it's been fixed.

Probably keeping JIT off is the best answer for you --- it's hard to
say when we'll be able to make progress with this, given the lack of
reproducible test cases.

regards, tom lane
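
For readers who hit the same symptom, the workaround of keeping JIT off can be applied without restarting the server. The following is only a sketch: the function name `disable_jit` is made up for illustration, and it assumes `psql` can connect as a superuser with the default connection settings.

```shell
# Hypothetical helper (not from the thread): turn JIT off cluster-wide,
# reload the configuration, and show the resulting setting.
# Assumes psql connects as a superuser using the default environment.
disable_jit() {
    # Each -c is sent as its own statement, which ALTER SYSTEM needs
    # (it cannot run inside a transaction block).
    psql -X \
        -c "ALTER SYSTEM SET jit = off" \
        -c "SELECT pg_reload_conf()" \
        -c "SHOW jit"
}
```

This should have the same effect as putting `jit = off` in postgresql.conf and reloading. The session that issues the reload may not see the new value immediately, so running `SHOW jit` in a fresh session is the more reliable check.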




Re: Segmentation Fault PG 14

2022-11-08 Thread Willian Colognesi
It looks like we can confirm that disabling JIT fixed the problem: since
yesterday, when I disabled JIT, the database has not restarted again,
whereas before it was restarting at least once per hour.

I don't think having it disabled will have much impact on our use case, so
if you need anything else that could help the analysis to find the bug,
feel free to let me know and I can grab the logs or whatever is needed.

Thanks y'all

On Mon, Nov 7, 2022 at 8:05 PM Tom Lane  wrote:

> Willian Colognesi  writes:
> > There is no llvm installed on ubuntu server, postgresql was installed via
> > apt package `apt install postgresql-14`
>
> If there's no LLVM around, then disabling JIT wouldn't do anything,
> because it depends on LLVM to compile code.
>
> We should perhaps wait awhile to see if that really fixed it.
>
> regards, tom lane
>


-- 


*Willian Cezar de O. Colognesi*
Systems Analysis Specialist, Trimble Transportation Brazil
Avenida Santos Dumont, 271 | Londrina, PR | 86039-090


Re: Segmentation Fault PG 14

2022-11-08 Thread Willian Colognesi
You are right Thomas,

Just confirmed and it's installed:

ubuntu@ip-10-x-x-x:~$ apt search llvm | grep inst
WARNING: apt does not have a stable CLI interface. Use with caution in
scripts.
libllvm10/focal,now 1:10.0.0-4ubuntu1 arm64 [installed,automatic]

I had tried something like `llvm -version` without success, but I have now
verified through apt that it is installed.

Tom,
Since yesterday the database hasn't restarted, so I believe there is some
problem related to JIT.

On Tue, Nov 8, 2022 at 4:11 AM Thomas Munro  wrote:

> On Tue, Nov 8, 2022 at 11:45 AM Willian Colognesi
>  wrote:
> > root@ip-10-x-x-x:/home/ubuntu# pg_config --configure
> > ... --with-extra-version= (Ubuntu 14.5-2.pgdg20.04+2)' ...
> > ... '--with-llvm' 'LLVM_CONFIG=/usr/bin/llvm-config-10' ...
>
> > There is no llvm installed on ubuntu server, postgresql was installed
> via apt package `apt install postgresql-14`
>
> We can see from the pg_config output that it's built with LLVM 10.
> Also that looks like it's the usual pgdg packages which are certainly
> built against LLVM and will install it automatically.
>




Re: Segmentation Fault PG 14

2022-11-07 Thread Thomas Munro
On Tue, Nov 8, 2022 at 11:45 AM Willian Colognesi
 wrote:
> root@ip-10-x-x-x:/home/ubuntu# pg_config --configure
> ... --with-extra-version= (Ubuntu 14.5-2.pgdg20.04+2)' ...
> ... '--with-llvm' 'LLVM_CONFIG=/usr/bin/llvm-config-10' ...

> There is no llvm installed on ubuntu server, postgresql was installed via apt 
> package `apt install postgresql-14`

We can see from the pg_config output that it's built with LLVM 10.
Also that looks like it's the usual pgdg packages which are certainly
built against LLVM and will install it automatically.
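
There is no `llvm` command to ask for a version here, because the server only needs the LLVM shared library. A rough sketch of how the LLVM-related build options can be picked out of the long `pg_config --configure` output follows. The helper name `llvm_build_flags` is made up for illustration.

```shell
# Hypothetical helper (not from the thread): reads `pg_config --configure`
# output on stdin and prints only the LLVM-related options, one per line.
llvm_build_flags() {
    # Options are single-quoted and may contain spaces, so split on the
    # quote character rather than on whitespace.
    tr "'" '\n' | grep -i 'llvm' || true
}

# Usage (assumes pg_config is on PATH):
#   pg_config --configure | llvm_build_flags
```

On a Debian or Ubuntu system, `dpkg -l 'libllvm*'` lists the installed LLVM runtime packages, which is a less noisy check than `apt search`.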




Re: Segmentation Fault PG 14

2022-11-07 Thread Jeffrey Walton
On Mon, Nov 7, 2022 at 2:38 PM Tom Lane  wrote:
>
> Willian Colognesi  writes:
> > `I take it things were okay with the version you used previously?`
>
> > Yes, it was working pretty well in another instance with pg version
> > `12.4-1.pgdg18.04+1`, and we had to make a migration of one database that
> > was running in this server to another using Logical Replication.
>
> 12.4 to 14.5 is kind of a big jump :-(.
>
> The stack trace seems to indicate that ExecProcNode transferred control
> to never-never land, which says that something clobbered the function
> pointer it's trying to indirect through.  I don't recall having seen
> any similar reports though.

I'm just thinking out loud... I've seen the latest GCC do that on what
it believes to be dead code. Our problem was detailed at
https://github.com/weidai11/cryptopp/issues/1141 .

We identified the problem by building/running our self tests with
-fsanitize=unreachable .

Testing with -fsanitize=unreachable should confirm or rule out GCC and
Clang [incorrectly] removing code that is actually needed. If this is
the problem, then -fsanitize=unreachable will also provide a usable
stack trace and provide a useful debugging experience.

Jeff




Re: Segmentation Fault PG 14

2022-11-07 Thread Tom Lane
Willian Colognesi  writes:
> There is no llvm installed on ubuntu server, postgresql was installed via
> apt package `apt install postgresql-14`

If there's no LLVM around, then disabling JIT wouldn't do anything,
because it depends on LLVM to compile code.

We should perhaps wait awhile to see if that really fixed it.

regards, tom lane




Re: Segmentation Fault PG 14

2022-11-07 Thread Willian Colognesi
Do you mean how it was compiled? This is the output of pg_config:
```
root@ip-10-x-x-x:/home/ubuntu# pg_config --configure
 '--build=aarch64-linux-gnu' '--prefix=/usr'
'--includedir=${prefix}/include' '--mandir=${prefix}/share/man'
'--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var'
'--disable-silent-rules' '--libdir=${prefix}/lib/aarch64-linux-gnu'
'--runstatedir=/run' '--disable-maintainer-mode'
'--disable-dependency-tracking' '--with-tcl' '--with-perl' '--with-python'
'--with-pam' '--with-openssl' '--with-libxml' '--with-libxslt'
'--mandir=/usr/share/postgresql/14/man'
'--docdir=/usr/share/doc/postgresql-doc-14'
'--sysconfdir=/etc/postgresql-common' '--datarootdir=/usr/share/'
'--datadir=/usr/share/postgresql/14' '--bindir=/usr/lib/postgresql/14/bin'
'--libdir=/usr/lib/aarch64-linux-gnu/' '--libexecdir=/usr/lib/postgresql/'
'--includedir=/usr/include/postgresql/' '--with-extra-version= (Ubuntu
14.5-2.pgdg20.04+2)' '--enable-nls' '--enable-thread-safety'
'--enable-debug' '--enable-dtrace' '--disable-rpath' '--with-uuid=e2fs'
'--with-gnu-ld' '--with-gssapi' '--with-ldap' '--with-pgport=5432'
'--with-system-tzdata=/usr/share/zoneinfo' 'AWK=mawk' 'MKDIR_P=/bin/mkdir
-p' 'PROVE=/usr/bin/prove' 'PYTHON=/usr/bin/python3' 'TAR=/bin/tar'
'XSLTPROC=xsltproc --nonet' 'CFLAGS=-g -O2 -fstack-protector-strong
-Wformat -Werror=format-security' 'LDFLAGS=-Wl,-Bsymbolic-functions
-Wl,-z,relro -Wl,-z,now' '--enable-tap-tests' '--with-icu' '--with-llvm'
'LLVM_CONFIG=/usr/bin/llvm-config-10' 'CLANG=/usr/bin/clang-10'
'--with-lz4' '--with-systemd' '--with-selinux'
'build_alias=aarch64-linux-gnu' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
'CXXFLAGS=-g -O2 -fstack-protector-strong -Wformat -Werror=format-security'
```

There is no llvm installed on ubuntu server, postgresql was installed via
apt package `apt install postgresql-14`

On Mon, Nov 7, 2022 at 6:09 PM Tom Lane  wrote:

> Willian Colognesi  writes:
> > No, the database is running well, no problem until now after disabled
> *jit.*
>
> Interesting.  Which version of LLVM is installed?
>
> regards, tom lane
>




Re: Segmentation Fault PG 14

2022-11-07 Thread Tom Lane
Willian Colognesi  writes:
> No, the database is running well, no problem until now after disabled *jit.*

Interesting.  Which version of LLVM is installed?

regards, tom lane




Re: Segmentation Fault PG 14

2022-11-07 Thread Willian Colognesi
No, the database is running well; there have been no problems so far since
I disabled *jit*.

I just realized that Boris sent his email directly to me. The message was:
```
I had similar problems with and the cure was to turn off jit in
Postgres.conf

jit = off
--
Boris
```



On Mon, Nov 7, 2022 at 5:25 PM Adrian Klaver 
wrote:

> On 11/7/22 12:15, Willian Colognesi wrote:
> > All the extensions installed in this database are these:
> > ```
> >                                      List of installed extensions
> >         Name        | Version |   Schema   |                       Description
> > --------------------+---------+------------+-----------------------------------------------------------
> >  amcheck            | 1.3     | public     | functions for verifying relation integrity
> >  btree_gist         | 1.6     | public     | support for indexing common datatypes in GiST
> >  pg_stat_statements | 1.9     | public     | track execution statistics of all SQL statements executed
> >  pgcrypto           | 1.3     | public     | cryptographic functions
> >  plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
> > (5 rows)
> > ```
> >
> > I tried to execute a query with parameters the query was supposed to be
> > run (because I'm not sure exactly the values in the where clause that
> > made the segmentation fault).
> >
> > here is the explain: https://explain.depesz.com/s/Tql3
> >  (Ps: I just had to suppress the
> real
> > table/index names)
> >
> > Looks like since I've disable *jit* as Boris told, until now the
> > database did not restarted again... (not sure if it's coincidence)
> >
>
> I did not see that post or suggestion.
>
> What was the suggestion?
>
> Are you saying the database does not start up now?
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>



Re: Segmentation Fault PG 14

2022-11-07 Thread Adrian Klaver

On 11/7/22 12:15, Willian Colognesi wrote:

All the extensions installed in this database are these:
```
                                     List of installed extensions
        Name        | Version |   Schema   |                       Description
--------------------+---------+------------+-----------------------------------------------------------
 amcheck            | 1.3     | public     | functions for verifying relation integrity
 btree_gist         | 1.6     | public     | support for indexing common datatypes in GiST
 pg_stat_statements | 1.9     | public     | track execution statistics of all SQL statements executed
 pgcrypto           | 1.3     | public     | cryptographic functions
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
(5 rows)
```

I tried to execute the query with parameters like the ones it was supposed
to run with (I'm not sure exactly which values in the WHERE clause caused
the segmentation fault).

Here is the EXPLAIN: https://explain.depesz.com/s/Tql3
(P.S.: I just had to suppress the real table/index names.)

It looks like since I disabled *jit*, as Boris suggested, the database has
not restarted again... (not sure if it's a coincidence)




I did not see that post or suggestion.

What was the suggestion?

Are you saying the database does not start up now?

--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Segmentation Fault PG 14

2022-11-07 Thread Willian Colognesi
All the extensions installed in this database are these:
```
                                     List of installed extensions
        Name        | Version |   Schema   |                       Description
--------------------+---------+------------+-----------------------------------------------------------
 amcheck            | 1.3     | public     | functions for verifying relation integrity
 btree_gist         | 1.6     | public     | support for indexing common datatypes in GiST
 pg_stat_statements | 1.9     | public     | track execution statistics of all SQL statements executed
 pgcrypto           | 1.3     | public     | cryptographic functions
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
(5 rows)
```

I tried to execute the query with parameters like the ones it was supposed
to run with (I'm not sure exactly which values in the WHERE clause caused
the segmentation fault).

Here is the EXPLAIN: https://explain.depesz.com/s/Tql3 (P.S.: I just had to
suppress the real table/index names.)

It looks like since I disabled *jit*, as Boris suggested, the database has
not restarted again... (not sure if it's a coincidence)


On Mon, Nov 7, 2022 at 4:38 PM Tom Lane  wrote:

> Willian Colognesi  writes:
> > `I take it things were okay with the version you used previously?`
>
> > Yes, it was working pretty well in another instance with pg version
> > `12.4-1.pgdg18.04+1`, and we had to make a migration of one database that
> > was running in this server to another using Logical Replication.
>
> 12.4 to 14.5 is kind of a big jump :-(.
>
> The stack trace seems to indicate that ExecProcNode transferred control
> to never-never land, which says that something clobbered the function
> pointer it's trying to indirect through.  I don't recall having seen
> any similar reports though.
>
> Are you using any extensions besides those that come with core Postgres?
> A build incompatibility with some third-party extension might explain
> this, perhaps.
>
> One thing I'm curious about is that the stack trace seems to imply that
> there was an Append plan node immediately below another Append.  That
> shouldn't happen AFAIK --- the planner tries to collapse out such
> cases.  Can you get us an EXPLAIN for the problem query?
>
> regards, tom lane
>




Re: Segmentation Fault PG 14

2022-11-07 Thread Tom Lane
Willian Colognesi  writes:
> `I take it things were okay with the version you used previously?`

> Yes, it was working pretty well in another instance with pg version
> `12.4-1.pgdg18.04+1`, and we had to make a migration of one database that
> was running in this server to another using Logical Replication.

12.4 to 14.5 is kind of a big jump :-(.

The stack trace seems to indicate that ExecProcNode transferred control
to never-never land, which says that something clobbered the function
pointer it's trying to indirect through.  I don't recall having seen
any similar reports though.

Are you using any extensions besides those that come with core Postgres?
A build incompatibility with some third-party extension might explain
this, perhaps.

One thing I'm curious about is that the stack trace seems to imply that
there was an Append plan node immediately below another Append.  That
shouldn't happen AFAIK --- the planner tries to collapse out such
cases.  Can you get us an EXPLAIN for the problem query?

regards, tom lane




Re: Segmentation Fault PG 14

2022-11-07 Thread Adrian Klaver

On 11/7/22 11:03 AM, Willian Colognesi wrote:
No, the origin where the database was was running ubuntu 18.04.5 x86_64 
and the destination ubuntu 20.04.5 aarch64


Where I was going was this:

https://wiki.postgresql.org/wiki/Locale_data_changes

Then I realized you had not done any binary upgrades, so that is a dead end.


--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Segmentation Fault PG 14

2022-11-07 Thread Willian Colognesi
No, the origin server was running Ubuntu 18.04.5 x86_64 and the destination
is running Ubuntu 20.04.5 aarch64.

On Mon, Nov 7, 2022 at 4:00 PM Adrian Klaver 
wrote:

> On 11/7/22 10:57 AM, Willian Colognesi wrote:
> > 1) What versions of pg_dump and pg_restore did you use?
> > A: pg_dump and pg_restore was done using pg 14 (the same as the
> > destination was running)
> >
> > 2) To be clear the subscription was started after the restore?
> > A: Yes
> >
> > 3) Were there any error messages issued at any point in below?
> > A: no errors during the dump and restore.
> >
> > 4) Are the database clusters on the same machine?
> > A: No, the origin and destination were different servers at the same VPC.
>
> Are servers using the same version of OS?
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>




Re: Segmentation Fault PG 14

2022-11-07 Thread Adrian Klaver

On 11/7/22 10:57 AM, Willian Colognesi wrote:

1) What versions of pg_dump and pg_restore did you use?
A: pg_dump and pg_restore was done using pg 14 (the same as the 
destination was running)


2) To be clear the subscription was started after the restore?
A: Yes

3) Were there any error messages issued at any point in below?
A: no errors during the dump and restore.

4) Are the database clusters on the same machine?
A: No, the origin and destination were different servers at the same VPC.


Are servers using the same version of OS?


--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Segmentation Fault PG 14

2022-11-07 Thread Willian Colognesi
1) What versions of pg_dump and pg_restore did you use?
A: pg_dump and pg_restore were both run with pg 14 (the same version the
destination was running).

2) To be clear the subscription was started after the restore?
A: Yes

3) Were there any error messages issued at any point in below?
A: No errors during the dump and restore.

4) Are the database clusters on the same machine?
A: No, the origin and destination were different servers in the same VPC.

On Mon, Nov 7, 2022 at 3:49 PM Adrian Klaver 
wrote:

> On 11/7/22 10:36 AM, Willian Colognesi wrote:
> > Hi Tom,
> >
> > `I take it things were okay with the version you used previously?`
> > Yes, it was working pretty well in another instance with pg version
> > `12.4-1.pgdg18.04+1`, and we had to make a migration of one database
> > that was running in this server to another using Logical Replication.
>
> Actually you used dump/restore and logical replication. '
>
> In below:
>
> 1) What versions of pg_dump and pg_restore did you use?
>
> 2) To be clear the subscription was started after the restore?
>
> 3) Were there any error messages issued at any point in below?
>
> 4) Are the database clusters on the same machine?
>
> >
> > the process was basically this:
> > CREATE PUBLICATION my_database_pub FOR ALL TABLES;
> > postgres@origin:~$ psql "dbname= replication=database"
> > my_database=# CREATE_REPLICATION_SLOT  LOGICAL pgoutput;
> > pg_dump -j4 -h  -p 5432 --no-subscriptions --no-publications -d
> >  --snapshot= -Fd -U  -f
> >
> > postgres@destination:/mnt/database$ pg_restore -d  -j 5
> >
> > CREATE SUBSCRIPTION
> > CONNECTION 'host= dbname= user=replica
> > password=?? port=5432'
> > PUBLICATION
> > WITH (slot_name=, create_slot=false, copy_data=false);
> >
> >
> > After this migration we started to have this kind of problem in both
> > replica and primary servers.
> >
> > `This looks pretty messed up.  Are you sure the debug symbols you're
> using`
> > What exactly do you mean? I'm not too familiar with this debug toolings,
> > the packages I've used were:
> >
> > postgresql-14/focal-pgdg,now 14.5-2.pgdg20.04+2 arm64 [installed]
> > postgresql-14-dbgsym/focal-pgdg,now 14.5-2.pgdg20.04+2 arm64 [installed]
> >
> > `Even better, can you construct a self-contained test case?`:
> > Actually I couldn't reproduce the problem because it's happening just in
> > a production database, and it doesn't look to have a pattern in the
> > cases when it happens.
> >
> > Is there anything I could provide you to help the analysis ?
> >
> >
> >
> > On Mon, Nov 7, 2022 at 3:08 PM Tom Lane wrote:
> >
> > Willian Colognesi writes:
> >  > I started to use version `14.5-2.pgdg20.04+2` for a dedicated database and
> >  > I'm facing many segmentation faults during the day when the database has
> >  > more heavy queries.
> >
> > I take it things were okay with the version you used previously?
> > What was that exactly?  Has anything else changed?
> >
> >  > I could also get a little information from gdb, I'm not sure if it will
> >  > help:
> >
> > This looks pretty messed up.  Are you sure the debug symbols you're using
> > match the package?
> >
> > Even better, can you construct a self-contained test case?
> >
> >  regards, tom lane
> >
> >
> >
> > --
> > 
> > *Willian Cezar de O. Colognesi
> > *
> > Systems Analysis Specialist, Trimble Transportation Brazil
> > Avenida Santos Dumont, 271 | Londrina, PR | 86039-090
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>




Re: Segmentation Fault PG 14

2022-11-07 Thread Adrian Klaver

On 11/7/22 10:36 AM, Willian Colognesi wrote:

Hi Tom,

`I take it things were okay with the version you used previously?`
Yes, it was working pretty well in another instance with pg version 
`12.4-1.pgdg18.04+1`, and we had to make a migration of one database 
that was running in this server to another using Logical Replication.


Actually you used dump/restore and logical replication.

In below:

1) What versions of pg_dump and pg_restore did you use?

2) To be clear the subscription was started after the restore?

3) Were there any error messages issued at any point in below?

4) Are the database clusters on the same machine?



the process was basically this:
CREATE PUBLICATION my_database_pub FOR ALL TABLES;
postgres@origin:~$ psql "dbname= replication=database"
my_database=# CREATE_REPLICATION_SLOT  LOGICAL pgoutput;
pg_dump -j4 -h  -p 5432 --no-subscriptions --no-publications -d
 --snapshot= -Fd -U  -f

postgres@destination:/mnt/database$ pg_restore -d  -j 5

CREATE SUBSCRIPTION
        CONNECTION 'host= dbname= user=replica password=?? port=5432'
        PUBLICATION
        WITH (slot_name=, create_slot=false, copy_data=false);


After this migration we started to have this kind of problem in both 
replica and primary servers.


`This looks pretty messed up.  Are you sure the debug symbols you're using`
What exactly do you mean? I'm not too familiar with this debug toolings, 
the packages I've used were:


postgresql-14/focal-pgdg,now 14.5-2.pgdg20.04+2 arm64 [installed]
postgresql-14-dbgsym/focal-pgdg,now 14.5-2.pgdg20.04+2 arm64 [installed]

`Even better, can you construct a self-contained test case?`:
Actually I couldn't reproduce the problem because it's happening just in 
a production database, and it doesn't look to have a pattern in the 
cases when it happens.


Is there anything I could provide you to help the analysis ?



On Mon, Nov 7, 2022 at 3:08 PM Tom Lane wrote:

Willian Colognesi writes:
 > I started to use version `14.5-2.pgdg20.04+2` for a dedicated database and
 > I'm facing many segmentation faults during the day when the database has
 > more heavy queries.

I take it things were okay with the version you used previously?
What was that exactly?  Has anything else changed?

 > I could also get a little information from gdb, I'm not sure if it will
 > help:

This looks pretty messed up.  Are you sure the debug symbols you're using
match the package?

Even better, can you construct a self-contained test case?

                         regards, tom lane







--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Segmentation Fault PG 14

2022-11-07 Thread Willian Colognesi
Hi Tom,

`I take it things were okay with the version you used previously?`
Yes, it was working pretty well in another instance with pg version
`12.4-1.pgdg18.04+1`, and we had to make a migration of one database that
was running in this server to another using Logical Replication.

the process was basically this:
CREATE PUBLICATION my_database_pub FOR ALL TABLES;
postgres@origin:~$ psql "dbname= replication=database"
my_database=# CREATE_REPLICATION_SLOT  LOGICAL pgoutput;
pg_dump -j4 -h  -p 5432 --no-subscriptions --no-publications -d
 --snapshot= -Fd -U  -f

postgres@destination:/mnt/database$ pg_restore -d  -j 5


CREATE SUBSCRIPTION 
   CONNECTION 'host= dbname= user=replica
password=?? port=5432'
   PUBLICATION 
   WITH (slot_name=, create_slot=false, copy_data=false);

After this migration we started to have this kind of problem in both
replica and primary servers.

`This looks pretty messed up.  Are you sure the debug symbols you're using`
What exactly do you mean? I'm not too familiar with these debugging tools;
the packages I used were:

postgresql-14/focal-pgdg,now 14.5-2.pgdg20.04+2 arm64 [installed]
postgresql-14-dbgsym/focal-pgdg,now 14.5-2.pgdg20.04+2 arm64 [installed]

`Even better, can you construct a self-contained test case?`:
Actually I couldn't reproduce the problem, because it happens only in a
production database, and there doesn't seem to be a pattern to when it
happens.

Is there anything I could provide to help the analysis?



On Mon, Nov 7, 2022 at 3:08 PM Tom Lane  wrote:

> Willian Colognesi  writes:
> > I started to use version `14.5-2.pgdg20.04+2` for a dedicated database
> and
> > I'm facing many segmentation faults during the day when the database has
> > more heavy queries.
>
> I take it things were okay with the version you used previously?
> What was that exactly?  Has anything else changed?
>
> > I could also get a little information from gdb, I'm not sure if it will
> > help:
>
> This looks pretty messed up.  Are you sure the debug symbols you're using
> match the package?
>
> Even better, can you construct a self-contained test case?
>
> regards, tom lane
>




Re: Segmentation Fault PG 14

2022-11-07 Thread Tom Lane
Willian Colognesi  writes:
> I started to use version `14.5-2.pgdg20.04+2` for a dedicated database and
> I'm facing many segmentation faults during the day when the database has
> more heavy queries.

I take it things were okay with the version you used previously?
What was that exactly?  Has anything else changed?

> I could also get a little information from gdb, I'm not sure if it will
> help:

This looks pretty messed up.  Are you sure the debug symbols you're using
match the package?

Even better, can you construct a self-contained test case?

regards, tom lane
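
One quick way to answer the "do the debug symbols match the package" question on a Debian or Ubuntu system is to compare the installed versions of the server package and its `-dbgsym` package. This is a sketch only, and the helper name `versions_match` is made up for illustration.

```shell
# Hypothetical helper (not from the thread): succeeds only if every package
# listed on stdin reports the same version.
# stdin: "package<TAB>version" lines, the default output of `dpkg-query -W`.
versions_match() {
    awk '
        NF >= 2 { seen[$2] = 1 }      # remember each distinct version string
        END {
            n = 0
            for (v in seen) n++
            exit (n == 1 ? 0 : 1)     # exactly one distinct version means a match
        }
    '
}

# Usage (assumes dpkg-query is available):
#   dpkg-query -W postgresql-14 postgresql-14-dbgsym | versions_match \
#       && echo "debug symbols match" || echo "version mismatch"
```

Matching versions only rule out a stale symbol package. Frames can still show up as `??` if the crash happens in code that has no symbols at all.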




Segmentation Fault PG 14

2022-11-07 Thread Willian Colognesi
Hello!

I started to use version `14.5-2.pgdg20.04+2` for a dedicated database and
I'm facing many segmentation faults during the day when the database is
running heavier queries.

In the server log there are many entries like this:
```
2022-11-07 17:23:19.423 UTC [728] LOG:  background worker "parallel worker"
(PID 9558) was terminated by signal 11: Segmentation fault
2022-11-07 17:23:19.423 UTC [728] DETAIL:  Failed process was running:
select blablabla from heavyquery where ...;
2022-11-07 17:23:19.423 UTC [728] LOG:  terminating any other active server
processes
2022-11-07 17:23:19.681 UTC [9588] microservice@microservice FATAL:  the
database system is in recovery mode
2022-11-07 17:23:19.683 UTC [9589] microservice@microservice FATAL:  the
database system is in recovery mode
2022-11-07 17:23:24.543 UTC [728] LOG:  all server processes terminated;
reinitializing
2022-11-07 17:23:24.894 UTC [9622] LOG:  database system was interrupted;
last known up at 2022-11-07 17:22:07 UTC
2022-11-07 17:23:25.636 UTC [9622] LOG:  invalid record length at
134/227A3A68: wanted 24, got 0
2022-11-07 17:23:25.636 UTC [9622] LOG:  redo done at 134/227A3A38 system
usage: CPU: user: 0.04 s, system: 0.06 s, elapsed: 0.70 s
2022-11-07 17:23:27.608 UTC [728] LOG:  database system is ready to accept
connections
2022-11-07 17:23:33.474 UTC [9635] replica@[unknown] LOG:  could not
receive data from client: Connection reset by peer
2022-11-07 17:23:33.474 UTC [9635] replica@[unknown] STATEMENT:
 START_REPLICATION 134/2200 TIMELINE 1
2022-11-07 17:23:33.474 UTC [9635] replica@[unknown] LOG:  unexpected EOF
on standby connection
2022-11-07 17:23:33.474 UTC [9635] replica@[unknown] STATEMENT:
 START_REPLICATION 134/2200 TIMELINE 1
2022-11-07 17:23:51.310 UTC [9662] replica@[unknown] LOG:  could not
receive data from client: Connection reset by peer
2022-11-07 17:23:51.310 UTC [9662] replica@[unknown] STATEMENT:
 START_REPLICATION 134/2200 TIMELINE 1
2022-11-07 17:23:51.310 UTC [9662] replica@[unknown] LOG:  unexpected EOF
on standby connection
2022-11-07 17:23:51.310 UTC [9662] replica@[unknown] STATEMENT:
 START_REPLICATION 134/2200 TIMELINE 1
INFO: 2022/11/07 17:23:51.445710 FILE PATH: 000101340022.lz4
2022-11-07 17:24:09.206 UTC [9672] replica@[unknown] LOG:  could not
receive data from client: Connection reset by peer
2022-11-07 17:24:09.206 UTC [9672] replica@[unknown] STATEMENT:
 START_REPLICATION 134/2300 TIMELINE 1
2022-11-07 17:24:09.206 UTC [9672] replica@[unknown] LOG:  unexpected EOF
on standby connection
2022-11-07 17:24:09.206 UTC [9672] replica@[unknown] STATEMENT:
 START_REPLICATION 134/2300 TIMELINE 1
INFO: 2022/11/07 17:24:27.527897 FILE PATH: 000101340023.lz4
INFO: 2022/11/07 17:24:38.076058 FILE PATH: 000101340024.lz4
```
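
To put a number on "many", the crash lines can be counted per hour straight from the server log. The sketch below assumes each log entry is on a single line and starts with a `YYYY-MM-DD HH:MM:SS` timestamp, as in the excerpt above. The function name `segfaults_per_hour` and the log path in the usage comment are assumptions.

```shell
# Hypothetical helper (not from the thread): reads a PostgreSQL log on stdin
# and prints "DATE HOUR:00 COUNT" for every hour with signal-11 terminations.
segfaults_per_hour() {
    awk '
        /was terminated by signal 11/ {
            # $1 is the date, $2 the time; keep only the hour of the time.
            n[$1 " " substr($2, 1, 2) ":00"]++
        }
        END { for (h in n) print h, n[h] }
    ' | sort
}

# Usage (the path is the usual Debian/Ubuntu location, adjust as needed):
#   segfaults_per_hour < /var/log/postgresql/postgresql-14-main.log
```

The same count taken before and after a configuration change gives a simple way to judge whether the change made a difference.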

The server is running Ubuntu 20.04 on aarch64 (ARM architecture).

I could also get a little information from gdb, I'm not sure if it will
help:
```
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/postgresql/14/bin/postgres...
Reading symbols from
/usr/lib/debug/.build-id/d7/87a0cf1bb645b349f7c137e36cc30f7ba8805f.debug...
[New LWP 9559]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: 14/main: parallel worker for PID 9528
  '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00010c757c9c in ?? ()
(gdb) bt
#0  0x00010c757c9c in ?? ()
#1  0x0c757124 in ?? ()
#2  0xc2ac9970 in ExecProcNode (node=0xfc599818) at
./build/../src/include/executor/executor.h:257
#3  ExecAppend (pstate=0xfc595918) at
./build/../src/backend/executor/nodeAppend.c:360
#4  0xc2ac9970 in ExecProcNode (node=0xfc595918) at
./build/../src/include/executor/executor.h:257
#5  ExecAppend (pstate=0xfc526988) at
./build/../src/backend/executor/nodeAppend.c:360
#6  0x0001 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
```

Has anyone already faced this problem or may know a solution?

Thanks in advance.
