Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

2017-07-18 K S, Sandhya (Nokia - IN/Bangalore)
Hi Craig,

While testing another scenario, continuous Postgres server restarts, we got 
many sh-QUIT cores, and along with them some rm-QUIT cores. The rm-QUIT cores 
point to the rm of the archive file, but we were not able to get a backtrace 
because the stack is corrupted.

We got the following from gdb:
Core was generated by `rm ./Archive_00020118'.

We were also able to get this process listing:
4518 12490  0.0  0.0  11484  1356 ?        Ss   10:59   0:00 postgres: archiver process   archiving 00020118.0028.backup
4518 12704  2.0  0.0   7672  2932 ?        S    11:00   0:00  \_ sh -c rm ./Archive_*; touch ./Archive_"00020118.0028.backup"; exit 0
4518 12707  0.0  0.0    344     4 ?        S    11:00   0:00      \_ rm ./Archive_00020118

In the Postgres configuration file, we have this setting:
archive_command = 'rm ./Archive_*; touch ./Archive_"%f"; exit 0'

So the core was generated while this archive command was executing.
You pointed out earlier that the issue might be happening during the archive 
command, and all the evidence for this crash points to this same command.
Do you have any suggestions for recovering from this situation, or for ways to 
debug the issue?

Regards,
Sandhya

From: K S, Sandhya (Nokia - IN/Bangalore)
Sent: Wednesday, July 12, 2017 4:51 PM
To: 'Craig Ringer' <cr...@2ndquadrant.com>
Cc: pgsql-bugs <pgsql-b...@postgresql.org>; PostgreSQL Hackers 
<pgsql-hackers@postgresql.org>; T, Rasna (Nokia - IN/Bangalore) 
<rasn...@nokia.com>; Itnal, Prakash (Nokia - IN/Bangalore) 
<prakash.it...@nokia.com>
Subject: RE: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

Hi Craig,

Here is the backtrace after installing all the missing debuginfo packages.

(gdb) bt
#0  0x00fff7682f18 in do_lookup_x (undef_name=undef_name@entry=0xfff75cece5 
"_Jv_RegisterClasses", new_hash=new_hash@entry=2681263574,
old_hash=old_hash@entry=0xa159b8, ref=0xfff75ceac8, 
result=result@entry=0xa159a0, scope=, i=1, 
version=version@entry=0x0,
flags=flags@entry=1, skip=skip@entry=0x0, type_class=type_class@entry=0, 
undef_map=undef_map@entry=0xfff76a9478) at dl-lookup.c:444
#1  0x00fff76839a0 in _dl_lookup_symbol_x (undef_name=0xfff75cece5 
"_Jv_RegisterClasses", undef_map=0xfff76a9478, ref=0xa15a90,
symbol_scope=0xfff76a9980, version=0x0, type_class=, 
flags=, skip_map=0x0) at dl-lookup.c:833
#2  0x00fff7685730 in elf_machine_got_rel (lazy=1, map=0xfff76a9478) at 
../sysdeps/mips/dl-machine.h:870
#3  elf_machine_runtime_setup (profile=, lazy=1, l=0xfff76a9478) 
at ../sysdeps/mips/dl-machine.h:916
#4  _dl_relocate_object (scope=0xfff76a9980, reloc_mode=, 
consider_profiling=0) at dl-reloc.c:259
#5  0x00fff767ba10 in dl_main (phdr=, 
phdr@entry=0x12040, phnum=, phnum@entry=8,
user_entry=user_entry@entry=0xa15cf0, auxv=) at 
rtld.c:2070
#6  0x00fff7692e3c in _dl_sysdep_start (start_argptr=, 
dl_main=0xfff7679a98 ) at ../elf/dl-sysdep.c:249
#7  0x00fff767d0d8 in _dl_start_final (arg=arg@entry=0xa16410, 
info=info@entry=0xa15d80) at rtld.c:307
#8  0x00fff767d3d8 in _dl_start (arg=0xa16410) at rtld.c:415
#9  0x00fff7679380 in __start () from /lib64/ld.so.1

Please see whether this helps in analysing the issue.

Regards,
Sandhya

From: Craig Ringer [mailto:cr...@2ndquadrant.com]
Sent: Friday, July 07, 2017 1:53 PM
To: K S, Sandhya (Nokia - IN/Bangalore) <sandhya@nokia.com>
Cc: pgsql-bugs <pgsql-b...@postgresql.org>; PostgreSQL Hackers 
<pgsql-hackers@postgresql.org>; T, Rasna (Nokia - IN/Bangalore) 
<rasn...@nokia.com>; Itnal, Prakash (Nokia - IN/Bangalore) 
<prakash.it...@nokia.com>
Subject: Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

On 7 July 2017 at 15:41, K S, Sandhya (Nokia - IN/Bangalore) 
<sandhya@nokia.com> wrote:
Hi Craig,

The scenario is locking and unlocking the system 30 times. During this 
scenario, 5 sh-QUIT cores were generated, and gdb on the 5 cores points to 
different locations.
I have attached the output for 2 such instances.


You seem to be missing debug symbols. Install appropriate debuginfo packages.


--
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

2017-07-12 K S, Sandhya (Nokia - IN/Bangalore)
Hi Craig,

Here is the backtrace after installing all the missing debuginfo packages.

(gdb) bt
#0  0x00fff7682f18 in do_lookup_x (undef_name=undef_name@entry=0xfff75cece5 
"_Jv_RegisterClasses", new_hash=new_hash@entry=2681263574,
old_hash=old_hash@entry=0xa159b8, ref=0xfff75ceac8, 
result=result@entry=0xa159a0, scope=, i=1, 
version=version@entry=0x0,
flags=flags@entry=1, skip=skip@entry=0x0, type_class=type_class@entry=0, 
undef_map=undef_map@entry=0xfff76a9478) at dl-lookup.c:444
#1  0x00fff76839a0 in _dl_lookup_symbol_x (undef_name=0xfff75cece5 
"_Jv_RegisterClasses", undef_map=0xfff76a9478, ref=0xa15a90,
symbol_scope=0xfff76a9980, version=0x0, type_class=, 
flags=, skip_map=0x0) at dl-lookup.c:833
#2  0x00fff7685730 in elf_machine_got_rel (lazy=1, map=0xfff76a9478) at 
../sysdeps/mips/dl-machine.h:870
#3  elf_machine_runtime_setup (profile=, lazy=1, l=0xfff76a9478) 
at ../sysdeps/mips/dl-machine.h:916
#4  _dl_relocate_object (scope=0xfff76a9980, reloc_mode=, 
consider_profiling=0) at dl-reloc.c:259
#5  0x00fff767ba10 in dl_main (phdr=, 
phdr@entry=0x12040, phnum=, phnum@entry=8,
user_entry=user_entry@entry=0xa15cf0, auxv=) at 
rtld.c:2070
#6  0x00fff7692e3c in _dl_sysdep_start (start_argptr=, 
dl_main=0xfff7679a98 ) at ../elf/dl-sysdep.c:249
#7  0x00fff767d0d8 in _dl_start_final (arg=arg@entry=0xa16410, 
info=info@entry=0xa15d80) at rtld.c:307
#8  0x00fff767d3d8 in _dl_start (arg=0xa16410) at rtld.c:415
#9  0x00fff7679380 in __start () from /lib64/ld.so.1

Please see whether this helps in analysing the issue.

Regards,
Sandhya

From: Craig Ringer [mailto:cr...@2ndquadrant.com]
Sent: Friday, July 07, 2017 1:53 PM
To: K S, Sandhya (Nokia - IN/Bangalore) <sandhya@nokia.com>
Cc: pgsql-bugs <pgsql-b...@postgresql.org>; PostgreSQL Hackers 
<pgsql-hackers@postgresql.org>; T, Rasna (Nokia - IN/Bangalore) 
<rasn...@nokia.com>; Itnal, Prakash (Nokia - IN/Bangalore) 
<prakash.it...@nokia.com>
Subject: Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

On 7 July 2017 at 15:41, K S, Sandhya (Nokia - IN/Bangalore) 
<sandhya@nokia.com> wrote:
Hi Craig,

The scenario is locking and unlocking the system 30 times. During this 
scenario, 5 sh-QUIT cores were generated, and gdb on the 5 cores points to 
different locations.
I have attached the output for 2 such instances.


You seem to be missing debug symbols. Install appropriate debuginfo packages.


--
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

2017-07-07 K S, Sandhya (Nokia - IN/Bangalore)
Hi Craig,

The scenario is locking and unlocking the system 30 times. During this 
scenario, 5 sh-QUIT cores were generated, and gdb on the 5 cores points to 
different locations.
I have attached the output for 2 such instances.

Regards,
Sandhya

From: Craig Ringer [mailto:cr...@2ndquadrant.com]
Sent: Friday, July 07, 2017 12:55 PM
To: K S, Sandhya (Nokia - IN/Bangalore) <sandhya@nokia.com>
Cc: pgsql-bugs <pgsql-b...@postgresql.org>; PostgreSQL Hackers 
<pgsql-hackers@postgresql.org>; T, Rasna (Nokia - IN/Bangalore) 
<rasn...@nokia.com>; Itnal, Prakash (Nokia - IN/Bangalore) 
<prakash.it...@nokia.com>
Subject: Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

On 7 July 2017 at 15:10, K S, Sandhya (Nokia - IN/Bangalore) 
<sandhya@nokia.com> wrote:
Hi Craig,

You were right about the restore_command.

This all makes sense then.

PostgreSQL sends SIGQUIT for immediate shutdown to its children. So the 
restore_command would get signalled too.

I can't immediately explain the exit code, and SIGQUIT should _not_ generate a 
core file. Can you show the result of attaching 'gdb' to the core file and 
running 'bt full'?

--
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
gdb output for the first core file:
# gdb /bin/bash CFPU-1-7919-595e59a9-sh-QUIT.core
GNU gdb (Wind River Linux G++ 4.4a-470) 7.2.50.20100908-cvs
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "mips64-wrs-linux-gnu".
For bug reporting instructions, please see:
<supp...@windriver.com>...
Reading symbols from /bin/bash...(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 7919]
Reading symbols from /lib64/libreadline.so.5...(no debugging symbols 
found)...done.
Loaded symbols for /lib64/libreadline.so.5
Reading symbols from /lib64/libhistory.so.5...(no debugging symbols 
found)...done.
Loaded symbols for /lib64/libhistory.so.5
Reading symbols from /lib64/libncurses.so.5...(no debugging symbols 
found)...done.
Loaded symbols for /lib64/libncurses.so.5
Reading symbols from /lib64/libdl.so.2...Reading symbols from 
/mnt/sysimg/usr/lib/debug/lib64/libdl-2.11.1.so.debug...(no debugging symbols 
found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libc.so.6...Reading symbols from 
/mnt/sysimg/usr/lib/debug/lib64/libc-2.11.1.so.debug...done.
done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libtinfo.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib64/libtinfo.so.5
Reading symbols from /lib64/ld.so.1...Reading symbols from 
/mnt/sysimg/usr/lib/debug/lib64/ld-2.11.1.so.debug...(no debugging symbols 
found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/ld.so.1
Core was generated by `sh -c exit 1'.
Program terminated with signal 3, Quit.
#0  0x005558246a80 in _dl_lookup_symbol_x () from /lib64/ld.so.1
(gdb) bt full
#0  0x005558246a80 in _dl_lookup_symbol_x () from /lib64/ld.so.1
No symbol table info available.
#1  0x00555824816c in _dl_relocate_object () from /lib64/ld.so.1
No symbol table info available.
#2  0x00555823fb6c in dl_main () from /lib64/ld.so.1
No symbol table info available.
#3  0x005558254214 in _dl_sysdep_start () from /lib64/ld.so.1
No symbol table info available.
#4  0x00555823d1b0 in _dl_start_final () from /lib64/ld.so.1
No symbol table info available.
#5  0x00555823d3f0 in _dl_start () from /lib64/ld.so.1
No symbol table info available.
#6  0x00555823cc10 in __start () from /lib64/ld.so.1
No symbol table info available.
Backtrace stopped: frame did not save the PC

gdb output for the second core file:
# gdb /bin/bash CFPU-1-15638-595e5efb-sh-QUIT.core   
GNU gdb (Wind River Linux G++ 4.4a-470) 7.2.50.20100908-cvs
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "mips64-wrs-linux-gnu".
For bug reporting instructions, please see:
<supp...@windriver.com>...
Reading symbols from /bin/bash...(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 15638]
Reading symbols from /lib64/libreadline.so.5...(no debugging symbols 
found)...done.
Loaded symbols for /lib64/libreadline.so.5
Rea

Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

2017-07-07 Craig Ringer
On 7 July 2017 at 15:41, K S, Sandhya (Nokia - IN/Bangalore) <
sandhya@nokia.com> wrote:

> Hi Craig,
>
>
>
> The scenario is locking and unlocking the system 30 times. During this
> scenario, 5 sh-QUIT cores were generated, and gdb on the 5 cores points to
> different locations.
>
> I have attached the output for 2 such instances.
>
>

You seem to be missing debug symbols. Install appropriate debuginfo
packages.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

2017-07-07 Craig Ringer
On 7 July 2017 at 15:10, K S, Sandhya (Nokia - IN/Bangalore) <
sandhya@nokia.com> wrote:

> Hi Craig,
>
>
>
> You were right about the restore_command.
>

This all makes sense then.

PostgreSQL sends SIGQUIT for immediate shutdown to its children. So the
restore_command would get signalled too.

I can't immediately explain the exit code, and SIGQUIT should _not_ generate
a core file. Can you show the result of attaching 'gdb' to the core file
and running 'bt full'?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

2017-07-05 Craig Ringer
On 3 Jul. 2017 23:01, "K S, Sandhya (Nokia - IN/Bangalore)" <
sandhya@nokia.com> wrote:

Hi Craig,

Thanks for the response.

The scenario tried here is restarting the system multiple times. The sh-QUIT
core is generated when Postgres invokes the shell to exit, and may not be due
to kernel or file system issues. I will try to reproduce the issue while
capturing dmesg output.

However, is there any case in which Postgres invokes 'sh -c exit 1'?


Most likely it's used directly or indirectly by an archive_command or
restore_command you have configured.


Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

2017-07-03 K S, Sandhya (Nokia - IN/Bangalore)
Hi Craig,

Thanks for the response.

The scenario tried here is restarting the system multiple times. The sh-QUIT 
core is generated when Postgres invokes the shell to exit, and may not be due 
to kernel or file system issues. I will try to reproduce the issue while 
capturing dmesg output.

However, is there any case in which Postgres invokes 'sh -c exit 1'?

Regards,
Sandhya

-----Original Message-----
From: Craig Ringer [mailto:cr...@2ndquadrant.com] 
Sent: Friday, June 30, 2017 5:40 PM
To: K S, Sandhya (Nokia - IN/Bangalore) <sandhya@nokia.com>
Cc: pgsql-hackers@postgresql.org; pgsql-b...@postgresql.org; T, Rasna (Nokia - 
IN/Bangalore) <rasn...@nokia.com>; Itnal, Prakash (Nokia - IN/Bangalore) 
<prakash.it...@nokia.com>
Subject: Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

On 30 June 2017 at 17:41, K S, Sandhya (Nokia - IN/Bangalore)
<sandhya@nokia.com> wrote:

> When we checked the process listing at the time of core generation, we
> found that the Postgres startup process was invoking “sh -c exit 1”:
> 4518  9249  0.1  0.0 155964  2036 ?        Ss   15:05   0:00 postgres: startup process   waiting for 000102EB

Looks like an archive_command or restore_command.

If 'sh' is dumping core, you probably have issues at a low level in
the kernel, file system, etc. Check dmesg.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

2017-06-30 Craig Ringer
On 30 June 2017 at 17:41, K S, Sandhya (Nokia - IN/Bangalore)
 wrote:

> When we checked the process listing at the time of core generation, we
> found that the Postgres startup process was invoking “sh -c exit 1”:
> 4518  9249  0.1  0.0 155964  2036 ?        Ss   15:05   0:00 postgres: startup process   waiting for 000102EB

Looks like an archive_command or restore_command.

If 'sh' is dumping core, you probably have issues at a low level in
the kernel, file system, etc. Check dmesg.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] Postgres process invoking exit resulting in sh-QUIT core

2017-06-30 K S, Sandhya (Nokia - IN/Bangalore)
Hi,

We are using Postgres version 9.3.14 on a Linux-based OS, and we are randomly 
observing sh-QUIT core files when restarting the system (roughly once in 30 
restarts).
The backtrace is as follows:

Loaded symbols for /lib64/ld.so.1
Core was generated by `sh -c exit 1'.
Program terminated with signal 3, Quit.
#0  0x005559ed78f0 in do_lookup_x () from /lib64/ld.so.1
(gdb) bt
#0  0x005559ed78f0 in do_lookup_x () from /lib64/ld.so.1
#1  0x005559ed7b88 in _dl_lookup_symbol_x () from /lib64/ld.so.1
#2  0x005559ed916c in _dl_relocate_object () from /lib64/ld.so.1
#3  0x005559ed0b6c in dl_main () from /lib64/ld.so.1
#4  0x005559ee5214 in _dl_sysdep_start () from /lib64/ld.so.1
#5  0x005559ece1b0 in _dl_start_final () from /lib64/ld.so.1
#6  0x005559ece3f0 in _dl_start () from /lib64/ld.so.1
#7  0x005559ecdc10 in __start () from /lib64/ld.so.1
Backtrace stopped: frame did not save the PC

When we checked the process listing at the time of core generation, we 
found that the Postgres startup process was invoking "sh -c exit 1":
4518  9249  0.1  0.0 155964  2036 ?        Ss   15:05   0:00 postgres: startup process   waiting for 000102EB
4518 10288  0.0  0.0   3600   508 ?        S    15:11   0:00  \_ sh -c exit 1

We tried disabling the DB and running the same test case, and no core was 
generated.
We are also using immediate shutdown mode, which uses SIGQUIT.

Can you please help us debug the issue?

Regards,
Sandhya