Re: [HACKERS] Possible problem with shm_mq spin lock

2014-10-27 Thread Robert Haas
On Sat, Oct 25, 2014 at 9:12 PM, Tom Lane  wrote:
> Haribabu Kommi  writes:
>> Thanks for the details. I am sorry, it is not proc_exit. It is the exit
>> callback functions that can cause the problem.
>
>> The following is the call stack where the problem can happen if the signal
>> handler is called after the worker has taken the spin lock.
>
>> Breakpoint 1, 0x0072dd83 in shm_mq_detach ()
>> (gdb) bt
>> #0  0x0072dd83 in shm_mq_detach ()
>> #1  0x0072e7db in shm_mq_detach_callback ()
>> #2  0x00726d71 in dsm_detach ()
>> #3  0x00726c43 in dsm_backend_shutdown ()
>> #4  0x00727450 in shmem_exit ()
>> #5  0x007272fc in proc_exit_prepare ()
>> #6  0x00727501 in atexit_callback ()
>> #7  0x0030ff435da2 in exit () from /lib64/libc.so.6
>> #8  0x006ddaec in bgworker_quickdie ()
>
> Or in other words, Robert broke it.  This control path should absolutely
> not occur: the entire point of the on_exit_reset call in quickdie() is to
> prevent any callbacks from being executed when we get to shmem_exit().
> DSM-related functions DO NOT get an exemption.

All true.  However, Robert also fixed it, in commit
cb9a0c7987466b130fbced01ab5d5481cf3a16df, when you complained about it
previously.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible problem with shm_mq spin lock

2014-10-25 Thread Haribabu Kommi
On Sun, Oct 26, 2014 at 12:12 PM, Tom Lane  wrote:
> Haribabu Kommi  writes:
>> Thanks for the details. I am sorry, it is not proc_exit. It is the exit
>> callback functions that can cause the problem.
>
>> The following is the call stack where the problem can happen if the signal
>> handler is called after the worker has taken the spin lock.
>
>> Breakpoint 1, 0x0072dd83 in shm_mq_detach ()
>> (gdb) bt
>> #0  0x0072dd83 in shm_mq_detach ()
>> #1  0x0072e7db in shm_mq_detach_callback ()
>> #2  0x00726d71 in dsm_detach ()
>> #3  0x00726c43 in dsm_backend_shutdown ()
>> #4  0x00727450 in shmem_exit ()
>> #5  0x007272fc in proc_exit_prepare ()
>> #6  0x00727501 in atexit_callback ()
>> #7  0x0030ff435da2 in exit () from /lib64/libc.so.6
>> #8  0x006ddaec in bgworker_quickdie ()
>
> Or in other words, Robert broke it.  This control path should absolutely
> not occur: the entire point of the on_exit_reset call in quickdie() is to
> prevent any callbacks from being executed when we get to shmem_exit().
> DSM-related functions DO NOT get an exemption.

The reset_on_dsm_detach() function is called to remove the DSM-related
callbacks.
It's my mistake, and I am really sorry: the code I was using was the wrong
version. Sorry for the noise.
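Reading the backtrace quoted above, the shm_mq detach callback is reached through dsm_backend_shutdown() and dsm_detach(), i.e. it hangs off the attached segment rather than the ordinary exit-callback list, so it has to be dropped separately; that is the job this message attributes to reset_on_dsm_detach(). The following is only a simplified model under that reading: every sketch_* name, the segment struct, the fixed-size arrays, and the demo helpers are hypothetical and do not reflect PostgreSQL's actual dsm.c data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SEGMENTS 4
#define MAX_DETACH_CALLBACKS 4

typedef void (*dsm_detach_cb)(void *arg);

/* Hypothetical, simplified model of an attached shared memory segment
 * and the callbacks to run when it is detached. */
typedef struct
{
    bool          attached;
    int           ncallbacks;
    dsm_detach_cb fn[MAX_DETACH_CALLBACKS];
    void         *arg[MAX_DETACH_CALLBACKS];
} segment;

static segment segments[MAX_SEGMENTS];

/* Register a detach callback on one segment; false if its list is full. */
static bool sketch_on_dsm_detach(segment *seg, dsm_detach_cb fn, void *arg)
{
    if (seg->ncallbacks >= MAX_DETACH_CALLBACKS)
        return false;
    seg->fn[seg->ncallbacks] = fn;
    seg->arg[seg->ncallbacks] = arg;
    seg->ncallbacks++;
    return true;
}

/* Detach a segment, running its callbacks newest-first. */
static void sketch_dsm_detach(segment *seg)
{
    while (seg->ncallbacks > 0)
    {
        seg->ncallbacks--;
        seg->fn[seg->ncallbacks](seg->arg[seg->ncallbacks]);
    }
    seg->attached = false;
}

/* Analogue of dsm_backend_shutdown(): detach every attached segment. */
static void sketch_dsm_backend_shutdown(void)
{
    for (int i = 0; i < MAX_SEGMENTS; i++)
        if (segments[i].attached)
            sketch_dsm_detach(&segments[i]);
}

/* Analogue of reset_on_dsm_detach(): forget the callbacks without running
 * them, so a later shutdown detaches silently. */
static void sketch_reset_on_dsm_detach(void)
{
    for (int i = 0; i < MAX_SEGMENTS; i++)
        segments[i].ncallbacks = 0;
}

/* Demo callback: increments the int counter passed as arg. */
static void count_detach(void *arg)
{
    (*(int *) arg)++;
}

/* Attaches one segment with one callback, optionally resets, then shuts
 * down; returns how many detach callbacks ran. */
static int demo_detach(bool reset_first)
{
    int calls = 0;

    segments[0].attached = true;
    sketch_on_dsm_detach(&segments[0], count_detach, &calls);
    if (reset_first)
        sketch_reset_on_dsm_detach();
    sketch_dsm_backend_shutdown();
    return calls;
}
```

In this model demo_detach(false) runs the one registered callback, while demo_detach(true) runs none, which is the behavior the quickdie() path needs.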

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] Possible problem with shm_mq spin lock

2014-10-25 Thread Tom Lane
Haribabu Kommi  writes:
> Thanks for the details. I am sorry, it is not proc_exit. It is the exit
> callback functions that can cause the problem.

> The following is the call stack where the problem can happen if the signal
> handler is called after the worker has taken the spin lock.

> Breakpoint 1, 0x0072dd83 in shm_mq_detach ()
> (gdb) bt
> #0  0x0072dd83 in shm_mq_detach ()
> #1  0x0072e7db in shm_mq_detach_callback ()
> #2  0x00726d71 in dsm_detach ()
> #3  0x00726c43 in dsm_backend_shutdown ()
> #4  0x00727450 in shmem_exit ()
> #5  0x007272fc in proc_exit_prepare ()
> #6  0x00727501 in atexit_callback ()
> #7  0x0030ff435da2 in exit () from /lib64/libc.so.6
> #8  0x006ddaec in bgworker_quickdie ()

Or in other words, Robert broke it.  This control path should absolutely
not occur: the entire point of the on_exit_reset call in quickdie() is to
prevent any callbacks from being executed when we get to shmem_exit().
DSM-related functions DO NOT get an exemption.

regards, tom lane




Re: [HACKERS] Possible problem with shm_mq spin lock

2014-10-25 Thread Haribabu Kommi
On Sun, Oct 26, 2014 at 10:17 AM, Andres Freund  wrote:
> Hi,
>
> On 2014-10-26 08:52:42 +1100, Haribabu Kommi wrote:
>> I am thinking of a possible problem with the shm_mq structure's spin lock,
>> which is used to protect the shm_mq structure.
>>
>> If the process receives a SIGQUIT signal while it is executing any code
>> under the spin lock, it ends up in a deadlock:
>>
>> SIGQUIT->proc_exit->shm_mq_detach->try to acquire spin lock. The spin
>> lock is already held by the process.
>>
>> The problem is very difficult to reproduce, because the code under
>> the lock is very minimal.
>> Please let me know if I missed anything.
>
> I think you missed the following bit in postgres.c:
>
> /*
>  * quickdie() occurs when signalled SIGQUIT by the postmaster.
>  *
>  * Some backend has bought the farm,
>  * so we need to stop what we're doing and exit.
>  */
> void
> quickdie(SIGNAL_ARGS)
> {
> ...
> /*
>  * We DO NOT want to run proc_exit() callbacks -- we're here because
>  * shared memory may be corrupted, so we don't want to try to clean up our
>  * transaction.  Just nail the windows shut and get out of town.  Now that
>  * there's an atexit callback to prevent third-party code from breaking
>  * things by calling exit() directly, we have to reset the callbacks
>  * explicitly to make this work as intended.
>  */
> on_exit_reset();

Thanks for the details. I am sorry, it is not proc_exit. It is the exit
callback functions that can cause the problem.

The following is the call stack where the problem can happen if the signal
handler is called after the worker has taken the spin lock.

Breakpoint 1, 0x0072dd83 in shm_mq_detach ()
(gdb) bt
#0  0x0072dd83 in shm_mq_detach ()
#1  0x0072e7db in shm_mq_detach_callback ()
#2  0x00726d71 in dsm_detach ()
#3  0x00726c43 in dsm_backend_shutdown ()
#4  0x00727450 in shmem_exit ()
#5  0x007272fc in proc_exit_prepare ()
#6  0x00727501 in atexit_callback ()
#7  0x0030ff435da2 in exit () from /lib64/libc.so.6
#8  0x006ddaec in bgworker_quickdie ()
#9  <signal handler called>
#10 0x0072ce9a in shm_mq_sendv ()


Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] Possible problem with shm_mq spin lock

2014-10-25 Thread Andres Freund
Hi,

On 2014-10-26 08:52:42 +1100, Haribabu Kommi wrote:
> I am thinking of a possible problem with the shm_mq structure's spin lock,
> which is used to protect the shm_mq structure.
>
> If the process receives a SIGQUIT signal while it is executing any code
> under the spin lock, it ends up in a deadlock:
>
> SIGQUIT->proc_exit->shm_mq_detach->try to acquire spin lock. The spin
> lock is already held by the process.
>
> The problem is very difficult to reproduce, because the code under
> the lock is very minimal.
> Please let me know if I missed anything.

I think you missed the following bit in postgres.c:

/*
 * quickdie() occurs when signalled SIGQUIT by the postmaster.
 *
 * Some backend has bought the farm,
 * so we need to stop what we're doing and exit.
 */
void
quickdie(SIGNAL_ARGS)
{
...
/*
 * We DO NOT want to run proc_exit() callbacks -- we're here because
 * shared memory may be corrupted, so we don't want to try to clean up our
 * transaction.  Just nail the windows shut and get out of town.  Now that
 * there's an atexit callback to prevent third-party code from breaking
 * things by calling exit() directly, we have to reset the callbacks
 * explicitly to make this work as intended.
 */
on_exit_reset();
..
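The point of that comment can be modeled with a small callback registry: registered callbacks run on the exit path unless the registry has been emptied first. This is a minimal sketch assuming a fixed-size array and newest-first (LIFO) execution; sketch_on_shmem_exit(), sketch_on_exit_reset(), sketch_shmem_exit(), count_calls() and demo_exit() are hypothetical simplifications, not PostgreSQL's ipc.c API, and the exit path is called directly rather than through atexit() so that the sketch stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EXIT_CALLBACKS 20

typedef void (*exit_cb)(int code, void *arg);

/* Hypothetical, simplified registry of shmem-exit callbacks. */
static struct
{
    exit_cb fn;
    void   *arg;
} exit_list[MAX_EXIT_CALLBACKS];
static int exit_index = 0;

/* Register a callback; returns false if the registry is full. */
static bool sketch_on_shmem_exit(exit_cb fn, void *arg)
{
    if (exit_index >= MAX_EXIT_CALLBACKS)
        return false;
    exit_list[exit_index].fn = fn;
    exit_list[exit_index].arg = arg;
    exit_index++;
    return true;
}

/* Analogue of on_exit_reset(): forget every registered callback. */
static void sketch_on_exit_reset(void)
{
    exit_index = 0;
}

/* Analogue of the shmem_exit() step of the exit path: run the callbacks
 * newest-first (an assumption of this sketch), then empty the list. */
static void sketch_shmem_exit(int code)
{
    while (--exit_index >= 0)
        exit_list[exit_index].fn(code, exit_list[exit_index].arg);
    exit_index = 0;
}

/* Demo callback: increments the int counter passed as arg. */
static void count_calls(int code, void *arg)
{
    (void) code;
    (*(int *) arg)++;
}

/* Registers two callbacks, optionally resets the registry as quickdie()
 * does, then runs the exit path; returns how many callbacks ran. */
static int demo_exit(bool reset_first)
{
    int calls = 0;

    sketch_on_shmem_exit(count_calls, &calls);
    sketch_on_shmem_exit(count_calls, &calls);
    if (reset_first)
        sketch_on_exit_reset();
    sketch_shmem_exit(0);
    return calls;
}
```

Here demo_exit(false) runs both callbacks and demo_exit(true) runs none; the reset only helps if every callback list reachable from the exit path is emptied, not just the main one.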

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Possible problem with shm_mq spin lock

2014-10-25 Thread Haribabu Kommi
Hi Hackers,

I am thinking of a possible problem with the shm_mq structure's spin lock,
which is used to protect the shm_mq structure.

If the process receives a SIGQUIT signal while it is executing any code
under the spin lock, it ends up in a deadlock:

SIGQUIT->proc_exit->shm_mq_detach->try to acquire spin lock. The spin
lock is already held by the process.

The problem is very difficult to reproduce, because the code under
the lock is very minimal.
Please let me know if I missed anything.
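A spinlock is not reentrant, so if an exit callback tries to take a lock that the same process already holds, nothing can ever release it. The sketch below models the sequence described in this message with a C11 atomic_flag test-and-set lock, which is an assumption standing in for PostgreSQL's slock_t; mq_mutex, spin_try_acquire(), mq_detach_callback_sketch() and simulate_quickdie_during_send() are hypothetical names. To keep it terminating, the callback gives up after a bounded number of spins instead of looping forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the spinlock inside a shm_mq (not slock_t). */
static atomic_flag mq_mutex = ATOMIC_FLAG_INIT;

/* Bounded test-and-set acquire: returns true if the lock was obtained.
 * A real spinlock retries until it succeeds; the bound only keeps this
 * sketch from hanging. */
static bool spin_try_acquire(atomic_flag *lock, int max_spins)
{
    for (int i = 0; i < max_spins; i++)
    {
        if (!atomic_flag_test_and_set(lock))
            return true;    /* flag was clear: we now own the lock */
    }
    return false;           /* still held after max_spins attempts */
}

static void spin_release(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}

/* Hypothetical exit callback that, like shm_mq_detach(), needs the lock. */
static bool mq_detach_callback_sketch(void)
{
    if (!spin_try_acquire(&mq_mutex, 1000))
        return false;       /* a real spinlock would spin here forever */
    spin_release(&mq_mutex);
    return true;
}

/* Models the reported sequence: the sender holds the lock (the reported
 * backtrace shows shm_mq_sendv() as the interrupted frame), SIGQUIT
 * arrives, and the handler's exit path runs the detach callback in the
 * same process. Returns true if the callback could finish. */
static bool simulate_quickdie_during_send(bool run_exit_callbacks)
{
    bool ok = true;
    bool got = spin_try_acquire(&mq_mutex, 1);

    assert(got);            /* uncontended, so the first attempt succeeds */
    (void) got;

    /* ... SIGQUIT is delivered here, before the lock is released ... */
    if (run_exit_callbacks)
        ok = mq_detach_callback_sketch();   /* lock is held by ourselves */

    spin_release(&mq_mutex);
    return ok;
}
```

simulate_quickdie_during_send(true) returns false because the callback can never obtain the lock, while passing false, which models the callbacks having been reset as quickdie() intends, returns true.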

Regards,
Hari Babu
Fujitsu Australia

