[HACKERS] possible self-deadlock window after bad ProcessStartupPacket

2017-02-02 Thread Jimmy Yih
Hello,

There may possibly be a very small window for a double exit() self-deadlock
during a forked backend process's ProcessStartupPacket returns and status
is not STATUS_OK.  The process will go into proc_exit and then a very well
timed SIGQUIT will call startup_die for another proc_exit.  If timed
correctly, the two exit() calls can deadlock since exit() is not
re-entrant.  It seems extremely hard to hit the deadlock but I believe the
opportunity is there.

Using gdb, I was able to create the window and get this stacktrace:

#0  startup_die (postgres_signal_arg=0) at postmaster.c:5090
#1  
#2  proc_exit_prepare (code=0) at ipc.c:158
#3  0x007c4135 in proc_exit (code=0) at ipc.c:102
#4  0x0076b736 in BackendInitialize (port=0x2c13740) at
postmaster.c:4207
#5  0x0076b190 in BackendStartup (port=0x2c13740) at postmaster.c:3979
#6  0x00767ad3 in ServerLoop () at postmaster.c:1722
#7  0x007671df in PostmasterMain (argc=3, argv=0x2bebad0) at
postmaster.c:1330
#8  0x006b5df6 in main (argc=3, argv=0x2bebad0) at main.c:228

I got the stacktrace by doing the following:

   1. gdb attach to postmaster and run set follow-fork child and break
   postmaster.c:4206(right after ProcessStartupPacket) and continue
   2. In another terminal, open a psql session which should trigger the gdb
   follow
   3. In the gdb session, set status=1 and step into proc_exit()
   4. In another terminal, kill -s QUIT  to send SIGQUIT to the
   child process. Or run pg_ctl stop -M immediate.
   5. In the gdb session, step to process the signal into startup_die and
   run bt

This was discovered while hacking on Greenplum Database (currently based
off of Postgres 8.3) where we recently started encountering the
self-deadlock intermittently in our testing environment.

Here's the pull request discussion:
https://github.com/greenplum-db/gpdb/pull/1662

In that pull request, we fix the issue by checking for proc_exit_inprogress.
Is there a reason why startup_die should not check for proc_exit_inprogress?

In the above pull request, Heikki also mentions that a similar scenario can
happen during palloc() as well... which is similar to what we saw in
Greenplum a couple years back for a deadlock in a malloc() call where we
responded by changing exit() to _exit() in quickdie as a fix.  That could
possibly be applicable to latest Postgres as well.

Regards,
Jimmy


Re: [HACKERS] pg_isready features

2016-06-15 Thread Jimmy
yup, it does for (1)

:-)





On Wed, Jun 15, 2016 at 9:53 AM Imre Samu <pella.s...@gmail.com> wrote:

> >Why I need it? There is tiny window between postgres being ready to
> accept connections
> >and final scripts which create initial user/database.
> >Ideally having option to say "postgres is ready after specific user can
> login to specific database" would be ideal.
>
> temporary - the official docker-postgres solution is not OK?
> see :
> https://github.com/docker-library/postgres/blob/master/9.5/docker-entrypoint.sh#L50
>
> *# internal start of server in order to allow set-up using psql-client *
> *# does not listen on external TCP/IP and waits until start finishes*
> *gosu postgres pg_ctl -D "$PGDATA" -o "-c listen_addresses='localhost'" *
>
> *.*
>
>
> more info: https://github.com/docker-library/postgres/issues/146
>
>
> 2016-06-15 18:09 GMT+02:00 Jimmy <jimmyj...@gmail.com>:
>
>> Not sure if this was discussed in the past and decided it does not belong
>> to pg_isready utility
>>
>> I am using pg_isready in dockerized development environment as part of
>> docker-compose. Simply say I have POSTGRES container and APP container. I
>> use pg_isready inside app container and wait till postgres is ready
>> to accept connections before app starts.
>>
>> There are two features which would make this a bit smoother and better.
>>
>>
>> *1. enforce login*
>> This could be optional and turned off by default. Basically if user
>> supplies username/database as part of pg_isready call and the login fails
>> (for whatever reason), pg_isready should fail.
>>
>> Why I need it? There is tiny window between postgres being ready to
>> accept connections and final scripts which create initial user/database.
>> Ideally having option to say "postgres is ready after specific user can
>> login to specific database" would be ideal. Again, this can be optional and
>> turned off by default.
>>
>>
>> *2. retry*
>> This is something I can do via unix script, but ideally it would be nice
>> if there would be retry mechanism. For example if I say retry=30 then
>> pg_isready would try 30x with x seconds pause in between and fail only
>> after 30 retries. This could use default retry=0 (current behavior)
>>
>>
>> thanks a lot!
>>
>>
>>
>


Re: [HACKERS] pg_isready features

2016-06-15 Thread Jimmy
(1) I can (and do) use psql, pg_isready seems lighter and since it already
supports credentials and DB name, I see very little harm for pg_isready to
say back whether user logged IN or not using these credentials.


(2) timeout is not the same - timeout does not retry, its a simple timeout
in case connection hangs, try it






On Wed, Jun 15, 2016 at 9:30 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Wed, Jun 15, 2016 at 12:09 PM, Jimmy <jimmyj...@gmail.com> wrote:
>
>> Not sure if this was discussed in the past and decided it does not belong
>> to pg_isready utility
>>
>> I am using pg_isready in dockerized development environment as part of
>> docker-compose. Simply say I have POSTGRES container and APP container. I
>> use pg_isready inside app container and wait till postgres is ready
>> to accept connections before app starts.
>>
>> There are two features which would make this a bit smoother and better.
>>
>>
>> *1. enforce login*
>> This could be optional and turned off by default. Basically if user
>> supplies username/database as part of pg_isready call and the login fails
>> (for whatever reason), pg_isready should fail.
>>
>> Why I need it? There is tiny window between postgres being ready to
>> accept connections and final scripts which create initial user/database.
>> Ideally having option to say "postgres is ready after specific user can
>> login to specific database" would be ideal. Again, this can be optional and
>> turned off by default.
>>
>>
> ​It shouldn't have to enforce login if there is a way for the server to
> communicate ready or not ready for login without requiring credentials to
> actually be supplied.  I guess it would be more effort and invasive.  Are
> you trying to avoid psql here?​
>
>
>>
>> *2. retry*
>> This is something I can do via unix script, but ideally it would be nice
>> if there would be retry mechanism. For example if I say retry=30 then
>> pg_isready would try 30x with x seconds pause in between and fail only
>> after 30 retries. This could use default retry=0 (current behavior)
>>
>>
> And the value of this instead of setting a timeout 30x higher is?
> ​
>
>


[HACKERS] pg_isready features

2016-06-15 Thread Jimmy
Not sure if this was discussed in the past and decided it does not belong
to pg_isready utility

I am using pg_isready in dockerized development environment as part of
docker-compose. Simply say I have POSTGRES container and APP container. I
use pg_isready inside app container and wait till postgres is ready
to accept connections before app starts.

There are two features which would make this a bit smoother and better.


*1. enforce login*
This could be optional and turned off by default. Basically if user
supplies username/database as part of pg_isready call and the login fails
(for whatever reason), pg_isready should fail.

Why I need it? There is tiny window between postgres being ready to accept
connections and final scripts which create initial user/database. Ideally
having option to say "postgres is ready after specific user can login to
specific database" would be ideal. Again, this can be optional and turned
off by default.


*2. retry*
This is something I can do via unix script, but ideally it would be nice if
there would be retry mechanism. For example if I say retry=30 then
pg_isready would try 30x with x seconds pause in between and fail only
after 30 retries. This could use default retry=0 (current behavior)


thanks a lot!