Re: [HACKERS] [BUGS] replication_timeout not effective

2013-04-10 Thread Amit Kapila
On Wednesday, April 10, 2013 1:49 PM, Dang Minh Huong wrote:
 Hi,

 Thank you for your quick reply.

 I'm trying to set the network-timeout-related parameters to terminate
 it.

 # I've tried setting the tcp_keepalives_* parameters in postgresql.conf,
 # but without success.

I have also tried those, but they didn't work; that's why I proposed
this feature for 9.3.

Please send mail to the community list; others may also be able to help
if they have any ideas for avoiding such problems.
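As a rough illustration of why the tcp_keepalives_* settings may not help here, the following is a minimal Python sketch of how long TCP keepalive takes to give up on a silent peer, given PostgreSQL's tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count values. The helper name keepalive_detection_seconds is hypothetical, and the sketch assumes Linux-style keepalive behaviour. As far as we understand, Linux only sends keepalive probes on a connection with no unacknowledged data outstanding; a sender blocked in send(2) has data queued, so its fate is decided by TCP retransmission timeouts rather than by these settings.

```python
def keepalive_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate seconds before TCP keepalive drops a silent peer.

    Arguments mirror PostgreSQL's tcp_keepalives_idle, tcp_keepalives_interval
    (both in seconds) and tcp_keepalives_count (number of probes).
    Assumption: this only applies to an *idle* connection. With unacknowledged
    data queued (e.g. a sender blocked in send(2)), Linux does not send
    keepalive probes and retransmission timeouts apply instead.
    """
    if idle <= 0 or interval <= 0 or count <= 0:
        # In PostgreSQL, 0 means "use the operating system default".
        raise ValueError("pass explicit positive values")
    # First probe after `idle` seconds of silence, then `count` unanswered
    # probes spaced `interval` seconds apart before the connection is reset.
    return idle + interval * count


# Example: idle=60, interval=10, count=5
print(keepalive_detection_seconds(60, 10, 5))
```

With the usual Linux defaults (7200 s idle, 75 s interval, 9 probes) the same formula gives 7875 seconds, over two hours.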

 On 2013/04/10 14:05, Amit Kapila amit.kap...@huawei.com wrote:

  On Wednesday, April 10, 2013 9:35 AM Dang Minh Huong wrote:
  Hi,
 
  I'm wondering if this is a bug in PostgreSQL.
 
  PostgreSQL's documentation says that the replication_timeout parameter
  terminates replication connections that are inactive longer than the
  specified number of milliseconds. But in my environment the sender
  process hangs (for several tens of minutes) if I turn off (power off)
  the standby PC while pg_basebackup is executing.
 
  Is this correct?
 
  From my debugging, the sender process is terminated when it receives
  SIGPIPE, but that comes too late (about 30 minutes after the standby PC
  went down).
 
  For such scenarios, a new parameter, wal_sender_timeout, has been
  introduced in 9.3. Refer to:
  http://www.postgresql.org/docs/devel/static/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SENDER
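The timeout semantics documented for replication_timeout (9.1) and wal_sender_timeout (9.3) amount to a simple inactivity check. Below is a minimal Python sketch of that check, assuming millisecond timestamps and that a value of 0 disables the mechanism, as the documentation describes; should_terminate is a hypothetical name, not PostgreSQL's internal code. Any such check only helps if the process actually gets to evaluate it; a process stuck inside a blocking send(2) does not.

```python
def should_terminate(last_activity_ms: int, now_ms: int, timeout_ms: int) -> bool:
    """Return True if a replication connection has been inactive too long.

    Hypothetical helper modelled on the documented semantics of
    replication_timeout / wal_sender_timeout: terminate connections that
    are inactive longer than `timeout_ms`; a value of 0 disables the check.
    """
    if timeout_ms <= 0:
        return False  # timeout mechanism disabled
    return now_ms - last_activity_ms > timeout_ms


# Example: 60 s timeout, last activity 61 s ago -> the connection should go.
print(should_terminate(last_activity_ms=0, now_ms=61_000, timeout_ms=60_000))
```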
 
  I am not sure how to get rid of this problem in 9.1.9.
 
  With Regards,
  Amit Kapila.
 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] replication_timeout not effective

2013-04-10 Thread Dang Minh Huong
Hi Amit,

Thank you for your consideration.

My project does not allow using 9.2 or 9.3.

In 9.3, it sounds like replication_timeout is replaced by wal_sender_timeout.
So if it is solved in 9.3, I think there is a way to terminate it.
I hope it is fixed in 9.1 soon.

Regards,



Re: [HACKERS] [BUGS] replication_timeout not effective

2013-04-10 Thread Kyotaro HORIGUCHI
Hello,

On Wed, Apr 10, 2013 at 6:57 PM, Dang Minh Huong kakalo...@gmail.com wrote:
 In 9.3, it sounds like replication_timeout is replaced by wal_sender_timeout.
 So if it is solved in 9.3, I think there is a way to terminate it.
 I hope it is fixed in 9.1 soon.

Hmm. He said that,

 But in my environment the sender process hangs (for several tens of
 minutes) if I turn off (power off) the standby PC while *pg_basebackup*
 is executing.

Does base backup run only on the 'replication connection'?
As far as I can see, base backup uses a 'base backup' connection in
addition to the 'streaming' connection. The former does not seem to be
under the control of wal_sender_timeout or replication_timeout, and is
easily blocked in send(2) after a sudden cut of the underlying network
connection, whereas the latter is indeed terminated by them.
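The point about send(2) is that a plain blocking send gives the process no chance to check any timeout once the peer stops draining the socket. A minimal, self-contained Python sketch of the difference, using a local socketpair as a stand-in for a standby that has vanished (the peer simply never reads): send_with_deadline is a hypothetical helper that waits for writability with select() and gives up when its own deadline expires, instead of blocking indefinitely inside send(). This illustrates the general technique only; it is not how the walsender is implemented.

```python
import select
import socket
import time


def send_with_deadline(sock: socket.socket, data: bytes, timeout_s: float) -> None:
    """Send all of `data`, raising TimeoutError if it cannot finish in time.

    The socket is switched to non-blocking mode so that the caller, not the
    kernel, decides how long to wait for send-buffer space.
    """
    sock.setblocking(False)
    deadline = time.monotonic() + timeout_s
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"sent {sent} of {len(view)} bytes before deadline")
        # Wait until the kernel has room in the send buffer, or time runs out.
        _, writable, _ = select.select([], [sock], [], remaining)
        if not writable:
            raise TimeoutError(f"sent {sent} of {len(view)} bytes before deadline")
        try:
            sent += sock.send(view[sent:])
        except BlockingIOError:
            continue  # buffer filled up again; loop and re-check the deadline


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    # A small message fits in the socket buffer and goes out immediately.
    send_with_deadline(sender, b"hello", timeout_s=1.0)
    print(receiver.recv(5))
    # The receiver never reads again, mimicking a powered-off standby:
    # the buffer fills and the sender gives up instead of hanging forever.
    try:
        send_with_deadline(sender, b"x" * 10_000_000, timeout_s=0.5)
    except TimeoutError as exc:
        print("gave up:", exc)
```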

Blocking in send(2) could also occur on an async-rep connection, but it is
unlikely for sync-rep, since that does not easily fill the libpq and
socket buffers.

I suppose this is what he is talking about.

This still seems to occur as of the latest 9.3dev.

regards,
--
Kyotaro Horiguchi




Re: [HACKERS] [BUGS] replication_timeout not effective

2013-04-10 Thread Andres Freund
On 2013-04-10 22:38:07 +0900, Kyotaro HORIGUCHI wrote:
 Does base backup run only on the 'replication connection'?
 As far as I can see, base backup uses a 'base backup' connection in
 addition to the 'streaming' connection. The former does not seem to be
 under the control of wal_sender_timeout or replication_timeout, and is
 easily blocked in send(2) after a sudden cut of the underlying network
 connection, whereas the latter is indeed terminated by them.

Yes, it's run via a walsender connection. The only problem is that it
doesn't check for those timeouts. To be honest, I am not sure it would be
a good thing to do so, at least not using the same timeout as actual WAL
sending; that just has different characteristics.
On the other hand, hanging around that long isn't nice either...

 Blocking in send(2) could also occur on an async-rep connection, but it is
 unlikely for sync-rep, since that does not easily fill the libpq and
 socket buffers.

You just need larger transactions for it. A COPY or so ought to do it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [BUGS] replication_timeout not effective

2013-04-10 Thread Dang Minh Huong

Thanks all,

(2013/04/10 22:55), Andres Freund wrote:

 Yes, it's run via a walsender connection. The only problem is that it
 doesn't check for those timeouts. To be honest, I am not sure it would be
 a good thing to do so, at least not using the same timeout as actual WAL
 sending; that just has different characteristics.
 On the other hand, hanging around that long isn't nice either...

I tried max_wal_senders = 1; while the walsender is hanging,
I cannot run pg_basebackup again (or start the standby DB).
When I increased it to 2, the second run succeeded. But I'm afraid
that when a third one occurs, the walsender hanging from the first
will not yet have terminated...

I suppose not, but is there a way to terminate the hanging walsender
without restarting the PostgreSQL server or killing the walsender process?
(Killing the walsender process could cause the DB server to crash,
so I don't want to do it.)

# I've also tried pg_cancel_backend(), but it did not work either.


Regards,
Huong DM




Re: [HACKERS] [BUGS] replication_timeout not effective

2013-04-10 Thread Andres Freund
On 2013-04-10 23:37:44 +0900, Dang Minh Huong wrote:
 I suppose not, but is there a way to terminate the hanging walsender
 without restarting the PostgreSQL server or killing the walsender process?
 (Killing the walsender process could cause the DB server to crash,
 so I don't want to do it.)

Depending on where it's hanging, a normal SELECT
pg_terminate_backend(pid); might do it.

Otherwise you will have to wait for the operating system's tcp timeout.
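A minimal Python sketch of the pg_terminate_backend() approach, which only builds the SQL text to be run in psql as a superuser (in 9.1, pg_terminate_backend() requires superuser). The helper names are hypothetical. The process-ID column of pg_stat_replication is procpid in 9.1 and was renamed pid in 9.2, so the listing query depends on server_version_num. Inspect the listing first (client_addr and state should identify the walsender that served the dead standby); we believe a walsender running a base backup is listed with state 'backup', but check this on your version before terminating anything.

```python
def pid_column(server_version_num: int) -> str:
    """Name of the backend PID column in pg_stat_replication.

    It was called procpid up to 9.1 and renamed pid in 9.2 (90200).
    """
    return "procpid" if server_version_num < 90200 else "pid"


def list_walsenders_sql(server_version_num: int) -> str:
    """Query listing walsender processes; e.g. pass 90109 for 9.1.9."""
    col = pid_column(server_version_num)
    return (f"SELECT {col} AS pid, client_addr, state, backend_start "
            "FROM pg_stat_replication;")


def terminate_sql(pid: int) -> str:
    """Query asking the server to terminate one backend by PID."""
    pid = int(pid)
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return f"SELECT pg_terminate_backend({pid});"


if __name__ == "__main__":
    # Obtain the version number on the server with: SHOW server_version_num;
    print(list_walsenders_sql(90109))
    print(terminate_sql(12345))  # 12345 is a placeholder PID
```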

Greetings,

Andres Freund



Re: [HACKERS] [BUGS] replication_timeout not effective

2013-04-10 Thread Dang Minh Huong
On 2013/04/10 23:44, Andres Freund and...@2ndquadrant.com wrote:

 Depending on where it's hanging, a normal SELECT
 pg_terminate_backend(pid); might do it.
 
Great! It worked. Thank you very much.


Regards,
Huong DM
