Re: [HACKERS] Replication server timeout patch

Robert Haas Thu, 17 Feb 2011 19:11:30 -0800

On Thu, Feb 17, 2011 at 9:10 PM, Fujii Masao <[email protected]> wrote:
> On Fri, Feb 18, 2011 at 7:55 AM, Josh Berkus <[email protected]> wrote:
>>> So, in summary, the position is that we have a timeout, but that timeout
>>> doesn't work in all cases. But it does work in some, so that seems
>>> enough for me to say "let's commit". Not committing gives us nothing at
>>> all, which is as much use as a chocolate teapot.
>>
>> Can someone summarize the cases where it does and doesn't work?
>> There's been a longish gap in this thread.
>
> The timeout doesn't work when walsender gets blocked while sending
> WAL because the send buffer has filled up, I'm afraid. IOW, it doesn't
> work when the standby becomes unresponsive while WAL is being generated
> continuously on the master. Since walsender keeps trying to send
> WAL while the standby is unresponsive, the send buffer fills up and
> the blocking send function (e.g., pq_flush) blocks the walsender.
>
> OTOH, if the standby becomes unresponsive when there is no workload
> generating WAL, the timeout would work.


IMHO, that's so broken as to be useless.

I would really like to have a solution to this problem, though.
Relying on TCP keepalives is weak.
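
To make the failure mode above concrete, here is a minimal sketch in
Python (not walsender code, and not the patch under discussion). It
assumes the sender uses a non-blocking socket and waits for
writability with a deadline, so a full send buffer turns into a
timeout instead of an indefinite block. The helper name
send_with_timeout and its parameters are hypothetical.

```python
import select
import socket
import time


def send_with_timeout(sock, data, timeout):
    """Send all of data, or raise TimeoutError if the peer accepts
    nothing for `timeout` seconds. Hypothetical helper for illustration."""
    sock.setblocking(False)
    view = memoryview(data)
    sent = 0
    deadline = time.monotonic() + timeout
    while sent < len(view):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # The send buffer stayed full: the peer is not draining it.
            raise TimeoutError("peer did not drain the send buffer in time")
        # Wait until the socket is writable or the deadline passes.
        _, writable, _ = select.select([], [sock], [], remaining)
        if not writable:
            continue  # the deadline check above will raise
        try:
            n = sock.send(view[sent:])
        except BlockingIOError:
            continue  # buffer filled again between select() and send()
        sent += n
        # Any progress resets the timer, so only a stalled peer times out.
        deadline = time.monotonic() + timeout
    return sent
```

With a peer that never reads, a large enough payload fills the send
buffer and the call raises TimeoutError after roughly `timeout`
seconds, whereas a plain blocking send would hang at the same point.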

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
