On Thu, Feb 17, 2011 at 9:10 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
> On Fri, Feb 18, 2011 at 7:55 AM, Josh Berkus <j...@agliodbs.com> wrote:
>>> So, in summary, the position is that we have a timeout, but that timeout
>>> doesn't work in all cases. But it does work in some, so that seems
>>> enough for me to say "let's commit". Not committing gives us nothing at
>>> all, which is as much use as a chocolate teapot.
>>
>> Can someone summarize the cases where it does and doesn't work?
>> There's been a longish gap in this thread.
>
> The timeout doesn't work when walsender gets blocked during sending the
> WAL because the send buffer has been filled up, I'm afraid. IOW, it doesn't
> work when the standby becomes unresponsive while WAL is generated on
> the master one after another. Since walsender tries to continue sending the
> WAL while the standby is unresponsive, the send buffer gets filled up and
> the blocking send function (e.g., pq_flush) blocks the walsender.
>
> OTOH, if the standby becomes unresponsive when there is no workload
> which causes WAL, the timeout would work.

IMHO, that's so broken as to be useless.

I would really like to have a solution to this problem, though.
Relying on TCP keepalives is weak.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
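
For readers following the thread, the failure mode Fujii describes (a blocking send that never returns once the peer stops draining its socket) can be reproduced outside PostgreSQL. The sketch below is not walsender code and not the patch under discussion: it is a minimal Python illustration, and `send_with_timeout` and `demo_stalled_peer` are hypothetical names invented here. It assumes one general remedy, a non-blocking socket plus `select`, so that a full send buffer shows up as a timeout rather than an indefinite block.

```python
import select
import socket


def send_with_timeout(sock, data, timeout):
    """Send all of data; give up if no progress is possible for `timeout` seconds.

    Hypothetical helper for illustration only. Returns the number of bytes
    sent, or raises TimeoutError if the peer stops draining its side.
    """
    sock.setblocking(False)
    view = memoryview(data)
    sent = 0
    while sent < len(data):
        # Wait for room in the kernel send buffer, but never longer than timeout.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            raise TimeoutError(f"peer stalled after {sent} bytes")
        try:
            sent += sock.send(view[sent:])
        except BlockingIOError:
            # The buffer filled between select() and send(); wait again.
            continue
    return sent


def demo_stalled_peer():
    """Return True if a peer that never reads is detected as stalled."""
    sender, standby = socket.socketpair()
    try:
        # Assumed to be far larger than the default socket send buffer,
        # and the "standby" end never calls recv().
        payload = b"x" * (32 * 1024 * 1024)
        try:
            send_with_timeout(sender, payload, timeout=0.5)
        except TimeoutError:
            return True
        return False
    finally:
        sender.close()
        standby.close()
```

With a plain blocking `sendall` in place of the helper, the same payload would hang until the peer reads or the connection is torn down, which is the situation where only TCP keepalives remain as a safety net.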