Robert Haas <robertmh...@gmail.com> wrote:
 
> What does bother me is the fact that we are engineering a critical
> aspect of our system reliability around vendor-specific
> implementation details of the TCP stack, and that if any version
> of any operating system that we support (or ever wish to support
> in the future) fails to have a reliable implementation of this
> feature AND configurable knobs that we can tune to suit our needs,
> then we're screwed. Does anyone want to argue that this is NOT a
> house of cards?
 
[/me raises hand]
 
TCP keepalive has been available and a useful part of my reliability
solutions since I had to find a way to clean up zombie database
connections caused by clients powering down their workstations
without closing their apps -- that was in OS/2 circa 1990.  I'm
pretty sure I've also used it on HP-UX, whatever Unix flavor was on
our Sun SPARC servers, several versions of Windows, and several
versions of Linux. As far as I can recall, the default was always
two hours before doing anything, followed by nine small packets sent
over the course of ten minutes before giving up (if none were
answered).
 
I'm not sure whether the timings were controllable through the
applications, because we generally changed the OS defaults.  Even
so, recovery after two hours and ten minutes is way better than
waiting for eternity.
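
For what it's worth, here is a minimal sketch of what per-socket control
can look like where the platform exposes it.  The helper names
(enable_keepalive, detection_seconds) are made up for illustration, and
this is not the PostgreSQL code.  The 7200/75/9 values are the Linux
defaults, used here as an assumption; the TCP_KEEPIDLE, TCP_KEEPINTVL
and TCP_KEEPCNT options are not available everywhere, so the sketch
skips any that are missing or refused.

```python
import socket


def enable_keepalive(sock, idle=7200, interval=75, count=9):
    """Turn on TCP keepalive and tune its timings where the OS allows it.

    Returns a dict of the per-socket timing options that were applied.
    """
    # SO_KEEPALIVE itself is the widely supported part.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    applied = {}
    wanted = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    for name, value in wanted:
        opt = getattr(socket, name, None)
        if opt is None:
            continue  # option not exposed on this platform; keep OS default
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
        except OSError:
            pass  # option exists but was refused; keep OS default
    return applied


def detection_seconds(idle, interval, count):
    """Rough time to notice a dead peer: idle period plus all probes."""
    return idle + interval * count


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_keepalive(s))
    print(detection_seconds(7200, 75, 9))
    s.close()
```

With those assumed values the estimate comes to 7200 + 9 * 75 = 7875
seconds, or a little over two hours and eleven minutes, which is in
line with the figures above.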
 
As someone else said, we may want to add some sort of keepalive-
style ping to our application's home-grown protocol; but I don't see
that as an argument to suppress a very widely supported standard
protocol.  These address slightly different problem sets; let's
solve the one that came up in testing for the vast majority of
runtime environments by turning on TCP keepalives.
 
No, I don't see it as a house of cards.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)