Andrew Dunstan wrote:
Tom Lane wrote:
I see one occurrence in the 8.1 branch on hyena, but the failure
probability seems to have jumped way up in HEAD since we put in the
C-coded pg_regress. This lends weight to the idea that it's a
timing-related issue, because pg_regress.c is presumably much faster
at forking off a parallel gang of psqls than the shell script was;
and it's hard to see what else about the pg_regress change could be
affecting the psqls' ability to connect once forked.
We probably need to get some Solaris experts involved in diagnosing
what's happening. Judging by the buildfarm results, you should be able
to replicate it fairly easily by doing "make installcheck-parallel"
I will refer this to those experts; my Solaris-fu is a tad rusty these days.
As Tom mentioned, the problem is the size of the TCP connection queue
(parameter tcp_conn_req_max_q). The default is 128 on Solaris 10. The second
limit is twice the number of backends. See ./backend/libpq/pqcomm.c:
    /*
     * Select appropriate accept-queue length limit.  PG_SOMAXCONN is only
     * intended to provide a clamp on the request on platforms where an
     * overly large request provokes a kernel error (are
     */
    maxconn = MaxBackends * 2;
    if (maxconn > PG_SOMAXCONN)
        maxconn = PG_SOMAXCONN;

    err = listen(fd, maxconn);
But what actually happened? I think the following scenario occurred.
The postmaster listens in only one process, while many clients run truly
in parallel. The T2000 server has 32 hardware threads (8 cores, each with 4
threads). These clients generate more TCP/IP connection requests at once
than the postmaster is able to accept.