Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> AFAIK it is not possible for Postgres itself to cause a "connection
>> refused" failure --- that's a kernel-level errno.  So what's going on
>> here?  The only idea that comes to mind is that this version of Solaris
>> has some very low limit on SOMAXCONN, and when the timing is just so
>> it's bouncing connection requests because several of them arrive faster
>> than the postmaster can fork off children.  Googling suggests that there
>> are versions of Solaris with SOMAXCONN as low as 5 :-( ... but other
>> pages say that the default is higher, so this theory might be wrong.

> This is the box that Sun donated, btw.
> I get: ndd /dev/tcp tcp_conn_req_max_q   => 128
> Is that the Solaris equivalent of SOMAXCONN? That's low, maybe, but not 
> impossibly low.

Yeah, I found that variable name while googling.  If it's 128 then there's
no way that it's causing the problem --- you'd have to assume a value in
the single digits to explain the observed failures.
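
To illustrate the earlier point that "connection refused" comes from the
kernel rather than from the server process, here is a minimal Python sketch.
It is not from this thread, and the function name is hypothetical.  It binds
a TCP socket to a loopback port but never calls listen(), so no application
code is involved in answering the connection attempt; on common TCP stacks
the client should still get ECONNREFUSED.

```python
import errno
import socket


def connect_errno_without_listener():
    """Return the errno seen when connecting to a bound, non-listening port.

    Hypothetical helper for illustration only. No process ever accepts on
    the port, so any refusal has to come from the kernel's TCP stack.
    """
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the kernel for a free port; we deliberately skip listen().
        holder.bind(("127.0.0.1", 0))
        port = holder.getsockname()[1]
        client.settimeout(5)
        try:
            client.connect(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno
        return 0  # the connect unexpectedly succeeded
    finally:
        client.close()
        holder.close()


if connect_errno_without_listener() == errno.ECONNREFUSED:
    print("refused by the kernel; no server code ran")
```

A full listen queue is a different trigger from a missing listener, and how
a kernel reacts to queue overflow varies by platform, so this only shows
where the errno originates, not the failure mode suspected here.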

I see one occurrence in the 8.1 branch on hyena, but the failure
probability seems to have jumped way up in HEAD since we put in the
C-coded pg_regress.  This lends weight to the idea that it's a
timing-related issue, because pg_regress.c is presumably much faster
at forking off a parallel gang of psqls than the shell script was;
and it's hard to see what else about the pg_regress change could be
affecting the psqls' ability to connect once forked.

We probably need to get some Solaris experts involved in diagnosing
what's happening.  Judging by the buildfarm results you should be able
to replicate it fairly easily by doing "make installcheck-parallel"
repeatedly.
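
A minimal sketch of one way to do the repetition, assuming Python is
available on the box.  The function name and the run limit are hypothetical;
the only thing taken from above is the make target.  It reruns a command
until the first nonzero exit status and reports which run failed.

```python
import subprocess
import sys


def repeat_until_failure(cmd, max_runs=50, cwd=None):
    """Run cmd up to max_runs times.

    Returns the 1-based index of the first run that exits nonzero,
    or None if every run succeeded. Output is left attached to the
    terminal so a failing run can be inspected directly.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd, cwd=cwd)
        if result.returncode != 0:
            return attempt
    return None


# Intended use, from the top of a configured source tree (assumption):
#   failed_at = repeat_until_failure(["make", "installcheck-parallel"])
# Harmless self-check with a command that always succeeds:
print(repeat_until_failure([sys.executable, "-c", "pass"], max_runs=2))
```

Stopping at the first failure leaves whatever the failing run wrote on disk
untouched for later inspection.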

                        regards, tom lane
