Hi,

When postgres on linux receives connections at a high rate, client
connections sometimes error out with:
could not send data to server: Transport endpoint is not connected
could not send startup packet: Transport endpoint is not connected

To reproduce, start something like the following against a server with a
sufficiently high max_connections:
pgbench -h /tmp -p 5440 -T 10 -c 400 -j 400 -n -f /tmp/simplequery.sql

Now that's strange, since that error should happen at connect(2) time,
not when sending the startup packet. Some investigation led me to
fe-connect.c's PQconnectPoll:

if (connect(conn->sock, addr_cur->ai_addr,
                        addr_cur->ai_addrlen) < 0)
{
    if (SOCK_ERRNO == EINPROGRESS ||
        SOCK_ERRNO == EWOULDBLOCK ||
        SOCK_ERRNO == EINTR ||
        SOCK_ERRNO == 0)
    {
        /*
         * This is fine - we're in non-blocking mode, and
         * the connection is in progress.  Tell caller to
         * wait for write-ready on socket.
         */
        conn->status = CONNECTION_STARTED;
        return PGRES_POLLING_WRITING;
    }
    /* otherwise, trouble */
}

So, we're accepting EWOULDBLOCK as a sign that connect(2) is in
progress, which it isn't. EAGAIN, in contrast, is a real connect(2)
error on some BSDs and on linux, where it means the attempt failed.
Unfortunately POSIX allows those two to share the same value, and on
linux they do.

My manpage tells me:
EAGAIN No more free local ports or insufficient entries in the routing
       cache.  For AF_INET see the description of
       /proc/sys/net/ipv4/ip_local_port_range ip(7) for information on
       how to increase the number of local ports.

So, the problem is that we treated a failed connection attempt as one
that had started successfully and was still in progress.

Not accepting EWOULDBLOCK in the above if() results in:
could not connect to server: Resource temporarily unavailable
      Is the server running locally and accepting
      connections on Unix domain socket "/tmp/.s.PGSQL.5440"?

which makes more sense.

Trivial patch attached.

Now, the question is why we cannot complete connections on unix sockets.
Some code reading shows that net/unix/af_unix.c:unix_stream_connect()
contains:
        if (unix_recvq_full(other)) {
                err = -EAGAIN;
                if (!timeo)
                        goto out_unlock;
So, if we're in nonblocking mode - which we are - and the receive queue
is full we return EAGAIN. Whether the receive queue of a unix socket is
full is defined as
static inline int unix_recvq_full(struct sock const *sk)
{
        return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog;
}
Where sk_max_ack_backlog is whatever has been passed to the
listen(backlog) on the listening side.

Question: But postgres does listen(fd, MaxBackends * 2), how can that be
a problem?
Answer:
       If the backlog argument is greater than the value in
       /proc/sys/net/core/somaxconn, then it is silently truncated to
       that value; the default value in this file is 128.  In kernels
       before 2.4.25, this limit was a hard coded value, SOMAXCONN,
       with the value 128.

Setting somaxconn to something higher indeed makes the problem go away.

I'd guess that pretty much the same holds true for tcp connections,
although I didn't verify that. If so, it would explain some previous
reports on the lists.

TLDR: Increase /proc/sys/net/core/somaxconn

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From da5cfb7d237a4a07b146fb9d255f0de72207de10 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 17 Jun 2013 16:00:58 +0200
Subject: [PATCH] libpq: Handle connect(2) returning EAGAIN/EWOULDBLOCK
 correctly

libpq used to accept EWOULDBLOCK - which is allowed to have the same value as
EAGAIN by posix - as a valid return code to connect(2) indicating that a
connection is in progress. While posix doesn't specify either as a valid return
code, BSD based systems and linux use it to indicate temporary resource
exhaustion.
Accepting either as an in-progress connection attempt leads to hard to diagnose
errors when sending the startup packet:
could not send data to server: Transport endpoint is not connected
could not send startup packet: Transport endpoint is not connected

Treating it as an error results in:
could not connect to server: Resource temporarily unavailable
      Is the server running locally and accepting
      connections on Unix domain socket "..."?
which is more accurate.
---
 src/interfaces/libpq/fe-connect.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 0d729c8..c17c303 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -1780,7 +1780,6 @@ keep_going:						/* We will come back to here until there is
 								addr_cur->ai_addrlen) < 0)
 					{
 						if (SOCK_ERRNO == EINPROGRESS ||
-							SOCK_ERRNO == EWOULDBLOCK ||
 							SOCK_ERRNO == EINTR ||
 							SOCK_ERRNO == 0)
 						{
-- 
1.8.2.rc2.4.g7799588.dirty
