Hi, a Motor user began an interesting discussion on the MongoDB-user list:

https://groups.google.com/d/topic/mongodb-user/2oK6C3BrVKI/discussion

The summary is this: he's fetching hundreds of URLs concurrently and 
inserting the results into MongoDB with Motor. Motor throws lots of 
connection-timeout errors. The problem is getaddrinfo: on Mac, Python only 
allows one getaddrinfo call at a time. With hundreds of HTTP fetches in 
progress, there's a long queue waiting for the getaddrinfo lock. Whenever 
Motor wants to grow its connection pool it has to call getaddrinfo on
"localhost", and it spends so long waiting for the lock that it times out
and concludes it can't reach MongoDB.
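
To make the queueing effect concrete, here is a toy simulation. It does not
call getaddrinfo at all: lookup_lock, locked_lookup and simulate are
hypothetical names, and the sleep is only a stand-in for a lookup that holds
a process-wide lock. Under those assumptions, the unlucky caller at the back
of the queue waits roughly (N - 1) times the cost of a single lookup.

```python
import threading
import time

# Stand-in for the interpreter-wide getaddrinfo lock (an assumption for
# this sketch; no real name resolution happens here).
lookup_lock = threading.Lock()


def locked_lookup(hold_seconds, waits):
    start = time.monotonic()
    with lookup_lock:
        # Record how long this caller sat in the queue before its turn.
        waits.append(time.monotonic() - start)
        # Pretend to resolve a name while holding the lock.
        time.sleep(hold_seconds)


def simulate(n_lookups, hold_seconds):
    """Run n_lookups concurrent lookups and return the longest queue wait."""
    waits = []
    threads = [
        threading.Thread(target=locked_lookup, args=(hold_seconds, waits))
        for _ in range(n_lookups)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max(waits)
```

With hundreds of lookups in flight, a wait like this can easily exceed a
connection timeout that was tuned for a fast local network.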

Motor's connection-timeout implementation on asyncio is subtly wrong:

    coro = asyncio.open_connection(host, port)
    reader, writer = yield from asyncio.wait_for(coro, timeout)

The timer runs during the call to getaddrinfo, as well as the call to the 
loop's sock_connect(). This isn't the intention: the timeout should apply 
only to the connection.
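
A small sketch of the effect, written with async/await and with stand-ins
rather than real network calls: slow_resolve plays the part of a getaddrinfo
call stuck behind the lock and fast_connect plays the part of a TCP connect
that succeeds at once. All the names here are hypothetical. Because wait_for
wraps the whole coroutine, the slow "lookup" alone is enough to trip the
timeout.

```python
import asyncio


async def slow_resolve(delay):
    # Stand-in for a getaddrinfo call that is queued behind other lookups.
    await asyncio.sleep(delay)
    return ("127.0.0.1", 27017)


async def fast_connect(address):
    # Stand-in for a TCP connect that succeeds immediately.
    return "connected to %s:%d" % address


async def open_connection_like(resolve_delay):
    # Both phases live inside one coroutine, as with open_connection.
    address = await slow_resolve(resolve_delay)
    return await fast_connect(address)


async def attempt(resolve_delay, timeout):
    try:
        # The timer covers the lookup AND the connect.
        return await asyncio.wait_for(open_connection_like(resolve_delay), timeout)
    except asyncio.TimeoutError:
        return "timed out"


def run(resolve_delay, timeout):
    return asyncio.run(attempt(resolve_delay, timeout))
```

Here run(0.3, 0.05) reports a timeout even though the "server" would have
accepted the connection instantly.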

A philosophical digression: The "connection timeout" is a heuristic. "If 
I've waited N seconds and haven't established the connection, I probably 
never will. Give up." Based on what they know about their own networks, 
users can tweak the connection timeout. In a fast network, a server that 
hasn't responded in 20ms is probably down; but on a global network, 10 
seconds might be reasonable. Regardless, the heuristic only applies to the 
actual TCP connection. Waiting for getaddrinfo is not related; that's up to 
the operating system.

In a multithreaded client like PyMongo we distinguish the two phases:

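    import socket

    def connect(host, port, family, connect_timeout):
        err = None
        # The call to getaddrinfo is not timed.
        for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
            af, socktype, proto, dummy, sa = res
            sock = socket.socket(af, socktype, proto)
            try:
                sock.settimeout(connect_timeout)

                # THE TIMEOUT ONLY APPLIES HERE.
                sock.connect(sa)
                sock.settimeout(None)
                return sock
            except socket.error as e:
                # Connection refused, or not established within the timeout.
                err = e
                sock.close()

        # Every address failed, or getaddrinfo returned none.
        raise err or socket.error("getaddrinfo returned no addresses")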

Here, the call to getaddrinfo isn't timed at all, and each distinct attempt 
to connect on a different address is timed separately. So this kind of code 
matches the idea of a "connect timeout" as a heuristic for deciding whether 
the server is down.
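
For comparison, a rough asyncio analogue might look like the sketch below.
This is not Motor's code and not a proposal for the real open_connection
signature: connect_with_timeout is a hypothetical helper, built only on
loop.getaddrinfo, loop.sock_connect and asyncio.wait_for, and it uses
async/await rather than "yield from". The lookup runs outside any timer, and
the timer restarts for each address.

```python
import asyncio
import socket


async def connect_with_timeout(host, port, connect_timeout):
    """Hypothetical helper: resolve untimed, then time each connect attempt."""
    loop = asyncio.get_running_loop()

    # Phase 1: name resolution. Deliberately NOT wrapped in wait_for.
    infos = await loop.getaddrinfo(host, port, type=socket.SOCK_STREAM)

    last_error = None
    for af, socktype, proto, _canonname, sa in infos:
        sock = socket.socket(af, socktype, proto)
        sock.setblocking(False)
        try:
            # Phase 2: the TCP connect. The timer restarts for each address.
            await asyncio.wait_for(loop.sock_connect(sock, sa), connect_timeout)
            return sock
        except (OSError, asyncio.TimeoutError) as exc:
            # Connection refused, or not established within the timeout.
            last_error = exc
            sock.close()

    raise last_error or OSError("getaddrinfo returned no addresses")
```

The returned socket is non-blocking and still has to be wrapped in a
transport or streams by the caller, which is part of why it would be nicer
to have this behavior in asyncio itself.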

Two questions:

1. Should asyncio.open_connection support a connection timeout that acts 
like the blocking version above? That is, a connection timeout that does 
not include getaddrinfo, and restarts for each address we attempt to 
connect to?

2. Why does Python lock around getaddrinfo on Mac and Windows anyway? The 
code comment says these are "systems on which getaddrinfo() is believed to 
not be thread-safe". Has this belief ever been confirmed?

https://hg.python.org/cpython/file/d2b8354e87f5/Modules/socketmodule.c#l185

Thanks!
Jesse
