Hi, a Motor user began an interesting discussion on the MongoDB-user list:
https://groups.google.com/d/topic/mongodb-user/2oK6C3BrVKI/discussion
The summary is this: he's fetching hundreds of URLs concurrently and
inserting the results into MongoDB with Motor. Motor throws lots of
connection-timeout errors. The problem is getaddrinfo: on Mac, Python only
allows one getaddrinfo call at a time. With hundreds of HTTP fetches in
progress, there's a long queue waiting for the getaddrinfo lock. Whenever
Motor wants to grow its connection pool it has to call getaddrinfo on
"localhost", and it spends so long waiting for that call that it times out
and concludes it can't reach MongoDB.
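
To make the queueing effect concrete, here is a minimal simulation, not the real code path. The names resolver_lock, locked_lookup and simulate are hypothetical; a plain threading.Lock stands in for the lock CPython holds around getaddrinfo, and time.sleep stands in for the lookup itself.

```python
import threading
import time

# Hypothetical stand-in for the process-wide lock around getaddrinfo.
resolver_lock = threading.Lock()


def locked_lookup(duration, waits):
    """Simulate one name lookup that must hold the global lock."""
    start = time.monotonic()
    with resolver_lock:
        # Time spent queued behind other lookups before ours could start.
        waited = time.monotonic() - start
        time.sleep(duration)  # the simulated lookup itself
    waits.append(waited)


def simulate(n_lookups, duration):
    """Run n_lookups concurrent lookups; return the longest queueing delay."""
    waits = []
    threads = [
        threading.Thread(target=locked_lookup, args=(duration, waits))
        for _ in range(n_lookups)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max(waits)
```

With N lookups in flight, the unluckiest caller waits for roughly N - 1 lookups before its own begins, so a short connect timeout can expire before the lookup for "localhost" has even started.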
Motor's connection-timeout implementation in asyncio is sort of wrong:

    coro = asyncio.open_connection(host, port)
    reader, writer = yield from asyncio.wait_for(coro, timeout)

The timer runs during the call to getaddrinfo, as well as the call to the
loop's sock_connect(). This isn't the intention: the timeout should apply
only to the connection.
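
A small sketch of why the outer timer is the problem. It is written with async/await rather than yield from so it runs on current Python, and fake_open_connection is a hypothetical stand-in that uses sleeps to model the two phases; it does not touch the network.

```python
import asyncio


async def fake_open_connection(resolve_delay, connect_delay):
    # Hypothetical stand-in for asyncio.open_connection:
    # phase 1 models waiting for getaddrinfo, phase 2 models the TCP connect.
    await asyncio.sleep(resolve_delay)
    await asyncio.sleep(connect_delay)
    return "connected"


async def connect_with_outer_timeout(resolve_delay, connect_delay, timeout):
    try:
        # The timer wraps the whole coroutine, so it covers both phases.
        return await asyncio.wait_for(
            fake_open_connection(resolve_delay, connect_delay), timeout)
    except asyncio.TimeoutError:
        return "timed out"


def demo():
    # Slow resolution, instant connect: the timeout still fires.
    slow = asyncio.run(connect_with_outer_timeout(0.3, 0.0, 0.1))
    # Fast resolution, instant connect: the connection succeeds.
    fast = asyncio.run(connect_with_outer_timeout(0.0, 0.0, 0.1))
    return slow, fast
```

In the first case the simulated server is reachable immediately, yet the caller still sees a timeout, because all of the budget was spent in the resolution phase.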
A philosophical digression: The "connection timeout" is a heuristic. "If
I've waited N seconds and haven't established the connection, I probably
never will. Give up." Based on what they know about their own networks,
users can tweak the connection timeout. In a fast network, a server that
hasn't responded in 20ms is probably down; but on a global network, 10
seconds might be reasonable. Regardless, the heuristic only applies to the
actual TCP connection. Waiting for getaddrinfo is not related; that's up to
the operating system.
In a multithreaded client like PyMongo we distinguish the two phases:

    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
        af, socktype, proto, dummy, sa = res
        sock = socket.socket(af, socktype, proto)
        try:
            sock.settimeout(connect_timeout)
            # THE TIMEOUT ONLY APPLIES HERE.
            sock.connect(sa)
            sock.settimeout(None)
            return sock
        except socket.error as e:
            # Connection refused, or not established within the timeout.
            sock.close()

Here, the call to getaddrinfo isn't timed at all, and each distinct attempt
to connect on a different address is timed separately. So this kind of code
matches the idea of a "connect timeout" as a heuristic for deciding whether
the server is down.
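
For comparison, here is a rough sketch of how the same two-phase behavior might look on top of asyncio's lower-level loop methods. The function name connect_per_address_timeout is hypothetical, and this is only one possible shape, not Motor's or asyncio's implementation. It uses async/await rather than yield from so that it runs on current Python.

```python
import asyncio
import socket


async def connect_per_address_timeout(host, port, connect_timeout):
    """Resolve host without a timer, then time each connect attempt separately."""
    loop = asyncio.get_running_loop()

    # Phase 1: name resolution, deliberately not timed.
    infos = await loop.getaddrinfo(host, port, type=socket.SOCK_STREAM)

    last_error = None
    for af, socktype, proto, _canonname, sa in infos:
        sock = socket.socket(af, socktype, proto)
        sock.setblocking(False)  # required by loop.sock_connect
        try:
            # Phase 2: the timer restarts for every address we try.
            await asyncio.wait_for(loop.sock_connect(sock, sa), connect_timeout)
            return sock
        except (OSError, asyncio.TimeoutError) as exc:
            # Connection refused, or not established within the timeout.
            last_error = exc
            sock.close()

    raise last_error or OSError("getaddrinfo returned no addresses")
```

The returned socket is already connected and non-blocking, so it could be passed to asyncio.open_connection(sock=sock) to obtain the usual reader and writer pair.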
Two questions:
1. Should asyncio.open_connection support a connection timeout that acts
like the blocking version above? That is, a connection timeout that does
not include getaddrinfo, and restarts for each address we attempt to
connect to?
2. Why does Python lock around getaddrinfo on Mac and Windows anyway? The
code comment says these are "systems on which getaddrinfo() is believed to
not be thread-safe". Has this belief ever been confirmed?
https://hg.python.org/cpython/file/d2b8354e87f5/Modules/socketmodule.c#l185
Thanks!
Jesse