On Tue, Dec 8, 2015 at 7:13 AM, A. Jesse Jiryu Davis <[email protected]> wrote:
> Hi, a Motor user began an interesting discussion on the MongoDB-user list:
>
> https://groups.google.com/d/topic/mongodb-user/2oK6C3BrVKI/discussion
>
> The summary is this: he's fetching hundreds of URLs concurrently and
> inserting the results into MongoDB with Motor. Motor throws lots of
> connection-timeout errors. The problem is getaddrinfo: on Mac, Python only
> allows one getaddrinfo call at a time. With hundreds of HTTP fetches in
> progress, there's a long queue waiting for the getaddrinfo lock. Whenever
> Motor wants to grow its connection pool it has to call getaddrinfo on
> "localhost", and it spends so long waiting for that call that it times out
> and concludes it can't reach MongoDB.

If it's really looking up "localhost" over and over, maybe wrap a cache
around getaddrinfo()?

> Motor's connection-timeout implementation in asyncio is sort of wrong:
>
>     coro = asyncio.open_connection(host, port)
>     sock = yield from asyncio.wait_for(coro, timeout)
>
> The timer runs during the call to getaddrinfo as well as during the call
> to the loop's sock_connect(). That isn't the intention: the timeout should
> apply only to the connection.
>
> A philosophical digression: the "connection timeout" is a heuristic. "If
> I've waited N seconds and haven't established the connection, I probably
> never will. Give up." Based on what they know about their own networks,
> users can tweak the connection timeout. In a fast network, a server that
> hasn't responded in 20ms is probably down; but on a global network, 10
> seconds might be reasonable. Regardless, the heuristic applies only to the
> actual TCP connection. Waiting for getaddrinfo is a separate matter;
> that's up to the operating system.
>
> In a multithreaded client like PyMongo we distinguish the two phases:
>
>     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
>         af, socktype, proto, dummy, sa = res
>         sock = socket.socket(af, socktype, proto)
>         try:
>             sock.settimeout(connect_timeout)
>
>             # THE TIMEOUT ONLY APPLIES HERE.
>             sock.connect(sa)
>             sock.settimeout(None)
>             return sock
>         except socket.error as e:
>             # Connection refused, or not established within the timeout.
>             sock.close()
>
> Here, the call to getaddrinfo isn't timed at all, and each distinct
> attempt to connect to a different address is timed separately. So this
> kind of code matches the idea of a "connect timeout" as a heuristic for
> deciding whether the server is down.
>
> Two questions:
>
> 1. Should asyncio.open_connection support a connection timeout that acts
> like the blocking version above? That is, a connection timeout that does
> not include getaddrinfo, and that restarts for each address we attempt to
> connect to?

Hm, I don't really like adding timeouts to every API. As you describe,
everyone has different needs. IMO if you don't want the timeout to cover
the getaddrinfo() call, call getaddrinfo() yourself and pass the resolved
host address into the create_connection() call. That way you also have
control over whether to e.g. implement "happy eyeballs". (It will still
call socket.getaddrinfo(), but it should be quick -- it's not going to a
DNS server or even /etc/hosts to discover that 127.0.0.1 maps to
127.0.0.1.)
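For example, here is a rough sketch of that approach (untested, written
against the 3.5-era yield-from API; the function name and parameters are
invented for illustration). It resolves up front and then restarts the
timeout for each address, like the blocking PyMongo code quoted above:

    import asyncio
    import socket

    @asyncio.coroutine
    def open_connection_with_timeout(host, port, connect_timeout, loop):
        # Resolve first -- the connect timeout does not cover this step.
        infos = yield from loop.getaddrinfo(
            host, port, type=socket.SOCK_STREAM)

        for family, socktype, proto, canonname, address in infos:
            try:
                # THE TIMEOUT ONLY APPLIES HERE.  open_connection() calls
                # getaddrinfo() again on the numeric address, but that
                # lookup is quick.
                return (yield from asyncio.wait_for(
                    asyncio.open_connection(address[0], port, loop=loop),
                    connect_timeout, loop=loop))
            except (OSError, asyncio.TimeoutError):
                continue  # Try the next resolved address.

        raise OSError('could not connect to %s:%s' % (host, port))

It returns a (reader, writer) pair, just as asyncio.open_connection() does.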
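And a minimal sketch of the getaddrinfo() cache suggested above (again,
the names are invented; a real cache should honor DNS TTLs and evict
failed lookups rather than caching the exception):

    import asyncio
    import socket

    _addrinfo_cache = {}

    def cached_getaddrinfo(loop, host, port):
        # Cache the in-flight future itself, so hundreds of concurrent
        # callers share one lookup instead of queuing behind the
        # getaddrinfo lock.
        key = (host, port)
        if key not in _addrinfo_cache:
            _addrinfo_cache[key] = asyncio.ensure_future(
                loop.getaddrinfo(host, port, type=socket.SOCK_STREAM),
                loop=loop)
        return _addrinfo_cache[key]

Callers use it as a drop-in for loop.getaddrinfo():

    infos = yield from cached_getaddrinfo(loop, 'localhost', 27017)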
> 2. Why does Python lock around getaddrinfo on Mac and Windows anyway? The
> code comment says these are "systems on which getaddrinfo() is believed
> to not be thread-safe". Has this belief ever been confirmed?
>
> https://hg.python.org/cpython/file/d2b8354e87f5/Modules/socketmodule.c#l185

I don't know -- the list of ifdefs seems to indicate this is a generic BSD
issue, which is OS X's heritage. Maybe someone can do an experiment, or
review the source code used by Apple (if it's still open source)? While I
agree that if this really isn't an issue we shouldn't bother with the lock,
I'd also much rather be safe than sorry when it comes to races in core
Python.

--
--Guido van Rossum (python.org/~guido)
