Thanks Guido, this all makes sense. One problem, though, is that even if I call getaddrinfo myself in Motor and wrap a cache around it, I *still* can't use the event loop's create_connection() call. create_connection always calls getaddrinfo, and even though it "should be quick", that doesn't matter in this scenario: the time is spent waiting for the getaddrinfo lock. So I would need one of:
1. Copy the body of create_connection into Motor so I can customize the getaddrinfo call. I especially don't like this because I'd have to call the loop's private _create_connection_transport() from Motor's customized create_connection(). Perhaps we could make _create_connection_transport public, or otherwise make a public API for separating getaddrinfo from actually establishing the connection?

2. Make getaddrinfo customizable in asyncio (https://github.com/python/asyncio/issues/160). This isn't ideal, since it requires Motor users on Mac / BSD to change configuration for the whole event loop just so Motor's specific create_connection calls behave correctly.

3. Back to the original proposal: add a connection timeout parameter to create_connection. =)

On Tuesday, December 8, 2015 at 4:30:04 PM UTC-5, Guido van Rossum wrote:
>
> On Tue, Dec 8, 2015 at 7:13 AM, A. Jesse Jiryu Davis <[email protected]> wrote:
>
>> Hi, a Motor user began an interesting discussion on the MongoDB-user list:
>>
>> https://groups.google.com/d/topic/mongodb-user/2oK6C3BrVKI/discussion
>>
>> The summary is this: he's fetching hundreds of URLs concurrently and inserting the results into MongoDB with Motor. Motor throws lots of connection-timeout errors. The problem is getaddrinfo: on Mac, Python only allows one getaddrinfo call at a time. With hundreds of HTTP fetches in progress, there's a long queue waiting for the getaddrinfo lock. Whenever Motor wants to grow its connection pool it has to call getaddrinfo on "localhost", and it spends so long waiting for that call, it times out and thinks it can't reach MongoDB.
>
> If it's really looking up "localhost" over and over, maybe wrap a cache around getaddrinfo()?
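(As an aside, the cache Guido suggests might look something like this sketch. `cached_getaddrinfo` and the TTL-less dict are my own illustration, not Motor code; a real cache would want expiry and invalidation on connection failure. And even with such a cache, create_connection still makes its own getaddrinfo call, which is the problem described above.)

```python
import socket

# Process-lifetime cache keyed on (host, port). Assumes the resolved
# addresses for a fixed host:port don't change while we're running.
_addr_cache = {}

def cached_getaddrinfo(host, port):
    key = (host, port)
    if key not in _addr_cache:
        # Only the first call per key takes the getaddrinfo lock.
        _addr_cache[key] = socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM)
    return _addr_cache[key]
```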
>> Motor's connection-timeout implementation in asyncio is sort of wrong:
>>
>>     coro = asyncio.open_connection(host, port)
>>     sock = yield from asyncio.wait_for(coro, timeout)
>>
>> The timer runs during the call to getaddrinfo, as well as the call to the loop's sock_connect(). This isn't the intention: the timeout should apply only to the connection.
>>
>> A philosophical digression: The "connection timeout" is a heuristic. "If I've waited N seconds and haven't established the connection, I probably never will. Give up." Based on what they know about their own networks, users can tweak the connection timeout. In a fast network, a server that hasn't responded in 20ms is probably down; but on a global network, 10 seconds might be reasonable. Regardless, the heuristic only applies to the actual TCP connection. Waiting for getaddrinfo is not related; that's up to the operating system.
>>
>> In a multithreaded client like PyMongo we distinguish the two phases:
>>
>>     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
>>         af, socktype, proto, dummy, sa = res
>>         sock = socket.socket(af, socktype, proto)
>>         try:
>>             sock.settimeout(connect_timeout)
>>
>>             # THE TIMEOUT ONLY APPLIES HERE.
>>             sock.connect(sa)
>>             sock.settimeout(None)
>>             return sock
>>         except socket.error as e:
>>             # Connection refused, or not established within the timeout.
>>             sock.close()
>>
>> Here, the call to getaddrinfo isn't timed at all, and each distinct attempt to connect on a different address is timed separately. So this kind of code matches the idea of a "connect timeout" as a heuristic for deciding whether the server is down.
>>
>> Two questions:
>>
>> 1. Should asyncio.open_connection support a connection timeout that acts like the blocking version above? That is, a connection timeout that does not include getaddrinfo, and restarts for each address we attempt to connect to?
> Hm, I don't really like adding timeouts to every API. As you describe, everyone has different needs. IMO if you don't want the timeout to cover the getaddrinfo() call, call getaddrinfo() yourself and pass the host address into the create_connection() call. That way you also have control over whether to e.g. implement "happy eyeballs". (It will still call socket.getaddrinfo(), but it should be quick -- it's not going to a DNS server or even /etc/hosts to discover that 127.0.0.1 maps to 127.0.0.1.)
>
>> 2. Why does Python lock around getaddrinfo on Mac and Windows anyway? The code comment says these are "systems on which getaddrinfo() is believed to not be thread-safe". Has this belief ever been confirmed?
>>
>> https://hg.python.org/cpython/file/d2b8354e87f5/Modules/socketmodule.c#l185
>
> I don't know -- the list of ifdefs seems to indicate this is a generic BSD issue, which is OS X's heritage. Maybe someone can do an experiment, or review the source code used by Apple (if it's still open source)? While I agree that if this really isn't an issue we shouldn't bother with the lock, I'd also much rather be safe than sorry when it comes to races in core Python.
>
> --
> --Guido van Rossum (python.org/~guido)
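P.S. For concreteness, the "resolve first, then time only the connect" pattern from the quoted PyMongo loop could be ported to asyncio roughly like this. This is a sketch using modern async/await syntax; the helper name is mine, and note (per my point at the top) that create_connection will still perform its own getaddrinfo call on the literal IP address, so the lock wait is reduced but not fully avoided:

```python
import asyncio
import socket

async def open_connection_with_connect_timeout(host, port, connect_timeout):
    # Resolve outside the timeout. On Mac/BSD this may still wait on
    # Python's getaddrinfo lock, but that wait no longer counts against
    # the connect timeout.
    loop = asyncio.get_running_loop()
    infos = await loop.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    last_exc = None
    for family, socktype, proto, _canonname, address in infos:
        try:
            # Time only the connection attempt, and restart the clock
            # for each address, like the blocking PyMongo loop above.
            return await asyncio.wait_for(
                asyncio.open_connection(address[0], address[1]),
                connect_timeout)
        except (OSError, asyncio.TimeoutError) as exc:
            last_exc = exc
    raise last_exc
```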
