Done: https://github.com/python/asyncio/pull/302
On Wednesday, December 9, 2015 at 2:58:07 PM UTC-5, Guido van Rossum wrote: > > It'll go much quicker if you send a PR to the asyncio github project. > Thanks! > > --Guido (mobile) > On Dec 9, 2015 11:56 AM, "A. Jesse Jiryu Davis" <[email protected] > <javascript:>> wrote: > >> Thanks for your kind words about the crawler chapter. =) I think we >> didn't hit this problem there because the crawler only looked up one >> domain, or a few of them. On Mac, subsequent calls are cached at the OS >> layer for a few minutes, so getaddrinfo is very fast, even though it's >> serialized by the getaddrinfo lock. Besides I don't think the crawler has a >> connection timeout, so if it *were *looking up hundreds of domains it >> would wait as long as necessary for the lock. >> >> In this bug report, on the other hand, there are hundreds of coroutines >> all waiting to look up different domains, and some of those domains take >> several seconds to resolve. Resolving hundreds of different domains, plus >> the getaddrinfo lock, plus Motor's 20-second connection timeout, all >> combine to create this bad behavior. >> >> I like option #4 also; that gives me the freedom to either cache lookups >> in Motor, or to treat "localhost" specially, or at least to prevent a slow >> getaddrinfo from appearing to be a connection timeout. I haven't decided >> which is the best approach, but #4 allows me flexibility. Would you like me >> to write the patch or will you? >> >> I'm curious about #5 too, but I think that's a longer term project. >> >> On Wednesday, December 9, 2015 at 12:57:53 PM UTC-5, Guido van Rossum >> wrote: >>> >>> 4. Modify asyncio's getaddrinfo() so that if you pass it something that >>> looks like a numerical address (IPv4 or IPv6 syntax, looking exactly like >>> what getaddrinfo() returns) it skips calling socket.getaddrinfo(). I've >>> wanted this for a while but hadn't run into this queue issue, so this is my >>> favorite. >>> >>> 5. Do the research needed to prove that socket.getaddrinfo() on OS X (or >>> perhaps on sufficiently recent versions of OS X) is thread-safe and submit >>> a patch that avoids the getaddrinfo lock on those versions of OS X. >>> >>> FWIW, IIRC my crawler example (which your editing made so much better!) >>> calls getaddrinfo() for every connection and I hadn't experienced any >>> slowness there. But maybe in the Motor example you're hitting a slow DNS >>> server? (I'm guessing there are lots of system configuration parameters >>> that may make the system's getaddrinfo() slower or faster, and in your >>> setup it may well be slower.) >>> >>> On Wed, Dec 9, 2015 at 7:05 AM, A. Jesse Jiryu Davis < >>> [email protected]> wrote: >>> >>>> Thanks Guido, this all makes sense. >>>> >>>> One problem, though, is that even if I call getaddrinfo myself in Motor >>>> and wrap a cache around it, I *still* can't use the event >>>> loop's create_connection() call. create_connection always >>>> calls getaddrinfo, and even though it "should be quick", that doesn't >>>> matter in this scenario: the time is spent waiting for the getaddrinfo >>>> lock. So I would need one of: >>>> >>>> 1. Copy the body of create_connection into Motor so I can customize the >>>> getaddrinfo call. I especially don't like this because I'd have to call >>>> the >>>> loop's private _create_connection_transport() from Motor's >>>> customized create_connection(). Perhaps we could >>>> make _create_connection_transport public, or otherwise make a public API >>>> for separating getaddrinfo from actually establishing the connection? >>>> >>>> 2. Make getaddrinfo customizable in asyncio ( >>>> https://github.com/python/asyncio/issues/160). This isn't ideal, since >>>> it requires Motor users on Mac / BSD to change configuration for the whole >>>> event loop just so Motor's specific create_connection calls behave >>>> correctly. >>>> >>>> 3. Back to the original proposal: add a connection timeout parameter to >>>> create_connection. =) >>>> >>>> On Tuesday, December 8, 2015 at 4:30:04 PM UTC-5, Guido van Rossum >>>> wrote: >>>> >>>>> On Tue, Dec 8, 2015 at 7:13 AM, A. Jesse Jiryu Davis < >>>>> [email protected]> wrote: >>>>> >>>>>> Hi, a Motor user began an interesting discussion on the MongoDB-user >>>>>> list: >>>>>> >>>>>> https://groups.google.com/d/topic/mongodb-user/2oK6C3BrVKI/discussion >>>>>> >>>>>> The summary is this: he's fetching hundreds of URLs concurrently and >>>>>> inserting the results into MongoDB with Motor. Motor throws lots of >>>>>> connection-timeout errors. The problem is getaddrinfo: on Mac, Python >>>>>> only >>>>>> allows one getaddrinfo call at a time. With hundreds of HTTP fetches in >>>>>> progress, there's a long queue waiting for the getaddrinfo lock. >>>>>> Whenever >>>>>> Motor wants to grow its connection pool it has to call getaddrinfo on >>>>>> "localhost", and it spends so long waiting for that call, it times out >>>>>> and >>>>>> thinks it can't reach MongoDB. >>>>>> >>>>> >>>>> If it's really looking up "localhost" over and over, maybe wrap a >>>>> cache around getaddrinfo()? >>>>> >>>>> >>>>>> Motor's connection-timeout implementation in asyncio is sort of wrong: >>>>>> >>>>>> coro = asyncio.open_connection(host, port) >>>>>> sock = yield from asyncio.wait_for(coro, timeout) >>>>>> >>>>>> The timer runs during the call to getaddrinfo, as well as the call to >>>>>> the loop's sock_connect(). This isn't the intention: the timeout should >>>>>> apply only to the connection. >>>>>> >>>>>> A philosophical digression: The "connection timeout" is a heuristic. >>>>>> "If I've waited N seconds and haven't established the connection, I >>>>>> probably never will. Give up." Based on what they know about their own >>>>>> networks, users can tweak the connection timeout. In a fast network, a >>>>>> server that hasn't responded in 20ms is probably down; but on a global >>>>>> network, 10 seconds might be reasonable. Regardless, the heuristic only >>>>>> applies to the actual TCP connection. Waiting for getaddrinfo is not >>>>>> related; that's up to the operating system. >>>>>> >>>>>> In a multithreaded client like PyMongo we distinguish the two phases: >>>>>> >>>>>> for res in socket.getaddrinfo(host, port, family, >>>>>> socket.SOCK_STREAM): >>>>>> af, socktype, proto, dummy, sa = res >>>>>> sock = socket.socket(af, socktype, proto) >>>>>> try: >>>>>> sock.settimeout(connect_timeout) >>>>>> >>>>>> # THE TIMEOUT ONLY APPLIES HERE. >>>>>> sock.connect(sa) >>>>>> sock.settimeout(None) >>>>>> return sock >>>>>> except socket.error as e: >>>>>> # Connection refused, or not established within the >>>>>> timeout. >>>>>> sock.close() >>>>>> >>>>>> Here, the call to getaddrinfo isn't timed at all, and each distinct >>>>>> attempt to connect on a different address is timed separately. So this >>>>>> kind >>>>>> of code matches the idea of a "connect timeout" as a heuristic for >>>>>> deciding >>>>>> whether the server is down. >>>>>> >>>>>> Two questions: >>>>>> >>>>>> 1. Should asyncio.open_connection support a connection timeout that >>>>>> acts like the blocking version above? That is, a connection timeout that >>>>>> does not include getaddrinfo, and restarts for each address we attempt >>>>>> to >>>>>> connect to? >>>>>> >>>>> >>>>> Hm, I don't really like adding timeouts to every API. As you describe >>>>> everyone has different needs. IMO if you don't want the timeout to cover >>>>> the getaddrinfo() call, call getaddrinfo() yourself and pass the host >>>>> address into the create_connection() call. That way you also have control >>>>> over whether to e.g. implement "happy eyeballs". (It will still call >>>>> socket.getaddrinfo(), but it should be quick -- it's not going to a DNS >>>>> server or even /etc/hosts to discover that 127.0.0.1 maps to 127.0.0.1.) >>>>> >>>>> >>>>>> >>>>>> 2. Why does Python lock around getaddrinfo on Mac and Windows anyway? >>>>>> The code comment says these are "systems on which getaddrinfo() is >>>>>> believed >>>>>> to not be thread-safe". Has this belief ever been confirmed? >>>>>> >>>>>> >>>>>> https://hg.python.org/cpython/file/d2b8354e87f5/Modules/socketmodule.c#l185 >>>>>> >>>>> >>>>> I don't know -- the list of ifdefs seems to indicate this is a generic >>>>> BSD issue, which is OS X's heritage. Maybe someone can do an experiment, >>>>> or >>>>> review the source code used by Apple (if it's still open source)? While I >>>>> agree that if this really isn't an issue we shouldn't bother with the >>>>> lock, >>>>> I'd also much rather be safe than sorry when it comes to races in core >>>>> Python. >>>>> >>>>> -- >>>>> --Guido van Rossum (python.org/~guido) >>>>> >>>> >>> >>> >>> -- >>> --Guido van Rossum (python.org/~guido) >>> >>
