Ned merged my patches, so Python 3.6 and the next releases of Python 2.7 and 3.5 won't lock around getaddrinfo on Mac 10.5+.
getaddrinfo is also thread-safe on NetBSD 4+ and OpenBSD 5.4+. I plan to submit similar version checks for them. (FreeBSD already has the proper version check.) In other words, the lock is about to be removed on all modern platforms.

Should we revert this asyncio change? It's a complex solution to a problem that's going away.

On Friday, January 29, 2016 at 7:23:46 PM UTC-8, A. Jesse Jiryu Davis wrote:
>
> I've determined that getaddrinfo is thread-safe on Mac OS 10.5+ and
> submitted a patch to disable the getaddrinfo lock on those systems:
>
> http://bugs.python.org/issue25924
>
> On Wednesday, December 16, 2015 at 7:25:19 PM UTC-5, Guido van Rossum wrote:
>>
>> Thanks a bundle, Jesse!
>>
>> On Wed, Dec 16, 2015 at 4:24 PM, A. Jesse Jiryu Davis
>> <[email protected]> wrote:
>>
>>> Committed here:
>>>
>>> https://github.com/python/asyncio/commit/39c135baf73762830148236da622787052efba19
>>>
>>> Yury, would you please update the CPython repo when the time is right?
>>>
>>> On Wednesday, December 9, 2015 at 6:05:46 PM UTC-5, A. Jesse Jiryu Davis wrote:
>>>>
>>>> Done:
>>>>
>>>> https://github.com/python/asyncio/pull/302
>>>>
>>>> On Wednesday, December 9, 2015 at 2:58:07 PM UTC-5, Guido van Rossum wrote:
>>>>>
>>>>> It'll go much quicker if you send a PR to the asyncio github project.
>>>>> Thanks!
>>>>>
>>>>> --Guido (mobile)
>>>>>
>>>>> On Dec 9, 2015 11:56 AM, "A. Jesse Jiryu Davis" <[email protected]> wrote:
>>>>>
>>>>>> Thanks for your kind words about the crawler chapter. =) I think we
>>>>>> didn't hit this problem there because the crawler only looked up one
>>>>>> domain, or a few of them. On Mac, subsequent calls are cached at the
>>>>>> OS layer for a few minutes, so getaddrinfo is very fast, even though
>>>>>> it's serialized by the getaddrinfo lock. Besides, I don't think the
>>>>>> crawler has a connection timeout, so if it *were* looking up hundreds
>>>>>> of domains it would wait as long as necessary for the lock.
>>>>>>
>>>>>> In this bug report, on the other hand, there are hundreds of
>>>>>> coroutines all waiting to look up different domains, and some of those
>>>>>> domains take several seconds to resolve. Resolving hundreds of
>>>>>> different domains, plus the getaddrinfo lock, plus Motor's 20-second
>>>>>> connection timeout, all combine to create this bad behavior.
>>>>>>
>>>>>> I like option #4 also; that gives me the freedom to either cache
>>>>>> lookups in Motor, or to treat "localhost" specially, or at least to
>>>>>> prevent a slow getaddrinfo from appearing to be a connection timeout.
>>>>>> I haven't decided which is the best approach, but #4 allows me
>>>>>> flexibility. Would you like me to write the patch or will you?
>>>>>>
>>>>>> I'm curious about #5 too, but I think that's a longer-term project.
>>>>>>
>>>>>> On Wednesday, December 9, 2015 at 12:57:53 PM UTC-5, Guido van Rossum wrote:
>>>>>>>
>>>>>>> 4. Modify asyncio's getaddrinfo() so that if you pass it something
>>>>>>> that looks like a numerical address (IPv4 or IPv6 syntax, looking
>>>>>>> exactly like what getaddrinfo() returns) it skips calling
>>>>>>> socket.getaddrinfo(). I've wanted this for a while but hadn't run
>>>>>>> into this queue issue, so this is my favorite.
>>>>>>>
>>>>>>> 5. Do the research needed to prove that socket.getaddrinfo() on OS X
>>>>>>> (or perhaps on sufficiently recent versions of OS X) is thread-safe
>>>>>>> and submit a patch that avoids the getaddrinfo lock on those versions
>>>>>>> of OS X.
>>>>>>>
>>>>>>> FWIW, IIRC my crawler example (which your editing made so much
>>>>>>> better!) calls getaddrinfo() for every connection and I hadn't
>>>>>>> experienced any slowness there. But maybe in the Motor example you're
>>>>>>> hitting a slow DNS server? (I'm guessing there are lots of system
>>>>>>> configuration parameters that may make the system's getaddrinfo()
>>>>>>> slower or faster, and in your setup it may well be slower.)
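(Aside for readers following along: option 4 above amounts to a numeric-host fast path. Here is a rough sketch of the idea -- not asyncio's actual code; the helper name _numeric_addrinfo and the use of socket.inet_pton() are just illustrative.)

    import socket

    def _numeric_addrinfo(host, port, family=socket.AF_UNSPEC,
                          type=socket.SOCK_STREAM, proto=socket.IPPROTO_TCP):
        """Return a getaddrinfo()-shaped result if `host` is already a
        numeric IPv4/IPv6 address, else None. Sketch only."""
        for af in (socket.AF_INET, socket.AF_INET6):
            if family not in (socket.AF_UNSPEC, af):
                continue
            try:
                socket.inet_pton(af, host)  # raises if host isn't numeric for af
            except (TypeError, ValueError, OSError):
                continue
            sockaddr = (host, port) if af == socket.AF_INET else (host, port, 0, 0)
            return [(af, type, proto, '', sockaddr)]
        return None

    # A create_connection()-like path could then do:
    #     infos = _numeric_addrinfo(host, port) or (yield from loop.getaddrinfo(...))
    # so numeric hosts never wait in line for the getaddrinfo lock.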
>>>>>>> On Wed, Dec 9, 2015 at 7:05 AM, A. Jesse Jiryu Davis
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks Guido, this all makes sense.
>>>>>>>>
>>>>>>>> One problem, though, is that even if I call getaddrinfo myself in
>>>>>>>> Motor and wrap a cache around it, I *still* can't use the event
>>>>>>>> loop's create_connection() call. create_connection always calls
>>>>>>>> getaddrinfo, and even though it "should be quick", that doesn't
>>>>>>>> matter in this scenario: the time is spent waiting for the
>>>>>>>> getaddrinfo lock. So I would need one of:
>>>>>>>>
>>>>>>>> 1. Copy the body of create_connection into Motor so I can customize
>>>>>>>> the getaddrinfo call. I especially don't like this because I'd have
>>>>>>>> to call the loop's private _create_connection_transport() from
>>>>>>>> Motor's customized create_connection(). Perhaps we could make
>>>>>>>> _create_connection_transport public, or otherwise make a public API
>>>>>>>> for separating getaddrinfo from actually establishing the connection?
>>>>>>>>
>>>>>>>> 2. Make getaddrinfo customizable in asyncio
>>>>>>>> (https://github.com/python/asyncio/issues/160). This isn't ideal,
>>>>>>>> since it requires Motor users on Mac / BSD to change configuration
>>>>>>>> for the whole event loop just so Motor's specific create_connection
>>>>>>>> calls behave correctly.
>>>>>>>>
>>>>>>>> 3. Back to the original proposal: add a connection timeout parameter
>>>>>>>> to create_connection. =)
>>>>>>>>
>>>>>>>> On Tuesday, December 8, 2015 at 4:30:04 PM UTC-5, Guido van Rossum wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Dec 8, 2015 at 7:13 AM, A. Jesse Jiryu Davis
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi, a Motor user began an interesting discussion on the
>>>>>>>>>> MongoDB-user list:
>>>>>>>>>>
>>>>>>>>>> https://groups.google.com/d/topic/mongodb-user/2oK6C3BrVKI/discussion
>>>>>>>>>>
>>>>>>>>>> The summary is this: he's fetching hundreds of URLs concurrently
>>>>>>>>>> and inserting the results into MongoDB with Motor. Motor throws
>>>>>>>>>> lots of connection-timeout errors. The problem is getaddrinfo: on
>>>>>>>>>> Mac, Python only allows one getaddrinfo call at a time. With
>>>>>>>>>> hundreds of HTTP fetches in progress, there's a long queue waiting
>>>>>>>>>> for the getaddrinfo lock. Whenever Motor wants to grow its
>>>>>>>>>> connection pool it has to call getaddrinfo on "localhost", and it
>>>>>>>>>> spends so long waiting for that call, it times out and thinks it
>>>>>>>>>> can't reach MongoDB.
>>>>>>>>>
>>>>>>>>> If it's really looking up "localhost" over and over, maybe wrap a
>>>>>>>>> cache around getaddrinfo()?
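(Aside: the cache Guido suggests can live entirely in application code. A minimal sketch -- this is not part of asyncio or Motor, it memoizes forever, ignores DNS TTLs, and concurrent first lookups may both miss the cache:)

    import asyncio

    _ADDR_CACHE = {}

    @asyncio.coroutine
    def cached_getaddrinfo(loop, host, port, **kwargs):
        """Memoize loop.getaddrinfo() results per (host, port, options)."""
        key = (host, port, tuple(sorted(kwargs.items())))
        if key not in _ADDR_CACHE:
            _ADDR_CACHE[key] = yield from loop.getaddrinfo(host, port, **kwargs)
        return _ADDR_CACHE[key]

    # Resolve once, then hand the numeric address to create_connection() so
    # its own internal getaddrinfo() call has a trivial job:
    #
    #     infos = yield from cached_getaddrinfo(loop, "localhost", 27017)
    #     family, _, _, _, sockaddr = infos[0]
    #     yield from loop.create_connection(protocol_factory,
    #                                       sockaddr[0], sockaddr[1],
    #                                       family=family)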
>>>>>>>>>
>>>>>>>>>> Motor's connection-timeout implementation in asyncio is sort of
>>>>>>>>>> wrong:
>>>>>>>>>>
>>>>>>>>>>     coro = asyncio.open_connection(host, port)
>>>>>>>>>>     sock = yield from asyncio.wait_for(coro, timeout)
>>>>>>>>>>
>>>>>>>>>> The timer runs during the call to getaddrinfo, as well as the call
>>>>>>>>>> to the loop's sock_connect(). This isn't the intention: the timeout
>>>>>>>>>> should apply only to the connection.
>>>>>>>>>>
>>>>>>>>>> A philosophical digression: The "connection timeout" is a
>>>>>>>>>> heuristic. "If I've waited N seconds and haven't established the
>>>>>>>>>> connection, I probably never will. Give up." Based on what they
>>>>>>>>>> know about their own networks, users can tweak the connection
>>>>>>>>>> timeout. In a fast network, a server that hasn't responded in 20ms
>>>>>>>>>> is probably down; but on a global network, 10 seconds might be
>>>>>>>>>> reasonable. Regardless, the heuristic only applies to the actual
>>>>>>>>>> TCP connection. Waiting for getaddrinfo is not related; that's up
>>>>>>>>>> to the operating system.
>>>>>>>>>>
>>>>>>>>>> In a multithreaded client like PyMongo we distinguish the two
>>>>>>>>>> phases:
>>>>>>>>>>
>>>>>>>>>>     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
>>>>>>>>>>         af, socktype, proto, dummy, sa = res
>>>>>>>>>>         sock = socket.socket(af, socktype, proto)
>>>>>>>>>>         try:
>>>>>>>>>>             sock.settimeout(connect_timeout)
>>>>>>>>>>
>>>>>>>>>>             # THE TIMEOUT ONLY APPLIES HERE.
>>>>>>>>>>             sock.connect(sa)
>>>>>>>>>>             sock.settimeout(None)
>>>>>>>>>>             return sock
>>>>>>>>>>         except socket.error as e:
>>>>>>>>>>             # Connection refused, or not established within the timeout.
>>>>>>>>>>             sock.close()
>>>>>>>>>>
>>>>>>>>>> Here, the call to getaddrinfo isn't timed at all, and each distinct
>>>>>>>>>> attempt to connect on a different address is timed separately. So
>>>>>>>>>> this kind of code matches the idea of a "connect timeout" as a
>>>>>>>>>> heuristic for deciding whether the server is down.
>>>>>>>>>>
>>>>>>>>>> Two questions:
>>>>>>>>>>
>>>>>>>>>> 1. Should asyncio.open_connection support a connection timeout that
>>>>>>>>>> acts like the blocking version above? That is, a connection timeout
>>>>>>>>>> that does not include getaddrinfo, and restarts for each address we
>>>>>>>>>> attempt to connect to?
>>>>>>>>>
>>>>>>>>> Hm, I don't really like adding timeouts to every API. As you
>>>>>>>>> describe, everyone has different needs. IMO if you don't want the
>>>>>>>>> timeout to cover the getaddrinfo() call, call getaddrinfo() yourself
>>>>>>>>> and pass the host address into the create_connection() call. That
>>>>>>>>> way you also have control over whether to e.g. implement "happy
>>>>>>>>> eyeballs". (It will still call socket.getaddrinfo(), but it should
>>>>>>>>> be quick -- it's not going to a DNS server or even /etc/hosts to
>>>>>>>>> discover that 127.0.0.1 maps to 127.0.0.1.)
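(Aside: combining that suggestion with the PyMongo pattern above -- resolve first, untimed, then apply the timeout per connection attempt -- might look roughly like this in asyncio. A sketch in the Python 3.4-style coroutine syntax used elsewhere in this thread, not Motor's actual implementation:)

    import socket
    import asyncio

    @asyncio.coroutine
    def open_connection_with_connect_timeout(loop, host, port, timeout):
        """Resolve first (untimed), then time each connection attempt."""
        infos = yield from loop.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        last_exc = None
        for family, _, _, _, sockaddr in infos:
            try:
                # open_connection() still calls getaddrinfo() on the numeric
                # address; that's cheap unless it has to wait on the lock,
                # which is what option #4 / removing the lock addresses.
                coro = asyncio.open_connection(sockaddr[0], sockaddr[1],
                                               family=family, loop=loop)
                reader, writer = yield from asyncio.wait_for(coro, timeout,
                                                             loop=loop)
                return reader, writer
            except (OSError, asyncio.TimeoutError) as exc:
                last_exc = exc
        if last_exc is None:
            last_exc = OSError('getaddrinfo(%r) returned no results' % host)
        raise last_exc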
>>>>>>>>>
>>>>>>>>>> 2. Why does Python lock around getaddrinfo on Mac and Windows
>>>>>>>>>> anyway? The code comment says these are "systems on which
>>>>>>>>>> getaddrinfo() is believed to not be thread-safe". Has this belief
>>>>>>>>>> ever been confirmed?
>>>>>>>>>>
>>>>>>>>>> https://hg.python.org/cpython/file/d2b8354e87f5/Modules/socketmodule.c#l185
>>>>>>>>>
>>>>>>>>> I don't know -- the list of ifdefs seems to indicate this is a
>>>>>>>>> generic BSD issue, which is OS X's heritage. Maybe someone can do an
>>>>>>>>> experiment, or review the source code used by Apple (if it's still
>>>>>>>>> open source)? While I agree that if this really isn't an issue we
>>>>>>>>> shouldn't bother with the lock, I'd also much rather be safe than
>>>>>>>>> sorry when it comes to races in core Python.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --Guido van Rossum (python.org/~guido)
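(Aside: one way to run the experiment Guido suggests is to bypass CPython's lock and hammer the platform's getaddrinfo() from many threads via ctypes. A crash-free run proves nothing definitive, but a crash would settle it the other way. Rough sketch, minimal error handling:)

    import ctypes
    import ctypes.util
    import threading

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

    def resolve(name):
        # Call the platform getaddrinfo() directly, bypassing CPython's lock.
        res = ctypes.c_void_p()
        err = libc.getaddrinfo(name, None, None, ctypes.byref(res))
        if err == 0:
            libc.freeaddrinfo(res)
        return err

    def worker(names, count):
        for _ in range(count):
            for name in names:
                resolve(name)

    names = [b"localhost", b"python.org", b"example.com"]
    threads = [threading.Thread(target=worker, args=(names, 50))
               for _ in range(20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("No crash after concurrent getaddrinfo() calls (not a proof of safety).")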
