I've determined that getaddrinfo is thread-safe on Mac OS 10.5+ and submitted a patch to disable the getaddrinfo lock on those systems:
http://bugs.python.org/issue25924 On Wednesday, December 16, 2015 at 7:25:19 PM UTC-5, Guido van Rossum wrote: > > Thanks a bundle, Jesse! > > On Wed, Dec 16, 2015 at 4:24 PM, A. Jesse Jiryu Davis < > [email protected] <javascript:>> wrote: > >> Committed here: >> >> >> https://github.com/python/asyncio/commit/39c135baf73762830148236da622787052efba19 >> >> Yury would you please update the CPython repo when the time is right? >> >> >> On Wednesday, December 9, 2015 at 6:05:46 PM UTC-5, A. Jesse Jiryu Davis >> wrote: >>> >>> Done: >>> >>> https://github.com/python/asyncio/pull/302 >>> >>> On Wednesday, December 9, 2015 at 2:58:07 PM UTC-5, Guido van Rossum >>> wrote: >>>> >>>> It'll go much quicker if you send a PR to the asyncio github project. >>>> Thanks! >>>> >>>> --Guido (mobile) >>>> On Dec 9, 2015 11:56 AM, "A. Jesse Jiryu Davis" <[email protected]> >>>> wrote: >>>> >>>>> Thanks for your kind words about the crawler chapter. =) I think we >>>>> didn't hit this problem there because the crawler only looked up one >>>>> domain, or a few of them. On Mac, subsequent calls are cached at the OS >>>>> layer for a few minutes, so getaddrinfo is very fast, even though it's >>>>> serialized by the getaddrinfo lock. Besides I don't think the crawler has >>>>> a >>>>> connection timeout, so if it *were *looking up hundreds of domains it >>>>> would wait as long as necessary for the lock. >>>>> >>>>> In this bug report, on the other hand, there are hundreds of >>>>> coroutines all waiting to look up different domains, and some of those >>>>> domains take several seconds to resolve. Resolving hundreds of different >>>>> domains, plus the getaddrinfo lock, plus Motor's 20-second connection >>>>> timeout, all combine to create this bad behavior. >>>>> >>>>> I like option #4 also; that gives me the freedom to either cache >>>>> lookups in Motor, or to treat "localhost" specially, or at least to >>>>> prevent >>>>> a slow getaddrinfo from appearing to be a connection timeout. I haven't >>>>> decided which is the best approach, but #4 allows me flexibility. Would >>>>> you >>>>> like me to write the patch or will you? >>>>> >>>>> I'm curious about #5 too, but I think that's a longer term project. >>>>> >>>>> On Wednesday, December 9, 2015 at 12:57:53 PM UTC-5, Guido van Rossum >>>>> wrote: >>>>>> >>>>>> 4. Modify asyncio's getaddrinfo() so that if you pass it something >>>>>> that looks like a numerical address (IPv4 or IPv6 syntax, looking >>>>>> exactly >>>>>> like what getaddrinfo() returns) it skips calling socket.getaddrinfo(). >>>>>> I've wanted this for a while but hadn't run into this queue issue, so >>>>>> this >>>>>> is my favorite. >>>>>> >>>>>> 5. Do the research needed to prove that socket.getaddrinfo() on OS X >>>>>> (or perhaps on sufficiently recent versions of OS X) is thread-safe and >>>>>> submit a patch that avoids the getaddrinfo lock on those versions of OS >>>>>> X. >>>>>> >>>>>> FWIW, IIRC my crawler example (which your editing made so much >>>>>> better!) calls getaddrinfo() for every connection and I hadn't >>>>>> experienced >>>>>> any slowness there. But maybe in the Motor example you're hitting a slow >>>>>> DNS server? (I'm guessing there are lots of system configuration >>>>>> parameters >>>>>> that may make the system's getaddrinfo() slower or faster, and in your >>>>>> setup it may well be slower.) >>>>>> >>>>>> On Wed, Dec 9, 2015 at 7:05 AM, A. Jesse Jiryu Davis < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> Thanks Guido, this all makes sense. >>>>>>> >>>>>>> One problem, though, is that even if I call getaddrinfo myself in >>>>>>> Motor and wrap a cache around it, I *still* can't use the event >>>>>>> loop's create_connection() call. create_connection always >>>>>>> calls getaddrinfo, and even though it "should be quick", that doesn't >>>>>>> matter in this scenario: the time is spent waiting for the getaddrinfo >>>>>>> lock. So I would need one of: >>>>>>> >>>>>>> 1. Copy the body of create_connection into Motor so I can customize >>>>>>> the getaddrinfo call. I especially don't like this because I'd have to >>>>>>> call >>>>>>> the loop's private _create_connection_transport() from Motor's >>>>>>> customized create_connection(). Perhaps we could >>>>>>> make _create_connection_transport public, or otherwise make a public >>>>>>> API >>>>>>> for separating getaddrinfo from actually establishing the connection? >>>>>>> >>>>>>> 2. Make getaddrinfo customizable in asyncio ( >>>>>>> https://github.com/python/asyncio/issues/160). This isn't ideal, >>>>>>> since it requires Motor users on Mac / BSD to change configuration for >>>>>>> the >>>>>>> whole event loop just so Motor's specific create_connection calls >>>>>>> behave >>>>>>> correctly. >>>>>>> >>>>>>> 3. Back to the original proposal: add a connection timeout parameter >>>>>>> to create_connection. =) >>>>>>> >>>>>>> On Tuesday, December 8, 2015 at 4:30:04 PM UTC-5, Guido van Rossum >>>>>>> wrote: >>>>>>> >>>>>>>> On Tue, Dec 8, 2015 at 7:13 AM, A. Jesse Jiryu Davis < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> Hi, a Motor user began an interesting discussion on the >>>>>>>>> MongoDB-user list: >>>>>>>>> >>>>>>>>> >>>>>>>>> https://groups.google.com/d/topic/mongodb-user/2oK6C3BrVKI/discussion >>>>>>>>> >>>>>>>>> The summary is this: he's fetching hundreds of URLs concurrently >>>>>>>>> and inserting the results into MongoDB with Motor. Motor throws lots >>>>>>>>> of >>>>>>>>> connection-timeout errors. The problem is getaddrinfo: on Mac, Python >>>>>>>>> only >>>>>>>>> allows one getaddrinfo call at a time. With hundreds of HTTP fetches >>>>>>>>> in >>>>>>>>> progress, there's a long queue waiting for the getaddrinfo lock. >>>>>>>>> Whenever >>>>>>>>> Motor wants to grow its connection pool it has to call getaddrinfo on >>>>>>>>> "localhost", and it spends so long waiting for that call, it times >>>>>>>>> out and >>>>>>>>> thinks it can't reach MongoDB. >>>>>>>>> >>>>>>>> >>>>>>>> If it's really looking up "localhost" over and over, maybe wrap a >>>>>>>> cache around getaddrinfo()? >>>>>>>> >>>>>>>> >>>>>>>>> Motor's connection-timeout implementation in asyncio is sort of >>>>>>>>> wrong: >>>>>>>>> >>>>>>>>> coro = asyncio.open_connection(host, port) >>>>>>>>> sock = yield from asyncio.wait_for(coro, timeout) >>>>>>>>> >>>>>>>>> The timer runs during the call to getaddrinfo, as well as the call >>>>>>>>> to the loop's sock_connect(). This isn't the intention: the timeout >>>>>>>>> should >>>>>>>>> apply only to the connection. >>>>>>>>> >>>>>>>>> A philosophical digression: The "connection timeout" is a >>>>>>>>> heuristic. "If I've waited N seconds and haven't established the >>>>>>>>> connection, I probably never will. Give up." Based on what they know >>>>>>>>> about >>>>>>>>> their own networks, users can tweak the connection timeout. In a fast >>>>>>>>> network, a server that hasn't responded in 20ms is probably down; but >>>>>>>>> on a >>>>>>>>> global network, 10 seconds might be reasonable. Regardless, the >>>>>>>>> heuristic >>>>>>>>> only applies to the actual TCP connection. Waiting for getaddrinfo is >>>>>>>>> not >>>>>>>>> related; that's up to the operating system. >>>>>>>>> >>>>>>>>> In a multithreaded client like PyMongo we distinguish the two >>>>>>>>> phases: >>>>>>>>> >>>>>>>>> for res in socket.getaddrinfo(host, port, family, >>>>>>>>> socket.SOCK_STREAM): >>>>>>>>> af, socktype, proto, dummy, sa = res >>>>>>>>> sock = socket.socket(af, socktype, proto) >>>>>>>>> try: >>>>>>>>> sock.settimeout(connect_timeout) >>>>>>>>> >>>>>>>>> # THE TIMEOUT ONLY APPLIES HERE. >>>>>>>>> sock.connect(sa) >>>>>>>>> sock.settimeout(None) >>>>>>>>> return sock >>>>>>>>> except socket.error as e: >>>>>>>>> # Connection refused, or not established within the >>>>>>>>> timeout. >>>>>>>>> sock.close() >>>>>>>>> >>>>>>>>> Here, the call to getaddrinfo isn't timed at all, and each >>>>>>>>> distinct attempt to connect on a different address is timed >>>>>>>>> separately. So >>>>>>>>> this kind of code matches the idea of a "connect timeout" as a >>>>>>>>> heuristic >>>>>>>>> for deciding whether the server is down. >>>>>>>>> >>>>>>>>> Two questions: >>>>>>>>> >>>>>>>>> 1. Should asyncio.open_connection support a connection timeout >>>>>>>>> that acts like the blocking version above? That is, a connection >>>>>>>>> timeout >>>>>>>>> that does not include getaddrinfo, and restarts for each address we >>>>>>>>> attempt >>>>>>>>> to connect to? >>>>>>>>> >>>>>>>> >>>>>>>> Hm, I don't really like adding timeouts to every API. As you >>>>>>>> describe everyone has different needs. IMO if you don't want the >>>>>>>> timeout to >>>>>>>> cover the getaddrinfo() call, call getaddrinfo() yourself and pass the >>>>>>>> host >>>>>>>> address into the create_connection() call. That way you also have >>>>>>>> control >>>>>>>> over whether to e.g. implement "happy eyeballs". (It will still call >>>>>>>> socket.getaddrinfo(), but it should be quick -- it's not going to a >>>>>>>> DNS >>>>>>>> server or even /etc/hosts to discover that 127.0.0.1 maps to >>>>>>>> 127.0.0.1.) >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> 2. Why does Python lock around getaddrinfo on Mac and Windows >>>>>>>>> anyway? The code comment says these are "systems on which >>>>>>>>> getaddrinfo() is >>>>>>>>> believed to not be thread-safe". Has this belief ever been confirmed? >>>>>>>>> >>>>>>>>> >>>>>>>>> https://hg.python.org/cpython/file/d2b8354e87f5/Modules/socketmodule.c#l185 >>>>>>>>> >>>>>>>> >>>>>>>> I don't know -- the list of ifdefs seems to indicate this is a >>>>>>>> generic BSD issue, which is OS X's heritage. Maybe someone can do an >>>>>>>> experiment, or review the source code used by Apple (if it's still >>>>>>>> open >>>>>>>> source)? While I agree that if this really isn't an issue we shouldn't >>>>>>>> bother with the lock, I'd also much rather be safe than sorry when it >>>>>>>> comes >>>>>>>> to races in core Python. >>>>>>>> >>>>>>>> -- >>>>>>>> --Guido van Rossum (python.org/~guido) >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> --Guido van Rossum (python.org/~guido) >>>>>> >>>>> > > > -- > --Guido van Rossum (python.org/~guido) >
