Thanks Guido, this all makes sense.

One problem, though, is that even if I call getaddrinfo myself in Motor and 
wrap a cache around it, I *still* can't use the event 
loop's create_connection() call. create_connection always 
calls getaddrinfo, and even though it "should be quick", that doesn't 
matter in this scenario: the time is spent waiting for the getaddrinfo 
lock. So I would need one of:

1. Copy the body of create_connection into Motor so I can customize the 
getaddrinfo call. I especially don't like this because I'd have to call the 
loop's private _create_connection_transport() from Motor's 
customized create_connection(). Perhaps we could make 
_create_connection_transport public, or otherwise add a public API that 
separates getaddrinfo from actually establishing the connection?

2. Make getaddrinfo customizable in asyncio 
(https://github.com/python/asyncio/issues/160). This isn't ideal, since it 
requires Motor users on Mac / BSD to change configuration for the whole 
event loop just so Motor's specific create_connection calls behave 
correctly.

3. Back to the original proposal: add a connection timeout parameter to 
create_connection. =)

On Tuesday, December 8, 2015 at 4:30:04 PM UTC-5, Guido van Rossum wrote:
>
> On Tue, Dec 8, 2015 at 7:13 AM, A. Jesse Jiryu Davis <
> [email protected]> wrote:
>
>> Hi, a Motor user began an interesting discussion on the MongoDB-user list:
>>
>> https://groups.google.com/d/topic/mongodb-user/2oK6C3BrVKI/discussion
>>
>> The summary is this: he's fetching hundreds of URLs concurrently and 
>> inserting the results into MongoDB with Motor. Motor throws lots of 
>> connection-timeout errors. The problem is getaddrinfo: on Mac, Python only 
>> allows one getaddrinfo call at a time. With hundreds of HTTP fetches in 
>> progress, there's a long queue waiting for the getaddrinfo lock. Whenever 
>> Motor wants to grow its connection pool it has to call getaddrinfo on 
>> "localhost", and it spends so long waiting for that call, it times out and 
>> thinks it can't reach MongoDB.
>>
>
> If it's really looking up "localhost" over and over, maybe wrap a cache 
> around getaddrinfo()?
>  
>
>> Motor's connection-timeout implementation in asyncio is sort of wrong:
>>
>>     coro = asyncio.open_connection(host, port)
>>     sock = yield from asyncio.wait_for(coro, timeout)
>>
>> The timer runs during the call to getaddrinfo, as well as the call to the 
>> loop's sock_connect(). This isn't the intention: the timeout should apply 
>> only to the connection.
>>
>> A philosophical digression: The "connection timeout" is a heuristic. "If 
>> I've waited N seconds and haven't established the connection, I probably 
>> never will. Give up." Based on what they know about their own networks, 
>> users can tweak the connection timeout. In a fast network, a server that 
>> hasn't responded in 20ms is probably down; but on a global network, 10 
>> seconds might be reasonable. Regardless, the heuristic only applies to the 
>> actual TCP connection. Waiting for getaddrinfo is not related; that's up to 
>> the operating system.
>>
>> In a multithreaded client like PyMongo we distinguish the two phases:
>>
>>     err = None
>>     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
>>         af, socktype, proto, dummy, sa = res
>>         sock = socket.socket(af, socktype, proto)
>>         try:
>>             sock.settimeout(connect_timeout)
>> 
>>             # THE TIMEOUT ONLY APPLIES HERE.
>>             sock.connect(sa)
>>             sock.settimeout(None)
>>             return sock
>>         except socket.error as e:
>>             # Connection refused, or not established within the timeout.
>>             err = e
>>             sock.close()
>>     raise err
>>
>> Here, the call to getaddrinfo isn't timed at all, and each distinct 
>> attempt to connect on a different address is timed separately. So this kind 
>> of code matches the idea of a "connect timeout" as a heuristic for deciding 
>> whether the server is down.
>>
>> Two questions:
>>
>> 1. Should asyncio.open_connection support a connection timeout that acts 
>> like the blocking version above? That is, a connection timeout that does 
>> not include getaddrinfo, and restarts for each address we attempt to 
>> connect to?
>>
>
> Hm, I don't really like adding timeouts to every API. As you describe 
> everyone has different needs. IMO if you don't want the timeout to cover 
> the getaddrinfo() call, call getaddrinfo() yourself and pass the host 
> address into the create_connection() call. That way you also have control 
> over whether to e.g. implement "happy eyeballs". (It will still call 
> socket.getaddrinfo(), but it should be quick -- it's not going to a DNS 
> server or even /etc/hosts to discover that 127.0.0.1 maps to 127.0.0.1.)
>  
>
>>
>> 2. Why does Python lock around getaddrinfo on Mac and Windows anyway? The 
>> code comment says these are "systems on which getaddrinfo() is believed to 
>> not be thread-safe". Has this belief ever been confirmed?
>>
>>
>> https://hg.python.org/cpython/file/d2b8354e87f5/Modules/socketmodule.c#l185
>>
>
> I don't know -- the list of ifdefs seems to indicate this is a generic BSD 
> issue, which is OS X's heritage. Maybe someone can do an experiment, or 
> review the source code used by Apple (if it's still open source)? While I 
> agree that if this really isn't an issue we shouldn't bother with the lock, 
> I'd also much rather be safe than sorry when it comes to races in core 
> Python.
>
> -- 
> --Guido van Rossum (python.org/~guido)
>
