Re: [openstack-dev] [all][oslo] Dealing with database connection sharing issues

Mike Bayer Thu, 19 Feb 2015 18:47:41 -0800


Doug Hellmann <[email protected]> wrote:


>> 5) Allow this sort of connection sharing to continue for a deprecation
>> period with appropriate logging, then make it a hard failure.
>> 
>> This would provide services time to find and fix any sharing problems
>> they might have, but would delay the timeframe for a final fix.
>> 
>> 6-ish) Fix oslo-incubator service.py to close all file descriptors after
>> forking.
>> 
> 
> I'm not sure why 6 is "slower", can someone elaborate on that?

So, with option “A”, where they call engine.dispose() the moment they’re in a 
fork, the activity upon requesting a connection from the pool is: look in pool, 
no connections present, create a connection and return it.
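
To make option “A” concrete, here is a toy sketch. ToyPool is a hypothetical 
stand-in I made up for illustration, not SQLAlchemy's pool or its API; only the 
idea of dispose() emptying the pool mirrors engine.dispose(). The fork is 
simulated rather than done with os.fork(), so the flow is easy to follow.

```python
class ToyPool:
    """Hypothetical stand-in for a connection pool; not SQLAlchemy's API."""

    def __init__(self, creator):
        self._creator = creator   # callable that opens a new connection
        self._idle = []           # connections waiting to be reused

    def checkout(self):
        # Look in pool; if a connection is present, hand it back.
        if self._idle:
            return self._idle.pop()
        # No connections present: create a connection and return it.
        return self._creator()

    def checkin(self, conn):
        self._idle.append(conn)

    def dispose(self):
        # Drop every pooled connection so the next checkout starts fresh.
        self._idle.clear()


pool = ToyPool(creator=object)      # object() stands in for a DB connection
parent_conn = pool.checkout()
pool.checkin(parent_conn)

# Simulated "first thing the child does after the fork": empty the pool.
pool.dispose()
child_conn = pool.checkout()
print(child_conn is parent_conn)    # prints False: the child got a new connection
```

The point is that after the dispose, checkout takes the cheapest possible 
path: nothing to inspect, nothing to recover from, just connect.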

Option “5”, the way the patch is right now to auto-invalidate on detection of 
a new fork, the activity upon requesting a connection from the pool is: look 
in pool, connection present, check that os.getpid() matches what we’ve 
associated with the connection record, if not, we raise an exception indicating 
“invalid”, this is immediately caught, sets the connection record as “invalid”, 
the connection record then immediately disposes that file descriptor, makes a 
new connection and returns that.
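
The pid check in option “5” can be sketched with a toy model. All the names 
here (PidCheckingPool, ConnectionRecord, StaleConnection, the getpid parameter) 
are hypothetical and exist only for illustration; this is not the patch under 
review and not SQLAlchemy's implementation. The pid source is injectable so a 
fork can be simulated without actually forking.

```python
import os


class StaleConnection(Exception):
    """Hypothetical error: the pooled connection belongs to another process."""


class ConnectionRecord:
    def __init__(self, creator, getpid):
        self._creator = creator
        self._getpid = getpid
        self.connect()

    def connect(self):
        self.connection = self._creator()
        self.pid = self._getpid()   # remember which process opened it

    def invalidate(self):
        # A real pool would dispose of the inherited file descriptor here.
        self.connection = None


class PidCheckingPool:
    def __init__(self, creator, getpid=os.getpid):
        self._creator = creator
        self._getpid = getpid
        self._records = []

    def checkout(self):
        if not self._records:
            return ConnectionRecord(self._creator, self._getpid)
        record = self._records.pop()
        try:
            if record.pid != self._getpid():
                raise StaleConnection("connection was opened by another pid")
        except StaleConnection:
            # Caught immediately: invalidate the record and reconnect.
            record.invalidate()
            record.connect()
        return record

    def checkin(self, record):
        self._records.append(record)


fake_pid = [100]                     # mutable cell so we can "fork" below
pool = PidCheckingPool(creator=object, getpid=lambda: fake_pid[0])

record = pool.checkout()
parent_conn = record.connection
pool.checkin(record)

fake_pid[0] = 200                    # simulate running in the forked child
record = pool.checkout()
print(record.connection is parent_conn, record.pid)   # prints: False 200
```

The only extra work compared to option “A” is one integer comparison on each 
checkout plus a single raise / catch when the pid has changed.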

Option “6”, the new fork starts, the activity upon requesting a connection from 
the pool is: look in pool, connection present, perform the oslo.db “ping” 
event, ping event emits “SELECT 1” to the MySQLdb driver, driver attempts to 
emit this statement on the socket, socket communication fails, MySQLdb converts 
to an exception, exception is raised, SQLAlchemy catches the exception, sends 
it to a parser to determine the nature of the exception, we see that it’s a 
“disconnect” exception, we set the “invalidate” flag on the exception, we 
re-raise, oslo.db’s exc_filters then catch the exception, more string parsers 
get involved, we determine we need to raise an oslo.db.DBDisconnect exception, 
we raise that, the “SELECT 1” ping handler catches that, we then emit “SELECT 
1” again so that it reconnects, we then hit the connection record that’s in 
“invalid” state so it knows to reconnect, it reconnects and the “SELECT 1” 
continues on the new connection and we start up.

So essentially option “5” (the way the gerrit is right now) has a subset of the 
components of “6”; “6” has the additional steps of: emit a doomed statement on 
the closed socket, then when it fails raise / catch / parse / reraise / catch / 
parse / reraise that exception.   Option “5” just has, check the pid, raise / 
catch an exception.

IMO the two options are: “5”, check the pid and recover, or “3”, make it a 
hard failure.
 
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
