Re: [openstack-dev] [all][oslo] Dealing with database connection sharing issues

Michael Bayer Sun, 22 Feb 2015 16:57:05 -0800



> On Feb 22, 2015, at 10:20 AM, Yuriy Taraday <[email protected]> wrote:
> 
> 
> 
>> On Sun Feb 22 2015 at 6:27:16 AM Michael Bayer <[email protected]> wrote:
>> 
>> 
>> 
>> > On Feb 21, 2015, at 9:49 PM, Joshua Harlow <[email protected]> wrote:
>> >
>> > Some comments/questions inline...
>> >
>> > Mike Bayer wrote:
>> >>
>> >> Yuriy Taraday<[email protected]>  wrote:
>> >>
>> >>>> On Fri Feb 20 2015 at 9:14:30 PM Joshua Harlow<[email protected]>  
>> >>>> wrote:
>> >>>> This feels like something we could do in the service manager base class,
>> >>>> maybe by adding a "post fork" hook or something.
>> >>> +1 to that.
>> >>>
>> >>> I think it'd be nice to have the service __init__() maybe be something 
>> >>> like:
>> >>>
>> >>>   def __init__(self, threads=1000, prefork_callbacks=None,
>> >>>                postfork_callbacks=None):
>> >>>      self.postfork_callbacks = postfork_callbacks or []
>> >>>      self.prefork_callbacks = prefork_callbacks or []
>> >>>      # always ensure we are closing any left-open fds last...
>> >>>      self.prefork_callbacks.append(self._close_descriptors)
>> >>>      ...
>> >>>
>> >>> (you must've meant postfork_callbacks.append)
>> >>>
>> >>> Note that the multiprocessing module already has a
>> >>> `multiprocessing.util.register_after_fork` function that lets you
>> >>> register a callback that will be called every time a Process object is
>> >>> run. If we remove the explicit use of `os.fork` in oslo.service (replacing
>> >>> it with the Process class), libraries will be able to register any
>> >>> after-fork callbacks they need.
>> >>> For example, EngineFacade could register a `pool.dispose()` callback there
>> >>> (it should have some proper finalization logic though).
>> >>
>> >> +1 to use Process and the callback system for required initialization
>> >> steps and so forth, however I don’t know that an oslo lib should silently
>> >> register global events on the assumption of how its constructs are to be
>> >> used.
>> >>
>> >> I think whatever Oslo library is responsible for initiating the
>> >> Process/fork should be where it ensures that resources from other Oslo
>> >> libraries are set up correctly. So oslo.service might register its own
>> >> event handler with
>> >
>> > Sounds like some kind of new entrypoint + discovery service that
>> > oslo.service (heck, can we name it something else, something that makes it
>> > usable for others on PyPI...) would need to plug in to. It would seem
>> > like this is a general Python problem (who is to say that only oslo
>> > libraries use resources that need to be fixed/closed after forking); are
>> > there any recommendations that the Python community has in general for
>> > this (aka, a common entrypoint *all* libraries export that allows them to
>> > do things when a fork is about to occur)?
>> >
>> >> oslo.db such that it gets notified of new database engines so that it can
>> >> associate a disposal with it; it would do something similar for
>> >> oslo.messaging and other systems that use file handles.   The end
>> >> result might be that it uses register_after_fork(), but the point is that
>> >> oslo.db.sqlalchemy.create_engine doesn’t do this; it lets oslo.service
>> >> apply a hook so that oslo.service can do it on behalf of oslo.db.
>> >
>> > Sounds sort of like global state/a 'open resource' pool that each library 
>> > needs to maintain internally to it that tracks how applications/other 
>> > libraries are using it; that feels sorta odd IMHO.
>> >
>> > Wouldn't that mean libraries that hand back resource objects, or
>> > resource-containing objects, for others to use would now need to
>> > capture who is using what (weakref pools?) to keep track of which
>> > resources are being used and by whom (so that they can fix/close them on
>> > fork)? Not every library has a pool (like sqlalchemy afaik does) to track
>> > these kind(s) of things (for better or worse...). And what if those
>> > libraries use other libraries that use resources (who owns what?); seems
>> > like this just gets very messy/impractical pretty quickly once you start
>> > using any kind of 3rd party library that doesn't follow the same
>> > pattern... (which brings me back to the question of whether there is a
>> > common python way/entrypoint for dealing with forks that works better
>> > than ^).
>> >
>> >>
>> >> So, instead of oslo.service cutting through and closing out the file
>> >> descriptors from underneath other oslo libraries that opened them, we set
>> >> up communication channels between oslo libs that maintain a consistent
>> >> layer of abstraction, and instead of making all libraries responsible for
>> >> the side effects that might be introduced from other oslo libraries, we
>> >> make the side-effect-causing library the point at which those effects are
>> >> ameliorated as a service to other oslo libraries.   This allows us to keep
>> >> the knowledge of what it means to use “multiprocessing” in one
>> >> place, rather than spreading out its effects.
>> >
>> > If only we didn't have all those other libraries[1] that people use too
>> > (that afaik highly likely also have resources they open); so even with
>> > getting oslo.db and oslo.messaging into this kind of pattern, we are still
>> > left with the other 200+ that aren't/haven't been following this pattern
>> > ;-)
>> 
>> I'm only trying to solve well known points like this one between two Oslo 
>> libraries.   Obviously trying to multiply out this pattern times all 
>> libraries, including non Oslo ones, is infeasible.
>> 
>> The issue here is simple.   Does oslo.service have to worry that the app 
>> uses oslo.db also, or does oslo.db have to worry that the app uses 
>> oslo.service also?   I say the former bc oslo.db is the more fundamental 
>> system at the base of an app, whereas oslo.service actually deploys code in 
>> a certain context.
> 
> I think that oslo.db needs to worry about running in a Python environment
> where people might want to create new Processes. If we tie everything to
> oslo.service we won't be able to do the right thing if someone just wants to
> fork in some other code.
> We can however provide some nice wrapper around `register_after_fork` that
> would allow us to avoid all the weakrefs in other libraries' code.

Correct me if I'm wrong, but register_after_fork seems to apply only to the
higher-level Process abstraction.   If someone calls os.fork() directly, as is
the case now, there's no hook to use.
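
To make that distinction concrete, here is a minimal sketch, assuming a POSIX
system where the "fork" start method is available. FakePool and the two helper
functions are hypothetical names made up for illustration; they are not part
of oslo.db, oslo.service, or SQLAlchemy. Each child reports through its exit
code whether the callback registered with register_after_fork had run.

```python
import multiprocessing
import multiprocessing.util
import os
import sys


class FakePool:
    """Hypothetical stand-in for a connection pool (not an oslo.db class)."""

    def __init__(self):
        self.disposed = False

    def dispose(self):
        self.disposed = True


def _exit_with_flag(pool):
    # Runs in the Process child: exit code 0 means the hook already ran.
    sys.exit(0 if pool.disposed else 1)


def hook_ran_in_process_child(pool):
    """Start a child through multiprocessing.Process (fork start method)."""
    ctx = multiprocessing.get_context("fork")
    proc = ctx.Process(target=_exit_with_flag, args=(pool,))
    proc.start()
    proc.join()
    return proc.exitcode == 0


def hook_ran_in_raw_fork_child(pool):
    """Fork with os.fork() directly, bypassing multiprocessing."""
    pid = os.fork()
    if pid == 0:
        # Child: report through the exit code and skip parent cleanup code.
        os._exit(0 if pool.disposed else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


pool = FakePool()
# The callback is invoked as func(obj) in children started via Process.
multiprocessing.util.register_after_fork(pool, FakePool.dispose)
```

Under these assumptions, hook_ran_in_process_child(pool) should report that
the callback fired, while hook_ran_in_raw_fork_child(pool) should report that
it did not, and the parent's own pool is left untouched in both cases.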

Hence the solution I have in place right now, which is that oslo.db *can*
detect a fork and adapt at the most basic level by checking os.getpid() and
recreating the connection, with no need for anyone to call engine.dispose()
anywhere. But that approach has been rejected, because the caller of the
library should be aware they're doing this.
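
The PID check itself can be sketched roughly as follows. This is only an
illustration of the idea, not the actual oslo.db patch: ForkAwareConnection,
fake_connect, and opened_in are hypothetical names, and the "connection" is a
plain object so that the example needs nothing beyond the standard library.

```python
import os


class ForkAwareConnection:
    """Hypothetical wrapper that reopens its connection when the PID changes."""

    def __init__(self, connect):
        self._connect = connect        # zero-argument callable returning a connection
        self._conn = connect()
        self._owner_pid = os.getpid()  # process that opened the current connection

    def get(self):
        pid = os.getpid()
        if pid != self._owner_pid:
            # We are in a forked child. The inherited connection shares its
            # socket with the parent, so drop our reference without closing
            # it and open a fresh connection owned by this process.
            self._conn = self._connect()
            self._owner_pid = pid
        return self._conn


opened_in = []  # PIDs in which a connection was opened


def fake_connect():
    # Hypothetical connection factory used only for this illustration.
    opened_in.append(os.getpid())
    return object()


holder = ForkAwareConnection(fake_connect)
```

As long as holder.get() is called from the process that created it, the same
connection comes back each time; the first call made in a forked child notices
the PID mismatch and reconnects, so nobody has to call dispose() explicitly.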

If we can all read the whole thread here each time and be on the same page 
about what is acceptable and what's not, that would help.



> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: [email protected]?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

