On Wed, Feb 8, 2017 at 5:36 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > Thinking a bit harder about this, I suppose there could be a kind of > object called a SharedBufFileManager (insert better name) which you > can store in a DSM segment. The leader backend that initialises a DSM > segment containing one of these would then call a constructor function > that sets an internal refcount to 1 and registers an on_dsm_detach > callback for its on-detach function. All worker backends that attach > to the DSM segment would need to call an attach function for the > SharedBufFileManager to increment a refcount and also register the > on_dsm_detach callback, before any chance that an error might be > thrown (is that difficult?); failure to do so could result in file > leaks. Then, when a BufFile is to be shared (AKA exported, made > unifiable), a SharedBufFile object can be initialised somewhere in the > same DSM segment and registered with the SharedBufFileManager. > Internally all registered SharedBufFile objects would be linked > together using offsets from the start of the DSM segment for link > pointers. Now when SharedBufFileManager's on-detach function runs, it > decrements the refcount in the SharedBufFileManager, and if that > reaches zero then it runs a destructor that spins through the list of > SharedBufFile objects deleting files that haven't already been deleted > explicitly.
I think this is approximately reasonable, but I think it could be made simpler by having fewer separate objects. Let's assume the leader can put an upper bound on the number of shared BufFiles at the time it's sizing the DSM segment (i.e. before InitializeParallelDSM). Then it can allocate a big ol' array with a header indicating the array size and each element containing enough space to identify the relevant details of 1 shared BufFile. Now you don't need to do any allocations later on, and you don't need a linked list. You just loop over the array and do what needs doing. > There are a couple of problems with the above though. Firstly, doing > reference counting in DSM segment on-detach hooks is really a way to > figure out when the DSM segment is about to be destroyed by keeping a > separate refcount in sync with the DSM segment's refcount, but it > doesn't account for pinned DSM segments. It's not your use-case or > mine currently, but someone might want a DSM segment to live even when > it's not attached anywhere, to be reattached later. If we're trying > to use DSM segment lifetime as a scope, we'd be ignoring this detail. > Perhaps instead of adding our own refcount we need a new kind of hook > on_dsm_destroy. I think it's good enough to plan for current needs now. It's not impossible to change this stuff later, but we need something that works robustly right now without being too invasive. Inventing whole new system concepts because of stuff we might someday want to do isn't a good idea because we may easily guess wrong about what direction we'll want to go in the future. This is more like building a wrench than a 747: a 747 needs to be extensible and reconfigurable and upgradable because it costs $350 million. A wrench costs $10 at Walmart and if it turns out we bought the wrong one, we can just throw it out and get a different one later. > Secondly, I might not want to be constrained by a > fixed-sized DSM segment to hold my SharedBufFile objects... there are > cases where I need to shared a number of batch files that is unknown > at the start of execution time when the DSM segment is sized (I'll > write about that shortly on the Parallel Shared Hash thread). Maybe I > can find a way to get rid of that requirement. Or maybe it could > support DSA memory too, but I don't think it's possible to use > on_dsm_detach-based cleanup routines that refer to DSA memory because > by the time any given DSM segment's detach hook runs, there's no > telling which other DSM segments have been detached already, so the > DSA area may already have partially vanished; some other kind of hook > that runs earlier would be needed... Again, wrench. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers