On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro > <thomas.mu...@enterprisedb.com> wrote: >> Here's a new version to apply on top of dsa-v7.patch. > > Here's a version to go with dsa-v8.patch.
Thomas has spent a fair amount of time beating me up off-list about the fact that we have no way to recycle LWLock tranche IDs, and I've spent a fair amount of time trying to defend that decision, but it's really a problem here. This is all well and good the way it's written provided that there's only one parallel context in existence at a time, but should that number ever exceed 1, which it can, then the tranche's array_base will be incorrect for LWLocks in one or the other tranche. That *almost* doesn't matter. If you're not compiling with dtrace or LOCK_DEBUG or LWLOCK_STATS, you'll be fine. And if you are compiling with one of those things, I believe the consequences will be no worse than an occasional nonsensical lock ID. It's halfway tempting to just accept that as a known wart, but, uggh, that sucks. It's a bit hard to come up with a better alternative. We could set aside a certain number of tranche IDs for parallel contexts, say 16. Then as long as the same backend doesn't try to do create more than 16 parallel contexts at the same time, we'll be fine. And a backend really shouldn't have more than 16 Gather nodes active at the same time. But it could. In fact, even if the planner never created more than one Gather node in the same plan (which it sometimes does), the leader can have multiple parallel contexts active for different queries at the same time. It could fire off a Gather and then later some part of that plan that's running in the master could call a function that tries to execute some OTHER query which also happens to involve a Gather and then while the master is executing its part of THAT query the same thing could happen all over again, so any arbitrary limit we install here can be exceeded in, admittedly, extremely contrived scenarios. Moreover, the LWLock tranche system itself is limited to a 16-bit integer for tranche IDs, so there can't be more than 65536 of those over the lifetime of the cluster no matter what. Since 008608b9d51061b1f598c197477b3dc7be9c4a64, that limit is rather pointless; as things stand today, we could change the tranche ID to a 32-bit integer without losing anything. But changing that might eat up bit space that somebody wants to use later to solve some other problem, and anyway by itself it doesn't fix anything. Also, it would allow LWLockTrancheArray to get regrettably large. Another alternative is to have the DSA system fix up the tranche base address every time we enter a DSA function that might need to take a lock. The DSA code never returns with any relevant locks held, so the base address can't get obsoleted by some other DSA while a lock using the old base address is still held. That's sort of ugly too, and it adds a little overhead to fix a problem that will bite almost nobody, but it's formally correct and seems fairly watertight. Basically each DSA function that needs to take a lock would need to first do this: area->lwlock_tranche.array_base = &area->control->pools; Maybe that's not too bad... thoughts? BTW, I just noticed that the dsa_area_control member called "lock" is going to get a fairly random lock ID, based on the number of bytes by which "lock" follows "pools". Wouldn't it be better to put all of the LWLocks - both the general locks and the per-pool locks - in one array, probably with the general lock first, so that T_ID() will do the right thing for such locks? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers