On Tue, Jun 4, 2019 at 9:25 PM Robert Haas <robertmh...@gmail.com> wrote: > On Fri, May 17, 2019 at 4:36 PM Tom Lane <t...@sss.pgh.pa.us> wrote: > > Yeah, I did some additional testing that showed that it's pretty darn > > hard to get the leak to amount to anything. The test case that I > > originally posted did many DDLs in a single transaction, and it > > seems that that's actually essential to get a meaningful leak; as > > soon as you exit the transaction the leaked contexts will be recovered > > during sinval cleanup. > > My colleague Amul Sul rediscovered this same leak when he tried to > attach lots of partitions to an existing partitioned table, all in the > course of a single transaction. This seems a little less artificial > than Tom's original reproducer, which involved attaching and detaching > the same partition repeatedly. > > Here is a patch that tries to fix this, along the lines I previously > suggested; Amul reports that it does work for him.
Thanks for the patch. I noticed a crash with one of the scenarios that I think the patch is meant to address. Let me describe the steps: Attach gdb (or whatever) to session 1 in which we'll run a query that will scan at least two partitions and set a breakpoint in expand_partitioned_rtentry(). Run the query, so the breakpoint will be hit. Step through up to the start of the following loop in this function: i = -1; while ((i = bms_next_member(live_parts, i)) >= 0) { Oid childOID = partdesc->oids[i]; Relation childrel; RangeTblEntry *childrte; Index childRTindex; RelOptInfo *childrelinfo; /* Open rel, acquiring required locks */ childrel = table_open(childOID, lockmode); Note that 'partdesc' in the above code is from the partition directory. Before stepping through into the loop, start another session and attach a new partition. On into the loop. When the 1st partition is opened, its locking will result in RelationClearRelation() being called on the parent relation due to partition being attached concurrently, which I observed, is actually invoked a couple of times due to recursion. Parent relation's rd_oldpdcxt will be set in this process, which contains the aforementioned partition descriptor. Before moving the loop to process the 2nd partition, attach another partition in session 2. When the 2nd partition is opened, RelationClearRelation() will run again and in one of its recursive invocations, it destroys the rd_oldpdcxt that was set previously, taking the partition directory's partition descriptor with it. Anything that tries to access it later crashes. I couldn't figure out what to blame here though -- the design of rd_pdoldcxt or the fact that RelationClearRelation() is invoked multiple times. I will try to study it more closely tomorrow. Thanks, Amit