On Mon, 24 Jun 2019 at 22:16, Michael Paquier <[email protected]> wrote: > > On Sat, Jun 15, 2019 at 12:25:12PM +1200, David Rowley wrote: > > With the attached version I'm just calling table_finish_bulk_insert() > > once per partition at the end of CopyFrom(). We've got an array with > > all the ResultRelInfos we touched in the proute, so it's mostly a > > matter of looping over that array and calling the function on each > > ResultRelInfo's ri_RelationDesc. However, to make it more complex, > > PartitionTupleRouting is private to execPartition.c so we can't do > > this directly... After staring at my screen for a while, I decided to > > write a function that calls a callback function on each ResultRelInfo > > in the PartitionTupleRouting. > > Don't take me bad, but I find the solution of defining and using a new > callback to call the table AM callback not really elegant, and keeping > all table AM callbacks called at a higher level than the executor > makes the code easier to follow. Shouldn't we try to keep any calls > to table_finish_bulk_insert() within copy.c for each partition > instead?
I'm not quite sure if I follow you since the call to table_finish_bulk_insert() is within copy.c still. The problem was that PartitionTupleRouting is private to execPartition.c, and we need a way to determine which of the partitions we routed tuples to. It seems inefficient to flush all of them if only a small number had tuples inserted into them and to me, it seems inefficient to add some additional tracking in CopyFrom(), like a hash table to store partition Oids that we inserted into. Using PartitionTupleRouting makes sense. It's just a question of how to access it, which is not so easy due to it being private. I did suggest a few other ways that we could solve this. I'm not so clear on which one of those you're suggesting or if you're thinking of something new. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
