On Tue, Aug 18, 2020 at 1:37 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
>
> On Tue, Jul 14, 2020 at 1:20 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
> >
> > On Mon, Jul 13, 2020 at 4:23 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > >
> > > I think we can do more than this by
> > > parallelizing the Insert part of this query as well, as we have lifted
> > > group locking restrictions related to RelationExtension and Page lock
> > > in PG13. It would be really cool to do that unless we see any
> > > fundamental problems with it.
> >
> > +1, I think it will be cool to support the insert part here, as well as
> > the insert part in 'Create Table As Select ...'.
> >
>
> +1 to parallelize inserts. Currently, ExecInsert() and CTAS use
> table_tuple_insert(). If we parallelize these parts, each worker will
> be inserting its tuples (one tuple at a time) into the same data page
> until space runs out; at that point a new data page can be obtained by
> any of the workers, and the others might start inserting into it. This
> way, will there be lock contention on data pages?
>
It is possible, but we need to check how much of a bottleneck that is, because it should not be a big part of the operation. And it won't be any worse than inserts via multiple backends. I think it is important to do it that way; otherwise, some of the pages can remain half-empty.

Right now, the plan for Insert ... Select is like:

Insert on <tbl_x>
  ->  Seq Scan on <tbl_y>
        ....

In the above, the scan could be an index scan as well. What we want is:

Gather
  ->  Insert on <tbl_x>
        ->  Seq Scan on <tbl_y>
              ....

> Do we also need to make
> inserts use table_multi_insert() (the way "COPY" does) instead
> of table_tuple_insert()?
>

I am not sure at this stage, but if it turns out to be a big problem then we might think of inventing some way to allow individual workers to operate on different pages. I think even without that we should be able to make a big gain, as reads, filtering, etc. can be done in parallel.

--
With Regards,
Amit Kapila.
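As a rough illustration of the trade-off discussed above (shared pages with lock contention vs. half-empty per-worker pages), here is a toy model in Python. It is not PostgreSQL code: `Page`-like lists, `PAGE_CAPACITY`, and the single lock standing in for a buffer content lock are all illustrative assumptions. The point it sketches is that when all workers insert into the same "current page" under a lock, every page except possibly the last ends up completely full.

```python
# Toy model (NOT PostgreSQL internals) of N workers inserting tuples
# into shared fixed-capacity "pages". One lock serializes page access,
# loosely mimicking the contention concern raised in the thread.

import threading

PAGE_CAPACITY = 100        # tuples per toy page (illustrative)
NUM_WORKERS = 4
TUPLES_PER_WORKER = 250

pages = [[]]               # shared pages; the last one is the "current" page
page_lock = threading.Lock()

def insert_tuple(tup):
    # Each insert briefly holds the lock, the potential contention point.
    with page_lock:
        if len(pages[-1]) >= PAGE_CAPACITY:
            pages.append([])           # "extend the relation" with a new page
        pages[-1].append(tup)

def worker(wid):
    for i in range(TUPLES_PER_WORKER):
        insert_tuple((wid, i))

threads = [threading.Thread(target=worker, args=(w,)) for w in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(len(p) for p in pages)
# All 1000 tuples land in 10 completely full pages: sharing the current
# page avoids the half-empty pages a per-worker-page scheme could leave.
print(total, len(pages))
```

Giving each worker its own private current page would remove the lock contention on inserts but could leave up to NUM_WORKERS partially filled pages at the end, which is the half-empty-page concern mentioned above.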