st 24. 4. 2024 v 14:50 odesílatel Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> napsal:

> On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis <pg...@j-davis.com> wrote:
> >
> > Here's where I think this API should go:
> >
> > 1. Have table_modify_begin/end and table_modify_buffer_insert, like
> > those that are implemented in your patch.
>
> I added table_modify_begin, table_modify_buffer_insert,
> table_modify_buffer_flush and table_modify_end. Table Access Method (AM)
> authors now can define their own buffering strategy and flushing decisions
> based on their tuple storage kinds and various other AM specific factors. I
> also added a default implementation that falls back to single inserts when
> no implementation is provided for these AM by AM authors. See the attached
> v19-0001 patch.
>
> > 2. Add some kind of flush callback that will be called either while the
> > tuples are being flushed or after the tuples are flushed (but before
> > they are freed by the AM). (Aside: do we need to call it while the
> > tuples are being flushed to get the right visibility semantics for
> > after-row triggers?)
>
> I added a flush callback named TableModifyBufferFlushCallback; when
> provided by callers invoked after tuples are flushed to disk from the
> buffers but before the AM frees them up. Index insertions and AFTER ROW
> INSERT triggers can be executed in this callback. See the v19-0001 patch
> for how AM invokes the flush callback, and see either v19-0003 or v19-0004
> or v19-0005 for how a caller can supply the callback and required context
> to execute index insertions and AR triggers.
>
> > 3. Add table_modify_buffer_{update|delete} APIs.
> >
> > 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
> > the buffer_insert/update/delete APIs; we don't need a separate merge
> > method. This probably requires that the AM maintain 3 separate buffers
> > to distinguish different kinds of changes at flush time (obviously
> > these can be initialized lazily to avoid overhead when not being used).
>
> I haven't thought about these things yet. I can only focus on them after
> seeing how the attached patches go from here.
>
> > 4. Some kind of API tweaks to help manage memory when modifying
> > pertitioned tables, so that the buffering doesn't get out of control.
> > Perhaps just reporting memory usage and allowing the caller to force
> > flushes would be enough.
>
> Heap implementation for thes new Table AMs uses a separate memory context
> for all of the operations. Please have a look and let me know if we need
> anything more.
>
> > 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
> > fairly straightforward, I believe, and handled by your patch. Indexes
> > are (re)built afterward, and no triggers are possible.
> >
> > 6. Use these new methods for CREATE TABLE ... AS. This is fairly
> > straightforward, I believe, and handled by your patch. No indexes or
> > triggers are possible.
>
> I used multi inserts for all of these including TABLE REWRITE commands
> such as ALTER TABLE. See the attached v19-0002 patch. Check the testing
> section below for benefits.
>
> FWIW, following are some of the TABLE REWRITE commands that can get
> benefitted:
>
> ALTER TABLE tbl ALTER c1 TYPE bigint;
> ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
> ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2;
> ALTER TABLE itest3 ALTER COLUMN a TYPE int;
> ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3);
> ALTER TABLE has_volatile ADD col4 int DEFAULT (random() * 10000)::int;
> and so on.
>
> > 7. Use these new methods for COPY. We have to be careful to avoid
> > regressions for the heap method, because it's already managing its own
> > buffers. If the AM manages the buffering, then it may require
> > additional copying of slots, which could be a disadvantage. To solve
> > this, we may need some minor API tweaks to avoid copying when the
> > caller guarantees that the memory will not be freed to early, or
> > perhaps expose the AM's memory context to copyfrom.c. Another thing to
> > consider is that the buffering in copyfrom.c is also used for FDWs, so
> > that buffering code path needs to be preserved in copyfrom.c even if
> > not used for AMs.
>
> I modified the COPY FROM code to use the new Table AMs, and performed some
> tests which show no signs of regression. Check the testing section below
> for more details. See the attached v19-0005 patch. With this,
> table_multi_insert can be deprecated.
>
> > 8. Use these new methods for INSERT INTO ... SELECT. One potential
> > challenge here is that execution nodes are not always run to
> > completion, so we need to be sure that the flush isn't forgotten in
> > that case.
>
> I did that in v19-0003. I did place the table_modify_end call in multiple
> places including ExecEndModifyTable. I didn't find any issues with it.
> Please have a look and let me know if we need the end call in more places.
> Check the testing section below for benefits.
>
> > 10. Use these new methods for logical apply.
>
> I used multi inserts for Logical Replication apply. in v19-0004. Check the
> testing section below for benefits.
>
> FWIW, open-source pglogical does have multi insert support, check code
> around
> https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_apply_heap.c#L960
> .
>
> > 11. Deprecate the multi_insert API.
>
> I did remove both table_multi_insert and table_finish_bulk_insert in
> v19-0006. Perhaps, removing them isn't a great idea, but adding a
> deprecation WARNING/ERROR until some more PG releases might be worth
> looking at.
>
> > Thoughts on this plan? Does your patch make sense in v17 as a stepping
> > stone, or should we try to make all of these API changes together in
> > v18?
>
> If the design, code and benefits that these new Table AMs bring to the
> table look good, I hope to see it for PG 18.
>
> > Also, a sample AM code would be a huge benefit here. Writing a real AM
> > is hard, but perhaps we can at least have an example one to demonstrate
> > how to use these APIs?
>
> The attached patches already have implemented these new Table AMs for
> Heap. I don't think we need a separate implementation to demonstrate. If
> others feel so, I'm open to thoughts here.
>
> Having said above, I'd like to reiterate the motivation behind the new
> Table AMs for multi and single inserts.
>
> 1. A scan-like API with state being carried across is thought to be better
> as suggested by Andres Freund -
> https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3f...@alap3.anarazel.de
> .
> 2. Allowing a Table AM to optimize operations across multiple inserts,
> define its own buffering strategy and take its own flushing decisions based
> on their tuple storage kinds and various other AM specific factors.
> 3. Improve performance of various SQL commands with multi inserts for Heap
> AM.
>
> The attached v19 patches might need some more detailed comments, some
> documentation and some specific tests ensuring the multi inserts for Heap
> are kicked-in for various commands. I'm open to thoughts here.
>
> I did some testing to see how various commands benefit with multi inserts
> using these new Table AM for heap. It's not only the improvement in
> performance these commands see, but also the amount of WAL that gets
> generated reduces greatly. After all, multi inserts optimize the insertions
> by writing less WAL. IOW, writing WAL record per page if multiple rows fit
> into a single data page as opposed to WAL record per row.
>
> Test case 1: 100 million rows, 2 columns (int and float)
>
> Command                        | HEAD (sec) | PATCHED (sec) | Faster by %
> | Faster by X
> ------------------------------ | ---------- | ------------- | -----------
> | -----------
> CREATE TABLE AS                | 121        | 77            | 36.3
>  | 1.57
> CREATE MATERIALIZED VIEW       | 101        | 49            | 51.4
>  | 2.06
> REFRESH MATERIALIZED VIEW      | 113        | 54            | 52.2
>  | 2.09
> ALTER TABLE (TABLE REWRITE)    | 124        | 81            | 34.6
>  | 1.53
> COPY FROM                      | 71         | 72            | 0
> | 1
> INSERT INTO ... SELECT         | 117        | 62            | 47
>  | 1.88
> LOGICAL REPLICATION APPLY      | 393        | 306           | 22.1
>  | 1.28
>
> Command                        | HEAD (WAL in GB) | PATCHED (WAL in GB) |
> Reduced by % | Reduced by X
> ------------------------------ | ---------------- | ------------------- |
> ------------ | -----------
> CREATE TABLE AS                | 6.8              | 2.4                 |
> 64.7         | 2.83
> CREATE MATERIALIZED VIEW       | 7.2              | 2.3                 |
> 68           | 3.13
> REFRESH MATERIALIZED VIEW      | 10               | 5.1                 |
> 49           | 1.96
> ALTER TABLE (TABLE REWRITE)    | 8                | 3.2                 |
> 60           | 2.5
> COPY FROM                      | 2.9              | 3                   |
> 0            | 1
> INSERT INTO ... SELECT         | 8                | 3                   |
> 62.5         | 2.66
> LOGICAL REPLICATION APPLY      | 7.5              | 2.3                 |
> 69.3         | 3.26
>
> Test case 2: 1 billion rows, 1 column (int)
>
> Command                        | HEAD (sec) | PATCHED (sec) | Faster by %
> | Faster by X
> ------------------------------ | ---------- | ------------- | -----------
> | -----------
> CREATE TABLE AS                | 794        | 386           | 51.38
> | 2.05
> CREATE MATERIALIZED VIEW       | 1006       | 563           | 44.03
> | 1.78
> REFRESH MATERIALIZED VIEW      | 977        | 603           | 38.28
> | 1.62
> ALTER TABLE (TABLE REWRITE)    | 1189       | 714           | 39.94
> | 1.66
> COPY FROM                      | 321        | 330           | -0.02
> | 0.97
> INSERT INTO ... SELECT         | 1084       | 586           | 45.94
> | 1.84
> LOGICAL REPLICATION APPLY      | 3530       | 2982          | 15.52
> | 1.18
>
> Command                        | HEAD (WAL in GB) | PATCHED (WAL in GB) |
> Reduced by % | Reduced by X
> ------------------------------ | ---------------- | ------------------- |
> ------------ | -----------
> CREATE TABLE AS                | 60               | 12                  |
> 80           | 5
> CREATE MATERIALIZED VIEW       | 60               | 12                  |
> 80           | 5
> REFRESH MATERIALIZED VIEW      | 60               | 12                  |
> 80           | 5
> ALTER TABLE (TABLE REWRITE)    | 123              | 31                  |
> 60           | 2.5
> COPY FROM                      | 12               | 12                  |
> 0            | 1
> INSERT INTO ... SELECT         | 120              | 24                  |
> 80           | 5
> LOGICAL REPLICATION APPLY      | 61               | 12                  |
> 80.32        | 5
>

looks pretty impressive!

Pavel


>
> Test setup:
> ./configure --prefix=$PWD/pg17/ --enable-tap-tests CFLAGS="-ggdb3 -O2" >
> install.log && make -j 8 install > install.log 2>&1 &
>
> wal_level=logical
> max_wal_size = 256GB
> checkpoint_timeout = 1h
>
> Test system is EC2 instance of type c5.4xlarge:
> Architecture:            x86_64
>   CPU op-mode(s):        32-bit, 64-bit
>   Address sizes:         46 bits physical, 48 bits virtual
>   Byte Order:            Little Endian
> CPU(s):                  16
>   On-line CPU(s) list:   0-15
> Vendor ID:               GenuineIntel
>   Model name:            Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
>     CPU family:          6
>     Model:               85
>     Thread(s) per core:  2
>     Core(s) per socket:  8
>     Socket(s):           1
>     Stepping:            7
>     BogoMIPS:            5999.99
> Caches (sum of all):
>   L1d:                   256 KiB (8 instances)
>   L1i:                   256 KiB (8 instances)
>   L2:                    8 MiB (8 instances)
>   L3:                    35.8 MiB (1 instance)
> NUMA:
>   NUMA node(s):          1
>   NUMA node0 CPU(s):     0-15
> RAM:
>   MemTotal:       32036536 kB
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>

Reply via email to