On Fri, 11 Aug 2023 at 16:26, vignesh C <vignes...@gmail.com> wrote:
>
> On Wed, 9 Aug 2023 at 09:51, vignesh C <vignes...@gmail.com> wrote:
> >
> > Hi Melih,
> >
> > Here is a patch to help in getting the execution time at various phases
> > like: a) replication slot creation time, b) WAL reading, c) number of
> > WAL records read, d) subscription relation state change, etc.
> > A couple of observations while we tested with this patch:
> > 1) We noticed that the patch takes more time for finding the decoding
> > start point.
> > 2) Another observation was that the number of XLOG records read for
> > identifying the consistent point was significantly higher with the
> > v26_0001 patch.
> >
> > HEAD
> > postgres=# select avg(counttime)/1000 "avgtime(ms)",
> > median(counttime)/1000 "median(ms)", min(counttime)/1000
> > "mintime(ms)", max(counttime)/1000 "maxtime(ms)", logtype from test
> > group by logtype;
> >       avgtime(ms)      |       median(ms)       | mintime(ms) | maxtime(ms) |         logtype
> > ------------------------+------------------------+-------------+-------------+--------------------------
> >  0.00579245283018867920 | 0.00200000000000000000 |           0 |           1 | SNAPSHOT_BUILD
> >      1.2246811320754717 | 0.98550000000000000000 |           0 |          37 | LOGICAL_SLOT_CREATION
> >    171.0863283018867920 |   183.9120000000000000 |           0 |         408 | FIND_DECODING_STARTPOINT
> >      2.0699433962264151 |     1.4380000000000000 |           1 |          49 | INIT_DECODING_CONTEXT
> > (4 rows)
> >
> > HEAD + v26-0001 patch
> > postgres=# select avg(counttime)/1000 "avgtime(ms)",
> > median(counttime)/1000 "median(ms)", min(counttime)/1000
> > "mintime(ms)", max(counttime)/1000 "maxtime(ms)", logtype from test
> > group by logtype;
> >       avgtime(ms)      |       median(ms)       | mintime(ms) | maxtime(ms) |         logtype
> > ------------------------+------------------------+-------------+-------------+--------------------------
> >  0.00588113207547169810 | 0.00500000000000000000 |           0 |           0 | SNAPSHOT_BUILD
> >      1.1270962264150943 |     1.1000000000000000 |           0 |           2 | LOGICAL_SLOT_CREATION
> >    301.1745528301886790 |   410.4870000000000000 |           0 |         427 | FIND_DECODING_STARTPOINT
> >      1.4814660377358491 |     1.4530000000000000 |           1 |           9 | INIT_DECODING_CONTEXT
> > (4 rows)
> >
> > In the above, FIND_DECODING_STARTPOINT is very much higher with the
> > v26-0001 patch.
> >
> > HEAD
> > FIND_DECODING_XLOG_RECORD_COUNT
> > - average = 2762
> > - median = 3362
> >
> > HEAD + reuse worker patch (v26_0001)
> > FIND_DECODING_XLOG_RECORD_COUNT
> > - average = 4105
> > - median = 5345
> >
> > Similarly, the number of XLOG records read is higher with the v26_0001 patch.
> >
> > Steps to calculate the timing:
> > -- first collect the necessary LOG lines from the subscriber's log.
> > cat *.log | grep -E
> > '(LOGICAL_SLOT_CREATION|INIT_DECODING_CONTEXT|FIND_DECODING_STARTPOINT|SNAPSHOT_BUILD|FIND_DECODING_XLOG_RECORD_COUNT|LOGICAL_XLOG_READ|LOGICAL_DECODE_PROCESS_RECORD|LOGICAL_WAIT_TRANSACTION)'
> > > grep.dat
> >
> > create table testv26(logtime varchar, pid varchar, level varchar,
> > space varchar, logtype varchar, counttime int);
> > -- then copy this data into the table to compute the averages.
> > COPY testv26 FROM '/home/logs/grep.dat' DELIMITER ' ';
> >
> > -- Finally, use the SQL to analyze the data:
> > select avg(counttime)/1000 "avgtime(ms)", logtype from testv26 group by
> > logtype;
> >
> > -- To get the number of xlog records read:
> > select avg(counttime) from testv26 where logtype
> > ='FIND_DECODING_XLOG_RECORD_COUNT' and counttime != 1;
> >
> > Thanks to Peter and Hou-san who helped in finding these out. We are
> > analysing this in parallel; @Melih Mutlu, posting this information so
> > that it might help you too in analysing this issue.
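
A small note on the queries quoted above: median() is not a built-in
PostgreSQL aggregate, so it has to be defined separately before these
queries will run. If preferred, the built-in percentile_cont ordered-set
aggregate gives the same median, e.g.:

select percentile_cont(0.5) within group (order by counttime)/1000 "median(ms)",
       logtype
from testv26 group by logtype;
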
> I analysed further on why it needs to read a larger number of XLOG
> records in some cases while creating the replication slot; here are my
> thoughts:
> Note: the tablesync worker needs to connect to the publisher and create
> a consistent point for its slot by reading the XLOG records. This
> requires that all the transactions that are open, or that start while
> the consistent point is being created, be committed.
> I feel the creation of slots is faster in a few cases in HEAD because:
>
> Publisher                 | Subscriber
> ------------------------------------------------------------
> Begin txn1 transaction    |
> Insert 1..1000 records    |
> Commit                    |
> Begin txn2 transaction    |
> Insert 1..1000 records    | Apply worker applies transaction txn1
>                           | Start tablesync of table t2; create the
>                           | consistent point in the publisher before
>                           | transaction txn3 is started
> commit                    | We just need to wait till
>                           | transaction txn2 is finished.
> Begin txn3 transaction    |
> Insert 1..1000 records    |
> commit                    |
>
> In V26, this is happening in some cases:
>
> Publisher                 | Subscriber
> ------------------------------------------------------------
> Begin txn1 transaction    |
> Insert 1..1000 records    |
> Commit                    |
> Begin txn2 transaction    |
> Insert 1..1000 records    | Apply worker applies transaction txn1
>                           | Start tablesync of table t2
> commit                    | Create the consistent point
> Begin txn3 transaction    | (since transaction txn2 is committed
>                           | and txn3 is started, we will need to
>                           | wait for transaction txn3 to be
>                           | committed)
> Insert 1..1000 records    |
> commit                    |
>
> This is because in HEAD the tablesync worker will be started after one
> commit, so we are able to create the consistent point before a new
> transaction is started in some cases.
> Create slot will be fastest if the tablesync worker is able to connect
> to the publisher and create a consistent point before the new
> transaction is started. The probability of this is better in HEAD for
> this scenario, as the new tablesync worker is started after a commit and
> the tablesync worker in HEAD has a better time window (because the
> current transaction has just started) before another new transaction
> is started. This probability is slightly lower with the V26 version.
> I felt this issue is purely a timing issue in a few cases, because of
> the timing of the new transactions being created while creating the
> slot.
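
The wait described above can also be seen in isolation. Here is a
minimal sketch (the table and slot names are only examples) showing that
logical slot creation cannot return until every transaction that was
already in progress on the publisher has finished, which is exactly the
window the tablesync worker is racing against:

-- Session 1 on the publisher: start a transaction and leave it open.
BEGIN;
INSERT INTO t1 VALUES (1);

-- Session 2 on the publisher: this blocks while searching for a
-- consistent decoding start point, until session 1's transaction ends.
SELECT pg_create_logical_replication_slot('demo_slot', 'pgoutput');

-- Session 1: only after this can session 2's call return.
COMMIT;

The tablesync worker hits the same wait through the walsender's
CREATE_REPLICATION_SLOT command, so the later it starts relative to the
publisher's transactions, the more WAL it has to read before reaching
the consistent point.
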
I used the following steps to analyse this issue (the logs can be
captured by applying the patches at [1]):

-- first collect the necessary information from the publisher's log for the HEAD run:
cat *.log | grep FIND_DECODING_XLOG_RECORD_COUNT > grep_head.dat
-- collect the same information from the publisher's log for the v26 run:
cat *.log | grep FIND_DECODING_XLOG_RECORD_COUNT > grep_v26.dat

-- then copy this data into HEAD's db table to compute the average:
COPY test_head FROM '/home/logs/grep_head.dat' DELIMITER ' ';
-- and copy this data into the v26 db table to compute the average:
COPY test_v26 FROM '/home/logs/grep_v26.dat' DELIMITER ' ';

Find the average number of XLOG records read in HEAD:
postgres=# select avg(counttime) from test_head where logtype
='FIND_DECODING_XLOG_RECORD_COUNT' and counttime != 1;
          avg
-----------------------
 1394.1100000000000000
(1 row)

Find the average number of XLOG records read in V26:
postgres=# select avg(counttime) from test_v26 where logtype
='FIND_DECODING_XLOG_RECORD_COUNT' and counttime != 1;
          avg
-----------------------
 1900.4100000000000000
(1 row)

When analysing why creating the replication slot needs to read more
records in a few cases, I found a very interesting observation: with
HEAD, about 29% of the tables (29 out of 100) could find the consistent
point by reading the WAL records only up to the next subsequent COMMIT,
whereas with the V26 patch only 5% of the tables could do so. In the
remaining cases the V26 patch had to read another transaction of
approximately 1000 or more WAL records to reach the consistent point,
which increases the average number of records read with the V26 version.
For these, I got the start LSN and the consistent LSN from the log files
by matching the corresponding FIND_DECODING_XLOG_RECORD_COUNT entries,
did a waldump of the WAL file, searched the records between the start
LSN and the consistent LSN, and confirmed that only one COMMIT record
had to be read to reach the consistent point. Details of this
information from the HEAD and V26 logs are attached.

The number of tables that had to read less than one transaction's worth
of records can be found as follows:
-- I checked for 1000 WAL records because each transaction has 1000 inserts.
select count(counttime) from test_head where logtype
='FIND_DECODING_XLOG_RECORD_COUNT' and counttime < 1000;
 count
-------
    29
(1 row)

select count(counttime) from test_v26 where logtype
='FIND_DECODING_XLOG_RECORD_COUNT' and counttime < 1000;
 count
-------
     5
(1 row)

Apart from these, there were other instances where V26 had to read more
COMMIT records in a few cases.

The above is happening because of the reason mentioned in [2], i.e. in
HEAD the tablesync worker will be started after one commit, so we are
able to create the consistent point before a new transaction is started
in some cases. Create slot will be fastest if the tablesync worker is
able to connect to the publisher and create a consistent point before
the new transaction is started. The probability of this is better in
HEAD for this scenario, as the new tablesync worker is started after a
commit and the tablesync worker in HEAD has a better time window
(because the current transaction has just started) before another new
transaction is started. This probability is slightly lower with the V26
version. I felt this issue is purely a timing issue in a few cases,
because of the timing of the new transactions being created while
creating the slot.
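
As a cross-check of the 29% vs 5% figures above, the same proportions
can be computed directly from the tables loaded earlier (a sketch,
assuming test_head and test_v26 each hold the 100 tablesync log entries):

select 'HEAD' as version,
       count(*) filter (where counttime < 1000) as below_one_txn,
       round(100.0 * count(*) filter (where counttime < 1000) / count(*), 1) as pct
from test_head where logtype = 'FIND_DECODING_XLOG_RECORD_COUNT'
union all
select 'v26',
       count(*) filter (where counttime < 1000),
       round(100.0 * count(*) filter (where counttime < 1000) / count(*), 1)
from test_v26 where logtype = 'FIND_DECODING_XLOG_RECORD_COUNT';

With the data above this should report roughly 29.0 for HEAD and 5.0
for v26.
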
Since, as explained above, this is purely a timing issue in a few cases,
caused by new transactions starting while the slot is being created, I
felt we can ignore this.

[1] - https://www.postgresql.org/message-id/CALDaNm1TA068E2niJFUR9ig%2BYz3-ank%3Dj5%3Dj-2UocbzaDnQPrA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CALDaNm2k2z3Hpa3Omb_tpxWkyHnUvsWjJMbqDs-2uD2eLzemJQ%40mail.gmail.com

Regards,
Vignesh
HEAD
====================================================================================================
1691513006.983 [170437] LOG: searching for logical decoding starting point, starting at 0/161D0C8
1691513007.357 [170437] LOG: logical decoding found consistent point at 0/162D140
1691513007.357 [170437] LOG: FIND_DECODING_XLOG_RECORD_COUNT 911
Inserted records prior to starting point = 98 Commit = 1

1691513006.996 [170439] LOG: searching for logical decoding starting point, starting at 0/161DF58
1691513007.361 [170439] LOG: logical decoding found consistent point at 0/162D140
1691513007.361 [170439] LOG: FIND_DECODING_XLOG_RECORD_COUNT 859
Inserted records prior to starting point = 150 Commit = 1

1691513007.694 [170447] LOG: searching for logical decoding starting point, starting at 0/1689830
1691513007.862 [170447] LOG: logical decoding found consistent point at 0/1697250
1691513007.863 [170447] LOG: FIND_DECODING_XLOG_RECORD_COUNT 775
Inserted records prior to starting point = 233 Commit = 1

1691513007.778 [170449] LOG: searching for logical decoding starting point, starting at 0/1696C48 - 41234, prev comit = 40249
1691513007.863 [170449] LOG: logical decoding found consistent point at 0/1697250
1691513007.863 [170449] LOG: FIND_DECODING_XLOG_RECORD_COUNT 23
Inserted records prior to starting point = 984 Commit = 1

1691513008.177 [170457] LOG: searching for logical decoding starting point, starting at 0/16E1DA0
1691513008.389 [170457] LOG: logical decoding found consistent point at 0/16EF840
1691513008.390 [170457] LOG: FIND_DECODING_XLOG_RECORD_COUNT 777
Inserted records prior to starting point = 230 Commit = 1

1691513008.183 [170455] LOG: searching for logical decoding starting point, starting at 0/16E3398
1691513008.389 [170455] LOG: logical decoding found consistent point at 0/16EF840
1691513008.390 [170455] LOG: FIND_DECODING_XLOG_RECORD_COUNT 699
Inserted records prior to starting point = 308 Commit = 1

1691513008.253 [170459] LOG: searching for logical decoding starting point, starting at 0/16EF478 - 46258, prev comit = 45266
1691513008.389 [170459] LOG: logical decoding found consistent point at 0/16EF840
1691513008.390 [170459] LOG: FIND_DECODING_XLOG_RECORD_COUNT 15
Inserted records prior to starting point = 992 Commit = 1

1691513009.724 [170489] LOG: searching for logical decoding starting point, starting at 0/1800778
1691513009.805 [170489] LOG: logical decoding found consistent point at 0/180A668
1691513009.805 [170489] LOG: FIND_DECODING_XLOG_RECORD_COUNT 565
Inserted records prior to starting point = 442 Commit = 1

1691513009.920 [170493] LOG: searching for logical decoding starting point, starting at 0/181CA80
1691513009.920 [170493] LOG: logical decoding found initial starting point at 0/181CB10
1691513010.025 [170493] LOG: FIND_DECODING_XLOG_RECORD_COUNT 973
Inserted records prior to starting point = 34 Commit = 1

1691513010.020 [170496] LOG: searching for logical decoding starting point, starting at 0/182BDD0
1691513010.026 [170496] LOG: logical decoding found consistent point at 0/182DC68
1691513010.026 [170496] LOG: FIND_DECODING_XLOG_RECORD_COUNT 110
Inserted records prior to starting point = 896 Commit = 1

1691513010.042 [170497] LOG: searching for logical decoding starting point, starting at 0/182DBF0
1691513010.042 [170497] LOG: logical decoding found consistent point at 0/182DC68
1691513010.042 [170497] LOG: FIND_DECODING_XLOG_RECORD_COUNT 3
Inserted records prior to starting point = 999 Commit = 1

1691513011.173 [170511] LOG: searching for logical decoding starting point, starting at 0/190BD08
1691513011.292 [170511] LOG: logical decoding found consistent point at 0/19136B0
1691513011.292 [170511] LOG: FIND_DECODING_XLOG_RECORD_COUNT 433
Inserted records prior to starting point = 569 Commit = 1

1691513011.183 [170513] LOG: searching for logical decoding starting point, starting at 0/190E170
1691513011.293 [170513] LOG: logical decoding found consistent point at 0/19136B0
1691513011.293 [170513] LOG: FIND_DECODING_XLOG_RECORD_COUNT 304
Inserted records prior to starting point = 699 Commit = 1

1691513011.163 [170509] LOG: searching for logical decoding starting point, starting at 0/190B548
1691513011.293 [170509] LOG: logical decoding found consistent point at 0/19136B0
1691513011.293 [170509] LOG: FIND_DECODING_XLOG_RECORD_COUNT 461
Inserted records prior to starting point = 543 Commit = 1

1691513011.594 [170518] LOG: searching for logical decoding starting point, starting at 0/1963050
1691513011.667 [170518] LOG: logical decoding found consistent point at 0/196BC30
1691513011.667 [170518] LOG: FIND_DECODING_XLOG_RECORD_COUNT 498
Inserted records prior to starting point = 505 Commit = 1

1691513011.624 [170521] LOG: searching for logical decoding starting point, starting at 0/1968D48
1691513011.667 [170521] LOG: logical decoding found consistent point at 0/196BC30
1691513011.667 [170521] LOG: FIND_DECODING_XLOG_RECORD_COUNT 168
Inserted records prior to starting point = 835 Commit = 1

1691513011.615 [170519] LOG: searching for logical decoding starting point, starting at 0/1966AA8
1691513011.667 [170519] LOG: logical decoding found consistent point at 0/196BC30
1691513011.667 [170519] LOG: FIND_DECODING_XLOG_RECORD_COUNT 291
Inserted records prior to starting point = 713 Commit = 1

1691513012.704 [170541] LOG: searching for logical decoding starting point, starting at 0/1A440A8
1691513012.771 [170541] LOG: logical decoding found consistent point at 0/1A518A8
1691513012.771 [170541] LOG: FIND_DECODING_XLOG_RECORD_COUNT 768
Inserted records prior to starting point = 236 Commit = 1

1691513012.737 [170543] LOG: searching for logical decoding starting point, starting at 0/1A4D760
1691513012.783 [170543] LOG: logical decoding found consistent point at 0/1A518A8
1691513012.783 [170543] LOG: FIND_DECODING_XLOG_RECORD_COUNT 233
Inserted records prior to starting point = 772 Commit = 1

1691513012.992 [170549] LOG: searching for logical decoding starting point, starting at 0/1A770E0
1691513013.083 [170549] LOG: logical decoding found consistent point at 0/1A86978
1691513013.083 [170549] LOG: FIND_DECODING_XLOG_RECORD_COUNT 883
Inserted records prior to starting point = 122 Commit = 1

1691513013.029 [170551] LOG: searching for logical decoding starting point, starting at 0/1A781C8
1691513013.095 [170551] LOG: logical decoding found consistent point at 0/1A86978
1691513013.111 [170551] LOG: FIND_DECODING_XLOG_RECORD_COUNT 823
Inserted records prior to starting point = 182 Commit = 1

1691513013.261 [170555] LOG: searching for logical decoding starting point, starting at 0/1AA8198
1691513013.407 [170555] LOG: logical decoding found consistent point at 0/1AA9F40
1691513013.407 [170555] LOG: FIND_DECODING_XLOG_RECORD_COUNT 107
Inserted records prior to starting point = 897 Commit = 1

1691513013.928 [170569] LOG: searching for logical decoding starting point, starting at 0/1B2EB30
1691513013.969 [170569] LOG: logical decoding found consistent point at 0/1B375C8
1691513013.969 [170569] LOG: FIND_DECODING_XLOG_RECORD_COUNT 493
Inserted records prior to starting point = 512 Commit = 1

1691513014.019 [170573] LOG: searching for logical decoding starting point, starting at 0/1B3D5C8
1691513014.115 [170573] LOG: logical decoding found consistent point at 0/1B490F0
1691513014.115 [170573] LOG: FIND_DECODING_XLOG_RECORD_COUNT 665
Inserted records prior to starting point = 339 Commit = 1

1691513014.009 [170571] LOG: searching for logical decoding starting point, starting at 0/1B393D8
1691513014.115 [170571] LOG: logical decoding found consistent point at 0/1B490F0
1691513014.115 [170571] LOG: FIND_DECODING_XLOG_RECORD_COUNT 899
Inserted records prior to starting point = 105 Commit = 1

1691513014.297 [170577] LOG: searching for logical decoding starting point, starting at 0/1B6EDD8
1691513014.546 [170577] LOG: logical decoding found consistent point at 0/1B7E1C0
1691513014.546 [170577] LOG: FIND_DECODING_XLOG_RECORD_COUNT 867
Inserted records prior to starting point = 138 Commit = 1

1691513014.376 [170583] LOG: searching for logical decoding starting point, starting at 0/1B795F8
1691513014.546 [170583] LOG: logical decoding found consistent point at 0/1B7E1C0
1691513014.546 [170583] LOG: FIND_DECODING_XLOG_RECORD_COUNT 270
Inserted records prior to starting point = 736 Commit = 1

1691513014.303 [170579] LOG: searching for logical decoding starting point, starting at 0/1B712A8
1691513014.303 [170579] LOG: logical decoding found initial starting point at 0/1B712A8
1691513014.547 [170579] LOG: FIND_DECODING_XLOG_RECORD_COUNT 736
Inserted records prior to starting point = 272 Commit = 1

1691513014.314 [170581] LOG: searching for logical decoding starting point, starting at 0/1B72150
1691513014.314 [170581] LOG: logical decoding found initial starting point at 0/1B72198
1691513014.547 [170581] LOG: FIND_DECODING_XLOG_RECORD_COUNT 684
Inserted records prior to starting point = 319 Commit = 1

V26
=======================================================================================================
1691512940.792 [170167] LOG: searching for logical decoding starting point, starting at 0/1778270
1691512940.823 [170167] LOG: logical decoding found consistent point at 0/177CE20
1691512940.823 [170167] LOG: FIND_DECODING_XLOG_RECORD_COUNT 270
Inserted records prior to starting point = 734 Commit = 1

1691512944.308 [170206] LOG: searching for logical decoding starting point, starting at 0/1A329F8
1691512944.375 [170206] LOG: logical decoding found consistent point at 0/1A40230
1691512944.375 [170206] LOG: FIND_DECODING_XLOG_RECORD_COUNT 768
Inserted records prior to starting point = 282 Commit = 1

1691512946.684 [170228] LOG: searching for logical decoding starting point, starting at 0/1C03060
1691512946.750 [170228] LOG: logical decoding found consistent point at 0/1C0BB78
1691512946.750 [170228] LOG: FIND_DECODING_XLOG_RECORD_COUNT 495
Inserted records prior to starting point = 510 Commit = 1

1691512946.740 [170229] LOG: searching for logical decoding starting point, starting at 0/1C0B330
1691512946.750 [170229] LOG: logical decoding found consistent point at 0/1C0BB78
1691512946.750 [170229] LOG: FIND_DECODING_XLOG_RECORD_COUNT 31
Inserted records prior to starting point = 975 Commit = 1

1691512946.882 [170233] LOG: searching for logical decoding starting point, starting at 0/1C1E4B0
1691512946.987 [170233] LOG: logical decoding found consistent point at 0/1C2F270
1691512946.987 [170233] LOG: FIND_DECODING_XLOG_RECORD_COUNT 958
Inserted records prior to starting point = 48 Commit = 1