On Fri, 11 Aug 2023 at 16:26, vignesh C <vignes...@gmail.com> wrote:
>
> On Wed, 9 Aug 2023 at 09:51, vignesh C <vignes...@gmail.com> wrote:
> >
> > Hi Melih,
> >
> > Here is a patch to help in getting the execution at various phases
> > like: a) replication slot creation time, b) Wal reading c) Number of
> > WAL records read d) subscription relation state change etc
> > A couple of observations while we tested with this patch:
> > 1) We noticed that the patch takes more time to find the decoding
> > start point.
> > 2) Another observation was that the number of XLOG records read to
> > identify the consistent point was significantly higher with the
> > v26-0001 patch.
> >
> > HEAD
> > postgres=# select avg(counttime)/1000 "avgtime(ms)",
> > median(counttime)/1000 "median(ms)", min(counttime)/1000
> > "mintime(ms)", max(counttime)/1000 "maxtime(ms)", logtype from test
> > group by logtype;
> >       avgtime(ms)       |       median(ms)       | mintime(ms) | maxtime(ms) |         logtype
> > ------------------------+------------------------+-------------+-------------+--------------------------
> >  0.00579245283018867920 | 0.00200000000000000000 |           0 |           1 | SNAPSHOT_BUILD
> >      1.2246811320754717 | 0.98550000000000000000 |           0 |          37 | LOGICAL_SLOT_CREATION
> >    171.0863283018867920 |   183.9120000000000000 |           0 |         408 | FIND_DECODING_STARTPOINT
> >      2.0699433962264151 |     1.4380000000000000 |           1 |          49 | INIT_DECODING_CONTEXT
> > (4 rows)
> >
> > HEAD + v26-0001 patch
> > postgres=# select avg(counttime)/1000 "avgtime(ms)",
> > median(counttime)/1000 "median(ms)", min(counttime)/1000
> > "mintime(ms)", max(counttime)/1000 "maxtime(ms)", logtype from test
> > group by logtype;
> >       avgtime(ms)       |       median(ms)       | mintime(ms) | maxtime(ms) |         logtype
> > ------------------------+------------------------+-------------+-------------+--------------------------
> >  0.00588113207547169810 | 0.00500000000000000000 |           0 |           0 | SNAPSHOT_BUILD
> >      1.1270962264150943 |     1.1000000000000000 |           0 |           2 | LOGICAL_SLOT_CREATION
> >    301.1745528301886790 |   410.4870000000000000 |           0 |         427 | FIND_DECODING_STARTPOINT
> >      1.4814660377358491 |     1.4530000000000000 |           1 |           9 | INIT_DECODING_CONTEXT
> > (4 rows)
> >
> > In the above, FIND_DECODING_STARTPOINT is significantly higher with
> > the v26-0001 patch.
> >
> > HEAD
> > FIND_DECODING_XLOG_RECORD_COUNT
> > - average =  2762
> > - median = 3362
> >
> > HEAD + reuse worker patch (v26-0001 patch)
> > FIND_DECODING_XLOG_RECORD_COUNT
> > - average =  4105
> > - median = 5345
> >
> > Similarly, the number of XLOG records read is higher with the v26-0001 patch.
> >
> > Steps to calculate the timing:
> > -- first collect the necessary LOG from subscriber's log.
> > cat *.log | grep -E
> > '(LOGICAL_SLOT_CREATION|INIT_DECODING_CONTEXT|FIND_DECODING_STARTPOINT|SNAPSHOT_BUILD|FIND_DECODING_XLOG_RECORD_COUNT|LOGICAL_XLOG_READ|LOGICAL_DECODE_PROCESS_RECORD|LOGICAL_WAIT_TRANSACTION)'
> > > grep.dat
> >
> > create table testv26(logtime varchar, pid varchar, level varchar,
> > space varchar, logtype varchar, counttime int);
> > -- then copy this data into a db table to compute the averages.
> > COPY testv26 FROM '/home/logs/grep.dat' DELIMITER ' ';
> >
> > -- Finally, use the SQL to analyze the data:
> > select avg(counttime)/1000 "avgtime(ms)", logtype from testv26 group by 
> > logtype;
> >
> > --- To get the number of xlog records read:
> > select avg(counttime) from testv26 where logtype
> > ='FIND_DECODING_XLOG_RECORD_COUNT' and counttime != 1;
> >
> > Thanks to Peter and Hou-san who helped in finding these out. We are
> > analysing this in parallel; @Melih Mutlu, posting this information so
> > that it might help you too in analysing this issue.
>
> I analysed further why a larger number of XLOG records needs to be
> read in some cases while creating the replication slot; here are my
> thoughts:
> Note: The tablesync worker needs to connect to the publisher and
> create a consistent point for the slot by reading the XLOG records.
> This requires that all the transactions that are open, or that are
> started while creating the consistent point, be committed.
> I feel the creation of slots is better in a few cases on HEAD because:
> Publisher                 | Subscriber
> --------------------------------------------------------------
> Begin txn1 transaction    |
> Insert 1..1000 records    |
> Commit                    |
> Begin txn2 transaction    |
> Insert 1..1000 records    | Apply worker applies transaction txn1
>                           | Start tablesync of table t2
>                           | Create consistent point in publisher
>                           | before transaction txn3 is started
> Commit                    | We just need to wait till
>                           | transaction txn2 is finished.
> Begin txn3 transaction    |
> Insert 1..1000 records    |
> Commit                    |
>
> In V26, this is happening in some cases:
> Publisher                 | Subscriber
> --------------------------------------------------------------
> Begin txn1 transaction    |
> Insert 1..1000 records    |
> Commit                    |
> Begin txn2 transaction    |
> Insert 1..1000 records    | Apply worker applies transaction txn1
>                           | Start tablesync of table t2
> Commit                    | Create consistent point
> Begin txn3 transaction    | (since transaction txn2 is committed
>                           | and txn3 is started, we will need to
>                           | wait for transaction txn3 to be
>                           | committed)
> Insert 1..1000 records    |
> Commit                    |
>
> This is because in HEAD the tablesync worker is started after one
> commit, so we are able to create the consistent point before a new
> transaction is started in some cases.
> Create slot will be fastest if the tablesync worker is able to connect
> to the publisher and create a consistent point before the new
> transaction is started. The probability of this is better in HEAD for
> this scenario, as the new tablesync worker is started after a commit
> and so has a better time window (because the current transaction has
> just started) before another new transaction is started. This
> probability is slightly lower with the V26 version.
> I felt this issue is purely a timing issue in a few cases, caused by
> the timing of new transactions being created while creating the slot.

I used the following steps to analyse this issue:
Logs can be captured by applying the patches at [1].

-- first collect the necessary information from the publisher's log
from the execution of HEAD:
cat *.log | grep FIND_DECODING_XLOG_RECORD_COUNT > grep_head.dat

-- similarly collect the necessary information from the publisher's log
from the execution of v26:
cat *.log | grep FIND_DECODING_XLOG_RECORD_COUNT > grep_v26.dat

-- then copy this data into HEAD's db table to compute the average.
COPY test_head FROM '/home/logs/grep_head.dat' DELIMITER ' ';

-- then copy this data into the v26 db table to compute the average.
COPY test_v26 FROM '/home/logs/grep_v26.dat' DELIMITER ' ';

Find the average of XLOG records read in HEAD:
postgres=# select avg(counttime) from test_head  where logtype
='FIND_DECODING_XLOG_RECORD_COUNT' and counttime != 1;
          avg
-----------------------
 1394.1100000000000000
(1 row)

Find the average of XLOG records read in V26:
postgres=# select avg(counttime) from test_v26 where logtype
='FIND_DECODING_XLOG_RECORD_COUNT' and counttime != 1;
          avg
-----------------------
 1900.4100000000000000
(1 row)
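For quick iteration, the whole grep/COPY/avg pipeline above can be
collapsed into one awk command. This is only a sketch assuming the
log-line format shown in the attachment ("<time> [<pid>] LOG:
FIND_DECODING_XLOG_RECORD_COUNT <n>"); the sample lines are inlined
here in place of "cat grep_head.dat" so it runs standalone:

```shell
# Average FIND_DECODING_XLOG_RECORD_COUNT, skipping counts of 1 as in
# the SQL above. $NF is the trailing record count on each matched line.
printf '%s\n' \
  '1691513007.357 [170437] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 911' \
  '1691513007.361 [170439] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 859' \
  '1691513007.863 [170449] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 1' |
awk '/FIND_DECODING_XLOG_RECORD_COUNT/ && $NF != 1 { sum += $NF; n++ }
     END { if (n) printf "%.2f\n", sum / n }'
# prints 885.00
```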

When analysing why create replication slot needs to read more records
in a few cases, I found a very interesting observation: with HEAD,
about 29% of tables (29 out of 100) could find the consistent point by
reading the WAL records up to the next subsequent COMMIT, whereas with
the V26 patch only 5% of tables could do so. In the remaining cases
the V26 patch had to read through another transaction of
approximately 1000 or more WAL records to reach the consistent point,
which increases the average number of records read with the V26
version. To verify this, I got the start LSN and consistent LSN from
the log files by matching the corresponding
FIND_DECODING_XLOG_RECORD_COUNT, did a waldump of the WAL file,
searched the records between the start LSN and the consistent LSN in
the WAL dump, and confirmed that only one COMMIT record had to be read
to reach the consistent point. Details of this information from the
logs of HEAD and V26 are attached.

The number of tables that reached the consistent point within less
than one commit can be found as follows:
-- I checked for 1000 WAL records because each transaction has 1000
inserts.
select count(counttime) from test_head  where logtype
='FIND_DECODING_XLOG_RECORD_COUNT' and counttime < 1000;
 count
-------
    29
(1 row)

select count(counttime) from test_v26  where logtype
='FIND_DECODING_XLOG_RECORD_COUNT' and counttime < 1000;
 count
-------
    5
(1 row)
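The same counting can also be done directly on the grepped log files
without loading them into a table. The awk sketch below is an
assumption-based equivalent of the SQL above (counts below the
1000-inserts-per-transaction threshold); the sample lines are inlined
so it runs standalone, in place of reading grep_head.dat:

```shell
# Count slot creations that read fewer than 1000 WAL records, i.e. that
# found the consistent point before replaying a full 1000-insert
# transaction. Sample lines stand in for "cat grep_head.dat".
printf '%s\n' \
  '1691513007.357 [170437] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 911' \
  '1691513008.390 [170457] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 1394' \
  '1691513007.863 [170449] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 23' |
awk '/FIND_DECODING_XLOG_RECORD_COUNT/ && $NF < 1000 { n++ }
     END { print n+0 }'
# prints 2
```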

Apart from these, there were other instances where V26 had to read
more COMMIT records in a few cases.
This is happening for the reason mentioned in [2]: in HEAD the
tablesync worker is started after one commit, so we are able to create
the consistent point before a new transaction is started in some
cases. Create slot will be fastest if the tablesync worker is able to
connect to the publisher and create a consistent point before the new
transaction is started. The probability of this is better in HEAD for
this scenario, as the new tablesync worker is started after a commit
and so has a better time window (because the current transaction has
just started) before another new transaction is started. This
probability is slightly lower with the V26 version.
Since this is purely a timing issue in a few cases, caused by the
timing of new transactions being created while creating the slot, I
felt we can ignore it.

[1] - 
https://www.postgresql.org/message-id/CALDaNm1TA068E2niJFUR9ig%2BYz3-ank%3Dj5%3Dj-2UocbzaDnQPrA%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CALDaNm2k2z3Hpa3Omb_tpxWkyHnUvsWjJMbqDs-2uD2eLzemJQ%40mail.gmail.com

Regards,
Vignesh
HEAD
====================================================================================================
1691513006.983 [170437] LOG:  searching for logical decoding starting point, 
starting at 0/161D0C8
1691513007.357 [170437] LOG:  logical decoding found consistent point at 
0/162D140
1691513007.357 [170437] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 911
Inserted records prior to starting point = 98 
Commit = 1

1691513006.996 [170439] LOG:  searching for logical decoding starting point, 
starting at 0/161DF58
1691513007.361 [170439] LOG:  logical decoding found consistent point at 
0/162D140
1691513007.361 [170439] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 859
Inserted records prior to starting point = 150 
Commit = 1

1691513007.694 [170447] LOG:  searching for logical decoding starting point, 
starting at 0/1689830
1691513007.862 [170447] LOG:  logical decoding found consistent point at 
0/1697250
1691513007.863 [170447] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 775
Inserted records prior to starting point = 233 
Commit = 1

1691513007.778 [170449] LOG:  searching for logical decoding starting point, 
starting at 0/1696C48 - 41234, prev comit = 40249
1691513007.863 [170449] LOG:  logical decoding found consistent point at 
0/1697250
1691513007.863 [170449] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 23 
Inserted records prior to starting point = 984 
Commit = 1


1691513008.177 [170457] LOG:  searching for logical decoding starting point, 
starting at 0/16E1DA0
1691513008.389 [170457] LOG:  logical decoding found consistent point at 
0/16EF840
1691513008.390 [170457] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 777
Inserted records prior to starting point = 230 
Commit = 1

1691513008.183 [170455] LOG:  searching for logical decoding starting point, 
starting at 0/16E3398
1691513008.389 [170455] LOG:  logical decoding found consistent point at 
0/16EF840
1691513008.390 [170455] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 699
Inserted records prior to starting point = 308 
Commit = 1

1691513008.253 [170459] LOG:  searching for logical decoding starting point, 
starting at 0/16EF478 - 46258, prev comit = 45266
1691513008.389 [170459] LOG:  logical decoding found consistent point at 
0/16EF840
1691513008.390 [170459] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 15 
Inserted records prior to starting point = 992 
Commit = 1

1691513009.724 [170489] LOG:  searching for logical decoding starting point, 
starting at 0/1800778
1691513009.805 [170489] LOG:  logical decoding found consistent point at 
0/180A668
1691513009.805 [170489] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 565
Inserted records prior to starting point = 442 
Commit = 1

1691513009.920 [170493] LOG:  searching for logical decoding starting point, 
starting at 0/181CA80
1691513009.920 [170493] LOG:  logical decoding found initial starting point at 
0/181CB10
1691513010.025 [170493] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 973
Inserted records prior to starting point = 34
Commit = 1

1691513010.020 [170496] LOG:  searching for logical decoding starting point, 
starting at 0/182BDD0
1691513010.026 [170496] LOG:  logical decoding found consistent point at 
0/182DC68
1691513010.026 [170496] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 110
Inserted records prior to starting point = 896
Commit = 1

1691513010.042 [170497] LOG:  searching for logical decoding starting point, 
starting at 0/182DBF0
1691513010.042 [170497] LOG:  logical decoding found consistent point at 
0/182DC68
1691513010.042 [170497] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 3 
Inserted records prior to starting point = 999 
Commit = 1

1691513011.173 [170511] LOG:  searching for logical decoding starting point, 
starting at 0/190BD08
1691513011.292 [170511] LOG:  logical decoding found consistent point at 
0/19136B0
1691513011.292 [170511] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 433
Inserted records prior to starting point = 569 
Commit = 1

1691513011.183 [170513] LOG:  searching for logical decoding starting point, 
starting at 0/190E170
1691513011.293 [170513] LOG:  logical decoding found consistent point at 
0/19136B0
1691513011.293 [170513] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 304
Inserted records prior to starting point = 699 
Commit = 1

1691513011.163 [170509] LOG:  searching for logical decoding starting point, 
starting at 0/190B548
1691513011.293 [170509] LOG:  logical decoding found consistent point at 
0/19136B0
1691513011.293 [170509] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 461
Inserted records prior to starting point = 543 
Commit = 1

1691513011.594 [170518] LOG:  searching for logical decoding starting point, 
starting at 0/1963050
1691513011.667 [170518] LOG:  logical decoding found consistent point at 
0/196BC30
1691513011.667 [170518] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 498
Inserted records prior to starting point = 505 
Commit = 1

1691513011.624 [170521] LOG:  searching for logical decoding starting point, 
starting at 0/1968D48
1691513011.667 [170521] LOG:  logical decoding found consistent point at 
0/196BC30
1691513011.667 [170521] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 168
Inserted records prior to starting point = 835 
Commit = 1

1691513011.615 [170519] LOG:  searching for logical decoding starting point, 
starting at 0/1966AA8
1691513011.667 [170519] LOG:  logical decoding found consistent point at 
0/196BC30
1691513011.667 [170519] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 291
Inserted records prior to starting point = 713 
Commit = 1

1691513012.704 [170541] LOG:  searching for logical decoding starting point, 
starting at 0/1A440A8
1691513012.771 [170541] LOG:  logical decoding found consistent point at 
0/1A518A8
1691513012.771 [170541] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 768
Inserted records prior to starting point = 236 
Commit = 1

1691513012.737 [170543] LOG:  searching for logical decoding starting point, 
starting at 0/1A4D760
1691513012.783 [170543] LOG:  logical decoding found consistent point at 
0/1A518A8
1691513012.783 [170543] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 233
Inserted records prior to starting point = 772 
Commit = 1

1691513012.992 [170549] LOG:  searching for logical decoding starting point, 
starting at 0/1A770E0
1691513013.083 [170549] LOG:  logical decoding found consistent point at 
0/1A86978
1691513013.083 [170549] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 883
Inserted records prior to starting point = 122 
Commit = 1

1691513013.029 [170551] LOG:  searching for logical decoding starting point, 
starting at 0/1A781C8
1691513013.095 [170551] LOG:  logical decoding found consistent point at 
0/1A86978
1691513013.111 [170551] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 823
Inserted records prior to starting point = 182
Commit = 1

1691513013.261 [170555] LOG:  searching for logical decoding starting point, 
starting at 0/1AA8198
1691513013.407 [170555] LOG:  logical decoding found consistent point at 
0/1AA9F40
1691513013.407 [170555] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 107
Inserted records prior to starting point = 897
Commit = 1

1691513013.928 [170569] LOG:  searching for logical decoding starting point, 
starting at 0/1B2EB30
1691513013.969 [170569] LOG:  logical decoding found consistent point at 
0/1B375C8
1691513013.969 [170569] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 493
Inserted records prior to starting point = 512
Commit = 1

1691513014.019 [170573] LOG:  searching for logical decoding starting point, 
starting at 0/1B3D5C8
1691513014.115 [170573] LOG:  logical decoding found consistent point at 
0/1B490F0
1691513014.115 [170573] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 665
Inserted records prior to starting point = 339
Commit = 1

1691513014.009 [170571] LOG:  searching for logical decoding starting point, 
starting at 0/1B393D8
1691513014.115 [170571] LOG:  logical decoding found consistent point at 
0/1B490F0
1691513014.115 [170571] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 899
Inserted records prior to starting point = 105
Commit = 1

1691513014.297 [170577] LOG:  searching for logical decoding starting point, 
starting at 0/1B6EDD8
1691513014.546 [170577] LOG:  logical decoding found consistent point at 
0/1B7E1C0
1691513014.546 [170577] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 867
Inserted records prior to starting point = 138
Commit = 1

1691513014.376 [170583] LOG:  searching for logical decoding starting point, 
starting at 0/1B795F8
1691513014.546 [170583] LOG:  logical decoding found consistent point at 
0/1B7E1C0
1691513014.546 [170583] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 270
Inserted records prior to starting point = 736
Commit = 1

1691513014.303 [170579] LOG:  searching for logical decoding starting point, 
starting at 0/1B712A8
1691513014.303 [170579] LOG:  logical decoding found initial starting point at 
0/1B712A8
1691513014.547 [170579] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 736
Inserted records prior to starting point = 272
Commit = 1

1691513014.314 [170581] LOG:  searching for logical decoding starting point, 
starting at 0/1B72150
1691513014.314 [170581] LOG:  logical decoding found initial starting point at 
0/1B72198
1691513014.547 [170581] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 684
Inserted records prior to starting point = 319
Commit = 1

V26
=======================================================================================================
1691512940.792 [170167] LOG:  searching for logical decoding starting point, 
starting at 0/1778270
1691512940.823 [170167] LOG:  logical decoding found consistent point at 
0/177CE20
1691512940.823 [170167] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 270
Inserted records prior to starting point = 734
Commit = 1

1691512944.308 [170206] LOG:  searching for logical decoding starting point, 
starting at 0/1A329F8
1691512944.375 [170206] LOG:  logical decoding found consistent point at 
0/1A40230
1691512944.375 [170206] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 768
Inserted records prior to starting point = 282
Commit = 1

1691512946.684 [170228] LOG:  searching for logical decoding starting point, 
starting at 0/1C03060
1691512946.750 [170228] LOG:  logical decoding found consistent point at 
0/1C0BB78
1691512946.750 [170228] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 495
Inserted records prior to starting point = 510
Commit = 1

1691512946.740 [170229] LOG:  searching for logical decoding starting point, 
starting at 0/1C0B330
1691512946.750 [170229] LOG:  logical decoding found consistent point at 
0/1C0BB78
1691512946.750 [170229] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 31
Inserted records prior to starting point = 975
Commit = 1

1691512946.882 [170233] LOG:  searching for logical decoding starting point, 
starting at 0/1C1E4B0
1691512946.987 [170233] LOG:  logical decoding found consistent point at 
0/1C2F270
1691512946.987 [170233] LOG:  FIND_DECODING_XLOG_RECORD_COUNT 958
Inserted records prior to starting point = 48
Commit = 1
