Re: logical decoding and replication of sequences, take 2

Tomas Vondra Tue, 05 Dec 2023 08:54:08 -0800

On 12/5/23 13:17, Amit Kapila wrote:
> ...
>> I was hopeful the global hash table would be an improvement, but that
>> doesn't seem to be the case. I haven't done much profiling yet, but I'd
>> guess most of the overhead is due to ReorderBufferQueueSequence()
>> starting and aborting a transaction in the non-transactional case. Which
>> is unfortunate, but I don't know if there's a way to optimize that.
>>
> 
> Before discussing the alternative ideas you shared, let me try to
> clarify my understanding so that we are on the same page. I see two
> observations based on the testing and discussion we had (a) for
> non-transactional cases, the overhead observed is mainly due to
> starting/aborting a transaction for each change;


Yes, I believe that's true. See the attached profiles for nextval.sql
and nextval-40.sql from the master and optimized builds (the latter with
the global hash table), and also the perf diffs. I only include the top
1000 lines of each profile, which should be enough.

master - current master without patches applied
optimized - master + sequence decoding with global hash table

For nextval, there's almost no difference in the profile. Decoding the
other changes (inserts) is the dominant part, as we only log sequences
every 32 increments.
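
To illustrate why sequence changes are rare in the nextval profile, here
is a minimal C sketch of the "one WAL record per 32 increments" behavior.
It is a simplified model, not the server code: PRELOGGED_VALUES and
count_sequence_wal_records() are hypothetical names, and the real
bookkeeping (cache size, first call, checkpoints) differs in details.

```c
#include <assert.h>

/* Assumption: one sequence WAL record covers this many future values. */
#define PRELOGGED_VALUES 32

/*
 * Simplified model: count the WAL records written by `calls` consecutive
 * nextval() calls on a single sequence.
 */
long
count_sequence_wal_records(long calls)
{
    long remaining = 0;     /* values still covered by the last record */
    long records = 0;

    assert(calls >= 0);

    for (long i = 0; i < calls; i++)
    {
        if (remaining == 0)
        {
            records++;                      /* write one WAL record ... */
            remaining = PRELOGGED_VALUES;   /* ... covering 32 values */
        }
        remaining--;                        /* hand out one value */
    }

    return records;
}
```

In this model 3200 nextval() calls produce only 100 sequence records, so
the decoder mostly sees the other changes (inserts).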

For nextval-40, the main increase is likely due to this part:

  |--11.09%--seq_decode
  |     |
  |     |--9.25%--ReorderBufferQueueSequence
  |     |     |
  |     |     |--3.56%--AbortCurrentTransaction
  |     |     |    |
  |     |     |     --3.53%--AbortSubTransaction
  |     |     |        |
  |     |     |        |--0.95%--AtSubAbort_Portals
  |     |     |        |          |
  |     |     |        |           --0.83%--hash_seq_search
  |     |     |        |
  |     |     |         --0.83%--ResourceOwnerReleaseInternal
  |     |     |
  |     |     |--2.06%--BeginInternalSubTransaction
  |     |     |          |
  |     |     |           --1.10%--CommitTransactionCommand
  |     |     |                     |
  |     |     |                      --1.07%--StartSubTransaction
  |     |     |
  |     |     |--1.28%--CleanupSubTransaction
  |     |     |          |
  |     |     |           --0.64%--AtSubCleanup_Portals
  |     |     |                     |
  |     |     |                      --0.55%--hash_seq_search
  |     |     |
  |     |      --0.67%--RelidByRelfilenumber

So yeah, that's the transaction stuff in ReorderBufferQueueSequence.
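
As a rough worked check of that claim, the C sketch below adds up the
direct children of ReorderBufferQueueSequence from the call graph above.
The numbers are copied from the profile; the struct and function names are
hypothetical, and treating the first three children as "transaction
handling" is simply my reading of the call graph.

```c
#include <assert.h>
#include <stddef.h>

struct child_cost
{
    const char *name;
    double      pct;        /* percentage points of all samples */
    int         is_xact;    /* 1 if this child is transaction handling */
};

/* Direct children of ReorderBufferQueueSequence in the profile above. */
static const struct child_cost children[] = {
    {"AbortCurrentTransaction",     3.56, 1},
    {"BeginInternalSubTransaction", 2.06, 1},
    {"CleanupSubTransaction",       1.28, 1},
    {"RelidByRelfilenumber",        0.67, 0},
};

/* Percentage points spent in transaction start/abort/cleanup. */
double
xact_points(void)
{
    double sum = 0.0;

    for (size_t i = 0; i < sizeof(children) / sizeof(children[0]); i++)
        if (children[i].is_xact)
            sum += children[i].pct;

    return sum;
}

/* Fraction of the parent's samples spent in transaction handling. */
double
xact_fraction(double parent_pct)
{
    assert(parent_pct > 0.0);
    return xact_points() / parent_pct;
}
```

With the parent at 9.25%, xact_fraction(9.25) comes out at roughly 0.75,
i.e. about three quarters of the time in ReorderBufferQueueSequence.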

There's also a perf diff, comparing individual functions.

> (b) for transactional
> cases, we see overhead due to traversing all the top-level txns and
> checking the hash table for each one to find whether the change is
> transactional.
> 

Not really, no. As I explained in my preceding e-mail, this check makes
almost no difference - I did expect it to matter, but it doesn't. And I
was a bit disappointed the global hash table didn't move the needle.

Most of the time is spent in:

    78.81%     0.00%  postgres  postgres  [.] DecodeCommit (inlined)
      |
      ---DecodeCommit (inlined)
         |
         |--72.65%--SnapBuildCommitTxn
         |     |
         |      --72.61%--SnapBuildBuildSnapshot
         |            |
         |             --72.09%--pg_qsort
         |                    |
         |                    |--66.24%--pg_qsort
         |                    |          |

And there's almost no difference between master and the build with
sequence decoding - see the attached diff-alter-sequence.perf, comparing
the two branches (perf diff -c delta-abs).
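
For context on why pg_qsort dominates under SnapBuildBuildSnapshot, here
is a minimal C sketch of the pattern the profile suggests: each commit
builds a new snapshot by copying the array of committed XIDs and sorting
the copy. The types and names (xid_t, build_sorted_xip) are hypothetical
stand-ins, and it uses the standard qsort() rather than pg_qsort; it is
only meant to show where the per-commit O(n log n) cost comes from.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t xid_t;     /* stand-in for a transaction ID */

/* Plain numeric comparison of two XIDs, for qsort(). */
static int
xid_cmp(const void *a, const void *b)
{
    xid_t xa = *(const xid_t *) a;
    xid_t xb = *(const xid_t *) b;

    return (xa > xb) - (xa < xb);
}

/*
 * Build the XID array for a new snapshot: copy the committed XIDs and
 * sort the copy. Returns a malloc'd array the caller must free, or NULL
 * if n is 0 or allocation fails. Doing this on every commit costs
 * O(n log n) per commit.
 */
xid_t *
build_sorted_xip(const xid_t *committed, size_t n)
{
    xid_t *xip;

    assert(committed != NULL || n == 0);

    if (n == 0)
        return NULL;

    xip = malloc(n * sizeof(xid_t));
    if (xip == NULL)
        return NULL;

    memcpy(xip, committed, n * sizeof(xid_t));
    qsort(xip, n, sizeof(xid_t), xid_cmp);

    return xip;
}
```

That cost is paid at commit time whether or not sequence decoding is
involved, which fits the two branches showing nearly identical profiles.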


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

alter-sequence-master.perf.gz
Description: application/gzip

alter-sequence-optimized.perf.gz
Description: application/gzip

diff-alter-sequence.perf.gz
Description: application/gzip

diff-nextval.perf.gz
Description: application/gzip

diff-nextval-40.perf.gz
Description: application/gzip

nextval-40-master.perf.gz
Description: application/gzip

nextval-40-optimized.perf.gz
Description: application/gzip

nextval-master.perf.gz
Description: application/gzip

nextval-optimized.perf.gz
Description: application/gzip
