On 12/5/23 13:17, Amit Kapila wrote:
> ...
>> I was hopeful the global hash table would be an improvement, but that
>> doesn't seem to be the case. I haven't done much profiling yet, but I'd
>> guess most of the overhead is due to ReorderBufferQueueSequence()
>> starting and aborting a transaction in the non-transactional case. Which
>> is unfortunate, but I don't know if there's a way to optimize that.
>>
>
> Before discussing the alternative ideas you shared, let me try to
> clarify my understanding so that we are on the same page. I see two
> observations based on the testing and discussion we had (a) for
> non-transactional cases, the overhead observed is mainly due to
> starting/aborting a transaction for each change;

Yes, I believe that's true. See the attached profiles for nextval.sql
and nextval-40.sql from master and the optimized build (with the global
hash), and also a perf diff. I only include the top 1000 lines of each
profile; that should be enough.

  master    - current master without patches applied
  optimized - master + sequence decoding with global hash table

For nextval, there's almost no difference in the profile. Decoding the
other changes (inserts) is the dominant part, as we only log sequences
every 32 increments.

For nextval-40, the main increase is likely due to this part:

    |--11.09%--seq_decode
    |          |
    |          |--9.25%--ReorderBufferQueueSequence
    |          |          |
    |          |          |--3.56%--AbortCurrentTransaction
    |          |          |          |
    |          |          |           --3.53%--AbortSubTransaction
    |          |          |                     |
    |          |          |                     |--0.95%--AtSubAbort_Portals
    |          |          |                     |          |
    |          |          |                     |           --0.83%--hash_seq_search
    |          |          |                     |
    |          |          |                      --0.83%--ResourceOwnerReleaseInternal
    |          |          |
    |          |          |--2.06%--BeginInternalSubTransaction
    |          |          |          |
    |          |          |           --1.10%--CommitTransactionCommand
    |          |          |                     |
    |          |          |                      --1.07%--StartSubTransaction
    |          |          |
    |          |          |--1.28%--CleanupSubTransaction
    |          |          |          |
    |          |          |           --0.64%--AtSubCleanup_Portals
    |          |          |                     |
    |          |          |                      --0.55%--hash_seq_search
    |          |          |
    |          |           --0.67%--RelidByRelfilenumber

So yeah, that's the transaction stuff in ReorderBufferQueueSequence.

There's also a perf diff, comparing individual functions.

> (b) for transactional
> cases, we see overhead due to traversing all the top-level txns and
> check the hash table for each one to find whether change is
> transactional.
>

Not really, no. As I explained in my preceding e-mail, this check makes
almost no difference - I did expect it to matter, but it doesn't. And I
was a bit disappointed the global hash table didn't move the needle.

Most of the time is spent in:

    78.81%     0.00%  postgres  postgres  [.] DecodeCommit (inlined)
            |
            ---DecodeCommit (inlined)
               |
               |--72.65%--SnapBuildCommitTxn
               |          |
               |           --72.61%--SnapBuildBuildSnapshot
               |                     |
               |                      --72.09%--pg_qsort
               |                                |
               |                                |--66.24%--pg_qsort
               |                                |          |

And there's almost no difference between master and the build with
sequence decoding - see the attached diff-alter-sequence.perf, comparing
the two branches (perf diff -c delta-abs).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

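
As a rough illustration of the "we only log sequences every 32 increments" remark in the message above, here is a minimal Python sketch of a simplified model. It is not PostgreSQL code. The function name `sequence_wal_records` and the call counts are hypothetical, and the real bookkeeping in sequence.c differs in detail (for example, a record is also written on the first nextval() after a checkpoint). The only thing taken from PostgreSQL is the prefetch size of 32.

```python
# Simplified model (an assumption, not PostgreSQL source): each sequence
# WAL record "pre-logs" a batch of values, so only one nextval() call per
# batch produces a record that logical decoding has to process.

SEQ_LOG_VALS = 32  # batch size, per the "every 32 increments" remark


def sequence_wal_records(nextval_calls: int, log_vals: int = SEQ_LOG_VALS) -> int:
    """Count sequence WAL records produced by a run of nextval() calls."""
    records = 0
    remaining = 0  # values still covered by the last WAL record
    for _ in range(nextval_calls):
        if remaining == 0:
            # Nothing pre-logged is left: write a new WAL record.
            records += 1
            remaining = log_vals
        remaining -= 1
    return records


if __name__ == "__main__":
    for calls in (1, 32, 40, 1000):
        print(calls, "calls ->", sequence_wal_records(calls), "sequence WAL records")
```

Under this model, a workload that interleaves a single nextval() with inserts produces a sequence record only rarely, so decoding the inserts dominates. A workload that burns through more than 32 values at a time produces sequence records far more often relative to other changes, and each of those goes through the subtransaction start/abort path visible in the profile.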
alter-sequence-master.perf.gz
Description: application/gzip
alter-sequence-optimized.perf.gz
Description: application/gzip
diff-alter-sequence.perf.gz
Description: application/gzip
diff-nextval.perf.gz
Description: application/gzip
diff-nextval-40.perf.gz
Description: application/gzip
nextval-40-master.perf.gz
Description: application/gzip
nextval-40-optimized.perf.gz
Description: application/gzip
nextval-master.perf.gz
Description: application/gzip
nextval-optimized.perf.gz
Description: application/gzip