On 03/06/2018 06:23 AM, Yura Sokolov wrote:
> 05.03.2018 18:00, Tom Lane writes:
>> Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
>>> Snapshots are static (we don't really add new XIDs into existing ones,
>>> right?), so why don't we simply sort the XIDs and then use bsearch to
>>> lookup values? That should fix the linear search, without need for any
>>> local hash table.
>>
>> +1 for looking into that, since it would avoid adding any complication
>> to snapshot copying.  In principle, with enough XIDs in the snap, an
>> O(1) hash probe would be quicker than an O(log N) bsearch ... but I'm
>> dubious that we are often in the range where that would matter.
>> We do need to worry about the cost of snapshot copying, too.
>
> Sorting - is the first thing I've tried. Unfortunately, sorting
> itself eats too much cpu. Filling hash table is much faster.
>
I've been interested in the sort-based approach, so I've spent a bit of time hacking on it (patch attached). It's much less invasive than the hash table, but Yura is right that the hash table gives better results.

I've tried to measure the benefits using the same script I shared on Tuesday, but I kept getting really strange numbers. In fact, I've been unable to even reproduce the results I shared back then. After a bit of head-scratching I think the script is useless - it can't possibly generate snapshots with many XIDs, because all the update threads sleep for exactly the same time. The first one to sleep is the first one to wake up and commit, so most of the other XIDs are above xmax (and thus not included in the snapshot). I have no idea where the earlier numbers came from :-/

But with this insight, I quickly modified the script to also consume XIDs from another thread (which simply calls txid_current). With that I'm getting snapshots with as many XIDs as there are UPDATE clients (this time I actually checked that using gdb).

For a 60-second run the tps results look like this (see the attached chart, showing the same data):

    writers   master     hash     sort
    -----------------------------------
         16     1068     1057     1070
         32     1005      995     1033
         64     1064     1042     1117
        128     1058     1021     1051
        256      977     1004      928
        512      773      935      808
        768      576      815      670
       1000      413      752      573

The sort certainly does improve performance compared to master, but it's only about half of the hash-table improvement.

I don't know how much this simple workload resembles the YCSB benchmark, but I seem to recall YCSB touches individual rows. In that case the sort likely fares even worse, because pg_qsort is more expensive than building the hash table.

So I agree with Yura that the sort is not a viable alternative to the hash table in this case.

> Can I rely on snapshots being static? May be then there is no need
> in separate raw representation and hash table. I may try to fill hash
> table directly in GetSnapshotData instead of lazy filling. Though it
> will increase time under lock, so it is debatable and should be
> carefully measured.
>

Yes, I think you can rely on snapshots not being modified later - that's pretty much the definition of a snapshot. You may do the filling in GetSnapshotData, but you certainly can't do it in the for loop there: not only do we not want to add more work there, but at that point you don't know the number of XIDs yet, and we don't want to build hash tables for small numbers of XIDs.

One reason against building the hash table in GetSnapshotData is that we'd build it even when the snapshot is never queried, or when it is queried but we only need to check xmin/xmax.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
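For reference, the extra XID-consuming session described above can be as simple as a client calling txid_current() in a tight loop. Below is a minimal libpq sketch of that idea - an illustration only, not the actual script behind the numbers above (the connection string, file name and the lack of any pacing are made-up details):

/*
 * xid_burner.c - keep consuming XIDs so that concurrent snapshots see the
 * in-progress UPDATE transactions below xmax (illustrative sketch only).
 *
 * build: cc xid_burner.c -I$(pg_config --includedir) -L$(pg_config --libdir) -lpq
 */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	for (;;)
	{
		/* each call assigns (and commits) a new transaction ID */
		PGresult   *res = PQexec(conn, "SELECT txid_current()");

		if (PQresultStatus(res) != PGRES_TUPLES_OK)
		{
			fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
			PQclear(res);
			break;
		}
		PQclear(res);
	}

	PQfinish(conn);
	return 0;
}

The point is simply to push the XID counter past the in-progress UPDATE transactions, so their XIDs fall below xmax and therefore end up in the snapshot's xip[] array.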
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index afe1c03..b2dae85 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1758,6 +1758,8 @@ GetSnapshotData(Snapshot snapshot)
 	snapshot->active_count = 0;
 	snapshot->regd_count = 0;
 	snapshot->copied = false;
+	snapshot->xip_sorted = false;
+	snapshot->subxip_sorted = false;
 
 	if (old_snapshot_threshold < 0)
 	{
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index f7c4c91..f155267 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -1461,6 +1461,45 @@ HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
 	return TransactionIdPrecedes(HeapTupleHeaderGetRawXmax(tuple), OldestXmin);
 }
 
+static int
+cmp_transaction_id(const void *a, const void *b)
+{
+	return memcmp(a, b, sizeof(TransactionId));
+}
+
+/*
+ * XidInXip searches xid in xip array.
+ *
+ * When xcnt is smaller than SnapshotSortThreshold, then the array is unsorted
+ * and we can simply do a linear search. If there are more elements, the array
+ * is sorted, and we can do a bsearch.
+ */
+static bool
+XidInXip(TransactionId xid, TransactionId *xip, uint32 xcnt, bool *sorted)
+{
+	if (!TransactionIdIsNormal(xid))
+		return false;
+
+	if (xcnt < SnapshotSortThreshold)
+	{
+		uint32		i;
+
+		/* linear scan for small snapshots */
+		for (i = 0; i < xcnt; i++)
+			if (TransactionIdEquals(xid, xip[i]))
+				return true;
+		return false;
+	}
+
+	if (!(*sorted))
+	{
+		pg_qsort(xip, xcnt, sizeof(TransactionId), cmp_transaction_id);
+		*sorted = true;
+	}
+
+	return (bsearch(&xid, xip, xcnt, sizeof(TransactionId), cmp_transaction_id) != NULL);
+}
+
 /*
  * XidInMVCCSnapshot
  *		Is the given XID still-in-progress according to the snapshot?
@@ -1474,8 +1513,6 @@ HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
 bool
 XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 {
-	uint32		i;
-
 	/*
 	 * Make a quick range check to eliminate most XIDs without looking at the
 	 * xip arrays.  Note that this is OK even if we convert a subxact XID to
@@ -1507,13 +1544,8 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 		if (!snapshot->suboverflowed)
 		{
 			/* we have full data, so search subxip */
-			int32		j;
-
-			for (j = 0; j < snapshot->subxcnt; j++)
-			{
-				if (TransactionIdEquals(xid, snapshot->subxip[j]))
-					return true;
-			}
+			if (XidInXip(xid, snapshot->subxip, snapshot->subxcnt, &snapshot->subxip_sorted))
+				return true;
 
 			/* not there, fall through to search xip[] */
 		}
@@ -1534,16 +1566,12 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 				return false;
 		}
 
-		for (i = 0; i < snapshot->xcnt; i++)
-		{
-			if (TransactionIdEquals(xid, snapshot->xip[i]))
-				return true;
-		}
+
+		if (XidInXip(xid, snapshot->xip, snapshot->xcnt, &snapshot->xip_sorted))
+			return true;
 	}
 	else
 	{
-		int32		j;
-
 		/*
 		 * In recovery we store all xids in the subxact array because it is by
 		 * far the bigger array, and we mostly don't know which xids are
@@ -1573,11 +1601,8 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 		 * indeterminate xid.  We don't know whether it's top level or subxact
 		 * but it doesn't matter.  If it's present, the xid is visible.
 		 */
-		for (j = 0; j < snapshot->subxcnt; j++)
-		{
-			if (TransactionIdEquals(xid, snapshot->subxip[j]))
-				return true;
-		}
+		if (XidInXip(xid, snapshot->subxip, snapshot->subxcnt, &snapshot->subxip_sorted))
+			return true;
 	}
 
 	return false;
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index a8a5a8f..85301c7 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -25,6 +25,11 @@ typedef struct SnapshotData *Snapshot;
 #define InvalidSnapshot		((Snapshot) NULL)
 
 /*
+ * SnapshotSortThreshold ... TODO comment
+ */
+#define SnapshotSortThreshold	60
+
+/*
  * We use SnapshotData structures to represent both "regular" (MVCC)
  * snapshots and "special" snapshots that have non-MVCC semantics.
  * The specific semantics of a snapshot are encoded by the "satisfies"
@@ -78,6 +83,7 @@ typedef struct SnapshotData
 	 */
 	TransactionId *xip;
 	uint32		xcnt;			/* # of xact ids in xip[] */
+	bool		xip_sorted;		/* is xip array sorted */
 
 	/*
 	 * For non-historic MVCC snapshots, this contains subxact IDs that are in
@@ -94,6 +100,7 @@ typedef struct SnapshotData
 
 	bool		takenDuringRecovery;	/* recovery-shaped snapshot? */
 	bool		copied;			/* false if it's a static snapshot */
+	bool		subxip_sorted;	/* is subxip array sorted */
 
 	CommandId	curcid;			/* in my xact, CID < curcid are visible */
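A side note on the comparator in the patch above: cmp_transaction_id() compares XIDs with memcmp(), which on little-endian machines does not produce numeric XID order. That is fine for this purpose, because qsort() and bsearch() only require that both use the same consistent total order, and we only care about exact matches. A tiny standalone demo of that point (plain uint32 values standing in for TransactionId; illustrative only, not part of the patch):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* byte-wise comparison, same idea as cmp_transaction_id() in the patch */
static int
cmp_u32_bytes(const void *a, const void *b)
{
	return memcmp(a, b, sizeof(uint32_t));
}

int
main(void)
{
	uint32_t	xids[] = {258, 7, 65536, 513, 100000, 42};
	size_t		n = sizeof(xids) / sizeof(xids[0]);
	uint32_t	present = 513;
	uint32_t	missing = 512;

	/* sort and probe with the same comparator, as the patch does */
	qsort(xids, n, sizeof(uint32_t), cmp_u32_bytes);

	printf("513 found: %s\n",
		   bsearch(&present, xids, n, sizeof(uint32_t), cmp_u32_bytes) ? "yes" : "no");
	printf("512 found: %s\n",
		   bsearch(&missing, xids, n, sizeof(uint32_t), cmp_u32_bytes) ? "yes" : "no");

	return 0;
}

On little-endian hardware the sorted array is not in ascending numeric order, yet both lookups behave as expected, which is all XidInXip needs.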
xid-stats.tgz
Description: application/compressed-tar