Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
> On Sun, Nov 4, 2018 at 1:27 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Sun, 1 Apr 2018 at 19:58, Yura Sokolov wrote:
> > > I didn't change the serialized format, therefore there is no need to
> > > change SerializeSnapshot. But the in-memory representation was changed,
> > > so RestoreSnapshot is changed.
>
> This patch went through the last three commit fests without any noticeable
> activity, but cfbot says it still applies and doesn't break any tests.
> Taking into account the potential performance improvements, I believe it
> would be a pity to stop at this point.
>
> Yura, what are your plans for it? If I understand correctly, there are
> still some comments from the last few messages that were not answered. At
> the same time, can anyone from the active reviewers (Tomas, Amit) look at
> it to agree on what should be done to push it forward?

Due to the lack of response I'm marking it as "Returned with feedback".
Feel free to resubmit a new version, though.
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
> On Sun, 1 Apr 2018 at 19:58, Yura Sokolov wrote:
> > I didn't change the serialized format, therefore there is no need to
> > change SerializeSnapshot. But the in-memory representation was changed,
> > so RestoreSnapshot is changed.

This patch went through the last three commit fests without any noticeable
activity, but cfbot says it still applies and doesn't break any tests.
Taking into account the potential performance improvements, I believe it
would be a pity to stop at this point.

Yura, what are your plans for it? If I understand correctly, there are
still some comments from the last few messages that were not answered. At
the same time, can anyone from the active reviewers (Tomas, Amit) look at
it to agree on what should be done to push it forward?
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
23.03.2018 17:59, Amit Kapila writes:
> On Sat, Mar 10, 2018 at 7:41 AM, Yura Sokolov wrote:
>> 08.03.2018 03:42, Tomas Vondra writes:
>>> One reason against building the hash table in GetSnapshotData is that
>>> we'd build it even when the snapshot is never queried. Or when it is
>>> queried, but we only need to check xmin/xmax.
>>
>> Thank you for the analysis, Tomas.
>>
>> Stephen is right about the bug in snapmgr.c.
>> The attached version fixes the bug, and also simplifies XidInXip a bit.
>
> @@ -2167,8 +2175,7 @@ RestoreSnapshot(char *start_address)
>   	/* Copy SubXIDs, if present. */
>   	if (serialized_snapshot.subxcnt > 0)
>   	{
> -		snapshot->subxip = ((TransactionId *) (snapshot + 1)) +
> -			serialized_snapshot.xcnt;
> +		snapshot->subxip = ((TransactionId *) (snapshot + 1)) + xcnt;
>   		memcpy(snapshot->subxip, serialized_xids + serialized_snapshot.xcnt,
>   			   serialized_snapshot.subxcnt * sizeof(TransactionId));
>   	}
>
> It is not clear why you want to change this in RestoreSnapshot when
> nothing related is changed in SerializeSnapshot. Can you please add
> some comments to clarify it?

I didn't change the serialized format, therefore there is no need to change
SerializeSnapshot. But the in-memory representation was changed, so
RestoreSnapshot is changed.

With regards,
Sokolov Yura.
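The layout point is easiest to see in code. Below is a minimal compilable
sketch of the pointer math in that hunk; the stand-in struct and the
extended_xcnt parameter are illustrative, not the real definitions (those
live in src/include/utils/snapshot.h and in the patch itself):

    #include <stdint.h>

    typedef uint32_t TransactionId;

    /* Stand-in for SnapshotData, only to make (snapshot + 1) well-defined;
     * the real struct is in src/include/utils/snapshot.h. */
    typedef struct SnapshotData
    {
        TransactionId  xmin;
        TransactionId  xmax;
        TransactionId *xip;
        uint32_t       xcnt;
        /* ... remaining fields elided ... */
    } SnapshotData;

    /*
     * In-memory layout (one palloc block):
     *   [SnapshotData][xip: extended_xcnt slots, incl. reserved hash space][subxip ...]
     * The serialized form still stores only the original xcnt XIDs, so after
     * the patch the in-memory count (extended by ExtendXipSizeForHash) and
     * the serialized count differ, and subxip must be offset by the extended
     * count - which is exactly the hunk Amit asked about.
     */
    static TransactionId *
    subxip_start(SnapshotData *snapshot, uint32_t extended_xcnt)
    {
        return ((TransactionId *) (snapshot + 1)) + extended_xcnt;
    }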
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
17.03.2018 03:36, Tomas Vondra writes:
>
> On 03/17/2018 12:03 AM, Yura Sokolov wrote:
>> 16.03.2018 04:23, Tomas Vondra writes:
>>>
>>> ...
>>>
>>> OK, a few more comments.
>>>
>>> 1) The code in ExtendXipSizeForHash seems somewhat redundant with
>>> my_log2 (that is, we could just call the existing function).
>>
>> Yes, I could call my_log2 from ExtendXipSizeForHash. But wouldn't one
>> more call be more expensive than the loop itself?
>
> I very much doubt there would be a measurable difference. Firstly,
> function calls are not that expensive, otherwise we'd be running around
> and removing function calls from the hot paths. Secondly, the call only
> happens with many XIDs, and in that case the cost should be outweighed
> by faster lookups. And finally, the function does stuff that seems far
> more expensive than a single function call (for example allocation,
> which may easily trigger a malloc).
>
> In fact, in the interesting cases it's pretty much guaranteed to hit a
> malloc, because the number of backend processes needs to be high enough
> (say, 256 or more), which means
>
>     GetMaxSnapshotSubxidCount()
>
> will translate to something like
>
>     ((PGPROC_MAX_CACHED_SUBXIDS + 1) * PROCARRAY_MAXPROCS)
>     = (64 + 1) * 256
>     = 16640
>
> and because XIDs are 4B each, that's ~65kB of memory (even ignoring the
> ExtendXipSizeForHash business). And aset.c treats chunks larger than
> 8kB as separate blocks, which are always malloc-ed independently.
>
> But I may be missing something, and the extra call actually makes a
> difference. But I think the onus of proving that is on you, and the
> default should be not to duplicate code.
>
>>> 2) Apparently xhlog/subxhlog fields are used for two purposes - to
>>> store log2 of the two XID counts, and to remember if the hash table
>>> is built. That's a bit confusing (at least it should be explained
>>> in a comment) but most importantly it seems a bit unnecessary.
>>
>> Ok, I'll add a comment.
>>
>>> I assume it was done to save space, but I very much doubt that
>>> makes any difference. So we could easily keep a separate flag. I'm
>>> pretty sure there are holes in the SnapshotData struct, so we could
>>> squeeze the flags into one of those.
>>
>> There's a hole right after xhlog. But it would be a bit strange to
>> move fields around.
>
> Is SnapshotData really that sensitive to size increases? I have my
> doubts about that, TBH. The default stance should be to make the code
> easy to understand. That is, we should only move fields around if it
> actually makes a difference.
>
>>> But do we even need a separate flag? We could just as easily keep
>>> the log2 fields set to 0 and only set them after actually building
>>> the hash table.
>>
>> There is a need to signal that there is space for the hash. It is not
>> always allocated. IIRC, I didn't cover the case where the snapshot is
>> restored from a file, and some other place or two.
>> Only if all places where a snapshot is allocated are properly modified
>> to allocate space for the hash could the flag be omitted, and the log2
>> itself used as a flag.
>
> Hmmm, that makes it a bit inconsistent, I guess ... why not do the
> same thing in all those places?
>
>>> Or even better, why not store the mask so that XidInXip does not
>>> need to compute it over and over (granted, that's a uint32 instead
>>> of just a uint8, but I don't think SnapshotData is particularly
>>> sensitive to this, especially considering how much larger the xid
>>> hash table is).
>>
>> I don't like unnecessary increases of the struct size. And I doubt
>> that computation matters. I could be mistaken though, because it is a
>> hot place. Do you think it will significantly improve readability?
>
> IMHO the primary goal is to make the code easy to read and understand,
> and only optimize struct size if it actually makes a difference. We
> have no such proof here, and I very much doubt you'll be able to show
> any difference, because even with separate flags pahole says this:
>
>     struct SnapshotData {
>         SnapshotSatisfiesFunc satisfies;            /*  0   8 */
>         TransactionId         xmin;                 /*  8   4 */
>         TransactionId         xmax;                 /* 12   4 */
>         TransactionId        *xip;                  /* 16   8 */
>         uint32                xcnt;                 /* 24   4 */
>         uint8                 xhlog;                /* 28   1 */
>
>         /* XXX 3 bytes hole, try to pack */
>
>         TransactionId        *subxip;               /* 32   8 */
>         int32                 subxcnt;              /* 40   4 */
>         uint8                 subxhlog;             /* 44   1 */
>         bool                  suboverflowed;        /* 45   1 */
>         bool                  takenDuringRecovery;  /* 46   1 */
>         bool                  copied;               /* 47   1 */
>         CommandId             curcid;               /* 48   4 */
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
On 03/17/2018 12:03 AM, Yura Sokolov wrote:
> 16.03.2018 04:23, Tomas Vondra writes:
>>
>> ...
>>
>> OK, a few more comments.
>>
>> 1) The code in ExtendXipSizeForHash seems somewhat redundant with
>> my_log2 (that is, we could just call the existing function).
>
> Yes, I could call my_log2 from ExtendXipSizeForHash. But wouldn't one
> more call be more expensive than the loop itself?

I very much doubt there would be a measurable difference. Firstly,
function calls are not that expensive, otherwise we'd be running around
and removing function calls from the hot paths. Secondly, the call only
happens with many XIDs, and in that case the cost should be outweighed
by faster lookups. And finally, the function does stuff that seems far
more expensive than a single function call (for example allocation,
which may easily trigger a malloc).

In fact, in the interesting cases it's pretty much guaranteed to hit a
malloc, because the number of backend processes needs to be high enough
(say, 256 or more), which means

    GetMaxSnapshotSubxidCount()

will translate to something like

    ((PGPROC_MAX_CACHED_SUBXIDS + 1) * PROCARRAY_MAXPROCS)
    = (64 + 1) * 256
    = 16640

and because XIDs are 4B each, that's ~65kB of memory (even ignoring the
ExtendXipSizeForHash business). And aset.c treats chunks larger than 8kB
as separate blocks, which are always malloc-ed independently.

But I may be missing something, and the extra call actually makes a
difference. But I think the onus of proving that is on you, and the
default should be not to duplicate code.

>> 2) Apparently xhlog/subxhlog fields are used for two purposes - to
>> store log2 of the two XID counts, and to remember if the hash table
>> is built. That's a bit confusing (at least it should be explained
>> in a comment) but most importantly it seems a bit unnecessary.
>
> Ok, I'll add a comment.
>
>> I assume it was done to save space, but I very much doubt that
>> makes any difference. So we could easily keep a separate flag. I'm
>> pretty sure there are holes in the SnapshotData struct, so we could
>> squeeze the flags into one of those.
>
> There's a hole right after xhlog. But it would be a bit strange to
> move fields around.

Is SnapshotData really that sensitive to size increases? I have my
doubts about that, TBH. The default stance should be to make the code
easy to understand. That is, we should only move fields around if it
actually makes a difference.

>> But do we even need a separate flag? We could just as easily keep
>> the log2 fields set to 0 and only set them after actually building
>> the hash table.
>
> There is a need to signal that there is space for the hash. It is not
> always allocated. IIRC, I didn't cover the case where the snapshot is
> restored from a file, and some other place or two.
> Only if all places where a snapshot is allocated are properly modified
> to allocate space for the hash could the flag be omitted, and the log2
> itself used as a flag.

Hmmm, that makes it a bit inconsistent, I guess ... why not do the
same thing in all those places?

>> Or even better, why not store the mask so that XidInXip does not
>> need to compute it over and over (granted, that's a uint32 instead
>> of just a uint8, but I don't think SnapshotData is particularly
>> sensitive to this, especially considering how much larger the xid
>> hash table is).
>
> I don't like unnecessary increases of the struct size. And I doubt
> that computation matters. I could be mistaken though, because it is a
> hot place. Do you think it will significantly improve readability?

IMHO the primary goal is to make the code easy to read and understand,
and only optimize struct size if it actually makes a difference. We
have no such proof here, and I very much doubt you'll be able to show
any difference, because even with separate flags pahole says this:

    struct SnapshotData {
        SnapshotSatisfiesFunc satisfies;            /*  0   8 */
        TransactionId         xmin;                 /*  8   4 */
        TransactionId         xmax;                 /* 12   4 */
        TransactionId        *xip;                  /* 16   8 */
        uint32                xcnt;                 /* 24   4 */
        uint8                 xhlog;                /* 28   1 */

        /* XXX 3 bytes hole, try to pack */

        TransactionId        *subxip;               /* 32   8 */
        int32                 subxcnt;              /* 40   4 */
        uint8                 subxhlog;             /* 44   1 */
        bool                  suboverflowed;        /* 45   1 */
        bool                  takenDuringRecovery;  /* 46   1 */
        bool                  copied;               /* 47   1 */
        CommandId             curcid;               /* 48   4 */
        uint32                speculativeToken;     /* 52   4 */
        uint32                active_count;         /* 56   4 */
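The trade-off being debated here, recomputing the mask from xhlog on every
probe versus storing it ready-made in the struct, is small either way in
code terms. A sketch of the two variants, with hypothetical names and a toy
multiplicative hash standing in for whatever the patch actually uses:

    #include <stdint.h>

    typedef uint32_t TransactionId;

    /* Variant 1: keep only log2 (a uint8 in the struct) and recompute the
     * mask on every probe, as the patch does. */
    static inline uint32_t
    slot_from_xhlog(TransactionId xid, uint8_t xhlog)
    {
        uint32_t mask = (1u << xhlog) - 1;   /* recomputed per lookup */
        return (xid * 2654435761u) & mask;   /* toy hash step, illustrative */
    }

    /* Variant 2 (Tomas's suggestion): store the ready-made mask (a uint32)
     * in SnapshotData and skip the shift/subtract in the hot path. */
    static inline uint32_t
    slot_from_mask(TransactionId xid, uint32_t xhmask)
    {
        return (xid * 2654435761u) & xhmask;
    }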
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
16.03.2018 04:23, Tomas Vondra writes:
>
> On 03/10/2018 03:11 AM, Yura Sokolov wrote:
>> 08.03.2018 03:42, Tomas Vondra writes:
>>> On 03/06/2018 06:23 AM, Yura Sokolov wrote:
>>>> 05.03.2018 18:00, Tom Lane writes:
>>>>> Tomas Vondra writes:
>>>>>> Snapshots are static (we don't really add new XIDs into existing
>>>>>> ones, right?), so why don't we simply sort the XIDs and then use
>>>>>> bsearch to look up values? That should fix the linear search,
>>>>>> without need for any local hash table.
>>>>>
>>>>> +1 for looking into that, since it would avoid adding any
>>>>> complication to snapshot copying. In principle, with enough XIDs in
>>>>> the snap, an O(1) hash probe would be quicker than an O(log N)
>>>>> bsearch ... but I'm dubious that we are often in the range where
>>>>> that would matter. We do need to worry about the cost of snapshot
>>>>> copying, too.
>>>>
>>>> Sorting is the first thing I've tried. Unfortunately, sorting itself
>>>> eats too much CPU. Filling a hash table is much faster.
>>>
>>> I've been interested in the sort-based approach, so I've spent a bit
>>> of time hacking on it (patch attached). It's much less invasive
>>> compared to the hash table, but Yura is right that the hash table
>>> gives better results.
>>>
>>> I've tried to measure the benefits using the same script I shared on
>>> Tuesday, but I kept getting really strange numbers. In fact, I've
>>> been unable to even reproduce the results I shared back then. And
>>> after a bit of head-scratching I think the script is useless - it
>>> can't possibly generate snapshots with many XIDs, because all the
>>> update threads sleep for exactly the same time. And the first one to
>>> sleep is the first one to wake up and commit, so most of the other
>>> XIDs are above xmax (and thus not included in the snapshot). I have
>>> no idea why I got the numbers :-/
>>>
>>> But with this insight, I quickly modified the script to also consume
>>> XIDs in another thread (which simply calls txid_current). With that
>>> I'm getting snapshots with as many XIDs as there are UPDATE clients
>>> (this time I actually checked that using gdb).
>>>
>>> And for a 60-second run the tps results look like this (see the
>>> attached chart, showing the same data):
>>>
>>>     writers   master   hash   sort
>>>     ------------------------------
>>>     16          1068   1057   1070
>>>     32          1005    995   1033
>>>     64          1064   1042   1117
>>>     128         1058   1021   1051
>>>     256          977   1004    928
>>>     512          773    935    808
>>>     768          576    815    670
>>>     1000         413    752    573
>>>
>>> The sort certainly does improve performance compared to master, but
>>> it's only about half of the hash table improvement.
>>>
>>> I don't know how much this simple workload resembles the YCSB
>>> benchmark, but I seem to recall it's touching individual rows. In
>>> which case it's likely worse due to pg_qsort being more expensive
>>> than building the hash table.
>>>
>>> So I agree with Yura that the sort is not a viable alternative to the
>>> hash table in this case.
>>>
>>>> Can I rely on snapshots being static? Maybe then there is no need
>>>> for a separate raw representation and hash table. I may try to fill
>>>> the hash table directly in GetSnapshotData instead of lazy filling.
>>>> Though it will increase the time under the lock, so it is debatable
>>>> and should be carefully measured.
>>>
>>> Yes, I think you can rely on snapshots not being modified later.
>>> That's pretty much the definition of a snapshot.
>>>
>>> You may do that in GetSnapshotData, but you certainly can't do that
>>> in the for loop there. Not only do we not want to add more work
>>> there, but you don't know the number of XIDs in that loop. And we
>>> don't want to build hash tables for a small number of XIDs.
>>>
>>> One reason against building the hash table in GetSnapshotData is that
>>> we'd build it even when the snapshot is never queried. Or when it is
>>> queried, but we only need to check xmin/xmax.
>>
>> Thank you for the analysis, Tomas.
>>
>> Stephen is right about the bug in snapmgr.c.
>> The attached version fixes the bug, and also simplifies XidInXip a bit.
>
> OK, a few more comments.
>
> 1) The code in ExtendXipSizeForHash seems somewhat redundant with
> my_log2 (that is, we could just call the existing function).

Yes, I could call my_log2 from ExtendXipSizeForHash. But wouldn't one
more call be more expensive than the loop itself?

> 2) Apparently xhlog/subxhlog fields are used for two purposes - to
> store log2 of the two XID counts, and to remember if the hash table is
> built. That's a bit confusing (at least it should be explained in a
> comment) but most importantly it seems a bit unnecessary.

Ok, I'll add a comment.

> I assume it was done to save space, but I very much doubt that makes
> any difference. So we could easily keep a separate flag.
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
On 03/10/2018 03:11 AM, Yura Sokolov wrote:
> 08.03.2018 03:42, Tomas Vondra writes:
>> On 03/06/2018 06:23 AM, Yura Sokolov wrote:
>>> 05.03.2018 18:00, Tom Lane writes:
>>>> Tomas Vondra writes:
>>>>> Snapshots are static (we don't really add new XIDs into existing
>>>>> ones, right?), so why don't we simply sort the XIDs and then use
>>>>> bsearch to look up values? That should fix the linear search,
>>>>> without need for any local hash table.
>>>>
>>>> +1 for looking into that, since it would avoid adding any
>>>> complication to snapshot copying. In principle, with enough XIDs in
>>>> the snap, an O(1) hash probe would be quicker than an O(log N)
>>>> bsearch ... but I'm dubious that we are often in the range where
>>>> that would matter. We do need to worry about the cost of snapshot
>>>> copying, too.
>>>
>>> Sorting is the first thing I've tried. Unfortunately, sorting itself
>>> eats too much CPU. Filling a hash table is much faster.
>>
>> I've been interested in the sort-based approach, so I've spent a bit
>> of time hacking on it (patch attached). It's much less invasive
>> compared to the hash table, but Yura is right that the hash table
>> gives better results.
>>
>> I've tried to measure the benefits using the same script I shared on
>> Tuesday, but I kept getting really strange numbers. In fact, I've been
>> unable to even reproduce the results I shared back then. And after a
>> bit of head-scratching I think the script is useless - it can't
>> possibly generate snapshots with many XIDs, because all the update
>> threads sleep for exactly the same time. And the first one to sleep is
>> the first one to wake up and commit, so most of the other XIDs are
>> above xmax (and thus not included in the snapshot). I have no idea why
>> I got the numbers :-/
>>
>> But with this insight, I quickly modified the script to also consume
>> XIDs in another thread (which simply calls txid_current). With that
>> I'm getting snapshots with as many XIDs as there are UPDATE clients
>> (this time I actually checked that using gdb).
>>
>> And for a 60-second run the tps results look like this (see the
>> attached chart, showing the same data):
>>
>>     writers   master   hash   sort
>>     ------------------------------
>>     16          1068   1057   1070
>>     32          1005    995   1033
>>     64          1064   1042   1117
>>     128         1058   1021   1051
>>     256          977   1004    928
>>     512          773    935    808
>>     768          576    815    670
>>     1000         413    752    573
>>
>> The sort certainly does improve performance compared to master, but
>> it's only about half of the hash table improvement.
>>
>> I don't know how much this simple workload resembles the YCSB
>> benchmark, but I seem to recall it's touching individual rows. In
>> which case it's likely worse due to pg_qsort being more expensive than
>> building the hash table.
>>
>> So I agree with Yura that the sort is not a viable alternative to the
>> hash table in this case.
>>
>>> Can I rely on snapshots being static? Maybe then there is no need
>>> for a separate raw representation and hash table. I may try to fill
>>> the hash table directly in GetSnapshotData instead of lazy filling.
>>> Though it will increase the time under the lock, so it is debatable
>>> and should be carefully measured.
>>
>> Yes, I think you can rely on snapshots not being modified later.
>> That's pretty much the definition of a snapshot.
>>
>> You may do that in GetSnapshotData, but you certainly can't do that in
>> the for loop there. Not only do we not want to add more work there,
>> but you don't know the number of XIDs in that loop. And we don't want
>> to build hash tables for a small number of XIDs.
>>
>> One reason against building the hash table in GetSnapshotData is that
>> we'd build it even when the snapshot is never queried. Or when it is
>> queried, but we only need to check xmin/xmax.
>
> Thank you for the analysis, Tomas.
>
> Stephen is right about the bug in snapmgr.c.
> The attached version fixes the bug, and also simplifies XidInXip a bit.

OK, a few more comments.

1) The code in ExtendXipSizeForHash seems somewhat redundant with
my_log2 (that is, we could just call the existing function).

2) Apparently xhlog/subxhlog fields are used for two purposes - to store
log2 of the two XID counts, and to remember if the hash table is built.
That's a bit confusing (at least it should be explained in a comment)
but most importantly it seems a bit unnecessary.

I assume it was done to save space, but I very much doubt that makes any
difference. So we could easily keep a separate flag. I'm pretty sure
there are holes in the SnapshotData struct, so we could squeeze the
flags into one of those.

But do we even need a separate flag? We could just as easily keep the
log2 fields set to 0 and only set them after actually building the hash
table.

Or even better, why not store the mask so that XidInXip does not need to
compute it over and over (granted, that's a uint32 instead of just a
uint8, but I don't think SnapshotData is particularly sensitive to this,
especially considering how much larger the xid hash table is).
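As an illustration of point 1), ExtendXipSizeForHash could presumably just
delegate to the existing my_log2() from dynahash.c. A sketch under that
assumption - the body, the threshold handling, and the doubling policy are
guesses for illustration, not the actual patch code (SnapshotMinHash = 60
comes from the patch's own comments):

    #include <stdint.h>

    #define SnapshotMinHash 60   /* threshold quoted in the patch comments */

    /* Existing PostgreSQL helper (dynahash.c): smallest k with 2^k >= num. */
    extern int my_log2(long num);

    /* Hypothetical reimplementation: how many TransactionId slots to
     * allocate for the xip array so that a 2^*xhlog-slot hash fits after it. */
    static int
    ExtendXipSizeForHash(int xipcnt, uint8_t *xhlog)
    {
        if (xipcnt < SnapshotMinHash)
        {
            *xhlog = 0;                          /* small array: linear scan */
            return xipcnt;
        }
        *xhlog = (uint8_t) my_log2(xipcnt * 2);  /* ~50% load factor, assumed */
        return xipcnt + (1 << *xhlog);           /* XIDs plus hash slots */
    }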
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
08.03.2018 03:42, Tomas Vondra writes:
> On 03/06/2018 06:23 AM, Yura Sokolov wrote:
>> 05.03.2018 18:00, Tom Lane writes:
>>> Tomas Vondra writes:
>>>> Snapshots are static (we don't really add new XIDs into existing
>>>> ones, right?), so why don't we simply sort the XIDs and then use
>>>> bsearch to look up values? That should fix the linear search,
>>>> without need for any local hash table.
>>>
>>> +1 for looking into that, since it would avoid adding any
>>> complication to snapshot copying. In principle, with enough XIDs in
>>> the snap, an O(1) hash probe would be quicker than an O(log N)
>>> bsearch ... but I'm dubious that we are often in the range where that
>>> would matter. We do need to worry about the cost of snapshot copying,
>>> too.
>>
>> Sorting is the first thing I've tried. Unfortunately, sorting itself
>> eats too much CPU. Filling a hash table is much faster.
>
> I've been interested in the sort-based approach, so I've spent a bit of
> time hacking on it (patch attached). It's much less invasive compared
> to the hash table, but Yura is right that the hash table gives better
> results.
>
> I've tried to measure the benefits using the same script I shared on
> Tuesday, but I kept getting really strange numbers. In fact, I've been
> unable to even reproduce the results I shared back then. And after a
> bit of head-scratching I think the script is useless - it can't
> possibly generate snapshots with many XIDs, because all the update
> threads sleep for exactly the same time. And the first one to sleep is
> the first one to wake up and commit, so most of the other XIDs are
> above xmax (and thus not included in the snapshot). I have no idea why
> I got the numbers :-/
>
> But with this insight, I quickly modified the script to also consume
> XIDs in another thread (which simply calls txid_current). With that I'm
> getting snapshots with as many XIDs as there are UPDATE clients (this
> time I actually checked that using gdb).
>
> And for a 60-second run the tps results look like this (see the
> attached chart, showing the same data):
>
>     writers   master   hash   sort
>     ------------------------------
>     16          1068   1057   1070
>     32          1005    995   1033
>     64          1064   1042   1117
>     128         1058   1021   1051
>     256          977   1004    928
>     512          773    935    808
>     768          576    815    670
>     1000         413    752    573
>
> The sort certainly does improve performance compared to master, but
> it's only about half of the hash table improvement.
>
> I don't know how much this simple workload resembles the YCSB
> benchmark, but I seem to recall it's touching individual rows. In which
> case it's likely worse due to pg_qsort being more expensive than
> building the hash table.
>
> So I agree with Yura that the sort is not a viable alternative to the
> hash table in this case.
>
>> Can I rely on snapshots being static? Maybe then there is no need for
>> a separate raw representation and hash table. I may try to fill the
>> hash table directly in GetSnapshotData instead of lazy filling. Though
>> it will increase the time under the lock, so it is debatable and
>> should be carefully measured.
>
> Yes, I think you can rely on snapshots not being modified later. That's
> pretty much the definition of a snapshot.
>
> You may do that in GetSnapshotData, but you certainly can't do that in
> the for loop there. Not only do we not want to add more work there, but
> you don't know the number of XIDs in that loop. And we don't want to
> build hash tables for a small number of XIDs.
>
> One reason against building the hash table in GetSnapshotData is that
> we'd build it even when the snapshot is never queried. Or when it is
> queried, but we only need to check xmin/xmax.

Thank you for the analysis, Tomas.

Stephen is right about the bug in snapmgr.c.
The attached version fixes the bug, and also simplifies XidInXip a bit.

With regards,
Sokolov Yura.

From 8484ae3f2c2f8af20ae8ce2f6d7960b6519e65c0 Mon Sep 17 00:00:00 2001
From: Sokolov Yura aka funny_falcon
Date: Fri, 9 Mar 2018 22:49:01 +0300
Subject: [PATCH] Make hash table for xip in XidInMVCCSnapshot

When a lot of concurrent transactions attempt to update a single row,
the linear scan through the running list in XidInMVCCSnapshot becomes a
noticeable bottleneck.

With this change, a hash table is built on the first search of an xid
in the snapshot->xip and snapshot->subxip arrays. If the size of the
array is smaller than 60, a linear scan is still used, because there is
not much benefit from building the hash then (at least on Intel Xeon).
---
 src/backend/storage/ipc/procarray.c | 67 ++--
 src/backend/utils/time/snapmgr.c    | 25 +++
 src/backend/utils/time/tqual.c      | 88 -
 src/include/utils/snapshot.h        | 11 +
 4 files changed
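For reference, the lazy strategy the commit message describes boils down to
roughly the following self-contained sketch. It is simplified and uses
hypothetical names: the actual patch packs the built-flag into xhlog and
keeps the hash slots inside the snapshot's own palloc block, and its hash
function may differ from the toy one here.

    #include <stdint.h>
    #include <stdbool.h>

    typedef uint32_t TransactionId;

    #define SnapshotMinHash 60          /* threshold from the commit message */

    /* Toy multiplicative hash step, for illustration only. */
    static inline uint32_t
    hash_xid(TransactionId xid, uint32_t mask)
    {
        return (xid * 2654435761u) & mask;
    }

    /*
     * Lazy membership test: linear scan below the threshold; otherwise, on
     * the first call, fill the caller-reserved `slots` array (2^xhlog
     * entries, assumed > xcnt so probing always terminates) with xip
     * indexes + 1 (0 = empty), then answer every lookup by linear probing.
     */
    static bool
    XidInXipSketch(TransactionId xid, const TransactionId *xip, uint32_t xcnt,
                   uint32_t *slots, uint8_t xhlog, bool *hash_built)
    {
        uint32_t mask, pos;

        if (xcnt < SnapshotMinHash || xhlog == 0)
        {
            for (uint32_t i = 0; i < xcnt; i++)   /* small array: plain scan */
                if (xip[i] == xid)
                    return true;
            return false;
        }

        mask = (1u << xhlog) - 1;

        if (!*hash_built)                         /* built on first search */
        {
            for (uint32_t i = 0; i <= mask; i++)
                slots[i] = 0;
            for (uint32_t i = 0; i < xcnt; i++)
            {
                pos = hash_xid(xip[i], mask);
                while (slots[pos] != 0)           /* linear probing */
                    pos = (pos + 1) & mask;
                slots[pos] = i + 1;
            }
            *hash_built = true;
        }

        for (pos = hash_xid(xid, mask); slots[pos] != 0; pos = (pos + 1) & mask)
            if (xip[slots[pos] - 1] == xid)
                return true;
        return false;
    }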
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
05.03.2018 18:00, Tom Lane writes:
> Tomas Vondra writes:
>> Snapshots are static (we don't really add new XIDs into existing ones,
>> right?), so why don't we simply sort the XIDs and then use bsearch to
>> look up values? That should fix the linear search, without need for
>> any local hash table.
>
> +1 for looking into that, since it would avoid adding any complication
> to snapshot copying. In principle, with enough XIDs in the snap, an
> O(1) hash probe would be quicker than an O(log N) bsearch ... but I'm
> dubious that we are often in the range where that would matter.
> We do need to worry about the cost of snapshot copying, too.

Sorting is the first thing I've tried. Unfortunately, sorting itself
eats too much CPU. Filling a hash table is much faster.

Can I rely on snapshots being static? Maybe then there is no need for a
separate raw representation and hash table. I may try to fill the hash
table directly in GetSnapshotData instead of lazy filling. Though it
will increase the time under the lock, so it is debatable and should be
carefully measured.

I'll look at the bug in CopySnapshot soon. Excuse me for the delay.

With regards,
Sokolov Yura
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
Tomas Vondra writes:
> Snapshots are static (we don't really add new XIDs into existing ones,
> right?), so why don't we simply sort the XIDs and then use bsearch to
> look up values? That should fix the linear search, without need for any
> local hash table.

+1 for looking into that, since it would avoid adding any complication
to snapshot copying. In principle, with enough XIDs in the snap, an
O(1) hash probe would be quicker than an O(log N) bsearch ... but I'm
dubious that we are often in the range where that would matter.
We do need to worry about the cost of snapshot copying, too.

			regards, tom lane
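For concreteness, the sort-plus-bsearch alternative amounts to something
like this sketch (plain C, not Tomas's actual patch; plain uint32 ordering
suffices here because only membership is tested, not transaction
precedence):

    #include <stdint.h>
    #include <stdlib.h>
    #include <stdbool.h>

    typedef uint32_t TransactionId;

    static int
    xid_cmp(const void *a, const void *b)
    {
        TransactionId xa = *(const TransactionId *) a;
        TransactionId xb = *(const TransactionId *) b;
        return (xa > xb) - (xa < xb);   /* branch-free compare, no overflow */
    }

    /* Sort once when the snapshot is taken (or on first lookup)... */
    static void
    sort_xip(TransactionId *xip, size_t xcnt)
    {
        qsort(xip, xcnt, sizeof(TransactionId), xid_cmp);
    }

    /* ...then each XidInMVCCSnapshot membership test is O(log N). */
    static bool
    xid_in_sorted_xip(TransactionId xid, const TransactionId *xip, size_t xcnt)
    {
        return bsearch(&xid, xip, xcnt, sizeof(TransactionId), xid_cmp) != NULL;
    }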
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
Hi,

I've done a bit of benchmarking on the last patch version (from 22/8),
using a simple workload:

1) 8 clients doing

       SELECT SUM(abalance) FROM pgbench_accounts

   with the table truncated to only 10k rows

2) a variable number (8, 16, 32, ..., 512) of write clients, doing this:

       \set aid random(1, 1 * :scale)
       BEGIN;
       UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;
       SELECT pg_sleep(0.005);
       COMMIT;

The idea is to measure the tps of the read-only clients, with many
throttled write clients (idle in transaction for 5 ms after each update).
This should generate snapshots with many XIDs. Scripts attached, of course.

The results look really promising (see the attached chart and .ods):

    write clients   master   patched
    8                 7048      7089
    16                6601      6614
    32                5862      5944
    64                5327      5349
    128               4574      4952
    256               4132      4956
    512               2196      4930

That being said, I think Stephen is right that there's a bug in
CopySnapshot, and I agree with his other comments too. I think the patch
should have been in WoA, so I'll mark it like that.

simplehash is great, but I'm not sure it's a good fit for this use case.
Seems a bit overkill to me in this case, but maybe I'm wrong.

Snapshots are static (we don't really add new XIDs into existing ones,
right?), so why don't we simply sort the XIDs and then use bsearch to
look up values? That should fix the linear search, without need for any
local hash table.

regards

--
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments: xid-select.sql, xid-update.sql, xid-tests.sh, xid-hash.ods,
tests.ods
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
Greetings,

* Sokolov Yura (funny.fal...@postgrespro.ru) wrote:
> diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
> index 08a08c8e8f..7c3fe7563e 100644
> --- a/src/backend/utils/time/snapmgr.c
> +++ b/src/backend/utils/time/snapmgr.c
> @@ -662,13 +662,16 @@ CopySnapshot(Snapshot snapshot)
>  	Snapshot	newsnap;
>  	Size		subxipoff;
>  	Size		size;
> +	int			xcnt, subxcnt;
> +	uint8		xhlog, subxhlog;
>
>  	Assert(snapshot != InvalidSnapshot);
>
> +	xcnt = ExtendXipSizeForHash(snapshot->xcnt, &xhlog);
> +	subxcnt = ExtendXipSizeForHash(snapshot->subxcnt, &subxhlog);
>  	/* We allocate any XID arrays needed in the same palloc block. */
> -	size = subxipoff = sizeof(SnapshotData) +
> -		snapshot->xcnt * sizeof(TransactionId);
> -	if (snapshot->subxcnt > 0)
> +	size = subxipoff = sizeof(SnapshotData) + xcnt * sizeof(TransactionId);
> +	if (subxcnt > 0)
>  		size += snapshot->subxcnt * sizeof(TransactionId);

Here you've changed to use xcnt instead of snapshot->xcnt for calculating
the size, but when it comes to adding in the subxcnt, you calculate a
subxcnt but still use snapshot->subxcnt...? That doesn't seem like what
you intended to do here.

> diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
> index f9da9e17f5..a5e7d427b4 100644
> --- a/src/backend/utils/time/tqual.c
> +++ b/src/backend/utils/time/tqual.c
> @@ -1451,6 +1451,69 @@ HeapTupleIsSurelyDead(HeapTuple htup, TransactionId OldestXmin)
>  }
>
>  /*
> + * XidInXip searches xid in xip array.
> + *
> + * If xcnt is smaller than SnapshotMinHash (60), or *xhlog is zero, then
> + * a simple linear scan of the xip array is used. It looks like there is
> + * no benefit in a hash table for a small xip array.

I wouldn't write '60' in there; anyone who is interested could go look up
whatever it ends up being set to.

I tend to agree with Robert that it'd be nice if simplehash could be used
here instead. The insertion is definitely more expensive, but that's
specifically to try and improve lookup performance, so it might end up
not being so bad. I do understand that it would end up using more memory,
so that's a fair concern.

I do think this could use more comments, and perhaps having some
Assert()s thrown in (and it looks like you're removing one..?).

I haven't got a whole lot else to say on this patch; it would be good if
someone could spend some more time reviewing it more carefully and
testing it to see what kind of performance they get. The improvements
that you've posted definitely look nice, especially with the lwlock
performance work.

Thanks!

Stephen
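For context, a simplehash instantiation for this case would look very
roughly like the sketch below. This is an illustration of the suggestion,
not code from the patch: the element layout follows existing simplehash
users, and the hash function is a stand-in. The generated snapxid_create(),
snapxid_insert(), and snapxid_lookup() functions would then replace the
hand-rolled probing.

    #include "postgres.h"            /* TransactionId, uint32, etc. */

    /* simplehash requires the element struct to carry a status byte. */
    typedef struct SnapXidEntry
    {
        TransactionId xid;           /* the key */
        char          status;        /* managed by simplehash */
    } SnapXidEntry;

    #define SH_PREFIX        snapxid
    #define SH_ELEMENT_TYPE  SnapXidEntry
    #define SH_KEY_TYPE      TransactionId
    #define SH_KEY           xid
    #define SH_HASH_KEY(tb, key)  ((uint32) (key) * 2654435761u)  /* stand-in */
    #define SH_EQUAL(tb, a, b)    ((a) == (b))
    #define SH_SCOPE         static inline
    #define SH_DECLARE
    #define SH_DEFINE
    #include "lib/simplehash.h"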
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
On Tue, Aug 22, 2017 at 9:17 PM, Sokolov Yura wrote:
> A slightly simplified and more commented patch version is attached.
>
> The algorithm was switched to linear probing, which makes the code
> simpler and clearer.
> The flag usage was toggled: it now indicates that the hash table has
> been built, which also makes the code clearer.
> XidInXip and some other functions got comments.

This patch still applies, but nobody seems interested in it. So, moved to
the next CF.
--
Michael