Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2017-08-22 Thread Sokolov Yura
A simplified and somewhat better-commented patch version is attached.

The algorithm was switched to linear probing, which makes the code
simpler and clearer.
The flag usage was inverted: it now indicates that the hash table has
been built, which also makes the code clearer.
XidInXip and some other functions got comments.

-- 
With regards,
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company
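As a rough illustration of the lazy, linear-probing scheme described above, here is a toy model. It is not the patch's actual code: the type names, the fixed table size, and the flag are simplified stand-ins (the real patch sizes the table from the xip count and skips the hash entirely below the 60-entry threshold).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

#define HASH_SLOTS 32            /* power of two, >= 2 * max xcnt */

/* Toy snapshot: an xip array plus a lazily built open-addressing table. */
typedef struct
{
    TransactionId xip[8];
    int           xcnt;
    TransactionId xiphash[HASH_SLOTS];
    bool          hash_built;    /* analogous to the SnapshotHashBuilt flag */
} ToySnapshot;

static void
build_hash(ToySnapshot *snap)
{
    uint32_t    mask = HASH_SLOTS - 1;
    int         i;

    memset(snap->xiphash, 0, sizeof(snap->xiphash));
    for (i = 0; i < snap->xcnt; i++)
    {
        uint32_t    j = snap->xip[i] & mask;

        while (snap->xiphash[j] != InvalidTransactionId)
            j = (j + 1) & mask;  /* linear probing */
        snap->xiphash[j] = snap->xip[i];
    }
    snap->hash_built = true;
}

/* Is xid in the snapshot's in-progress list?  Builds the table lazily. */
static bool
xid_in_snapshot(ToySnapshot *snap, TransactionId xid)
{
    uint32_t    mask = HASH_SLOTS - 1;
    uint32_t    j;

    if (!snap->hash_built)
        build_hash(snap);        /* the first search pays the build cost */

    for (j = xid & mask;
         snap->xiphash[j] != InvalidTransactionId;
         j = (j + 1) & mask)
    {
        if (snap->xiphash[j] == xid)
            return true;
    }
    return false;
}
```

Because the table has at least twice as many slots as entries, probe chains stay short and the scan terminates at the first empty slot.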
From 5851f01de6b793be5e682c67db8db339667be92d Mon Sep 17 00:00:00 2001
From: Sokolov Yura 
Date: Mon, 10 Jul 2017 12:34:48 +
Subject: [PATCH] Make hash table for xip in XidInMVCCSnapshot

When many concurrent transactions attempt to update a single
row, the linear scan through the running-transactions list in
XidInMVCCSnapshot becomes a noticeable bottleneck.

With this change, a hash table is built on the first search for an
xid in the snapshot->xip and snapshot->subxip arrays.

If the array is smaller than 60 entries, a linear scan is still used,
since building the hash table yields little benefit at that size
(at least on Intel Xeon).
---
 src/backend/storage/ipc/procarray.c | 67 +--
 src/backend/utils/time/snapmgr.c| 24 ++
 src/backend/utils/time/tqual.c  | 91 -
 src/include/utils/snapshot.h| 11 +
 4 files changed, 151 insertions(+), 42 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index a7e8cf2d43..2c7e065342 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1469,6 +1469,54 @@ GetMaxSnapshotSubxidCount(void)
 }
 
 /*
+ * ExtendXipSizeForHash - calculate xip array size with space for hash table.
+ *
+ * The hash table should be at least twice as large as the array, so that
+ * performance does not depend on the cleverness of the hash algorithm.
+ *
+ * But if xipcnt < SnapshotMinHash, there is no need for a hash table at all.
+ */
+int
+ExtendXipSizeForHash(int xipcnt, uint8* plog)
+{
+	int size;
+	uint8 log = 0;
+
+	size = xipcnt;
+	if (xipcnt >= SnapshotMinHash)
+	{
+		log = 1;
+		while (xipcnt) {
+			log++;
+			xipcnt >>= 1;
+		}
+		size += 1 << log;
+	}
+	*plog = log;
+	return size;
+}
[...]
-		snapshot->xip = (TransactionId *)
-			malloc(GetMaxSnapshotXidCount() * sizeof(TransactionId));
-		if (snapshot->xip == NULL)
-			ereport(ERROR,
-	(errcode(ERRCODE_OUT_OF_MEMORY),
-	 errmsg("out of memory")));
-		Assert(snapshot->subxip == NULL);
-		snapshot->subxip = (TransactionId *)
-			malloc(GetMaxSnapshotSubxidCount() * sizeof(TransactionId));
-		if (snapshot->subxip == NULL)
-			ereport(ERROR,
-	(errcode(ERRCODE_OUT_OF_MEMORY),
-	 errmsg("out of memory")));
+		snapshot->xip = AllocateXip(GetMaxSnapshotXidCount(),
+									&snapshot->xhlog);
+		snapshot->subxip = AllocateXip(GetMaxSnapshotSubxidCount(),
+									   &snapshot->subxhlog);
 	}
 
 	/*
@@ -1757,6 +1796,8 @@ GetSnapshotData(Snapshot snapshot)
 	snapshot->active_count = 0;
 	snapshot->regd_count = 0;
 	snapshot->copied = false;
+	snapshot->xhlog &= ~SnapshotHashBuilt;
+	snapshot->subxhlog &= ~SnapshotHashBuilt;
 
 	if (old_snapshot_threshold < 0)
 	{
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 08a08c8e8f..7c3fe7563e 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -662,13 +662,16 @@ CopySnapshot(Snapshot snapshot)
 	Snapshot	newsnap;
 	Size		subxipoff;
 	Size		size;
+	int			xcnt, subxcnt;
+	uint8		xhlog, subxhlog;
 
 	Assert(snapshot != InvalidSnapshot);
 
+	xcnt = ExtendXipSizeForHash(snapshot->xcnt, &xhlog);
+	subxcnt = ExtendXipSizeForHash(snapshot->subxcnt, &subxhlog);
 	/* We allocate any XID arrays needed in the same palloc block. */
-	size = subxipoff = sizeof(SnapshotData) +
-		snapshot->xcnt * sizeof(TransactionId);
-	if (snapshot->subxcnt > 0)
+	size = subxipoff = sizeof(SnapshotData) + xcnt * sizeof(TransactionId);
+	if (subxcnt > 0)
-		size += snapshot->subxcnt * sizeof(TransactionId);
+		size += subxcnt * sizeof(TransactionId);
 
 	newsnap = (Snapshot) MemoryContextAlloc(TopTransactionContext, size);
@@ -677,6 +680,8 @@ CopySnapshot(Snapshot snapshot)
 	newsnap->regd_count = 0;
 	newsnap->active_count = 0;
 	newsnap->copied = true;
+	newsnap->xhlog = xhlog;
+	newsnap->subxhlog = subxhlog;

Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2017-08-14 Thread Robert Haas
On Sun, Aug 13, 2017 at 11:05 AM, Sokolov Yura
 wrote:
> BTW, ad-hoc hash tables already exist: at least the resource owner uses one.

Yeah.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2017-08-13 Thread Sokolov Yura
On Fri, 11 Aug 2017 14:05:08 -0400, Robert Haas wrote:

> On Thu, Aug 10, 2017 at 11:12 AM, Alexander Korotkov
>  wrote:
> > These results look very cool!
> > I think CSN is eventually inevitable, but it's a long distance
> > feature. Thus, this optimization could make us a good serve before
> > we would have CSN. Do you expect there are cases when this patch
> > causes slowdown?  What about case when we scan each xip array only
> > once (not sure how to simulate that using pgbench)?  
> 
> Just a random thought here:
> 
> The statements pgbench executes are pretty simple: they touch one row
> in one table.  You wouldn't expect them to scan the xip array very
> many times.  If even those statements touch the array enough for this
> to win, it seems like it might be hard to construct an even worse
> case.  I might be missing something, though.

With a zipfian distribution, many concurrent transactions try to update
the same row. Every transaction eventually updates this row (perhaps
after waiting), so there are a lot of concurrent row versions, and a
transaction scans through its snapshot many times.
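For reproduction: Alik's YCSB-like tests used pgbench with a zipfian key distribution. With the random_zipfian() function (proposed around that time and later shipped in PostgreSQL 11), a 50/50 read/write script over a hot key set might look like the following — the exponent 1.07 and the invocation flags are illustrative assumptions, not the exact test setup:

```
\set aid random_zipfian(1, 100000 * :scale, 1.07)
BEGIN;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;
END;
```

Run with something like `pgbench -n -f zipf.sql -c 400 -j 100 -T 60` to approximate the high client counts used in the thread's benchmarks.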

> 
> We're not going to accept code like this, though:
> 
> +d = xip[i] >> 6;
> +j = k = xip[i] & mask;
> +while (xiphash[j] != InvalidTransactionId)
> +{
> +j = (k + d) & mask;
> +d = d*5 + 1;
> +}
> 
> Single-character variable names, hard-coded constants, and no
> comments!

I totally agree. The first version (which you've cited) was clearly a
POC. The second version (sent on Thursday) looks a bit better, but I
agree there is room for improvement. I'll try to make it prettier.
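For what it's worth, the quoted loop is open addressing with an xid-derived probe step. A self-contained, commented rendering of that scheme might read as follows — the function and variable names here are my own, not the patch's:

```c
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/*
 * Insert xid into an open-addressing table of (mask + 1) slots, where
 * mask + 1 is a power of two.  On a collision, the probe jumps from the
 * home slot by a displacement seeded from the xid's high bits and
 * perturbed by d = d*5 + 1 each round; since 5d + 1 is a full-period
 * recurrence modulo a power of two, every slot is eventually visited.
 */
static void
hash_insert(TransactionId *xiphash, uint32_t mask, TransactionId xid)
{
    uint32_t    d = xid >> 6;    /* initial displacement: high bits */
    uint32_t    k = xid & mask;  /* home slot: low bits */
    uint32_t    j = k;

    while (xiphash[j] != InvalidTransactionId)
    {
        j = (k + d) & mask;      /* probe relative to the home slot */
        d = d * 5 + 1;           /* vary displacement to break clusters */
    }
    xiphash[j] = xid;
}

/* Look xid up with the same probe sequence; stop at the first empty slot. */
static int
hash_lookup(const TransactionId *xiphash, uint32_t mask, TransactionId xid)
{
    uint32_t    d = xid >> 6;
    uint32_t    k = xid & mask;
    uint32_t    j = k;

    while (xiphash[j] != InvalidTransactionId)
    {
        if (xiphash[j] == xid)
            return 1;
        j = (k + d) & mask;
        d = d * 5 + 1;
    }
    return 0;
}
```

The lookup must follow exactly the insert's probe order, which is why both sides recompute d and k identically.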

BTW, I have a question: there is a CopySnapshot function, so a snapshot
is clearly changed after copying. But I could not follow: are the xip
and subxip arrays changed in the copy, or do they remain the same as in
the original, with only other snapshot fields changing?

> I kind of doubt as a general point that we really want another
> open-coded hash table -- I wonder if this could be made to use
> simplehash.

I like simplehash very much (although I'm missing a couple of features).
Unfortunately, simplehash is not simple enough for this use case:
- it will use at least twice as much memory for the hash table itself
  (since it has to have an 'entry->status' field),
- it has a large header (the ad-hoc hash table consumes 1 byte in the
  SnapshotData structure),
- its allocation would be trickier (the custom hash table is co-allocated
  after the xip array itself),
- its insert algorithm looks much more expensive (at least, it is
  more complex in instruction count).

I think using simplehash here would lead to a much more invasive patch
than this ad-hoc table. And I believe it would be slower (and this place
is very performance-critical), though I would not bet on it.

BTW, ad-hoc hash tables already exist: at least the resource owner uses one.

-- 
With regards,
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company




Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2017-08-11 Thread Robert Haas
On Thu, Aug 10, 2017 at 11:12 AM, Alexander Korotkov
 wrote:
> These results look very cool!
> I think CSN is eventually inevitable, but it's a long distance feature.
> Thus, this optimization could make us a good serve before we would have CSN.
> Do you expect there are cases when this patch causes slowdown?  What about
> case when we scan each xip array only once (not sure how to simulate that
> using pgbench)?

Just a random thought here:

The statements pgbench executes are pretty simple: they touch one row
in one table.  You wouldn't expect them to scan the xip array very
many times.  If even those statements touch the array enough for this
to win, it seems like it might be hard to construct an even worse
case.  I might be missing something, though.

We're not going to accept code like this, though:

+d = xip[i] >> 6;
+j = k = xip[i] & mask;
+while (xiphash[j] != InvalidTransactionId)
+{
+j = (k + d) & mask;
+d = d*5 + 1;
+}

Single-character variable names, hard-coded constants, and no comments!

I kind of doubt as a general point that we really want another
open-coded hash table -- I wonder if this could be made to use
simplehash.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2017-08-11 Thread Sokolov Yura
On Thu, 10 Aug 2017 18:12:34 +0300, Alexander Korotkov wrote:

> On Thu, Aug 10, 2017 at 3:30 PM, Sokolov Yura
>  wrote:
> 
> > On 2017-07-31 18:56, Sokolov Yura wrote:
> >  
> >> Good day, every one.
> >>
> >> In attempt to improve performance of YCSB on zipfian distribution,
> >> it were found that significant time is spent in XidInMVCCSnapshot
> >> in scanning snapshot->xip array. While overall CPU time is not too
> >> noticable, it has measurable impact on scaleability.
> >>
> >> First I tried to sort snapshot->xip in GetSnapshotData, and search
> >> in a sorted array. But since snapshot->xip is not touched if no
> >> transaction contention occurs, sorting xip always is not best
> >> option.
> >>
> >> Then I sorted xip array on demand in XidInMVCCSnapshot only if
> >> search in snapshot->xip occurs (ie lazy sorting). It performs much
> >> better, but since it is O(NlogN), sort's execution time is
> >> noticable for large number of clients.
> >>
> >> Third approach (present in attached patch) is making hash table
> >> lazily on first search in xip array.
> >>
> >> Note: hash table is not built if number of "in-progress" xids is
> >> less than 60. Tests shows, there is no big benefit from doing so
> >> (at least on Intel Xeon).
> >>
> >> With regards,
> >>  
> >
> > Here is new, more correct, patch version, and results for pgbench
> > with zipfian distribution (50% read 50% write) (same to Alik's
> > tests at
> > https://www.postgresql.org/message-id/BF3B6F54-68C3-417A-BFA
> > b-fb4d66f2b...@postgrespro.ru )
> >
> >
> > clients | master | hashsnap2 | hashsnap2_lwlock
> > ++---+--
> >  10 | 203384 |203813 |   204852
> >  20 | 334344 |334268 |   363510
> >  40 | 228496 |231777 |   383820
> >  70 | 146892 |148173 |   221326
> > 110 |  99741 |111580 |   157327
> > 160 |  65257 | 81230 |   112028
> > 230 |  38344 | 56790 |77514
> > 310 |  22355 | 39249 |55907
> > 400 |  13402 | 26899 |39742
> > 500 |   8382 | 17855 |28362
> > 650 |   5313 | 11450 |17497
> > 800 |   3352 |  7816 |11030
> >
> > (Note: I'm not quite sure, why my numbers for master are lower than
> > Alik's one. Probably, our test methodology differs a bit. Perhaps,
> > it is because I don't recreate tables between client number change,
> > but between branch switch).
> >  
> 
> These results look very cool!
> I think CSN is eventually inevitable, but it's a long distance
> feature. Thus, this optimization could make us a good serve before we
> would have CSN. Do you expect there are cases when this patch causes
> slowdown?  What about case when we scan each xip array only once (not
> sure how to simulate that using pgbench)?
> 

The patched version allocates a bit more memory per process (if the
number of backends is greater than 60), and for copied snapshots (if
the number of concurrent transactions is greater than 60).

I have no strong evidence the patch could cause a slowdown. When the
xip array is scanned only once, XidInMVCCSnapshot usually consumes
0.2-0.5% of total CPU, and scanning the xip array even less. So even if
building the hash table slows the first scan three- to four-fold, it
would increase overall CPU usage by only 0.5-1%, and that is not
necessarily reflected directly in tps.

-- 
With regards,
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company




Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2017-08-10 Thread Alexander Korotkov
On Thu, Aug 10, 2017 at 3:30 PM, Sokolov Yura 
wrote:

> On 2017-07-31 18:56, Sokolov Yura wrote:
>
>> Good day, every one.
>>
>> In attempt to improve performance of YCSB on zipfian distribution,
>> it were found that significant time is spent in XidInMVCCSnapshot in
>> scanning snapshot->xip array. While overall CPU time is not too
>> noticable, it has measurable impact on scaleability.
>>
>> First I tried to sort snapshot->xip in GetSnapshotData, and search in a
>> sorted array. But since snapshot->xip is not touched if no transaction
>> contention occurs, sorting xip always is not best option.
>>
>> Then I sorted xip array on demand in XidInMVCCSnapshot only if
>> search in snapshot->xip occurs (ie lazy sorting). It performs much
>> better, but since it is O(NlogN), sort's execution time is noticable
>> for large number of clients.
>>
>> Third approach (present in attached patch) is making hash table lazily
>> on first search in xip array.
>>
>> Note: hash table is not built if number of "in-progress" xids is less
>> than 60. Tests shows, there is no big benefit from doing so (at least
>> on Intel Xeon).
>>
>> With regards,
>>
>
> Here is new, more correct, patch version, and results for pgbench with
> zipfian distribution (50% read 50% write) (same to Alik's tests at
> https://www.postgresql.org/message-id/BF3B6F54-68C3-417A-BFA
> b-fb4d66f2b...@postgrespro.ru )
>
>
> clients | master | hashsnap2 | hashsnap2_lwlock
> ++---+--
>  10 | 203384 |203813 |   204852
>  20 | 334344 |334268 |   363510
>  40 | 228496 |231777 |   383820
>  70 | 146892 |148173 |   221326
> 110 |  99741 |111580 |   157327
> 160 |  65257 | 81230 |   112028
> 230 |  38344 | 56790 |77514
> 310 |  22355 | 39249 |55907
> 400 |  13402 | 26899 |39742
> 500 |   8382 | 17855 |28362
> 650 |   5313 | 11450 |17497
> 800 |   3352 |  7816 |11030
>
> (Note: I'm not quite sure, why my numbers for master are lower than
> Alik's one. Probably, our test methodology differs a bit. Perhaps, it
> is because I don't recreate tables between client number change, but
> between branch switch).
>

These results look very cool!
I think CSN is eventually inevitable, but it's a long-distance feature.
Thus, this optimization could serve us well before we have CSN.
Do you expect there are cases where this patch causes a slowdown?  What
about the case where we scan each xip array only once (not sure how to
simulate that using pgbench)?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2017-08-10 Thread Sokolov Yura

On 2017-07-31 18:56, Sokolov Yura wrote:

Good day, everyone.

In an attempt to improve the performance of YCSB with a zipfian
distribution, it was found that significant time is spent in
XidInMVCCSnapshot scanning the snapshot->xip array. While the overall
CPU time is not too noticeable, it has a measurable impact on
scalability.

First I tried to sort snapshot->xip in GetSnapshotData and search in
the sorted array. But since snapshot->xip is not touched when no
transaction contention occurs, always sorting xip is not the best
option.

Then I sorted the xip array on demand in XidInMVCCSnapshot, only when a
search in snapshot->xip occurs (i.e. lazy sorting). It performs much
better, but since it is O(N log N), the sort's execution time is
noticeable for a large number of clients.

The third approach (present in the attached patch) builds a hash table
lazily on the first search in the xip array.

Note: the hash table is not built if the number of "in-progress" xids
is less than 60. Tests show there is not much benefit from building it
below that threshold (at least on Intel Xeon).

With regards,


Here is a new, more correct patch version, and results for pgbench with
a zipfian distribution (50% read / 50% write) (same as Alik's tests at
https://www.postgresql.org/message-id/bf3b6f54-68c3-417a-bfab-fb4d66f2b...@postgrespro.ru
)



clients | master | hashsnap2 | hashsnap2_lwlock
++---+--
 10 | 203384 |203813 |   204852
 20 | 334344 |334268 |   363510
 40 | 228496 |231777 |   383820
 70 | 146892 |148173 |   221326
110 |  99741 |111580 |   157327
160 |  65257 | 81230 |   112028
230 |  38344 | 56790 |77514
310 |  22355 | 39249 |55907
400 |  13402 | 26899 |39742
500 |   8382 | 17855 |28362
650 |   5313 | 11450 |17497
800 |   3352 |  7816 |11030

(Note: I'm not quite sure why my numbers for master are lower than
Alik's. Probably our test methodologies differ a bit. Perhaps it is
because I don't recreate tables between client-count changes, only
between branch switches.)

With regards
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

From 80787aee44b7f9b5389476e45f2241073b6c89ee Mon Sep 17 00:00:00 2001
From: Sokolov Yura 
Date: Mon, 10 Jul 2017 12:34:48 +
Subject: [PATCH] Make hash table for xip in XidInMVCCSnapshot

When many concurrent transactions attempt to update a single
row, the linear scan through the running-transactions list in
XidInMVCCSnapshot becomes a noticeable bottleneck.

With this change, a hash table is built on the first search for an
xid in the snapshot->xip and snapshot->subxip arrays.

If the array is smaller than 60 entries, a linear scan is still used,
since building the hash table yields little benefit at that size
(at least on Intel Xeon).
---
 src/backend/storage/ipc/procarray.c | 64 +++--
 src/backend/utils/time/snapmgr.c| 24 +++
 src/backend/utils/time/tqual.c  | 81 +++--
 src/include/utils/snapshot.h| 11 +
 4 files changed, 138 insertions(+), 42 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index a7e8cf2..18cc233 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1469,6 +1469,49 @@ GetMaxSnapshotSubxidCount(void)
 }
 
 /*
+ * ExtendXipSizeForHash - calculate xip array size with space for hash table
+ */
+int
+ExtendXipSizeForHash(int xipcnt, uint8* plog)
+{
+	int size;
+	uint8 log = 0;
+
+	size = xipcnt;
+	if (xipcnt >= SnapshotMinHash)
+	{
+		log = 1;
+		while (xipcnt) {
+			log++;
+			xipcnt >>= 1;
+		}
+		size += 1 << log;
+	}
+	*plog = log;
+	return size;
+}
[...]
-		snapshot->xip = (TransactionId *)
-			malloc(GetMaxSnapshotXidCount() * sizeof(TransactionId));
-		if (snapshot->xip == NULL)
-			ereport(ERROR,
-	(errcode(ERRCODE_OUT_OF_MEMORY),
-	 errmsg("out of memory")));
-		Assert(snapshot->subxip == NULL);
-		snapshot->subxip = (TransactionId *)
-