subject:"Re\: \[HACKERS\] Move PinBuffer and UnpinBuffer to atomics"

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-25 Thread Amit Kapila

On Mon, Apr 25, 2016 at 6:04 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Sun, Apr 17, 2016 at 7:32 PM, Amit Kapila 
> wrote:
>
>> On Thu, Apr 14, 2016 at 8:05 AM, Andres Freund 
>> wrote:
>> >
>> > On 2016-04-14 07:59:07 +0530, Amit Kapila wrote:
>> > > What you want to see by prewarming?
>> >
>> > Prewarming appears to greatly reduce the per-run variance on that
>> > machine, making it a lot easier to get meaningful results.
>> >
>>
>> I think you are referring the tests done by Robert on power-8 m/c, but
>> the performance results I have reported were on intel x86.  In last two
>> days, I have spent quite some effort to do the performance testing of this
>> patch with pre-warming by using the same query [1] as used by Robert in his
>> tests.  The tests are done such that first it start server, pre-warms the
>> relations, ran read-only test, stop server, again repeat this for next test.
>>
>
> What did you include into single run: test of single version (HEAD or
> Patch) or test of both of them?
>

single run includes single version (either HEAD or Patch).


>
>
>> I have observed that the variance in run-to-run performance still occurs
>> especially at higher client count (128).  Below are results for 128 client
>> count both when the tests ran first with patch and then with HEAD and vice
>> versa.
>>
>> Test-1
>> --
>> client count - 128 (basically -c 128 -j 128)
>>
>> first tests ran with patch and then with HEAD
>>
>> Patch_ver/Runs HEAD (commit -70715e6a) Patch
>> Run-1 156748 174640
>> Run-2 151352 150115
>> Run-3 177940 165269
>>
>>
>> Test-2
>> --
>> client count - 128 (basically -c 128 -j 128)
>>
>> first tests ran with HEAD and then with patch
>>
>> Patch_ver/Runs HEAD (commit -70715e6a) Patch
>> Run-1 173063 151282
>> Run-2 173187 140676
>> Run-3 177046 166726
>>
>> I think this patch (padding pgxact) certainly is beneficial as reported
>> above thread. At very high client count some variation in performance is
>> observed with and without patch, but I feel in general it is a win.
>>
>
> So, what hardware did you use for these tests: power-8 or x86?
>

x86


> How long was single run?
>

5 minutes.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-25 Thread Alexander Korotkov

On Sun, Apr 17, 2016 at 7:32 PM, Amit Kapila 
wrote:

> On Thu, Apr 14, 2016 at 8:05 AM, Andres Freund  wrote:
> >
> > On 2016-04-14 07:59:07 +0530, Amit Kapila wrote:
> > > What you want to see by prewarming?
> >
> > Prewarming appears to greatly reduce the per-run variance on that
> > machine, making it a lot easier to get meaningful results.
> >
>
> I think you are referring the tests done by Robert on power-8 m/c, but the
> performance results I have reported were on intel x86.  In last two days, I
> have spent quite some effort to do the performance testing of this patch
> with pre-warming by using the same query [1] as used by Robert in his
> tests.  The tests are done such that first it start server, pre-warms the
> relations, ran read-only test, stop server, again repeat this for next test.
>

What did you include into single run: test of single version (HEAD or
Patch) or test of both of them?


> I have observed that the variance in run-to-run performance still occurs
> especially at higher client count (128).  Below are results for 128 client
> count both when the tests ran first with patch and then with HEAD and vice
> versa.
>
> Test-1
> --
> client count - 128 (basically -c 128 -j 128)
>
> first tests ran with patch and then with HEAD
>
> Patch_ver/Runs HEAD (commit -70715e6a) Patch
> Run-1 156748 174640
> Run-2 151352 150115
> Run-3 177940 165269
>
>
> Test-2
> --
> client count - 128 (basically -c 128 -j 128)
>
> first tests ran with HEAD and then with patch
>
> Patch_ver/Runs HEAD (commit -70715e6a) Patch
> Run-1 173063 151282
> Run-2 173187 140676
> Run-3 177046 166726
>
> I think this patch (padding pgxact) certainly is beneficial as reported
> above thread. At very high client count some variation in performance is
> observed with and without patch, but I feel in general it is a win.
>

So, what hardware did you use for these tests: power-8 or x86? How long was
single run?
Per-run variation seems quite high.  It also seems that it depends on which
version runs first.  But that could be a coincidence.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-17 Thread Amit Kapila

On Fri, Apr 15, 2016 at 1:59 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Thu, Apr 14, 2016 at 5:35 AM, Andres Freund  wrote:
>
>> On 2016-04-14 07:59:07 +0530, Amit Kapila wrote:
>> > What you want to see by prewarming?
>>
>> Prewarming appears to greatly reduce the per-run variance on that
>> machine, making it a lot easier to get meaningful results.  Thus it'd
>> make it easier to compare pre/post padding numbers.
>>
>> > Will it have safe effect, if the tests are run for 10 or 15 mins
>> > rather than 5 mins?
>>
>> s/safe/same/? If so, no, I doesn't look that way. The order of buffers
>> appears to play a large role; and it won't change in a memory resident
>> workload over one run.
>>
>
> I've tried to run read-only benchmark of pad_pgxact_v1.patch on 4x18 Intel
> machine.  The results are following.
>
> clients no-pad  pad
> 1   12997   13381
> 2   26728   25645
> 4   52539   51738
> 8   103785  102337
> 10  132606  126174
> 20  255844  252143
> 30  371359  357629
> 40  450429  443053
> 50  497705  488250
> 60  564385  564877
> 70  718860  725149
> 80  934170  931218
> 90  1152961 1146498
> 100 1240055 1300369
> 110 1207727 1375706
> 120 1167681 1417032
> 130 1120891 1448408
> 140 1085904 1449027
> 150 1022160 1437545
> 160 976487  1441720
> 170 978120  1435848
> 180 953843  1414925
>
> snapshot_too_old patch was reverted in the both cases.
> On high number of clients padding gives very significant benefit.
>


These results indicates that the patch is a win.   Are these results median
of 3 runs or single run data.  By the way can you share the output of lscpu
command on this m/c.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-17 Thread Amit Kapila

On Thu, Apr 14, 2016 at 8:05 AM, Andres Freund  wrote:
>
> On 2016-04-14 07:59:07 +0530, Amit Kapila wrote:
> > What you want to see by prewarming?
>
> Prewarming appears to greatly reduce the per-run variance on that
> machine, making it a lot easier to get meaningful results.
>

I think you are referring the tests done by Robert on power-8 m/c, but the
performance results I have reported were on intel x86.  In last two days, I
have spent quite some effort to do the performance testing of this patch
with pre-warming by using the same query [1] as used by Robert in his
tests.  The tests are done such that first it start server, pre-warms the
relations, ran read-only test, stop server, again repeat this for next
test.  I have observed that the variance in run-to-run performance still
occurs especially at higher client count (128).  Below are results for 128
client count both when the tests ran first with patch and then with HEAD
and vice versa.

Test-1
--
client count - 128 (basically -c 128 -j 128)

first tests ran with patch and then with HEAD

Patch_ver/Runs HEAD (commit -70715e6a) Patch
Run-1 156748 174640
Run-2 151352 150115
Run-3 177940 165269

Test-2
--
client count - 128 (basically -c 128 -j 128)

first tests ran with HEAD and then with patch

Patch_ver/Runs HEAD (commit -70715e6a) Patch
Run-1 173063 151282
Run-2 173187 140676
Run-3 177046 166726

I think this patch (padding pgxact) certainly is beneficial as reported
above thread. At very high client count some variation in performance is
observed with and without patch, but I feel in general it is a win.

[1] - psql -c "select sum(x.x) from (select pg_prewarm(oid) as x from
pg_class where relkind in ('i', 'r') order by oid) x;"

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-14 Thread Alexander Korotkov

On Thu, Apr 14, 2016 at 5:35 AM, Andres Freund  wrote:

> On 2016-04-14 07:59:07 +0530, Amit Kapila wrote:
> > What you want to see by prewarming?
>
> Prewarming appears to greatly reduce the per-run variance on that
> machine, making it a lot easier to get meaningful results.  Thus it'd
> make it easier to compare pre/post padding numbers.
>
> > Will it have safe effect, if the tests are run for 10 or 15 mins
> > rather than 5 mins?
>
> s/safe/same/? If so, no, I doesn't look that way. The order of buffers
> appears to play a large role; and it won't change in a memory resident
> workload over one run.
>

I've tried to run read-only benchmark of pad_pgxact_v1.patch on 4x18 Intel
machine.  The results are following.

clients no-pad  pad
1   12997   13381
2   26728   25645
4   52539   51738
8   103785  102337
10  132606  126174
20  255844  252143
30  371359  357629
40  450429  443053
50  497705  488250
60  564385  564877
70  718860  725149
80  934170  931218
90  1152961 1146498
100 1240055 1300369
110 1207727 1375706
120 1167681 1417032
130 1120891 1448408
140 1085904 1449027
150 1022160 1437545
160 976487  1441720
170 978120  1435848
180 953843  1414925

snapshot_too_old patch was reverted in the both cases.
On high number of clients padding gives very significant benefit.  However,
on low number of clients there is small regression.  I think this
regression could be caused by alignment of other data structures.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-13 Thread Andres Freund

On 2016-04-14 07:59:07 +0530, Amit Kapila wrote:
> What you want to see by prewarming?

Prewarming appears to greatly reduce the per-run variance on that
machine, making it a lot easier to get meaningful results.  Thus it'd
make it easier to compare pre/post padding numbers.

> Will it have safe effect, if the tests are run for 10 or 15 mins
> rather than 5 mins?

s/safe/same/? If so, no, I doesn't look that way. The order of buffers
appears to play a large role; and it won't change in a memory resident
workload over one run.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-13 Thread Amit Kapila

On Tue, Apr 12, 2016 at 9:32 PM, Andres Freund  wrote:
>
> On 2016-04-12 19:42:11 +0530, Amit Kapila wrote:
> > Andres suggested me on IM to take performance data on x86 m/c
> > by padding PGXACT and the data for the same is as below:
> >
> > median of 3, 5-min runs
>
> Thanks for running these.
>
> I presume these were *without* pg_prewarming the contents?
>

Yes.

>  It'd be
> interesting to do the same with prewarming;

What you want to see by prewarming?  Will it have safe effect, if the tests
are run for 10 or 15 mins rather than 5 mins?

> Could share details of hardware you used?  I could try to find something
similar to reproduce this.

Processor related information (using lscpu)

Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):128
On-line CPU(s) list:   0-127
Thread(s) per core:2
Core(s) per socket:8
Socket(s): 8
NUMA node(s):  8
Vendor ID: GenuineIntel
CPU family:6
Model: 47
Model name:Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz
Stepping:  2
CPU MHz:   1064.000
BogoMIPS:  4266.62
Virtualization:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  24576K
NUMA node0 CPU(s): 0,65-71,96-103
NUMA node1 CPU(s): 72-79,104-111
NUMA node2 CPU(s): 80-87,112-119
NUMA node3 CPU(s): 88-95,120-127
NUMA node4 CPU(s): 1-8,33-40
NUMA node5 CPU(s): 9-16,41-48
NUMA node6 CPU(s): 17-24,49-56
NUMA node7 CPU(s): 25-32,57-64

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-13 Thread Alexander Korotkov

On Tue, Apr 12, 2016 at 5:12 PM, Amit Kapila 
wrote:

> On Tue, Apr 12, 2016 at 3:48 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Tue, Apr 12, 2016 at 12:40 AM, Andres Freund 
>> wrote:
>>
>>> I did get access to the machine (thanks!). My testing shows that
>>> performance is sensitive to various parameters influencing memory
>>> allocation. E.g. twiddling with max_connections changes
>>> performance. With max_connections=400 and the previous patches applied I
>>> get ~122 tps, with 402 ~162 tps.  This sorta confirms that we're
>>> dealing with an alignment/sharing related issue.
>>>
>>> Padding PGXACT to a full cache-line seems to take care of the largest
>>> part of the performance irregularity. I looked at perf profiles and saw
>>> that most cache misses stem from there, and that the percentage (not
>>> absolute amount!) changes between fast/slow settings.
>>>
>>> To me it makes intuitive sense why you'd want PGXACTs to be on separate
>>> cachelines - they're constantly dirtied via SnapshotResetXmin(). Indeed
>>> making it immediately return propels performance up to 172, without
>>> other changes. Additionally cacheline-padding PGXACT speeds things up to
>>> 175 tps.
>>>
>>
>> It seems like padding PGXACT to a full cache-line is a great
>> improvement.  We have not so many PGXACTs to care about bytes wasted to
>> padding.
>>
>
> Yes, it seems generally it is a good idea, but not sure if it is a
> complete fix for variation in performance we are seeing when we change
> shared memory structures.  Andres suggested me on IM to take performance
> data on x86 m/c by padding PGXACT and the data for the same is as below:
>
> median of 3, 5-min runs
>
> Client_Count/Patch_ver 8 64 128
> HEAD 59708 329560 173655
> PATCH 61480 379798 157580
>
> Here, at 128 client-count the performance with patch still seems to have
> variation.  The highest tps with patch (170363) is close to HEAD (175718).
> This could be run-to-run variation, but I think it indicates that there are
> more places where we might need such padding or may be optimize them, so
> that they are aligned.
>
> I can do some more experiments on similar lines, but I am out on vacation
> and might not be able to access the m/c for 3-4 days.
>

Could share details of hardware you used?  I could try to find something
similar to reproduce this.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-12 Thread Andres Freund

On 2016-04-12 19:42:11 +0530, Amit Kapila wrote:
> Yes, it seems generally it is a good idea, but not sure if it is a complete
> fix for variation in performance we are seeing when we change shared memory
> structures.

I didn't suspect it would be. More whether it'd be beneficial
performance wise. FWIW, I haven't seen the variations you're observing
on any machine so far.

I think at high concurrency levels we're quite likely to interact with
the exact strategy used for the last-level/l3 cache. pgprocno,
allPgXact, BufferDescs are all arrays with a regular stride that we
access across several numa nodes, at a very high rate. At some point
that makes very likely that cache conflicts occur in set associative
caches.

> Andres suggested me on IM to take performance data on x86 m/c
> by padding PGXACT and the data for the same is as below:
> 
> median of 3, 5-min runs

Thanks for running these.

I presume these were *without* pg_prewarming the contents? It'd be
interesting to do the same with prewarming; aligning these structures
should be unrelated to the separate issue of bufferdesc order having a
rather massive performance effect.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-12 Thread Amit Kapila

On Tue, Apr 12, 2016 at 3:48 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Apr 12, 2016 at 12:40 AM, Andres Freund 
> wrote:
>
>> I did get access to the machine (thanks!). My testing shows that
>> performance is sensitive to various parameters influencing memory
>> allocation. E.g. twiddling with max_connections changes
>> performance. With max_connections=400 and the previous patches applied I
>> get ~122 tps, with 402 ~162 tps.  This sorta confirms that we're
>> dealing with an alignment/sharing related issue.
>>
>> Padding PGXACT to a full cache-line seems to take care of the largest
>> part of the performance irregularity. I looked at perf profiles and saw
>> that most cache misses stem from there, and that the percentage (not
>> absolute amount!) changes between fast/slow settings.
>>
>> To me it makes intuitive sense why you'd want PGXACTs to be on separate
>> cachelines - they're constantly dirtied via SnapshotResetXmin(). Indeed
>> making it immediately return propels performance up to 172, without
>> other changes. Additionally cacheline-padding PGXACT speeds things up to
>> 175 tps.
>>
>
> It seems like padding PGXACT to a full cache-line is a great improvement.
> We have not so many PGXACTs to care about bytes wasted to padding.
>

Yes, it seems generally it is a good idea, but not sure if it is a complete
fix for variation in performance we are seeing when we change shared memory
structures.  Andres suggested me on IM to take performance data on x86 m/c
by padding PGXACT and the data for the same is as below:

median of 3, 5-min runs

Client_Count/Patch_ver 8 64 128
HEAD 59708 329560 173655
PATCH 61480 379798 157580

Here, at 128 client-count the performance with patch still seems to have
variation.  The highest tps with patch (170363) is close to HEAD (175718).
This could be run-to-run variation, but I think it indicates that there are
more places where we might need such padding or may be optimize them, so
that they are aligned.

I can do some more experiments on similar lines, but I am out on vacation
and might not be able to access the m/c for 3-4 days.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pad_pgxact_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-12 Thread Alexander Korotkov

On Tue, Apr 12, 2016 at 12:40 AM, Andres Freund  wrote:

> I did get access to the machine (thanks!). My testing shows that
> performance is sensitive to various parameters influencing memory
> allocation. E.g. twiddling with max_connections changes
> performance. With max_connections=400 and the previous patches applied I
> get ~122 tps, with 402 ~162 tps.  This sorta confirms that we're
> dealing with an alignment/sharing related issue.
>
> Padding PGXACT to a full cache-line seems to take care of the largest
> part of the performance irregularity. I looked at perf profiles and saw
> that most cache misses stem from there, and that the percentage (not
> absolute amount!) changes between fast/slow settings.
>
> To me it makes intuitive sense why you'd want PGXACTs to be on separate
> cachelines - they're constantly dirtied via SnapshotResetXmin(). Indeed
> making it immediately return propels performance up to 172, without
> other changes. Additionally cacheline-padding PGXACT speeds things up to
> 175 tps.
>

It seems like padding PGXACT to a full cache-line is a great improvement.
We have not so many PGXACTs to care about bytes wasted to padding.  But
could it have another negative side-effect?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-12 Thread Amit Kapila

On Mon, Apr 11, 2016 at 7:33 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Sun, Apr 10, 2016 at 2:24 PM, Amit Kapila 
> wrote:
>>
>> I also tried to run perf top during pgbench and get some interesting
>>> results.
>>>
>>> Without 5364b357:
>>>5,69%  postgres [.] GetSnapshotData
>>>4,47%  postgres [.] LWLockAttemptLock
>>>3,81%  postgres [.] _bt_compare
>>>3,42%  postgres [.] hash_search_with_hash_value
>>>3,08%  postgres [.] LWLockRelease
>>>2,49%  postgres [.] PinBuffer.isra.3
>>>1,58%  postgres [.] AllocSetAlloc
>>>1,17%  [kernel] [k] __schedule
>>>1,15%  postgres [.] PostgresMain
>>>1,13%  libc-2.17.so [.] vfprintf
>>>1,01%  libc-2.17.so [.] __memcpy_ssse3_back
>>>
>>> With 5364b357:
>>>   18,54%  postgres [.] GetSnapshotData
>>>3,45%  postgres [.] LWLockRelease
>>>3,27%  postgres [.] LWLockAttemptLock
>>>3,21%  postgres [.] _bt_compare
>>>2,93%  postgres [.] hash_search_with_hash_value
>>>2,00%  postgres [.] PinBuffer.isra.3
>>>1,32%  postgres [.] AllocSetAlloc
>>>1,10%  libc-2.17.so [.] vfprintf
>>>
>>> Very surprising.  It appears that after 5364b357, GetSnapshotData
>>> consumes more time.  But I can't see anything depending on clog buffers
>>> in GetSnapshotData code...
>>>
>>
>> There is a related fact presented by Mithun C Y as well [1] which
>> suggests that Andres's idea of reducing the cost of snapshot shows
>> noticeable gain after increasing the clog buffers.  If you read that thread
>> you will notice that initially we didn't notice much gain by that idea, but
>> with increased clog buffers, it started showing noticeable gain.  If by any
>> chance, you can apply that patch and see the results (latest patch is at
>> [2]).
>>
>>
>> [1] -
>> http://www.postgresql.org/message-id/CAD__Ouic1Tvnwqm6Wf6j7Cz1Kk1DQgmy0isC7=ogx+3jtfg...@mail.gmail.com
>>
>> [2] -
>> http://www.postgresql.org/message-id/cad__ouiwei5she2wwqck36ac9qmhvjuqg3cfpn+ofcmb7rd...@mail.gmail.com
>>
>
> I took a look at this thread but I still didn't get why number of clog
> buffers affects read-only benchmark.
> Could you please explain it to me in more details?
>
>

As already pointed out by Andres, that this is mainly due to shared memory
alignment issues.  We have observed that changing some shared memory
arrangement (structures) some times causes huge differences in
performance.  I guess that is the reason why with Cache the snapshot patch,
I am seeing the performance gets restored (mainly because it is changing
shared memory structures).  I think the right way to fix this is to find
which shared structure/structure's needs padding, so that we don't see such
fluctuations every time we change some thing is shared memory.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-11 Thread Andres Freund

On 2016-04-11 14:40:29 -0700, Andres Freund wrote:
> On 2016-04-11 12:17:20 -0700, Andres Freund wrote:
> I did get access to the machine (thanks!). My testing shows that
> performance is sensitive to various parameters influencing memory
> allocation. E.g. twiddling with max_connections changes
> performance. With max_connections=400 and the previous patches applied I
> get ~122 tps, with 402 ~162 tps.  This sorta confirms that we're
> dealing with an alignment/sharing related issue.
> 
> Padding PGXACT to a full cache-line seems to take care of the largest
> part of the performance irregularity. I looked at perf profiles and saw
> that most cache misses stem from there, and that the percentage (not
> absolute amount!) changes between fast/slow settings.
> 
> To me it makes intuitive sense why you'd want PGXACTs to be on separate
> cachelines - they're constantly dirtied via SnapshotResetXmin(). Indeed
> making it immediately return propels performance up to 172, without
> other changes. Additionally cacheline-padding PGXACT speeds things up to
> 175 tps.
> 
> But I'm unclear why the magnitude of the effect depends on other
> allocations. With the previously posted patches allPgXact is always
> cacheline-aligned.

I've spent considerable amount experimenting around this. The alignment
of allPgXact does *not* apear to play a significant role; rather it
apears to be the the "distance" between the allPgXact and pgprocno
arrays.

Alexander, could you post dmidecode output, and install numactl &
numastat on the machine? I wonder if the box has cluster-on-die
activated or not.  Do I see correctly that this is a system that could
potentially have 8 sockets, but actually has only four? Because I see
physical id : 3 in /proc/cpuinfo only going up to three (from zero),
not 7? And there's only 144 processorcs, while each E7-8890 v3 should
have 36 threads.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-11 Thread Andres Freund

On 2016-04-11 14:40:29 -0700, Andres Freund wrote:
> On 2016-04-11 12:17:20 -0700, Andres Freund wrote:
> > On 2016-04-11 22:08:15 +0300, Alexander Korotkov wrote:
> > > On Mon, Apr 11, 2016 at 5:04 PM, Alexander Korotkov <
> > > a.korot...@postgrespro.ru> wrote:
> > > 
> > > > On Mon, Apr 11, 2016 at 8:10 AM, Andres Freund  
> > > > wrote:
> > > >
> > > >> Could you retry after applying the attached series of patches?
> > > >>
> > > >
> > > > Yes, I will try with these patches and snapshot too old reverted.
> > > >
> > > 
> > > I've run the same benchmark with 279d86af and 848ef42b reverted.  I've
> > > tested of all 3 patches from you applied and, for comparison, 3 patches +
> > > clog buffers reverted back to 32.
> > > 
> > > clients patches patches + clog_32
> > > 1 12594   12556
> > > 2 26705   26258
> > > 4 50985   53254
> > > 8103234  104416
> > > 10   135321  130893
> > > 20   268675  267648
> > > 30   370437  409710
> > > 40   486512  482382
> > > 50   539910  525667
> > > 60   616401  672230
> > > 70   667864  660853
> > > 80   924606  737768
> > > 90  1217435  799581
> > > 100 1326054  863066
> > > 110 1446380  980206
> > > 120 1484920 1000963
> > > 130 1512440 1058852
> > > 140 1536181 1088958
> > > 150 1504750 1134354
> > > 160 1461513 1132173
> > > 170 1453943 1158656
> > > 180 1426288 1120511
> 
> > Any chance that I could run some tests on that machine myself? It's very
> > hard to investigate that kind of issue without access; the only thing I
> > otherwise can do is lob patches at you, till we find the relevant
> > memory.
> 
> I did get access to the machine (thanks!). My testing shows that
> performance is sensitive to various parameters influencing memory
> allocation. E.g. twiddling with max_connections changes
> performance. With max_connections=400 and the previous patches applied I
> get ~122 tps, with 402 ~162 tps.  This sorta confirms that we're
> dealing with an alignment/sharing related issue.
> 
> Padding PGXACT to a full cache-line seems to take care of the largest
> part of the performance irregularity. I looked at perf profiles and saw
> that most cache misses stem from there, and that the percentage (not
> absolute amount!) changes between fast/slow settings.
> 
> To me it makes intuitive sense why you'd want PGXACTs to be on separate
> cachelines - they're constantly dirtied via SnapshotResetXmin(). Indeed
> making it immediately return propels performance up to 172, without
> other changes. Additionally cacheline-padding PGXACT speeds things up to
> 175 tps.
> 
> But I'm unclear why the magnitude of the effect depends on other
> allocations. With the previously posted patches allPgXact is always
> cacheline-aligned.

Oh, one more thing: The volatile on PGXACT in GetSnapshotData() costs us
about 100k tps on that machine; without, afaics, any point but force
pgxact->xmin to only be loaded once (which a *((volatile
TransactionId)>xmin) does just as well).

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-11 Thread Andres Freund

On 2016-04-11 12:17:20 -0700, Andres Freund wrote:
> On 2016-04-11 22:08:15 +0300, Alexander Korotkov wrote:
> > On Mon, Apr 11, 2016 at 5:04 PM, Alexander Korotkov <
> > a.korot...@postgrespro.ru> wrote:
> > 
> > > On Mon, Apr 11, 2016 at 8:10 AM, Andres Freund  wrote:
> > >
> > >> Could you retry after applying the attached series of patches?
> > >>
> > >
> > > Yes, I will try with these patches and snapshot too old reverted.
> > >
> > 
> > I've run the same benchmark with 279d86af and 848ef42b reverted.  I've
> > tested of all 3 patches from you applied and, for comparison, 3 patches +
> > clog buffers reverted back to 32.
> > 
> > clients patches patches + clog_32
> > 1 12594   12556
> > 2 26705   26258
> > 4 50985   53254
> > 8103234  104416
> > 10   135321  130893
> > 20   268675  267648
> > 30   370437  409710
> > 40   486512  482382
> > 50   539910  525667
> > 60   616401  672230
> > 70   667864  660853
> > 80   924606  737768
> > 90  1217435  799581
> > 100 1326054  863066
> > 110 1446380  980206
> > 120 1484920 1000963
> > 130 1512440 1058852
> > 140 1536181 1088958
> > 150 1504750 1134354
> > 160 1461513 1132173
> > 170 1453943 1158656
> > 180 1426288 1120511

> Any chance that I could run some tests on that machine myself? It's very
> hard to investigate that kind of issue without access; the only thing I
> otherwise can do is lob patches at you, till we find the relevant
> memory.

I did get access to the machine (thanks!). My testing shows that
performance is sensitive to various parameters influencing memory
allocation. E.g. twiddling with max_connections changes
performance. With max_connections=400 and the previous patches applied I
get ~122 tps, with 402 ~162 tps.  This sorta confirms that we're
dealing with an alignment/sharing related issue.

Padding PGXACT to a full cache-line seems to take care of the largest
part of the performance irregularity. I looked at perf profiles and saw
that most cache misses stem from there, and that the percentage (not
absolute amount!) changes between fast/slow settings.

To me it makes intuitive sense why you'd want PGXACTs to be on separate
cachelines - they're constantly dirtied via SnapshotResetXmin(). Indeed
making it immediately return propels performance up to 172, without
other changes. Additionally cacheline-padding PGXACT speeds things up to
175 tps.

But I'm unclear why the magnitude of the effect depends on other
allocations. With the previously posted patches allPgXact is always
cacheline-aligned.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-11 Thread Andres Freund

On 2016-04-11 22:08:15 +0300, Alexander Korotkov wrote:
> On Mon, Apr 11, 2016 at 5:04 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> 
> > On Mon, Apr 11, 2016 at 8:10 AM, Andres Freund  wrote:
> >
> >> Could you retry after applying the attached series of patches?
> >>
> >
> > Yes, I will try with these patches and snapshot too old reverted.
> >
> 
> I've run the same benchmark with 279d86af and 848ef42b reverted.  I've
> tested of all 3 patches from you applied and, for comparison, 3 patches +
> clog buffers reverted back to 32.
> 
> clients patches patches + clog_32
> 1 12594   12556
> 2 26705   26258
> 4 50985   53254
> 8103234  104416
> 10   135321  130893
> 20   268675  267648
> 30   370437  409710
> 40   486512  482382
> 50   539910  525667
> 60   616401  672230
> 70   667864  660853
> 80   924606  737768
> 90  1217435  799581
> 100 1326054  863066
> 110 1446380  980206
> 120 1484920 1000963
> 130 1512440 1058852
> 140 1536181 1088958
> 150 1504750 1134354
> 160 1461513 1132173
> 170 1453943 1158656
> 180 1426288 1120511
> 
> I hardly can understand how clog buffers influence read-only
> benchmark.

My guess is that the number of buffers influences some alignment;
causing a lot of false sharing or something. I.e. the number of clog
buffers itself doesn't play a role, it's just a question of how it
change the memory layout.

> It even harder for me why influence of clog buffers change its
> direction after applying your patches.  But the results are following.
> And I've rechecked some values manually to verify that there is no
> error. > I would be very thankful for any explanation.

Hm. Possibly this patches influenced alignment, but didn't make things
sufficiently stable to guarantee that we're always correctly aligned,
thus the 32bit case now regresses.

Any chance that I could run some tests on that machine myself? It's very
hard to investigate that kind of issue without access; the only thing I
otherwise can do is lob patches at you, till we find the relevant
memory.

If not, one of the things to do is to use perf to compare where cache
misses is happening between the fast and the slow case.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-11 Thread Alexander Korotkov

On Mon, Apr 11, 2016 at 5:04 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Mon, Apr 11, 2016 at 8:10 AM, Andres Freund  wrote:
>
>> Could you retry after applying the attached series of patches?
>>
>
> Yes, I will try with these patches and snapshot too old reverted.
>

I've run the same benchmark with 279d86af and 848ef42b reverted.  I've
tested of all 3 patches from you applied and, for comparison, 3 patches +
clog buffers reverted back to 32.

clients patches patches + clog_32
1 12594   12556
2 26705   26258
4 50985   53254
8103234  104416
10   135321  130893
20   268675  267648
30   370437  409710
40   486512  482382
50   539910  525667
60   616401  672230
70   667864  660853
80   924606  737768
90  1217435  799581
100 1326054  863066
110 1446380  980206
120 1484920 1000963
130 1512440 1058852
140 1536181 1088958
150 1504750 1134354
160 1461513 1132173
170 1453943 1158656
180 1426288 1120511

I hardly can understand how clog buffers influence read-only benchmark.  It
even harder for me why influence of clog buffers change its direction after
applying your patches.  But the results are following.  And I've rechecked
some values manually to verify that there is no error.  I would be very
thankful for any explanation.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-11 Thread Alexander Korotkov

On Mon, Apr 11, 2016 at 8:10 AM, Andres Freund  wrote:

> On 2016-04-10 09:03:37 +0300, Alexander Korotkov wrote:
> > On Sun, Apr 10, 2016 at 8:36 AM, Alexander Korotkov <
> > a.korot...@postgrespro.ru> wrote:
> >
> > > On Sat, Apr 9, 2016 at 10:49 PM, Andres Freund 
> wrote:
> > >
> > >>
> > >>
> > >> On April 9, 2016 12:43:03 PM PDT, Andres Freund 
> > >> wrote:
> > >> >On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
> > >> >> There are results with 5364b357 reverted.
> > >> >
> > >> >Crazy that this has such a negative impact. Amit, can you reproduce
> > >> >that? Alexander, I guess for r/w workload 5364b357 is a benefit on
> that
> > >> >machine as well?
> > >>
> > >> How sure are you about these measurements?
> > >
> > >
> > > I'm pretty sure.  I've retried it multiple times by hand before re-run
> the
> > > script.
> > >
> > >
> > >> Because there really shouldn't be clog lookups one a steady state is
> > >> reached...
> > >>
> > >
> > > Hm... I'm also surprised. There shouldn't be clog lookups once hint
> bits
> > > are set.
> > >
> >
> > I also tried to run perf top during pgbench and get some interesting
> > results.
> >
> > Without 5364b357:
> >5,69%  postgres [.] GetSnapshotData
> >4,47%  postgres [.] LWLockAttemptLock
> >3,81%  postgres [.] _bt_compare
> >3,42%  postgres [.] hash_search_with_hash_value
> >3,08%  postgres [.] LWLockRelease
> >2,49%  postgres [.] PinBuffer.isra.3
> >1,58%  postgres [.] AllocSetAlloc
> >1,17%  [kernel] [k] __schedule
> >1,15%  postgres [.] PostgresMain
> >1,13%  libc-2.17.so [.] vfprintf
> >1,01%  libc-2.17.so [.] __memcpy_ssse3_back
> >
> > With 5364b357:
> >   18,54%  postgres [.] GetSnapshotData
> >3,45%  postgres [.] LWLockRelease
> >3,27%  postgres [.] LWLockAttemptLock
> >3,21%  postgres [.] _bt_compare
> >2,93%  postgres [.] hash_search_with_hash_value
> >2,00%  postgres [.] PinBuffer.isra.3
> >1,32%  postgres [.] AllocSetAlloc
> >1,10%  libc-2.17.so [.] vfprintf
> >
> > Very surprising.  It appears that after 5364b357, GetSnapshotData
> consumes
> > more time.  But I can't see anything depending on clog buffers
> > in GetSnapshotData code...
>
> Could you retry after applying the attached series of patches?
>

Yes, I will try with these patches and snapshot too old reverted.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-11 Thread Alexander Korotkov

On Sun, Apr 10, 2016 at 2:24 PM, Amit Kapila 
wrote:

> On Sun, Apr 10, 2016 at 11:33 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Sun, Apr 10, 2016 at 8:36 AM, Alexander Korotkov <
>> a.korot...@postgrespro.ru> wrote:
>>
>>> On Sat, Apr 9, 2016 at 10:49 PM, Andres Freund 
>>> wrote:
>>>


 On April 9, 2016 12:43:03 PM PDT, Andres Freund 
 wrote:
 >On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
 >> There are results with 5364b357 reverted.
 >
 >Crazy that this has such a negative impact. Amit, can you reproduce
 >that? Alexander, I guess for r/w workload 5364b357 is a benefit on that
 >machine as well?

 How sure are you about these measurements?
>>>
>>>
>>> I'm pretty sure.  I've retried it multiple times by hand before re-run
>>> the script.
>>>
>>>
 Because there really shouldn't be clog lookups one a steady state is
 reached...

>>>
>>> Hm... I'm also surprised. There shouldn't be clog lookups once hint bits
>>> are set.
>>>
>>
>> I also tried to run perf top during pgbench and get some interesting
>> results.
>>
>> Without 5364b357:
>>5,69%  postgres [.] GetSnapshotData
>>4,47%  postgres [.] LWLockAttemptLock
>>3,81%  postgres [.] _bt_compare
>>3,42%  postgres [.] hash_search_with_hash_value
>>3,08%  postgres [.] LWLockRelease
>>2,49%  postgres [.] PinBuffer.isra.3
>>1,58%  postgres [.] AllocSetAlloc
>>1,17%  [kernel] [k] __schedule
>>1,15%  postgres [.] PostgresMain
>>1,13%  libc-2.17.so [.] vfprintf
>>1,01%  libc-2.17.so [.] __memcpy_ssse3_back
>>
>> With 5364b357:
>>   18,54%  postgres [.] GetSnapshotData
>>3,45%  postgres [.] LWLockRelease
>>3,27%  postgres [.] LWLockAttemptLock
>>3,21%  postgres [.] _bt_compare
>>2,93%  postgres [.] hash_search_with_hash_value
>>2,00%  postgres [.] PinBuffer.isra.3
>>1,32%  postgres [.] AllocSetAlloc
>>1,10%  libc-2.17.so [.] vfprintf
>>
>> Very surprising.  It appears that after 5364b357, GetSnapshotData
>> consumes more time.  But I can't see anything depending on clog buffers
>> in GetSnapshotData code...
>>
>
> There is a related fact presented by Mithun C Y as well [1] which suggests
> that Andres's idea of reducing the cost of snapshot shows noticeable gain
> after increasing the clog buffers.  If you read that thread you will notice
> that initially we didn't notice much gain by that idea, but with increased
> clog buffers, it started showing noticeable gain.  If by any chance, you
> can apply that patch and see the results (latest patch is at [2]).
>
>
> [1] -
> http://www.postgresql.org/message-id/CAD__Ouic1Tvnwqm6Wf6j7Cz1Kk1DQgmy0isC7=ogx+3jtfg...@mail.gmail.com
>
> [2] -
> http://www.postgresql.org/message-id/cad__ouiwei5she2wwqck36ac9qmhvjuqg3cfpn+ofcmb7rd...@mail.gmail.com
>

I took a look at this thread but I still didn't get why number of clog
buffers affects read-only benchmark.
Could you please explain it to me in more details?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-10 Thread Andres Freund

On 2016-04-10 09:03:37 +0300, Alexander Korotkov wrote:
> On Sun, Apr 10, 2016 at 8:36 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> 
> > On Sat, Apr 9, 2016 at 10:49 PM, Andres Freund  wrote:
> >
> >>
> >>
> >> On April 9, 2016 12:43:03 PM PDT, Andres Freund 
> >> wrote:
> >> >On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
> >> >> There are results with 5364b357 reverted.
> >> >
> >> >Crazy that this has such a negative impact. Amit, can you reproduce
> >> >that? Alexander, I guess for r/w workload 5364b357 is a benefit on that
> >> >machine as well?
> >>
> >> How sure are you about these measurements?
> >
> >
> > I'm pretty sure.  I've retried it multiple times by hand before re-run the
> > script.
> >
> >
> >> Because there really shouldn't be clog lookups one a steady state is
> >> reached...
> >>
> >
> > Hm... I'm also surprised. There shouldn't be clog lookups once hint bits
> > are set.
> >
> 
> I also tried to run perf top during pgbench and get some interesting
> results.
> 
> Without 5364b357:
>5,69%  postgres [.] GetSnapshotData
>4,47%  postgres [.] LWLockAttemptLock
>3,81%  postgres [.] _bt_compare
>3,42%  postgres [.] hash_search_with_hash_value
>3,08%  postgres [.] LWLockRelease
>2,49%  postgres [.] PinBuffer.isra.3
>1,58%  postgres [.] AllocSetAlloc
>1,17%  [kernel] [k] __schedule
>1,15%  postgres [.] PostgresMain
>1,13%  libc-2.17.so [.] vfprintf
>1,01%  libc-2.17.so [.] __memcpy_ssse3_back
> 
> With 5364b357:
>   18,54%  postgres [.] GetSnapshotData
>3,45%  postgres [.] LWLockRelease
>3,27%  postgres [.] LWLockAttemptLock
>3,21%  postgres [.] _bt_compare
>2,93%  postgres [.] hash_search_with_hash_value
>2,00%  postgres [.] PinBuffer.isra.3
>1,32%  postgres [.] AllocSetAlloc
>1,10%  libc-2.17.so [.] vfprintf
> 
> Very surprising.  It appears that after 5364b357, GetSnapshotData consumes
> more time.  But I can't see anything depending on clog buffers
> in GetSnapshotData code...

Could you retry after applying the attached series of patches?

- Andres
>From e8ad791c97004c64a1f27a500ba100b69fdc8d87 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 10 Apr 2016 21:47:04 -0700
Subject: [PATCH 1/3] Finish 09adc9a8c09c9640de05c7023b27fb83c761e91c.

---
 src/backend/access/transam/parallel.c |  2 +-
 src/backend/port/sysv_shmem.c |  2 +-
 src/backend/port/win32_shmem.c|  2 +-
 src/backend/storage/buffer/buf_init.c | 19 ++-
 src/backend/storage/ipc/shm_toc.c |  6 +++---
 src/backend/storage/ipc/shmem.c   | 17 ++---
 src/include/c.h   |  2 --
 src/include/pg_config_manual.h|  8 
 src/include/storage/shm_toc.h |  2 +-
 9 files changed, 23 insertions(+), 37 deletions(-)

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 0bba9a7..f8ea89c 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -237,7 +237,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		shm_toc_estimate_keys(>estimator, 6);
 
 		/* Estimate space need for error queues. */
-		StaticAssertStmt(BUFFERALIGN(PARALLEL_ERROR_QUEUE_SIZE) ==
+		StaticAssertStmt(CACHELINEALIGN(PARALLEL_ERROR_QUEUE_SIZE) ==
 		 PARALLEL_ERROR_QUEUE_SIZE,
 		 "parallel error queue size not buffer-aligned");
 		shm_toc_estimate_chunk(>estimator,
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 6c442b9..084bc31 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -559,7 +559,7 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port,
 	 * Initialize space allocation status for segment.
 	 */
 	hdr->totalsize = size;
-	hdr->freeoffset = MAXALIGN(sizeof(PGShmemHeader));
+	hdr->freeoffset = CACHELINEALIGN(sizeof(PGShmemHeader));
 	*shim = hdr;
 
 	/* Save info for possible future use */
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index 0ff2c7e..81705fc 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -241,7 +241,7 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port,
 	 * Initialize space allocation status for segment.
 	 */
 	hdr->totalsize = size;
-	hdr->freeoffset = MAXALIGN(sizeof(PGShmemHeader));
+	hdr->freeoffset = CACHELINEALIGN(sizeof(PGShmemHeader));
 	hdr->dsm_control = 0;
 
 	/* Save info for possible future use */
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index a5cffc7..61f9c34 100644
--- a/src/backend/storage/buffer/buf_init.c

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-10 Thread Amit Kapila

On Sun, Apr 10, 2016 at 6:15 PM, Amit Kapila 
wrote:

> On Sun, Apr 10, 2016 at 11:10 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Sun, Apr 10, 2016 at 7:26 AM, Amit Kapila 
>> wrote:
>>
>>> On Sun, Apr 10, 2016 at 1:13 AM, Andres Freund 
>>> wrote:
>>>
 On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
 > There are results with 5364b357 reverted.

>>> What exactly is this test?
>>> I think assuming it is a read-only -M prepared pgbench run where data
>>> fits in shared buffers.  However if you can share exact details, then I can
>>> try the similar test.
>>>
>>
>> Yes, the test is:
>>
>> pgbench -s 1000 -c $clients -j 100 -M prepared -S -T 300
>> (shared_buffers=24GB)
>>
>>
>>>
 Crazy that this has such a negative impact. Amit, can you reproduce
 that?
>>>
>>>
>>> I will try it.
>>>
>>
>> Good.
>>
>
> Okay, I have done some performance testing of read-only tests with
> configuration suggested by you to see the impact
>
> pin_unpin - latest version of pin unpin patch on top of HEAD.
> pin_unpin_clog_32 - pin_unpin + change clog buffers to 32
>
> Client_Count/Patch_ver 64 128
> pin_unpin 330280 133586
> pin_unpin_clog_32 388244 132388
>
>
> This shows that at 64 client count, the performance is better with 32 clog
> buffers.  However, I think this is more attributed towards the fact that
> contention seems to shifted to procarraylock as to an extent indicated in
> Alexandar's mail.  I will try once with cache the snapshot patch as well
> and with clog buffers as 64.
>
>
I went ahead and tried with Cache the snapshot patch and with clog buffers
as 64 and below is performance data:

Description of patches

pin_unpin - latest version of pin unpin patch on top of HEAD.
pin_unpin_clog_32 - pin_unpin + change clog buffers to 32
pin_unpin_cache_snapshot - pin_unpin + Cache the snapshot
pin_unpin_clog_64 - pin_unpin + change clog buffers to 64

Client_Count/Patch_ver 64 128
pin_unpin 330280 133586
pin_unpin_clog_32 388244 132388
pin_unpin_cache_snapshot 412149 144799
pin_unpin_clog_64 391472 132951

Above data seems to indicate that cache the snapshot patch will make
performance go further up with clog buffers as 128 (HEAD).  I will take the
performance data with pin_unpin + clog buffers as 32 + cache the snapshot,
but above seems a good enough indication that making clog buffers as 128 is
a good move considering we will one day improve GetSnapshotData either by
Cache the snapshot technique or some other way.   Also making clog buffers
as 64 instead of 128 seems to address the regression (at the very least in
above tests), but for read-write performance, clog buffers as 128 has
better numbers, though the difference between 64 and 128 is not very high.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-10 Thread Amit Kapila

On Sun, Apr 10, 2016 at 11:10 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Sun, Apr 10, 2016 at 7:26 AM, Amit Kapila 
> wrote:
>
>> On Sun, Apr 10, 2016 at 1:13 AM, Andres Freund 
>> wrote:
>>
>>> On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
>>> > There are results with 5364b357 reverted.
>>>
>>>
>> What exactly is this test?
>> I think assuming it is a read-only -M prepared pgbench run where data
>> fits in shared buffers.  However if you can share exact details, then I can
>> try the similar test.
>>
>
> Yes, the test is:
>
> pgbench -s 1000 -c $clients -j 100 -M prepared -S -T 300
> (shared_buffers=24GB)
>
>
>>
>>> Crazy that this has such a negative impact. Amit, can you reproduce
>>> that?
>>
>>
>> I will try it.
>>
>
> Good.
>

Okay, I have done some performance testing of read-only tests with
configuration suggested by you to see the impact

pin_unpin - latest version of pin unpin patch on top of HEAD.
pin_unpin_clog_32 - pin_unpin + change clog buffers to 32

Client_Count/Patch_ver 64 128
pin_unpin 330280 133586
pin_unpin_clog_32 388244 132388

This shows that at 64 client count, the performance is better with 32 clog
buffers.  However, I think this is more attributed towards the fact that
contention seems to shifted to procarraylock as to an extent indicated in
Alexandar's mail.  I will try once with cache the snapshot patch as well
and with clog buffers as 64.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-10 Thread Amit Kapila

On Sun, Apr 10, 2016 at 11:33 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Sun, Apr 10, 2016 at 8:36 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Sat, Apr 9, 2016 at 10:49 PM, Andres Freund 
>> wrote:
>>
>>>
>>>
>>> On April 9, 2016 12:43:03 PM PDT, Andres Freund 
>>> wrote:
>>> >On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
>>> >> There are results with 5364b357 reverted.
>>> >
>>> >Crazy that this has such a negative impact. Amit, can you reproduce
>>> >that? Alexander, I guess for r/w workload 5364b357 is a benefit on that
>>> >machine as well?
>>>
>>> How sure are you about these measurements?
>>
>>
>> I'm pretty sure.  I've retried it multiple times by hand before re-run
>> the script.
>>
>>
>>> Because there really shouldn't be clog lookups one a steady state is
>>> reached...
>>>
>>
>> Hm... I'm also surprised. There shouldn't be clog lookups once hint bits
>> are set.
>>
>
> I also tried to run perf top during pgbench and get some interesting
> results.
>
> Without 5364b357:
>5,69%  postgres [.] GetSnapshotData
>4,47%  postgres [.] LWLockAttemptLock
>3,81%  postgres [.] _bt_compare
>3,42%  postgres [.] hash_search_with_hash_value
>3,08%  postgres [.] LWLockRelease
>2,49%  postgres [.] PinBuffer.isra.3
>1,58%  postgres [.] AllocSetAlloc
>1,17%  [kernel] [k] __schedule
>1,15%  postgres [.] PostgresMain
>1,13%  libc-2.17.so [.] vfprintf
>1,01%  libc-2.17.so [.] __memcpy_ssse3_back
>
> With 5364b357:
>   18,54%  postgres [.] GetSnapshotData
>3,45%  postgres [.] LWLockRelease
>3,27%  postgres [.] LWLockAttemptLock
>3,21%  postgres [.] _bt_compare
>2,93%  postgres [.] hash_search_with_hash_value
>2,00%  postgres [.] PinBuffer.isra.3
>1,32%  postgres [.] AllocSetAlloc
>1,10%  libc-2.17.so [.] vfprintf
>
> Very surprising.  It appears that after 5364b357, GetSnapshotData consumes
> more time.  But I can't see anything depending on clog buffers
> in GetSnapshotData code...
>

There is a related fact presented by Mithun C Y as well [1] which suggests
that Andres's idea of reducing the cost of snapshot shows noticeable gain
after increasing the clog buffers.  If you read that thread you will notice
that initially we didn't notice much gain by that idea, but with increased
clog buffers, it started showing noticeable gain.  If by any chance, you
can apply that patch and see the results (latest patch is at [2]).


[1] -
http://www.postgresql.org/message-id/CAD__Ouic1Tvnwqm6Wf6j7Cz1Kk1DQgmy0isC7=ogx+3jtfg...@mail.gmail.com

[2] -
http://www.postgresql.org/message-id/cad__ouiwei5she2wwqck36ac9qmhvjuqg3cfpn+ofcmb7rd...@mail.gmail.com


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-10 Thread Alexander Korotkov

On Sun, Apr 10, 2016 at 8:36 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Sat, Apr 9, 2016 at 10:49 PM, Andres Freund  wrote:
>
>>
>>
>> On April 9, 2016 12:43:03 PM PDT, Andres Freund 
>> wrote:
>> >On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
>> >> There are results with 5364b357 reverted.
>> >
>> >Crazy that this has such a negative impact. Amit, can you reproduce
>> >that? Alexander, I guess for r/w workload 5364b357 is a benefit on that
>> >machine as well?
>>
>> How sure are you about these measurements?
>
>
> I'm pretty sure.  I've retried it multiple times by hand before re-run the
> script.
>
>
>> Because there really shouldn't be clog lookups one a steady state is
>> reached...
>>
>
> Hm... I'm also surprised. There shouldn't be clog lookups once hint bits
> are set.
>

I also tried to run perf top during pgbench and get some interesting
results.

Without 5364b357:
   5,69%  postgres [.] GetSnapshotData
   4,47%  postgres [.] LWLockAttemptLock
   3,81%  postgres [.] _bt_compare
   3,42%  postgres [.] hash_search_with_hash_value
   3,08%  postgres [.] LWLockRelease
   2,49%  postgres [.] PinBuffer.isra.3
   1,58%  postgres [.] AllocSetAlloc
   1,17%  [kernel] [k] __schedule
   1,15%  postgres [.] PostgresMain
   1,13%  libc-2.17.so [.] vfprintf
   1,01%  libc-2.17.so [.] __memcpy_ssse3_back

With 5364b357:
  18,54%  postgres [.] GetSnapshotData
   3,45%  postgres [.] LWLockRelease
   3,27%  postgres [.] LWLockAttemptLock
   3,21%  postgres [.] _bt_compare
   2,93%  postgres [.] hash_search_with_hash_value
   2,00%  postgres [.] PinBuffer.isra.3
   1,32%  postgres [.] AllocSetAlloc
   1,10%  libc-2.17.so [.] vfprintf

Very surprising.  It appears that after 5364b357, GetSnapshotData consumes
more time.  But I can't see anything depending on clog buffers
in GetSnapshotData code...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-09 Thread Alexander Korotkov

On Sun, Apr 10, 2016 at 7:26 AM, Amit Kapila 
wrote:

> On Sun, Apr 10, 2016 at 1:13 AM, Andres Freund  wrote:
>
>> On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
>> > There are results with 5364b357 reverted.
>>
>>
> What exactly is this test?
> I think assuming it is a read-only -M prepared pgbench run where data fits
> in shared buffers.  However if you can share exact details, then I can try
> the similar test.
>

Yes, the test is:

pgbench -s 1000 -c $clients -j 100 -M prepared -S -T 300
(shared_buffers=24GB)


>
>> Crazy that this has such a negative impact. Amit, can you reproduce
>> that?
>
>
> I will try it.
>

Good.


>
>
>> Alexander, I guess for r/w workload 5364b357 is a benefit on that
>> machine as well?
>>
>
> I also think so. Alexander, if try read-write workload with unlogged
> tables, then we should see an improvement.
>

I'll try read-write workload.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-09 Thread Alexander Korotkov

On Sat, Apr 9, 2016 at 10:49 PM, Andres Freund  wrote:

>
>
> On April 9, 2016 12:43:03 PM PDT, Andres Freund 
> wrote:
> >On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
> >> There are results with 5364b357 reverted.
> >
> >Crazy that this has such a negative impact. Amit, can you reproduce
> >that? Alexander, I guess for r/w workload 5364b357 is a benefit on that
> >machine as well?
>
> How sure are you about these measurements?


I'm pretty sure.  I've retried it multiple times by hand before re-run the
script.


> Because there really shouldn't be clog lookups one a steady state is
> reached...
>

Hm... I'm also surprised. There shouldn't be clog lookups once hint bits
are set.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-09 Thread Amit Kapila

On Sun, Apr 10, 2016 at 1:13 AM, Andres Freund  wrote:

> On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
> > There are results with 5364b357 reverted.
>
>
What exactly is this test?
I think assuming it is a read-only -M prepared pgbench run where data fits
in shared buffers.  However if you can share exact details, then I can try
the similar test.

> Crazy that this has such a negative impact. Amit, can you reproduce
> that?

I will try it.

> Alexander, I guess for r/w workload 5364b357 is a benefit on that
> machine as well?
>

I also think so. Alexander, if try read-write workload with unlogged
tables, then we should see an improvement.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-09 Thread Andres Freund



On April 9, 2016 12:43:03 PM PDT, Andres Freund  wrote:
>On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
>> There are results with 5364b357 reverted.
>
>Crazy that this has such a negative impact. Amit, can you reproduce
>that? Alexander, I guess for r/w workload 5364b357 is a benefit on that
>machine as well?

How sure are you about these measurements? Because there really shouldn't be 
clog lookups one a steady state is reached...

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-09 Thread Andres Freund

On 2016-04-09 22:38:31 +0300, Alexander Korotkov wrote:
> There are results with 5364b357 reverted.

Crazy that this has such a negative impact. Amit, can you reproduce
that? Alexander, I guess for r/w workload 5364b357 is a benefit on that
machine as well?

> It's much closer to what we had before.

I'm going to apply this later then. If there's some micro optimization
for large x86, we can look into that later.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-09 Thread Alexander Korotkov

On Sat, Apr 9, 2016 at 11:24 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Apr 8, 2016 at 10:19 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Fri, Apr 8, 2016 at 7:39 PM, Andres Freund  wrote:
>>
>>> As you can see in
>>>
>>
>>> http://archives.postgresql.org/message-id/CA%2BTgmoaeRbN%3DZ4oWENLvgGLeHEvGZ_S_Z3KGrdScyKiSvNt3oA%40mail.gmail.com
>>> I'm planning to apply this sometime this weekend, after running some
>>> tests and going over the patch again.
>>>
>>> Any chance you could have a look over this?
>>>
>>
>> I took a look at this.  Changes you made look good for me.
>> I also run test on 4x18 Intel server.
>>
>
> On the top of current master results are following:
>
> clientsTPS
> 1 12562
> 2 25604
> 4 52661
> 8103209
> 10   128599
> 20   256872
> 30   365718
> 40   432749
> 50   513528
> 60   684943
> 70   696050
> 80   923350
> 90  1119776
> 100 1208027
> 110 1229429
> 120 1163356
> 130 1107924
> 140 1084344
> 150 1014064
> 160  961730
> 170  980743
> 180  968419
>
> The results are quite discouraging because previously we had about 1.5M
> TPS in the peak while we have only about 1.2M now.  I found that it's not
> related to the changes you made in the patch, but it's related to 5364b357
> "Increase maximum number of clog buffers".  I'm making same benchmark with
> 5364b357 reverted.
>

There are results with 5364b357 reverted.

clientsTPS
1  12980
2  27105
4  51969
8 105507
   10 132811
   20 256888
   30 368573
   40 467605
   50 544231
   60 590898
   70 799094
   80 967569
   901211662
  1001352427
  1101432561
  1201480324
  1301486624
  1401492092
  1501461681
  1601426733
  1701409081
  1801366199

It's much closer to what we had before.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-09 Thread Alexander Korotkov

On Fri, Apr 8, 2016 at 10:19 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Apr 8, 2016 at 7:39 PM, Andres Freund  wrote:
>
>> As you can see in
>>
>
>> http://archives.postgresql.org/message-id/CA%2BTgmoaeRbN%3DZ4oWENLvgGLeHEvGZ_S_Z3KGrdScyKiSvNt3oA%40mail.gmail.com
>> I'm planning to apply this sometime this weekend, after running some
>> tests and going over the patch again.
>>
>> Any chance you could have a look over this?
>>
>
> I took a look at this.  Changes you made look good for me.
> I also run test on 4x18 Intel server.
>

On the top of current master results are following:

clientsTPS
1 12562
2 25604
4 52661
8103209
10   128599
20   256872
30   365718
40   432749
50   513528
60   684943
70   696050
80   923350
90  1119776
100 1208027
110 1229429
120 1163356
130 1107924
140 1084344
150 1014064
160  961730
170  980743
180  968419

The results are quite discouraging because previously we had about 1.5M TPS
in the peak while we have only about 1.2M now.  I found that it's not
related to the changes you made in the patch, but it's related to 5364b357
"Increase maximum number of clog buffers".  I'm making same benchmark with
5364b357 reverted.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-08 Thread Alexander Korotkov

On Fri, Apr 8, 2016 at 7:39 PM, Andres Freund  wrote:

> On 2016-04-07 16:50:44 +0300, Alexander Korotkov wrote:
> > On Thu, Apr 7, 2016 at 4:41 PM, Andres Freund 
> wrote:
> >
> > > On 2016-03-31 20:21:02 +0300, Alexander Korotkov wrote:
> > > > ! BEGIN_BUFSTATE_CAS_LOOP(bufHdr);
> > > >
> > > > ! Assert(BUF_STATE_GET_REFCOUNT(state) > 0);
> > > > ! wasDirty = (state & BM_DIRTY) ? true : false;
> > > > ! state |= BM_DIRTY | BM_JUST_DIRTIED;
> > > > ! if (state == oldstate)
> > > > ! break;
> > >
> > > I'm doubtful that this early exit is entirely safe. None of the
> > > preceding operations imply a memory barrier. The buffer could
> previously
> > > have been marked dirty, but cleaned since. It's pretty critical that we
> > > re-set the dirty bit (there's no danger of loosing it with a barrier,
> > > because we hold an exclusive content lock).
> > >
> >
> > Oh, I get it.
> >
> >
> > > Practically the risk seems fairly low, because acquiring the exclusive
> > > content lock will have implied a barrier. But it seems unlikely to have
> > > a measurable performance effect to me, so I'd rather not add the early
> > > exit.
> > >
> >
> > Ok, let's just remove it.
>
> Here's my updated version of the patch. I've updated this on an
> intercontinental flight, after a otherwise hectic week (moving from SF
> to Berlin); so I'm planning to look over this once more before pushing (.
>

Ok.

I've decided that the cas-loop macros are too obfuscating for my
> taste. To avoid duplicating the wait part I've introduced
> WaitBufHdrUnlocked().
>

That's OK for me.  Cas-loop macros looks cute, but too magic.


> As you can see in
>
> http://archives.postgresql.org/message-id/CA%2BTgmoaeRbN%3DZ4oWENLvgGLeHEvGZ_S_Z3KGrdScyKiSvNt3oA%40mail.gmail.com
> I'm planning to apply this sometime this weekend, after running some
> tests and going over the patch again.
>
> Any chance you could have a look over this?
>

I took a look at this.  Changes you made look good for me.
I also run test on 4x18 Intel server.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-08 Thread Andres Freund

On 2016-04-07 16:50:44 +0300, Alexander Korotkov wrote:
> On Thu, Apr 7, 2016 at 4:41 PM, Andres Freund  wrote:
>
> > On 2016-03-31 20:21:02 +0300, Alexander Korotkov wrote:
> > > ! BEGIN_BUFSTATE_CAS_LOOP(bufHdr);
> > >
> > > ! Assert(BUF_STATE_GET_REFCOUNT(state) > 0);
> > > ! wasDirty = (state & BM_DIRTY) ? true : false;
> > > ! state |= BM_DIRTY | BM_JUST_DIRTIED;
> > > ! if (state == oldstate)
> > > ! break;
> >
> > I'm doubtful that this early exit is entirely safe. None of the
> > preceding operations imply a memory barrier. The buffer could previously
> > have been marked dirty, but cleaned since. It's pretty critical that we
> > re-set the dirty bit (there's no danger of loosing it with a barrier,
> > because we hold an exclusive content lock).
> >
>
> Oh, I get it.
>
>
> > Practically the risk seems fairly low, because acquiring the exclusive
> > content lock will have implied a barrier. But it seems unlikely to have
> > a measurable performance effect to me, so I'd rather not add the early
> > exit.
> >
>
> Ok, let's just remove it.

Here's my updated version of the patch. I've updated this on an
intercontinental flight, after a otherwise hectic week (moving from SF
to Berlin); so I'm planning to look over this once more before pushing (.

I've decided that the cas-loop macros are too obfuscating for my
taste. To avoid duplicating the wait part I've introduced
WaitBufHdrUnlocked().

As you can see in
http://archives.postgresql.org/message-id/CA%2BTgmoaeRbN%3DZ4oWENLvgGLeHEvGZ_S_Z3KGrdScyKiSvNt3oA%40mail.gmail.com
I'm planning to apply this sometime this weekend, after running some
tests and going over the patch again.

Any chance you could have a look over this?

Regards,

Andres
>From e89e99acc5b0854f918f0c7f685efcd50e6ffcae Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 7 Apr 2016 10:29:41 +0200
Subject: [PATCH] Allow Pin/UnpinBuffer to operate in a lockfree manner.

Pinning/Unpinning a buffer is a very frequent operation; especially in
read-mostly cache resident workloads. Benchmarking shows that in various
scenarios the spinlock protecting a buffer header's state becomes a
significant bottleneck. The problem can be reproduced with pgbench -S on
larger machines, but can be considerably worse for queries which touch
the same buffers over and over at a high frequency (e.g. nested loops
over a small inner table).

To allow atomic operations to be used, cram BufferDesc's flags,
usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable;
that allows to manipulate them together using 32bit compare-and-swap
operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
be lifted by using a 64bit field, but it's not a realistic configuration
atm).

As not all operations can easily implemented in a lockfree manner,
implement the previous buf_hdr_lock via a flag bit in the atomic
variable. That way we can continue to lock the header in places where
it's needed, but can get away without acquiring it in the more frequent
hot-paths.  There's some additional operations which can be done without
the lock, but aren't in this patch; but the most important places are
covered.

As bufmgr.c now essentially re-implements spinlocks, abstract the delay
logic from s_lock.c into something more generic. It now has already two
users, and it seems likely that more are coming up; there's pending
patches for lwlock.c at least.

This patch is based on a proof-of-concept written by me, which Alexander
Korotkov made into a fully working patch; the committed version is again
revised by me.  Benchmarking and testing has, amongst others, been
provided by Dilip Kumar, Alexander Korotkov, Robert Haas.

Author: Alexander Korotkov and Andres Freund
Discussion: 2400449.GjM57CE0Yg@dinodell
---
 contrib/pg_buffercache/pg_buffercache_pages.c |  15 +-
 src/backend/storage/buffer/buf_init.c |  18 +-
 src/backend/storage/buffer/bufmgr.c   | 491 +-
 src/backend/storage/buffer/freelist.c |  44 ++-
 src/backend/storage/buffer/localbuf.c |  64 ++--
 src/backend/storage/lmgr/s_lock.c | 218 ++--
 src/include/postmaster/postmaster.h   |  15 +-
 src/include/storage/buf_internals.h   | 101 --
 src/include/storage/s_lock.h  |  18 +
 src/tools/pgindent/typedefs.list  |   1 +
 10 files changed, 621 insertions(+), 364 deletions(-)

diff --git a/contrib/pg_buffercache/pg_buffercache_pages.c b/contrib/pg_buffercache/pg_buffercache_pages.c
index 6622d22..17b4b6f 100644
--- a/contrib/pg_buffercache/pg_buffercache_pages.c
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -148,11 +148,12 @@ pg_buffercache_pages(PG_FUNCTION_ARGS)
 		 */
 		for (i = 0; i < NBuffers; i++)
 		{
-			volatile BufferDesc *bufHdr;
+			BufferDesc *bufHdr;
+			uint32		buf_state;

 			bufHdr = GetBufferDescriptor(i);
 			/* Lock each buffer header before

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-07 Thread Alexander Korotkov

On Thu, Apr 7, 2016 at 4:41 PM, Andres Freund  wrote:

> On 2016-03-31 20:21:02 +0300, Alexander Korotkov wrote:
> > ! BEGIN_BUFSTATE_CAS_LOOP(bufHdr);
> >
> > ! Assert(BUF_STATE_GET_REFCOUNT(state) > 0);
> > ! wasDirty = (state & BM_DIRTY) ? true : false;
> > ! state |= BM_DIRTY | BM_JUST_DIRTIED;
> > ! if (state == oldstate)
> > ! break;
>
> I'm doubtful that this early exit is entirely safe. None of the
> preceding operations imply a memory barrier. The buffer could previously
> have been marked dirty, but cleaned since. It's pretty critical that we
> re-set the dirty bit (there's no danger of loosing it with a barrier,
> because we hold an exclusive content lock).
>

Oh, I get it.


> Practically the risk seems fairly low, because acquiring the exclusive
> content lock will have implied a barrier. But it seems unlikely to have
> a measurable performance effect to me, so I'd rather not add the early
> exit.
>

Ok, let's just remove it.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-07 Thread Andres Freund

On 2016-03-31 20:21:02 +0300, Alexander Korotkov wrote:
> ! BEGIN_BUFSTATE_CAS_LOOP(bufHdr);
>   
> ! Assert(BUF_STATE_GET_REFCOUNT(state) > 0);
> ! wasDirty = (state & BM_DIRTY) ? true : false;
> ! state |= BM_DIRTY | BM_JUST_DIRTIED;
> ! if (state == oldstate)
> ! break;

I'm doubtful that this early exit is entirely safe. None of the
preceding operations imply a memory barrier. The buffer could previously
have been marked dirty, but cleaned since. It's pretty critical that we
re-set the dirty bit (there's no danger of loosing it with a barrier,
because we hold an exclusive content lock).

Practically the risk seems fairly low, because acquiring the exclusive
content lock will have implied a barrier. But it seems unlikely to have
a measurable performance effect to me, so I'd rather not add the early
exit.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-07 Thread Andres Freund

Hi,

On 2016-04-06 21:58:50 -0400, Robert Haas wrote:
> I spent a lot of time testing things on power2 today

Thanks for that!

> It's fairly mysterious to me why there is so much jitter in the
> results on this machine. By doing prewarming in a consistent fashion,
> we make sure that every disk run puts the same disk blocks in the same
> buffers.  Andres guessed that maybe the degree of associativity of the
> CPU caches comes into play here: depending on where the hot data is we
> either get the important cache lines in places where they can all be
> cached at once, or we get them in places where they can't all be
> cached at once.  But if that's really what is going on here, it's
> shocking that it makes this much difference.

I'm not sure at all that that is indeed the reason. I think it'd be
quite worthwhile to
a) collect perf stat -ddd of a slow and a fast run, compare
b) collect a perf profile of a slow and fast run, run perf diff
c) add perf probes for lwlocks and spinlocks where we ended up sleeping,
   check whether the two runs have different distribution of which
   locks ended up sleeping.

I'm also wondering if the idea from
http://archives.postgresql.org/message-id/20160407082711.q7iq3ykffqxcszkv%40alap3.anarazel.de
both with/without preloading, changes variability.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-06 Thread Robert Haas

On Wed, Apr 6, 2016 at 10:04 AM, Dilip Kumar  wrote:
> On Wed, Apr 6, 2016 at 3:22 PM, Andres Freund  wrote:
>> Which scale did you initialize with? I'm trying to reproduce the
>> workload on hydra as precisely as possible...
>
> I tested with scale factor 300, shared buffer 8GB.
>
> My test script is attached with the mail (perf_pgbench_ro.sh).
>
> I have done some more test on power (same machine)

I spent a lot of time testing things on power2 today and also
discussed with Andres via IM.  Andres came up with a few ideas to
reduce the variability, which I tried.  One of those was to run the
server under numactl --interleave=all (that is, numactl
--interleave=all pg_ctl start etc.) and another was to set
kernel.numa_balancing = 0 (it was set to 1).  Neither of those things
seemed to prevent the problem of run-to-run variability.  Andres also
suggested running pgbench with "-P 1", which revealed that it was
generally possible to tell what the overall performance of a run was
going to be like within 10-20 seconds.  Runs that started out fast
stayed fast, and those that started out slower remained slower.
Therefore, long runs didn't seem to be necessary for testing, so I
switched to using 2-minute test runs launched via pgbench -T 120 -c 64
-j 64 -n -M prepared -S -P1.

After quite a bit of experimentation, Andres hit on an idea that did
succeed in drastically reducing the run-to-run variation: prewarming
all of the relations in a deterministic order before starting the
test.  I used this query:

psql -c "select sum(x.x) from (select pg_prewarm(oid) as x from
pg_class where relkind in ('i', 'r') order by oid) x;"

With that change to my test script, the results became much more
stable.  I tested four different builds of the server: commit 3fed4174
(that is, the one just before where you have been reporting the
variability to have begun), commit 6150a1b0 (the one you reported that
the variability actually did begin), master as of this morning
(actually commit cac0e366), and master + pinunpin-cas-9.patch +
0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect-.patch +
backoff.patch (herein called "andres").  The first and last of these
have 64-byte BufferDescs, and the others have 80-byte BufferDescs.
Without prewarming, I see high and low results on *all* of these
builds, even 3fed4174.  I did nine test runs with each configuration
with and without prewarming, and here are the results.  With each
result I have reported the raw numbers, plus the median and the range
(highest result - lowest result).

-- without prewarming --
3fed4174
tps by run: 249165.928992 300958.039880 501281.083247
488073.289603 251033.038275 272086.063197
522287.023374 528855.922328 266057.502255
median: 300958.039880, range: 206687.132100

6150a1b0
tps by run: 455853.061092 438628.888369 353643.017507
419850.232715 424353.870515 440219.581180
431805.001465 237528.175877 431789.666417
median: 431789.666417, range: 218324.885215

master
tps by run: 427808.559919 366625.640433 376165.188508
441349.141152 363381.095198 352925.252345
348975.712841 446626.284308 444633.921009
median: 376165.188508, range: 97650.571467

andres
tps by run: 391123.866928 423525.415037 496103.017599
346707.246825 531194.321999 466100.337795
517708.470146 355392.837942 510817.921728
median: 466100.337795, range: 184487.075174

-- with prewarming --
3fed4174
tps by run: 413239.471751 428996.202541 488206.103941
493788.533491 497953.728875 498074.092686
501719.406720 508880.505416 509868.266778
median: 497953.728875, range: 96628.795027

6150a1b0
tps by run: 421589.299889 438765.851782 440601.270742
440649.818900 443033.460295 447317.269583
452831.480337 456387.316178 460140.159903
median: 443033.460295, range: 38550.860014

master
tps by run: 427211.917303 427796.174209 435601.396857
436581.431219 442329.269335 446438.814270
449970.003595 450085.123059 456231.229966
median: 442329.269335, range: 29019.312663

andres
tps by run: 425513.771259 429219.907952 433838.084721
451354.257738 494735.045808 495301.319716
517166.054466 531655.887669 546984.476602
median: 494735.045808, range: 121470.705343

My belief is that the first set of numbers has so much jigger that you
can't really draw any meaningful conclusions.  For two of the four
branches, the range is more than 50% of the median, which is enormous.
You could probably draw some conclusions if you took enough
measurements, but it's pretty hard. Notice that that set of numbers
makes 6150a1b0 look like a performance improvement, whereas the second
set makes it pretty clear that 6150a1b0 was a regression.

Also, the patch set is clearly winning here.  It picks up 90k TPS
median on the first set of numbers and 50k TPS median on the second
set.

It's fairly mysterious to me why there is so much jitter in the
results on this machine. By doing prewarming in a consistent fashion,
we make sure that every disk run puts the same disk blocks in the same
buffers.  Andres guessed

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-06 Thread Dilip Kumar

On Wed, Apr 6, 2016 at 3:22 PM, Andres Freund  wrote:

> Which scale did you initialize with? I'm trying to reproduce the
> workload on hydra as precisely as possible...
>

I tested with scale factor 300, shared buffer 8GB.

My test script is attached with the mail (perf_pgbench_ro.sh).

I have done some more test on power (same machine)

Test1:

head + pinunpin-cas-9.patch + BufferDesc content lock to pointer (patch
attached: buffer_content_lock_ptr**.patch)

Ashutosh helped me in generating this patch (this is just temp patch to see
the pin/unpin behaviour if content lock is pointer)

64 client
run1 497684
run2 543366
run3 476988
  128 Client
run1740301
run2 482676
run3 474530
run4 480971
run5 757779

*Summary:*
1. With 64 client I think whether we apply only
pinunpin-cas-9.patch or we apply pinunpin-cas-9.patch +
buffer_content_lock_ptr_rebased_head_temp.patch
max TPS is ~550,000 and some fluctuations.

2. With 128, we saw in earlier post that with pinunpin we were getting max
TPS was 650,000 (even after converting BufferDesc to 64 bytes it was
650,000.  Now after converting content lock to pointer on top of pinunpin I
get max as ~750,000.

   - One more point to be noted, earlier it was varying from 250,000 to
650,000 but after converting content lock to pointer its varying from
450,000 to 75.

*Test2:*

Head + buffer_content_lock_ptr_rebased_head_temp.patch

1. With this test reading is same as head, and can see same variance in
performance.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

perf_pgbench_ro.sh
Description: Bourne shell script

buffer_content_lock_ptr_rebased_head_temp.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-06 Thread Andres Freund

On 2016-04-06 11:52:28 +0200, Andres Freund wrote:
> Hi,
> 
> On 2016-04-03 16:47:49 +0530, Dilip Kumar wrote:
> 
> > Summary Of the Run:
> > -
> > 1. Throughout one run if we observe TPS every 30 seconds its stable in one
> > run.
> > 2. With Head 64 client run vary between ~250,000 to ~45. you can see
> > below results.
> > 
> > run1: 434860(5min)
> > run2: 275815(5min)
> > run3: 437872(5min)
> > run4: 237033   (5min)
> > run5: 347611(10min)
> > run6: 435933   (20min)
> 
> > > [1] -
> > > http://www.postgresql.org/message-id/20160401083518.ge9...@awork2.anarazel.de
> > >
> > 
> > 
> > Non Default Parameter:
> > 
> > shared_buffer 8GB
> > Max Connections 150
> > 
> > ./pgbench -c $threads -j $threads -T 1200 -M prepared -S -P 30 postgres
> 
> Which scale did you initialize with? I'm trying to reproduce the
> workload on hydra as precisely as possible...

On hydra, even after a fair amount of tinkering, I cannot reprodue such
variability.

Pin/Unpin only has a minor effect itself, reducing the size of lwlock
on-top improves performance (378829 to 409914 TPS) in
intentionally short 15s tests (warm cache) with 128 clients; as well as
to a lower degree in 120s tests (415921 to 420487).

It appears, over 5 runs, that the alignment fix shortens the rampup
phase from about 100s to about 8; interesting.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-06 Thread Andres Freund

Hi,

On 2016-04-03 16:47:49 +0530, Dilip Kumar wrote:

> Summary Of the Run:
> -
> 1. Throughout one run if we observe TPS every 30 seconds its stable in one
> run.
> 2. With Head 64 client run vary between ~250,000 to ~45. you can see
> below results.
> 
> run1: 434860(5min)
> run2: 275815(5min)
> run3: 437872(5min)
> run4: 237033   (5min)
> run5: 347611(10min)
> run6: 435933   (20min)

> > [1] -
> > http://www.postgresql.org/message-id/20160401083518.ge9...@awork2.anarazel.de
> >
> 
> 
> Non Default Parameter:
> 
> shared_buffer 8GB
> Max Connections 150
> 
> ./pgbench -c $threads -j $threads -T 1200 -M prepared -S -P 30 postgres

Which scale did you initialize with? I'm trying to reproduce the
workload on hydra as precisely as possible...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-06 Thread Andres Freund

On 2016-04-05 12:56:46 +0530, Dilip Kumar wrote:
> On Mon, Apr 4, 2016 at 2:28 PM, Andres Freund  wrote:
> 
> > Hm, interesting. I suspect that's because of the missing backoff in my
> > experimental patch. If you apply the attached patch ontop of that
> > (requires infrastructure from pinunpin), how does performance develop?
> >
> 
> I have applied this patch also, but still results are same, I mean around
> 550,000 with 64 threads and 650,000 with 128 client with lot of
> fluctuations..
> 
> *128 client
> **(head+**0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect
> +pinunpin-cas-9+backoff)*
> 
> run1 645769
> run2 643161
> run3 *285546*
> run4 *289421*
> run5 630772
> run6 *284363*

I wonder what 
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=09adc9a8c09c9640de05c7023b27fb83c761e91c
does to all these numbers. It seems entirely possible that "this" is
mainly changing the alignment of some common datastructures...

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-06 Thread Alexander Korotkov

On Tue, Apr 5, 2016 at 5:45 PM, Andres Freund  wrote:

> On 2016-04-05 17:36:49 +0300, Alexander Korotkov wrote:
> > Could the reason be that we're increasing concurrency for LWLock state
> > atomic variable by placing queue spinlock there?
>
> Don't think so, it's the same cache-line either way.
>

Yes, it's very unlikely.

> But I wonder why this could happen during "pgbench -S", because it doesn't
> > seem to have high traffic of exclusive LWLocks.
>
> Yea, that confuses me too. I suspect there's some mis-aligned
> datastructures somewhere. It's hard to investigate such things without
> access to hardware.
>

But it's quite easy to check if it is alignment issue.  We can try your
patch but without removing mutex from LWLock struct.  If it's alignment
issue, then TPS should become stable again.

(FWIW, I'm working on getting pinunpin committed)
>

Sounds good.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Robert Haas

On Tue, Apr 5, 2016 at 1:04 PM, Andres Freund  wrote:
> On 2016-04-05 12:14:35 -0400, Robert Haas wrote:
>> On Tue, Apr 5, 2016 at 11:30 AM, Andres Freund  wrote:
>> > On 2016-04-05 20:56:31 +0530, Amit Kapila wrote:
>> >> This fluctuation started appearing after commit 6150a1b0 which we have
>> >> discussed in another thread [1] and a colleague of mine is working on to
>> >> write a patch to try to revert it on current HEAD and then see the 
>> >> results.
>> >
>> > I don't see what that buys us. That commit is a good win on x86...
>>
>> Maybe.  But I wouldn't be surprised to find out that that is an
>> overgeneralization.  Based on some results Mithun Cy showed me this
>> morning, I think that some of this enormous run-to-run fluctuation
>> that we're seeing is due to NUMA effects.  So some runs we get two
>> things that are frequently accessed together on the same NUMA node and
>> other times they get placed on different NUMA nodes and then
>> everything sucks.  I don't think we fully understand what's going on
>> here yet - and I think we're committing changes in this area awfully
>> quickly - but I see no reason to believe that x86 is immune to such
>> effects.  They may just happen in different scenarios than what we see
>> on POWER.
>
> I'm not really following - we were talking about 6150a1b0 ("Move buffer
> I/O and content LWLocks out of the main tranche.") made four months
> ago. Afaics the atomic buffer pin patch is a pretty clear win on both
> ppc and x86?

The point is that the testing Amit's team is doing can't tell the
answer to that question one way or another.  6150a1b0 completely
destabilized performance on our test systems to the point where
testing subsequent patches is extremely difficult.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Andres Freund

On 2016-04-05 12:14:35 -0400, Robert Haas wrote:
> On Tue, Apr 5, 2016 at 11:30 AM, Andres Freund  wrote:
> > On 2016-04-05 20:56:31 +0530, Amit Kapila wrote:
> >> This fluctuation started appearing after commit 6150a1b0 which we have
> >> discussed in another thread [1] and a colleague of mine is working on to
> >> write a patch to try to revert it on current HEAD and then see the results.
> >
> > I don't see what that buys us. That commit is a good win on x86...
> 
> Maybe.  But I wouldn't be surprised to find out that that is an
> overgeneralization.  Based on some results Mithun Cy showed me this
> morning, I think that some of this enormous run-to-run fluctuation
> that we're seeing is due to NUMA effects.  So some runs we get two
> things that are frequently accessed together on the same NUMA node and
> other times they get placed on different NUMA nodes and then
> everything sucks.  I don't think we fully understand what's going on
> here yet - and I think we're committing changes in this area awfully
> quickly - but I see no reason to believe that x86 is immune to such
> effects.  They may just happen in different scenarios than what we see
> on POWER.

I'm not really following - we were talking about 6150a1b0 ("Move buffer
I/O and content LWLocks out of the main tranche.") made four months
ago. Afaics the atomic buffer pin patch is a pretty clear win on both
ppc and x86?

I agree that there's numa effects we don't understand.  I think
re-considering Kevin's patch that did explicit numa hinting on linux
would be rather worthwhile. It makes a lot of sense to force shmem to be
force-interleaved, but make backend local memory, uh, local.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Robert Haas

On Tue, Apr 5, 2016 at 11:30 AM, Andres Freund  wrote:
> On 2016-04-05 20:56:31 +0530, Amit Kapila wrote:
>> This fluctuation started appearing after commit 6150a1b0 which we have
>> discussed in another thread [1] and a colleague of mine is working on to
>> write a patch to try to revert it on current HEAD and then see the results.
>
> I don't see what that buys us. That commit is a good win on x86...

Maybe.  But I wouldn't be surprised to find out that that is an
overgeneralization.  Based on some results Mithun Cy showed me this
morning, I think that some of this enormous run-to-run fluctuation
that we're seeing is due to NUMA effects.  So some runs we get two
things that are frequently accessed together on the same NUMA node and
other times they get placed on different NUMA nodes and then
everything sucks.  I don't think we fully understand what's going on
here yet - and I think we're committing changes in this area awfully
quickly - but I see no reason to believe that x86 is immune to such
effects.  They may just happen in different scenarios than what we see
on POWER.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Amit Kapila

On Tue, Apr 5, 2016 at 9:00 PM, Andres Freund  wrote:
>
> On 2016-04-05 20:56:31 +0530, Amit Kapila wrote:
> > This fluctuation started appearing after commit 6150a1b0 which we have
> > discussed in another thread [1] and a colleague of mine is working on to
> > write a patch to try to revert it on current HEAD and then see the
results.
>
> I don't see what that buys us. That commit is a good win on x86...
>

At least, that way we can see the results of pin-unpin without
fluctuation.  I agree that we need to narrow down why even after reducing
BufferDesc size, we are seeing performance fluctuation.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Andres Freund

On 2016-04-05 20:56:31 +0530, Amit Kapila wrote:
> This fluctuation started appearing after commit 6150a1b0 which we have
> discussed in another thread [1] and a colleague of mine is working on to
> write a patch to try to revert it on current HEAD and then see the results.

I don't see what that buys us. That commit is a good win on x86...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Amit Kapila

On Tue, Apr 5, 2016 at 8:15 PM, Andres Freund  wrote:
>
> On 2016-04-05 17:36:49 +0300, Alexander Korotkov wrote:
> > Could the reason be that we're increasing concurrency for LWLock state
> > atomic variable by placing queue spinlock there?
>
> Don't think so, it's the same cache-line either way.
>
> > But I wonder why this could happen during "pgbench -S", because it
doesn't
> > seem to have high traffic of exclusive LWLocks.
>
> Yea, that confuses me too. I suspect there's some mis-aligned
> datastructures somewhere. It's hard to investigate such things without
> access to hardware.
>

This fluctuation started appearing after commit 6150a1b0 which we have
discussed in another thread [1] and a colleague of mine is working on to
write a patch to try to revert it on current HEAD and then see the results.

> (FWIW, I'm working on getting pinunpin committed)
>

Good to know, but I am slightly worried that it will make the problem
harder to detect as it will reduce the reproducibility.  I understand that
we are running short of time and committing this patch is important, so we
should proceed with it as this is not a problem of this patch.  After this
patch gets committed, we always need to revert it locally to narrow down
the problem due to commit 6150a1b0.

[1] -
http://www.postgresql.org/message-id/caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Andres Freund

On 2016-04-05 17:36:49 +0300, Alexander Korotkov wrote:
> Could the reason be that we're increasing concurrency for LWLock state
> atomic variable by placing queue spinlock there?

Don't think so, it's the same cache-line either way.

> But I wonder why this could happen during "pgbench -S", because it doesn't
> seem to have high traffic of exclusive LWLocks.

Yea, that confuses me too. I suspect there's some mis-aligned
datastructures somewhere. It's hard to investigate such things without
access to hardware.

(FWIW, I'm working on getting pinunpin committed)

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Alexander Korotkov

On Tue, Apr 5, 2016 at 10:26 AM, Dilip Kumar  wrote:

>
> On Mon, Apr 4, 2016 at 2:28 PM, Andres Freund  wrote:
>
>> Hm, interesting. I suspect that's because of the missing backoff in my
>> experimental patch. If you apply the attached patch ontop of that
>> (requires infrastructure from pinunpin), how does performance develop?
>>
>
> I have applied this patch also, but still results are same, I mean around
> 550,000 with 64 threads and 650,000 with 128 client with lot of
> fluctuations..
>
> *128 client 
> **(head+**0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect
> +pinunpin-cas-9+backoff)*
>
> run1 645769
> run2 643161
> run3 *285546*
> run4 *289421*
> run5 630772
> run6 *284363*
>

Could the reason be that we're increasing concurrency for LWLock state
atomic variable by placing queue spinlock there?
But I wonder why this could happen during "pgbench -S", because it doesn't
seem to have high traffic of exclusive LWLocks.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Dilip Kumar

On Mon, Apr 4, 2016 at 2:28 PM, Andres Freund  wrote:

> Hm, interesting. I suspect that's because of the missing backoff in my
> experimental patch. If you apply the attached patch ontop of that
> (requires infrastructure from pinunpin), how does performance develop?
>

I have applied this patch also, but still results are same, I mean around
550,000 with 64 threads and 650,000 with 128 client with lot of
fluctuations..

*128 client
**(head+**0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect
+pinunpin-cas-9+backoff)*

run1 645769
run2 643161
run3 *285546*
run4 *289421*
run5 630772
run6 *284363*

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-04 Thread Andres Freund

Hi,

On 2016-04-03 16:47:49 +0530, Dilip Kumar wrote:
> 6. With Head+ pinunpin-cas-8 +
> 0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect performance is
> almost same as with
> Head+pinunpin-cas-8, only sometime performance at 128 client is low
> (~250,000 instead of 650,000)

Hm, interesting. I suspect that's because of the missing backoff in my
experimental patch. If you apply the attached patch ontop of that
(requires infrastructure from pinunpin), how does performance develop?

Regards,

Andres
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index ec6baf6..4216be5 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -858,11 +858,15 @@ LWLockWaitListLock(LWLock *lock)
 	{
 		if (old_state & LW_FLAG_LOCKED)
 		{
-			/* FIXME: add exponential backoff */
-			pg_spin_delay();
-			old_state = pg_atomic_read_u32(>state);
+			SpinDelayStatus delayStatus = init_spin_delay((void*)>state);
+			while (old_state & LW_FLAG_LOCKED)
+			{
+perform_spin_delay();
+old_state = pg_atomic_read_u32(>state);
+			}
+			finish_spin_delay();
 #ifdef LWLOCK_STATS
-			delays++;
+			delays += delayStatus.delays;
 #endif
 		}
 		else

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-03 Thread Dilip Kumar

On Sun, Apr 3, 2016 at 2:28 PM, Amit Kapila  wrote:

>
> What is the conclusion of this test?  As far as I see, with the patch
> (0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect), the performance
> degradation is not fixed, but with pin-unpin patch, the performance seems
> to be better in most of the runs, however still you see less performance in
> some of the runs.  Is that right?
>

Summary Of the Run:
-
1. Throughout one run if we observe TPS every 30 seconds its stable in one
run.
2. With Head 64 client run vary between ~250,000 to ~45. you can see
below results.

run1: 434860(5min)
run2: 275815(5min)
run3: 437872(5min)
run4: 237033   (5min)
run5: 347611(10min)
run6: 435933   (20min)

3. With Head + 0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect
 with 64 client I always saw ~450,000 TPS

run1: 429520   (5min)
run2: 446249   (5min)
run3: 431066   (5min)
run4: 441280   (10min)
run5: 429844   (20 mins)

4. With Head+ 0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect
with 128 Client something performance is as low as ~150,000 which is never
observed with Head (with head it is constantly ~ 350,000 TPS).

run1: 372958  (5min)
run2: 167189  (5min)
run3: 381592  (5min)
run4: 441280  (10min)
run5: 362742  (20 min)

5. With Head+pinunpin-cas-8, with 64 client its ~ 550,000 TPS and with 128
client ~650,000 TPS.

6. With Head+ pinunpin-cas-8 +
0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect performance is
almost same as with
Head+pinunpin-cas-8, only sometime performance at 128 client is low
(~250,000 instead of 650,000)

Seems like Head+ pinunpin-cas-8 is giving best performance and without much
fluctuation.

> Can you answer some of the questions asked by Andres upthread[1]?
>
>
> [1] -
> http://www.postgresql.org/message-id/20160401083518.ge9...@awork2.anarazel.de
>

Non Default Parameter:

shared_buffer 8GB
Max Connections 150

./pgbench -c $threads -j $threads -T 1200 -M prepared -S -P 30 postgres

*BufferDesc Size:*
**
Head: 80 Bytes
Head+0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect : 72Bytes
Head+0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect+
Pinunpin-cas-8 : 64 Bytes

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-03 Thread Amit Kapila

On Sun, Apr 3, 2016 at 9:55 AM, Dilip Kumar  wrote:

>
> On Fri, Apr 1, 2016 at 2:09 PM, Andres Freund  wrote:
>
>> One interesting thing to do would be to use -P1 during the test and see
>> how much the performance varies over time.
>>
>
> I have run with -P option, I ran for 1200 second and set -P as 30 second,
> and what I observed is that when its low its low throughout the run and
> when its high, Its high for complete run.
>
>
What is the conclusion of this test?  As far as I see, with the patch
(0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect), the performance
degradation is not fixed, but with pin-unpin patch, the performance seems
to be better in most of the runs, however still you see less performance in
some of the runs.  Is that right?   Can you answer some of the questions
asked by Andres upthread[1]?


[1] -
http://www.postgresql.org/message-id/20160401083518.ge9...@awork2.anarazel.de


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-02 Thread Dilip Kumar

On Fri, Apr 1, 2016 at 2:09 PM, Andres Freund  wrote:

> One interesting thing to do would be to use -P1 during the test and see
> how much the performance varies over time.
>

I have run with -P option, I ran for 1200 second and set -P as 30 second,
and what I observed is that when its low its low throughout the run and
when its high, Its high for complete run.

Non Default Parameter:

shared_buffer 8GB
Max Connections 150

./pgbench -c $threads -j $threads -T 1200 -M prepared -S -P 30 postgres

Test results are attached in result.tar file.

File details:
--
   1. head_run1.txt   --> 20 mins run on head
reading 1
   2. head_run2.txt   --> 20 mins run on head
reading 2
   3. head_patch_run1.txt--> 20 mins run on head + patch
reading 1
   4. head_patch_run2.txt--> 20 mins run on head + patch
reading 2
   5. head_pinunpin.txt--> 20 mins run on head +
pinunpin-cas-8.patch
   6. head_pinunpin_patch.txt --> 20 mins run on head +
pinunpin-cas-8.patch + patch reading

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


result.tar
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-01 Thread Andres Freund

On 2016-04-01 10:35:18 +0200, Andres Freund wrote:
> On 2016-04-01 13:50:10 +0530, Dilip Kumar wrote:
> > I think it needs more number of runs.. After seeing this results I did not
> > run head+pinunpin,
> > 
> > Head 64 Client 128 Client
> > -
> > Run1 434860 356945
> > Run2 275815 *275815*
> > Run3 437872 366560
> > Patch 64 Client 128 Client
> > -
> > Run1 429520 372958
> > Run2 446249 *167189*
> > Run3 431066 381592
> > Patch+Pinunpin  64 Client 128 Client
> > --
> > Run1 338298 642535
> > Run2 406240 644187
> > Run3 595439 *285420 *
> 
> Could you describe the exact setup a bit more? Postgres settings,
> pgbench parameters, etc.
> 
> What's the size of BufferDesc after applying the patch?

One interesting thing to do would be to use -P1 during the test and see
how much the performance varies over time.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-01 Thread Andres Freund

On 2016-04-01 13:50:10 +0530, Dilip Kumar wrote:
> I think it needs more number of runs.. After seeing this results I did not
> run head+pinunpin,
> 
> Head 64 Client 128 Client
> -
> Run1 434860 356945
> Run2 275815 *275815*
> Run3 437872 366560
> Patch 64 Client 128 Client
> -
> Run1 429520 372958
> Run2 446249 *167189*
> Run3 431066 381592
> Patch+Pinunpin  64 Client 128 Client
> --
> Run1 338298 642535
> Run2 406240 644187
> Run3 595439 *285420 *

Could you describe the exact setup a bit more? Postgres settings,
pgbench parameters, etc.

What's the size of BufferDesc after applying the patch?

Thanks,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-01 Thread Dilip Kumar

On Thu, Mar 31, 2016 at 5:52 PM, Andres Freund  wrote:

> Here's a WIP patch to evaluate. Dilip/Ashutosh, could you perhaps run
> some benchmarks, to see whether this addresses the performance issues?
>
> I guess it'd both be interesting to compare master with master + patch,
> and this thread's latest patch with the patch additionally applied.
>

I tested it in Power and seen lot of fluctuations in the reading, From this
reading I could not reach to any conclusion.
 only we can say that with (patch + pinunpin), we can reach more than
60.

I think it needs more number of runs.. After seeing this results I did not
run head+pinunpin,

Head 64 Client 128 Client
-
Run1 434860 356945
Run2 275815 *275815*
Run3 437872 366560
Patch 64 Client 128 Client
-
Run1 429520 372958
Run2 446249 *167189*
Run3 431066 381592
Patch+Pinunpin  64 Client 128 Client
--
Run1 338298 642535
Run2 406240 644187
Run3 595439 *285420 *

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Alexander Korotkov

On Thu, Mar 31, 2016 at 8:21 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

>
> I think these changes worth running benchmark again.  I'm going to run it
> on 4x18 Intel.
>

The results are following.

clients master  v3  v5  v9
1   11671   12507   12679   12408
2   24650   26005   25010   25495
4   49631   48863   49811   50150
8   96790   96441   99946   98383
10  121275  119928  124100  124180
20  243066  243365  246432  248723
30  359616  342241  357310  378881
40  431375  415310  441619  425653
50  489991  489896  500590  502549
60  538057  636473  554069  685010
70  588659  714426  738535  719383
80  405008  923039  902632  909126
90  295443  1181247 1155918 1179163
100 258695  1323125 1325019 1351578
110 238842  1393767 1410274 1421531
120 226018  1432504 1474982 1497122
130 215102  1465459 1503241 1521619
140 206415  1470454 1505380 1541816
150 197850  1475479 1519908 1515017
160 190935  1420915 1484868 1523150
170 185835  1438965 1453128 1499913
180 182519  1416252 1453098 1472945

It appears that atomic OR for LockBufHdr() gives small but measurable
effect.  Great idea, Andres!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Alexander Korotkov

On Thu, Mar 31, 2016 at 7:14 PM, Andres Freund  wrote:

> > +/*
> > + * The following two macros are aimed to simplify buffer state
> modification
> > + * in CAS loop.  It's assumed that variable "uint32 state" is defined
> outside
> > + * of this loop.  It should be used as following:
> > + *
> > + * BEGIN_BUFSTATE_CAS_LOOP(bufHdr);
> > + * modifications of state variable;
> > + * END_BUFSTATE_CAS_LOOP(bufHdr);
> > + *
> > + * For local buffers, these macros shouldn't be used..  Since there is
> > + * no cuncurrency, local buffer state could be chaged directly by atomic
> > + * read/write operations.
> > + */
> > +#define BEGIN_BUFSTATE_CAS_LOOP(bufHdr) \
> > + do { \
> > + SpinDelayStatus delayStatus =
> init_spin_delay((Pointer)(bufHdr)); \
>
> > + uint32  oldstate; \
> > + oldstate = pg_atomic_read_u32(&(bufHdr)->state); \
> > + for (;;) { \
> > + while (oldstate & BM_LOCKED) \
> > + { \
> > + make_spin_delay(); \
> > + oldstate =
> pg_atomic_read_u32(&(bufHdr)->state); \
> > + } \
> > + state = oldstate
> > +
> > +#define END_BUFSTATE_CAS_LOOP(bufHdr) \
> > + if (pg_atomic_compare_exchange_u32(>state,
> , state)) \
> > + break; \
> > + } \
> > + finish_spin_delay(); \
> > + } while (0)
>
> Hm. Not sure if that's not too much magic. Will think about it.
>

I'm not sure too...


> >   /*
> > +  * Lock buffer header - set BM_LOCKED in buffer state.
> > +  */
> > + uint32
> > + LockBufHdr(BufferDesc *desc)
> > + {
> > + uint32  state;
> > +
> > + BEGIN_BUFSTATE_CAS_LOOP(desc);
> > + state |= BM_LOCKED;
> > + END_BUFSTATE_CAS_LOOP(desc);
> > +
> > + return state;
> > + }
> > +
>
> Hm. It seems a bit over the top to do the full round here. How about
>
> uint32 oldstate;
> while (true)
> {
> oldstate = pg_atomic_fetch_or(..., BM_LOCKED);
> if (!(oldstate & BM_LOCKED)
> break;
> perform_spin_delay();
> }
> return oldstate | BM_LOCKED;
>
> Especially if we implement atomic xor on x86, that'd likely be a good
> bit more efficient.
>

Nice idea.  Done.


> >  typedef struct BufferDesc
> >  {
> >   BufferTag   tag;/* ID of page contained in
> buffer */
> > - BufFlagsflags;  /* see bit definitions
> above */
> > - uint8   usage_count;/* usage counter for clock sweep
> code */
> > - slock_t buf_hdr_lock;   /* protects a subset of fields,
> see above */
> > - unsignedrefcount;   /* # of backends holding
> pins on buffer */
> > - int wait_backend_pid;   /* backend
> PID of pin-count waiter */
> >
> > + /* state of the tag, containing flags, refcount and usagecount */
> > + pg_atomic_uint32 state;
> > +
> > + int wait_backend_pid;   /* backend
> PID of pin-count waiter */
> >   int buf_id; /* buffer's index
> number (from 0) */
> >   int freeNext;   /* link in
> freelist chain */
>
> Hm. Won't matter on most platforms, but it seems like a good idea to
> move to an 8 byte aligned boundary. Move buf_id up?
>

I think so.  Done.


> > +/*
> > + * Support for spin delay which could be useful in other places where
> > + * spinlock-like procedures take place.
> > + */
> > +typedef struct
> > +{
> > + int spins;
> > + int delays;
> > + int cur_delay;
> > + Pointer ptr;
> > + const char *file;
> > + int line;
> > +} SpinDelayStatus;
> > +
> > +#define init_spin_delay(ptr) {0, 0, 0, (ptr), __FILE__, __LINE__}
> > +
> > +void make_spin_delay(SpinDelayStatus *status);
> > +void finish_spin_delay(SpinDelayStatus *status);
> > +
> >  #endif/* S_LOCK_H */
>
> s/make_spin_delay/perform_spin_delay/? The former sounds like it's
> allocating something or such.
>

Done.

I think these changes worth running benchmark again.  I'm going to run it
on 4x18 Intel.


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Andres Freund

Hi,

> +/*
> + * The following two macros are aimed to simplify buffer state modification
> + * in CAS loop.  It's assumed that variable "uint32 state" is defined outside
> + * of this loop.  It should be used as following:
> + *
> + * BEGIN_BUFSTATE_CAS_LOOP(bufHdr);
> + * modifications of state variable;
> + * END_BUFSTATE_CAS_LOOP(bufHdr);
> + *
> + * For local buffers, these macros shouldn't be used..  Since there is
> + * no cuncurrency, local buffer state could be chaged directly by atomic
> + * read/write operations.
> + */
> +#define BEGIN_BUFSTATE_CAS_LOOP(bufHdr) \
> + do { \
> + SpinDelayStatus delayStatus = 
> init_spin_delay((Pointer)(bufHdr)); \

> + uint32  oldstate; \
> + oldstate = pg_atomic_read_u32(&(bufHdr)->state); \
> + for (;;) { \
> + while (oldstate & BM_LOCKED) \
> + { \
> + make_spin_delay(); \
> + oldstate = 
> pg_atomic_read_u32(&(bufHdr)->state); \
> + } \
> + state = oldstate
> +
> +#define END_BUFSTATE_CAS_LOOP(bufHdr) \
> + if (pg_atomic_compare_exchange_u32(>state, 
> , state)) \
> + break; \
> + } \
> + finish_spin_delay(); \
> + } while (0)

Hm. Not sure if that's not too much magic. Will think about it.

>   /*
> +  * Lock buffer header - set BM_LOCKED in buffer state.
> +  */
> + uint32
> + LockBufHdr(BufferDesc *desc)
> + {
> + uint32  state;
> + 
> + BEGIN_BUFSTATE_CAS_LOOP(desc);
> + state |= BM_LOCKED;
> + END_BUFSTATE_CAS_LOOP(desc);
> + 
> + return state;
> + }
> + 

Hm. It seems a bit over the top to do the full round here. How about

uint32 oldstate;
while (true)
{
oldstate = pg_atomic_fetch_or(..., BM_LOCKED);
if (!(oldstate & BM_LOCKED)
break;
perform_spin_delay();
}
return oldstate | BM_LOCKED;

Especially if we implement atomic xor on x86, that'd likely be a good
bit more efficient.


>  typedef struct BufferDesc
>  {
>   BufferTag   tag;/* ID of page contained in 
> buffer */
> - BufFlagsflags;  /* see bit definitions above */
> - uint8   usage_count;/* usage counter for clock sweep code */
> - slock_t buf_hdr_lock;   /* protects a subset of fields, see 
> above */
> - unsignedrefcount;   /* # of backends holding pins 
> on buffer */
> - int wait_backend_pid;   /* backend PID 
> of pin-count waiter */
>  
> + /* state of the tag, containing flags, refcount and usagecount */
> + pg_atomic_uint32 state;
> +
> + int wait_backend_pid;   /* backend PID 
> of pin-count waiter */
>   int buf_id; /* buffer's index 
> number (from 0) */
>   int freeNext;   /* link in freelist 
> chain */

Hm. Won't matter on most platforms, but it seems like a good idea to
move to an 8 byte aligned boundary. Move buf_id up?

> +/*
> + * Support for spin delay which could be useful in other places where
> + * spinlock-like procedures take place.
> + */
> +typedef struct
> +{
> + int spins;
> + int delays;
> + int cur_delay;
> + Pointer ptr;
> + const char *file;
> + int line;
> +} SpinDelayStatus;
> +
> +#define init_spin_delay(ptr) {0, 0, 0, (ptr), __FILE__, __LINE__}
> +
> +void make_spin_delay(SpinDelayStatus *status);
> +void finish_spin_delay(SpinDelayStatus *status);
> +
>  #endif/* S_LOCK_H */

s/make_spin_delay/perform_spin_delay/? The former sounds like it's
allocating something or such.


Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Alexander Korotkov

Hi!

On Thu, Mar 31, 2016 at 4:59 PM, Amit Kapila 
wrote:

> On Tue, Mar 29, 2016 at 10:52 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> Hi, Andres!
>>
>> Please, find next revision of patch in attachment.
>>
>>
> Couple of minor comments:
>
> +  * The following two macroses
>
> is macroses right word to be used here?
>
> +  * of this loop.  It should be used as fullowing:
>
> /fullowing/following
>
> +  * For local buffers usage of these macros shouldn't be used.
>
> isn't it better to write it as
>
> For local buffers, these macros shouldn't be used.
>
>
>   static int ts_ckpt_progress_comparator(Datum a, Datum b, void *arg);
>
> -
>
> Spurious line deletion.
>

All of above is fixed.

+  * Since buffers are pinned/unpinned very frequently, this functions tries
> +  * to pin buffer as cheap as possible.
>
> /this functions tries
>
> which functions are you referring here? Comment seems to be slightly
> unclear.
>

I meant just PinBuffer() there.  UnpinBuffer() has another comment in the
body.  Fixed.


> ! if (XLogHintBitIsNeeded() && (pg_atomic_read_u32(>state) &
> BM_PERMANENT))
>
> Is there a reason that you have kept macro's to read refcount and
> usagecount, but not for flags?
>

We could change dealing with flags to GET/SET macros.  But I guess such
change would make review more complicated, because it would touch some
places which are unchanged for now.  I think it could be done in a separate
refactoring patch.

Apart from this, I have verified that patch compiles on Windows and passed
> regressions (make check)!
>

Thank you!  I didn't manage to try this on Windows.


> Nice work!
>

Thank you!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Tom Lane

Andres Freund  writes:
> Oh. I confused my approaches. I was thinking about going for 2):

>> 2) Replace the lwlock spinlock by a bit in LWLock->state. That'd avoid
>> embedding the spinlock, and actually might allow to avoid one atomic
>> op in a number of cases.

> precisely because of that concern.

Oh, okay, ignore my comment just now in the other thread.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Amit Kapila

On Tue, Mar 29, 2016 at 10:52 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Hi, Andres!
>
> Please, find next revision of patch in attachment.
>
>
Couple of minor comments:

+  * The following two macroses

is macroses right word to be used here?

+  * of this loop.  It should be used as fullowing:

/fullowing/following

+  * For local buffers usage of these macros shouldn't be used.

isn't it better to write it as

For local buffers, these macros shouldn't be used.

  static int ts_ckpt_progress_comparator(Datum a, Datum b, void *arg);

-

Spurious line deletion.

+  * Since buffers are pinned/unpinned very frequently, this functions tries
+  * to pin buffer as cheap as possible.

/this functions tries

which functions are you referring here? Comment seems to be slightly
unclear.

! if (XLogHintBitIsNeeded() && (pg_atomic_read_u32(>state) &
BM_PERMANENT))

Is there a reason that you have kept macro's to read refcount and
usagecount, but not for flags?

Apart from this, I have verified that patch compiles on Windows and passed
regressions (make check)!

Nice work!

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Andres Freund

On 2016-03-31 12:58:55 +0200, Andres Freund wrote:
> On 2016-03-31 06:54:02 -0400, Robert Haas wrote:
> > On Wed, Mar 30, 2016 at 3:16 AM, Andres Freund  wrote:
> > > Yea, as Tom pointed out that's not going to work.  I'll try to write a
> > > patch for approach 1).
> > 
> > Does this mean that any platform that wants to perform well will now
> > need a sub-4-byte spinlock implementation?  That's has a somewhat
> > uncomfortable sound to it.
> 
> Oh. I confused my approaches. I was thinking about going for 2):
> 
> > 2) Replace the lwlock spinlock by a bit in LWLock->state. That'd avoid
> >embedding the spinlock, and actually might allow to avoid one atomic
> >op in a number of cases.
> 
> precisely because of that concern.

Here's a WIP patch to evaluate. Dilip/Ashutosh, could you perhaps run
some benchmarks, to see whether this addresses the performance issues?

I guess it'd both be interesting to compare master with master + patch,
and this thread's latest patch with the patch additionally applied.

Thanks!

Andres
>From 623581574409bdf5cbfbba005bb2e961cb689573 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 31 Mar 2016 14:14:20 +0200
Subject: [PATCH 1/4] WIP: Avoid the use of a separate spinlock to protect
 LWLock's wait queue.

Previously we used a spinlock, in adition to the atomically manipulated
->state field, to protect the wait queue. But it's pretty simple to
instead perform the locking using a flag in state.

This tries to address a performance regression on PPC due to
6150a1b0. Our PPC spinlocks are 4 byte each, which makes BufferDesc
exceed 64bytes, causing cacheline-sharing issues.x

Discussion: caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com
20160327121858.zrmrjegmji2ym...@alap3.anarazel.de
---
 src/backend/storage/lmgr/lwlock.c | 178 --
 src/include/storage/lwlock.h  |   1 -
 2 files changed, 94 insertions(+), 85 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 53ae7d5..ec6baf6 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -97,6 +97,7 @@ extern slock_t *ShmemLock;
 
 #define LW_FLAG_HAS_WAITERS			((uint32) 1 << 30)
 #define LW_FLAG_RELEASE_OK			((uint32) 1 << 29)
+#define LW_FLAG_LOCKED((uint32) 1 << 28)
 
 #define LW_VAL_EXCLUSIVE			((uint32) 1 << 24)
 #define LW_VAL_SHARED1
@@ -711,7 +712,6 @@ RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)
 void
 LWLockInitialize(LWLock *lock, int tranche_id)
 {
-	SpinLockInit(>mutex);
 	pg_atomic_init_u32(>state, LW_FLAG_RELEASE_OK);
 #ifdef LOCK_DEBUG
 	pg_atomic_init_u32(>nwaiters, 0);
@@ -842,6 +842,56 @@ LWLockAttemptLock(LWLock *lock, LWLockMode mode)
 	pg_unreachable();
 }
 
+static void
+LWLockWaitListLock(LWLock *lock)
+{
+	uint32 old_state;
+#ifdef LWLOCK_STATS
+	lwlock_stats *lwstats;
+	uint32 delays = 0;
+
+	lwstats = get_lwlock_stats_entry(lock);
+#endif
+
+	old_state = pg_atomic_read_u32(>state);
+	while (true)
+	{
+		if (old_state & LW_FLAG_LOCKED)
+		{
+			/* FIXME: add exponential backoff */
+			pg_spin_delay();
+			old_state = pg_atomic_read_u32(>state);
+#ifdef LWLOCK_STATS
+			delays++;
+#endif
+		}
+		else
+		{
+			old_state = pg_atomic_fetch_or_u32(>state, LW_FLAG_LOCKED);
+			if (!(old_state & LW_FLAG_LOCKED))
+			{
+/* got lock */
+break;
+			}
+		}
+	}
+
+#ifdef LWLOCK_STATS
+	lwstats->spin_delay_count += delays;
+#endif
+
+}
+
+static void
+LWLockWaitListUnlock(LWLock *lock)
+{
+	uint32 old_state PG_USED_FOR_ASSERTS_ONLY;
+
+	old_state = pg_atomic_fetch_and_u32(>state, ~LW_FLAG_LOCKED);
+
+	Assert(old_state & LW_FLAG_LOCKED);
+}
+
 /*
  * Wakeup all the lockers that currently have a chance to acquire the lock.
  */
@@ -852,22 +902,13 @@ LWLockWakeup(LWLock *lock)
 	bool		wokeup_somebody = false;
 	dlist_head	wakeup;
 	dlist_mutable_iter iter;
-#ifdef LWLOCK_STATS
-	lwlock_stats *lwstats;
-
-	lwstats = get_lwlock_stats_entry(lock);
-#endif
 
 	dlist_init();
 
 	new_release_ok = true;
 
 	/* Acquire mutex.  Time spent holding mutex should be short! */
-#ifdef LWLOCK_STATS
-	lwstats->spin_delay_count += SpinLockAcquire(>mutex);
-#else
-	SpinLockAcquire(>mutex);
-#endif
+	LWLockWaitListLock(lock);
 
 	dlist_foreach_modify(iter, >waiters)
 	{
@@ -904,19 +945,34 @@ LWLockWakeup(LWLock *lock)
 
 	Assert(dlist_is_empty() || pg_atomic_read_u32(>state) & LW_FLAG_HAS_WAITERS);
 
-	/* Unset both flags at once if required */
-	if (!new_release_ok && dlist_is_empty())
-		pg_atomic_fetch_and_u32(>state,
-~(LW_FLAG_RELEASE_OK | LW_FLAG_HAS_WAITERS));
-	else if (!new_release_ok)
-		pg_atomic_fetch_and_u32(>state, ~LW_FLAG_RELEASE_OK);
-	else if (dlist_is_empty())
-		pg_atomic_fetch_and_u32(>state, ~LW_FLAG_HAS_WAITERS);
-	else if (new_release_ok)
-		pg_atomic_fetch_or_u32(>state, LW_FLAG_RELEASE_OK);
+	/* unset required flags, and release lock, in one fell swoop */
+	{
+		uint32

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Andres Freund

On 2016-03-31 06:54:02 -0400, Robert Haas wrote:
> On Wed, Mar 30, 2016 at 3:16 AM, Andres Freund  wrote:
> > Yea, as Tom pointed out that's not going to work.  I'll try to write a
> > patch for approach 1).
> 
> Does this mean that any platform that wants to perform well will now
> need a sub-4-byte spinlock implementation?  That's has a somewhat
> uncomfortable sound to it.

Oh. I confused my approaches. I was thinking about going for 2):

> 2) Replace the lwlock spinlock by a bit in LWLock->state. That'd avoid
>embedding the spinlock, and actually might allow to avoid one atomic
>op in a number of cases.

precisely because of that concern.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Robert Haas

On Wed, Mar 30, 2016 at 3:16 AM, Andres Freund  wrote:
> On 2016-03-30 07:13:16 +0530, Dilip Kumar wrote:
>> On Tue, Mar 29, 2016 at 10:43 PM, Andres Freund  wrote:
>>
>> > My gut feeling is that we should do both 1) and 2).
>> >
>> > Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
>> > Cross-compiling suggest that doing so "just works".  I.e. replace the
>> > #if defined(__ppc__) typedef from an int to a char.
>> >
>>
>> I set that, but after that it hangs, even Initdb hangs..
>
> Yea, as Tom pointed out that's not going to work.  I'll try to write a
> patch for approach 1).

Does this mean that any platform that wants to perform well will now
need a sub-4-byte spinlock implementation?  That's has a somewhat
uncomfortable sound to it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-30 Thread Alexander Korotkov

On Wed, Mar 30, 2016 at 10:16 AM, Andres Freund  wrote:

> On 2016-03-30 07:13:16 +0530, Dilip Kumar wrote:
> > On Tue, Mar 29, 2016 at 10:43 PM, Andres Freund 
> wrote:
> >
> > > My gut feeling is that we should do both 1) and 2).
> > >
> > > Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
> > > Cross-compiling suggest that doing so "just works".  I.e. replace the
> > > #if defined(__ppc__) typedef from an int to a char.
> > >
> >
> > I set that, but after that it hangs, even Initdb hangs..
>
> Yea, as Tom pointed out that's not going to work.  I'll try to write a
> patch for approach 1).
>

Great!  Do you need any improvements for pinunpin-cas-7.patch from me?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-30 Thread Andres Freund

On 2016-03-30 07:13:16 +0530, Dilip Kumar wrote:
> On Tue, Mar 29, 2016 at 10:43 PM, Andres Freund  wrote:
> 
> > My gut feeling is that we should do both 1) and 2).
> >
> > Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
> > Cross-compiling suggest that doing so "just works".  I.e. replace the
> > #if defined(__ppc__) typedef from an int to a char.
> >
> 
> I set that, but after that it hangs, even Initdb hangs..

Yea, as Tom pointed out that's not going to work.  I'll try to write a
patch for approach 1).

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Dilip Kumar

On Tue, Mar 29, 2016 at 10:43 PM, Andres Freund  wrote:

> My gut feeling is that we should do both 1) and 2).
>
> Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
> Cross-compiling suggest that doing so "just works".  I.e. replace the
> #if defined(__ppc__) typedef from an int to a char.
>

I set that, but after that it hangs, even Initdb hangs..

 int

   │164 s_lock(volatile slock_t *lock, const char *file, int line)


   │165 {


   │166 SpinDelayStatus delayStatus;


   │167


   │168 init_spin_delay(, (Pointer)lock, file,
line);

   │169


*   │170 while (TAS_SPIN(lock))

 *
*   │171 {

   *
*  >│172 make_spin_delay();

  *
*   │173 }*


   │174

I did not try to find the reason, but just built in debug mode and found it
never come out of this loop.

I clean build multiple times but problem persist,

Does it have dependency of some other variable of defined under PPC in some
other place ? I don't know..

/* PowerPC */

*#if* defined(__ppc__) || defined(__powerpc__) || defined(__ppc64__) ||
defined(__powerpc64__)

*#define* HAS_TEST_AND_SET

*typedef* *unsigned* *int* slock_t;  --> changed like this

*#define* TAS(lock) tas(lock)

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Andres Freund

On 2016-03-29 14:09:42 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > There's actually lbarx/stbcx - but it's not present in all ISAs. So I
> > guess it's clear where to go.
> 
> Hm.  We could certainly add a configure test to see if the local assembler
> knows these instructions --- but it's not clear that we should depend on
> compile-time environment to match run-time.

I think it'd be easier to continue using lwarx/stwcx, but be careful
about only testing/setting the lowest byte, if we want to go there. But
that then likely would require hints about alignment to the compiler...

i've no experience writing ppc assembly, but it doesn't look too hard.

but i think it's easier to just remove the spinlock from struct lwlock
then - and it also improves the situation for other architectures with
wider spinlocks.  i think that's beneficial for some future things
anyway.

> Googling suggests that these instructions came in with PPC ISA 2.06
> which seems to date to 2010.  So there's undoubtedly still production
> hardware without them.
> 
> In the department of not-production hardware, I checked this on prairiedog
> and got
> /var/tmp/ccbQy9uG.s:1722:Invalid mnemonic 'lbarx'
> /var/tmp/ccbQy9uG.s:1726:Invalid mnemonic 'stbcx.'

Heh. Thanks for testing.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Tom Lane

Andres Freund  writes:
> On 2016-03-29 13:24:40 -0400, Tom Lane wrote:
>> AFAICS, lwarx/stwcx are specifically *word* wide.

> There's actually lbarx/stbcx - but it's not present in all ISAs. So I
> guess it's clear where to go.

Hm.  We could certainly add a configure test to see if the local assembler
knows these instructions --- but it's not clear that we should depend on
compile-time environment to match run-time.

Googling suggests that these instructions came in with PPC ISA 2.06
which seems to date to 2010.  So there's undoubtedly still production
hardware without them.

In the department of not-production hardware, I checked this on prairiedog
and got
/var/tmp/ccbQy9uG.s:1722:Invalid mnemonic 'lbarx'
/var/tmp/ccbQy9uG.s:1726:Invalid mnemonic 'stbcx.'

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Andres Freund

On 2016-03-29 20:22:00 +0300, Alexander Korotkov wrote:
> > > + while (true)
> > >   {
> > > - if (buf->usage_count == 0)
> > > - buf->usage_count = 1;
> > > + /* spin-wait till lock is free */
> > > + while (state & BM_LOCKED)
> > > + {
> > > + make_spin_delay();
> > > + state = pg_atomic_read_u32(>state);
> > > + oldstate = state;
> > > + }
> >
> > Maybe we should abstract that to pg_atomic_wait_bit_unset_u32()? It
> > seems quite likely we need this in other places (e.g. lwlock.c itself).
> >
> 
> I have some doubts about such function.  At first, I can't find a place for
> it in lwlock.c.  It doesn't run explicit spin delays yet.  Is it related to
> some changes you're planning for lwlock.c.

Yes, see 
http://www.postgresql.org/message-id/20160328130904.4mhugvkf4f3wg...@awork2.anarazel.de


> > >   lsn = XLogSaveBufferForHint(buffer, buffer_std);
> > >   }
> > >
> > > - LockBufHdr(bufHdr);
> > > - Assert(bufHdr->refcount > 0);
> > > - if (!(bufHdr->flags & BM_DIRTY))
> > > + state = LockBufHdr(bufHdr);
> > > +
> > > + Assert(BUF_STATE_GET_REFCOUNT(state) > 0);
> > > +
> > > + if (!(state & BM_DIRTY))
> > >   {
> > >   dirtied = true; /* Means "will be dirtied
> > by this action" */
> >
> > It's again worthwhile to try make this work without taking the lock.
> >
> 
> Comment there claims.
> 
> /*
>  * Set the page LSN if we wrote a backup block. We aren't supposed
>  * to set this when only holding a share lock but as long as we
>  * serialise it somehow we're OK. We choose to set LSN while
>  * holding the buffer header lock, which causes any reader of an
>  * LSN who holds only a share lock to also obtain a buffer header
>  * lock before using PageGetLSN(), which is enforced in
>  * BufferGetLSNAtomic().
> 
> Thus, buffer header lock is used for write serialization.  I don't think it
> would be correct to remove the lock here.

Gah, I forgot about this uglyness. Man, the checksumming commit sure
made a number of things really ugly.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Andres Freund

On 2016-03-29 13:24:40 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
> > Cross-compiling suggest that doing so "just works".  I.e. replace the
> > #if defined(__ppc__) typedef from an int to a char.
> 
> AFAICS, lwarx/stwcx are specifically *word* wide.

Ah, x86 gcc is just too convenient, with it automatically adjusting
instruction types :(

There's actually lbarx/stbcx - but it's not present in all ISAs. So I
guess it's clear where to go.

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Tom Lane

Andres Freund  writes:
> Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
> Cross-compiling suggest that doing so "just works".  I.e. replace the
> #if defined(__ppc__) typedef from an int to a char.

AFAICS, lwarx/stwcx are specifically *word* wide.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Andres Freund

On 2016-03-29 13:09:05 -0400, Robert Haas wrote:
> On Mon, Mar 28, 2016 at 9:09 AM, Andres Freund  wrote:
> > On 2016-03-28 11:48:46 +0530, Dilip Kumar wrote:
> >> On Sun, Mar 27, 2016 at 5:48 PM, Andres Freund  wrote:
> >> > What's sizeof(BufferDesc) after applying these patches? It should better
> >> > be <= 64...
> >> >
> >>
> >> It is 72.
> >
> > Ah yes, miscalculated the required alignment.  Hm. So we got to get this
> > smaller. I see three approaches:
> >
> > 1) Reduce the spinlock size on ppc. That actually might just work by
> >replacing "unsigned int" by "unsigned char"
> > 2) Replace the lwlock spinlock by a bit in LWLock->state. That'd avoid
> >embedding the spinlock, and actually might allow to avoid one atomic
> >op in a number of cases.
> > 3) Shrink the size of BufferDesc by removing buf_id; that'd bring it to
> >64byte.
> >
> > I'm a bit hesitant to go for 3), because it'd likely end up adding a bit
> > of arithmetic to a number of places in bufmgr.c.  Robert, what do you
> > think?
> 
> I don't have a clear idea what's going to be better here.

My gut feeling is that we should do both 1) and 2).

Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
Cross-compiling suggest that doing so "just works".  I.e. replace the
#if defined(__ppc__) typedef from an int to a char.

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-29 Thread Robert Haas

On Mon, Mar 28, 2016 at 9:09 AM, Andres Freund  wrote:
> On 2016-03-28 11:48:46 +0530, Dilip Kumar wrote:
>> On Sun, Mar 27, 2016 at 5:48 PM, Andres Freund  wrote:
>> > What's sizeof(BufferDesc) after applying these patches? It should better
>> > be <= 64...
>> >
>>
>> It is 72.
>
> Ah yes, miscalculated the required alignment.  Hm. So we got to get this
> smaller. I see three approaches:
>
> 1) Reduce the spinlock size on ppc. That actually might just work by
>replacing "unsigned int" by "unsigned char"
> 2) Replace the lwlock spinlock by a bit in LWLock->state. That'd avoid
>embedding the spinlock, and actually might allow to avoid one atomic
>op in a number of cases.
> 3) Shrink the size of BufferDesc by removing buf_id; that'd bring it to
>64byte.
>
> I'm a bit hesitant to go for 3), because it'd likely end up adding a bit
> of arithmetic to a number of places in bufmgr.c.  Robert, what do you
> think?

I don't have a clear idea what's going to be better here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-28 Thread Alvaro Herrera

Andres Freund wrote:
> On 2016-03-28 15:46:43 +0300, Alexander Korotkov wrote:

> > @@ -932,8 +936,13 @@ ReadBuffer_common(SMgrRelation smgr, cha
> >  
> > if (isLocalBuf)
> > {
> > -   /* Only need to adjust flags */
> > -   bufHdr->flags |= BM_VALID;
> > +   /*
> > +* Only need to adjust flags.  Since it's local buffer, there 
> > is no
> > +* concurrency.  We assume read/write pair to be cheaper than 
> > atomic OR.
> > +*/
> > +   uint32 state = pg_atomic_read_u32(>state);
> > +   state |= BM_VALID;
> > +   pg_atomic_write_u32(>state, state);
> 
> I'm not a fan of repeating that comment in multiple places. Hm.

Perhaps a set of macros should be offered that do "read, set some flag,
write".  Then the comments can be attached to the macro instead of to
each callsite.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-28 Thread Andres Freund

On 2016-03-28 15:46:43 +0300, Alexander Korotkov wrote:
> diff --git a/src/backend/storage/buffer/bufmnew file mode 100644
> index 6dd7c6e..fe6fb9c
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -52,7 +52,6 @@
>  #include "utils/resowner_private.h"
>  #include "utils/timestamp.h"
>  
> -
>  /* Note: these two macros only work on shared buffers, not local ones! */
>  #define BufHdrGetBlock(bufHdr)   ((Block) (BufferBlocks + ((Size) 
> (bufHdr)->buf_id) * BLCKSZ))
>  #define BufferGetLSN(bufHdr) (PageGetLSN(BufHdrGetBlock(bufHdr)))

spurious

> @@ -848,7 +852,7 @@ ReadBuffer_common(SMgrRelation smgr, cha
>* it's not been recycled) but come right back here to try smgrextend
>* again.
>*/
> - Assert(!(bufHdr->flags & BM_VALID));/* spinlock not needed 
> */
> + Assert(!(pg_atomic_read_u32(>state) & BM_VALID)); /* header 
> lock not needed */
>  
>   bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : 
> BufHdrGetBlock(bufHdr);
>  
> @@ -932,8 +936,13 @@ ReadBuffer_common(SMgrRelation smgr, cha
>  
>   if (isLocalBuf)
>   {
> - /* Only need to adjust flags */
> - bufHdr->flags |= BM_VALID;
> + /*
> +  * Only need to adjust flags.  Since it's local buffer, there 
> is no
> +  * concurrency.  We assume read/write pair to be cheaper than 
> atomic OR.
> +  */
> + uint32 state = pg_atomic_read_u32(>state);
> + state |= BM_VALID;
> + pg_atomic_write_u32(>state, state);

I'm not a fan of repeating that comment in multiple places. Hm.


>   }
>   else
>   {
> @@ -987,10 +996,11 @@ BufferAlloc(SMgrRelation smgr, char relp
>   BufferTag   oldTag; /* previous identity of 
> selected buffer */
>   uint32  oldHash;/* hash value for oldTag */
>   LWLock *oldPartitionLock;   /* buffer partition lock for it 
> */
> - BufFlagsoldFlags;
> + uint32  oldFlags;
>   int buf_id;
>   BufferDesc *buf;
>   boolvalid;
> + uint32  state;
>  
>   /* create a tag so we can lookup the buffer */
>   INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
> @@ -1050,23 +1060,23 @@ BufferAlloc(SMgrRelation smgr, char relp
>   for (;;)
>   {
>   /*
> -  * Ensure, while the spinlock's not yet held, that there's a 
> free
> +  * Ensure, while the header lock isn't yet held, that there's a 
> free
>* refcount entry.
>*/
>   ReservePrivateRefCountEntry();
>  
>   /*
>* Select a victim buffer.  The buffer is returned with its 
> header
> -  * spinlock still held!
> +  * lock still held!
>*/
> - buf = StrategyGetBuffer(strategy);
> + buf = StrategyGetBuffer(strategy, );

The new thing really still is a spinlock, not sure if it's worth
changing the comments that way.


> @@ -1319,7 +1330,7 @@ BufferAlloc(SMgrRelation smgr, char relp
>   * InvalidateBuffer -- mark a shared buffer invalid and return it to the
>   * freelist.
>   *
> - * The buffer header spinlock must be held at entry.  We drop it before
> + * The buffer header lock must be held at entry.  We drop it before
>   * returning.  (This is sane because the caller must have locked the
>   * buffer in order to be sure it should be dropped.)
>   *

Ok, by now I'm pretty sure that I don't want this to be changed
everywhere, just makes reviewing harder.



> @@ -1433,6 +1443,7 @@ void
>  MarkBufferDirty(Buffer buffer)
>  {
>   BufferDesc *bufHdr;
> + uint32  state;
>  
>   if (!BufferIsValid(buffer))
>   elog(ERROR, "bad buffer ID: %d", buffer);
> @@ -1449,14 +1460,14 @@ MarkBufferDirty(Buffer buffer)
>   /* unfortunately we can't check if the lock is held exclusively */
>   Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));
>  
> - LockBufHdr(bufHdr);
> + state = LockBufHdr(bufHdr);
>  
> - Assert(bufHdr->refcount > 0);
> + Assert(BUF_STATE_GET_REFCOUNT(state) > 0);
>  
>   /*
>* If the buffer was not dirty already, do vacuum accounting.
>*/
> - if (!(bufHdr->flags & BM_DIRTY))
> + if (!(state & BM_DIRTY))
>   {
>   VacuumPageDirty++;
>   pgBufferUsage.shared_blks_dirtied++;
> @@ -1464,9 +1475,9 @@ MarkBufferDirty(Buffer buffer)
>   VacuumCostBalance += VacuumCostPageDirty;
>   }
>  
> - bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
> -
> - UnlockBufHdr(bufHdr);
> + state |= BM_DIRTY | BM_JUST_DIRTIED;
> + state &= ~BM_LOCKED;
> + pg_atomic_write_u32(>state, state);
>  }

Hm, this is a routine I think we should avoid taking the lock in; it's
often called quite

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-28 Thread Andres Freund

On 2016-03-28 11:48:46 +0530, Dilip Kumar wrote:
> On Sun, Mar 27, 2016 at 5:48 PM, Andres Freund  wrote:
>
> >
> > What's sizeof(BufferDesc) after applying these patches? It should better
> > be <= 64...
> >
>
> It is 72.

Ah yes, miscalculated the required alignment.  Hm. So we got to get this
smaller. I see three approaches:

1) Reduce the spinlock size on ppc. That actually might just work by
   replacing "unsigned int" by "unsigned char"
2) Replace the lwlock spinlock by a bit in LWLock->state. That'd avoid
   embedding the spinlock, and actually might allow to avoid one atomic
   op in a number of cases.
3) Shrink the size of BufferDesc by removing buf_id; that'd bring it to
   64byte.

I'm a bit hesitant to go for 3), because it'd likely end up adding a bit
of arithmetic to a number of places in bufmgr.c.  Robert, what do you
think?

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-28 Thread Alexander Korotkov

On Sun, Mar 27, 2016 at 4:31 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Sun, Mar 27, 2016 at 3:10 PM, Andres Freund  wrote:
>
>> On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
>> > On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
>> > a.korot...@postgrespro.ru> wrote:
>> >
>> > > Thank you very much for testing!
>> > > I also got access to 4 x 18 Intel server with 144 threads. I'm going
>> to
>> > > post results of tests on this server in next Monday.
>> > >
>> >
>> > I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j
>> 100
>> > -M prepared -T 300.
>> > See results in the table and chart.
>> >
>> > clients master  v3  v5
>> > 1   11671   12507   12679
>> > 2   24650   26005   25010
>> > 4   49631   48863   49811
>> > 8   96790   96441   99946
>> > 10  121275  119928  124100
>> > 20  243066  243365  246432
>> > 30  359616  342241  357310
>> > 40  431375  415310  441619
>> > 50  489991  489896  500590
>> > 60  538057  636473  554069
>> > 70  588659  714426  738535
>> > 80  405008  923039  902632
>> > 90  295443  1181247 1155918
>> > 100 258695  1323125 1325019
>> > 110 238842  1393767 1410274
>> > 120 226018  1432504 1474982
>> > 130 215102  1465459 1503241
>> > 140 206415  1470454 1505380
>> > 150 197850  1475479 1519908
>> > 160 190935  1420915 1484868
>> > 170 185835  1438965 1453128
>> > 180 182519  1416252 1453098
>> >
>> > My conclusions are following:
>> > 1) We don't observe any regression in v5 in comparison to master.
>> > 2) v5 in most of cases slightly outperforms v3.
>>
>> What commit did you base these tests on? I guess something recent, after
>> 98a64d0bd?
>>
>
> Yes, more recent than 98a64d0bd. It was based on 676265eb7b.
>
>
>> > I'm going to do some code cleanup of v5 in Monday
>>
>> Ok, I'll try to do a review and possibly commit after that.
>>
>
> Sounds good.
>

There is next revision of patch.  It contains mostly cosmetic changes.
Comment are adjusted to reflect changes of behaviour.
I also changed atomic AND/OR for local buffers to read/write pair which
would be cheaper I suppose.  However, I don't insist on it, and it could be
reverted.
The patch is ready for your review.  It's especially interesting what do
you think about the way I abstracted exponential back off of spinlock.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-6.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-28 Thread Dilip Kumar

On Sun, Mar 27, 2016 at 5:48 PM, Andres Freund  wrote:

>
> What's sizeof(BufferDesc) after applying these patches? It should better
> be <= 64...
>

It is 72.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Alexander Korotkov

On Sun, Mar 27, 2016 at 3:10 PM, Andres Freund  wrote:

> On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
> > On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
> > a.korot...@postgrespro.ru> wrote:
> >
> > > Thank you very much for testing!
> > > I also got access to 4 x 18 Intel server with 144 threads. I'm going to
> > > post results of tests on this server in next Monday.
> > >
> >
> > I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j
> 100
> > -M prepared -T 300.
> > See results in the table and chart.
> >
> > clients master  v3  v5
> > 1   11671   12507   12679
> > 2   24650   26005   25010
> > 4   49631   48863   49811
> > 8   96790   96441   99946
> > 10  121275  119928  124100
> > 20  243066  243365  246432
> > 30  359616  342241  357310
> > 40  431375  415310  441619
> > 50  489991  489896  500590
> > 60  538057  636473  554069
> > 70  588659  714426  738535
> > 80  405008  923039  902632
> > 90  295443  1181247 1155918
> > 100 258695  1323125 1325019
> > 110 238842  1393767 1410274
> > 120 226018  1432504 1474982
> > 130 215102  1465459 1503241
> > 140 206415  1470454 1505380
> > 150 197850  1475479 1519908
> > 160 190935  1420915 1484868
> > 170 185835  1438965 1453128
> > 180 182519  1416252 1453098
> >
> > My conclusions are following:
> > 1) We don't observe any regression in v5 in comparison to master.
> > 2) v5 in most of cases slightly outperforms v3.
>
> What commit did you base these tests on? I guess something recent, after
> 98a64d0bd?
>

Yes, more recent than 98a64d0bd. It was based on 676265eb7b.


> > I'm going to do some code cleanup of v5 in Monday
>
> Ok, I'll try to do a review and possibly commit after that.
>

Sounds good.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Andres Freund

On 2016-03-27 17:45:52 +0530, Dilip Kumar wrote:
> On Sun, Mar 27, 2016 at 5:37 PM, Andres Freund  wrote:
> 
> > On what hardware did you run these tests?
> 
> 
> IBM POWER 8 MACHINE.
> 
> Architecture:  ppc64le
> Byte Order:Little Endian
> CPU(s):192
> Thread(s) per core:8
> Core(s) per socket:1
> Socket(s):24
> NUMA node(s): 4

What's sizeof(BufferDesc) after applying these patches? It should better
be <= 64...

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Dilip Kumar

On Sun, Mar 27, 2016 at 5:37 PM, Andres Freund  wrote:

> On what hardware did you run these tests?


IBM POWER 8 MACHINE.

Architecture:  ppc64le
Byte Order:Little Endian
CPU(s):192
Thread(s) per core:8
Core(s) per socket:1
Socket(s):24
NUMA node(s): 4

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Andres Freund

On 2016-03-27 12:38:25 +0300, Alexander Korotkov wrote:
> On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> 
> > Thank you very much for testing!
> > I also got access to 4 x 18 Intel server with 144 threads. I'm going to
> > post results of tests on this server in next Monday.
> >
> 
> I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j 100
> -M prepared -T 300.
> See results in the table and chart.
> 
> clients master  v3  v5
> 1   11671   12507   12679
> 2   24650   26005   25010
> 4   49631   48863   49811
> 8   96790   96441   99946
> 10  121275  119928  124100
> 20  243066  243365  246432
> 30  359616  342241  357310
> 40  431375  415310  441619
> 50  489991  489896  500590
> 60  538057  636473  554069
> 70  588659  714426  738535
> 80  405008  923039  902632
> 90  295443  1181247 1155918
> 100 258695  1323125 1325019
> 110 238842  1393767 1410274
> 120 226018  1432504 1474982
> 130 215102  1465459 1503241
> 140 206415  1470454 1505380
> 150 197850  1475479 1519908
> 160 190935  1420915 1484868
> 170 185835  1438965 1453128
> 180 182519  1416252 1453098
> 
> My conclusions are following:
> 1) We don't observe any regression in v5 in comparison to master.
> 2) v5 in most of cases slightly outperforms v3.

What commit did you base these tests on? I guess something recent, after
98a64d0bd?

> I'm going to do some code cleanup of v5 in Monday

Ok, I'll try to do a review and possibly commit after that.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Andres Freund

On 2016-03-25 23:02:11 +0530, Dilip Kumar wrote:
> On Fri, Mar 25, 2016 at 8:09 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> 
> > Could anybody run benchmarks?  Feature freeze is soon, but it would be
> > *very nice* to fit it into 9.6 release cycle, because it greatly improves
> > scalability on large machines.  Without this patch PostgreSQL 9.6 will be
> > significantly behind competitors like MySQL 5.7.
> 
> 
> I have run the performance and here are the results.. With latest patch I
> did not see any regression at lower client count (median of 3 reading).
> 
> scale factor 1000 shared buffer 8GB readonly
> *Client Base patch*
> 1 12957 13068
> 2 24931 25816
> 4 46311 48767
> 32 300921 310062
> 64 387623 493843
> 128 249635 583513
> scale factor 300 shared buffer 8GB readonly
> *Client Base patch*
> 1 14537 14586--> one thread number looks little less, generally I get
> ~18000 (will recheck).
> 2 34703 33929--> may be run to run variance (once I get time, will
> recheck)
> 4 67744 69069
> 32 312575 336012
> 64 213312 539056
> 128 190139 380122
> 
> *Summary:*
> 
> Actually with 64 client we have seen ~470,000 TPS with head also, by
> revering commit 6150a1b0.
> refer this thread: (
> http://www.postgresql.org/message-id/caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com
> )
> 
> I haven't tested this patch by reverting commit 6150a1b0, so not sure can
> this patch give even better performance ?
> 
> It also points to the case, what Andres has mentioned in this thread.
> 
> http://www.postgresql.org/message-id/20160226191158.3vidtk3ktcmhi...@alap3.anarazel.de

On what hardware did you run these tests?

Thanks,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-27 Thread Alexander Korotkov

On Sat, Mar 26, 2016 at 1:26 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Thank you very much for testing!
> I also got access to 4 x 18 Intel server with 144 threads. I'm going to
> post results of tests on this server in next Monday.
>

I've run pgbench tests on this machine: pgbench -s 1000 -c $clients -j 100
-M prepared -T 300.
See results in the table and chart.

clients master  v3  v5
1   11671   12507   12679
2   24650   26005   25010
4   49631   48863   49811
8   96790   96441   99946
10  121275  119928  124100
20  243066  243365  246432
30  359616  342241  357310
40  431375  415310  441619
50  489991  489896  500590
60  538057  636473  554069
70  588659  714426  738535
80  405008  923039  902632
90  295443  1181247 1155918
100 258695  1323125 1325019
110 238842  1393767 1410274
120 226018  1432504 1474982
130 215102  1465459 1503241
140 206415  1470454 1505380
150 197850  1475479 1519908
160 190935  1420915 1484868
170 185835  1438965 1453128
180 182519  1416252 1453098

My conclusions are following:
1) We don't observe any regression in v5 in comparison to master.
2) v5 in most of cases slightly outperforms v3.

I'm going to do some code cleanup of v5 in Monday

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-25 Thread Alexander Korotkov

Hi, Dilip!

On Fri, Mar 25, 2016 at 8:32 PM, Dilip Kumar  wrote:

> On Fri, Mar 25, 2016 at 8:09 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> Could anybody run benchmarks?  Feature freeze is soon, but it would be
>> *very nice* to fit it into 9.6 release cycle, because it greatly improves
>> scalability on large machines.  Without this patch PostgreSQL 9.6 will be
>> significantly behind competitors like MySQL 5.7.
>
>
> I have run the performance and here are the results.. With latest patch I
> did not see any regression at lower client count (median of 3 reading).
>
> scale factor 1000 shared buffer 8GB readonly
> *Client Base patch*
> 1 12957 13068
> 2 24931 25816
> 4 46311 48767
> 32 300921 310062
> 64 387623 493843
> 128 249635 583513
> scale factor 300 shared buffer 8GB readonly
> *Client Base patch*
> 1 14537 14586--> one thread number looks little less, generally I get
> ~18000 (will recheck).
> 2 34703 33929--> may be run to run variance (once I get time, will
> recheck)
> 4 67744 69069
> 32 312575 336012
> 64 213312 539056
> 128 190139 380122
>
> *Summary:*
>
> Actually with 64 client we have seen ~470,000 TPS with head also, by
> revering commit 6150a1b0.
> refer this thread: (
> http://www.postgresql.org/message-id/caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com
> )
>
> I haven't tested this patch by reverting commit 6150a1b0, so not sure can
> this patch give even better performance ?
>
> It also points to the case, what Andres has mentioned in this thread.
>
>
> http://www.postgresql.org/message-id/20160226191158.3vidtk3ktcmhi...@alap3.anarazel.de
>

Thank you very much for testing!
I also got access to 4 x 18 Intel server with 144 threads. I'm going to
post results of tests on this server in next Monday.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-25 Thread Dilip Kumar

On Fri, Mar 25, 2016 at 8:09 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Could anybody run benchmarks?  Feature freeze is soon, but it would be
> *very nice* to fit it into 9.6 release cycle, because it greatly improves
> scalability on large machines.  Without this patch PostgreSQL 9.6 will be
> significantly behind competitors like MySQL 5.7.

I have run the performance and here are the results.. With latest patch I
did not see any regression at lower client count (median of 3 reading).

scale factor 1000 shared buffer 8GB readonly
*Client Base patch*
1 12957 13068
2 24931 25816
4 46311 48767
32 300921 310062
64 387623 493843
128 249635 583513
scale factor 300 shared buffer 8GB readonly
*Client Base patch*
1 14537 14586--> one thread number looks little less, generally I get
~18000 (will recheck).
2 34703 33929--> may be run to run variance (once I get time, will
recheck)
4 67744 69069
32 312575 336012
64 213312 539056
128 190139 380122

*Summary:*

Actually with 64 client we have seen ~470,000 TPS with head also, by
revering commit 6150a1b0.
refer this thread: (
http://www.postgresql.org/message-id/caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com
)

I haven't tested this patch by reverting commit 6150a1b0, so not sure can
this patch give even better performance ?

It also points to the case, what Andres has mentioned in this thread.

http://www.postgresql.org/message-id/20160226191158.3vidtk3ktcmhi...@alap3.anarazel.de

Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-25 Thread Alexander Korotkov

On Tue, Mar 22, 2016 at 1:08 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Mar 22, 2016 at 7:57 AM, Dilip Kumar 
> wrote:
>
>>
>> On Tue, Mar 22, 2016 at 12:31 PM, Dilip Kumar 
>> wrote:
>>
>>> ! pg_atomic_write_u32(>state, state);
>>>   } while (!StartBufferIO(bufHdr, true));
>>>
>>> Better Write some comment, about we clearing the BM_LOCKED from stage
>>> directly and need not to call UnlockBufHdr explicitly.
>>> otherwise its confusing.
>>>
>>
>> Few more comments..
>>
>> *** 828,837 
>> */
>>do
>>{
>> !  LockBufHdr(bufHdr);
>> *!  Assert(bufHdr->flags & BM_VALID);*
>> !  bufHdr->flags &= ~BM_VALID;
>> !  UnlockBufHdr(bufHdr);
>>} while (!StartBufferIO(bufHdr, true));
>>}
>>}
>> --- 826,834 
>> */
>>do
>>{
>> !  uint32 state = LockBufHdr(bufHdr);
>> !  state &= ~(BM_VALID | BM_LOCKED);
>> !  pg_atomic_write_u32(>state, state);
>>} while (!StartBufferIO(bufHdr, true));
>>
>> 1. Previously there was a Assert, any reason why we removed it ?
>>  Assert(bufHdr->flags & BM_VALID);
>>
>
> It was missed.  In the attached patch I've put it back.
>
> 2. And also if we don't need assert then can't we directly clear BM_VALID
>> flag from state variable (if not locked) like pin/unpin buffer does,
>> without taking buffer header lock?
>>
>
> In this version of patch it could be also done as loop of CAS operation.
> But I'm not intended to it so because it would significantly complicate
> code.  It's not yes clear that traffic in this place is high enough to make
> such optimizations.
> Since v4 patch implements slightly different approach.  Could you please
> test it?  We need to check that this approach worth putting more efforts on
> it.  Or through it away otherwise.
>

Could anybody run benchmarks?  Feature freeze is soon, but it would be
*very nice* to fit it into 9.6 release cycle, because it greatly improves
scalability on large machines.  Without this patch PostgreSQL 9.6 will be
significantly behind competitors like MySQL 5.7.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-22 Thread Alexander Korotkov

On Tue, Mar 22, 2016 at 7:57 AM, Dilip Kumar  wrote:

>
> On Tue, Mar 22, 2016 at 12:31 PM, Dilip Kumar 
> wrote:
>
>> ! pg_atomic_write_u32(>state, state);
>>   } while (!StartBufferIO(bufHdr, true));
>>
>> Better Write some comment, about we clearing the BM_LOCKED from stage
>> directly and need not to call UnlockBufHdr explicitly.
>> otherwise its confusing.
>>
>
> Few more comments..
>
> *** 828,837 
> */
>do
>{
> !  LockBufHdr(bufHdr);
> *!  Assert(bufHdr->flags & BM_VALID);*
> !  bufHdr->flags &= ~BM_VALID;
> !  UnlockBufHdr(bufHdr);
>} while (!StartBufferIO(bufHdr, true));
>}
>}
> --- 826,834 
> */
>do
>{
> !  uint32 state = LockBufHdr(bufHdr);
> !  state &= ~(BM_VALID | BM_LOCKED);
> !  pg_atomic_write_u32(>state, state);
>} while (!StartBufferIO(bufHdr, true));
>
> 1. Previously there was a Assert, any reason why we removed it ?
>  Assert(bufHdr->flags & BM_VALID);
>

It was missed.  In the attached patch I've put it back.

2. And also if we don't need assert then can't we directly clear BM_VALID
> flag from state variable (if not locked) like pin/unpin buffer does,
> without taking buffer header lock?
>

In this version of patch it could be also done as loop of CAS operation.
But I'm not intended to it so because it would significantly complicate
code.  It's not yes clear that traffic in this place is high enough to make
such optimizations.
Since v4 patch implements slightly different approach.  Could you please
test it?  We need to check that this approach worth putting more efforts on
it.  Or through it away otherwise.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-21 Thread Dilip Kumar

On Tue, Mar 22, 2016 at 12:31 PM, Dilip Kumar  wrote:

> ! pg_atomic_write_u32(>state, state);
>   } while (!StartBufferIO(bufHdr, true));
>
> Better Write some comment, about we clearing the BM_LOCKED from stage
> directly and need not to call UnlockBufHdr explicitly.
> otherwise its confusing.
>

Few more comments..

*** 828,837 
*/
   do
   {
!  LockBufHdr(bufHdr);
*!  Assert(bufHdr->flags & BM_VALID);*
!  bufHdr->flags &= ~BM_VALID;
!  UnlockBufHdr(bufHdr);
   } while (!StartBufferIO(bufHdr, true));
   }
   }
--- 826,834 
*/
   do
   {
!  uint32 state = LockBufHdr(bufHdr);
!  state &= ~(BM_VALID | BM_LOCKED);
!  pg_atomic_write_u32(>state, state);
   } while (!StartBufferIO(bufHdr, true));

1. Previously there was a Assert, any reason why we removed it ?
 Assert(bufHdr->flags & BM_VALID);

2. And also if we don't need assert then can't we directly clear BM_VALID
flag from state variable (if not locked) like pin/unpin buffer does,
without taking buffer header lock?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-21 Thread Dilip Kumar

On Sun, Mar 20, 2016 at 4:10 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Actually, we behave like old code and do such modifications without
> increasing number of atomic operations.  We can just calculate new value of
> state (including unset of BM_LOCKED flag) and write it to the buffer.  In
> this case we don't require more atomic operations.  However, it becomes not
> safe to use atomic decrement in UnpinBuffer().  We can use there loop of
> CAS which wait buffer header to be unlocked like PinBuffer() does.  I'm not
> sure if this affects multicore scalability, but I think it worth trying.
>
> So, this idea is implemented in attached patch.  Please, try it for both
> regression on lower number of clients and scalability on large number of
> clients.
>
>
Good, seems like we have reduced some instructions, I will test it soon.


> Other changes in this patch:
> 1) PinBuffer() and UnpinBuffer() use exponential backoff in spindealy
> like LockBufHdr() does.
> 2) Previous patch contained revert
> of ac1d7945f866b1928c2554c0f80fd52d7f92.  This was not intentional.
> No, it doesn't contains this revert.
>


Some other comments..

*** ReadBuffer_common(SMgrRelation smgr, cha
*** 828,837 
  */
  do
  {
! LockBufHdr(bufHdr);
! Assert(bufHdr->flags & BM_VALID);
! bufHdr->flags &= ~BM_VALID;
! UnlockBufHdr(bufHdr);
  } while (!StartBufferIO(bufHdr, true));
  }
  }
--- 826,834 
  */
  do
  {
! uint32 state = LockBufHdr(bufHdr);
! state &= ~(BM_VALID | BM_LOCKED);
! pg_atomic_write_u32(>state, state);
  } while (!StartBufferIO(bufHdr, true));

Better Write some comment, about we clearing the BM_LOCKED from stage
directly and need not to call UnlockBufHdr explicitly.
otherwise its confusing.




-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-19 Thread Alexander Korotkov

On Sat, Mar 19, 2016 at 3:22 PM, Dilip Kumar  wrote:

>
> On Mon, Mar 14, 2016 at 3:09 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> I've drawn graphs for these measurements. The variation doesn't look
>> random here.  TPS is going higher from measurement to measurement.  I bet
>> you did measurements sequentially.
>> I think we should do more measurements until TPS stop growing and beyond
>> to see accumulate average statistics.  Could you please do the same tests
>> but 50 runs.
>>
>
> I have taken reading at different client count (1,2 and 4)
>
> 1. Avg: Is taken average after discarding lowest 5 and highest 5 readings.
> 2. With 4 client I have only 30 reading.
>
> Summary:
>
> I think if we check avg or median atleast at 1 or 2 client head always
> looks winner, but same is not the case with 4 clients. And with 4 clients I
> can see much more fluctuation in the reading.
>
>
> Head(1 Client) Patch (1 Client)
> Head (2 Client) Patch (2 Client)
> Head (4 Client) Patch (4 Client)
> Avg: 19628 19578
> 37180 36536
> 70044 70731
> Median: 19663 19581
> 37967 37484
> 73003 75376
> Below is all the reading. (Arranged in sorted order)
>
> *Runs* *Head (1 Client)* *Patch (1 Client)*
> *Head (2 Client)* *Patch (2 Client)*
> *Head (4 Client)* *Patch (4 Client)*
> 1 18191 18153
> 29454 26128
> 49931 47210
> 2 18365 18768
> 31218 26956
> 53358 47287
> 3 19053 18769
> 31808 29286
> 53575 55458
> 4 19128 18915
> 31860 30558
> 54282 55794
> 5 19160 18948
> 32411 30945
> 56629 56253
> 6 19177 19055
> 32453 31316
> 57595 58147
> 7 19351 19232
> 32571 31703
> 59490 58543
> 8 19353 19239
> 33294 32029
> 63887 58990
> 9 19361 19255
> 33936 32217
> 64718 60233
> 10 19390 19297
> 34361 32767
> 65737 68210
> 11 19452 19368
> 34563 34487
> 65881 71409
> 12 19478 19387
> 36110 34907
> 67151 72108
> 13 19488 19388
> 36221 34936
> 70974 73977
> 14 19490 19395
> 36461 35068
> 72212 74891
> 15 19512 19406
> 36712 35298
> 73003 75376
> 16 19538 19450
> 37104 35492
> 74842 75468
> 17 19547 19487
> 37246 36648
> 75400 75515
> 18 19592 19521
> 37567 37263
> 75573 75626
> 19 19627 19536
> 37707 37430
> 75798 75745
> 20 19661 19556
> 37958 37461
> 75834 75868
> 21 19666 19607
> 37976 37507
> 76240 76161
> 22 19701 19624
> 38060 37557
> 76426 76162
> 23 19708 19643
> 38244 37717
> 76770 76333
> 24 19748 19684
> 38272 38285
> 77011 77009
> 25 19751 19694
> 38467 38294
> 77114 77168
> 26 19776 19695
> 38524 38469
> 77630 77318
> 27 19781 19709
> 38625 38642
> 77865 77550
> 28 19786 19765
> 38756 38643
> 77912 77904
> 29 19796 19823
> 38939 38649
> 78242 78736
> 30 19826 19847
> 39139 38659
>
>
> 31 19837 19899
> 39208 38713
>
>
> 32 19849 19909
> 39211 38837
>
>
> 33 19854 19932
> 39230 38876
>
>
> 34 19867 19949
> 39249 39088
>
>
> 35 19891 19990
> 39259 39148
>
>
> 36 20038 20085
> 39286 39453
>
>
> 37 20083 20128
> 39435 39563
>
>
> 38 20143 20166
> 39448 39959
>
>
> 39 20191 20198
> 39475 40495
>
>
> 40 20437 20455
> 40375 40664
>
>
>
So, I think it's really some regression here on small number clients.  I
have following hypothesis about that.  In some cases we use more atomic
operations than before. For instace, in BufferAlloc we have following block
of code.  Previous code deals with that without atomic operations relying
on spinlock.  So, we have 2 extra atomic operations here, and similar
situation in other places.

pg_atomic_fetch_and_u32(>state, ~(BM_VALID | BM_DIRTY |
BM_JUST_DIRTIED | BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT |
BUF_USAGECOUNT_MASK)); if (relpersistence == RELPERSISTENCE_PERMANENT)
pg_atomic_fetch_or_u32(>state, BM_TAG_VALID | BM_PERMANENT |
BUF_USAGECOUNT_ONE); else pg_atomic_fetch_or_u32(>state, BM_TAG_VALID
| BUF_USAGECOUNT_ONE);
Actually, we behave like old code and do such modifications without
increasing number of atomic operations.  We can just calculate new value of
state (including unset of BM_LOCKED flag) and write it to the buffer.  In
this case we don't require more atomic operations.  However, it becomes not
safe to use atomic decrement in UnpinBuffer().  We can use there loop of
CAS which wait buffer header to be unlocked like PinBuffer() does.  I'm not
sure if this affects multicore scalability, but I think it worth trying.

So, this idea is implemented in attached patch.  Please, try it for both
regression on lower number of clients and scalability on large number of
clients.

Other changes in this patch:
1) PinBuffer() and UnpinBuffer() use exponential backoff in spindealy
like LockBufHdr() does.
2) Previous patch contained revert
of ac1d7945f866b1928c2554c0f80fd52d7f92.  This was not intentional.
No, it doesn't contains this revert.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-19 Thread Dilip Kumar

On Mon, Mar 14, 2016 at 3:09 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> I've drawn graphs for these measurements. The variation doesn't look
> random here.  TPS is going higher from measurement to measurement.  I bet
> you did measurements sequentially.
> I think we should do more measurements until TPS stop growing and beyond
> to see accumulate average statistics.  Could you please do the same tests
> but 50 runs.
>

I have taken reading at different client count (1,2 and 4)

1. Avg: Is taken average after discarding lowest 5 and highest 5 readings.
2. With 4 client I have only 30 reading.

Summary:

I think if we check avg or median atleast at 1 or 2 client head always
looks winner, but same is not the case with 4 clients. And with 4 clients I
can see much more fluctuation in the reading.


Head(1 Client) Patch (1 Client)
Head (2 Client) Patch (2 Client)
Head (4 Client) Patch (4 Client)
Avg: 19628 19578
37180 36536
70044 70731
Median: 19663 19581
37967 37484
73003 75376
Below is all the reading. (Arranged in sorted order)

*Runs* *Head (1 Client)* *Patch (1 Client)*
*Head (2 Client)* *Patch (2 Client)*
*Head (4 Client)* *Patch (4 Client)*
1 18191 18153
29454 26128
49931 47210
2 18365 18768
31218 26956
53358 47287
3 19053 18769
31808 29286
53575 55458
4 19128 18915
31860 30558
54282 55794
5 19160 18948
32411 30945
56629 56253
6 19177 19055
32453 31316
57595 58147
7 19351 19232
32571 31703
59490 58543
8 19353 19239
33294 32029
63887 58990
9 19361 19255
33936 32217
64718 60233
10 19390 19297
34361 32767
65737 68210
11 19452 19368
34563 34487
65881 71409
12 19478 19387
36110 34907
67151 72108
13 19488 19388
36221 34936
70974 73977
14 19490 19395
36461 35068
72212 74891
15 19512 19406
36712 35298
73003 75376
16 19538 19450
37104 35492
74842 75468
17 19547 19487
37246 36648
75400 75515
18 19592 19521
37567 37263
75573 75626
19 19627 19536
37707 37430
75798 75745
20 19661 19556
37958 37461
75834 75868
21 19666 19607
37976 37507
76240 76161
22 19701 19624
38060 37557
76426 76162
23 19708 19643
38244 37717
76770 76333
24 19748 19684
38272 38285
77011 77009
25 19751 19694
38467 38294
77114 77168
26 19776 19695
38524 38469
77630 77318
27 19781 19709
38625 38642
77865 77550
28 19786 19765
38756 38643
77912 77904
29 19796 19823
38939 38649
78242 78736
30 19826 19847
39139 38659


31 19837 19899
39208 38713


32 19849 19909
39211 38837


33 19854 19932
39230 38876


34 19867 19949
39249 39088


35 19891 19990
39259 39148


36 20038 20085
39286 39453


37 20083 20128
39435 39563


38 20143 20166
39448 39959


39 20191 20198
39475 40495


40 20437 20455
40375 40664





-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-13 Thread Alexander Korotkov

On Fri, Mar 11, 2016 at 7:08 AM, Dilip Kumar  wrote:

>
> On Thu, Mar 10, 2016 at 8:26 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> I don't think we can rely on median that much if we have only 3 runs.
>> For 3 runs we can only apply Kornfeld method which claims that confidence
>> interval should be between lower and upper values.
>> Since confidence intervals for master and patched versions are
>> overlapping we can't conclude that expected TPS numbers are different.
>> Dilip, could you do more runs? 10, for example. Using such statistics we
>> would be able to conclude something.
>>
>
> Here is the reading for 10 runs
>
>
> Median Result
> ---
>
> Client   Base  Patch
> ---
> 1  1987319739
> 2  3865838276
> 4  6881262075
>
> Full Results of 10 runs...
>
>  Base
> -
>  Runs  1 Client2 Client  4 Client
> -
> 1194423486649023
> 2196053513951695
> 3197263710453899
> 4198353843955708
> 5198663863867473
> 6198803867970152
> 7199733872070920
> 8200483873771342
> 9200573881571403
> 10  203444142377953
> -
>
>
> Patch
> ---
> Runs  1 Client 2 Client  4 Client
> --
> 1188813016154928
> 2194153203159741
> 3195643502261060
> 4196273671261839
> 5196703765962011
> 6198083889462139
> 7198573908162983
> 8199103992375358
> 9201693993777481
> 10  201814000378462
> --
>

I've drawn graphs for these measurements. The variation doesn't look random
here.  TPS is going higher from measurement to measurement.  I bet you did
measurements sequentially.
I think we should do more measurements until TPS stop growing and beyond to
see accumulate average statistics.  Could you please do the same tests but
50 runs.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-10 Thread Dilip Kumar

On Thu, Mar 10, 2016 at 8:26 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> I don't think we can rely on median that much if we have only 3 runs.
> For 3 runs we can only apply Kornfeld method which claims that confidence
> interval should be between lower and upper values.
> Since confidence intervals for master and patched versions are overlapping
> we can't conclude that expected TPS numbers are different.
> Dilip, could you do more runs? 10, for example. Using such statistics we
> would be able to conclude something.
>

Here is the reading for 10 runs


Median Result
---

Client   Base  Patch
---
1  1987319739
2  3865838276
4  6881262075

Full Results of 10 runs...

 Base
-
 Runs  1 Client2 Client  4 Client
-
1194423486649023
2196053513951695
3197263710453899
4198353843955708
5198663863867473
6198803867970152
7199733872070920
8200483873771342
9200573881571403
10  203444142377953
-


Patch
---
Runs  1 Client 2 Client  4 Client
--
1188813016154928
2194153203159741
3195643502261060
4196273671261839
5196703765962011
6198083889462139
7198573908162983
8199103992375358
9201693993777481
10  201814000378462
--


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-10 Thread Alexander Korotkov

On Mon, Mar 7, 2016 at 6:19 PM, Robert Haas  wrote:

> On Sat, Mar 5, 2016 at 7:22 AM, Dilip Kumar  wrote:
> > On Wed, Mar 2, 2016 at 11:05 AM, Dilip Kumar 
> wrote:
> >> And this latest result (no regression) is on X86 but on my local
> machine.
> >>
> >> I did not exactly saw what this new version of patch is doing different,
> >> so I will test this version in other machines also and see the results.
> >
> >
> > I tested this on PPC again, This time in various order (sometime patch
> first
> > and then base first).
> >  I tested with latest patch pinunpin-cas-2.patch on Power8.
> >
> > Shared Buffer = 8GB
> > ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
> >
> > BASE
> > -
> > Clientsrun1run2run3
> > 1   212001875420537
> > 2   403313952038746
> >
> >
> > Patch
> > -
> > Clientsrun1run2run3
> > 1   202251980619778
> > 2   398304189836620
> >
> > I think, here we can not see any regression, (If I take median then it
> may
> > looks low with patch so posting all 3 reading).
>
> If the median looks low, how is that not a regression?
>

I don't think we can rely on median that much if we have only 3 runs.
For 3 runs we can only apply Kornfeld method which claims that confidence
interval should be between lower and upper values.
Since confidence intervals for master and patched versions are overlapping
we can't conclude that expected TPS numbers are different.
Dilip, could you do more runs? 10, for example. Using such statistics we
would be able to conclude something.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-07 Thread Robert Haas

On Sat, Mar 5, 2016 at 7:22 AM, Dilip Kumar  wrote:
> On Wed, Mar 2, 2016 at 11:05 AM, Dilip Kumar  wrote:
>> And this latest result (no regression) is on X86 but on my local machine.
>>
>> I did not exactly saw what this new version of patch is doing different,
>> so I will test this version in other machines also and see the results.
>
>
> I tested this on PPC again, This time in various order (sometime patch first
> and then base first).
>  I tested with latest patch pinunpin-cas-2.patch on Power8.
>
> Shared Buffer = 8GB
> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
>
> BASE
> -
> Clientsrun1run2run3
> 1   212001875420537
> 2   403313952038746
>
>
> Patch
> -
> Clientsrun1run2run3
> 1   202251980619778
> 2   398304189836620
>
> I think, here we can not see any regression, (If I take median then it may
> looks low with patch so posting all 3 reading).

If the median looks low, how is that not a regression?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

1 2 >

1 - 100 of 161 matches

Mail list logo