Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-09-01 Thread Amit Kapila
On Fri, Sep 1, 2017 at 9:17 PM, Robert Haas  wrote:
> On Fri, Sep 1, 2017 at 10:03 AM, Dilip Kumar  wrote:
>>> Sure will do so.  In the meantime, I have rebased the patch.
>>
>> I have repeated some of the tests we have performed earlier.
>

Thanks for repeating the performance tests.

> OK, these tests seem to show that this is still working.  Committed,
> again.  Let's hope this attempt goes better than the last one.
>

Thanks for committing.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-09-01 Thread Robert Haas
On Fri, Sep 1, 2017 at 10:03 AM, Dilip Kumar  wrote:
>> Sure will do so.  In the meantime, I have rebased the patch.
>
> I have repeated some of the tests we have performed earlier.

OK, these tests seem to show that this is still working.  Committed,
again.  Let's hope this attempt goes better than the last one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-09-01 Thread Dilip Kumar
On Wed, Aug 30, 2017 at 12:54 PM, Amit Kapila  wrote:
>
> That would have been better. In any case, will do the tests on some
> higher end machine and will share the results.
>
>> Given that we've changed the approach here somewhat, I think we need
>> to validate that we're still seeing a substantial reduction in
>> CLogControlLock contention on big machines.
>>
>
> Sure will do so.  In the meantime, I have rebased the patch.

I have repeated some of the tests we have performed earlier.

Machine:
Intel 8 socket machine with 128 core.

Configuration:

shared_buffers=8GB
checkpoint_timeout=40min
max_wal_size=20GB
max_connections=300
maintenance_work_mem=4GB
synchronous_commit=off
checkpoint_completion_target=0.9

I have taken one reading for each test to measure the wait events.
The observation is the same: at higher client counts there is a significant
reduction in the contention on CLogControlLock.

Benchmark:  Pgbench simple_update, 30 mins run:

Head: (64 client) : (TPS 60720)
  53808  Client  | ClientRead
  26147  IPC | ProcArrayGroupUpdate
   7866  LWLock  | CLogControlLock
   3705  Activity| LogicalLauncherMain
   3699  Activity| AutoVacuumMain
   3353  LWLock  | ProcArrayLock
   3099  LWLock  | wal_insert
   2825  Activity| BgWriterMain
   2688  Lock| extend
   1436  Activity| WalWriterMain

Patch: (64 client) : (TPS 67207)
 53235  Client  | ClientRead
  29470  IPC | ProcArrayGroupUpdate
   4302  LWLock  | wal_insert
   3717  Activity| LogicalLauncherMain
   3715  Activity| AutoVacuumMain
   3463  LWLock  | ProcArrayLock
   3140  Lock| extend
   2934  Activity| BgWriterMain
   1434  Activity| WalWriterMain
   1198  Activity| CheckpointerMain
   1073  LWLock  | XidGenLock
    869  IPC | ClogGroupUpdate

Head:(72 Client): (TPS 57856)

 55820  Client  | ClientRead
  34318  IPC | ProcArrayGroupUpdate
  15392  LWLock  | CLogControlLock
   3708  Activity| LogicalLauncherMain
   3705  Activity| AutoVacuumMain
   3436  LWLock  | ProcArrayLock

Patch:(72 Client): (TPS 65740)

  60356  Client  | ClientRead
  38545  IPC | ProcArrayGroupUpdate
   4573  LWLock  | wal_insert
   3708  Activity| LogicalLauncherMain
   3705  Activity| AutoVacuumMain
   3508  LWLock  | ProcArrayLock
   3492  Lock| extend
   2903  Activity| BgWriterMain
   1903  LWLock  | XidGenLock
   1383  Activity| WalWriterMain
   1212  Activity| CheckpointerMain
   1056  IPC | ClogGroupUpdate


Head:(96 Client): (TPS 52170)

  62841  LWLock  | CLogControlLock
  56150  IPC | ProcArrayGroupUpdate
  54761  Client  | ClientRead
   7037  LWLock  | wal_insert
   4077  Lock| extend
   3727  Activity| LogicalLauncherMain
   3727  Activity| AutoVacuumMain
   3027  LWLock  | ProcArrayLock

Patch:(96 Client): (TPS 67932)

  87378  IPC | ProcArrayGroupUpdate
  80201  Client  | ClientRead
  11511  LWLock  | wal_insert
   4102  Lock| extend
   3971  LWLock  | ProcArrayLock
   3731  Activity| LogicalLauncherMain
   3731  Activity| AutoVacuumMain
   2948  Activity| BgWriterMain
   1763  LWLock  | XidGenLock
   1736  IPC | ClogGroupUpdate

Head:(128 Client): (TPS 40820)

 182569  LWLock  | CLogControlLock
  61484  IPC | ProcArrayGroupUpdate
  37969  Client  | ClientRead
   5135  LWLock  | wal_insert
   3699  Activity| LogicalLauncherMain
   3699  Activity| AutoVacuumMain

Patch:(128 Client): (TPS 67054)

 174583  IPC | ProcArrayGroupUpdate
  66084  Client  | ClientRead
  16738  LWLock  | wal_insert
   4993  IPC | ClogGroupUpdate
   4893  LWLock  | ProcArrayLock
   4839  Lock| extend

Benchmark: select for update with 3 savepoints, 10 mins run

Script:
\set aid random (1,3000)
\set tid random (1,3000)

BEGIN;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s1;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
SAVEPOINT s2;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s3;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
END;

Head:(64 Client): (TPS 44577.1802)

  53808  Client  | ClientRead
  26147  IPC | ProcArrayGroupUpdate
   7866  LWLock  | CLogControlLock
   3705  Activity| LogicalLauncherMain
   3699  Activity| AutoVacuumMain
   3353  LWLock  | ProcArrayLock
   3099  LWLock  | wal_insert

Patch:(64 Client): (TPS 46156.245)

 53235  Client  | ClientRead
  

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-08-30 Thread Amit Kapila
On Wed, Aug 30, 2017 at 2:43 AM, Robert Haas  wrote:
> On Tue, Jul 4, 2017 at 12:33 AM, Amit Kapila  wrote:
>> I have updated the patch to support wait events and moved it to upcoming CF.
>
> This patch doesn't apply any more, but I made it apply with a hammer
> and then did a little benchmarking (scylla, EDB server, Intel Xeon
> E5-2695 v3 @ 2.30GHz, 2 sockets, 14 cores/socket, 2 threads/core).
> The results were not impressive.  There's basically no clog contention
> to remove, so the patch just doesn't really do anything.
>

Yeah, in such a case the patch won't help.

>  For example,
> here's a wait event profile with master and using Ashutosh's test
> script with 5 savepoints:
>
>   1  Lock| tuple
>   2  IO  | SLRUSync
>   5  LWLock  | wal_insert
>   5  LWLock  | XidGenLock
>   9  IO  | DataFileRead
>  12  LWLock  | lock_manager
>  16  IO  | SLRURead
>  20  LWLock  | CLogControlLock
>  97  LWLock  | buffer_content
> 216  Lock| transactionid
> 237  LWLock  | ProcArrayLock
>1238  IPC | ProcArrayGroupUpdate
>2266  Client  | ClientRead
>
> This is just a 5-minute test; maybe things would change if we ran it
> for longer, but if only 0.5% of the samples are blocked on
> CLogControlLock without the patch, obviously the patch can't help
> much.  I did some other experiments too, but I won't bother
> summarizing the results here because they're basically boring.  I
> guess I should have used a bigger machine.
>

That would have been better. In any case, will do the tests on some
higher end machine and will share the results.

> Given that we've changed the approach here somewhat, I think we need
> to validate that we're still seeing a substantial reduction in
> CLogControlLock contention on big machines.
>

Sure will do so.  In the meantime, I have rebased the patch.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


group_update_clog_v14.patch
Description: Binary data



Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-08-29 Thread Robert Haas
On Tue, Jul 4, 2017 at 12:33 AM, Amit Kapila  wrote:
> I have updated the patch to support wait events and moved it to upcoming CF.

This patch doesn't apply any more, but I made it apply with a hammer
and then did a little benchmarking (scylla, EDB server, Intel Xeon
E5-2695 v3 @ 2.30GHz, 2 sockets, 14 cores/socket, 2 threads/core).
The results were not impressive.  There's basically no clog contention
to remove, so the patch just doesn't really do anything.  For example,
here's a wait event profile with master and using Ashutosh's test
script with 5 savepoints:

  1  Lock| tuple
  2  IO  | SLRUSync
  5  LWLock  | wal_insert
  5  LWLock  | XidGenLock
  9  IO  | DataFileRead
 12  LWLock  | lock_manager
 16  IO  | SLRURead
 20  LWLock  | CLogControlLock
 97  LWLock  | buffer_content
216  Lock| transactionid
237  LWLock  | ProcArrayLock
   1238  IPC | ProcArrayGroupUpdate
   2266  Client  | ClientRead

This is just a 5-minute test; maybe things would change if we ran it
for longer, but if only 0.5% of the samples are blocked on
CLogControlLock without the patch, obviously the patch can't help
much.  I did some other experiments too, but I won't bother
summarizing the results here because they're basically boring.  I
guess I should have used a bigger machine.

Given that we've changed the approach here somewhat, I think we need
to validate that we're still seeing a substantial reduction in
CLogControlLock contention on big machines.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-07-03 Thread Amit Kapila
On Mon, Jul 3, 2017 at 6:15 PM, Amit Kapila  wrote:
> On Thu, Mar 23, 2017 at 1:18 PM, Ashutosh Sharma 
> wrote:
>>
>> Conclusion:
>> As seen from the test results mentioned above, there is some performance
>> improvement with 3 SP(s); with 5 SP(s) the results with the patch are slightly
>> better than HEAD; with 7 and 10 SP(s) we do see a regression with the patch.
>> Therefore, I think the threshold value of 4 for the number of subtransactions
>> considered in the patch looks fine to me.
>>
>
> Thanks for the tests.  Attached, find the patch rebased on HEAD.  I have run
> the latest pgindent on the patch.  I have yet to add a wait event for group
> lock waits in this patch, as was done by Robert in commit
> d4116a771925379c33cf4c6634ca620ed08b551d for ProcArrayGroupUpdate.
>

I have updated the patch to support wait events and moved it to upcoming CF.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


group_update_clog_v13.patch
Description: Binary data



Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-07-03 Thread Amit Kapila
On Thu, Mar 23, 2017 at 1:18 PM, Ashutosh Sharma 
wrote:
>
> Conclusion:
> As seen from the test results mentioned above, there is some performance
> improvement with 3 SP(s); with 5 SP(s) the results with the patch are slightly
> better than HEAD; with 7 and 10 SP(s) we do see a regression with the patch.
> Therefore, I think the threshold value of 4 for the number of subtransactions
> considered in the patch looks fine to me.
>
>
Thanks for the tests.  Attached, find the patch rebased on HEAD.  I have run
the latest pgindent on the patch.  I have yet to add a wait event for group
lock waits in this patch, as was done by Robert in commit
d4116a771925379c33cf4c6634ca620ed08b551d for ProcArrayGroupUpdate.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


group_update_clog_v12.patch
Description: Binary data



Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-04-07 Thread Robert Haas
On Thu, Mar 9, 2017 at 5:49 PM, Robert Haas  wrote:
> However, I just realized that in
> both this case and in the case of group XID clearing, we weren't
> advertising a wait event for the PGSemaphoreLock calls that are part
> of the group locking machinery.  I think we should fix that, because a
> quick test shows that can happen fairly often -- not, I think, as
> often as we would have seen LWLock waits without these patches, but
> often enough that you'll want to know.  Patch attached.

I've pushed the portion of this that relates to ProcArrayLock.  (I
know this hasn't been discussed much, but there doesn't really seem to
be any reason for anybody to object, and looking at just the
LWLock/ProcArrayLock wait events gives a highly misleading answer.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-23 Thread Ashutosh Sharma
Hi All,

I have tried to test 'group_update_clog_v11.1.patch', shared upthread by
Amit, on a high-end machine. I have tested the patch with various numbers of
savepoints in my test script. The machine details, along with the test script
and the test results, are shown below.

Machine details:

24 sockets, 192 CPU(s)
RAM - 500GB

test script:


\set aid random (1,3000)
\set tid random (1,3000)

BEGIN;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s1;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
SAVEPOINT s2;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s3;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
SAVEPOINT s4;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s5;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
END;

Non-default parameters
==
max_connections = 200
shared_buffers=8GB
min_wal_size=10GB
max_wal_size=15GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
checkpoint_timeout=900
synchronous_commit=off


pgbench -M prepared -c $thread -j $thread -T $time_for_reading postgres -f
~/test_script.sql

where, time_for_reading = 10 mins

Test Results:
=

With 3 savepoints
=

CLIENT COUNT   TPS (HEAD)   TPS (PATCH)   % IMPROVEMENT
128            50275        53704         6.82048732
64             62860        66561         5.887686923
8              18464        18752         1.559792028

With 5 savepoints
=

CLIENT COUNT   TPS (HEAD)   TPS (PATCH)   % IMPROVEMENT
128            46559        47715         2.482871196
64             52306        52082         -0.4282491492
8              12289        12852         4.581332899


With 7 savepoints
=

CLIENT COUNT   TPS (HEAD)   TPS (PATCH)   % IMPROVEMENT
128            41367        41500         0.3215123166
64             42996        41473         -3.542189971
8              9665         9657          -0.0827728919

With 10 savepoints
==

CLIENT COUNT   TPS (HEAD)   TPS (PATCH)   % IMPROVEMENT
128            34513        34597         0.24338655
64             32581        32035         -1.67582
8              7293         7622          4.511175099

Conclusion:
As seen from the test results mentioned above, there is some performance
improvement with 3 SP(s); with 5 SP(s) the results with the patch are slightly
better than HEAD; with 7 and 10 SP(s) we do see a regression with the patch.
Therefore, I think the threshold value of 4 for the number of subtransactions
considered in the patch looks fine to me.


--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Tue, Mar 21, 2017 at 6:19 PM, Amit Kapila 
wrote:

> On Mon, Mar 20, 2017 at 8:27 AM, Robert Haas 
> wrote:
> > On Fri, Mar 17, 2017 at 2:30 AM, Amit Kapila 
> wrote:
> >>> I was wondering about doing an explicit test: if the XID being
> >>> committed matches the one in the PGPROC, and nsubxids matches, and the
> >>> actual list of XIDs matches, then apply the optimization.  That could
> >>> replace the logic that you've proposed to exclude non-commit cases,
> >>> gxact cases, etc. and it seems fundamentally safer.  But it might be a
> >>> more expensive test, too, so I'm not sure.
> >>
> >> I think if the number of subxids is very small let us say under 5 or
> >> so, then such a check might not matter, but otherwise it could be
> >> expensive.
> >
> > We could find out by testing it.  We could also restrict the
> > optimization to cases with just a few subxids, because if you've got a
> > large number of subxids this optimization probably isn't buying much
> > anyway.
> >
>
> Yes, and I have modified the patch to compare xids and subxids for the
> group update.  In the initial short tests (with a few client counts), it
> seems that up to 3 savepoints we win, and from 10 savepoints onwards
> there is some regression, or at the very least there doesn't appear to
> be any benefit.  We need more tests to identify what the safe number is,
> but I thought it better to share the patch to see if we agree on the
> changes, because if not, the whole testing needs to be repeated.  Let me
> know what you think about the attached.
>
>
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>
>
>


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-21 Thread Amit Kapila
On Mon, Mar 20, 2017 at 8:27 AM, Robert Haas  wrote:
> On Fri, Mar 17, 2017 at 2:30 AM, Amit Kapila  wrote:
>>> I was wondering about doing an explicit test: if the XID being
>>> committed matches the one in the PGPROC, and nsubxids matches, and the
>>> actual list of XIDs matches, then apply the optimization.  That could
>>> replace the logic that you've proposed to exclude non-commit cases,
>>> gxact cases, etc. and it seems fundamentally safer.  But it might be a
>>> more expensive test, too, so I'm not sure.
>>
>> I think if the number of subxids is very small let us say under 5 or
>> so, then such a check might not matter, but otherwise it could be
>> expensive.
>
> We could find out by testing it.  We could also restrict the
> optimization to cases with just a few subxids, because if you've got a
> large number of subxids this optimization probably isn't buying much
> anyway.
>

Yes, and I have modified the patch to compare xids and subxids for the
group update.  In the initial short tests (with a few client counts), it
seems that up to 3 savepoints we win, and from 10 savepoints onwards
there is some regression, or at the very least there doesn't appear to
be any benefit.  We need more tests to identify what the safe number is,
but I thought it better to share the patch to see if we agree on the
changes, because if not, the whole testing needs to be repeated.  Let me
know what you think about the attached.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


group_update_clog_v11.patch
Description: Binary data



Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-19 Thread Robert Haas
On Fri, Mar 17, 2017 at 2:30 AM, Amit Kapila  wrote:
>> I was wondering about doing an explicit test: if the XID being
>> committed matches the one in the PGPROC, and nsubxids matches, and the
>> actual list of XIDs matches, then apply the optimization.  That could
>> replace the logic that you've proposed to exclude non-commit cases,
>> gxact cases, etc. and it seems fundamentally safer.  But it might be a
>> more expensive test, too, so I'm not sure.
>
> I think if the number of subxids is very small let us say under 5 or
> so, then such a check might not matter, but otherwise it could be
> expensive.

We could find out by testing it.  We could also restrict the
optimization to cases with just a few subxids, because if you've got a
large number of subxids this optimization probably isn't buying much
anyway.  We're trying to avoid grabbing CLogControlLock to do a very
small amount of work, but if you've got 10 or 20 subxids we're doing
as much work anyway as the group update optimization is attempting to
put into one batch.

> So we have four ways to proceed:
> 1. Have this optimization for subtransactions and make it safe by
> having some additional conditions like check for recovery, explicit
> check for if the actual transaction ids match with ids stored in proc.
> 2. Have this optimization when there are no subtransactions. In this
> case, we can have a very simple check for this optimization.
> 3. Drop this patch and idea.
> 4. Consider it for next version.
>
> I personally think the second way is okay for this release, as that looks
> safe and gets us the maximum benefit we can achieve from this
> optimization; we can then consider adding the optimization for
> subtransactions (the first way) in a future version if we think it is
> safe and gives us a benefit.
>
> Thoughts?

I don't like #2 very much.  Restricting it to a relatively small
number of transactions - whatever we can show doesn't hurt performance
- seems OK, but restricting it to the exactly-zero-subtransactions
case seems poor.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-17 Thread Amit Kapila
On Sun, Mar 12, 2017 at 8:11 AM, Robert Haas  wrote:
> On Fri, Mar 10, 2017 at 7:39 PM, Amit Kapila  wrote:
>> I agree that more analysis can help us to decide if we can use subxids
>> from PGPROC and, if so, under what conditions.  Have you considered the
>> other patch I have posted to fix the issue, which is to do this
>> optimization only when subxids are not present?  That patch removes the
>> dependency on subxids in PGPROC.
>
> Well, that's an option, but it narrows the scope of the optimization
> quite a bit.  I think Simon previously opposed handling only the
> no-subxid cases (although I may be misremembering) and I'm not that
> keen about it either.
>
> I was wondering about doing an explicit test: if the XID being
> committed matches the one in the PGPROC, and nsubxids matches, and the
> actual list of XIDs matches, then apply the optimization.  That could
> replace the logic that you've proposed to exclude non-commit cases,
> gxact cases, etc. and it seems fundamentally safer.  But it might be a
> more expensive test, too, so I'm not sure.
>

I think if the number of subxids is very small let us say under 5 or
so, then such a check might not matter, but otherwise it could be
expensive.

> It would be nice to get some other opinions on how (and whether) to
> proceed with this.  I'm feeling really nervous about this right at the
> moment, because it seems like everybody including me missed some
> fairly critical points relating to the safety (or lack thereof) of
> this patch, and I want to make sure that if it gets committed again,
> we've really got everything nailed down tight.
>

I think the basic thing missing in the last patch is that we can't
apply this optimization during WAL replay, because during
recovery/hot standby the xids/subxids are tracked in KnownAssignedXids.
The same is mentioned in the header comments in procarray.c and in
GetSnapshotData (look at the else branch of the check if
(!snapshot->takenDuringRecovery)).  As far as I can see, the patch
considered this in the initial versions, but the check got dropped
in one of the later revisions by mistake.  Patch version-5 [1] has
the check for recovery, but during some code rearrangement it got
dropped in version-6 [2].  Having said that, I think the improvement
in the case where there are subtransactions will be smaller, because
having subtransactions means more work under the LWLock and therefore
fewer context switches.  This optimization is all about reducing
frequent context switches, so even if we don't optimize the case with
subtransactions we are not leaving much on the table, and it will make
this optimization much safer.  To substantiate this theory with data,
see the difference in performance when subtransactions are used [3]
and when they are not used [4].
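
To make the recovery point concrete, here is a minimal sketch of the kind of
guard that was present in version-5 and later lost; the surrounding condition
is taken from the patch quoted elsewhere in this thread, and InRecovery is
just one possible way to detect the replay case, so treat this as an
illustration rather than the fix patch itself:

if (all_xact_same_page &&
    nsubxids < PGPROC_MAX_CACHED_SUBXIDS &&
    !InRecovery &&              /* group update relies on backend-local
                                 * PGPROC/PGXACT state; during replay the
                                 * subxids live in KnownAssignedXids */
    !IsGXactActive())
{
    /* ... attempt LWLockConditionalAcquire / group update as in the patch ... */
}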

So we have four ways to proceed:
1. Have this optimization for subtransactions and make it safe by
having some additional conditions like check for recovery, explicit
check for if the actual transaction ids match with ids stored in proc.
2. Have this optimization when there are no subtransactions. In this
case, we can have a very simple check for this optimization.
3. Drop this patch and idea.
4. Consider it for next version.

I personally think the second way is okay for this release, as that looks
safe and gets us the maximum benefit we can achieve from this
optimization; we can then consider adding the optimization for
subtransactions (the first way) in a future version if we think it is
safe and gives us a benefit.

Thoughts?

[1] - 
https://www.postgresql.org/message-id/CAA4eK1KUVPxBcGTdOuKyvf5p1sQ0HeUbSMbTxtQc%3DP65OxiZog%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAA4eK1L4iV-2qe7AyMVsb%2Bnz7SiX8JvCO%2BCqhXwaiXgm3CaBUw%40mail.gmail.com
[3] - 
https://www.postgresql.org/message-id/CAFiTN-u3%3DXUi7z8dTOgxZ98E7gL1tzL%3Dq9Yd%3DCwWCtTtS6pOZw%40mail.gmail.com
[4] - 
https://www.postgresql.org/message-id/CAFiTN-u-XEzhd%3DhNGW586fmQwdTy6Qy6_SXe09tNB%3DgBcVzZ_A%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-11 Thread Robert Haas
On Fri, Mar 10, 2017 at 7:39 PM, Amit Kapila  wrote:
> I agree that more analysis can help us to decide if we can use subxids
> from PGPROC and, if so, under what conditions.  Have you considered the
> other patch I have posted to fix the issue, which is to do this
> optimization only when subxids are not present?  That patch removes the
> dependency on subxids in PGPROC.

Well, that's an option, but it narrows the scope of the optimization
quite a bit.  I think Simon previously opposed handling only the
no-subxid cases (although I may be misremembering) and I'm not that
keen about it either.

I was wondering about doing an explicit test: if the XID being
committed matches the one in the PGPROC, and nsubxids matches, and the
actual list of XIDs matches, then apply the optimization.  That could
replace the logic that you've proposed to exclude non-commit cases,
gxact cases, etc. and it seems fundamentally safer.  But it might be a
more expensive test, too, so I'm not sure.
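
For illustration, the explicit test described above might look roughly like
the sketch below; the helper name is invented here, and the fields
(MyPgXact->xid, MyPgXact->nxids, MyProc->subxids.xids) are the ones the
assertions later in this thread rely on, so this is only a sketch of the
shape of the check, not a proposal:

static bool
GroupUpdateMatchesProc(TransactionId xid, int nsubxids, TransactionId *subxids)
{
    return TransactionIdEquals(xid, MyPgXact->xid) &&
           nsubxids == MyPgXact->nxids &&
           memcmp(subxids, MyProc->subxids.xids,
                  nsubxids * sizeof(TransactionId)) == 0;
}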

It would be nice to get some other opinions on how (and whether) to
proceed with this.  I'm feeling really nervous about this right at the
moment, because it seems like everybody including me missed some
fairly critical points relating to the safety (or lack thereof) of
this patch, and I want to make sure that if it gets committed again,
we've really got everything nailed down tight.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-10 Thread Amit Kapila
On Sat, Mar 11, 2017 at 2:10 AM, Robert Haas  wrote:
> On Fri, Mar 10, 2017 at 6:25 AM, Amit Kapila  wrote:
>> On Fri, Mar 10, 2017 at 11:51 AM, Tom Lane  wrote:
>>> Amit Kapila  writes:
>>>> Just to let you know that I think I have figured out the reason of
>>>> failure.  If we run the regressions with attached patch, it will make
>>>> the regression tests fail consistently in same way.  The patch just
>>>> makes all transaction status updates to go via group clog update
>>>> mechanism.
>>>
>>> This does *not* give me a warm fuzzy feeling that this patch was
>>> ready to commit.  Or even that it was tested to the claimed degree.
>>>
>>
>> I think this is more of an implementation detail that I missed.  We
>> have done quite a lot of performance/stress testing with different
>> numbers of savepoints, but this could have been caught only by having
>> a Rollback to Savepoint followed by a commit.  I agree that we could
>> have devised some simple way (like the one I shared above) to run the
>> wide range of existing tests with this new mechanism earlier.  This is
>> a lesson learned, and I will try to be more cautious about such
>> things in the future.
>
> After some study, I don't feel confident that it's this simple.  The
> underlying issue here is that TransactionGroupUpdateXidStatus thinks
> it can assume that proc->clogGroupMemberXid, pgxact->nxids, and
> proc->subxids.xids match the values that were passed to
> TransactionIdSetPageStatus, but that's not checked anywhere.  For
> example, I thought about adding these assertions:
>
>Assert(nsubxids == MyPgXact->nxids);
>Assert(memcmp(subxids, MyProc->subxids.xids,
>   nsubxids * sizeof(TransactionId)) == 0);
>
> There's not even a comment in the patch anywhere that notes that we're
> assuming this, let alone anything that checks that it's actually true,
> which seems worrying.
>
> One thing that seems off is that we have this new field
> clogGroupMemberXid, which we use to determine the XID that is being
> committed, but for the subxids we think it's going to be true in every
> case.   Well, that seems a bit odd, right?  I mean, if the contents of
> the PGXACT are a valid way to figure out the subxids that we need to
> worry about, then why not also use it to get the toplevel XID?
>
> Another point that's kind of bothering me is that this whole approach
> now seems to me to be an abstraction violation.  It relies on the set
> of subxids for which we're setting status in clog matching the set of
> subxids advertised in PGPROC.  But actually there's a fair amount of
> separation between those things.  What's getting passed down to clog
> is coming from xact.c's transaction state stack, which is completely
> separate from the procarray.  Now after going over the logic in some
> detail, it does look to me that you're correct that in the case of a
> toplevel commit they will always match, but in some sense that looks
> accidental.
>
> For example, look at this code from RecordTransactionAbort:
>
> /*
>  * If we're aborting a subtransaction, we can immediately remove failed
>  * XIDs from PGPROC's cache of running child XIDs.  We do that here for
>  * subxacts, because we already have the child XID array at hand.  For
>  * main xacts, the equivalent happens just after this function returns.
>  */
> if (isSubXact)
> XidCacheRemoveRunningXids(xid, nchildren, children, latestXid);
>
> That code paints the removal of the aborted subxids from our PGPROC as
> an optimization, not a requirement for correctness.  And without this
> patch, that's correct: the XIDs are advertised in PGPROC so that we
> construct correct snapshots, but they only need to be present there
> for so long as there is a possibility that those XIDs might in the
> future commit.  Once they've aborted, it's not *necessary* for them to
> appear in PGPROC any more, but it doesn't hurt anything if they do.
> However, with this patch, removing them from PGPROC becomes a hard
> requirement, because otherwise the set of XIDs that are running
> according to the transaction state stack and the set that are running
> according to the PGPROC might be different.  Yet, neither the original
> patch nor your proposed fix patch updated any of the comments here.
>

There is a comment in the existing code (proc.h) which states that it
will contain only non-aborted subtransactions.  I agree that having this
explicitly mentioned in the patch would have been much better.

/*
 * Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
 * for non-aborted subtransactions of its current top transaction.  These
 * have to be treated as running XIDs by other backends.




> One might wonder whether it's even wise to tie these things together
> too closely.  For example, you can imagine a future patch for
> autonomous transactions stashing their XIDs in the subxids array.
> That'd be fine for snapshot purposes, but it would break this.

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-10 Thread Alvaro Herrera
Robert Haas wrote:

> The smoking gun was in 009_twophase_slave.log:
> 
> TRAP: FailedAssertion("!(nsubxids == MyPgXact->nxids)", File:
> "clog.c", Line: 288)
> 
> ...and then the node shuts down, which is why this hangs forever.
> (Also... what's up with it hanging forever instead of timing out or
> failing or something?)

This bit me while messing with 2PC tests recently.  I think it'd be
worth doing something about this, such as causing the test to die if we
request a server to (re)start and it doesn't start or it immediately
crashes.  This doesn't solve the problem of a server crashing at a point
not immediately after start, though.

(It'd be very annoying to have to sprinkle the Perl test code with
"assert $server->islive", but perhaps we can add assertions of some kind
in PostgresNode itself).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-10 Thread Robert Haas
On Fri, Mar 10, 2017 at 3:40 PM, Robert Haas  wrote:
> Finally, I had an unexplained hang during the TAP tests while testing
> out your fix patch.  I haven't been able to reproduce that so it
> might've just been an artifact of something stupid I did, or of some
> unrelated bug, but I think it's best to back up and reconsider a bit
> here.

I was able to reproduce this with the following patch:

diff --git a/src/backend/access/transam/clog.c
b/src/backend/access/transam/clog.c
index bff42dc..0546425 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -268,9 +268,11 @@ set_status_by_pages(int nsubxids, TransactionId *subxids,
  * has a race condition (see TransactionGroupUpdateXidStatus) but the
  * worst thing that happens if we mess up is a small loss of efficiency;
  * the intent is to avoid having the leader access pages it wouldn't
- * otherwise need to touch.  Finally, we skip it for prepared transactions,
- * which don't have the semaphore we would need for this optimization,
- * and which are anyway probably not all that common.
+ * otherwise need to touch.  We also skip it if the transaction status is
+ * other than commit, because for rollback and rollback to savepoint, the
+ * list of subxids won't be same as subxids array in PGPROC.  Finally, we skip
+ * it for prepared transactions, which don't have the semaphore we would need
+ * for this optimization, and which are anyway probably not all that common.
  */
 static void
 TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
@@ -280,15 +282,20 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
 {
     if (all_xact_same_page &&
         nsubxids < PGPROC_MAX_CACHED_SUBXIDS &&
+        status == TRANSACTION_STATUS_COMMITTED &&
         !IsGXactActive())
     {
+        Assert(nsubxids == MyPgXact->nxids);
+        Assert(memcmp(subxids, MyProc->subxids.xids,
+                      nsubxids * sizeof(TransactionId)) == 0);
+
         /*
          * If we can immediately acquire CLogControlLock, we update the status
          * of our own XID and release the lock.  If not, try use group XID
          * update.  If that doesn't work out, fall back to waiting for the
          * lock to perform an update for this transaction only.
          */
-        if (LWLockConditionalAcquire(CLogControlLock, LW_EXCLUSIVE))
+        if (false && LWLockConditionalAcquire(CLogControlLock, LW_EXCLUSIVE))
         {
             TransactionIdSetPageStatusInternal(xid, nsubxids,
                                                subxids, status, lsn, pageno);
             LWLockRelease(CLogControlLock);

make check-world hung here:

t/009_twophase.pl ..
1..13
ok 1 - Commit prepared transaction after restart
ok 2 - Rollback prepared transaction after restart

[rhaas pgsql]$ ps uxww | grep postgres
rhaas 72255   0.0  0.0  2447996   1684 s000  S+    3:40PM   0:00.00
/Users/rhaas/pgsql/tmp_install/Users/rhaas/install/dev/bin/psql -XAtq
-d port=64230 host=/var/folders/y8/r2ycj_jj2vd65v71rmyddpr4gn/T/ZVWy0JGbuw
dbname='postgres' -f - -v ON_ERROR_STOP=1
rhaas 72253   0.0  0.0  2478532   1548   ??  Ss    3:40PM   0:00.00
postgres: bgworker: logical replication launcher
rhaas 72252   0.0  0.0  2483132    740   ??  Ss    3:40PM   0:00.05
postgres: stats collector process
rhaas 72251   0.0  0.0  2486724   1952   ??  Ss    3:40PM   0:00.02
postgres: autovacuum launcher process
rhaas 72250   0.0  0.0  2477508    880   ??  Ss    3:40PM   0:00.03
postgres: wal writer process
rhaas 72249   0.0  0.0  2477508    972   ??  Ss    3:40PM   0:00.03
postgres: writer process
rhaas 72248   0.0  0.0  2477508   1252   ??  Ss    3:40PM   0:00.00
postgres: checkpointer process
rhaas 72246   0.0  0.0  2481604   5076 s000  S+    3:40PM   0:00.03
/Users/rhaas/pgsql/tmp_install/Users/rhaas/install/dev/bin/postgres -D
/Users/rhaas/pgsql/src/test/recovery/tmp_check/data_master_Ylq1/pgdata
rhaas 72337   0.0  0.0  2433796    688 s002  S+    4:14PM   0:00.00
grep postgres
rhaas 72256   0.0  0.0  2478920   2984   ??  Ss    3:40PM   0:00.00
postgres: rhaas postgres [local] COMMIT PREPARED waiting for 0/301D0D0

Backtrace of PID 72256:

#0  0x7fff8ecc85c2 in poll ()
#1  0x0001078eb727 in WaitEventSetWaitBlock [inlined] () at
/Users/rhaas/pgsql/src/backend/storage/ipc/latch.c:1118
#2  0x0001078eb727 in WaitEventSetWait (set=0x7fab3c8366c8,
timeout=-1, occurred_events=0x7fff585e5410, nevents=1,
wait_event_info=)
at latch.c:949
#3  0x0001078eb409 in WaitLatchOrSocket (latch=, wakeEvents=, sock=-1, timeout=,
wait_event_info=134217741) at latch.c:349
#4  0x0001078cf077 in SyncRepWaitForLSN (lsn=, commit=) at syncrep.c:284
#5  0x0001076a2dab in FinishPreparedTransaction (gid=, isCommit=1 '\001') at
twophase.c:2110
#6  0x000107919420 in standard_ProcessUtility (pstmt=, queryString=,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x7fab3c853cf8,
completionTag=)
at utility.c:452
#7  0x0001079186f3 in PortalRunUtility (portal=0x7fab3c874a40,

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-10 Thread Robert Haas
On Fri, Mar 10, 2017 at 6:25 AM, Amit Kapila  wrote:
> On Fri, Mar 10, 2017 at 11:51 AM, Tom Lane  wrote:
>> Amit Kapila  writes:
>>> Just to let you know that I think I have figured out the reason of
>>> failure.  If we run the regressions with attached patch, it will make
>>> the regression tests fail consistently in same way.  The patch just
>>> makes all transaction status updates to go via group clog update
>>> mechanism.
>>
>> This does *not* give me a warm fuzzy feeling that this patch was
>> ready to commit.  Or even that it was tested to the claimed degree.
>>
>
> I think this is more of an implementation detail that I missed.  We
> have done quite a lot of performance/stress testing with different
> numbers of savepoints, but this could have been caught only by having
> a Rollback to Savepoint followed by a commit.  I agree that we could
> have devised some simple way (like the one I shared above) to run the
> wide range of existing tests with this new mechanism earlier.  This is
> a lesson learned, and I will try to be more cautious about such
> things in the future.

After some study, I don't feel confident that it's this simple.  The
underlying issue here is that TransactionGroupUpdateXidStatus thinks
it can assume that proc->clogGroupMemberXid, pgxact->nxids, and
proc->subxids.xids match the values that were passed to
TransactionIdSetPageStatus, but that's not checked anywhere.  For
example, I thought about adding these assertions:

   Assert(nsubxids == MyPgXact->nxids);
   Assert(memcmp(subxids, MyProc->subxids.xids,
  nsubxids * sizeof(TransactionId)) == 0);

There's not even a comment in the patch anywhere that notes that we're
assuming this, let alone anything that checks that it's actually true,
which seems worrying.

One thing that seems off is that we have this new field
clogGroupMemberXid, which we use to determine the XID that is being
committed, but for the subxids we think it's going to be true in every
case.   Well, that seems a bit odd, right?  I mean, if the contents of
the PGXACT are a valid way to figure out the subxids that we need to
worry about, then why not also use it to get the toplevel XID?

Another point that's kind of bothering me is that this whole approach
now seems to me to be an abstraction violation.  It relies on the set
of subxids for which we're setting status in clog matching the set of
subxids advertised in PGPROC.  But actually there's a fair amount of
separation between those things.  What's getting passed down to clog
is coming from xact.c's transaction state stack, which is completely
separate from the procarray.  Now after going over the logic in some
detail, it does look to me that you're correct that in the case of a
toplevel commit they will always match, but in some sense that looks
accidental.

For example, look at this code from RecordTransactionAbort:

/*
 * If we're aborting a subtransaction, we can immediately remove failed
 * XIDs from PGPROC's cache of running child XIDs.  We do that here for
 * subxacts, because we already have the child XID array at hand.  For
 * main xacts, the equivalent happens just after this function returns.
 */
if (isSubXact)
XidCacheRemoveRunningXids(xid, nchildren, children, latestXid);

That code paints the removal of the aborted subxids from our PGPROC as
an optimization, not a requirement for correctness.  And without this
patch, that's correct: the XIDs are advertised in PGPROC so that we
construct correct snapshots, but they only need to be present there
for so long as there is a possibility that those XIDs might in the
future commit.  Once they've aborted, it's not *necessary* for them to
appear in PGPROC any more, but it doesn't hurt anything if they do.
However, with this patch, removing them from PGPROC becomes a hard
requirement, because otherwise the set of XIDs that are running
according to the transaction state stack and the set that are running
according to the PGPROC might be different.  Yet, neither the original
patch nor your proposed fix patch updated any of the comments here.

One might wonder whether it's even wise to tie these things together
too closely.  For example, you can imagine a future patch for
autonomous transactions stashing their XIDs in the subxids array.
That'd be fine for snapshot purposes, but it would break this.

Finally, I had an unexplained hang during the TAP tests while testing
out your fix patch.  I haven't been able to reproduce that so it
might've just been an artifact of something stupid I did, or of some
unrelated bug, but I think it's best to back up and reconsider a bit
here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-10 Thread Amit Kapila
On Fri, Mar 10, 2017 at 11:51 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>> Just to let you know that I think I have figured out the reason of
>> failure.  If we run the regressions with attached patch, it will make
>> the regression tests fail consistently in same way.  The patch just
>> makes all transaction status updates to go via group clog update
>> mechanism.
>
> This does *not* give me a warm fuzzy feeling that this patch was
> ready to commit.  Or even that it was tested to the claimed degree.
>

I think this is more of an implementation detail that I missed.  We
have done quite a lot of performance/stress testing with different
numbers of savepoints, but this could have been caught only by having
a Rollback to Savepoint followed by a commit.  I agree that we could
have devised some simple way (like the one I shared above) to run the
wide range of existing tests with this new mechanism earlier.  This is
a lesson learned, and I will try to be more cautious about such
things in the future.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-10 Thread Amit Kapila
On Fri, Mar 10, 2017 at 11:43 AM, Amit Kapila  wrote:
> On Fri, Mar 10, 2017 at 10:51 AM, Tom Lane  wrote:
>>
>> Also, I see clam reported in green just now, so it's not 100%
>> reproducible :-(
>>
>
> Just to let you know that I think I have figured out the reason of
> failure.  If we run the regressions with attached patch, it will make
> the regression tests fail consistently in same way.  The patch just
> makes all transaction status updates to go via group clog update
> mechanism.  Now, the reason of the problem is that the patch has
> relied on XidCache in PGPROC for subtransactions when they are not
> overflowed which is okay for Commits, but not for Rollback to
> Savepoint and Rollback.  For Rollback to Savepoint, we just pass the
> particular (sub)-transaction id to abort, but group mechanism will
> abort all the sub-transactions in that top transaction to Rollback.  I
> am still analysing what could be the best way to fix this issue.  I
> think there could be multiple ways to fix this problem.  One way is
> that we can advertise the fact that the status update for transaction
> involves subtransactions and then we can use xidcache for actually
> processing the status update.  The second is to advertise all the
> subtransaction ids for which the status needs to be updated, but I am sure
> that is not at all efficient, as it will consume a lot of memory.  The last
> resort could be that we don't use the group clog update optimization when
> the transaction has sub-transactions.
>

On further analysis, I don't think the first way mentioned above can
work for Rollback To Savepoint because it can pass just a subset of
sub-transactions, in which case we can never identify it by looking at
subxids in PGPROC unless we advertise all such subxids.  The case I am
talking is something like:

Begin;
Savepoint one;
Insert ...
Savepoint two
Insert ..
Savepoint three
Insert ...
Rollback to Savepoint two;

Now, for Rollback to Savepoint two, we pass transaction ids
corresponding to Savepoint three and two.

So, I think we can apply this optimization only for transactions that
commit, which will anyway be the most common use case.  Another
alternative, as mentioned above, is to do this optimization only when
there are no subtransactions involved.  Attached are two patches
implementing these two approaches (fix_clog_group_commit_opt_v1.patch
allows the optimization only for commits;
fix_clog_group_commit_opt_v2.patch allows the optimization for
transaction status updates that don't involve subxids).  I think the
first approach is the better way to deal with this; let me know your
thoughts.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_clog_group_commit_opt_v1.patch
Description: Binary data


fix_clog_group_commit_opt_v2.patch
Description: Binary data



Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-09 Thread Tom Lane
Amit Kapila  writes:
> Just to let you know that I think I have figured out the reason of
> failure.  If we run the regressions with attached patch, it will make
> the regression tests fail consistently in same way.  The patch just
> makes all transaction status updates to go via group clog update
> mechanism.

This does *not* give me a warm fuzzy feeling that this patch was
ready to commit.  Or even that it was tested to the claimed degree.

regards, tom lane




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-09 Thread Amit Kapila
On Fri, Mar 10, 2017 at 10:51 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Mar 9, 2017 at 9:17 PM, Tom Lane  wrote:
>>> Buildfarm thinks eight wasn't enough.
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clam&dt=2017-03-10%2002%3A00%3A01
>
>> At first I was confused how you knew that this was the fault of this
>> patch, but this seems like a pretty indicator:
>> TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status !=
>> 0x00) || curval == status)", File: "clog.c", Line: 574)
>
> Yeah, that's what led me to blame the clog-group-update patch.
>
>> I'm not sure whether it's related to this problem or not, but now that
>> I look at it, this (preexisting) comment looks like entirely wishful
>> thinking:
>>  * If we update more than one xid on this page while it is being written
>>  * out, we might find that some of the bits go to disk and others don't.
>>  * If we are updating commits on the page with the top-level xid that
>>  * could break atomicity, so we subcommit the subxids first before we mark
>>  * the top-level commit.
>
> Maybe, but that comment dates to 2008 according to git, and clam has
> been, er, happy as a clam up to now.  My money is on a newly-introduced
> memory-access-ordering bug.
>
> Also, I see clam reported in green just now, so it's not 100%
> reproducible :-(
>

Just to let you know that I think I have figured out the reason of
failure.  If we run the regressions with attached patch, it will make
the regression tests fail consistently in same way.  The patch just
makes all transaction status updates to go via group clog update
mechanism.  Now, the reason of the problem is that the patch has
relied on XidCache in PGPROC for subtransactions when they are not
overflowed which is okay for Commits, but not for Rollback to
Savepoint and Rollback.  For Rollback to Savepoint, we just pass the
particular (sub)-transaction id to abort, but group mechanism will
abort all the sub-transactions in that top transaction to Rollback.  I
am still analysing what could be the best way to fix this issue.  I
think there could be multiple ways to fix this problem.  One way is
that we can advertise the fact that the status update for transaction
involves subtransactions and then we can use xidcache for actually
processing the status update.  The second is to advertise all the
subtransaction ids for which the status needs to be updated, but I am sure
that is not at all efficient, as it will consume a lot of memory.  The last
resort could be that we don't use the group clog update optimization when
the transaction has sub-transactions.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


force_clog_group_commit_v1.patch
Description: Binary data



Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-09 Thread Tom Lane
Robert Haas  writes:
> On Thu, Mar 9, 2017 at 9:17 PM, Tom Lane  wrote:
>> Buildfarm thinks eight wasn't enough.
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clam&dt=2017-03-10%2002%3A00%3A01

> At first I was confused how you knew that this was the fault of this
> patch, but this seems like a pretty good indicator:
> TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status !=
> 0x00) || curval == status)", File: "clog.c", Line: 574)

Yeah, that's what led me to blame the clog-group-update patch.

> I'm not sure whether it's related to this problem or not, but now that
> I look at it, this (preexisting) comment looks like entirely wishful
> thinking:
>  * If we update more than one xid on this page while it is being written
>  * out, we might find that some of the bits go to disk and others don't.
>  * If we are updating commits on the page with the top-level xid that
>  * could break atomicity, so we subcommit the subxids first before we mark
>  * the top-level commit.

Maybe, but that comment dates to 2008 according to git, and clam has
been, er, happy as a clam up to now.  My money is on a newly-introduced
memory-access-ordering bug.

Also, I see clam reported in green just now, so it's not 100%
reproducible :-(

regards, tom lane




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 9:17 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I think eight is enough.  Committed with some cosmetic changes.
>
> Buildfarm thinks eight wasn't enough.
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clam&dt=2017-03-10%2002%3A00%3A01

At first I was confused how you knew that this was the fault of this
patch, but this seems like a pretty good indicator:

TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status !=
0x00) || curval == status)", File: "clog.c", Line: 574)

I'm not sure whether it's related to this problem or not, but now that
I look at it, this (preexisting) comment looks like entirely wishful
thinking:

 * If we update more than one xid on this page while it is being written
 * out, we might find that some of the bits go to disk and others don't.
 * If we are updating commits on the page with the top-level xid that
 * could break atomicity, so we subcommit the subxids first before we mark
 * the top-level commit.

The problem with that is the word "before".  There are no memory
barriers here, so there's zero guarantee that other processes see the
writes in the order they're performed here.  But it might be a stretch
to suppose that that would cause this symptom.
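
For what it's worth, a sketch of the kind of ordering fix that comment seems
to assume (not code from the tree; TransactionIdSetStatusBit is the static
helper in clog.c and pg_write_barrier() is the generic barrier macro):

    /* subcommit the subxids first ... */
    for (i = 0; i < nsubxids; i++)
        TransactionIdSetStatusBit(subxids[i],
                                  TRANSACTION_STATUS_SUB_COMMITTED,
                                  lsn, slotno);

    pg_write_barrier();         /* make the subcommit bits visible before
                                 * the top-level status can be seen */

    TransactionIdSetStatusBit(xid, status, lsn, slotno);

A matching read barrier would presumably be needed on the readers' side as
well, which is part of why "before" alone guarantees nothing.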

Maybe we should replace that Assert() with an elog() and dump out the
actual values.
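
A minimal sketch of that elog() idea, spelling out the 0x03/0x00 constants
and assuming the local variable names used in that part of clog.c
(illustrative only):

    if (!(curval == 0 ||
          (curval == TRANSACTION_STATUS_SUB_COMMITTED &&
           status != TRANSACTION_STATUS_IN_PROGRESS) ||
          curval == status))
        elog(PANIC, "unexpected clog transition: xid %u, curval %d, status %d",
             xid, (int) curval, (int) status);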

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-09 Thread Amit Kapila
On Fri, Mar 10, 2017 at 7:47 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> I think eight is enough.  Committed with some cosmetic changes.
>
> Buildfarm thinks eight wasn't enough.
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clam&dt=2017-03-10%2002%3A00%3A01
>

Will look into this.  I don't have access to that machine, but it looks
to be a POWER machine, and I have access to a somewhat similar one.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-09 Thread Tom Lane
Robert Haas  writes:
> I think eight is enough.  Committed with some cosmetic changes.

Buildfarm thinks eight wasn't enough.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clam&dt=2017-03-10%2002%3A00%3A01

regards, tom lane




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-09 Thread Robert Haas
On Tue, Jan 31, 2017 at 11:35 PM, Michael Paquier
 wrote:
>> Thanks for the review.
>
> Moved to CF 2017-03, the 8th commit fest of this patch.

I think eight is enough.  Committed with some cosmetic changes.

I think the turning point for this somewhat-troubled patch was when we
realized that, while results were somewhat mixed on whether it
improved performance, wait event monitoring showed that it definitely
reduced contention significantly.  However, I just realized that in
both this case and in the case of group XID clearing, we weren't
advertising a wait event for the PGSemaphoreLock calls that are part
of the group locking machinery.  I think we should fix that, because a
quick test shows that can happen fairly often -- not, I think, as
often as we would have seen LWLock waits without these patches, but
often enough that you'll want to know.  Patch attached.
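
The attached patch isn't reproduced here, but the general shape of
advertising a wait event around the group-member sleep (as was done for
ProcArrayGroupUpdate in commit d4116a771925379c33cf4c6634ca620ed08b551d,
mentioned elsewhere in this thread) is roughly the sketch below; the
clogGroupMember field and the WAIT_EVENT_CLOG_GROUP_UPDATE identifier are
assumptions based on the IPC wait event "ClogGroupUpdate" that shows up in
the profiles elsewhere in this thread:

    pgstat_report_wait_start(WAIT_EVENT_CLOG_GROUP_UPDATE);
    for (;;)
    {
        /* the semaphore acts as a read barrier */
        PGSemaphoreLock(MyProc->sem);
        if (!MyProc->clogGroupMember)
            break;
        extraWaits++;
    }
    pgstat_report_wait_end();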

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


group-update-waits-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-01-31 Thread Michael Paquier
On Tue, Jan 17, 2017 at 9:18 PM, Amit Kapila  wrote:
> On Tue, Jan 17, 2017 at 11:39 AM, Dilip Kumar  wrote:
>> On Wed, Jan 11, 2017 at 10:55 AM, Dilip Kumar  wrote:
>>> I have reviewed the latest patch and I don't have any more comments.
>>> So if there is no objection from other reviewers I can move it to
>>> "Ready For Committer"?
>>
>> Seeing no objections, I have moved it to Ready For Committer.
>>
>
> Thanks for the review.

Moved to CF 2017-03, the 8th commit fest of this patch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-01-17 Thread Amit Kapila
On Tue, Jan 17, 2017 at 11:39 AM, Dilip Kumar  wrote:
> On Wed, Jan 11, 2017 at 10:55 AM, Dilip Kumar  wrote:
>> I have reviewed the latest patch and I don't have any more comments.
>> So if there is no objection from other reviewers I can move it to
>> "Ready For Committer"?
>
> Seeing no objections, I have moved it to Ready For Committer.
>

Thanks for the review.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-01-16 Thread Dilip Kumar
On Wed, Jan 11, 2017 at 10:55 AM, Dilip Kumar  wrote:
> I have reviewed the latest patch and I don't have any more comments.
> So if there is no objection from other reviewers I can move it to
> "Ready For Committer"?

Seeing no objections, I have moved it to Ready For Committer.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-01-10 Thread Dilip Kumar
On Sat, Dec 31, 2016 at 9:01 AM, Amit Kapila  wrote:
> Agreed and changed accordingly.
>
>> 2. It seems that we have missed one unlock in case of absorbed
>> wakeups. You have initialised extraWaits with -1 and if there is one
>> extra wake up then extraWaits will become 0 (it means we have made one
>> extra call to PGSemaphoreLock and it's our responsibility to fix it as
>> the leader will Unlock only once). But it appears that in such a case
>> we will not make any call to PGSemaphoreUnlock.
>>
>
> Good catch!  I have fixed it by initialising extraWaits to 0.  This
> same issue exists for Group clear xid, for which I will send a patch
> separately.
>
> Apart from above, the patch needs to be adjusted for commit be7b2848
> which has changed the definition of PGSemaphore.

I have reviewed the latest patch and I don't have any more comments.
So if there is no objection from other reviewers I can move it to
"Ready For Committer"?


I have performed one more test, with 3000 scale factor because
previously I tested only up to 1000 scale factor. The purpose of this
test is to check whether there is any regression at higher scale
factor.

Machine: Intel 8 socket machine.
Scale Factor: 3000
Shared Buffer: 8GB
Test: Pgbench RW test.
Run: 30 mins median of 3

Other modified GUC:
-N 300 -c min_wal_size=15GB -c max_wal_size=20GB -c
checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9

Summary:
- Did not observe any regression.
- The performance gain is in sync with what we have observed with
other tests at lower scale factors.


Sync_Commit_Off:
client     Head      Patch

  8       10065      10009
 16       18487      18826
 32       28167      28057
 64       26655      28712
128       20152      24917
256       16740      22891

Sync_Commit_On:

Client     Head      Patch

  8         5102       5110
 16         8087       8282
 32        12523      12548
 64        14701      15112
128        14656      15238
256        13421      16424

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-12-30 Thread Amit Kapila
On Thu, Dec 29, 2016 at 10:41 AM, Dilip Kumar  wrote:
>
> I have done one more pass of the review today. I have few comments.
>
> + if (nextidx != INVALID_PGPROCNO)
> + {
> + /* Sleep until the leader updates our XID status. */
> + for (;;)
> + {
> + /* acts as a read barrier */
> + PGSemaphoreLock(&proc->sem);
> + if (!proc->clogGroupMember)
> + break;
> + extraWaits++;
> + }
> +
> + Assert(pg_atomic_read_u32(&proc->clogGroupNext) == INVALID_PGPROCNO);
> +
> + /* Fix semaphore count for any absorbed wakeups */
> + while (extraWaits-- > 0)
> + PGSemaphoreUnlock(&proc->sem);
> + return true;
> + }
>
> 1. extraWaits is used only locally in this block so I guess we can
> declare inside this block only.
>

Agreed and changed accordingly.

> 2. It seems that we have missed one unlock in case of absorbed
> wakeups. You have initialised extraWaits with -1 and if there is one
> extra wake up then extraWaits will become 0 (it means we have made one
> extra call to PGSemaphoreLock and it's our responsibility to fix it as
> the leader will Unlock only once). But it appears that in such a case
> we will not make any call to PGSemaphoreUnlock.
>

Good catch!  I have fixed it by initialising extraWaits to 0.  This
same issue exists for Group clear xid, for which I will send a patch
separately.

Apart from above, the patch needs to be adjusted for commit be7b2848
which has changed the definition of PGSemaphore.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


group_update_clog_v10.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-12-28 Thread Dilip Kumar
On Fri, Dec 23, 2016 at 8:28 AM, Amit Kapila  wrote:
> The results look positive.  Do you think we can conclude based on all
> the tests you and Dilip have done, that we can move forward with this
> patch (in particular group-update) or do you still want to do more
> tests?   I am aware that in one of the tests we have observed that
> reducing contention on CLOGControlLock has increased the contention on
> WALWriteLock, but I feel we can leave that point as a note to
> committer and let him take a final call.  From the code perspective,
> Robert and Andres have already taken one pass of review and I have
> addressed all their comments, so surely more review of the code can
> help, but I think that is not a big deal considering the patch size is
> relatively small.

I have done one more pass of the review today. I have few comments.

+ if (nextidx != INVALID_PGPROCNO)
+ {
+ /* Sleep until the leader updates our XID status. */
+ for (;;)
+ {
+ /* acts as a read barrier */
+ PGSemaphoreLock(&proc->sem);
+ if (!proc->clogGroupMember)
+ break;
+ extraWaits++;
+ }
+
+ Assert(pg_atomic_read_u32(&proc->clogGroupNext) == INVALID_PGPROCNO);
+
+ /* Fix semaphore count for any absorbed wakeups */
+ while (extraWaits-- > 0)
+ PGSemaphoreUnlock(&proc->sem);
+ return true;
+ }

1. extraWaits is used only locally in this block so I guess we can
declare inside this block only.

2. It seems that we have missed one unlock in case of absorbed
wakeups. You have initialised extraWaits with -1 and if there is one
extra wake up then extraWaits will become 0 (it means we have made one
extra call to PGSemaphoreLock and it's our responsibility to fix it as
the leader will Unlock only once). But it appears that in such a case
we will not make any call to PGSemaphoreUnlock. Am I missing something?
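(For concreteness, the accounting -- a sketch following the code quoted
above, not the actual v10 patch: with extraWaits starting at -1, one
absorbed wakeup leaves extraWaits at 0, so the fix-up loop below runs
zero times and that wakeup is never returned to the semaphore. Starting
at 0 makes the number of PGSemaphoreUnlock() calls match the number of
extra PGSemaphoreLock() calls absorbed in the sleep loop.)

    int extraWaits = 0;             /* not -1: one unlock per absorbed wakeup */

    for (;;)
    {
        /* acts as a read barrier */
        PGSemaphoreLock(&proc->sem);
        if (!proc->clogGroupMember)
            break;
        extraWaits++;               /* wakeup consumed that wasn't ours */
    }

    /* Fix semaphore count for any absorbed wakeups */
    while (extraWaits-- > 0)
        PGSemaphoreUnlock(&proc->sem);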



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-12-27 Thread Tomas Vondra

On 12/23/2016 03:58 AM, Amit Kapila wrote:

On Thu, Dec 22, 2016 at 6:59 PM, Tomas Vondra
 wrote:

Hi,

But as discussed with Amit in Tokyo at pgconf.asia, I got access to a
Power8e machine (IBM 8247-22L to be precise). It's a much smaller machine
compared to the x86 one, though - it only has 24 cores in 2 sockets, 128GB
of RAM and less powerful storage, for example.

I've repeated a subset of x86 tests and pushed them to

https://bitbucket.org/tvondra/power8-results-2

The new results are prefixed with "power-" and I've tried to put them right
next to the "same" x86 tests.

In all cases the patches significantly reduce the contention on
CLogControlLock, just like on x86. Which is good and expected.



The results look positive.  Do you think we can conclude based on all
the tests you and Dilip have done, that we can move forward with this
patch (in particular group-update) or do you still want to do more
tests?   I am aware that in one of the tests we have observed that
reducing contention on CLOGControlLock has increased the contention on
WALWriteLock, but I feel we can leave that point as a note to
committer and let him take a final call.  From the code perspective,
Robert and Andres have already taken one pass of review and I have
addressed all their comments, so surely more review of the code can
help, but I think that is not a big deal considering the patch size is
relatively small.



Yes, I believe that seems like a reasonable conclusion. I've done a few 
more tests on the Power machine with data placed on a tmpfs filesystem 
(to minimize all the I/O overhead), but the results are the same.


I don't think more testing is needed at this point, at least not with the
synthetic test cases we've been using for testing. The patch has already
received way more benchmarking than most other patches.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-12-22 Thread Amit Kapila
On Thu, Dec 22, 2016 at 6:59 PM, Tomas Vondra
 wrote:
> Hi,
>
> But as discussed with Amit in Tokyo at pgconf.asia, I got access to a
> Power8e machine (IBM 8247-22L to be precise). It's a much smaller machine
> compared to the x86 one, though - it only has 24 cores in 2 sockets, 128GB
> of RAM and less powerful storage, for example.
>
> I've repeated a subset of x86 tests and pushed them to
>
> https://bitbucket.org/tvondra/power8-results-2
>
> The new results are prefixed with "power-" and I've tried to put them right
> next to the "same" x86 tests.
>
> In all cases the patches significantly reduce the contention on
> CLogControlLock, just like on x86. Which is good and expected.
>

The results look positive.  Do you think we can conclude based on all
the tests you and Dilip have done, that we can move forward with this
patch (in particular group-update) or do you still want to do more
tests?   I am aware that in one of the tests we have observed that
reducing contention on CLOGControlLock has increased the contention on
WALWriteLock, but I feel we can leave that point as a note to
committer and let him take a final call.  From the code perspective,
Robert and Andres have already taken one pass of review and I have
addressed all their comments, so surely more review of the code can
help, but I think that is not a big deal considering the patch size is
relatively small.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-12-22 Thread Tomas Vondra

Hi,


The attached results show that:

(a) master shows the same zig-zag behavior - No idea why this wasn't
observed on the previous runs.

(b) group_update actually seems to improve the situation, because the
performance keeps stable up to 72 clients, while on master the
fluctuation starts way earlier.

I'll redo the tests with a newer kernel - this was on 3.10.x which is
what Red Hat 7.2 uses, I'll try on 4.8.6. Then I'll try with the patches
you submitted, if the 4.8.6 kernel does not help.

Overall, I'm convinced this issue is unrelated to the patches.


I've been unable to rerun the tests on this hardware with a newer 
kernel, so nothing new on the x86 front.


But as discussed with Amit in Tokyo at pgconf.asia, I got access to a 
Power8e machine (IBM 8247-22L to be precise). It's a much smaller 
machine compared to the x86 one, though - it only has 24 cores in 2 
sockets, 128GB of RAM and less powerful storage, for example.


I've repeated a subset of x86 tests and pushed them to

https://bitbucket.org/tvondra/power8-results-2

The new results are prefixed with "power-" and I've tried to put them 
right next to the "same" x86 tests.


In all cases the patches significantly reduce the contention on 
CLogControlLock, just like on x86. Which is good and expected.


Otherwise the results are rather boring - no major regressions compared 
to master, and all the patches perform almost exactly the same. Compare 
for example this:


* http://tvondra.bitbucket.org/#dilip-300-unlogged-sync

* http://tvondra.bitbucket.org/#power-dilip-300-unlogged-sync

So the results seem much smoother compared to x86, and the performance 
difference is roughly 3x, which matches the 24 vs. 72 cores.


For pgbench, the difference is much more significant, though:

* http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip

* http://tvondra.bitbucket.org/#power-pgbench-300-unlogged-sync-skip

So, we're doing ~40k on Power8, but 220k on x86 (which is ~6x more, so 
double per-core throughput). My first guess was that this is due to the
x86 machine having a better I/O subsystem, so I reran the tests with the
data directory in tmpfs, but that produced almost the same results.


Of course, this observation is unrelated to this patch.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-12-04 Thread Haribabu Kommi
On Mon, Dec 5, 2016 at 1:14 PM, Amit Kapila  wrote:

> On Mon, Dec 5, 2016 at 6:00 AM, Haribabu Kommi 
> wrote:
>
> No, that is not true.  You have quoted the wrong message, that
> discussion was about WALWriteLock contention, not about the patch being
> discussed in this thread.  I have posted the latest set of patches
> here [1].  Tomas is supposed to share the results of his tests.  He
> mentioned to me at PGConf Asia last week that he ran a few tests on a
> Power box, so let us wait for him to share his findings.
>
> > Moved to next CF with "waiting on author" status. Please feel free to
> > update the status if the current status differs with the actual patch
> > status.
> >
>
> I think we should keep the status as "Needs Review".
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1JjatUZu0%
> 2BHCi%3D5VM1q-hFgN_OhegPAwEUJqxf-7pESbg%40mail.gmail.com


Thanks for the update.
I changed the status to "needs review" in 2017-01 commitfest.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-12-04 Thread Amit Kapila
On Mon, Dec 5, 2016 at 6:00 AM, Haribabu Kommi  wrote:
>
>
> On Fri, Nov 4, 2016 at 8:20 PM, Amit Kapila  wrote:
>>
>> On Thu, Nov 3, 2016 at 8:38 PM, Robert Haas  wrote:
>> > On Tue, Nov 1, 2016 at 11:31 PM, Tomas Vondra
>> >> The difference is that both the fast-path locks and msgNumLock went
>> >> into
>> >> 9.2, so that end users probably never saw that regression. But we don't
>> >> know
>> >> if that happens for clog and WAL.
>> >>
>> >> Perhaps you have a working patch addressing the WAL contention, so that
>> >> we
>> >> could see how that changes the results?
>> >
>> > I don't think we do, yet.
>> >
>>
>> Right.  At this stage, we are just evaluating the ways (basic idea is
>> to split the OS writes and Flush requests in separate locks) to reduce
>> it.  It is difficult to speculate results at this stage.  I think
>> after spending some more time (probably few weeks), we will be in
>> position to share our findings.
>>
>
> As per my understanding the current state of the patch is waiting for the
> performance results from author.
>

No, that is not true.  You have quoted the wrong message, that
discussion was about WALWriteLock contention, not about the patch being
discussed in this thread.  I have posted the latest set of patches
here [1].  Tomas is supposed to share the results of his tests.  He
mentioned to me at PGConf Asia last week that he ran a few tests on a
Power box, so let us wait for him to share his findings.

> Moved to next CF with "waiting on author" status. Please feel free to
> update the status if the current status differs with the actual patch
> status.
>

I think we should keep the status as "Needs Review".

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JjatUZu0%2BHCi%3D5VM1q-hFgN_OhegPAwEUJqxf-7pESbg%40mail.gmail.com


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-12-04 Thread Haribabu Kommi
On Fri, Nov 4, 2016 at 8:20 PM, Amit Kapila  wrote:

> On Thu, Nov 3, 2016 at 8:38 PM, Robert Haas  wrote:
> > On Tue, Nov 1, 2016 at 11:31 PM, Tomas Vondra
> >> The difference is that both the fast-path locks and msgNumLock went into
> >> 9.2, so that end users probably never saw that regression. But we don't
> know
> >> if that happens for clog and WAL.
> >>
> >> Perhaps you have a working patch addressing the WAL contention, so that
> we
> >> could see how that changes the results?
> >
> > I don't think we do, yet.
> >
>
> Right.  At this stage, we are just evaluating the ways (basic idea is
> to split the OS writes and Flush requests in separate locks) to reduce
> it.  It is difficult to speculate results at this stage.  I think
> after spending some more time (probably few weeks), we will be in
> position to share our findings.
>
>
As per my understanding the current state of the patch is waiting for the
performance results from author.

Moved to next CF with "waiting on author" status. Please feel free to
update the status if the current status differs with the actual patch
status.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-11-04 Thread Amit Kapila
On Thu, Nov 3, 2016 at 8:38 PM, Robert Haas  wrote:
> On Tue, Nov 1, 2016 at 11:31 PM, Tomas Vondra
>> The difference is that both the fast-path locks and msgNumLock went into
>> 9.2, so that end users probably never saw that regression. But we don't know
>> if that happens for clog and WAL.
>>
>> Perhaps you have a working patch addressing the WAL contention, so that we
>> could see how that changes the results?
>
> I don't think we do, yet.
>

Right.  At this stage, we are just evaluating the ways (basic idea is
to split the OS writes and Flush requests in separate locks) to reduce
it.  It is difficult to speculate results at this stage.  I think
after spending some more time (probably few weeks), we will be in
position to share our findings.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-11-03 Thread Robert Haas
On Tue, Nov 1, 2016 at 11:31 PM, Tomas Vondra
 wrote:
> I don't think I've suggested not committing any of the clog patches (or
> other patches in general) because shifting the contention somewhere else
> might cause regressions. At the end of the last CF I've however stated that
> we need to better understand the impact on various workloads, and I think
> Amit agreed with that conclusion.
>
> We have that understanding now, I believe - also thanks to your idea of
> sampling wait events data.
>
> You're right we can't fix all the contention points in one patch, and that
> shifting the contention may cause regressions. But we should at least
> understand what workloads might be impacted, how serious the regressions may
> get etc. Which is why all the testing was done.

OK.

> Sure, I understand that. My main worry was that people will get worse
> performance with the next major version than what they get now (assuming we
> don't manage to address the other contention points). Which is difficult to
> explain to users & customers, no matter how reasonable it seems to us.
>
> The difference is that both the fast-path locks and msgNumLock went into
> 9.2, so that end users probably never saw that regression. But we don't know
> if that happens for clog and WAL.
>
> Perhaps you have a working patch addressing the WAL contention, so that we
> could see how that changes the results?

I don't think we do, yet.  Amit or Kuntal might know more.  At some
level I think we're just hitting the limits of the hardware's ability
to lay bytes on a platter, and fine-tuning the locking may not help
much.

> I might be wrong, but I doubt the kernel guys are running particularly wide
> set of tests, so how likely is it they will notice issues with specific
> workloads? Wouldn't it be great if we could tell them there's a bug and
> provide a workload that reproduces it?
>
> I don't see how "it's a Linux issue" makes it someone else's problem. The
> kernel guys can't really test everything (and are not obliged to). It's up
> to us to do more testing in this area, and report issues to the kernel guys
> (which is not happening as much as it should).

I don't exactly disagree with any of that.  I just want to find a
course of action that we can agree on and move forward.  This has been
cooking for a long time, and I want to converge on some resolution.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-11-02 Thread Tomas Vondra

On 11/02/2016 05:52 PM, Amit Kapila wrote:

On Wed, Nov 2, 2016 at 9:01 AM, Tomas Vondra
 wrote:

On 11/01/2016 08:13 PM, Robert Haas wrote:


On Mon, Oct 31, 2016 at 5:48 PM, Tomas Vondra
 wrote:




The one remaining thing is the strange zig-zag behavior, but that might
easily be due to scheduling in the kernel, or something else. I don't
consider it a blocker for any of the patches, though.



The only reason I could think of for that zig-zag behaviour is
frequent multiple clog page accesses and it could be due to below
reasons:

a. transaction and its subtransactions (IIRC, Dilip's case has one
main transaction and two subtransactions) can't fit into same page, in
which case the group_update optimization won't apply and I don't think
we can do anything for it.
b. In the same group, multiple clog pages are being accessed.  It is
not a likely scenario, but it can happen and we might be able to
improve a bit if that is happening.
c. The transactions at the same time try to update different clog pages.
I think, as mentioned upthread, we can handle it by using slots and
allowing multiple groups to work together instead of a single group.

To check if there is any impact due to (a) or (b), I have added a few
logs in the code (patch - group_update_clog_v9_log). The log message
could be "all xacts are not on same page" or "Group contains
different pages".

Patch group_update_clog_v9_slots tries to address (c). So if there
is any problem due to (c), this patch should improve the situation.

Can you please try to run the test where you saw zig-zag behaviour
with both the patches separately? I think if there is anything due
to postgres, then you can see either one of the new log message or
performance will be improved, OTOH if we see same behaviour, then I
think we can probably assume it due to scheduler activity and move
on. Also one point to note here is that even when the performance is
down in that curve, it is equal to or better than HEAD.



Will do.

Based on the results with more client counts (increment by 6 clients 
instead of 36), I think this really looks like something unrelated to 
any of the patches - kernel, CPU, or something already present in 
current master.


The attached results show that:

(a) master shows the same zig-zag behavior - No idea why this wasn't 
observed on the previous runs.


(b) group_update actually seems to improve the situation, because the 
performance keeps stable up to 72 clients, while on master the 
fluctuation starts way earlier.


I'll redo the tests with a newer kernel - this was on 3.10.x which is 
what Red Hat 7.2 uses, I'll try on 4.8.6. Then I'll try with the patches 
you submitted, if the 4.8.6 kernel does not help.


Overall, I'm convinced this issue is unrelated to the patches.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


zig-zag.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-11-02 Thread Amit Kapila
On Wed, Nov 2, 2016 at 9:01 AM, Tomas Vondra
 wrote:
> On 11/01/2016 08:13 PM, Robert Haas wrote:
>>
>> On Mon, Oct 31, 2016 at 5:48 PM, Tomas Vondra
>>  wrote:
>>>
>
> The one remaining thing is the strange zig-zag behavior, but that might
> easily be due to scheduling in the kernel, or something else. I don't
> consider it a blocker for any of the patches, though.
>

The only reason I could think of for that zig-zag behaviour is
frequent multiple clog page accesses and it could be due to below
reasons:

a. transaction and its subtransactions (IIRC, Dilip's case has one
main transaction and two subtransactions) can't fit into same page, in
which case the group_update optimization won't apply and I don't think
we can do anything for it.
b. In the same group, multiple clog pages are being accessed.  It is
not a likely scenario, but it can happen and we might be able to
improve a bit if that is happening.
c. The transactions at the same time try to update different clog pages.
I think, as mentioned upthread, we can handle it by using slots and
allowing multiple groups to work together instead of a single group.

To check if there is any impact due to (a) or (b), I have added a few
logs in the code (patch - group_update_clog_v9_log).  The log message
could be "all xacts are not on same page" or "Group contains
different pages".

Patch group_update_clog_v9_slots tries to address (c). So if there is
any problem due to (c), this patch should improve the situation.

Can you please try to run the test where you saw zig-zag behaviour
with both the patches separately?  I think if there is anything due to
postgres, then you can see either one of the new log message or
performance will be improved, OTOH if we see same behaviour, then I
think we can probably assume it due to scheduler activity and move on.
Also one point to note here is that even when the performance is down
in that curve, it is equal to or better than HEAD.
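As a back-of-the-envelope illustration of (a): with the default BLCKSZ of
8192 and four transaction statuses packed per byte, one clog page covers
32768 xids, so a top-level xid allocated near a page boundary will have
its (later-assigned) subtransaction xids land on the next page. A tiny
standalone program whose constants mirror clog.c:

    #include <stdio.h>

    #define CLOG_XACTS_PER_BYTE 4
    #define CLOG_XACTS_PER_PAGE (8192 * CLOG_XACTS_PER_BYTE)   /* 32768 for BLCKSZ = 8192 */
    #define TransactionIdToPage(xid) ((xid) / CLOG_XACTS_PER_PAGE)

    int main(void)
    {
        unsigned int parent = 32767;        /* last xid covered by clog page 0 */
        unsigned int subxid = parent + 1;   /* a subxact assigned just after it */

        printf("parent xid %u -> clog page %u, subxid %u -> clog page %u\n",
               parent, TransactionIdToPage(parent),
               subxid, TransactionIdToPage(subxid));
        return 0;
    }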


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


group_update_clog_v9_log.patch
Description: Binary data


group_update_clog_v9_slots.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-11-01 Thread Tomas Vondra

On 11/01/2016 08:13 PM, Robert Haas wrote:

On Mon, Oct 31, 2016 at 5:48 PM, Tomas Vondra
 wrote:

Honestly, I have no idea what to think about this ...


I think a lot of the details here depend on OS scheduler behavior.
For example, here's one of the first scalability graphs I ever did:

http://rhaas.blogspot.com/2011/09/scalability-in-graphical-form-analyzed.html

It's a nice advertisement for fast-path locking, but look at the funny
shape of the red and green lines between 1 and 32 cores.  The curve is
oddly bowl-shaped.  As the post discusses, we actually dip WAY under
linear scalability in the 8-20 core range and then shoot up like a
rocket afterwards so that at 32 cores we actually achieve super-linear
scalability. You can't blame this on anything except Linux.  Someone
shared BSD graphs (I forget which flavor) with me privately and they
don't exhibit this poor behavior.  (They had different poor behaviors
instead - performance collapsed at high client counts.  That was a
long time ago so it's probably fixed now.)

This is why I think it's fundamentally wrong to look at this patch and
say "well, contention goes down, and in some cases that makes
performance go up, but because in other cases it decreases performance
or increases variability we shouldn't commit it".  If we took that
approach, we wouldn't have fast-path locking today, because the early
versions of fast-path locking could exhibit *major* regressions
precisely because of contention shifting to other locks, specifically
SInvalReadLock and msgNumLock.  (cf. commit
b4fbe392f8ff6ff1a66b488eb7197eef9e1770a4).  If we say that because the
contention on those other locks can get worse as a result of
contention on this lock being reduced, or even worse, if we try to
take responsibility for what effect reducing lock contention might
have on the operating system scheduler discipline (which will
certainly differ from system to system and version to version), we're
never going to get anywhere, because there's almost always going to be
some way that reducing contention in one place can bite you someplace
else.



I don't think I've suggested not committing any of the clog patches (or 
other patches in general) because shifting the contention somewhere else 
might cause regressions. At the end of the last CF I've however stated 
that we need to better understand the impact on various workloads, and I
think Amit agreed with that conclusion.


We have that understanding now, I believe - also thanks to your idea of 
sampling wait events data.


You're right we can't fix all the contention points in one patch, and 
that shifting the contention may cause regressions. But we should at 
least understand what workloads might be impacted, how serious the 
regressions may get etc. Which is why all the testing was done.




I also believe it's pretty normal for patches that remove lock
contention to increase variability.  If you run an auto race where
every car has a speed governor installed that limits it to 80 kph,
there will be much less variability in the finish times than if you
remove the governor, but that's a stupid way to run a race.  You won't
get much innovation around increasing the top speed of the cars under
those circumstances, either.  Nobody ever bothered optimizing the
contention around msgNumLock before fast-path locking happened,
because the heavyweight lock manager burdened the system so heavily
that you couldn't generate enough contention on it to matter.
Similarly, we're not going to get much traction around optimizing the
other locks to which contention would shift if we applied this patch
unless we apply it.  This is not theoretical: EnterpriseDB staff have
already done work on trying to optimize WALWriteLock, but it's hard to
get a benefit.  The more other contention we eliminate, the
easier it will be to see whether a proposed change to WALWriteLock
helps.


Sure, I understand that. My main worry was that people will get worse 
performance with the next major version than what they get now (assuming
we don't manage to address the other contention points). Which is 
difficult to explain to users & customers, no matter how reasonable it 
seems to us.


The difference is that both the fast-path locks and msgNumLock went into 
9.2, so that end users probably never saw that regression. But we don't 
know if that happens for clog and WAL.


Perhaps you have a working patch addressing the WAL contention, so that 
we could see how that changes the results?



> Of course, we'll also be more at the mercy of operating system

scheduler discipline, but that's not all a bad thing either.  The
Linux kernel guys have been known to run PostgreSQL to see whether
proposed changes help or hurt, but they're not going to try those
tests after applying patches that we rejected because they expose us
to existing Linux shortcomings.



I might be wrong, but I doubt the kernel guys are running particularly 
wide set of tests, so how likely is it they will notice issues with
specific workloads? Wouldn't it be great if we could tell them there's a
bug and provide a workload that reproduces it?

I don't see how "it's a Linux issue" makes it someone else's problem. The
kernel guys can't really test everything (and are not obliged to). It's
up to us to do more testing in this area, and report issues to the kernel
guys (which is not happening as much as it should).

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-11-01 Thread Robert Haas
On Mon, Oct 31, 2016 at 5:48 PM, Tomas Vondra
 wrote:
> Honestly, I have no idea what to think about this ...

I think a lot of the details here depend on OS scheduler behavior.
For example, here's one of the first scalability graphs I ever did:

http://rhaas.blogspot.com/2011/09/scalability-in-graphical-form-analyzed.html

It's a nice advertisement for fast-path locking, but look at the funny
shape of the red and green lines between 1 and 32 cores.  The curve is
oddly bowl-shaped.  As the post discusses, we actually dip WAY under
linear scalability in the 8-20 core range and then shoot up like a
rocket afterwards so that at 32 cores we actually achieve super-linear
scalability. You can't blame this on anything except Linux.  Someone
shared BSD graphs (I forget which flavor) with me privately and they
don't exhibit this poor behavior.  (They had different poor behaviors
instead - performance collapsed at high client counts.  That was a
long time ago so it's probably fixed now.)

This is why I think it's fundamentally wrong to look at this patch and
say "well, contention goes down, and in some cases that makes
performance go up, but because in other cases it decreases performance
or increases variability we shouldn't commit it".  If we took that
approach, we wouldn't have fast-path locking today, because the early
versions of fast-path locking could exhibit *major* regressions
precisely because of contention shifting to other locks, specifically
SInvalReadLock and msgNumLock.  (cf. commit
b4fbe392f8ff6ff1a66b488eb7197eef9e1770a4).  If we say that because the
contention on those other locks can get worse as a result of
contention on this lock being reduced, or even worse, if we try to
take responsibility for what effect reducing lock contention might
have on the operating system scheduler discipline (which will
certainly differ from system to system and version to version), we're
never going to get anywhere, because there's almost always going to be
some way that reducing contention in one place can bite you someplace
else.

I also believe it's pretty normal for patches that remove lock
contention to increase variability.  If you run an auto race where
every car has a speed governor installed that limits it to 80 kph,
there will be much less variability in the finish times than if you
remove the governor, but that's a stupid way to run a race.  You won't
get much innovation around increasing the top speed of the cars under
those circumstances, either.  Nobody ever bothered optimizing the
contention around msgNumLock before fast-path locking happened,
because the heavyweight lock manager burdened the system so heavily
that you couldn't generate enough contention on it to matter.
Similarly, we're not going to get much traction around optimizing the
other locks to which contention would shift if we applied this patch
unless we apply it.  This is not theoretical: EnterpriseDB staff have
already done work on trying to optimize WALWriteLock, but it's hard to
get a benefit.  The more other contention we eliminate, the
easier it will be to see whether a proposed change to WALWriteLock
helps.  Of course, we'll also be more at the mercy of operating system
scheduler discipline, but that's not all a bad thing either.  The
Linux kernel guys have been known to run PostgreSQL to see whether
proposed changes help or hurt, but they're not going to try those
tests after applying patches that we rejected because they expose us
to existing Linux shortcomings.

I don't want to be perceived as advocating too forcefully for a patch
that was, after all, written by a colleague.  However, I sincerely
believe it's a mistake to say that a patch which reduces lock
contention must show a tangible win or at least no loss on every piece
of hardware, on every kernel, at every client count with no increase
in variability in any configuration.  Very few (if any) patches are
going to be able to meet that bar, and if we make that the bar, people
aren't going to write patches to reduce lock contention in PostgreSQL.
For that to be worth doing, you have to be able to get the patch
committed in finite time.  We've spent an entire release cycle
dithering over this patch.  Several alternative patches have been
written that are not any better (and the people who wrote those
patches don't seem especially interested in doing further work on them
anyway).  There is increasing evidence that the patch is effective at
solving the problem it claims to solve, and that any downsides are
just the result of poor lock-scaling behavior elsewhere which we could
be working on fixing if we weren't still spending time on this.  Is
that really not good enough?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-31 Thread Tomas Vondra

On 10/31/2016 02:24 PM, Tomas Vondra wrote:

On 10/31/2016 05:01 AM, Jim Nasby wrote:

On 10/30/16 1:32 PM, Tomas Vondra wrote:


Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's
some sort of CPU / OS scheduling artifact. For example, the system has
36 physical cores, 72 virtual ones (thanks to HT). I find it strange
that the "good" client counts are always multiples of 72, while the
"bad" ones fall in between.

  72 = 72 * 1   (good)
 108 = 72 * 1.5 (bad)
 144 = 72 * 2   (good)
 180 = 72 * 2.5 (bad)
 216 = 72 * 3   (good)
 252 = 72 * 3.5 (bad)
 288 = 72 * 4   (good)

So maybe this has something to do with how OS schedules the tasks, or
maybe some internal heuristics in the CPU, or something like that.


It might be enlightening to run a series of tests that are 72*.1 or *.2
apart (say, 72, 79, 86, ..., 137, 144).


Yeah, I've started a benchmark with a client step of 6 clients

36 42 48 54 60 66 72 78 ... 252 258 264 270 276 282 288

instead of just

36 72 108 144 180 216 252 288

which did a test every 36 clients. To compensate for the 6x longer runs,
I'm only running tests for "group-update" and "master", so I should have
the results in ~36h.



So I've been curious and looked at results of the runs executed so far, 
and for the group_update patch it looks like this:


  clients  tps
 -----------------
   36  117663
   42  139791
   48  129331
   54  144970
   60  124174
   66  137227
   72  146064
   78  100267
   84  141538
   90   96607
   96  139290
  102   93976
  108  136421
  114   91848
  120  133563
  126   89801
  132  132607
  138   87912
  144  129688
  150   87221
  156  129608
  162   85403
  168  130193
  174   83863
  180  129337
  186   81968
  192  128571
  198   82053
  204  128020
  210   80768
  216  124153
  222   80493
  228  125503
  234   78950
  240  125670
  246   78418
  252  123532
  258   77623
  264  124366
  270   76726
  276  119054
  282   76960
  288  121819

So, similar saw-like behavior, perfectly periodic. But the really 
strange thing is the peaks/valleys don't match those observed before!


That is, during the previous runs, 72, 144, 216 and 288 were "good" 
while 108, 180 and 252 were "bad". But in those runs, all those client 
counts are "good" ...


Honestly, I have no idea what to think about this ...

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-31 Thread Tomas Vondra

On 10/31/2016 08:43 PM, Amit Kapila wrote:

On Mon, Oct 31, 2016 at 7:58 PM, Tomas Vondra
 wrote:

On 10/31/2016 02:51 PM, Amit Kapila wrote:
And moreover, this setup (single device for the whole cluster) is very
common, we can't just neglect it.

But my main point here really is that the trade-off in those cases may not
be really all that great, because you get the best performance at 36/72
clients, and then the tps drops and variability increases. At least not
right now, before tackling contention on the WAL lock (or whatever lock
becomes the bottleneck).



Okay, but do the wait event results show an increase in contention on some
other locks for pgbench-3000-logged-sync-skip-64?  Can you share wait
events for the runs where there is a fluctuation?



Sure, I do have wait event stats, including a summary for different 
client counts - see this:


http://tvondra.bitbucket.org/by-test/pgbench-3000-logged-sync-skip-64.txt

Looking only at group_update patch for three interesting client counts, 
it looks like this:


   wait_event_type |    wait_event     |    108 |    144 |     180
  -----------------+-------------------+--------+--------+---------
   LWLockNamed     | WALWriteLock      | 661284 | 847057 | 1006061
                   |                   | 126654 | 191506 |  265386
   Client          | ClientRead        |  37273 |  52791 |   64799
   LWLockTranche   | wal_insert        |  28394 |  51893 |   79932
   LWLockNamed     | CLogControlLock   |   7766 |  14913 |   23138
   LWLockNamed     | WALBufMappingLock |   3615 |   3739 |    3803
   LWLockNamed     | ProcArrayLock     |    913 |   1776 |    2685
   Lock            | extend            |    909 |   2082 |    2228
   LWLockNamed     | XidGenLock        |    301 |    349 |     675
   LWLockTranche   | clog              |    173 |    331 |     607
   LWLockTranche   | buffer_content    |    163 |    468 |     737
   LWLockTranche   | lock_manager      |     88 |    140 |     145

Compared to master, this shows significant reduction of contention on 
CLogControlLock (which on master has 20k, 83k and 200k samples), and 
moving the contention to WALWriteLock.


But perhaps you're asking about variability during the benchmark? I 
suppose that could be extracted from the collected data, but I haven't 
done that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-31 Thread Amit Kapila
On Mon, Oct 31, 2016 at 7:58 PM, Tomas Vondra
 wrote:
> On 10/31/2016 02:51 PM, Amit Kapila wrote:
> And moreover, this setup (single device for the whole cluster) is very
> common, we can't just neglect it.
>
> But my main point here really is that the trade-off in those cases may not
> be really all that great, because you get the best performance at 36/72
> clients, and then the tps drops and variability increases. At least not
> right now, before tackling contention on the WAL lock (or whatever lock
> becomes the bottleneck).
>

Okay, but do the wait event results show an increase in contention on some
other locks for pgbench-3000-logged-sync-skip-64?  Can you share wait
events for the runs where there is a fluctuation?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-31 Thread Tomas Vondra

On 10/31/2016 02:51 PM, Amit Kapila wrote:

On Mon, Oct 31, 2016 at 12:02 AM, Tomas Vondra
 wrote:

Hi,

On 10/27/2016 01:44 PM, Amit Kapila wrote:

I've read that analysis, but I'm not sure I see how it explains the "zig
zag" behavior. I do understand that shifting the contention to some other
(already busy) lock may negatively impact throughput, or that the
group_update may result in updating multiple clog pages, but I don't
understand two things:

(1) Why this should result in the fluctuations we observe in some of the
cases. For example, why should we see 150k tps on 72 clients, then drop to
92k with 108 clients, then back to 130k on 144 clients, then 84k on 180
clients etc. That seems fairly strange.



I don't think hitting multiple clog pages has much to do with
client-count.  However, we can wait to see your further detailed test
report.


(2) Why this should affect all three patches, when only group_update has to
modify multiple clog pages.



No, all three patches can be affected due to multiple clog pages.
Read second paragraph ("I think one of the probable reasons that could
happen for both the approaches") in same e-mail [1].  It is basically
due to frequent release-and-reacquire of locks.





On logged tables it usually looks like this (i.e. modest increase for
high
client counts at the expense of significantly higher variability):

  http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64



What variability are you referring to in those results?






Good question. What I mean by "variability" is how stable the tps is during
the benchmark (when measured on per-second granularity). For example, let's
run a 10-second benchmark, measuring number of transactions committed each
second.

Then all those runs do 1000 tps on average:

  run 1: 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000
  run 2: 500, 1500, 500, 1500, 500, 1500, 500, 1500, 500, 1500
  run 3: 0, 2000, 0, 2000, 0, 2000, 0, 2000, 0, 2000



Generally, such behaviours are seen due to writes. Are WAL and DATA
on same disk in your tests?



Yes, there's one RAID device on 10 SSDs, with 4GB of the controller. 
I've done some tests and it easily handles > 1.5GB/s in sequential 
writes, and >500MB/s in sustained random writes.


Also, let me point out that most of the tests were done so that the 
whole data set fits into shared_buffers, and with no checkpoints during 
the runs (so no writes to data files should really happen).


For example these tests were done on scale 3000 (45GB data set) with 
64GB shared buffers:


[a] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-unlogged-sync-noskip-64


[b] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-async-noskip-64


and I could show similar cases with scale 300 on 16GB shared buffers.

In those cases, there's very little contention between WAL and the rest 
of the data base (in terms of I/O).


And moreover, this setup (single device for the whole cluster) is very 
common, we can't just neglect it.


But my main point here really is that the trade-off in those cases may 
not be really all that great, because you get the best performance at 
36/72 clients, and then the tps drops and variability increases. At 
least not right now, before tackling contention on the WAL lock (or 
whatever lock becomes the bottleneck).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-31 Thread Amit Kapila
On Mon, Oct 31, 2016 at 7:02 PM, Tomas Vondra
 wrote:
>
> The remaining benchmark with 512 clog buffers completed, and the impact
> roughly matches Dilip's benchmark - that is, increasing the number of clog
> buffers eliminates all positive impact of the patches observed on 128
> buffers. Compare these two reports:
>
> [a] http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-noskip-retest
>
> [b] http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-noskip-retest-512
>
> With 128 buffers the group_update and granular_locking patches achieve up to
> 50k tps, while master and no_content_lock do ~30k tps. After increasing
> number of clog buffers, we get only ~30k in all cases.
>
> I'm not sure what's causing this, whether we're hitting limits of the simple
> LRU cache used for clog buffers, or something else.
>

I have also seen previously that increasing clog buffers to 256 can
impact performance negatively.  So, probably here the gains due to the
group_update patch are negated by the impact of increasing clog
buffers.  I am not sure it is a good idea to evaluate the impact of
increasing clog buffers along with this patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-31 Thread Amit Kapila
On Mon, Oct 31, 2016 at 12:02 AM, Tomas Vondra
 wrote:
> Hi,
>
> On 10/27/2016 01:44 PM, Amit Kapila wrote:
>
> I've read that analysis, but I'm not sure I see how it explains the "zig
> zag" behavior. I do understand that shifting the contention to some other
> (already busy) lock may negatively impact throughput, or that the
> group_update may result in updating multiple clog pages, but I don't
> understand two things:
>
> (1) Why this should result in the fluctuations we observe in some of the
> cases. For example, why should we see 150k tps on 72 clients, then drop to
> 92k with 108 clients, then back to 130k on 144 clients, then 84k on 180
> clients etc. That seems fairly strange.
>

I don't think hitting multiple clog pages has much to do with
client-count.  However, we can wait to see your further detailed test
report.

> (2) Why this should affect all three patches, when only group_update has to
> modify multiple clog pages.
>

No, all three patches can be affected due to multiple clog pages.
Read second paragraph ("I think one of the probable reasons that could
happen for both the approaches") in same e-mail [1].  It is basically
due to frequent release-and-reacquire of locks.

>
>
>>> On logged tables it usually looks like this (i.e. modest increase for
>>> high
>>> client counts at the expense of significantly higher variability):
>>>
>>>   http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64
>>>
>>
>> What variability are you referring to in those results?
>
>>
>
> Good question. What I mean by "variability" is how stable the tps is during
> the benchmark (when measured on per-second granularity). For example, let's
> run a 10-second benchmark, measuring number of transactions committed each
> second.
>
> Then all those runs do 1000 tps on average:
>
>   run 1: 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000
>   run 2: 500, 1500, 500, 1500, 500, 1500, 500, 1500, 500, 1500
>   run 3: 0, 2000, 0, 2000, 0, 2000, 0, 2000, 0, 2000
>

Generally, such behaviours are seen due to writes.  Are WAL and DATA
on same disk in your tests?


[1] - 
https://www.postgresql.org/message-id/CAA4eK1J9VxJUnpOiQDf0O%3DZ87QUMbw%3DuGcQr4EaGbHSCibx9yA%40mail.gmail.com


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-31 Thread Tomas Vondra

On 10/30/2016 07:32 PM, Tomas Vondra wrote:

Hi,

On 10/27/2016 01:44 PM, Amit Kapila wrote:

On Thu, Oct 27, 2016 at 4:15 AM, Tomas Vondra
 wrote:


FWIW I plan to run the same test with logged tables - if it shows
similar
regression, I'll be much more worried, because that's a fairly typical
scenario (logged tables, data set > shared buffers), and we surely can't
just go and break that.



Sure, please do those tests.



OK, so I do have results for those tests - that is, scale 3000 with
shared_buffers=16GB (so continuously writing out dirty buffers). The
following reports show the results slightly differently - all three "tps
charts" next to each other, then the speedup charts and tables.

Overall, the results are surprisingly positive - look at these results
(all ending with "-retest"):

[1] http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest

[2]
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-noskip-retest


[3]
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest


All three show significant improvement, even with fairly low client
counts. For example with 72 clients, the tps improves by 20%, without
significantly affecting the variability of the results (measured as
stddev, more on this later).

It's however interesting that "no_content_lock" is almost exactly the
same as master, while the other two cases improve significantly.

The other interesting thing is that "pgbench -N" [3] shows no such
improvement, unlike regular pgbench and Dilip's workload. Not sure why,
though - I'd expect to see significant improvement in this case.

I have also repeated those tests with clog buffers increased to 512 (so
4x the current maximum of 128). I only have results for Dilip's workload
and "pgbench -N":

[4]
http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest-512

[5]
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest-512


The results are somewhat surprising, I guess, because the effect is
wildly different for each workload.

For Dilip's workload increasing clog buffers to 512 pretty much
eliminates all benefits of the patches. For example with 288 clients,
the group_update patch gives ~60k tps on 128 buffers [1] but only 42k
tps on 512 buffers [4].

With "pgbench -N", the effect is exactly the opposite - while with
128 buffers there was pretty much no benefit from any of the patches
[3], with 512 buffers we suddenly get almost 2x the throughput, but
only for group_update and master (while the other two patches show no
improvement at all).



The remaining benchmark with 512 clog buffers completed, and the impact 
roughly matches Dilip's benchmark - that is, increasing the number of 
clog buffers eliminates all positive impact of the patches observed on 
128 buffers. Compare these two reports:


[a] http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-noskip-retest

[b] http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-noskip-retest-512

With 128 buffers the group_update and granular_locking patches achieve 
up to 50k tps, while master and no_content_lock do ~30k tps. After 
increasing the number of clog buffers, we get only ~30k in all cases.


I'm not sure what's causing this, whether we're hitting limits of the 
simple LRU cache used for clog buffers, or something else. But maybe 
there's something in the design of clog buffers that makes them work less 
efficiently with more clog buffers? I'm not sure whether that's 
something we need to fix before eventually committing any of them.
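
One plausible factor, sketched below with invented names (this is not
the actual slru.c code): the clog SLRU keeps its buffers in a flat
array, and both the page lookup and the LRU victim search are linear
scans over every slot while the control lock is held, so the cost of
each access grows with the number of buffers:

#include <limits.h>

#define NUM_SLOTS 512               /* e.g. the clog buffer count under test */

typedef struct SlruSketch
{
    int     pageno[NUM_SLOTS];          /* page held by each slot, -1 if empty */
    int     recently_used[NUM_SLOTS];   /* larger value = touched more recently */
} SlruSketch;

/*
 * Return the slot holding "pageno", or the least-recently-used slot to
 * reuse.  Either way the scan is O(NUM_SLOTS) and runs under the control
 * lock, so going from 128 to 512 buffers lengthens every access.
 */
static int
slru_lookup_or_victim(SlruSketch *ctl, int pageno)
{
    int     victim = 0;
    int     victim_used = INT_MAX;

    for (int slot = 0; slot < NUM_SLOTS; slot++)
    {
        if (ctl->pageno[slot] == pageno)
            return slot;                    /* page already buffered */
        if (ctl->recently_used[slot] < victim_used)
        {
            victim = slot;
            victim_used = ctl->recently_used[slot];
        }
    }
    ctl->pageno[victim] = pageno;           /* caller would read the page in */
    return victim;
}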


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-31 Thread Tomas Vondra

On 10/31/2016 05:01 AM, Jim Nasby wrote:

On 10/30/16 1:32 PM, Tomas Vondra wrote:


Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's
some sort of CPU / OS scheduling artifact. For example, the system has
36 physical cores, 72 virtual ones (thanks to HT). I find it strange
that the "good" client counts are always multiples of 72, while the
"bad" ones fall in between.

  72 = 72 * 1   (good)
 108 = 72 * 1.5 (bad)
 144 = 72 * 2   (good)
 180 = 72 * 2.5 (bad)
 216 = 72 * 3   (good)
 252 = 72 * 3.5 (bad)
 288 = 72 * 4   (good)

So maybe this has something to do with how OS schedules the tasks, or
maybe some internal heuristics in the CPU, or something like that.


It might be enlightening to run a series of tests that are 72*.1 or *.2
apart (say, 72, 79, 86, ..., 137, 144).


Yeah, I've started a benchmark with a step of 6 clients

36 42 48 54 60 66 72 78 ... 252 258 264 270 276 282 288

instead of just

36 72 108 144 180 216 252 288

which did a test every 36 clients. To compensate for the 6x longer runs, 
I'm only running tests for "group-update" and "master", so I should have 
the results in ~36h.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-30 Thread Jim Nasby

On 10/30/16 1:32 PM, Tomas Vondra wrote:


Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's
some sort of CPU / OS scheduling artifact. For example, the system has
36 physical cores, 72 virtual ones (thanks to HT). I find it strange
that the "good" client counts are always multiples of 72, while the
"bad" ones fall in between.

  72 = 72 * 1   (good)
 108 = 72 * 1.5 (bad)
 144 = 72 * 2   (good)
 180 = 72 * 2.5 (bad)
 216 = 72 * 3   (good)
 252 = 72 * 3.5 (bad)
 288 = 72 * 4   (good)

So maybe this has something to do with how OS schedules the tasks, or
maybe some internal heuristics in the CPU, or something like that.


It might be enlightening to run a series of tests that are 72*.1 or *.2 
apart (say, 72, 79, 86, ..., 137, 144).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-30 Thread Tomas Vondra

Hi,

On 10/27/2016 01:44 PM, Amit Kapila wrote:

On Thu, Oct 27, 2016 at 4:15 AM, Tomas Vondra
 wrote:


FWIW I plan to run the same test with logged tables - if it shows similar
regression, I'll be much more worried, because that's a fairly typical
scenario (logged tables, data set > shared buffers), and we surely can't
just go and break that.



Sure, please do those tests.



OK, so I do have results for those tests - that is, scale 3000 with 
shared_buffers=16GB (so continuously writing out dirty buffers). The 
following reports show the results slightly differently - all three "tps 
charts" next to each other, then the speedup charts and tables.


Overall, the results are surprisingly positive - look at these results 
(all ending with "-retest"):


[1] http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest

[2] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-noskip-retest


[3] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest


All three show significant improvement, even with fairly low client 
counts. For example with 72 clients, the tps improves by 20%, without
significantly affecting the variability of the results (measured as
stddev, more on this later).


It's however interesting that "no_content_lock" is almost exactly the 
same as master, while the other two cases improve significantly.


The other interesting thing is that "pgbench -N" [3] shows no such 
improvement, unlike regular pgbench and Dilip's workload. Not sure why, 
though - I'd expect to see significant improvement in this case.


I have also repeated those tests with clog buffers increased to 512 (so 
4x the current maximum of 128). I only have results for Dilip's workload 
and "pgbench -N":


[4] 
http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest-512


[5] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest-512


The results are somewhat surprising, I guess, because the effect is 
wildly different for each workload.


For Dilip's workload increasing clog buffers to 512 pretty much 
eliminates all benefits of the patches. For example with 288 clients, the 
group_update patch gives ~60k tps on 128 buffers [1] but only 42k tps on 
512 buffers [4].


With "pgbench -N", the effect is exactly the opposite - while with 128 
buffers there was pretty much no benefit from any of the patches [3], 
with 512 buffers we suddenly get almost 2x the throughput, but only for 
group_update and master (while the other two patches show no improvement 
at all).


I don't have results for the regular pgbench ("noskip") with 512 buffers 
yet, but I'm curious what that will show.


In general I however think that the patches don't show any regression in 
any of those workloads (at least not with 128 buffers). Based solely on 
the results, I like the group_update more, because it performs as well 
as master or significantly better.



2. We do see in some cases that granular_locking and
no_content_lock patches have shown a significant increase in
contention on CLOGControlLock. I have already shared my analysis
for the same upthread [8].




I've read that analysis, but I'm not sure I see how it explains the "zig 
zag" behavior. I do understand that shifting the contention to some 
other (already busy) lock may negatively impact throughput, or that the 
group_update may result in updating multiple clog pages, but I don't 
understand two things:


(1) Why this should result in the fluctuations we observe in some of the 
cases. For example, why should we see 150k tps on 72 clients, then drop 
to 92k with 108 clients, then back to 130k on 144 clients, then 84k on 
180 clients etc. That seems fairly strange.


(2) Why this should affect all three patches, when only group_update has 
to modify multiple clog pages.


For example consider this:

http://tvondra.bitbucket.org/index2.html#dilip-300-logged-async

For example looking at % of time spent on different locks with the 
group_update patch, I see this (ignoring locks with ~1%):


 event_type     wait_event         36   72  108  144  180  216  252  288
 ------------------------------------------------------------------------
 -              -                  60   63   45   53   38   50   33   48
 Client         ClientRead         33   23    9   14    6   10    4    8
 LWLockNamed    CLogControlLock     2    7   33   14   34   14   33   14
 LWLockTranche  buffer_content      0    2    9   13   19   18   26   22

I don't see any sign of contention shifting to other locks, just 
CLogControlLock fluctuating between 14% and 33% for some reason.


Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's 
some sort of CPU / OS scheduling artifact. For example, the system has 
36 physical cores, 72 virtual ones (thanks to HT). I find it strange 
that the "good" client counts are always multiples of 72, while the 
"bad" ones fall in between.


  72 = 72 * 1   (good)
 

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-27 Thread Dilip Kumar
On Thu, Oct 27, 2016 at 5:14 PM, Amit Kapila  wrote:
>>> Thanks Tomas and Dilip for doing detailed performance tests for this
>>> patch.  I would like to summarise the performance testing results.
>>>
>>> 1. With update intensive workload, we are seeing gains from 23%~192%
>>> at client count >=64 with group_update patch [1].

this is with unlogged tables

>>> 2. With tpc-b pgbench workload (at 1000 scale factor), we are seeing
>>> gains from 12% to ~70% at client count >=64 [2].  Tests are done on
>>> 8-socket intel   m/c.

this is with synchronous_commit=off

>>> 3. With pgbench workload (both simple-update and tpc-b at 300 scale
>>> factor), we are seeing gain 10% to > 50% at client count >=64 [3].
>>> Tests are done on 8-socket intel m/c.

this is with synchronous_commit=off

>>> 4. To see why the patch only helps at higher client count, we have
>>> done wait event testing for various workloads [4], [5] and the results
>>> indicate that at lower clients, the waits are mostly due to
>>> transactionid or clientread.  At client-counts where contention due to
>>> CLOGControlLock is significant, this patch helps a lot to reduce that
>>> contention.  These tests are done on 8-socket intel m/c and
>>> 4-socket power m/c

both of these are with synchronous_commit=off + unlogged tables

>>> 5. With pgbench workload (unlogged tables), we are seeing gains from
>>> 15% to > 300% at client count >=72 [6].
>>>
>>
>> It's not entirely clear which of the above tests were done on unlogged
>> tables, and I don't see that in the referenced e-mails. That would be an
>> interesting thing to mention in the summary, I think.
>>
>
> One thing is clear that all results are on either
> synchronous_commit=off or on unlogged tables.  I think Dilip can
> answer better which of those are on unlogged and which on
> synchronous_commit=off.

I have mentioned this above under each of your test points.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-27 Thread Amit Kapila
On Thu, Oct 27, 2016 at 4:15 AM, Tomas Vondra
 wrote:
> On 10/25/2016 06:10 AM, Amit Kapila wrote:
>>
>> On Mon, Oct 24, 2016 at 2:48 PM, Dilip Kumar 
>> wrote:
>>>
>>> On Fri, Oct 21, 2016 at 7:57 AM, Dilip Kumar 
>>> wrote:

 On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
  wrote:

> In the results you've posted on 10/12, you've mentioned a regression
> with 32
> clients, where you got 52k tps on master but only 48k tps with the
> patch (so
> ~10% difference). I have no idea what scale was used for those tests,


 That test was with scale factor 300 on POWER 4 socket machine. I think
 I need to repeat this test with multiple reading to confirm it was
 regression or run to run variation. I will do that soon and post the
 results.
>>>
>>>
>>> As promised, I have rerun my test (3 times), and I did not see any
>>> regression.
>>>
>>
>> Thanks Tomas and Dilip for doing detailed performance tests for this
>> patch.  I would like to summarise the performance testing results.
>>
>> 1. With update intensive workload, we are seeing gains from 23%~192%
>> at client count >=64 with group_update patch [1].
>> 2. With tpc-b pgbench workload (at 1000 scale factor), we are seeing
>> gains from 12% to ~70% at client count >=64 [2].  Tests are done on
>> 8-socket intel   m/c.
>> 3. With pgbench workload (both simple-update and tpc-b at 300 scale
>> factor), we are seeing gain 10% to > 50% at client count >=64 [3].
>> Tests are done on 8-socket intel m/c.
>> 4. To see why the patch only helps at higher client count, we have
>> done wait event testing for various workloads [4], [5] and the results
>> indicate that at lower clients, the waits are mostly due to
>> transactionid or clientread.  At client-counts where contention due to
>> CLOGControlLock is significant, this patch helps a lot to reduce that
>> contention.  These tests are done on 8-socket intel m/c and
>> 4-socket power m/c
>> 5. With pgbench workload (unlogged tables), we are seeing gains from
>> 15% to > 300% at client count >=72 [6].
>>
>
> It's not entirely clear which of the above tests were done on unlogged
> tables, and I don't see that in the referenced e-mails. That would be an
> interesting thing to mention in the summary, I think.
>

One thing is clear that all results are on either
synchronous_commit=off or on unlogged tables.  I think Dilip can
answer better which of those are on unlogged and which on
synchronous_commit=off.

>> There are many more tests done for the proposed patches where gains
>> are either on similar lines as above or are neutral.  We do see
>> regression in some cases.
>>
>> 1. When data doesn't fit in shared buffers, there is regression at
>> some client counts [7], but on analysis it has been found that it is
>> mainly due to the shift in contention from CLOGControlLock to
>> WALWriteLock and or other locks.
>
>
> The question is why shifting the lock contention to WALWriteLock should
> cause such a significant performance drop, particularly when the test was done
> on unlogged tables. Or, if that's the case, how it makes the performance
> drop less problematic / acceptable.
>

Whenever the contention shifts to another lock, there is a chance that
it can show a performance dip in some cases, and I have seen that
previously as well. The theory behind that could be like this: say you
have two locks L1 and L2, and there are 100 processes contending on L1
and 50 on L2.  Now say you reduce contention on L1 such that it leads
to 120 processes contending on L2; the increased contention on L2 can
slow down the overall throughput of all processes.

> FWIW I plan to run the same test with logged tables - if it shows similar
> regression, I'll be much more worried, because that's a fairly typical
> scenario (logged tables, data set > shared buffers), and we surely can't
> just go and break that.
>

Sure, please do those tests.

>> 2. We do see in some cases that granular_locking and no_content_lock
>> patches have shown a significant increase in contention on
>> CLOGControlLock.  I have already shared my analysis for the same upthread
>> [8].
>
>
> I do agree that in some cases this significantly reduces contention on the
> CLogControlLock. I do however think that currently the performance gains are
> limited almost exclusively to cases on unlogged tables, and some
> logged+async cases.
>

Right, because the contention is mainly visible for those workloads.

> On logged tables it usually looks like this (i.e. modest increase for high
> client counts at the expense of significantly higher variability):
>
>   http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64
>

What variability are you referring to in those results?

> or like this (i.e. only partial recovery for the drop above 36 clients):
>
>   http://tvondra.bitbucket.org/#pgbench-3000-logged-async-skip-64
>
> And 

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-26 Thread Tomas Vondra

On 10/25/2016 06:10 AM, Amit Kapila wrote:

On Mon, Oct 24, 2016 at 2:48 PM, Dilip Kumar  wrote:

On Fri, Oct 21, 2016 at 7:57 AM, Dilip Kumar  wrote:

On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
 wrote:


In the results you've posted on 10/12, you've mentioned a regression with 32
clients, where you got 52k tps on master but only 48k tps with the patch (so
~10% difference). I have no idea what scale was used for those tests,


That test was with scale factor 300 on a POWER 4-socket machine. I think
I need to repeat this test with multiple readings to confirm whether it was
a regression or run-to-run variation. I will do that soon and post the
results.


As promised, I have rerun my test (3 times), and I did not see any regression.



Thanks Tomas and Dilip for doing detailed performance tests for this
patch.  I would like to summarise the performance testing results.

1. With update intensive workload, we are seeing gains from 23%~192%
at client count >=64 with group_update patch [1].
2. With tpc-b pgbench workload (at 1000 scale factor), we are seeing
gains from 12% to ~70% at client count >=64 [2].  Tests are done on
8-socket intel   m/c.
3. With pgbench workload (both simple-update and tpc-b at 300 scale
factor), we are seeing gain 10% to > 50% at client count >=64 [3].
Tests are done on 8-socket intel m/c.
4. To see why the patch only helps at higher client count, we have
done wait event testing for various workloads [4], [5] and the results
indicate that at lower clients, the waits are mostly due to
transactionid or clientread.  At client-counts where contention due to
CLOGControlLock is significant, this patch helps a lot to reduce that
contention.  These tests are done on 8-socket intel m/c and
4-socket power m/c
5. With pgbench workload (unlogged tables), we are seeing gains from
15% to > 300% at client count >=72 [6].



It's not entirely clear which of the above tests were done on unlogged 
tables, and I don't see that in the referenced e-mails. That would be an 
interesting thing to mention in the summary, I think.



There are many more tests done for the proposed patches where gains
are either on similar lines as above or are neutral.  We do see
regression in some cases.

1. When data doesn't fit in shared buffers, there is regression at
some client counts [7], but on analysis it has been found that it is
mainly due to the shift in contention from CLOGControlLock to
WALWriteLock and or other locks.


The question is why shifting the lock contention to WALWriteLock should 
cause such a significant performance drop, particularly when the test was 
done on unlogged tables. Or, if that's the case, how it makes the 
performance drop less problematic / acceptable.


FWIW I plan to run the same test with logged tables - if it shows 
similar regression, I'll be much more worried, because that's a fairly 
typical scenario (logged tables, data set > shared buffers), and we 
surely can't just go and break that.



2. We do see in some cases that granular_locking and no_content_lock
patches have shown a significant increase in contention on 
CLOGControlLock.  I have already shared my analysis for the same upthread 
[8].


I do agree that in some cases this significantly reduces contention on the 
CLogControlLock. I do however think that currently the performance gains 
are limited almost exclusively to cases on unlogged tables, and some 
logged+async cases.


On logged tables it usually looks like this (i.e. modest increase for 
high client counts at the expense of significantly higher variability):


  http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64

or like this (i.e. only partial recovery for the drop above 36 clients):

  http://tvondra.bitbucket.org/#pgbench-3000-logged-async-skip-64

And of course, there are cases like this:

  http://tvondra.bitbucket.org/#dilip-300-logged-async

I'd really like to understand why the patched results behave that 
differently depending on client count.


>
> Attached is the latest group update clog patch.
>

How is that different from the previous versions?

>

In the last commit fest, the patch was returned with feedback to evaluate
the cases where it can show a win, and I think the above results indicate
that the patch has a significant benefit on various workloads.  What I
think is pending at this stage is that either one of the committers or
the reviewers of this patch needs to provide feedback on my analysis
[8] for the cases where the patches are not showing a win.

Thoughts?



I do agree that the patch(es) significantly reduce CLogControlLock contention, although 
with WAL logging enabled (which is what matters for most production 
deployments) it pretty much only shifts the contention to a different 
lock (so the immediate performance benefit is 0).


Which raises the question of why to commit this patch now, before we have a 
patch addressing the WAL locks. I realize this is a chicken-and-egg problem, 
but my worry is that the 

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-24 Thread Amit Kapila
On Mon, Oct 24, 2016 at 2:48 PM, Dilip Kumar  wrote:
> On Fri, Oct 21, 2016 at 7:57 AM, Dilip Kumar  wrote:
>> On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
>>  wrote:
>>
>>> In the results you've posted on 10/12, you've mentioned a regression with 32
>>> clients, where you got 52k tps on master but only 48k tps with the patch (so
>>> ~10% difference). I have no idea what scale was used for those tests,
>>
>> That test was with scale factor 300 on POWER 4 socket machine. I think
>> I need to repeat this test with multiple reading to confirm it was
>> regression or run to run variation. I will do that soon and post the
>> results.
>
> As promised, I have rerun my test (3 times), and I did not see any regression.
>

Thanks Tomas and Dilip for doing detailed performance tests for this
patch.  I would like to summarise the performance testing results.

1. With update intensive workload, we are seeing gains from 23%~192%
at client count >=64 with group_update patch [1].
2. With tpc-b pgbench workload (at 1000 scale factor), we are seeing
gains from 12% to ~70% at client count >=64 [2].  Tests are done on
8-socket intel   m/c.
3. With pgbench workload (both simple-update and tpc-b at 300 scale
factor), we are seeing gain 10% to > 50% at client count >=64 [3].
Tests are done on 8-socket intel m/c.
4. To see why the patch only helps at higher client count, we have
done wait event testing for various workloads [4], [5] and the results
indicate that at lower clients, the waits are mostly due to
transactionid or clientread.  At client-counts where contention due to
CLOGControlLock is significant, this patch helps a lot to reduce that
contention.  These tests are done on 8-socket intel m/c and
4-socket power m/c
5. With pgbench workload (unlogged tables), we are seeing gains from
15% to > 300% at client count >=72 [6].

There are many more tests done for the proposed patches where gains
are either on similar lines as above or are neutral.  We do see
regression in some cases.

1. When data doesn't fit in shared buffers, there is regression at
some client counts [7], but on analysis it has been found that it is
mainly due to the shift in contention from CLOGControlLock to
WALWriteLock and or other locks.
2. We do see in some cases that granular_locking and no_content_lock
patches have shown a significant increase in contention on
CLOGControlLock.  I have already shared my analysis for the same upthread
[8].

Attached is the latest group update clog patch.
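
To make the approach concrete for readers of this thread: the group
update idea is the familiar leader/follower pattern -- backends queue
themselves on a lock-free list, and whoever finds the list empty
becomes the leader, acquires CLogControlLock once, and applies the
status updates for everyone queued behind it.  The sketch below uses
invented names and plain C11 threads purely for illustration; it is
not the attached patch's code.

#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NBACKENDS   8
#define NONE        (-1)

typedef struct Backend
{
    int         status;         /* status this backend wants recorded */
    int         next;           /* next member of the pending group */
    atomic_int  done;           /* set by the leader once applied */
} Backend;

static Backend backends[NBACKENDS];
static atomic_int group_first = NONE;       /* head of the pending-member list */
static pthread_mutex_t clog_control_lock = PTHREAD_MUTEX_INITIALIZER;
static int fake_clog[NBACKENDS];            /* stands in for the clog page */

static void
group_set_status(int me, int status)
{
    int     head;

    backends[me].status = status;
    atomic_store(&backends[me].done, 0);

    /* publish ourselves on the pending list with a CAS loop */
    head = atomic_load(&group_first);
    do
        backends[me].next = head;
    while (!atomic_compare_exchange_weak(&group_first, &head, me));

    if (head != NONE)
    {
        /* follower: the leader records our status for us */
        while (!atomic_load(&backends[me].done))
            sched_yield();      /* the real patch sleeps on a semaphore */
        return;
    }

    /* leader: one lock acquisition services the whole group */
    pthread_mutex_lock(&clog_control_lock);
    for (int member = atomic_exchange(&group_first, NONE); member != NONE;)
    {
        int     next = backends[member].next;

        fake_clog[member] = backends[member].status;    /* the "clog write" */
        if (member != me)
            atomic_store(&backends[member].done, 1);
        member = next;
    }
    pthread_mutex_unlock(&clog_control_lock);
}

static void *
backend_main(void *arg)
{
    group_set_status((int) (intptr_t) arg, 1);      /* 1 = "committed" */
    return NULL;
}

int
main(void)
{
    pthread_t   tid[NBACKENDS];

    for (intptr_t i = 0; i < NBACKENDS; i++)
        pthread_create(&tid[i], NULL, backend_main, (void *) i);
    for (int i = 0; i < NBACKENDS; i++)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < NBACKENDS; i++)
        printf("backend %d -> %d\n", i, fake_clog[i]);
    return 0;
}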

In the last commit fest, the patch was returned with feedback to evaluate
the cases where it can show a win, and I think the above results indicate
that the patch has a significant benefit on various workloads.  What I
think is pending at this stage is that either one of the committers or
the reviewers of this patch needs to provide feedback on my analysis
[8] for the cases where the patches are not showing a win.

Thoughts?

[1] - 
https://www.postgresql.org/message-id/CAFiTN-u-XEzhd%3DhNGW586fmQwdTy6Qy6_SXe09tNB%3DgBcVzZ_A%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAFiTN-tr_%3D25EQUFezKNRk%3D4N-V%2BD6WMxo7HWs9BMaNx7S3y6w%40mail.gmail.com
[3] - 
https://www.postgresql.org/message-id/CAFiTN-v5hm1EO4cLXYmpppYdNQk%2Bn4N-O1m%2B%2B3U9f0Ga1gBzRQ%40mail.gmail.com
[4] - 
https://www.postgresql.org/message-id/CAFiTN-taV4iVkPHrxg%3DYCicKjBS6%3DQZm_cM4hbS_2q2ryLhUUw%40mail.gmail.com
[5] - 
https://www.postgresql.org/message-id/CAFiTN-uQ%2BJbd31cXvRbj48Ba6TqDUDpLKSPnsUCCYRju0Y0U8Q%40mail.gmail.com
[6] - http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip
[7] - http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip
[8] - 
https://www.postgresql.org/message-id/CAA4eK1J9VxJUnpOiQDf0O%3DZ87QUMbw%3DuGcQr4EaGbHSCibx9yA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


group_update_clog_v9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-24 Thread Dilip Kumar
On Fri, Oct 21, 2016 at 7:57 AM, Dilip Kumar  wrote:
> On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
>  wrote:
>
>> In the results you've posted on 10/12, you've mentioned a regression with 32
>> clients, where you got 52k tps on master but only 48k tps with the patch (so
>> ~10% difference). I have no idea what scale was used for those tests,
>
> That test was with scale factor 300 on POWER 4 socket machine. I think
> I need to repeat this test with multiple reading to confirm it was
> regression or run to run variation. I will do that soon and post the
> results.

As promised, I have rerun my test (3 times), and I did not see any regression.
The median of 3 runs on both head and with the group lock patch is the same.
However I am posting results of all three runs.

I think in my earlier reading, we saw TPS ~48K with the patch, but I
think over multiple runs we get this reading with both head as well as
with the patch.

Head:

run1:

transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 87784836
latency average = 0.656 ms
tps = 48769.327513 (including connections establishing)
tps = 48769.543276 (excluding connections establishing)

run2:
transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 91240374
latency average = 0.631 ms
tps = 50689.069717 (including connections establishing)
tps = 50689.263505 (excluding connections establishing)

run3:
transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 90966003
latency average = 0.633 ms
tps = 50536.639303 (including connections establishing)
tps = 50536.836924 (excluding connections establishing)

With group lock patch:
--
run1:
transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 87316264
latency average = 0.660 ms
tps = 48509.008040 (including connections establishing)
tps = 48509.194978 (excluding connections establishing)

run2:
transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 91950412
latency average = 0.626 ms
tps = 51083.507790 (including connections establishing)
tps = 51083.704489 (excluding connections establishing)

run3:
transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 90378462
latency average = 0.637 ms
tps = 50210.225983 (including connections establishing)
tps = 50210.405401 (excluding connections establishing)

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-21 Thread Amit Kapila
On Fri, Oct 21, 2016 at 1:07 PM, Tomas Vondra
 wrote:
> On 10/21/2016 08:13 AM, Amit Kapila wrote:
>>
>> On Fri, Oct 21, 2016 at 6:31 AM, Robert Haas 
>> wrote:
>>>
>>> On Thu, Oct 20, 2016 at 4:04 PM, Tomas Vondra
>>>  wrote:
>
> I then started a run at 96 clients which I accidentally killed shortly
> before it was scheduled to finish, but the results are not much
> different; there is no hint of the runaway CLogControlLock contention
> that Dilip sees on power2.
>
 What shared_buffer size were you using? I assume the data set fit into
 shared buffers, right?
>>>
>>>
>>> 8GB.
>>>
 FWIW as I explained in the lengthy post earlier today, I can actually
 reproduce the significant CLogControlLock contention (and the patches do
 reduce it), even on x86_64.
>>>
>>>
>>> /me goes back, rereads post.  Sorry, I didn't look at this carefully
>>> the first time.
>>>
 For example consider these two tests:

 * http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
 * http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip

 However, it seems I can also reproduce fairly bad regressions, like for
 example this case with data set exceeding shared_buffers:

 * http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip
>>>
>>>
>>> I'm not sure how seriously we should take the regressions.  I mean,
>>> what I see there is that CLogControlLock contention goes down by about
>>> 50% -- which is the point of the patch -- and WALWriteLock contention
>>> goes up dramatically -- which sucks, but can't really be blamed on the
>>> patch except in the indirect sense that a backend can't spend much
>>> time waiting for A if it's already spending all of its time waiting
>>> for B.
>>>
>>
>> Right, I think not only WALWriteLock, but contention on other locks
>> also goes up as you can see in below table.  I think there is nothing
>> much we can do for that with this patch.  One thing which is unclear
>> is why on unlogged tests it is showing WALWriteLock?
>>
>
> Well, although we don't write the table data to the WAL, we still need to
> write commits and other stuff, right?
>

We do need to write the commit record, but do we need to flush it
immediately to WAL for unlogged tables?  It seems we allow the WAL writer
to do that; refer to the logic in RecordTransactionCommit.
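
For readers skimming the archive, that decision looks roughly like the
following condensed paraphrase (stand-in types and stub bodies, and
details such as the xid-assigned check folded away -- this is not the
actual xact.c source):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* stubs standing in for the real routines */
static void XLogFlush(XLogRecPtr upto)        { (void) upto; } /* wait for WAL fsync */
static void XLogSetAsyncXactLSN(XLogRecPtr l) { (void) l; }    /* leave it to walwriter */

/*
 * Condensed shape of the commit-record flush decision.  'wrote_xlog'
 * means the transaction wrote WAL of its own before the commit record;
 * for a pure unlogged-table update that is typically false, so the
 * commit record is handed to the WAL writer rather than flushed by the
 * committing backend.
 */
static void
commit_flush_decision(bool wrote_xlog, bool sync_commit_on,
                      bool force_sync, int ndeleted_rels,
                      XLogRecPtr commit_lsn)
{
    if ((wrote_xlog && sync_commit_on) || force_sync || ndeleted_rels > 0)
        XLogFlush(commit_lsn);              /* synchronous commit path */
    else
        XLogSetAsyncXactLSN(commit_lsn);    /* asynchronous commit path */
}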

> And on scale 3000 (which exceeds the
> 16GB shared buffers in this case), there's a continuous stream of dirty
> pages (not to WAL, but evicted from shared buffers), so iostat looks like
> this:
>
>   timetps  wr_sec/s  avgrq-sz  avgqu-sz await   %util
>   08:48:21  81654   1367483 16.75 127264.60   1294.80   97.41
>   08:48:31  41514697516 16.80 103271.11   3015.01   97.64
>   08:48:41  78892   1359779 17.24  97308.42928.36   96.76
>   08:48:51  58735978475 16.66  92303.00   1472.82   95.92
>   08:49:01  62441   1068605 17.11  78482.71   1615.56   95.57
>   08:49:11  55571945365 17.01 113672.62   1923.37   98.07
>   08:49:21  69016   1161586 16.83  87055.66   1363.05   95.53
>   08:49:31  54552913461 16.74  98695.87   1761.30   97.84
>
> That's ~500-600 MB/s of continuous writes. I'm sure the storage could handle
> more than this (will do some testing after the tests complete), but surely
> the WAL has to compete for bandwidth (it's on the same volume / devices).
> Another thing is that we only have 8 WAL insert locks, and maybe that leads
> to contention with such high client counts.
>

Yeah, quite possible, but I don't think increasing that would benefit
in general, because while writing WAL we need to take all the
wal_insert locks. In any case, I think that is a separate problem to
study.
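
To spell out why a WAL flush involves every insertion lock (a rough
paraphrase with stand-in helpers, not the real
WaitXLogInsertionsToFinish code): before flushing up to a target LSN,
the flusher must examine all NUM_XLOGINSERT_LOCKS insertion slots and
wait out any insertion still in progress below that point, so per-flush
cost grows with the number of insertion locks rather than shrinking as
more are added:

#include <stdint.h>

#define NUM_XLOGINSERT_LOCKS 8      /* compile-time constant in xlog.c */

typedef uint64_t XLogRecPtr;

/*
 * Stand-in for waiting on one insertion slot: block until the slot has
 * no insertion in progress below 'upto', and report how far it had
 * actually gotten.
 */
static XLogRecPtr
wait_one_insertion_slot(int slot, XLogRecPtr upto)
{
    (void) slot;
    return upto;        /* stub: pretend the slot is already past 'upto' */
}

/*
 * The furthest point that can safely be fsynced is the minimum over all
 * slots of how far insertions have completed, so every slot is visited.
 */
static XLogRecPtr
wait_all_insertions(XLogRecPtr upto)
{
    XLogRecPtr  finished_upto = upto;

    for (int i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
    {
        XLogRecPtr  inserting_at = wait_one_insertion_slot(i, upto);

        if (inserting_at < finished_upto)
            finished_upto = inserting_at;
    }
    return finished_upto;
}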


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-21 Thread Tomas Vondra

On 10/21/2016 08:13 AM, Amit Kapila wrote:

On Fri, Oct 21, 2016 at 6:31 AM, Robert Haas  wrote:

On Thu, Oct 20, 2016 at 4:04 PM, Tomas Vondra
 wrote:

I then started a run at 96 clients which I accidentally killed shortly
before it was scheduled to finish, but the results are not much
different; there is no hint of the runaway CLogControlLock contention
that Dilip sees on power2.


What shared_buffer size were you using? I assume the data set fit into
shared buffers, right?


8GB.


FWIW as I explained in the lengthy post earlier today, I can actually
reproduce the significant CLogControlLock contention (and the patches do
reduce it), even on x86_64.


/me goes back, rereads post.  Sorry, I didn't look at this carefully
the first time.


For example consider these two tests:

* http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
* http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip

However, it seems I can also reproduce fairly bad regressions, like for
example this case with data set exceeding shared_buffers:

* http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip


I'm not sure how seriously we should take the regressions.  I mean,
what I see there is that CLogControlLock contention goes down by about
50% -- which is the point of the patch -- and WALWriteLock contention
goes up dramatically -- which sucks, but can't really be blamed on the
patch except in the indirect sense that a backend can't spend much
time waiting for A if it's already spending all of its time waiting
for B.



Right, I think not only WALWriteLock, but contention on other locks
also goes up, as you can see in the table below.  I think there is nothing
much we can do about that with this patch.  One thing which is unclear
is why the unlogged tests are showing WALWriteLock?



Well, although we don't write the table data to the WAL, we still need 
to write commits and other stuff, right? And on scale 3000 (which 
exceeds the 16GB shared buffers in this case), there's a continuous 
stream of dirty pages (not to WAL, but evicted from shared buffers), so 
iostat looks like this:


  timetps  wr_sec/s  avgrq-sz  avgqu-sz await   %util
  08:48:21  81654   1367483 16.75 127264.60   1294.80   97.41
  08:48:31  41514697516 16.80 103271.11   3015.01   97.64
  08:48:41  78892   1359779 17.24  97308.42928.36   96.76
  08:48:51  58735978475 16.66  92303.00   1472.82   95.92
  08:49:01  62441   1068605 17.11  78482.71   1615.56   95.57
  08:49:11  55571945365 17.01 113672.62   1923.37   98.07
  08:49:21  69016   1161586 16.83  87055.66   1363.05   95.53
  08:49:31  54552913461 16.74  98695.87   1761.30   97.84

That's ~500-600 MB/s of continuous writes. I'm sure the storage could 
handle more than this (will do some testing after the tests complete), 
but surely the WAL has to compete for bandwidth (it's on the same volume 
/ devices). Another thing is that we only have 8 WAL insert locks, and 
maybe that leads to contention with such high client counts.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-21 Thread Amit Kapila
On Fri, Oct 21, 2016 at 6:31 AM, Robert Haas  wrote:
> On Thu, Oct 20, 2016 at 4:04 PM, Tomas Vondra
>  wrote:
>>> I then started a run at 96 clients which I accidentally killed shortly
>>> before it was scheduled to finish, but the results are not much
>>> different; there is no hint of the runaway CLogControlLock contention
>>> that Dilip sees on power2.
>>>
>> What shared_buffer size were you using? I assume the data set fit into
>> shared buffers, right?
>
> 8GB.
>
>> FWIW as I explained in the lengthy post earlier today, I can actually
>> reproduce the significant CLogControlLock contention (and the patches do
>> reduce it), even on x86_64.
>
> /me goes back, rereads post.  Sorry, I didn't look at this carefully
> the first time.
>
>> For example consider these two tests:
>>
>> * http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
>> * http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip
>>
>> However, it seems I can also reproduce fairly bad regressions, like for
>> example this case with data set exceeding shared_buffers:
>>
>> * http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip
>
> I'm not sure how seriously we should take the regressions.  I mean,
> what I see there is that CLogControlLock contention goes down by about
> 50% -- which is the point of the patch -- and WALWriteLock contention
> goes up dramatically -- which sucks, but can't really be blamed on the
> patch except in the indirect sense that a backend can't spend much
> time waiting for A if it's already spending all of its time waiting
> for B.
>

Right, I think not only WALWriteLock, but contention on other locks
also goes up, as you can see in the table below.  I think there is nothing
much we can do about that with this patch.  One thing which is unclear
is why the unlogged tests are showing WALWriteLock?


              test               | clients | wait_event_type |   wait_event    | master | granular_locking | no_content_lock | group_update
---------------------------------+---------+-----------------+-----------------+--------+------------------+-----------------+--------------
 pgbench-3000-unlogged-sync-skip |      72 | LWLockNamed     | CLogControlLock | 217012 |            37326 |           32288 |        12040
 pgbench-3000-unlogged-sync-skip |      72 | LWLockNamed     | WALWriteLock    |  13188 |           104183 |          123359 |       103267
 pgbench-3000-unlogged-sync-skip |      72 | LWLockTranche   | buffer_content  |  10532 |            65880 |           57007 |        86176
 pgbench-3000-unlogged-sync-skip |      72 | LWLockTranche   | wal_insert      |   9280 |            85917 |          109472 |        99609
 pgbench-3000-unlogged-sync-skip |      72 | LWLockTranche   | clog            |   4623 |            25692 |           10422 |        11755




-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Dilip Kumar
On Thu, Oct 20, 2016 at 9:15 PM, Robert Haas  wrote:
> So here's my theory.  The whole reason why Tomas is having difficulty
> seeing any big effect from these patches is because he's testing on
> x86.  When Dilip tests on x86, he doesn't see a big effect either,
> regardless of workload.  But when Dilip tests on POWER, which I think
> is where he's mostly been testing, he sees a huge effect, because for
> some reason POWER has major problems with this lock that don't exist
> on x86.

Right, because on POWER we can see big contention on ClogControlLock
with 300 scale factor, even at 96 client count, but on X86 with 300
scale factor there is almost no contention on ClogControlLock.

However at 1000 scale factor we can see significant contention on
ClogControlLock on X86 machine.

I want to test on POWER with 1000 scale factor to see whether
contention on ClogControlLock becomes much worse?

I will run this test and post the results.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Dilip Kumar
On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
 wrote:

> In the results you've posted on 10/12, you've mentioned a regression with 32
> clients, where you got 52k tps on master but only 48k tps with the patch (so
> ~10% difference). I have no idea what scale was used for those tests,

That test was with scale factor 300 on a POWER 4-socket machine. I think
I need to repeat this test with multiple readings to confirm whether it was
a regression or run-to-run variation. I will do that soon and post the
results.

> and I
> see no such regression in the current results (but you only report results
> for some of the client counts).

This test is on the X86 8-socket machine. At 1000 scale factor I have
given readings with all client counts (32, 64, 96, 192), but at 300 scale
factor I posted only with 192 because on this machine (X86 8-socket
machine) I did not see much load on ClogControlLock at 300 scale
factor.
>
> Also, which of the proposed patches have you been testing?
I tested with the GroupLock patch.

> Can you collect and share a more complete set of data, perhaps based on the
> scripts I use to do tests on the large machine with 36/72 cores, available
> at https://bitbucket.org/tvondra/hp05-results ?

I think from my last run I did not share data for the X86 8-socket
machine at 300 scale factor with 32, 64, and 96 clients. I already have
that data, so I am sharing it. (Please let me know if you want to see some
other client count; for that I need to run another test.)

Head:
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 77233356
latency average: 0.746 ms
tps = 42907.363243 (including connections establishing)
tps = 42907.546190 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 300_32_ul.txt
 111757  |
   3666
   1289  LWLockNamed | ProcArrayLock
   1142  Lock| transactionid
318  LWLockNamed | CLogControlLock
299  Lock| extend
109  LWLockNamed | XidGenLock
 70  LWLockTranche   | buffer_content
 35  Lock| tuple
 29  LWLockTranche   | lock_manager
 14  LWLockTranche   | wal_insert
  1 Tuples only is on.
  1  LWLockNamed | CheckpointerCommLock

Group Lock Patch:

scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 77544028
latency average: 0.743 ms
tps = 43079.783906 (including connections establishing)
tps = 43079.960331 (excluding connections establishing)
 112209  |
   3718
   1402  LWLockNamed | ProcArrayLock
   1070  Lock| transactionid
245  LWLockNamed | CLogControlLock
188  Lock| extend
 80  LWLockNamed | XidGenLock
 76  LWLockTranche   | buffer_content
 39  LWLockTranche   | lock_manager
 31  Lock| tuple
  7  LWLockTranche   | wal_insert
  1 Tuples only is on.
  1  LWLockTranche   | buffer_mapping

Head:
number of clients: 64
number of threads: 64
duration: 1800 s
number of transactions actually processed: 76211698
latency average: 1.512 ms
tps = 42339.731054 (including connections establishing)
tps = 42339.930464 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 300_64_ul.txt
 215734  |
   5106  Lock| transactionid
   3754  LWLockNamed | ProcArrayLock
   3669
   3267  LWLockNamed | CLogControlLock
661  Lock| extend
339  LWLockNamed | XidGenLock
310  Lock| tuple
289  LWLockTranche   | buffer_content
205  LWLockTranche   | lock_manager
 50  LWLockTranche   | wal_insert
  2  LWLockTranche   | buffer_mapping
  1 Tuples only is on.
  1  LWLockTranche   | proc

GroupLock patch:
scaling factor: 300
query mode: prepared
number of clients: 64
number of threads: 64
duration: 1800 s
number of transactions actually processed: 76629309
latency average: 1.503 ms
tps = 42571.704635 (including connections establishing)
tps = 42571.905157 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 300_64_ul.txt
 217840  |
   5197  Lock| transactionid
   3744  LWLockNamed | ProcArrayLock
   3663
966  Lock| extend
849  LWLockNamed | CLogControlLock
372  Lock| tuple
305  LWLockNamed | XidGenLock
199  LWLockTranche   | buffer_content
184  LWLockTranche   | lock_manager
 35  LWLockTranche   | wal_insert
  1 Tuples only is on.
  1  LWLockTranche   | proc
  1  LWLockTranche   | buffer_mapping

Head:
scaling factor: 300
query mode: prepared
number of clients: 96
number of threads: 96
duration: 1800 s
number of transactions actually processed: 77663593
latency average: 2.225 ms
tps = 43145.624864 (including connections establishing)
tps = 43145.838167 (excluding connections establishing)

 302317 

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 4:04 PM, Tomas Vondra
 wrote:
>> I then started a run at 96 clients which I accidentally killed shortly
>> before it was scheduled to finish, but the results are not much
>> different; there is no hint of the runaway CLogControlLock contention
>> that Dilip sees on power2.
>>
> What shared_buffer size were you using? I assume the data set fit into
> shared buffers, right?

8GB.

> FWIW as I explained in the lengthy post earlier today, I can actually
> reproduce the significant CLogControlLock contention (and the patches do
> reduce it), even on x86_64.

/me goes back, rereads post.  Sorry, I didn't look at this carefully
the first time.

> For example consider these two tests:
>
> * http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
> * http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip
>
> However, it seems I can also reproduce fairly bad regressions, like for
> example this case with data set exceeding shared_buffers:
>
> * http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip

I'm not sure how seriously we should take the regressions.  I mean,
what I see there is that CLogControlLock contention goes down by about
50% -- which is the point of the patch -- and WALWriteLock contention
goes up dramatically -- which sucks, but can't really be blamed on the
patch except in the indirect sense that a backend can't spend much
time waiting for A if it's already spending all of its time waiting
for B.  It would be nice to know why it happened, but we shouldn't
allow CLogControlLock to act as an admission control facility for
WALWriteLock (I think).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Tomas Vondra

On 10/20/2016 07:59 PM, Robert Haas wrote:

On Thu, Oct 20, 2016 at 11:45 AM, Robert Haas  wrote:

On Thu, Oct 20, 2016 at 3:36 AM, Dilip Kumar  wrote:

On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas  wrote:

>>

...

So here's my theory.  The whole reason why Tomas is having difficulty
seeing any big effect from these patches is because he's testing on
x86.  When Dilip tests on x86, he doesn't see a big effect either,
regardless of workload.  But when Dilip tests on POWER, which I think
is where he's mostly been testing, he sees a huge effect, because for
some reason POWER has major problems with this lock that don't exist
on x86.

If that's so, then we ought to be able to reproduce the big gains on
hydra, a community POWER server.  In fact, I think I'll go run a quick
test over there right now...


And ... nope.  I ran a 30-minute pgbench test on unpatched master
using unlogged tables at scale factor 300 with 64 clients and got
these results:

 14  LWLockTranche   | wal_insert
 36  LWLockTranche   | lock_manager
 45  LWLockTranche   | buffer_content
223  Lock| tuple
527  LWLockNamed | CLogControlLock
921  Lock| extend
   1195  LWLockNamed | XidGenLock
   1248  LWLockNamed | ProcArrayLock
   3349  Lock| transactionid
  85957  Client  | ClientRead
 135935  |

I then started a run at 96 clients which I accidentally killed shortly
before it was scheduled to finish, but the results are not much
different; there is no hint of the runaway CLogControlLock contention
that Dilip sees on power2.



What shared_buffer size were you using? I assume the data set fit into 
shared buffers, right?


FWIW as I explained in the lengthy post earlier today, I can actually 
reproduce the significant CLogControlLock contention (and the patches do 
reduce it), even on x86_64.


For example consider these two tests:

* http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
* http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip

However, it seems I can also reproduce fairly bad regressions, like for 
example this case with data set exceeding shared_buffers:


* http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 11:45 AM, Robert Haas  wrote:
> On Thu, Oct 20, 2016 at 3:36 AM, Dilip Kumar  wrote:
>> On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas  wrote:
>>> I agree with these conclusions.  I had a chance to talk with Andres
>>> this morning at Postgres Vision and based on that conversation I'd
>>> like to suggest a couple of additional tests:
>>>
>>> 1. Repeat this test on x86.  In particular, I think you should test on
>>> the EnterpriseDB server cthulhu, which is an 8-socket x86 server.
>>
>> I have done my test on cthulhu, basic difference is that In POWER we
>> saw ClogControlLock on top at 96 and more client with 300 scale
>> factor. But, on cthulhu at 300 scale factor transactionid lock is
>> always on top. So I repeated my test with 1000 scale factor as well on
>> cthulhu.
>
> So the upshot appears to be that this problem is a lot worse on power2
> than cthulhu, which suggests that this is architecture-dependent.  I
> guess it could also be kernel-dependent, but it doesn't seem likely,
> because:
>
> power2: Red Hat Enterprise Linux Server release 7.1 (Maipo),
> 3.10.0-229.14.1.ael7b.ppc64le
> cthulhu: CentOS Linux release 7.2.1511 (Core), 3.10.0-229.7.2.el7.x86_64
>
> So here's my theory.  The whole reason why Tomas is having difficulty
> seeing any big effect from these patches is because he's testing on
> x86.  When Dilip tests on x86, he doesn't see a big effect either,
> regardless of workload.  But when Dilip tests on POWER, which I think
> is where he's mostly been testing, he sees a huge effect, because for
> some reason POWER has major problems with this lock that don't exist
> on x86.
>
> If that's so, then we ought to be able to reproduce the big gains on
> hydra, a community POWER server.  In fact, I think I'll go run a quick
> test over there right now...

And ... nope.  I ran a 30-minute pgbench test on unpatched master
using unlogged tables at scale factor 300 with 64 clients and got
these results:

 14  LWLockTranche   | wal_insert
 36  LWLockTranche   | lock_manager
 45  LWLockTranche   | buffer_content
223  Lock| tuple
527  LWLockNamed | CLogControlLock
921  Lock| extend
   1195  LWLockNamed | XidGenLock
   1248  LWLockNamed | ProcArrayLock
   3349  Lock| transactionid
  85957  Client  | ClientRead
 135935  |

I then started a run at 96 clients which I accidentally killed shortly
before it was scheduled to finish, but the results are not much
different; there is no hint of the runaway CLogControlLock contention
that Dilip sees on power2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 3:36 AM, Dilip Kumar  wrote:
> On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas  wrote:
>> I agree with these conclusions.  I had a chance to talk with Andres
>> this morning at Postgres Vision and based on that conversation I'd
>> like to suggest a couple of additional tests:
>>
>> 1. Repeat this test on x86.  In particular, I think you should test on
>> the EnterpriseDB server cthulhu, which is an 8-socket x86 server.
>
> I have done my test on cthulhu, basic difference is that In POWER we
> saw ClogControlLock on top at 96 and more client with 300 scale
> factor. But, on cthulhu at 300 scale factor transactionid lock is
> always on top. So I repeated my test with 1000 scale factor as well on
> cthulhu.

So the upshot appears to be that this problem is a lot worse on power2
than cthulhu, which suggests that this is architecture-dependent.  I
guess it could also be kernel-dependent, but it doesn't seem likely,
because:

power2: Red Hat Enterprise Linux Server release 7.1 (Maipo),
3.10.0-229.14.1.ael7b.ppc64le
cthulhu: CentOS Linux release 7.2.1511 (Core), 3.10.0-229.7.2.el7.x86_64

So here's my theory.  The whole reason why Tomas is having difficulty
seeing any big effect from these patches is because he's testing on
x86.  When Dilip tests on x86, he doesn't see a big effect either,
regardless of workload.  But when Dilip tests on POWER, which I think
is where he's mostly been testing, he sees a huge effect, because for
some reason POWER has major problems with this lock that don't exist
on x86.

If that's so, then we ought to be able to reproduce the big gains on
hydra, a community POWER server.  In fact, I think I'll go run a quick
test over there right now...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Tomas Vondra

On 10/20/2016 09:36 AM, Dilip Kumar wrote:

On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas  wrote:

I agree with these conclusions.  I had a chance to talk with Andres
this morning at Postgres Vision and based on that conversation I'd
like to suggest a couple of additional tests:

1. Repeat this test on x86.  In particular, I think you should test on
the EnterpriseDB server cthulhu, which is an 8-socket x86 server.


I have done my test on cthulhu; the basic difference is that on POWER we
saw ClogControlLock on top at 96 and more clients with 300 scale
factor. But on cthulhu at 300 scale factor, the transactionid lock is
always on top. So I repeated my test with 1000 scale factor as well on
cthulhu.

All configuration is same as my last test.

Test with 1000 scale factor
-

Test1: number of clients: 192

Head:
tps = 21206.108856 (including connections establishing)
tps = 21206.245441 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt
 310489  LWLockNamed | CLogControlLock
 296152  |
  35537  Lock| transactionid
  15821  LWLockTranche   | buffer_mapping
  10342  LWLockTranche   | buffer_content
   8427  LWLockTranche   | clog
   3961
   3165  Lock| extend
   2861  Lock| tuple
   2781  LWLockNamed | ProcArrayLock
   1104  LWLockNamed | XidGenLock
745  LWLockTranche   | lock_manager
371  LWLockNamed | CheckpointerCommLock
 70  LWLockTranche   | wal_insert
  5  BufferPin   | BufferPin
  3  LWLockTranche   | proc

Patch:
tps = 28725.038933 (including connections establishing)
tps = 28725.367102 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt
 540061  |
  57810  LWLockNamed | CLogControlLock
  36264  LWLockTranche   | buffer_mapping
  29976  Lock| transactionid
   4770  Lock| extend
   4735  LWLockTranche   | clog
   4479  LWLockNamed | ProcArrayLock
   4006
   3955  LWLockTranche   | buffer_content
   2505  LWLockTranche   | lock_manager
   2179  Lock| tuple
   1977  LWLockNamed | XidGenLock
905  LWLockNamed | CheckpointerCommLock
222  LWLockTranche   | wal_insert
  8  LWLockTranche   | proc

Test2: number of clients: 96

Head:
tps = 25447.861572 (including connections establishing)
tps = 25448.012739 (excluding connections establishing)
 261611  |
  69604  LWLockNamed | CLogControlLock
   6119  Lock| transactionid
   4008
   2874  LWLockTranche   | buffer_mapping
   2578  LWLockTranche   | buffer_content
   2355  LWLockNamed | ProcArrayLock
   1245  Lock| extend
   1168  LWLockTranche   | clog
232  Lock| tuple
217  LWLockNamed | CheckpointerCommLock
160  LWLockNamed | XidGenLock
158  LWLockTranche   | lock_manager
 78  LWLockTranche   | wal_insert
  5  BufferPin   | BufferPin

Patch:
tps = 32708.368938 (including connections establishing)
tps = 32708.765989 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_96_ul.txt
 326601  |
   7471  LWLockNamed | CLogControlLock
   5387  Lock| transactionid
   4018
   3331  LWLockTranche   | buffer_mapping
   3144  LWLockNamed | ProcArrayLock
   1372  Lock| extend
722  LWLockTranche   | buffer_content
393  LWLockNamed | XidGenLock
237  LWLockTranche   | lock_manager
234  Lock| tuple
194  LWLockTranche   | clog
 96  Lock| relation
 88  LWLockTranche   | wal_insert
 34  LWLockNamed | CheckpointerCommLock

Test3: number of clients: 64

Head:

tps = 28264.194438 (including connections establishing)
tps = 28264.336270 (excluding connections establishing)

 218264  |
  10314  LWLockNamed | CLogControlLock
   4019
   2067  Lock| transactionid
   1950  LWLockTranche   | buffer_mapping
   1879  LWLockNamed | ProcArrayLock
592  Lock| extend
565  LWLockTranche   | buffer_content
222  LWLockNamed | XidGenLock
143  LWLockTranche   | clog
131  LWLockNamed | CheckpointerCommLock
 63  LWLockTranche   | lock_manager
 52  Lock| tuple
 35  LWLockTranche   | wal_insert

Patch:
tps = 27906.376194 (including connections establishing)
tps = 27906.531392 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_64_ul.txt
 228108  |
   4039
   2294  Lock| transactionid
   2116  LWLockTranche   | buffer_mapping
   1757  LWLockNamed | ProcArrayLock
   1553  LWLockNamed | CLogControlLock
800  Lock| extend
403  LWLockTranche   | buffer_content
 92  LWLockNamed | XidGenLock
 74  LWLockTranche   | lock_manager
 42  Lock| tuple
 35  LWLockTranche   | wal_insert
 34  LWLockTranche   | clog
 

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Dilip Kumar
On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas  wrote:
> I agree with these conclusions.  I had a chance to talk with Andres
> this morning at Postgres Vision and based on that conversation I'd
> like to suggest a couple of additional tests:
>
> 1. Repeat this test on x86.  In particular, I think you should test on
> the EnterpriseDB server cthulhu, which is an 8-socket x86 server.

I have done my test on cthulhu. The basic difference is that on POWER we
saw ClogControlLock on top at 96 and more clients with 300 scale
factor, but on cthulhu at 300 scale factor the transactionid lock is
always on top. So I repeated my test with 1000 scale factor as well on
cthulhu.

All configuration is the same as in my last test.

Test with 1000 scale factor
-

Test1: number of clients: 192

Head:
tps = 21206.108856 (including connections establishing)
tps = 21206.245441 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt
 310489  LWLockNamed | CLogControlLock
 296152  |
  35537  Lock| transactionid
  15821  LWLockTranche   | buffer_mapping
  10342  LWLockTranche   | buffer_content
   8427  LWLockTranche   | clog
   3961
   3165  Lock| extend
   2861  Lock| tuple
   2781  LWLockNamed | ProcArrayLock
   1104  LWLockNamed | XidGenLock
745  LWLockTranche   | lock_manager
371  LWLockNamed | CheckpointerCommLock
 70  LWLockTranche   | wal_insert
  5  BufferPin   | BufferPin
  3  LWLockTranche   | proc

Patch:
tps = 28725.038933 (including connections establishing)
tps = 28725.367102 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt
 540061  |
  57810  LWLockNamed | CLogControlLock
  36264  LWLockTranche   | buffer_mapping
  29976  Lock| transactionid
   4770  Lock| extend
   4735  LWLockTranche   | clog
   4479  LWLockNamed | ProcArrayLock
   4006
   3955  LWLockTranche   | buffer_content
   2505  LWLockTranche   | lock_manager
   2179  Lock| tuple
   1977  LWLockNamed | XidGenLock
905  LWLockNamed | CheckpointerCommLock
222  LWLockTranche   | wal_insert
  8  LWLockTranche   | proc

Test2: number of clients: 96

Head:
tps = 25447.861572 (including connections establishing)
tps = 25448.012739 (excluding connections establishing)
 261611  |
  69604  LWLockNamed | CLogControlLock
   6119  Lock| transactionid
   4008
   2874  LWLockTranche   | buffer_mapping
   2578  LWLockTranche   | buffer_content
   2355  LWLockNamed | ProcArrayLock
   1245  Lock| extend
   1168  LWLockTranche   | clog
232  Lock| tuple
217  LWLockNamed | CheckpointerCommLock
160  LWLockNamed | XidGenLock
158  LWLockTranche   | lock_manager
 78  LWLockTranche   | wal_insert
  5  BufferPin   | BufferPin

Patch:
tps = 32708.368938 (including connections establishing)
tps = 32708.765989 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_96_ul.txt
 326601  |
   7471  LWLockNamed | CLogControlLock
   5387  Lock| transactionid
   4018
   3331  LWLockTranche   | buffer_mapping
   3144  LWLockNamed | ProcArrayLock
   1372  Lock| extend
722  LWLockTranche   | buffer_content
393  LWLockNamed | XidGenLock
237  LWLockTranche   | lock_manager
234  Lock| tuple
194  LWLockTranche   | clog
 96  Lock| relation
 88  LWLockTranche   | wal_insert
 34  LWLockNamed | CheckpointerCommLock

Test3: number of clients: 64

Head:

tps = 28264.194438 (including connections establishing)
tps = 28264.336270 (excluding connections establishing)

 218264  |
  10314  LWLockNamed | CLogControlLock
   4019
   2067  Lock| transactionid
   1950  LWLockTranche   | buffer_mapping
   1879  LWLockNamed | ProcArrayLock
592  Lock| extend
565  LWLockTranche   | buffer_content
222  LWLockNamed | XidGenLock
143  LWLockTranche   | clog
131  LWLockNamed | CheckpointerCommLock
 63  LWLockTranche   | lock_manager
 52  Lock| tuple
 35  LWLockTranche   | wal_insert

Patch:
tps = 27906.376194 (including connections establishing)
tps = 27906.531392 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_64_ul.txt
 228108  |
   4039
   2294  Lock| transactionid
   2116  LWLockTranche   | buffer_mapping
   1757  LWLockNamed | ProcArrayLock
   1553  LWLockNamed | CLogControlLock
800  Lock| extend
403  LWLockTranche   | buffer_content
 92  LWLockNamed | XidGenLock
 74  LWLockTranche   | lock_manager
 42  Lock| tuple
 35  LWLockTranche   | wal_insert
 34  LWLockTranche   | clog
 14  LWLockNamed | 

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-15 Thread Amit Kapila
On Thu, Oct 13, 2016 at 7:53 AM, Tomas Vondra
 wrote:
> On 10/12/2016 08:55 PM, Robert Haas wrote:
>> On Wed, Oct 12, 2016 at 3:21 AM, Dilip Kumar  wrote:
>>> I think at higher client count from client count 96 onwards contention
>>> on CLogControlLock is clearly visible and which is completely solved
>>> with group lock patch.
>>>
>>> And at lower client count 32,64  contention on CLogControlLock is not
>>> significant hence we can not see any gain with group lock patch.
>>> (though we can see some contention on CLogControlLock is reduced at 64
>>> clients.)
>>
>> I agree with these conclusions.  I had a chance to talk with Andres
>> this morning at Postgres Vision and based on that conversation I'd
>> like to suggest a couple of additional tests:
>>
>> 1. Repeat this test on x86.  In particular, I think you should test on
>> the EnterpriseDB server cthulhu, which is an 8-socket x86 server.
>>
>> 2. Repeat this test with a mixed read-write workload, like -b
>> tpcb-like@1 -b select-only@9
>>
>
> FWIW, I'm already running similar benchmarks on an x86 machine with 72
> cores (144 with HT). It's "just" a 4-socket system, but the results I
> got so far seem quite interesting. The tooling and results (pushed
> incrementally) are available here:
>
> https://bitbucket.org/tvondra/hp05-results/overview
>
> The tooling is completely automated, and it also collects various stats,
> like for example the wait event. So perhaps we could simply run it on
> ctulhu and get comparable results, and also more thorough data sets than
> just snippets posted to the list?
>
> There's also a bunch of reports for the 5 already completed runs
>
>  - dilip-300-logged-sync
>  - dilip-300-unlogged-sync
>  - pgbench-300-logged-sync-skip
>  - pgbench-300-unlogged-sync-noskip
>  - pgbench-300-unlogged-sync-skip
>
> The name identifies the workload type, scale and whether the tables are
> wal-logged (for pgbench the "skip" means "-N" while "noskip" does
> regular pgbench).
>
> For example the "reports/wait-events-count-patches.txt" compares the
> wait even stats with different patches applied (and master):
>
> https://bitbucket.org/tvondra/hp05-results/src/506d0bee9e6557b015a31d72f6c3506e3f198c17/reports/wait-events-count-patches.txt?at=master=file-view-default
>
> and average tps (from 3 runs, 5 minutes each):
>
> https://bitbucket.org/tvondra/hp05-results/src/506d0bee9e6557b015a31d72f6c3506e3f198c17/reports/tps-avg-patches.txt?at=master=file-view-default
>
> There are certainly interesting bits. For example while the "logged"
> case is dominated y WALWriteLock for most client counts, for large
> client counts that's no longer true.
>
> Consider for example dilip-300-logged-sync results with 216 clients:
>
>  wait_event  | master  | gran_lock | no_cont_lock | group_upd
>  +-+---+--+---
>  CLogControlLock |  624566 |474261 |   458599 |225338
>  WALWriteLock|  431106 |623142 |   619596 |699224
>  |  331542 |358220 |   371393 |537076
>  buffer_content  |  261308 |134764 |   138664 |102057
>  ClientRead  |   59826 |100883 |   103609 |118379
>  transactionid   |   26966 | 23155 |23815 | 31700
>  ProcArrayLock   |3967 |  3852 | 4070 |  4576
>  wal_insert  |3948 | 10430 | 9513 | 12079
>  clog|1710 |  4006 | 2443 |   925
>  XidGenLock  |1689 |  3785 | 4229 |  3539
>  tuple   | 965 |   617 |  655 |   840
>  lock_manager| 300 |   571 |  619 |   802
>  WALBufMappingLock   | 168 |   140 |  158 |   147
>  SubtransControlLock |  60 |   115 |  124 |   105
>
> Clearly, CLOG is an issue here, and it's (slightly) improved by all the
> patches (group_update performing the best). And with 288 clients (which
> is 2x the number of virtual cores in the machine, so not entirely crazy)
> you get this:
>
>  wait_event  | master  | gran_lock | no_cont_lock | group_upd
>  +-+---+--+---
>  CLogControlLock |  901670 |736822 |   728823 |398111
>  buffer_content  |  492637 |318129 |   319251 |270416
>  WALWriteLock|  414371 |593804 |   589809 |656613
>  |  380344 |452936 |   470178 |745790
>  ClientRead  |   60261 |111367 |   111391 |126151
>  transactionid   |   43627 | 34585 |35464 | 48679
>  wal_insert  |5423 | 29323 |25898 | 30191
>  ProcArrayLock   |4379 |  3918 | 4006 |  4582
>  clog|2952 |  9135 | 5304 |  2514
>  XidGenLock  |

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-12 Thread Tomas Vondra
On 10/12/2016 08:55 PM, Robert Haas wrote:
> On Wed, Oct 12, 2016 at 3:21 AM, Dilip Kumar  wrote:
>> I think at higher client count from client count 96 onwards contention
>> on CLogControlLock is clearly visible and which is completely solved
>> with group lock patch.
>>
>> And at lower client count 32,64  contention on CLogControlLock is not
>> significant hence we can not see any gain with group lock patch.
>> (though we can see some contention on CLogControlLock is reduced at 64
>> clients.)
> 
> I agree with these conclusions.  I had a chance to talk with Andres
> this morning at Postgres Vision and based on that conversation I'd
> like to suggest a couple of additional tests:
> 
> 1. Repeat this test on x86.  In particular, I think you should test on
> the EnterpriseDB server cthulhu, which is an 8-socket x86 server.
> 
> 2. Repeat this test with a mixed read-write workload, like -b
> tpcb-like@1 -b select-only@9
> 

FWIW, I'm already running similar benchmarks on an x86 machine with 72
cores (144 with HT). It's "just" a 4-socket system, but the results I
got so far seem quite interesting. The tooling and results (pushed
incrementally) are available here:

https://bitbucket.org/tvondra/hp05-results/overview

The tooling is completely automated, and it also collects various stats,
like for example the wait events. So perhaps we could simply run it on
cthulhu and get comparable results, and also more thorough data sets than
just snippets posted to the list?

There's also a bunch of reports for the 5 already completed runs

 - dilip-300-logged-sync
 - dilip-300-unlogged-sync
 - pgbench-300-logged-sync-skip
 - pgbench-300-unlogged-sync-noskip
 - pgbench-300-unlogged-sync-skip

The name identifies the workload type, scale and whether the tables are
wal-logged (for pgbench the "skip" means "-N" while "noskip" does
regular pgbench).

For example the "reports/wait-events-count-patches.txt" compares the
wait even stats with different patches applied (and master):

https://bitbucket.org/tvondra/hp05-results/src/506d0bee9e6557b015a31d72f6c3506e3f198c17/reports/wait-events-count-patches.txt?at=master=file-view-default

and average tps (from 3 runs, 5 minutes each):

https://bitbucket.org/tvondra/hp05-results/src/506d0bee9e6557b015a31d72f6c3506e3f198c17/reports/tps-avg-patches.txt?at=master=file-view-default

There are certainly interesting bits. For example while the "logged"
case is dominated by WALWriteLock for most client counts, for large
client counts that's no longer true.

Consider for example dilip-300-logged-sync results with 216 clients:

 wait_event  | master  | gran_lock | no_cont_lock | group_upd
 -----------------+---------+-----------+--------------+-----------
 CLogControlLock |  624566 |474261 |   458599 |225338
 WALWriteLock|  431106 |623142 |   619596 |699224
 |  331542 |358220 |   371393 |537076
 buffer_content  |  261308 |134764 |   138664 |102057
 ClientRead  |   59826 |100883 |   103609 |118379
 transactionid   |   26966 | 23155 |23815 | 31700
 ProcArrayLock   |3967 |  3852 | 4070 |  4576
 wal_insert  |3948 | 10430 | 9513 | 12079
 clog|1710 |  4006 | 2443 |   925
 XidGenLock  |1689 |  3785 | 4229 |  3539
 tuple   | 965 |   617 |  655 |   840
 lock_manager| 300 |   571 |  619 |   802
 WALBufMappingLock   | 168 |   140 |  158 |   147
 SubtransControlLock |  60 |   115 |  124 |   105

Clearly, CLOG is an issue here, and it's (slightly) improved by all the
patches (group_update performing the best). And with 288 clients (which
is 2x the number of virtual cores in the machine, so not entirely crazy)
you get this:

 wait_event  | master  | gran_lock | no_cont_lock | group_upd
 -----------------+---------+-----------+--------------+-----------
 CLogControlLock |  901670 |736822 |   728823 |398111
 buffer_content  |  492637 |318129 |   319251 |270416
 WALWriteLock|  414371 |593804 |   589809 |656613
 |  380344 |452936 |   470178 |745790
 ClientRead  |   60261 |111367 |   111391 |126151
 transactionid   |   43627 | 34585 |35464 | 48679
 wal_insert  |5423 | 29323 |25898 | 30191
 ProcArrayLock   |4379 |  3918 | 4006 |  4582
 clog|2952 |  9135 | 5304 |  2514
 XidGenLock  |2182 |  9488 | 8894 |  8595
 tuple   |2176 |  1288 | 1409 |  1821
 lock_manager| 323 |   797 |  827 |  1006
 WALBufMappingLock   | 124 |   124 |   

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-12 Thread Robert Haas
On Wed, Oct 12, 2016 at 3:21 AM, Dilip Kumar  wrote:
> I think at higher client count from client count 96 onwards contention
> on CLogControlLock is clearly visible and which is completely solved
> with group lock patch.
>
> And at lower client count 32,64  contention on CLogControlLock is not
> significant hence we can not see any gain with group lock patch.
> (though we can see some contention on CLogControlLock is reduced at 64
> clients.)

I agree with these conclusions.  I had a chance to talk with Andres
this morning at Postgres Vision and based on that conversation I'd
like to suggest a couple of additional tests:

1. Repeat this test on x86.  In particular, I think you should test on
the EnterpriseDB server cthulhu, which is an 8-socket x86 server.

2. Repeat this test with a mixed read-write workload, like -b
tpcb-like@1 -b select-only@9
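
(For illustration only -- the client count, duration, and database name below
are assumptions, not taken from this mail -- such a mixed workload can be
driven with pgbench's built-in script weights, e.g.:

    pgbench -c 96 -j 96 -T 1800 -b tpcb-like@1 -b select-only@9 postgres

which makes roughly 10% of the transactions read-write and 90% read-only.)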

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-12 Thread Dilip Kumar
On Mon, Oct 10, 2016 at 2:17 AM, Tomas Vondra
 wrote:
> after testing each combination (every ~9 hours). Inspired by Robert's wait
> event post a few days ago, I've added wait event sampling so that we can
> perform similar analysis. (Neat idea!)

I have done a wait event test for head vs. the group lock patch.
I have used a script similar to the one Robert mentioned in the thread below:

https://www.postgresql.org/message-id/ca+tgmoav9q5v5zgt3+wp_1tqjt6tgyxrwrdctrrwimc+zy7...@mail.gmail.com
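
(That script is not attached here; a minimal sketch of this kind of sampling
loop -- file names and the database name are illustrative -- would be:

    # sample wait events once per second for the 30-minute run
    for i in $(seq 1 1800); do
        psql -At -F ' | ' -c "SELECT wait_event_type, wait_event FROM pg_stat_activity" postgres >> wait_events.txt
        sleep 1
    done
    # aggregate into the per-event counts shown below
    sort wait_events.txt | uniq -c | sort -rn

The numbers listed under each test are the result of that kind of aggregation.)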

Test details and results:

Machine: POWER, 4-socket machine (machine details are attached in a file.)

30-minute pgbench runs with the following configuration:
max_connections = 200,
shared_buffers = 8GB,
maintenance_work_mem = 4GB,
synchronous_commit =off,
checkpoint_timeout = 15min,
checkpoint_completion_target = 0.9,
log_line_prefix = '%t [%p]
max_wal_size = 40GB,
log_checkpoints =on.
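
(For reference only -- the exact pgbench options and transaction script are
not shown in this mail, so the invocation below is an assumption based on the
result file names like out_300_96_ul.txt:

    pgbench -i -s 300 --unlogged-tables postgres
    pgbench -c 192 -j 192 -T 1800 postgres

with the client count varied per test as listed below.)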

Test1: unlogged table, 192 clients
-
On Head:
tps = 44898.862257 (including connections establishing)
tps = 44899.761934 (excluding connections establishing)

 262092  LWLockNamed | CLogControlLock
 224396  |
 114510  Lock| transactionid
  42908  Client  | ClientRead
  20610  Lock| tuple
  13700  LWLockTranche   | buffer_content
   3637
   2562  LWLockNamed | XidGenLock
   2359  LWLockNamed | ProcArrayLock
   1037  Lock| extend
948  LWLockTranche   | lock_manager
 46  LWLockTranche   | wal_insert
 12  BufferPin   | BufferPin
  4  LWLockTranche   | buffer_mapping

With Patch:

tps = 77846.622956 (including connections establishing)
tps = 77848.234046 (excluding connections establishing)

 101832  Lock| transactionid
  91358  Client  | ClientRead
  16691  LWLockNamed | XidGenLock
  12467  Lock| tuple
   6007  LWLockNamed | CLogControlLock
   3640
   3531  LWLockNamed | ProcArrayLock
   3390  LWLockTranche   | lock_manager
   2683  Lock| extend
   1112  LWLockTranche   | buffer_content
 72  LWLockTranche   | wal_insert
  8  LWLockTranche   | buffer_mapping
  2  LWLockTranche   | proc
  2  BufferPin   | BufferPin


Test2: unlogged table, 96 clients
--
On head:
tps = 58632.065563 (including connections establishing)
tps = 58632.767384 (excluding connections establishing)
  77039  LWLockNamed | CLogControlLock
  39712  Client  | ClientRead
  18358  Lock| transactionid
   4238  LWLockNamed | XidGenLock
   3638
   3518  LWLockTranche   | buffer_content
   2717  LWLockNamed | ProcArrayLock
   1410  Lock| tuple
792  Lock| extend
182  LWLockTranche   | lock_manager
 30  LWLockTranche   | wal_insert
  3  LWLockTranche   | buffer_mapping
  1 Tuples only is on.
  1  BufferPin   | BufferPin

With Patch:
tps = 75204.166640 (including connections establishing)
tps = 75204.922105 (excluding connections establishing)
[dilip.kumar@power2 bin]$ cat out_300_96_ul.txt
 261917  |
  53407  Client  | ClientRead
  14994  Lock| transactionid
   5258  LWLockNamed | XidGenLock
   3660
   3604  LWLockNamed | ProcArrayLock
   2096  LWLockNamed | CLogControlLock
   1102  Lock| tuple
823  Lock| extend
481  LWLockTranche   | buffer_content
372  LWLockTranche   | lock_manager
192  Lock| relation
 65  LWLockTranche   | wal_insert
  6  LWLockTranche   | buffer_mapping
  1 Tuples only is on.
  1  LWLockTranche   | proc


Test3: unlogged table, 64 clients
--
On Head:

tps = 66231.203018 (including connections establishing)
tps = 66231.664990 (excluding connections establishing)

  43446  Client  | ClientRead
   6992  LWLockNamed | CLogControlLock
   4685  Lock| transactionid
   3650
   3381  LWLockNamed | ProcArrayLock
810  LWLockNamed | XidGenLock
734  Lock| extend
439  LWLockTranche   | buffer_content
247  Lock| tuple
136  LWLockTranche   | lock_manager
 64  Lock| relation
 24  LWLockTranche   | wal_insert
  2  LWLockTranche   | buffer_mapping
  1 Tuples only is on.


With Patch:
tps = 67294.042602 (including connections establishing)
tps = 67294.532650 (excluding connections establishing)

  28186  Client  | ClientRead
   3655
   1172  LWLockNamed | ProcArrayLock
619  Lock| transactionid
289  LWLockNamed | CLogControlLock
237  Lock| extend
 81  LWLockTranche   | buffer_content
 48  LWLockNamed | XidGenLock
 28  LWLockTranche   | lock_manager
 23  Lock| tuple
  6  LWLockTranche   | wal_insert



Test4:  unlogged table, 32 clients

Head:
tps = 52320.190549 

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-09 Thread Tomas Vondra

On 10/08/2016 07:47 AM, Amit Kapila wrote:

On Fri, Oct 7, 2016 at 3:02 PM, Tomas Vondra
 wrote:

>
> ...
>

In total, I plan to test combinations of:

(a) Dilip's workload and pgbench (regular and -N)
(b) logged and unlogged tables
(c) scale 300 and scale 3000 (both fits into RAM)
(d) sync_commit=on/off



sounds sensible.

Thanks for doing the tests.



FWIW I've started those tests on the big machine provided by Oleg and
Alexander; the estimate to complete all the benchmarks is 9 days. The
results will be pushed to

   https://bitbucket.org/tvondra/hp05-results/src

after testing each combination (every ~9 hours). Inspired by Robert's
wait event post a few days ago, I've added wait event sampling so that
we can perform similar analysis. (Neat idea!)


While messing with the kernel on the other machine I've managed to 
misconfigure it to the extent that it's not accessible anymore. I'll 
start similar benchmarks once I find someone with console access who can 
fix the boot.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-07 Thread Amit Kapila
On Fri, Oct 7, 2016 at 3:02 PM, Tomas Vondra
 wrote:
>
> I got access to a large machine with 72/144 cores (thanks to Oleg and
> Alexander from Postgres Professional), and I'm running the tests on that
> machine too.
>
> Results from Dilip's workload (with scale 300, unlogged tables) look like
> this:
>
>                       32      64     128     192     224     256     288
>   master            104943  128579  72167  100967  66631   97088  63767
>   granular-locking  103415  141689  83780  120480  71847  115201  67240
>   group-update  105343  144322  92229  130149  81247  126629  76638
>   no-content-lock   103153  140568  80101  119185  70004  115386  66199
>
> So there's some 20-30% improvement for >= 128 clients.
>

So here we see a performance improvement starting at 64 clients; this is
somewhat similar to what Dilip saw in his tests.

> But what I find much more intriguing is the zig-zag behavior. I mean, 64
> clients give ~130k tps, 128 clients only give ~70k but 192 clients jump up
> to >100k tps again, etc.
>

No clear answer.

> FWIW I don't see any such behavior on pgbench, and all those tests were done
> on the same cluster.
>
>>> With 4.5.5, results for the same benchmark look like this:
>>>
>>>            64     128     192
>>> 
>>>  master 35693  39822  42151
>>>  granular-locking   35370  39409  41353
>>>  no-content-lock36201  39848  42407
>>>  group-update   35697  39893  42667
>>>
>>> That seems like a fairly bad regression in kernel, although I have not
>>> identified the feature/commit causing it (and it's also possible the
>>> issue
>>> lies somewhere else, of course).
>>>
>>> With regular pgbench, I see no improvement on any kernel version. For
>>> example on 3.19 the results look like this:
>>>
>>>            64     128     192
>>> 
>>>  master 54661  61014  59484
>>>  granular-locking   55904  62481  60711
>>>  no-content-lock56182  62442  61234
>>>  group-update   55019  61587  60485
>>>
>>
>> Are the above results with synchronous_commit=off?
>>
>
> No, but I can do that.
>
>>> I haven't done much more testing (e.g. with -N to eliminate
>>> collisions on branches) yet, let's see if it changes anything.
>>>
>>
>> Yeah, let us see how it behaves with -N. Also, I think we could try
>> at higher scale factor?
>>
>
> Yes, I plan to do that. In total, I plan to test combinations of:
>
> (a) Dilip's workload and pgbench (regular and -N)
> (b) logged and unlogged tables
> (c) scale 300 and scale 3000 (both fits into RAM)
> (d) sync_commit=on/off
>

sounds sensible.

Thanks for doing the tests.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-07 Thread Tomas Vondra

On 10/05/2016 10:03 AM, Amit Kapila wrote:

On Wed, Oct 5, 2016 at 12:05 PM, Tomas Vondra
 wrote:

Hi,

After collecting a lot more results from multiple kernel versions, I can
confirm that I see a significant improvement with 128 and 192 clients,
roughly by 30%:

            64     128     192

 master 62482  43181  50985
 granular-locking   61701  59611  47483
 no-content-lock62650  59819  47895
 group-update   63702  64758  62596

But I only see this with Dilip's workload, and only with pre-4.3.0 kernels
(the results above are from kernel 3.19).



That appears positive.



I got access to a large machine with 72/144 cores (thanks to Oleg and 
Alexander from Postgres Professional), and I'm running the tests on that 
machine too.


Results from Dilip's workload (with scale 300, unlogged tables) look 
like this:


                      32      64     128     192     224     256     288
  master            104943  128579  72167  100967  66631   97088  63767
  granular-locking  103415  141689  83780  120480  71847  115201  67240
  group-update  105343  144322  92229  130149  81247  126629  76638
  no-content-lock   103153  140568  80101  119185  70004  115386  66199

So there's some 20-30% improvement for >= 128 clients.

But what I find much more intriguing is the zig-zag behavior. I mean, 64 
clients give ~130k tps, 128 clients only give ~70k but 192 clients jump 
up to >100k tps again, etc.


FWIW I don't see any such behavior on pgbench, and all those tests were 
done on the same cluster.



With 4.5.5, results for the same benchmark look like this:

            64     128     192

 master 35693  39822  42151
 granular-locking   35370  39409  41353
 no-content-lock36201  39848  42407
 group-update   35697  39893  42667

That seems like a fairly bad regression in kernel, although I have not
identified the feature/commit causing it (and it's also possible the issue
lies somewhere else, of course).

With regular pgbench, I see no improvement on any kernel version. For
example on 3.19 the results look like this:

            64     128     192

 master 54661  61014  59484
 granular-locking   55904  62481  60711
 no-content-lock56182  62442  61234
 group-update   55019  61587  60485



Are the above results with synchronous_commit=off?



No, but I can do that.


I haven't done much more testing (e.g. with -N to eliminate
collisions on branches) yet, let's see if it changes anything.



Yeah, let us see how it behaves with -N. Also, I think we could try
at higher scale factor?



Yes, I plan to do that. In total, I plan to test combinations of:

(a) Dilip's workload and pgbench (regular and -N)
(b) logged and unlogged tables
(c) scale 300 and scale 3000 (both fits into RAM)
(d) sync_commit=on/off

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-05 Thread Amit Kapila
On Wed, Oct 5, 2016 at 12:05 PM, Tomas Vondra
 wrote:
> Hi,
>
> After collecting a lot more results from multiple kernel versions, I can
> confirm that I see a significant improvement with 128 and 192 clients,
> roughly by 30%:
>
>             64     128     192
> 
>  master 62482  43181  50985
>  granular-locking   61701  59611  47483
>  no-content-lock62650  59819  47895
>  group-update   63702  64758  62596
>
> But I only see this with Dilip's workload, and only with pre-4.3.0 kernels
> (the results above are from kernel 3.19).
>

That appears positive.

> With 4.5.5, results for the same benchmark look like this:
>
>             64     128     192
> 
>  master 35693  39822  42151
>  granular-locking   35370  39409  41353
>  no-content-lock36201  39848  42407
>  group-update   35697  39893  42667
>
> That seems like a fairly bad regression in kernel, although I have not
> identified the feature/commit causing it (and it's also possible the issue
> lies somewhere else, of course).
>
> With regular pgbench, I see no improvement on any kernel version. For
> example on 3.19 the results look like this:
>
>             64     128     192
> 
>  master 54661  61014  59484
>  granular-locking   55904  62481  60711
>  no-content-lock56182  62442  61234
>  group-update   55019  61587  60485
>

Are the above results with synchronous_commit=off?

> I haven't done much more testing (e.g. with -N to eliminate collisions on
> branches) yet, let's see if it changes anything.
>

Yeah, let us see how it behaves with -N.  Also, I think we could try
at higher scale factor?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-05 Thread Tomas Vondra

Hi,

After collecting a lot more results from multiple kernel versions, I can 
confirm that I see a significant improvement with 128 and 192 clients, 
roughly by 30%:


            64     128     192

 master 62482  43181  50985
 granular-locking   61701  59611  47483
 no-content-lock62650  59819  47895
 group-update   63702  64758  62596

But I only see this with Dilip's workload, and only with pre-4.3.0 
kernels (the results above are from kernel 3.19).


With 4.5.5, results for the same benchmark look like this:

            64     128     192

 master 35693  39822  42151
 granular-locking   35370  39409  41353
 no-content-lock36201  39848  42407
 group-update   35697  39893  42667

That seems like a fairly bad regression in kernel, although I have not 
identified the feature/commit causing it (and it's also possible the 
issue lies somewhere else, of course).


With regular pgbench, I see no improvement on any kernel version. For 
example on 3.19 the results look like this:


            64     128     192

 master 54661  61014  59484
 granular-locking   55904  62481  60711
 no-content-lock56182  62442  61234
 group-update   55019  61587  60485

I haven't done much more testing (e.g. with -N to eliminate collisions 
on branches) yet, let's see if it changes anything.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-29 Thread Dilip Kumar
On Thu, Sep 29, 2016 at 8:05 PM, Robert Haas  wrote:
> OK, another theory: Dilip is, I believe, reinitializing for each run,
> and you are not.

Yes, I am reinitializing for each run.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 10:14 AM, Tomas Vondra
 wrote:
>> It's not impossible that the longer runs could matter - performance
>> isn't necessarily stable across time during a pgbench test, and the
>> longer the run the more CLOG pages it will fill.
>
> Sure, but I'm not doing just a single pgbench run. I do a sequence of
> pgbench runs, with different client counts, with ~6h of total runtime.
> There's a checkpoint in between the runs, but as those benchmarks are on
> unlogged tables, that flushes only very few buffers.
>
> Also, the clog SLRU has 128 pages, which is ~1MB of clog data, i.e. ~4M
> transactions. On some kernels (3.10 and 3.12) I can get >50k tps with 64
> clients or more, which means we fill the 128 pages in less than 80 seconds.
>
> So half-way through the run only 50% of clog pages fits into the SLRU, and
> we have a data set with 30M tuples, with uniform random access - so it seems
> rather unlikely we'll get transaction that's still in the SLRU.
>
> But sure, I can do a run with larger data set to verify this.

OK, another theory: Dilip is, I believe, reinitializing for each run,
and you are not.  Maybe somehow the effect Dilip is seeing only
happens with a newly-initialized set of pgbench tables.  For example,
maybe the patches cause a huge improvement when all rows have the same
XID, but the effect fades rapidly once the XIDs spread out...
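
(One way to see how concentrated the row XIDs actually are at a given moment
-- purely illustrative, not something used in this thread:

    psql -At -c "SELECT count(DISTINCT xmin::text) FROM pgbench_accounts" postgres

Right after pgbench -i this is typically just 1, since the data is loaded in a
single transaction, and it grows as updates spread new XIDs over the table.)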

I'm not saying any of what I'm throwing out here is worth the
electrons upon which it is printed, just that there has to be some
explanation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-29 Thread Tomas Vondra

On 09/29/2016 03:47 PM, Robert Haas wrote:

On Wed, Sep 28, 2016 at 9:10 PM, Tomas Vondra
 wrote:

I feel like we must be missing something here.  If Dilip is seeing
huge speedups and you're seeing nothing, something is different, and
we don't know what it is.  Even if the test case is artificial, it
ought to be the same when one of you runs it as when the other runs
it.  Right?


Yes, definitely - we're missing something important, I think. One difference
is that Dilip is using longer runs, but I don't think that's a problem (as I
demonstrated how stable the results are).


It's not impossible that the longer runs could matter - performance
isn't necessarily stable across time during a pgbench test, and the
longer the run the more CLOG pages it will fill.



Sure, but I'm not doing just a single pgbench run. I do a sequence of 
pgbench runs, with different client counts, with ~6h of total runtime. 
There's a checkpoint in between the runs, but as those benchmarks are on 
unlogged tables, that flushes only very few buffers.


Also, the clog SLRU has 128 pages, which is ~1MB of clog data, i.e. ~4M 
transactions. On some kernels (3.10 and 3.12) I can get >50k tps with 64 
clients or more, which means we fill the 128 pages in less than 80 seconds.
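
(Spelling out the arithmetic behind those numbers:

    128 CLOG pages x 8 kB/page        = 1 MB of CLOG data
    2 bits per transaction status     => 1 MB covers ~4M transactions
    ~4M transactions at >50k tps      => the whole SLRU's worth of pages is
                                         filled in under 80 seconds
)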


So half-way through the run only 50% of the clog pages fit into the SLRU,
and we have a data set with 30M tuples with uniform random access - so
it seems rather unlikely we'll hit a transaction that's still in the SLRU.


But sure, I can do a run with larger data set to verify this.


I wonder what CPU model Dilip is using - I know it's x86, but not which
generation it is. I'm using an E5-4620 v1 Xeon; perhaps Dilip is using a
newer model and it makes a difference (although that seems unlikely).


The fact that he's using an 8-socket machine seems more likely to
matter than the CPU generation, which isn't much different.  Maybe
Dilip should try this on a 2-socket machine and see if he sees the
same kinds of results.



Maybe. I wouldn't expect a major difference between 4 and 8 sockets, but 
I may be wrong.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-29 Thread Robert Haas
On Wed, Sep 28, 2016 at 9:10 PM, Tomas Vondra
 wrote:
>> I feel like we must be missing something here.  If Dilip is seeing
>> huge speedups and you're seeing nothing, something is different, and
>> we don't know what it is.  Even if the test case is artificial, it
>> ought to be the same when one of you runs it as when the other runs
>> it.  Right?
>>
> Yes, definitely - we're missing something important, I think. One difference
> is that Dilip is using longer runs, but I don't think that's a problem (as I
> demonstrated how stable the results are).

It's not impossible that the longer runs could matter - performance
isn't necessarily stable across time during a pgbench test, and the
longer the run the more CLOG pages it will fill.

> I wonder what CPU model is Dilip using - I know it's x86, but not which
> generation it is. I'm using E5-4620 v1 Xeon, perhaps Dilip is using a newer
> model and it makes a difference (although that seems unlikely).

The fact that he's using an 8-socket machine seems more likely to
matter than the CPU generation, which isn't much different.  Maybe
Dilip should try this on a 2-socket machine and see if he sees the
same kinds of results.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-29 Thread Amit Kapila
On Thu, Sep 29, 2016 at 12:56 PM, Dilip Kumar  wrote:
> On Thu, Sep 29, 2016 at 6:40 AM, Tomas Vondra
>  wrote:
>> Yes, definitely - we're missing something important, I think. One difference
>> is that Dilip is using longer runs, but I don't think that's a problem (as I
>> demonstrated how stable the results are).
>>
>> I wonder what CPU model is Dilip using - I know it's x86, but not which
>> generation it is. I'm using E5-4620 v1 Xeon, perhaps Dilip is using a newer
>> model and it makes a difference (although that seems unlikely).
>
> I am using "Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz "
>

Another difference is that the machine on which Dilip is doing the tests has 8 sockets.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-29 Thread Dilip Kumar
On Thu, Sep 29, 2016 at 6:40 AM, Tomas Vondra
 wrote:
> Yes, definitely - we're missing something important, I think. One difference
> is that Dilip is using longer runs, but I don't think that's a problem (as I
> demonstrated how stable the results are).
>
> I wonder what CPU model is Dilip using - I know it's x86, but not which
> generation it is. I'm using E5-4620 v1 Xeon, perhaps Dilip is using a newer
> model and it makes a difference (although that seems unlikely).

I am using "Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz "


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-28 Thread Tomas Vondra

On 09/29/2016 01:59 AM, Robert Haas wrote:

On Wed, Sep 28, 2016 at 6:45 PM, Tomas Vondra
 wrote:

So, is 300 too little? I don't think so, because Dilip saw some benefit from
that. Or what scale factor do we think is needed to reproduce the benefit?
My machine has 256GB of ram, so I can easily go up to 15000 and still keep
everything in RAM. But is it worth it?


Dunno. But it might be worth a test or two at, say, 5000, just to
see if that makes any difference.



OK, I have some benchmarks to run on that machine, but I'll do a few 
tests with scale 5000 - probably sometime next week. I don't think the 
delay matters very much, as it's clear the patch will end up with RwF in 
this CF round.



I feel like we must be missing something here.  If Dilip is seeing
huge speedups and you're seeing nothing, something is different, and
we don't know what it is.  Even if the test case is artificial, it
ought to be the same when one of you runs it as when the other runs
it.  Right?



Yes, definitely - we're missing something important, I think. One 
difference is that Dilip is using longer runs, but I don't think that's 
a problem (as I demonstrated how stable the results are).


I wonder what CPU model Dilip is using - I know it's x86, but not which
generation it is. I'm using an E5-4620 v1 Xeon; perhaps Dilip is using a
newer model and it makes a difference (although that seems unlikely).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-28 Thread Robert Haas
On Wed, Sep 28, 2016 at 6:45 PM, Tomas Vondra
 wrote:
> So, is 300 too little? I don't think so, because Dilip saw some benefit from
> that. Or what scale factor do we think is needed to reproduce the benefit?
> My machine has 256GB of ram, so I can easily go up to 15000 and still keep
> everything in RAM. But is it worth it?

Dunno.  But it might be worth a test or two at, say, 5000, just to see
if that makes any difference.

I feel like we must be missing something here.  If Dilip is seeing
huge speedups and you're seeing nothing, something is different, and
we don't know what it is.  Even if the test case is artificial, it
ought to be the same when one of you runs it as when the other runs
it.  Right?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-28 Thread Tomas Vondra

On 09/28/2016 05:39 PM, Robert Haas wrote:

On Tue, Sep 27, 2016 at 5:15 PM, Tomas Vondra
 wrote:

So, I got the results from 3.10.101 (only the pgbench data), and it looks
like this:

 3.10.101              1      8     16     32     64    128    192

 granular-locking   2582  18492  33416  49583  53759  53572  51295
 no-content-lock    2580  18666  33860  49976  54382  54012  51549
 group-update       2635  18877  33806  49525  54787  54117  51718
 master             2630  18783  33630  49451  54104  53199  50497

So 3.10.101 performs even better than 3.2.80 (and much better than 4.5.5),
and there's no sign of any of the patches making a difference.


I'm sure that you mentioned this upthread somewhere, but I can't
immediately find it.  What scale factor are you testing here?



300, the same scale factor as Dilip.



It strikes me that the larger the scale factor, the more
CLogControlLock contention we expect to have.  We'll pretty much do
one CLOG access per update, and the more rows there are, the more
chance there is that the next update hits an "old" row that hasn't
been updated in a long time.  So a larger scale factor also
increases the number of active CLOG pages and, presumably therefore,
the amount of CLOG paging activity.

>

So, is 300 too little? I don't think so, because Dilip saw some benefit 
from that. Or what scale factor do we think is needed to reproduce the 
benefit? My machine has 256GB of ram, so I can easily go up to 15000 and 
still keep everything in RAM. But is it worth it?
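
(Rough sizing, using the ~5GB database size observed at scale 300 earlier in
this thread: scale 15000 is 50x larger, i.e. roughly 250GB, which indeed just
about fits into 256GB of RAM.)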


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-28 Thread Robert Haas
On Tue, Sep 27, 2016 at 5:15 PM, Tomas Vondra
 wrote:
> So, I got the results from 3.10.101 (only the pgbench data), and it looks
> like this:
>
>  3.10.101              1      8     16     32     64    128    192
>
>  granular-locking   2582  18492  33416  49583  53759  53572  51295
>  no-content-lock    2580  18666  33860  49976  54382  54012  51549
>  group-update       2635  18877  33806  49525  54787  54117  51718
>  master             2630  18783  33630  49451  54104  53199  50497
>
> So 3.10.101 performs even better tnan 3.2.80 (and much better than 4.5.5),
> and there's no sign any of the patches making a difference.

I'm sure that you mentioned this upthread somewhere, but I can't
immediately find it.  What scale factor are you testing here?

It strikes me that the larger the scale factor, the more
CLogControlLock contention we expect to have.  We'll pretty much do
one CLOG access per update, and the more rows there are, the more
chance there is that the next update hits an "old" row that hasn't
been updated in a long time.  So a larger scale factor also increases
the number of active CLOG pages and, presumably therefore, the amount
of CLOG paging activity.
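
(To put rough numbers on that, under the assumptions above: at scale 300,
pgbench_accounts has 30M rows; if their last-updating XIDs are well spread
out they span on the order of 30M / 32768 = ~900 CLOG pages, versus the 128
pages the CLOG SLRU can cache - so most status lookups would miss the cache.)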

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-27 Thread Tomas Vondra

On 09/26/2016 08:48 PM, Tomas Vondra wrote:

On 09/26/2016 07:16 PM, Tomas Vondra wrote:


The averages (over the 10 runs, 5 minute each) look like this:

 3.2.80                1      8     16     32     64    128    192

 granular-locking   1567  12146  26341  44188  43263  49590  15042
 no-content-lock    1567  12180  25549  43787  43675  51800  16831
 group-update       1550  12018  26121  44451  42734  51455  15504
 master             1566  12057  25457  42299  42513  42562  10462

 4.5.5                 1      8     16     32     64    128    192

 granular-locking   3018  19031  27394  29222  32032  34249  36191
 no-content-lock    2988  18871  27384  29260  32120  34456  36216
 group-update       2960  18848  26870  29025  32078  34259  35900
 master             2984  18917  26430  29065  32119  33924  35897



So, I got the results from 3.10.101 (only the pgbench data), and it 
looks like this:


 3.10.101              1      8     16     32     64    128    192

 granular-locking   2582  18492  33416  49583  53759  53572  51295
 no-content-lock    2580  18666  33860  49976  54382  54012  51549
 group-update       2635  18877  33806  49525  54787  54117  51718
 master             2630  18783  33630  49451  54104  53199  50497

So 3.10.101 performs even better than 3.2.80 (and much better than
4.5.5), and there's no sign of any of the patches making a difference.


It also seems there's a major regression in the kernel, somewhere 
between 3.10 and 4.5. With 64 clients, 3.10 does ~54k transactions, 
while 4.5 does only ~32k - that's helluva difference.


I wonder if this might be due to running the benchmark on unlogged
tables (and thus not waiting for WAL), but I don't see why that should
result in such a drop on a new kernel.


In any case, this seems like an issue unrelated to the patch, so I'll 
post further data into a new thread instead of hijacking this one.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-27 Thread Dilip Kumar
On Wed, Sep 21, 2016 at 8:47 AM, Dilip Kumar  wrote:
> Summary:
> --
> At 32 clients no gain, I think at this workload Clog Lock is not a problem.
> At 64 Clients we can see ~10% gain with simple update and ~5% with TPCB.
> At 128 Clients we can see > 50% gain.
>
> Currently I have tested with synchronous commit=off, later I can try
> with on. I can also test at 80 client, I think we will see some
> significant gain at this client count also, but as of now I haven't
> yet tested.
>
> With above results, what we think ? should we continue our testing ?

I have done further testing with the TPCB workload to see the impact on
the performance gain of increasing the scale factor.

Again at 32 clients there is no gain, but at 64 clients the gain is 12% and
at 128 clients it's 75%. This shows that the improvement with the group lock
is better at a higher scale factor (at scale factor 300 the gain was 5% at 64
clients and 50% at 128 clients).

8-socket machine (kernel 3.10)
10-minute runs (median of 3 runs)
synchronous_commit = off
scale factor = 1000
shared_buffers = 40GB

Test results:


client  head group lock
32  27496   27178
64  31275   35205
128 20656   34490


LWLOCK_STATS approx. block count on ClogControl Lock ("lwlock main 11")

client  head  group lock
32  8  6
64  15  10
128  14   7

Note: These are approximate block counts; I have the detailed LWLOCK_STATS
results in case someone wants to look into them.


LWLOCK_STATS shows that the ClogControlLock block count is reduced by 25% at
32 clients, 33% at 64 clients and 50% at 128 clients.
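
(For anyone wanting to reproduce this kind of measurement: LWLOCK_STATS is a
compile-time option, so a build along these lines is needed -- the prefix and
log location here are illustrative, not taken from this mail:

    ./configure CPPFLAGS="-DLWLOCK_STATS" --prefix=$HOME/pg-lwstats
    make && make install
    # each backend dumps per-lock counters, including the "blk" count,
    # into the server log on exit; CLogControlLock appears as "lwlock main 11"
    grep "lwlock main 11" $PGDATA/pg_log/*.log
)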

Conclusion:
1. I think both the LWLOCK_STATS and the performance data show that we get
a significant contention reduction on ClogControlLock with the patch.
2. They also show that though we are not seeing any performance gain at
32 clients, there is still a contention reduction with the patch.

I am planning to do some more tests with a higher scale factor (3000 or more).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-26 Thread Tomas Vondra

On 09/26/2016 07:16 PM, Tomas Vondra wrote:


The averages (over the 10 runs, 5 minute each) look like this:

  3.2.80                1      8     16     32     64    128    192

  granular-locking   1567  12146  26341  44188  43263  49590  15042
  no-content-lock    1567  12180  25549  43787  43675  51800  16831
  group-update       1550  12018  26121  44451  42734  51455  15504
  master             1566  12057  25457  42299  42513  42562  10462

  4.5.5                 1      8     16     32     64    128    192

  granular-locking   3018  19031  27394  29222  32032  34249  36191
  no-content-lock    2988  18871  27384  29260  32120  34456  36216
  group-update       2960  18848  26870  29025  32078  34259  35900
  master             2984  18917  26430  29065  32119  33924  35897

That is:

(1) The 3.2.80 performs a bit better than before, particularly for 128
and 256 clients - I'm not sure if it's thanks to the reboots or so.

(2) 4.5.5 performs measurably worse for >= 32 clients (by ~30%). That's
a pretty significant regression, on a fairly common workload.



FWIW, now that I think about this, the regression is roughly in line 
with my findings presented in my recent blog post:


http://blog.2ndquadrant.com/postgresql-vs-kernel-versions/

Those numbers were collected on a much smaller machine (2/4 cores only),
which might be why the difference observed on the 32-core machine is much
more significant.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-26 Thread Robert Haas
On Fri, Sep 23, 2016 at 9:20 AM, Amit Kapila  wrote:
> On Fri, Sep 23, 2016 at 6:50 AM, Robert Haas  wrote:
>> On Thu, Sep 22, 2016 at 7:44 PM, Tomas Vondra
>>  wrote:
>>> I don't dare to suggest rejecting the patch, but I don't see how we could
>>> commit any of the patches at this point. So perhaps "returned with feedback"
>>> and resubmitting in the next CF (along with analysis of improved workloads)
>>> would be appropriate.
>>
>> I think it would be useful to have some kind of theoretical analysis
>> of how much time we're spending waiting for various locks.  So, for
>> example, suppose we one run of these tests with various client counts
>> - say, 1, 8, 16, 32, 64, 96, 128, 192, 256 - and we run "select
>> wait_event from pg_stat_activity" once per second throughout the test.
>> Then we see how many times we get each wait event, including NULL (no
>> wait event).  Now, from this, we can compute the approximate
>> percentage of time we're spending waiting on CLogControlLock and every
>> other lock, too, as well as the percentage of time we're not waiting
>> for lock.  That, it seems to me, would give us a pretty clear idea
>> what the maximum benefit we could hope for from reducing contention on
>> any given lock might be.
>>
> As mentioned earlier, such an activity makes sense, however today,
> again reading this thread, I noticed that Dilip has already posted
> some analysis of lock contention upthread [1].  It is clear that patch
> has reduced LWLock contention from ~28% to ~4% (where the major
> contributor was TransactionIdSetPageStatus which has reduced from ~53%
> to ~3%).  Isn't it inline with what you are looking for?

Hmm, yes.  But it's a little hard to interpret what that means; I
think the test I proposed in the quoted material above would provide
clearer data.
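
(A sketch of that proposed measurement, with illustrative file names -- sample
once per second for the whole run, then turn the counts into percentages:

    # repeat once per second during the benchmark
    psql -At -c "SELECT coalesce(wait_event, 'none') FROM pg_stat_activity" postgres >> samples.txt

    # afterwards: share of samples per wait event, 'none' = not waiting
    total=$(wc -l < samples.txt)
    sort samples.txt | uniq -c | sort -rn |
      awk -v t="$total" '{printf "%6.2f%%  %s\n", 100*$1/t, $2}'

The percentage of samples per wait event approximates the percentage of time
spent waiting on each lock.)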

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-24 Thread Tomas Vondra

On 09/24/2016 06:06 AM, Amit Kapila wrote:

On Fri, Sep 23, 2016 at 8:22 PM, Tomas Vondra
 wrote:

...

>>

So I'm using 16GB shared buffers (so with scale 300 everything fits into
shared buffers), min_wal_size=16GB, max_wal_size=128GB, checkpoint timeout
1h etc. So no, there are no checkpoints during the 5-minute runs, only those
triggered explicitly before each run.



Thanks for the clarification.  Do you think we should try some different
settings for the *_flush_after parameters, as those can help in reducing
spikes in writes?



I don't see why those settings would matter. The tests are on unlogged
tables, so there's almost no WAL traffic and checkpoints (triggered 
explicitly before each run) look like this:


checkpoint complete: wrote 17 buffers (0.0%); 0 transaction log file(s) 
added, 0 removed, 13 recycled; write=0.062 s, sync=0.006 s, total=0.092 
s; sync files=10, longest=0.004 s, average=0.000 s; distance=309223 kB, 
estimate=363742 kB


So I don't see how tuning the flushing would change anything, as we're 
not doing any writes.


Moreover, the machine has a bunch of SSD drives (16 or 24, I don't 
remember at the moment), behind a RAID controller with 2GB of write 
cache on it.



Also, I think instead of 5 mins, read-write runs should be run for 15
mins to get consistent data.



Where does the inconsistency come from?


That's what I am also curious to know.


Lack of warmup?


Can't say, but at least we should try to rule out the possibilities.
I think one way to rule out is to do slightly longer runs for
Dilip's test cases and for pgbench we might need to drop and
re-create database after each reading.



My point is that it's unlikely to be due to insufficient warmup, because 
the inconsistencies appear randomly - generally you get a bunch of slow 
runs, one significantly faster one, then slow ones again.


I believe the runs to be sufficiently long. I don't see why recreating 
the database would be useful - the whole point is to get the database 
and shared buffers into a stable state, and then do measurements on it.


I don't think bloat is a major factor here - I'm collecting some 
additional statistics during this run, including pg_database_size, and I 
can see the size oscillates between 4.8GB and 5.4GB. That's pretty 
negligible, I believe.


I'll let the current set of benchmarks complete - it's running on 4.5.5 
now, I'll do tests on 3.2.80 too.


Then we can re-evaluate if longer runs are needed.


Considering how uniform the results from the 10 runs are (at least
on 4.5.5), I claim  this is not an issue.



It is quite possible that it is some kernel regression which might
be fixed in later version. Like we are doing most tests in cthulhu
which has 3.10 version of kernel and we generally get consistent
results. I am not sure if later version of kernel say 4.5.5 is a net
win, because there is a considerable difference (dip) of performance
in that version, though it produces quite stable results.



Well, the thing is - the 4.5.5 behavior is much nicer in general. I'll 
always prefer lower but more consistent performance (in most cases). In 
any case, we're stuck with whatever kernel version the people are using, 
and they're likely to use the newer ones.
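
Incidentally, the pg_database_size figures mentioned above can be collected
with a trivial loop like the following (a sketch only; the 10-second interval
and the lack of any output redirection are arbitrary choices, not what was
actually used):

while true; do
    psql -At -c "select now(), pg_size_pretty(pg_database_size(current_database()))"
    sleep 10
done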


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-23 Thread Amit Kapila
On Fri, Sep 23, 2016 at 8:22 PM, Tomas Vondra
 wrote:
> On 09/23/2016 03:07 PM, Amit Kapila wrote:
>>
>> On Fri, Sep 23, 2016 at 6:16 PM, Tomas Vondra
>>  wrote:
>>>
>>> On 09/23/2016 01:44 AM, Tomas Vondra wrote:


>>>> ...
>>>> The 4.5 kernel clearly changed the results significantly:

>>> ...

>>>> (c) Although it's not visible in the results, 4.5.5 almost perfectly
>>>> eliminated the fluctuations in the results. For example when 3.2.80
>>>> produced these results (10 runs with the same parameters):
>>>>
>>>> 12118 11610 27939 11771 18065
>>>> 12152 14375 10983 13614 11077
>>>>
>>>> we get this on 4.5.5
>>>>
>>>> 37354 37650 37371 37190 37233
>>>> 38498 37166 36862 37928 38509
>>>>
>>>> Notice how much more even the 4.5.5 results are, compared to 3.2.80.

>>>
>>> The more I think about these random spikes in pgbench performance on
>>> 3.2.80,
>>> the more I find them intriguing. Let me show you another example (from
>>> Dilip's workload and group-update patch on 64 clients).
>>>
>>> This is on 3.2.80:
>>>
>>>   44175  34619  51944  38384  49066
>>>   37004  47242  36296  46353  36180
>>>
>>> and on 4.5.5 it looks like this:
>>>
>>>   34400  35559  35436  34890  34626
>>>   35233  35756  34876  35347  35486
>>>
>>> So the 4.5.5 results are much more even, but overall clearly below
>>> 3.2.80.
>>> How does 3.2.80 manage to do ~50k tps in some of the runs? Clearly we
>>> randomly do something right, but what is it and why doesn't it happen on
>>> the
>>> new kernel? And how could we do it every time?
>>>
>>
>> As far as I can see you are using default values of min_wal_size,
>> max_wal_size, checkpoint related params, have you changed default
>> shared_buffer settings, because that can have a bigger impact.
>
>
> Huh? Where do you see me using default values?
>

I was referring to one of your scripts at http://bit.ly/2doY6ID.  I
hadn't noticed that you had changed the default values in
postgresql.conf.

> There are settings.log with a
> dump of pg_settings data, and the modified values are
>
> checkpoint_completion_target = 0.9
> checkpoint_timeout = 3600
> effective_io_concurrency = 32
> log_autovacuum_min_duration = 100
> log_checkpoints = on
> log_line_prefix = %m
> log_timezone = UTC
> maintenance_work_mem = 524288
> max_connections = 300
> max_wal_size = 8192
> min_wal_size = 1024
> shared_buffers = 2097152
> synchronous_commit = on
> work_mem = 524288
>
> (ignoring some irrelevant stuff like locales, timezone etc.).
>
>> Using default values of mentioned parameters can lead to checkpoints in
>> between your runs.
>
>
> So I'm using 16GB shared buffers (so with scale 300 everything fits into
> shared buffers), min_wal_size=16GB, max_wal_size=128GB, checkpoint timeout
> 1h etc. So no, there are no checkpoints during the 5-minute runs, only those
> triggered explicitly before each run.
>

Thanks for the clarification.  Do you think we should try some different
settings for the *_flush_after parameters, as those can help in reducing
spikes in writes?
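
For reference, the settings in question are the *_flush_after GUCs added in
9.6; a purely illustrative configuration (these values are an assumption for
the sake of the example, not a recommendation from the thread) would be along
the lines of:

backend_flush_after = 256kB
bgwriter_flush_after = 512kB
checkpoint_flush_after = 256kB
wal_writer_flush_after = 1MB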

>> Also, I think instead of 5 mins, read-write runs should be run for 15
>> mins to get consistent data.
>
>
> Where does the inconsistency come from?

That's what I am also curious to know.

> Lack of warmup?

Can't say, but at least we should try to rule out the possibilities.
I think one way to rule them out is to do slightly longer runs for
Dilip's test cases, and for pgbench we might need to drop and
re-create the database after each reading.

> Considering how
> uniform the results from the 10 runs are (at least on 4.5.5), I claim this
> is not an issue.
>

It is quite possible that it is some kernel regression which might be
fixed in a later version.  For example, we are doing most tests on
cthulhu, which has a 3.10 kernel, and we generally get consistent
results.  I am not sure whether a later kernel version, say 4.5.5, is
a net win, because there is a considerable performance dip in that
version, though it produces quite stable results.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-23 Thread Tomas Vondra

On 09/23/2016 02:59 PM, Pavan Deolasee wrote:



> On Fri, Sep 23, 2016 at 6:05 PM, Tomas Vondra wrote:
>
>> On 09/23/2016 05:10 AM, Amit Kapila wrote:
>>
>>> On Fri, Sep 23, 2016 at 5:14 AM, Tomas Vondra wrote:
>>>
>>>> On 09/21/2016 08:04 AM, Amit Kapila wrote:
>>>>
>>>> (c) Although it's not visible in the results, 4.5.5 almost perfectly
>>>> eliminated the fluctuations in the results. For example when 3.2.80
>>>> produced these results (10 runs with the same parameters):
>>>>
>>>> 12118 11610 27939 11771 18065
>>>> 12152 14375 10983 13614 11077
>>>>
>>>> we get this on 4.5.5
>>>>
>>>> 37354 37650 37371 37190 37233
>>>> 38498 37166 36862 37928 38509
>>>>
>>>> Notice how much more even the 4.5.5 results are, compared to 3.2.80.
>>>
>>> how long each run was?  Generally, I do half-hour run to get
>>> stable results.
>>
>> 10 x 5-minute runs for each client count. The full shell script
>> driving the benchmark is here: http://bit.ly/2doY6ID and in short it
>> looks like this:
>>
>> for r in `seq 1 $runs`; do
>> for c in 1 8 16 32 64 128 192; do
>> psql -c checkpoint
>> pgbench -j 8 -c $c ...
>> done
>> done
>
> I see a couple of problems with the tests:
>
> 1. You're running regular pgbench, which also updates the small
> tables. At scale 300 and higher clients, there is going to be heavy
> contention on the pgbench_branches table. Why not test with pgbench
> -N?

Sure, I can do a bunch of tests with pgbench -N. Good point.

But notice that I've also done the testing with Dilip's workload, and
the results are pretty much the same.
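
A pgbench -N variant of the driver loop quoted above might look as follows
(a sketch; the run count, the -T duration, -M prepared and the database name
are placeholders, not values taken from the thread):

for r in `seq 1 10`; do
    for c in 1 8 16 32 64 128 192; do
        psql -c checkpoint bench
        # -N skips the updates of pgbench_tellers and pgbench_branches
        pgbench -N -M prepared -j 8 -c $c -T 300 bench
    done
done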


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-23 Thread Tomas Vondra

On 09/23/2016 03:07 PM, Amit Kapila wrote:

> On Fri, Sep 23, 2016 at 6:16 PM, Tomas Vondra wrote:
>
>> On 09/23/2016 01:44 AM, Tomas Vondra wrote:
>>
>>> ...
>>> The 4.5 kernel clearly changed the results significantly:
>>>
>> ...
>>
>>> (c) Although it's not visible in the results, 4.5.5 almost perfectly
>>> eliminated the fluctuations in the results. For example when 3.2.80
>>> produced these results (10 runs with the same parameters):
>>>
>>> 12118 11610 27939 11771 18065
>>> 12152 14375 10983 13614 11077
>>>
>>> we get this on 4.5.5
>>>
>>> 37354 37650 37371 37190 37233
>>> 38498 37166 36862 37928 38509
>>>
>>> Notice how much more even the 4.5.5 results are, compared to 3.2.80.
>>
>> The more I think about these random spikes in pgbench performance on 3.2.80,
>> the more I find them intriguing. Let me show you another example (from
>> Dilip's workload and group-update patch on 64 clients).
>>
>> This is on 3.2.80:
>>
>>   44175  34619  51944  38384  49066
>>   37004  47242  36296  46353  36180
>>
>> and on 4.5.5 it looks like this:
>>
>>   34400  35559  35436  34890  34626
>>   35233  35756  34876  35347  35486
>>
>> So the 4.5.5 results are much more even, but overall clearly below 3.2.80.
>> How does 3.2.80 manage to do ~50k tps in some of the runs? Clearly we
>> randomly do something right, but what is it and why doesn't it happen on the
>> new kernel? And how could we do it every time?
>
> As far as I can see you are using default values of min_wal_size,
> max_wal_size, checkpoint related params, have you changed default
> shared_buffer settings, because that can have a bigger impact.

Huh? Where do you see me using default values? There are settings.log
with a dump of pg_settings data, and the modified values are

checkpoint_completion_target = 0.9
checkpoint_timeout = 3600
effective_io_concurrency = 32
log_autovacuum_min_duration = 100
log_checkpoints = on
log_line_prefix = %m
log_timezone = UTC
maintenance_work_mem = 524288
max_connections = 300
max_wal_size = 8192
min_wal_size = 1024
shared_buffers = 2097152
synchronous_commit = on
work_mem = 524288

(ignoring some irrelevant stuff like locales, timezone etc.).

> Using default values of mentioned parameters can lead to checkpoints in
> between your runs.

So I'm using 16GB shared buffers (so with scale 300 everything fits into
shared buffers), min_wal_size=16GB, max_wal_size=128GB, checkpoint
timeout 1h etc. So no, there are no checkpoints during the 5-minute
runs, only those triggered explicitly before each run.

> Also, I think instead of 5 mins, read-write runs should be run for 15
> mins to get consistent data.

Where does the inconsistency come from? Lack of warmup? Considering how
uniform the results from the 10 runs are (at least on 4.5.5), I claim
this is not an issue.

> For Dilip's workload where he is using only Select ... For Update, I
> think it is okay, but otherwise you need to drop and re-create the
> database between each run, otherwise data bloat could impact the
> readings.

And why should it affect 3.2.80 and 4.5.5 differently?

> I think in general, the impact should be same for both the kernels
> because you are using same parameters, but I think if you use
> appropriate parameters, then you can get consistent results for
> 3.2.80. I have also seen variation in read-write tests, but the
> variation you are showing is really a matter of concern, because it
> will be difficult to rely on final data.

Both kernels use exactly the same parameters (fairly tuned, IMHO).
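
As an aside, a settings dump like the one above can be produced with a query
of roughly this shape (a sketch; the exact filter used for settings.log is
not shown in the thread):

psql -c "select name, setting from pg_settings where source not in ('default', 'override')" > settings.log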


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-23 Thread Amit Kapila
On Fri, Sep 23, 2016 at 6:50 AM, Robert Haas  wrote:
> On Thu, Sep 22, 2016 at 7:44 PM, Tomas Vondra
>  wrote:
>> I don't dare to suggest rejecting the patch, but I don't see how we could
>> commit any of the patches at this point. So perhaps "returned with feedback"
>> and resubmitting in the next CF (along with analysis of improved workloads)
>> would be appropriate.
>
> I think it would be useful to have some kind of theoretical analysis
> of how much time we're spending waiting for various locks.  So, for
> example, suppose we do one run of these tests with various client counts
> - say, 1, 8, 16, 32, 64, 96, 128, 192, 256 - and we run "select
> wait_event from pg_stat_activity" once per second throughout the test.
> Then we see how many times we get each wait event, including NULL (no
> wait event).  Now, from this, we can compute the approximate
> percentage of time we're spending waiting on CLogControlLock and every
> other lock, too, as well as the percentage of time we're not waiting
> for any lock.  That, it seems to me, would give us a pretty clear idea
> what the maximum benefit we could hope for from reducing contention on
> any given lock might be.
>

As mentioned earlier, such an activity makes sense; however, re-reading
this thread today, I noticed that Dilip has already posted some analysis
of lock contention upthread [1].  It is clear that the patch has reduced
LWLock contention from ~28% to ~4% (where the major contributor was
TransactionIdSetPageStatus, which has dropped from ~53% to ~3%).  Isn't
that in line with what you are looking for?


[1] - 
https://www.postgresql.org/message-id/CAFiTN-u-XEzhd%3DhNGW586fmQwdTy6Qy6_SXe09tNB%3DgBcVzZ_A%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-23 Thread Amit Kapila
On Fri, Sep 23, 2016 at 6:29 PM, Pavan Deolasee
 wrote:
> On Fri, Sep 23, 2016 at 6:05 PM, Tomas Vondra 
> wrote:
>>
>> On 09/23/2016 05:10 AM, Amit Kapila wrote:
>>>
>>> On Fri, Sep 23, 2016 at 5:14 AM, Tomas Vondra
>>>  wrote:

>>>> On 09/21/2016 08:04 AM, Amit Kapila wrote:
>>>>>
>>>>>
>>>> (c) Although it's not visible in the results, 4.5.5 almost perfectly
>>>> eliminated the fluctuations in the results. For example when 3.2.80
>>>> produced these results (10 runs with the same parameters):
>>>>
>>>> 12118 11610 27939 11771 18065
>>>> 12152 14375 10983 13614 11077
>>>>
>>>> we get this on 4.5.5
>>>>
>>>> 37354 37650 37371 37190 37233
>>>> 38498 37166 36862 37928 38509
>>>>
>>>> Notice how much more even the 4.5.5 results are, compared to 3.2.80.

>>>
>>> how long each run was?  Generally, I do half-hour run to get stable
>>> results.
>>>
>>
>> 10 x 5-minute runs for each client count. The full shell script driving
>> the benchmark is here: http://bit.ly/2doY6ID and in short it looks like
>> this:
>>
>> for r in `seq 1 $runs`; do
>> for c in 1 8 16 32 64 128 192; do
>> psql -c checkpoint
>> pgbench -j 8 -c $c ...
>> done
>> done
>
>
>
> I see a couple of problems with the tests:
>
> 1. You're running regular pgbench, which also updates the small tables. At
> scale 300 and higher clients, there is going to be heavy contention on the
> pgbench_branches table. Why not test with pgbench -N? As far as this patch
> is concerned, we are only interested in seeing contention on
> ClogControlLock. In fact, how about a test which only consumes an XID, but
> does not do any write activity at all? Complete artificial workload, but
> good enough to tell us if and how much the patch helps in the best case. We
> can probably do that with a simple txid_current() call, right?
>

Right, that is why in the initial tests done by Dilip, he used
Select ... for Update.  I think using txid_current will generate a lot
of contention on XidGenLock, which will mask the contention around
CLogControlLock; in fact, we have tried that.
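
To make those two workloads concrete, hypothetical pgbench custom scripts
(the file names, the scale-300 key range and the invocation parameters are
all assumptions for illustration, not what Dilip actually ran) could be as
simple as:

# xid_only.sql - consumes one XID per transaction, no writes
cat > xid_only.sql <<'EOF'
SELECT txid_current();
EOF

# select_for_update.sql - rough sketch of a Select ... for Update workload
# (key range of 30000000 assumes pgbench scale 300)
cat > select_for_update.sql <<'EOF'
\set aid random(1, 30000000)
BEGIN;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid FOR UPDATE;
END;
EOF

# example invocation (thread count, clients, duration, db name are placeholders)
pgbench -n -M prepared -j 8 -c 64 -T 300 -f select_for_update.sql bench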


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   3   >