On Wed, 5 Feb 2020 at 12:07, Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Mon, Feb 3, 2020 at 8:03 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Tue, Jun 26, 2018 at 12:47 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > >
> > > On Fri, Apr 27, 2018 at 4:25 AM, Robert Haas <robertmh...@gmail.com> wrote:
> > > > On Thu, Apr 26, 2018 at 3:10 PM, Andres Freund <and...@anarazel.de> wrote:
> > > >>> I think the real question is whether the scenario is common enough to
> > > >>> worry about.  In practice, you'd have to be extremely unlucky to be
> > > >>> doing many bulk loads at the same time that all happened to hash to
> > > >>> the same bucket.
> > > >>
> > > >> With a bunch of parallel bulkloads into partitioned tables that really
> > > >> doesn't seem that unlikely?
> > > >
> > > > It increases the likelihood of collisions, but probably decreases the
> > > > number of cases where the contention gets really bad.
> > > >
> > > > For example, suppose each table has 100 partitions and you are
> > > > bulk-loading 10 of them at a time.  It's virtually certain that you
> > > > will have some collisions, but the amount of contention within each
> > > > bucket will remain fairly low because each backend spends only 1% of
> > > > its time in the bucket corresponding to any given partition.
> > > >
> > > I share another result of performance evaluation between current HEAD
> > > and current HEAD with v13 patch(N_RELEXTLOCK_ENTS = 1024).
> > >
> > > Type of table: normal table, unlogged table
> > > Number of child tables : 16, 64 (all tables are located on the same tablespace)
> > > Number of clients : 32
> > > Number of trials : 100
> > > Duration: 180 seconds for each trials
> > >
> > > The hardware spec of server is Intel Xeon 2.4GHz (HT 160cores), 256GB
> > > RAM, NVMe SSD 1.5TB.
> > > Each clients load 10kB random data across all partitioned tables.
> > >
> > > Here is the result.
> > >
> > >  childs |   type   | target  |  avg_tps   | diff with HEAD
> > > --------+----------+---------+------------+-----------------
> > >      16 | normal   | HEAD    |   1643.833 |
> > >      16 | normal   | Patched |  1619.5404 |        0.985222
> > >      16 | unlogged | HEAD    |  9069.3543 |
> > >      16 | unlogged | Patched |  9368.0263 |        1.032932
> > >      64 | normal   | HEAD    |   1598.698 |
> > >      64 | normal   | Patched |  1587.5906 |        0.993052
> > >      64 | unlogged | HEAD    |  9629.7315 |
> > >      64 | unlogged | Patched | 10208.2196 |        1.060073
> > > (8 rows)
> > >
> > > For normal tables, loading tps decreased 1% ~ 2% with this patch
> > > whereas it increased 3% ~ 6% for unlogged tables. There were
> > > collisions at 0 ~ 5 relation extension lock slots between 2 relations
> > > in the 64 child tables case but it didn't seem to affect the tps.
> > >
> >
> > AFAIU, this resembles the workload that Andres was worried about.  I
> > think we should once run this test in a different environment, but
> > considering this to be correct and repeatable, where do we go with
> > this patch especially when we know it improves many workloads [1] as
> > well.  We know that on a pathological case constructed by Mithun [2],
> > this causes regression as well.  I am not sure if the test done by
> > Mithun really mimics any real-world workload as he has tested by
> > making N_RELEXTLOCK_ENTS = 1 to hit the worst case.
> >
> > Sawada-San, if you have a script or data for the test done by you,
> > then please share it so that others can also try to reproduce it.
>
> Unfortunately the environment I used for performance verification is
> no longer available.
>
> I agree to run this test in a different environment. I've attached the
> rebased version patch. I'm measuring the performance with/without
> patch, so will share the results.
>
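To put the collision numbers above in perspective: the patch maps each
relation extension lock onto one of N_RELEXTLOCK_ENTS (1024 by default)
shared slots, so with 64 partitions the birthday approximation predicts
about 64*63/(2*1024) ~= 2 shared slots, which is consistent with the
"0 ~ 5" collisions reported above. The standalone sketch below only
illustrates that estimate; it uses a generic 32-bit mixing hash over
hypothetical relation OIDs as a stand-in, not the patch's actual hash
function, so the exact counts will differ.

/*
 * Illustration only: how often do N_PARTITIONS relations share one of
 * N_RELEXTLOCK_ENTS extension lock slots?  The hash here is a generic
 * murmur3-style finalizer, not the hash used by the patch, and the OIDs
 * are made up; the point is just the order of magnitude.
 */
#include <stdio.h>
#include <stdint.h>

#define N_RELEXTLOCK_ENTS 1024    /* default slot count in the patch */
#define N_PARTITIONS      64      /* partitions being bulk-loaded */

static uint32_t
mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

int
main(void)
{
    /* Try a few different (hypothetical) starting OIDs for the partitions. */
    for (uint32_t base = 16384; base <= 16384 + 9000; base += 1000)
    {
        int     users[N_RELEXTLOCK_ENTS] = {0};
        int     shared = 0;

        for (int i = 0; i < N_PARTITIONS; i++)
            users[mix32(base + (uint32_t) i) % N_RELEXTLOCK_ENTS]++;

        for (int s = 0; s < N_RELEXTLOCK_ENTS; s++)
            if (users[s] > 1)
                shared++;

        printf("base OID %u: %d slots shared by 2+ relations\n",
               (unsigned) base, shared);
    }
    return 0;
}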
Thanks Sawada-san for the patch.

For the last few days, I have been reading this thread and reviewing the v13
patch. To debug and test, I rebased the v13 patch myself and compared my
rebased patch with the v14 patch. I think the ordering of header files is not
alphabetical in the v14 patch. (I haven't reviewed the v14 patch fully yet
because, before reviewing, I wanted to test false sharing.) While debugging,
I didn't notice any hang or lock-related issue.

I did some testing for false sharing (bulk insert, COPY data, bulk insert
into partitioned tables). Below is the testing summary.

*Test setup (bulk insert into partitioned tables):*
-c autovacuum=off -c shared_buffers=512MB -c max_wal_size=20GB
-c checkpoint_timeout=12min

Basically, I created a table with 13 partitions and inserted bulk data using
pgbench. I used the below pgbench command:

*./pgbench -c $threads -j $threads -T 180 -f insert1.sql@1 -f insert2.sql@1
-f insert3.sql@1 -f insert4.sql@1 postgres*

I took the scripts from previous mails and modified them. For reference, I am
attaching the test scripts. I tested with the default 1024 slots
(N_RELEXTLOCK_ENTS = 1024).

 Clients |  HEAD (tps) | With v14 patch (tps) | %change (time: 180s)
---------+-------------+----------------------+----------------------
       1 |   92.979796 |           100.877446 |  +8.49 %
      32 |  392.881863 |           388.470622 |  -1.12 %
      56 |  551.753235 |           528.018852 |  -4.30 %
      60 |  648.273767 |           653.251507 |  +0.76 %
      64 |  645.975124 |           671.322140 |  +3.92 %
      66 |  662.728010 |           673.399762 |  +1.61 %
      70 |  647.103183 |           660.694914 |  +2.10 %
      74 |  648.824027 |           676.487622 |  +4.26 %

From the above results, we can see that in most cases TPS is slightly
increased with the v14 patch. I am still testing and will post my results.

I want to test the extension lock by blocking use of the FSM (use_fsm=false
in the code). I think that if we block use of the FSM, the load on the
extension lock will increase. Is this a correct way to test? Please let me
know if you have any specific testing scenario.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
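Regarding the "use_fsm=false in the code" idea above: in
src/backend/access/heap/hio.c, RelationGetBufferForTuple() computes a local
use_fsm flag from the HEAP_INSERT_SKIP_FSM option; forcing that flag to false
should make every inserter skip the free space map and go straight to
relation extension, concentrating load on the extension lock. A rough,
test-only sketch of that hack follows; the surrounding code is paraphrased
and elided, so the exact placement should be checked against the actual
source.

/*
 * TEST-ONLY sketch (never for production): force heap insertions to bypass
 * the free space map so backends keep extending the relation, maximizing
 * pressure on the relation extension lock.
 *
 * In src/backend/access/heap/hio.c the function begins roughly like this:
 */
Buffer
RelationGetBufferForTuple(Relation relation, Size len,
                          Buffer otherBuffer, int options,
                          BulkInsertState bistate,
                          Buffer *vmbuffer, Buffer *vmbuffer_other)
{
    bool        use_fsm = !(options & HEAP_INSERT_SKIP_FSM);

    use_fsm = false;            /* testing hack: never consult the FSM */

    /* ... rest of the function unchanged ... */
}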
create_table.sql
Description: Binary data
insert4.sql
Description: Binary data
insert1.sql
Description: Binary data
insert2.sql
Description: Binary data
insert3.sql
Description: Binary data
run_test.sh
Description: Bourne shell script
start_server.sh
Description: Bourne shell script