HIGH IO and Less CPU utilization

2022-03-29 Thread Rambabu g
Hi All,

We have an issue with high load and I/O waits but low CPU usage on a
PostgreSQL database. The emp table is around 500GB, and there are very
few connections.

Please suggest whether we need to change any parameters at the system
level or in the PostgreSQL configuration.

postgres=# select version();

                                                 version
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 11.15 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)


postgres=# \q


Postgres parameter settings:


shared_buffers=12GB
work_mem=128MB
effective_cache_size=48GB
maintenance_work_mem=2GB
max_connections=500
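
These can also be confirmed from within psql; a quick sketch using the
pg_settings view:

select name, setting, unit
  from pg_settings
 where name in ('shared_buffers', 'work_mem', 'effective_cache_size',
                'maintenance_work_mem', 'max_connections');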


14428 | 04:45:59.712892 | active  | INSERT INTO target (empno, name)
    SELECT empno, '' AS name
    FROM (select distinct empno from emp where sname='test' and tp='EMP NAME 1' LIMIT 10) AS query;



  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND


14428 postgres  20   0   12.6g  12.2g  12.2g D   5.3 13.3   4:43.57 postgres: postgres postgres (59436) INSERT

29136 postgres  20   0   12.6g 401812 398652 D   4.7  0.4   0:01.20 postgres: postgres postgres (48220) SELECT

29119 postgres  20   0   12.6g 677704 674064 S   3.3  0.7   0:02.05 postgres: postgres postgres (37684) idle

29121 postgres  20   0   12.6g 758428 755252 S   3.0  0.8   0:02.33 postgres: postgres postgres (57392) idle

29166 postgres  20   0   12.6g 260436 257408 S   3.0  0.3   0:00.63 postgres: postgres postgres (59424) idle

29181 postgres  20   0   12.6g 179136 175860 D   2.3  0.2   0:00.18 postgres: postgres postgres (57092) SELECT

29129 postgres  20   0   12.6g 442444 439212 S   1.7  0.5   0:01.33 postgres: postgres postgres (36560) idle



-bash-4.2$ cat /etc/redhat-release

Red Hat Enterprise Linux Server release 7.9 (Maipo)

-bash-4.2$ uname

Linux

-bash-4.2$ uname -a

Linux ip.ec2.internal 3.10.0-1160.59.1.el7.x86_64 #1 SMP Wed Feb 16
12:17:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

-bash-4.2$ top


top - 17:02:52 up 1 day,  1:44,  2 users,  load average: 11.60, 22.27, 22.22

Tasks: 316 total,   1 running, 315 sleeping,   0 stopped,   0 zombie

%Cpu(s):  0.5 us,  0.5 sy,  0.0 ni, 92.0 id,  7.0 wa,  0.0 hi,  0.0 si,  0.0 st

KiB Mem : 96639952 total,   483896 free,  1693960 used, 94462096 buff/cache

KiB Swap:        0 total,        0 free,        0 used. 81408928 avail Mem





-bash-4.2$ iostat -x

Linux 3.10.0-1160.59.1.el7.x86_64 (ip.ec2.internal)   03/29/2022   _x86_64_   (24 CPU)


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.33    0.00    0.24    7.54    0.00   91.88


Device:  rrqm/s  wrqm/s     r/s    w/s     rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
nvme1n1    0.00    3.45 1042.22  29.88  41998.88  1476.75    81.10     7.61   7.10    6.62   23.70   0.40  43.19
nvme2n1    0.00    0.02    0.02   1.06      0.15   268.80   497.00     0.09  80.87    0.85   82.56   1.40   0.15
nvme0n1    0.00    0.01    0.21   0.08      4.94     7.07    81.37     0.00   6.88    0.61   22.83   0.64   0.02


-bash-4.2$ vmstat -a

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
 1  8  0 476180 40092640 5304349200  1753732   14  0  0 92  8  0

-bash-4.2$ vmstat -d

disk-   ------------reads------------- ------------writes------------ ------IO------
          total merged sectors      ms   total merged sectors      ms   cur    sec

nvme1n1 99492480  0 8016369922 658540488 2849690 329519 281661496 67518819  0  41210

nvme2n1     2126  0   279461811 101078   2312 51264208 8344670  0144

nvme0n1    20254  6  942763   12340   7953641 1348866  181438  0 18


-bash-4.2$ sar

Linux 3.10.0-1160.59.1.el7.x86_64 (ip.ec2.internal)   03/29/2022   _x86_64_   (24 CPU)


04:20:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle

04:30:01 PM     all      0.70      0.00      0.69     27.92      0.00     70.68

04:40:01 PM     all      0.71      0.00      0.70     27.76      0.00     70.84

04:50:01 PM     all      0.70      0.00      0.69     26.34      0.00     72.27

05:00:01 PM     all      0.70      0.00      0.68     27.32      0.00     71.31

05:10:01 PM     all      0.70      0.00      0.69     27.83      0.00     70.77

05:20:01 PM     all      0.70      0.00      0.69     28.16      0.00     70.45

05:30:01 PM     all      0.71      0.00      0.69     26.62      0.00     71.98

05:40:01 PM     all      0.69      0.00      0.68     25.77      0.00     72.85

Average:        all      0.70      0.00      0.69     27.21      0.00     71.40

-bash-4.2$

-bash-4.2$ free -g

              total        used        free      shared  buff/cache   available

Mem:

Re: HIGH IO and Less CPU utilization

2022-03-29 Thread Justin Pryzby
Hi,

Thanks for providing all this info.

On Tue, Mar 29, 2022 at 11:34:18PM +0530, Rambabu g wrote:
> Hi All,
> 
> We have an issue with high load and I/O waits but low CPU usage on a
> PostgreSQL database. The emp table is around 500GB, and there are very
> few connections.

What indexes are defined on this table ?
How large are they ?
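
For example, something like this (just a sketch, assuming the table is
literally named emp) would show the index sizes:

SELECT indexrelname AS index, pg_size_pretty(pg_relation_size(indexrelid)) AS size
  FROM pg_stat_user_indexes
 WHERE relname = 'emp';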

> Red Hat Enterprise Linux Server release 7.9 (Maipo)
>  PostgreSQL 11.15 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 
> 20150623 (Red Hat 4.8.5-44), 64-bit
> 
> shared_buffers=12GB
> work_mem=128MB

> 14428 | 04:45:59.712892 | active  | INSERT INTO target (empno, name)
> SELECT empno, '' AS name FROM (select distinct  empno  from emp where 
> sname='test'
> and tp='EMP NAME 1' LIMIT 10) AS query   ;

Is this the only problem query, or just one example ?
Are your issues with loading data, querying data or both ?

> -bash-4.2$ iostat -x

It shows that you only have a few filesystems in use.
It's common to have WAL and temp_tablespaces on a separate FS.
That probably wouldn't help your performance at all, but it would help to tell
what's doing I/O.  Is there anything else running on the VM besides postgres ?
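
If you wanted to split them out just to see where the I/O goes, a rough
sketch for temp files (the location /mnt/pgtemp is only a placeholder for a
mount point on a separate filesystem):

CREATE TABLESPACE temp_ts LOCATION '/mnt/pgtemp';  -- placeholder path
ALTER SYSTEM SET temp_tablespaces = 'temp_ts';
SELECT pg_reload_conf();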

You can also check:
SELECT COUNT(1), wait_event FROM pg_stat_activity GROUP BY 2 ORDER BY 1 DESC;

And the pg_buffercache extension:
SELECT COUNT(nullif(isdirty,'f')) dirty, COUNT(1) AS all, COALESCE(c.relname, 
b.relfilenode::text) FROM pg_buffercache b LEFT JOIN pg_class c ON 
b.relfilenode=pg_relation_filenode(c.oid) GROUP BY 3 ORDER BY 1 DESC,2 DESC 
LIMIT 9;
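
(If the extension isn't installed yet, it has to be created once per database
first: CREATE EXTENSION IF NOT EXISTS pg_buffercache;)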

> Hypervisor vendor: KVM

Are KSM or THP enabled on the hypervisor ?

tail /sys/kernel/mm/ksm/run 
/sys/kernel/mm/transparent_hugepage/khugepaged/defrag 
/sys/kernel/mm/transparent_hugepage/enabled 
/sys/kernel/mm/transparent_hugepage/defrag  
 

-- 
Justin




Re: HIGH IO and Less CPU utilization

2022-03-29 Thread Rambabu g
Hi Justin,

Thanks for the quick response and your help. Please go through the inputs
below and let me know if we need to tune anything at the OS level or change
any DB parameters.


On Tue, 29 Mar 2022 at 23:54, Justin Pryzby  wrote:

> Hi,
>
> Thanks for providing all this info.
>
> On Tue, Mar 29, 2022 at 11:34:18PM +0530, Rambabu g wrote:
> > Hi All,
> >
> > We have an issue with high load and I/O waits but low CPU usage on a
> > PostgreSQL database. The emp table is around 500GB, and there are very
> > few connections.
>
> What indexes are defined on this table ?
> How large are they ?
>
>
There are three indexes defined on the table, each one is around 20 to 25GB,
and the indexes are created on

postgres=# explain select distinct  empno  from emp where sname='test' and
tp='EMP NAME 1'

                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 HashAggregate  (cost=71899575.17..71900816.97 rows=124179 width=9)
   Group Key: empno
   ->  Gather  (cost=1000.00..71820473.80 rows=31640550 width=9)
         Workers Planned: 2
         ->  Parallel Seq Scan on emp  (cost=0.00..68655418.80 rows=13183562 width=9)
               Filter: (((sname)::text = 'test'::text) AND ((tp)::text = 'EMP NAME 1'::text))


> > Red Hat Enterprise Linux Server release 7.9 (Maipo)
> >  PostgreSQL 11.15 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
> 20150623 (Red Hat 4.8.5-44), 64-bit
> >
> > shared_buffers=12GB
> > work_mem=128MB
>
> > 14428 | 04:45:59.712892 | active  | INSERT INTO target (empno, name)
> > SELECT empno, '' AS name FROM (select distinct  empno  from emp where
> sname='test'
> > and tp='EMP NAME 1' LIMIT 10) AS query   ;
>
> Is this the only problem query, or just one example ?
> Are your issues with loading data, querying data or both ?
>
> > -bash-4.2$ iostat -x
>
> It shows that you only have a few filesystems in use.
> It's common to have WAL and temp_tablespaces on a separate FS.
> That probably wouldn't help your performance at all, but it would help to
> tell
> what's doing I/O.  Is there anything else running on the VM besides
> postgres ?
>
>
No, the EC2 VM is dedicated to the Postgres DB instance only.


> You can also check:
> SELECT COUNT(1), wait_event FROM pg_stat_activity GROUP BY 2 ORDER BY 1
> DESC;
>

postgres=# SELECT COUNT(1), wait_event FROM pg_stat_activity GROUP BY 2
ORDER BY 1 DESC;

 count |     wait_event
-------+---------------------
    70 | ClientRead
    34 | DataFileRead
     3 |
     1 | LogicalLauncherMain
     1 | WalWriterMain
     1 | BgWriterMain
     1 | AutoVacuumMain
(7 rows)


> And the pg_buffercache extension:
> SELECT COUNT(nullif(isdirty,'f')) dirty, COUNT(1) as all,
> COALESCE(c.relname, b.relfilenode::text) FROM pg_buffercache b LEFT JOIN
> pg_class c ON b.relfilenode=pg_relation_filenode(c.oid) GROUP BY 3 ORDER BY
> 1 DESC,2 DESC LIMIT 9;
>
>
postgres=# SELECT COUNT(nullif(isdirty,'f')) dirty, COUNT(1) as all,
COALESCE(c.relname, b.relfilenode::text) FROM pg_buffercache b LEFT JOIN
pg_class c ON b.relfilenode=pg_relation_filenode(c.oid) GROUP BY 3 ORDER BY
1 DESC,2 DESC LIMIT 9;

 dirty |   all   |   coalesce
-------+---------+----------------
   189 |  237348 | emp_status
    97 | 1214949 | emp
    77 |     259 | public_group
    75 |     432 | public_gid
    74 |     233 | public_utpu
    26 |     115 | code_evd
    15 |      55 | group
    15 |      49 | output
    14 |      77 | output_status
(9 rows)


> > Hypervisor vendor: KVM
>
> Are KSM or THP enabled on the hypervisor ?
>
> tail /sys/kernel/mm/ksm/run
> /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
> /sys/kernel/mm/transparent_hugepage/enabled
> /sys/kernel/mm/transparent_hugepage/defrag
>
>
>
-bash-4.2$ tail /sys/kernel/mm/ksm/run
/sys/kernel/mm/transparent_hugepage/khugepaged/defrag
/sys/kernel/mm/transparent_hugepage/enabled
/sys/kernel/mm/transparent_hugepage/defrag

==> /sys/kernel/mm/ksm/run <==

0


==> /sys/kernel/mm/transparent_hugepage/khugepaged/defrag <==

1


==> /sys/kernel/mm/transparent_hugepage/enabled <==

[always] madvise never


==> /sys/kernel/mm/transparent_hugepage/defrag <==

[always] madvise never


> --
> Justin
>


Regards,
Rambabu.


Re: HIGH IO and Less CPU utilization

2022-03-29 Thread Justin Pryzby
On Wed, Mar 30, 2022 at 12:52:05AM +0530, Rambabu g wrote:
> > What indexes are defined on this table ?
> > How large are they ?
>
> There are three indexes defined on the table, each one is around 20 to 25GB,
> and the indexes are created on

Did you mean to say something else after "on" ?

Show the definition of the indexes from psql \d
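
For example (assuming the table name emp from earlier in the thread):

\d emp

The index definitions are listed at the bottom of that output.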

> postgres=# explain select distinct  empno  from emp where sname='test' and 
> tp='EMP NAME 1'

Is this the only query that's performing poorly ?
You should send explain (analyze,buffers) for the problematic queries.
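For example, for the query above:

explain (analyze, buffers) select distinct empno from emp where sname='test' and tp='EMP NAME 1';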

> > > Hypervisor vendor: KVM
> >
> > Are KSM or THP enabled on the hypervisor ?

> No, the EC2 VM is dedicated to the Postgres DB instance only.

Oh, so this is an EC2 and you cannot change the hypervisor itself.

> -bash-4.2$ tail /sys/kernel/mm/ksm/run 
> /sys/kernel/mm/transparent_hugepage/khugepaged/defrag 
> /sys/kernel/mm/transparent_hugepage/enabled 
> /sys/kernel/mm/transparent_hugepage/defrag
...
> ==> /sys/kernel/mm/transparent_hugepage/defrag <==
> [always] madvise never

I doubt it will help, but you could try disabling these.
It's a quick experiment anyway.
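
For example, as root (a sketch; this does not persist across reboots):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag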




Re: HIGH IO and Less CPU utilization

2022-03-29 Thread Rambabu g
Hi Justin,

Only one query is causing the issue. I'm sharing the definitions of the
indexes below, please have a look.



On Wed, 30 Mar 2022 at 01:09, Justin Pryzby  wrote:

> On Wed, Mar 30, 2022 at 12:52:05AM +0530, Rambabu g wrote:
> > > What indexes are defined on this table ?
> > > How large are they ?
> >
> > There are three indexes defined on the table, each one is around 20 to
> > 25GB, and the indexes are created on
>
> Did you mean to say something else after "on" ?
>
> Show the definition of the indexes from psql \d
>

Index definitions:

postgres=# \d+ idx_empno
                        Index "l2.pd_activity_empi"
 Column |          Type           | Key? | Definition | Storage  | Stats target
--------+-------------------------+------+------------+----------+--------------
 empno  | character varying(2000) | yes  | empno      | extended |
btree, for table "emp"


postgres=# \d+ id_dt
                                 Index "dt"
 Column |            Type             | Key? | Definition | Storage | Stats target
--------+-----------------------------+------+------------+---------+--------------
 dt     | timestamp without time zone | yes  | dt         | plain   |
btree, for table "emp"


postgres=# \d+ idx_tp
                              Index "idx_tp"
 Column |          Type           | Key? | Definition | Storage  | Stats target
--------+-------------------------+------+------------+----------+--------------
 tp     | character varying(2000) | yes  | tp         | extended |
btree, for table "emp"




The query has been running for 30 min.

> postgres=# explain select distinct  empno  from emp where sname='test'
> and tp='EMP NAME 1'
>
> Is this the only query that's performing poorly ?
> You should send explain (analyze,buffers) for the problematic queries.
>


postgres=# select pid,(now()-query_start) as
age,wait_event_type,wait_event,query from pg_stat_activity where
state!='idle';

  pid  |       age       | wait_event_type |  wait_event   | query
-------+-----------------+-----------------+---------------+--------------------------------------------------------------------------------------------------------------
 32154 | 00:09:56.131136 | IPC             | ExecuteGather | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   847 | 00:09:56.131136 | IO              | DataFileRead  | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   848 | 00:09:56.131136 | IO              | DataFileRead  | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   849 | 00:09:56.131136 | IO              | DataFileRead  | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   850 | 00:09:56.131136 | IO              | DataFileRead  | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   851 | 00:09:56.131136 | IO              | DataFileRead  | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   852 | 00:09:56.131136 | IO              | DataFileRead  | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   645 | 00:00:00        |                 |               | select pid,(now()-query_start) as age,wait_event_type,wait_event,query from pg_stat_activity where state!='idle'



postgres=# SELECT COUNT(nullif(isdirty,'f')) dirty, COUNT(1) as all,
COALESCE(c.relname, b.relfilenode::text) FROM pg_buffercache b LEFT JOIN
pg_class c ON b.relfilenode=pg_relation_filenode(c.oid) GROUP BY 3 ORDER BY
1 DESC,2 DESC LIMIT 9;

 dirty |   all   |      coalesce
-------+---------+--------------------
    32 |     136 | fn_deployment
    18 |     176 | fn_deployment_key
     8 |      12 | event_logs_pkey
     6 |     157 | event_logs
     1 |     355 | pg_class
     0 | 2890261 |
     0 |  252734 | utput_status
     0 |     378 | emp
     0 |     299 | 1249
(9 rows)



-bash-4.2$ sar

Linux 3.10.0-1160.59.1.el7.x86_64 (ip-10-54-145-108.ec2.internal)   03/30/2022   _x86_64_   (24 CPU)


12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle

12:10:01 AM     all      1.19      0.00      0.82     36.17      0.00     61.81

12:20:01 AM     all      0.72      0.00      0.75     35.59      0.00     62.94

12:30:01 AM     all      0.74      0.00      0.77     35.04      0.00     63.46

12:40:02 AM     all      0.74      0.00      0.76     34.65      0.00     63.85

12:50:01 AM     all      0.77      0.00      0.78     33.36      0.00     65.09

01:00:01 AM     all      0.83      0.00      0.78     27.46      0.00     70.93

01:10:01 AM     all      0.85      0.00      0.78     30.11      0.00     68.26

01:20:01 AM     all      0.70      0.00      0.61     20.46      0.00     78.24

01:30:01 AM     all      0.15      0.00      0.06      0.02      0.00     99.77

01:40:01 AM     all