HIGH IO and Less CPU utilization
Hi All,

We have an issue with high load and I/O waits but low CPU usage on a PostgreSQL database. The emp table is around 500 GB and the number of connections is quite low. Please advise whether we need to change any configuration parameters at the system level or in the PostgreSQL configuration.

postgres=# select version();
                                                 version
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 11.15 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)

postgres=# \q

Postgres parameter settings:

shared_buffers = 12GB
work_mem = 128MB
effective_cache_size = 48GB
maintenance_work_mem = 2GB
max_connections = 500

Currently running statement (from pg_stat_activity):

 14428 | 04:45:59.712892 | active | INSERT INTO target (empno, name) SELECT empno, '' AS name FROM (select distinct empno from emp where sname='test' and tp='EMP NAME 1' LIMIT 10) AS query ;

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14428 postgres  20   0   12.6g  12.2g  12.2g D   5.3 13.3   4:43.57 postgres: postgres postgres (59436) INSERT
29136 postgres  20   0   12.6g 401812 398652 D   4.7  0.4   0:01.20 postgres: postgres postgres (48220) SELECT
29119 postgres  20   0   12.6g 677704 674064 S   3.3  0.7   0:02.05 postgres: postgres postgres (37684) idle
29121 postgres  20   0   12.6g 758428 755252 S   3.0  0.8   0:02.33 postgres: postgres postgres (57392) idle
29166 postgres  20   0   12.6g 260436 257408 S   3.0  0.3   0:00.63 postgres: postgres postgres (59424) idle
29181 postgres  20   0   12.6g 179136 175860 D   2.3  0.2   0:00.18 postgres: postgres postgres (57092) SELECT
29129 postgres  20   0   12.6g 442444 439212 S   1.7  0.5   0:01.33 postgres: postgres postgres (36560) idle

-bash-4.2$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)

-bash-4.2$ uname
Linux

-bash-4.2$ uname -a
Linux ip.ec2.internal 3.10.0-1160.59.1.el7.x86_64 #1 SMP Wed Feb 16 12:17:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

-bash-4.2$ top
top - 17:02:52 up 1 day,  1:44,  2 users,  load average: 11.60, 22.27, 22.22
Tasks: 316 total,   1 running, 315 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.5 us,  0.5 sy,  0.0 ni, 92.0 id,  7.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 96639952 total,   483896 free,  1693960 used, 94462096 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 81408928 avail Mem

-bash-4.2$ iostat -x
Linux 3.10.0-1160.59.1.el7.x86_64 (ip.ec2.internal)   03/29/2022   _x86_64_   (24 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.33    0.00    0.24    7.54    0.00   91.88

Device:  rrqm/s  wrqm/s      r/s     w/s     rkB/s     wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
nvme1n1    0.00    3.45  1042.22   29.88  41998.88   1476.75    81.10     7.61   7.10    6.62   23.70   0.40  43.19
nvme2n1    0.00    0.02     0.02    1.06      0.15    268.80   497.00     0.09  80.87    0.85   82.56   1.40   0.15
nvme0n1    0.00    0.01     0.21    0.08      4.94      7.07    81.37     0.00   6.88    0.61   22.83   0.64   0.02

-bash-4.2$ vmstat -a
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free     inact   active   si   so    bi    bo   in   cs us sy id wa st
 1  8      0 476180  40092640  5304349200 1753732 14 0 0 92 8 0

-bash-4.2$ vmstat -d
disk-   -------------reads------------- -------------writes------------ ------IO-------
          total merged  sectors      ms   total merged  sectors      ms    cur    sec
nvme1n1 99492480 0 8016369922 658540488 2849690 329519 281661496 67518819 0 41210
nvme2n1 2126 0 279461811 101078 2312 51264208 8344670 0144
nvme0n1 20254 6 942763 12340 7953641 1348866 181438 0 18

-bash-4.2$ sar
Linux 3.10.0-1160.59.1.el7.x86_64 (ip.ec2.internal)   03/29/2022   _x86_64_   (24 CPU)

04:20:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
04:30:01 PM     all      0.70      0.00      0.69     27.92      0.00     70.68
04:40:01 PM     all      0.71      0.00      0.70     27.76      0.00     70.84
04:50:01 PM     all      0.70      0.00      0.69     26.34      0.00     72.27
05:00:01 PM     all      0.70      0.00      0.68     27.32      0.00     71.31
05:10:01 PM     all      0.70      0.00      0.69     27.83      0.00     70.77
05:20:01 PM     all      0.70      0.00      0.69     28.16      0.00     70.45
05:30:01 PM     all      0.71      0.00      0.69     26.62      0.00     71.98
05:40:01 PM     all      0.69      0.00      0.68     25.77      0.00     72.85
Average:        all      0.70      0.00      0.69     27.21      0.00     71.40

-bash-4.2$ free -g
              total        used        free      shared  buff/cache   available
Mem:
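A quick way to see which relations those disk reads are going to, as a minimal sketch using the built-in statistics views (these are cumulative counters since the last stats reset, so they reflect activity over time rather than the current moment):

-- Which user tables account for the most blocks read from disk
-- (i.e. not found in shared_buffers).  Counters are cumulative
-- since the last pg_stat_reset().
SELECT relname,
       heap_blks_read,
       idx_blks_read,
       pg_size_pretty(pg_table_size(relid)) AS table_size
FROM   pg_statio_user_tables
ORDER  BY heap_blks_read + coalesce(idx_blks_read, 0) DESC
LIMIT  10;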
Re: HIGH IO and Less CPU utilization
Hi,

Thanks for providing all this info.

On Tue, Mar 29, 2022 at 11:34:18PM +0530, Rambabu g wrote:
> Hi All,
>
> We have an issue with high load and I/O waits but low CPU usage on a
> PostgreSQL database. The emp table is around 500 GB and the number of
> connections is quite low.

What indexes are defined on this table?
How large are they?

> Red Hat Enterprise Linux Server release 7.9 (Maipo)
> PostgreSQL 11.15 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
> 20150623 (Red Hat 4.8.5-44), 64-bit
>
> shared_buffers = 12GB
> work_mem = 128MB

> 14428 | 04:45:59.712892 | active | INSERT INTO target (empno, name)
> SELECT empno, '' AS name FROM (select distinct empno from emp where sname='test'
> and tp='EMP NAME 1' LIMIT 10) AS query ;

Is this the only problem query, or just one example?
Are your issues with loading data, querying data, or both?

> -bash-4.2$ iostat -x

It shows that you only have a few filesystems in use.  It's common to have WAL
and temp_tablespaces on a separate filesystem.  That probably wouldn't help your
performance at all, but it would help to tell what's doing the I/O.  Is there
anything else running on the VM besides postgres?

You can also check:
SELECT COUNT(1), wait_event FROM pg_stat_activity GROUP BY 2 ORDER BY 1 DESC;

And the pg_buffercache extension:
SELECT COUNT(nullif(isdirty,'f')) dirty, COUNT(1) AS all, COALESCE(c.relname, b.relfilenode::text)
FROM pg_buffercache b LEFT JOIN pg_class c ON b.relfilenode=pg_relation_filenode(c.oid)
GROUP BY 3 ORDER BY 1 DESC, 2 DESC LIMIT 9;

> Hypervisor vendor: KVM

Are KSM or THP enabled on the hypervisor?

tail /sys/kernel/mm/ksm/run /sys/kernel/mm/transparent_hugepage/khugepaged/defrag /sys/kernel/mm/transparent_hugepage/enabled /sys/kernel/mm/transparent_hugepage/defrag

--
Justin
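For the index questions above, a minimal sketch using the standard catalogs (the table name "emp" is taken from the original post) that lists each index on the table with its definition and on-disk size:

-- List every index on emp with its definition and size.
SELECT i.indexrelid::regclass                         AS index_name,
       pg_get_indexdef(i.indexrelid)                  AS definition,
       pg_size_pretty(pg_relation_size(i.indexrelid)) AS size
FROM   pg_index i
WHERE  i.indrelid = 'emp'::regclass;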
Re: HIGH IO and Less CPU utilization
Hi Justin,

Thanks for the quick response and your help. Please go through the inputs below and let me know whether anything needs to be tuned in the OS-level parameters or the database parameters.

On Tue, 29 Mar 2022 at 23:54, Justin Pryzby wrote:
> Hi,
>
> Thanks for providing all this info.
>
> On Tue, Mar 29, 2022 at 11:34:18PM +0530, Rambabu g wrote:
> > Hi All,
> >
> > We have an issue with high load and I/O waits but low CPU usage on a
> > PostgreSQL database. The emp table is around 500 GB and the number of
> > connections is quite low.
>
> What indexes are defined on this table?
> How large are they?

There are three indexes defined on the table, each around 20 to 25 GB, and the indexes are created on

postgres=# explain select distinct empno from emp where sname='test' and tp='EMP NAME 1'
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 HashAggregate  (cost=71899575.17..71900816.97 rows=124179 width=9)
   Group Key: empno
   ->  Gather  (cost=1000.00..71820473.80 rows=31640550 width=9)
         Workers Planned: 2
         ->  Parallel Seq Scan on emp  (cost=0.00..68655418.80 rows=13183562 width=9)
               Filter: (((sname)::text = 'test'::text) AND ((tp)::text = 'EMP NAME 1'::text))

> > Red Hat Enterprise Linux Server release 7.9 (Maipo)
> > PostgreSQL 11.15 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
> > 20150623 (Red Hat 4.8.5-44), 64-bit
> >
> > shared_buffers = 12GB
> > work_mem = 128MB
>
> > 14428 | 04:45:59.712892 | active | INSERT INTO target (empno, name)
> > SELECT empno, '' AS name FROM (select distinct empno from emp where sname='test'
> > and tp='EMP NAME 1' LIMIT 10) AS query ;
>
> Is this the only problem query, or just one example?
> Are your issues with loading data, querying data, or both?
>
> > -bash-4.2$ iostat -x
>
> It shows that you only have a few filesystems in use.  It's common to have WAL
> and temp_tablespaces on a separate filesystem.  That probably wouldn't help your
> performance at all, but it would help to tell what's doing the I/O.  Is there
> anything else running on the VM besides postgres?

No, the EC2 VM is dedicated to the postgres DB instance only.

> You can also check:
> SELECT COUNT(1), wait_event FROM pg_stat_activity GROUP BY 2 ORDER BY 1 DESC;

postgres=# SELECT COUNT(1), wait_event FROM pg_stat_activity GROUP BY 2 ORDER BY 1 DESC;
 count |     wait_event
-------+---------------------
    70 | ClientRead
    34 | DataFileRead
     3 |
     1 | LogicalLauncherMain
     1 | WalWriterMain
     1 | BgWriterMain
     1 | AutoVacuumMain
(7 rows)

> And the pg_buffercache extension:
> SELECT COUNT(nullif(isdirty,'f')) dirty, COUNT(1) AS all, COALESCE(c.relname, b.relfilenode::text)
> FROM pg_buffercache b LEFT JOIN pg_class c ON b.relfilenode=pg_relation_filenode(c.oid)
> GROUP BY 3 ORDER BY 1 DESC, 2 DESC LIMIT 9;

postgres=# SELECT COUNT(nullif(isdirty,'f')) dirty, COUNT(1) AS all, COALESCE(c.relname, b.relfilenode::text) FROM pg_buffercache b LEFT JOIN pg_class c ON b.relfilenode=pg_relation_filenode(c.oid) GROUP BY 3 ORDER BY 1 DESC, 2 DESC LIMIT 9;
 dirty |   all   |   coalesce
-------+---------+---------------
   189 |  237348 | emp_status
    97 | 1214949 | emp
    77 |     259 | public_group
    75 |     432 | public_gid
    74 |     233 | public_utpu
    26 |     115 | code_evd
    15 |      55 | group
    15 |      49 | output
    14 |      77 | output_status
(9 rows)

> > Hypervisor vendor: KVM
>
> Are KSM or THP enabled on the hypervisor?
>
> tail /sys/kernel/mm/ksm/run /sys/kernel/mm/transparent_hugepage/khugepaged/defrag /sys/kernel/mm/transparent_hugepage/enabled /sys/kernel/mm/transparent_hugepage/defrag

-bash-4.2$ tail /sys/kernel/mm/ksm/run /sys/kernel/mm/transparent_hugepage/khugepaged/defrag /sys/kernel/mm/transparent_hugepage/enabled /sys/kernel/mm/transparent_hugepage/defrag
==> /sys/kernel/mm/ksm/run <==
0

==> /sys/kernel/mm/transparent_hugepage/khugepaged/defrag <==
1

==> /sys/kernel/mm/transparent_hugepage/enabled <==
[always] madvise never

==> /sys/kernel/mm/transparent_hugepage/defrag <==
[always] madvise never

> --
> Justin

Regards,
Rambabu.
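For what it's worth, the plan above shows the sname and tp filters being applied during a parallel sequential scan of emp, since none of the existing indexes cover that combination of columns. A hypothetical sketch of an index matched to this query (the index name is made up here, and whether it is worth the build time and write overhead on a 500 GB table is a separate judgment call):

-- Hypothetical: multicolumn index matching the WHERE clause; adding empno as
-- a trailing column lets an index-only scan supply the DISTINCT empno values.
-- CONCURRENTLY builds without blocking writes, but takes longer and cannot
-- run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_emp_sname_tp_empno ON emp (sname, tp, empno);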
Re: HIGH IO and Less CPU utilization
On Wed, Mar 30, 2022 at 12:52:05AM +0530, Rambabu g wrote:
> > What indexes are defined on this table?
> > How large are they?
>
> There are three indexes defined on the table, each around 20 to 25 GB, and
> the indexes are created on

Did you mean to say something else after "on"?

Show the definition of the indexes from psql \d

> postgres=# explain select distinct empno from emp where sname='test' and
> tp='EMP NAME 1'

Is this the only query that's performing poorly?
You should send explain (analyze,buffers) for the problematic queries.

> > > Hypervisor vendor: KVM
> >
> > Are KSM or THP enabled on the hypervisor?
>
> No, the EC2 VM is dedicated to the postgres DB instance only.

Oh, so this is an EC2, and you cannot change the hypervisor itself.

> -bash-4.2$ tail /sys/kernel/mm/ksm/run
> /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
> /sys/kernel/mm/transparent_hugepage/enabled
> /sys/kernel/mm/transparent_hugepage/defrag
...
> ==> /sys/kernel/mm/transparent_hugepage/defrag <==
> [always] madvise never

I doubt it will help, but you could try disabling these.  It's a quick
experiment anyway.
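A minimal sketch of what that request amounts to (track_io_timing is a standard setting, superuser-only in v11; turning it on makes EXPLAIN report I/O timings alongside the buffer counts):

SET track_io_timing = on;   -- adds "I/O Timings" to the ANALYZE output
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT empno FROM emp WHERE sname = 'test' AND tp = 'EMP NAME 1';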
Re: HIGH IO and Less CPU utilization
Hi Justin,

Only one query is causing the issue. I am sharing the index definitions below; please have a look.

On Wed, 30 Mar 2022 at 01:09, Justin Pryzby wrote:
> On Wed, Mar 30, 2022 at 12:52:05AM +0530, Rambabu g wrote:
> > > What indexes are defined on this table?
> > > How large are they?
> >
> > There are three indexes defined on the table, each around 20 to 25 GB, and
> > the indexes are created on
>
> Did you mean to say something else after "on"?
>
> Show the definition of the indexes from psql \d

Index definitions:

postgres=# \d+ idx_empno
                        Index "l2.pd_activity_empi"
 Column |          Type           | Key? | Definition | Storage  | Stats target
--------+-------------------------+------+------------+----------+--------------
 empno  | character varying(2000) | yes  | empno      | extended |
btree, for table "emp"

postgres=# \d+ id_dt
                              Index "dt"
 Column |            Type             | Key? | Definition | Storage | Stats target
--------+-----------------------------+------+------------+---------+--------------
 dt     | timestamp without time zone | yes  | dt         | plain   |
btree, for table "emp"

postgres=# \d+ idx_tp
                            Index "idx_tp"
 Column |          Type           | Key? | Definition | Storage  | Stats target
--------+-------------------------+------+------------+----------+--------------
 tp     | character varying(2000) | yes  | tp         | extended |
btree, for table "emp"

The query has been running for 30 minutes.

> > postgres=# explain select distinct empno from emp where sname='test'
> > and tp='EMP NAME 1'
>
> Is this the only query that's performing poorly?
> You should send explain (analyze,buffers) for the problematic queries.

postgres=# select pid,(now()-query_start) as age,wait_event_type,wait_event,query from pg_stat_activity where state!='idle';
  pid  |       age       | wait_event_type |  wait_event   | query
-------+-----------------+-----------------+---------------+------------------------------------------------------------------------------------------------------------------
 32154 | 00:09:56.131136 | IPC             | ExecuteGather | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   847 | 00:09:56.131136 | IO              | DataFileRead  | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   848 | 00:09:56.131136 | IO              | DataFileRead  | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   849 | 00:09:56.131136 | IO              | DataFileRead  | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   850 | 00:09:56.131136 | IO              | DataFileRead  | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   851 | 00:09:56.131136 | IO              | DataFileRead  | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   852 | 00:09:56.131136 | IO              | DataFileRead  | explain analyze select distinct empno from emp where sname='test' and tp='EMP NAME 1'
   645 | 00:00:00        |                 |               | select pid,(now()-query_start) as age,wait_event_type,wait_event,query from pg_stat_activity where state!='idle'

postgres=# SELECT COUNT(nullif(isdirty,'f')) dirty, COUNT(1) AS all, COALESCE(c.relname, b.relfilenode::text) FROM pg_buffercache b LEFT JOIN pg_class c ON b.relfilenode=pg_relation_filenode(c.oid) GROUP BY 3 ORDER BY 1 DESC, 2 DESC LIMIT 9;
 dirty |   all   |     coalesce
-------+---------+-------------------
    32 |     136 | fn_deployment
    18 |     176 | fn_deployment_key
     8 |      12 | event_logs_pkey
     6 |     157 | event_logs
     1 |     355 | pg_class
     0 | 2890261 |
     0 |  252734 | utput_status
     0 |     378 | emp
     0 |     299 | 1249
(9 rows)

-bash-4.2$ sar
Linux 3.10.0-1160.59.1.el7.x86_64 (ip-10-54-145-108.ec2.internal)   03/30/2022   _x86_64_   (24 CPU)

12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM     all      1.19      0.00      0.82     36.17      0.00     61.81
12:20:01 AM     all      0.72      0.00      0.75     35.59      0.00     62.94
12:30:01 AM     all      0.74      0.00      0.77     35.04      0.00     63.46
12:40:02 AM     all      0.74      0.00      0.76     34.65      0.00     63.85
12:50:01 AM     all      0.77      0.00      0.78     33.36      0.00     65.09
01:00:01 AM     all      0.83      0.00      0.78     27.46      0.00     70.93
01:10:01 AM     all      0.85      0.00      0.78     30.11      0.00     68.26
01:20:01 AM     all      0.70      0.00      0.61     20.46      0.00     78.24
01:30:01 AM     all      0.15      0.00      0.06      0.02      0.00     99.77
01:40:01 AM     all
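The DataFileRead waits across all the parallel workers line up with the iostat numbers from the first message: roughly 42 MB/s of reads from nvme1n1, and at that rate a single pass over a ~500 GB heap takes on the order of three and a half hours. A minimal sketch for checking the actual on-disk sizes and that rough arithmetic (standard size functions; the 42 MB/s figure is only the rate observed above, not a measurement of this particular query):

-- Heap vs. index size for emp, plus a rough sequential-scan time estimate
-- at the ~42 MB/s read rate seen in iostat.
SELECT pg_size_pretty(pg_table_size('emp'))   AS heap_size,
       pg_size_pretty(pg_indexes_size('emp')) AS index_size,
       round(pg_table_size('emp') / (42 * 1024.0 * 1024) / 60) AS est_seq_scan_minutes;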