Re: [Gluster-users] How to find out what GlusterFS is doing

2020-11-05 Thread mabi
‐‐‐ Original Message ‐‐‐
On Thursday, November 5, 2020 3:28 PM, Yaniv Kaul  wrote:

> Waiting for IO, just like the rest of those in D state.
> You may have a slow storage subsystem. How many cores do you have, btw?
> Y.

Strange because "iostat -xtcm 5" does not show that the disks are 100% used, 
I've pasted below a sample output of "iostat -xtcm".

Both nodes have 1 CPU Intel Xeon E5-2620 v3 @ 2.40GHz which as 12 cores.

11/05/2020 03:37:25 PM
avg-cpu: %user %nice %system %iowait %steal %idle
0.93 0.00 84.81 0.03 0.00 14.22

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.60 23.80 0.00 0.20 17.31 0.03 1.31 45.33 0.20 1.25 3.04
sdc 0.00 0.00 0.60 24.80 0.00 0.24 19.15 0.03 1.04 40.00 0.10 1.01 2.56
sdg 0.00 0.00 0.60 23.00 0.00 0.22 19.05 0.03 1.39 45.33 0.24 1.25 2.96
sdf 0.00 0.00 0.60 25.00 0.00 0.23 18.25 0.03 1.16 41.33 0.19 1.06 2.72
sdd 0.00 0.00 0.60 24.60 0.00 0.19 15.43 0.02 0.86 32.00 0.10 0.83 2.08
sdh 0.00 0.00 0.40 25.00 0.00 0.22 17.64 0.03 1.10 58.00 0.19 1.01 2.56
sdi 0.00 0.00 0.40 25.80 0.00 0.23 17.71 0.03 1.01 60.00 0.09 0.98 2.56
sdj 0.00 0.00 0.60 24.00 0.00 0.19 15.67 0.02 0.91 32.00 0.13 0.85 2.08
sde 0.00 0.00 0.60 26.60 0.00 0.20 15.12 0.03 1.00 36.00 0.21 0.91 2.48
sdk 0.00 0.00 0.60 25.20 0.00 0.20 16.12 0.02 0.78 29.33 0.10 0.74 1.92
sdl 0.00 0.00 0.60 25.00 0.00 0.22 17.56 0.02 0.94 37.33 0.06 0.94 2.40
sdb 0.00 0.00 0.60 15.40 0.00 0.21 27.80 0.03 2.15 42.67 0.57 1.95 3.12
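
The avg-cpu line above is striking: roughly 85% of CPU time is spent in %system,
while %iowait is near zero. One rough way to attribute that system time to specific
threads and kernel functions (a sketch only, assuming the sysstat and perf packages
are installed, and with <BRICK_PID> standing for the glusterfsd PID reported by
"gluster volume status") would be:

pidstat -t -u -p <BRICK_PID> 5 3    # per-thread CPU usage, split into %usr and %system
perf top -p <BRICK_PID>             # live view of the functions consuming that CPU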





Re: [Gluster-users] How to find out what GlusterFS is doing

2020-11-05 Thread mabi
Below is the output of running "top -bHd d" on one of the nodes; maybe that can
help show what the glusterfsd process is doing?

  PID USER      PR  NI    VIRT    RES   SHR S %CPU %MEM     TIME+ COMMAND
 4375 root  20   0 2856784 120492   8360 D 61.1  0.4 117:09.29 glfs_iotwr001
 4385 root  20   0 2856784 120492   8360 R 61.1  0.4 117:12.92 glfs_iotwr003
 4387 root  20   0 2856784 120492   8360 R 61.1  0.4 117:32.19 glfs_iotwr005
 4388 root  20   0 2856784 120492   8360 R 61.1  0.4 117:28.87 glfs_iotwr006
 4391 root  20   0 2856784 120492   8360 D 61.1  0.4 117:20.71 glfs_iotwr008
 4395 root  20   0 2856784 120492   8360 D 61.1  0.4 117:17.22 glfs_iotwr009
 4405 root  20   0 2856784 120492   8360 R 61.1  0.4 117:19.52 glfs_iotwr00d
 4406 root  20   0 2856784 120492   8360 R 61.1  0.4 117:29.51 glfs_iotwr00e
 4366 root  20   0 2856784 120492   8360 D 55.6  0.4 117:27.58 glfs_iotwr000
 4386 root  20   0 2856784 120492   8360 D 55.6  0.4 117:22.77 glfs_iotwr004
 4390 root  20   0 2856784 120492   8360 D 55.6  0.4 117:26.49 glfs_iotwr007
 4396 root  20   0 2856784 120492   8360 R 55.6  0.4 117:23.68 glfs_iotwr00a
 4376 root  20   0 2856784 120492   8360 D 50.0  0.4 117:36.17 glfs_iotwr002
 4397 root  20   0 2856784 120492   8360 D 50.0  0.4 117:11.09 glfs_iotwr00b
 4403 root  20   0 2856784 120492   8360 R 50.0  0.4 117:26.34 glfs_iotwr00c
 4408 root  20   0 2856784 120492   8360 D 50.0  0.4 117:27.47 glfs_iotwr00f
 9814 root  20   0 2043684  75208   8424 D 22.2  0.2  50:15.20 glfs_iotwr003
28131 root  20   0 2043684  75208   8424 R 22.2  0.2  50:07.46 glfs_iotwr004
 2208 root  20   0 2043684  75208   8424 R 22.2  0.2  49:32.70 glfs_iotwr008
 2372 root  20   0 2043684  75208   8424 R 22.2  0.2  49:52.60 glfs_iotwr009
 2375 root  20   0 2043684  75208   8424 D 22.2  0.2  49:54.08 glfs_iotwr00c
  767 root  39  19   0  0  0 R 16.7  0.0  67:50.83 dbuf_evict
 4132 onadmin   20   0   45292   4184   3176 R 16.7  0.0   0:00.04 top
28484 root  20   0 2043684  75208   8424 R 11.1  0.2  49:41.34 glfs_iotwr005
 2376 root  20   0 2043684  75208   8424 R 11.1  0.2  49:49.49 glfs_iotwr00d
 2719 root  20   0 2043684  75208   8424 R 11.1  0.2  49:58.61 glfs_iotwr00e
 4384 root  20   0 2856784 120492   8360 S  5.6  0.4   4:01.27 glfs_rpcrqhnd
 3842 root  20   0 2043684  75208   8424 S  5.6  0.2   0:30.12 glfs_epoll001
1 root  20   0   57696   7340   5248 S  0.0  0.0   0:03.59 systemd
2 root  20   0   0  0  0 S  0.0  0.0   0:09.57 kthreadd
3 root  20   0   0  0  0 S  0.0  0.0   0:00.16 ksoftirqd/0
5 root   0 -20   0  0  0 S  0.0  0.0   0:00.00 kworker/0:0H
7 root  20   0   0  0  0 S  0.0  0.0   0:07.36 rcu_sched
8 root  20   0   0  0  0 S  0.0  0.0   0:00.00 rcu_bh
9 root  rt   0   0  0  0 S  0.0  0.0   0:00.03 migration/0
   10 root   0 -20   0  0  0 S  0.0  0.0   0:00.00 lru-add-drain
   11 root  rt   0   0  0  0 S  0.0  0.0   0:00.01 watchdog/0
   12 root  20   0   0  0  0 S  0.0  0.0   0:00.00 cpuhp/0
   13 root  20   0   0  0  0 S  0.0  0.0   0:00.00 cpuhp/1

Any clues anyone?

The load is really high now, around 20, on both nodes...
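
Most of the busy glfs_iotwr threads above are in D or R state. A rough way to see
which system calls they spend their time in, and what the D-state ones are blocked
on (a sketch only, assuming root access and that <BRICK_PID> is the glusterfsd
process owning those threads; stop strace with Ctrl-C after ~30 seconds to get the
summary):

strace -c -f -p <BRICK_PID>            # per-syscall time summary across the brick's threads
cat /proc/<BRICK_PID>/task/*/stack     # kernel stacks of all threads (shows what D-state threads wait on)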


‐‐‐ Original Message ‐‐‐
On Thursday, November 5, 2020 11:50 AM, mabi  wrote:

> Hello,
>
> I have a 3-node replica (including an arbiter) GlusterFS 7.8 setup with 3
> volumes, and the two data nodes (not the arbiter) seem to have a high load
> because the glusterfsd brick process is taking all CPU resources (12 cores).
>
> Checking these two servers with iostat shows that the disks are not very busy
> and that they are mostly doing write activity. There is not much activity on
> the FUSE clients either, so I was wondering how to find out or explain why
> GlusterFS is currently generating such a high load on these two servers (the
> arbiter does not show any high load). There are no files currently healing
> either. The busy volume is the only one with the quota enabled, if that is a
> hint. So does anyone know how to see why GlusterFS is so busy on a specific
> volume?
>
> Here is a sample "vmstat 60" of one of the nodes:
>
> onadmin@gfs1b:~$ vmstat 60
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> r b swpd free buff cache si so bi bo in cs us sy id wa st
> 9 2 0 22296776 32004 260284 0 0 33 301 153 39 2 60 36 2 0
> 13 0 0 22244540 32048 260456 0 0 343 2798 10898 367652 2 80 16 1 0
> 18 0 0 22215740 32056 260672 0 0 308 2524 9892 334537 2 83 14 1 0
> 18 0 0 22179348 32084 260828 0 0 169 2038 8703 250351 1 88 10 0 0
>
> I already tried rebooting but that did not help and there is nothing special 
> in the log files either.
>
> Best regards,
> Mabi







[Gluster-users] How to find out what GlusterFS is doing

2020-11-05 Thread mabi
Hello,

I have a 3-node replica (including an arbiter) GlusterFS 7.8 setup with 3 volumes,
and the two data nodes (not the arbiter) seem to have a high load because the
glusterfsd brick process is taking all CPU resources (12 cores).

Checking these two servers with iostat shows that the disks are not very busy and
that they are mostly doing write activity. There is not much activity on the FUSE
clients either, so I was wondering how to find out or explain why GlusterFS is
currently generating such a high load on these two servers (the arbiter does not
show any high load). There are no files currently healing either. The busy volume
is the only one with the quota enabled, if that is a hint. So does anyone know how
to see why GlusterFS is so busy on a specific volume?
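
One built-in way to see which file operations a specific volume is busy with is
the volume profiler (a sketch; <VOLNAME> stands for whichever of the three volumes
is suspect, e.g. the one with quota enabled):

gluster volume profile <VOLNAME> start
gluster volume profile <VOLNAME> info      # per-brick latency and call counts for each FOP
gluster volume top <VOLNAME> write         # files with the most write calls on each brick
gluster volume profile <VOLNAME> stop

Note that profiling adds some overhead, so it is usually stopped again once the
data has been collected.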

Here is a sample "vmstat 60" of one of the nodes:

onadmin@gfs1b:~$ vmstat 60
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd     free   buff  cache   si   so    bi    bo    in     cs us sy id wa st
 9  2      0 22296776  32004 260284    0    0    33   301   153     39  2 60 36  2  0
13  0      0 22244540  32048 260456    0    0   343  2798 10898 367652  2 80 16  1  0
18  0      0 22215740  32056 260672    0    0   308  2524  9892 334537  2 83 14  1  0
18  0      0 22179348  32084 260828    0    0   169  2038  8703 250351  1 88 10  0  0
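
This sample shows system time above 80% and several hundred thousand context
switches per second. A rough way to attribute those switches to individual brick
threads (a sketch, assuming the sysstat package is installed) would be:

pidstat -w -t -p $(pgrep -d, -x glusterfsd) 5 3    # voluntary/involuntary context switches per thread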

I already tried rebooting but that did not help and there is nothing special in 
the log files either.

Best regards,
Mabi






Re: [Gluster-users] How to find out what GlusterFS is doing

2020-11-05 Thread Yaniv Kaul
On Thu, Nov 5, 2020 at 4:18 PM mabi  wrote:

> Below is the output of running "top -bHd d" on one of the nodes; maybe that
> can help show what the glusterfsd process is doing?
>
>   PID USER      PR  NI    VIRT    RES   SHR S %CPU %MEM     TIME+ COMMAND
>  4375 root  20   0 2856784 120492   8360 D 61.1  0.4 117:09.29
> glfs_iotwr001
>

Waiting for IO, just like the rest of those in D state.
You may have a slow storage subsystem. How many cores do you have, btw?
Y.
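
If a slow storage subsystem is suspected, it may help to measure the brick disks
directly, outside of Gluster (a rough sketch; assumes the ioping package is
installed and <BRICK_PATH> is the mount point of one brick):

ioping -c 20 <BRICK_PATH>    # per-request latency on the brick filesystem
iostat -xtm 5 3              # watch await and %util while the bricks are under load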

 4385 root  20   0 2856784 120492   8360 R 61.1  0.4 117:12.92
> glfs_iotwr003
>  4387 root  20   0 2856784 120492   8360 R 61.1  0.4 117:32.19
> glfs_iotwr005
>  4388 root  20   0 2856784 120492   8360 R 61.1  0.4 117:28.87
> glfs_iotwr006
>  4391 root  20   0 2856784 120492   8360 D 61.1  0.4 117:20.71
> glfs_iotwr008
>  4395 root  20   0 2856784 120492   8360 D 61.1  0.4 117:17.22
> glfs_iotwr009
>  4405 root  20   0 2856784 120492   8360 R 61.1  0.4 117:19.52
> glfs_iotwr00d
>  4406 root  20   0 2856784 120492   8360 R 61.1  0.4 117:29.51
> glfs_iotwr00e
>  4366 root  20   0 2856784 120492   8360 D 55.6  0.4 117:27.58
> glfs_iotwr000
>  4386 root  20   0 2856784 120492   8360 D 55.6  0.4 117:22.77
> glfs_iotwr004
>  4390 root  20   0 2856784 120492   8360 D 55.6  0.4 117:26.49
> glfs_iotwr007
>  4396 root  20   0 2856784 120492   8360 R 55.6  0.4 117:23.68
> glfs_iotwr00a
>  4376 root  20   0 2856784 120492   8360 D 50.0  0.4 117:36.17
> glfs_iotwr002
>  4397 root  20   0 2856784 120492   8360 D 50.0  0.4 117:11.09
> glfs_iotwr00b
>  4403 root  20   0 2856784 120492   8360 R 50.0  0.4 117:26.34
> glfs_iotwr00c
>  4408 root  20   0 2856784 120492   8360 D 50.0  0.4 117:27.47
> glfs_iotwr00f
>  9814 root  20   0 2043684  75208   8424 D 22.2  0.2  50:15.20
> glfs_iotwr003
> 28131 root  20   0 2043684  75208   8424 R 22.2  0.2  50:07.46
> glfs_iotwr004
>  2208 root  20   0 2043684  75208   8424 R 22.2  0.2  49:32.70
> glfs_iotwr008
>  2372 root  20   0 2043684  75208   8424 R 22.2  0.2  49:52.60
> glfs_iotwr009
>  2375 root  20   0 2043684  75208   8424 D 22.2  0.2  49:54.08
> glfs_iotwr00c
>   767 root  39  19   0  0  0 R 16.7  0.0  67:50.83
> dbuf_evict
>  4132 onadmin   20   0   45292   4184   3176 R 16.7  0.0   0:00.04 top
> 28484 root  20   0 2043684  75208   8424 R 11.1  0.2  49:41.34
> glfs_iotwr005
>  2376 root  20   0 2043684  75208   8424 R 11.1  0.2  49:49.49
> glfs_iotwr00d
>  2719 root  20   0 2043684  75208   8424 R 11.1  0.2  49:58.61
> glfs_iotwr00e
>  4384 root  20   0 2856784 120492   8360 S  5.6  0.4   4:01.27
> glfs_rpcrqhnd
>  3842 root  20   0 2043684  75208   8424 S  5.6  0.2   0:30.12
> glfs_epoll001
> 1 root  20   0   57696   7340   5248 S  0.0  0.0   0:03.59 systemd
> 2 root  20   0   0  0  0 S  0.0  0.0   0:09.57 kthreadd
> 3 root  20   0   0  0  0 S  0.0  0.0   0:00.16
> ksoftirqd/0
> 5 root   0 -20   0  0  0 S  0.0  0.0   0:00.00
> kworker/0:0H
> 7 root  20   0   0  0  0 S  0.0  0.0   0:07.36
> rcu_sched
> 8 root  20   0   0  0  0 S  0.0  0.0   0:00.00 rcu_bh
> 9 root  rt   0   0  0  0 S  0.0  0.0   0:00.03
> migration/0
>10 root   0 -20   0  0  0 S  0.0  0.0   0:00.00
> lru-add-drain
>11 root  rt   0   0  0  0 S  0.0  0.0   0:00.01
> watchdog/0
>12 root  20   0   0  0  0 S  0.0  0.0   0:00.00 cpuhp/0
>13 root  20   0   0  0  0 S  0.0  0.0   0:00.00 cpuhp/1
>
> Any clues anyone?
>
> The load is really high now, around 20, on both nodes...
>
>
> ‐‐‐ Original Message ‐‐‐
> On Thursday, November 5, 2020 11:50 AM, mabi  wrote:
>
> > Hello,
> >
> > I have a 3-node replica (including an arbiter) GlusterFS 7.8 setup with 3
> > volumes, and the two data nodes (not the arbiter) seem to have a high load
> > because the glusterfsd brick process is taking all CPU resources (12 cores).
> >
> > Checking these two servers with iostat shows that the disks are not very
> > busy and that they are mostly doing write activity. There is not much
> > activity on the FUSE clients either, so I was wondering how to find out or
> > explain why GlusterFS is currently generating such a high load on these two
> > servers (the arbiter does not show any high load). There are no files
> > currently healing either. The busy volume is the only one with the quota
> > enabled, if that is a hint. So does anyone know how to see why GlusterFS is
> > so busy on a specific volume?
> >
> > Here is a sample "vmstat 60" of one of the nodes:
> >
> > onadmin@gfs1b:~$ vmstat 60
> > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> > r b swpd free buff cache si so bi bo in cs us sy id wa st
> > 9 2 0 22296776 32004 260284 0 0 33 301 153 39 2 60 36 2 0
> > 13 0 0 22244540 32048 260456 0 0 343 2798 10898 367652 2 80 16 1 0