Re: [Gluster-users] glusterfsd process thrashing CPU
Actually, I had the same experience when I was using 3.4.2: https://www.mail-archive.com/gluster-users@gluster.org/msg15850.html

If I understand correctly, I should be using FULL heal rather than DIFF for large VM images? I was not sure whether throttling was working in 3.4.2 or not. When I attempted to recover an entire volume filled with VM images ranging in size from 10G to 500G, I saw it recovering 2 images at a time rather than all at once.

Thanks,
Adrian
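For anyone checking their own volumes: seeing which algorithm is in effect, and switching to full, are both one-liners. A minimal sketch, using the datastore1 volume from this thread as a stand-in; note that 'gluster volume info' only lists options that have been explicitly set, so no output means the default is in effect:

# Show the option if it has been explicitly reconfigured on this volume
gluster volume info datastore1 | grep cluster.data-self-heal-algorithm
# Make new self-heals copy whole files instead of checksumming for diffs
gluster volume set datastore1 cluster.data-self-heal-algorithm full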
Re: [Gluster-users] glusterfsd process thrashing CPU
On 18 November 2014 17:46, Franco Broi <franco.b...@iongeo.com> wrote:
> Try strace -Ff -e file -p 'glusterfsd pid'

Thanks, attached.

Process 27115 attached with 25 threads - interrupt to quit
[pid 27122] stat("/mnt/gluster-brick1/datastore", {st_mode=S_IFDIR|0755, st_size=4, ...}) = 0
[pid 11840] lstat("/mnt/gluster-brick1/datastore/", {st_mode=S_IFDIR|0755, st_size=4, ...}) = 0
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_default", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht" <unfinished ...>
[pid 29198] lstat("/mnt/gluster-brick1/datastore/", {st_mode=S_IFDIR|0755, st_size=4, ...}) = 0
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_default", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht" <unfinished ...>
[pid 29197] lstat("/mnt/gluster-brick1/datastore/", {st_mode=S_IFDIR|0755, st_size=4, ...}) = 0
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_default", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht" <unfinished ...>
[pid 11840] <... lgetxattr resumed> , 0x0, 0) = 16
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht", "\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff", 16) = 16
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "missing-gfid-ESTALE", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.afr.datastore1-client-0", 0x0, 0) = -1 ENODATA (No data available)
[pid 11840] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.afr.datastore1-client-1", 0x0, 0) = -1 ENODATA (No data available)
[pid 11840] llistxattr("/mnt/gluster-brick1/datastore/", (nil), 0) = 63
[pid 11840] llistxattr("/mnt/gluster-brick1/datastore/", 0x7feae3cfda10, 63) = 63
[pid 29198] <... lgetxattr resumed> , 0x0, 0) = 16
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht", "\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff", 16) = 16
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "missing-gfid-ESTALE", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.afr.datastore1-client-0", 0x0, 0) = -1 ENODATA (No data available)
[pid 29198] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.afr.datastore1-client-1" <unfinished ...>
[pid 29197] <... lgetxattr resumed> , 0x0, 0) = 16
[pid 29198] <... lgetxattr resumed> , 0x0, 0) = -1 ENODATA (No data available)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.glusterfs.dht" <unfinished ...>
[pid 29198] llistxattr("/mnt/gluster-brick1/datastore/", (nil), 0) = 63
[pid 29198] llistxattr("/mnt/gluster-brick1/datastore/" <unfinished ...>
[pid 29197] <... lgetxattr resumed> , "\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff", 16) = 16
[pid 29198] <... llistxattr resumed> , 0x7feae3ffea10, 63) = 63
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "missing-gfid-ESTALE", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.afr.datastore1-client-0", 0x0, 0) = -1 ENODATA (No data available)
[pid 29197] lgetxattr("/mnt/gluster-brick1/datastore/", "trusted.afr.datastore1-client-1", 0x0, 0) = -1 ENODATA (No data available)
[pid 29197] llistxattr("/mnt/gluster-brick1/datastore/", (nil), 0) = 63
[pid 29197] llistxattr("/mnt/gluster-brick1/datastore/", 0x7feaf0487a10, 63) = 63
[pid 11846] lstat("/mnt/gluster-brick1/datastore/images", {st_mode=S_IFDIR|0755, st_size=27, ...}) = 0
[pid 11846] lgetxattr("/mnt/gluster-brick1/datastore/images", "trusted.gfid" <unfinished ...>
[pid 11844] lstat("/mnt/gluster-brick1/datastore/images", {st_mode=S_IFDIR|0755, st_size=27, ...}) = 0
[pid 11844] lgetxattr("/mnt/gluster-brick1/datastore/images", "trusted.gfid" <unfinished ...>
[pid 11845] lstat("/mnt/gluster-brick1/datastore/images", {st_mode=S_IFDIR|0755, st_size=27, ...}) = 0
[pid 11845] lgetxattr("/mnt/gluster-brick1/datastore/images", "trusted.gfid" <unfinished ...>
[pid 11844] <... lgetxattr resumed> , "\xbe\x7fIlH\xb0C\xbd\xaaA=BJ6\xca\xb1", 16) = 16
[pid 11846] <... lgetxattr resumed> , "\xbe\x7fIlH\xb0C\xbd\xaaA=BJ6\xca\xb1", 16) = 16
[pid 11845] <... lgetxattr resumed> , "\xbe\x7fIlH\xb0C\xbd\xaaA=BJ6\xca\xb1", 16) = 16
[pid 11846] lgetxattr("/mnt/gluster-brick1/datastore/images", "system.posix_acl_default" <unfinished ...>
[pid 11845] lgetxattr("/mnt/gluster-brick1/datastore/images",
Re: [Gluster-users] glusterfsd process thrashing CPU
Can't see how any of that could account for 1000% CPU unless it's just stuck in a loop.

On Tue, 2014-11-18 at 18:00 +1000, Lindsay Mathieson wrote:
> On 18 November 2014 17:46, Franco Broi <franco.b...@iongeo.com> wrote:
>> Try strace -Ff -e file -p 'glusterfsd pid'
> Thanks, attached.
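If it is a spin rather than I/O, the per-thread view should make it obvious. Using the pid from the trace above:

top -H -p 27115   # per-thread CPU; a spinning thread sits near 100% on its own
# 'perf top -p 27115' would name the hot functions, if perf is installed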
Re: [Gluster-users] glusterfsd process thrashing CPU
On 18 November 2014 18:05, Franco Broi <franco.b...@iongeo.com> wrote:
> Can't see how any of that could account for 1000% CPU unless it's just stuck in a loop.

Currently it's still varying between 400% and 950%.

Can glusterfsd be killed without affecting the libgfapi clients (the KVMs)?
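In a replica 2 volume the clients should ride out one brick going away; that is the same event as the reboot that kicked all this off. A cautious sketch, not advice specific to this cluster, and note the brick will queue more heals when it comes back:

ps aux | grep '[g]lusterfsd.*datastore1'   # find the brick daemon's pid
kill <pid>                                 # clients fail over to the other replica
gluster volume start datastore1 force      # later: restart only the missing brick process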
Re: [Gluster-users] glusterfsd process thrashing CPU
On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:
> On 18 November 2014 17:40, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
>> Sorry, didn't see this one. I think this is happening because of the 'diff' based self-heal, which does full-file checksums; I believe that is the root cause. Could you execute 'gluster volume set <volname> cluster.data-self-heal-algorithm full' to prevent this issue in future? The option will only take effect for new self-heals triggered after the command is executed; ongoing ones will still use the old self-heal mode.
> Thanks, makes sense. However, given the files are tens of GB in size, won't it thrash my network?

Yes, you are right. I wonder why thrashing of the network has never been reported till now. +Joejulian, who also uses VMs on gluster (for 5 years now?). He uses this option of full self-heal (that's what I saw in his bug reports). I still need to think about how best to solve this problem.

Let me tell you a bit more about this issue: there are two processes which heal the VM images: 1) the self-heal daemon, and 2) the mount process. The self-heal daemon heals one VM image at a time, but the mount process triggers self-heals for all the opened files (a VM image is nothing but an opened file from the filesystem's perspective) when a brick goes down and comes back up. So we need to come up with a scheme to throttle self-heals on the mount point to prevent this issue. I will update you as soon as I come up with a fix. This should not be hard to do; I need some time to choose the best approach. Thanks a lot for bringing up this issue.

Pranith
Re: [Gluster-users] glusterfsd process thrashing CPU
On Tue, 18 Nov 2014 02:36:19 PM Pranith Kumar Karampuri wrote:
> On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:
>> On 18 November 2014 17:40, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
>> However, given the files are tens of GB in size, won't it thrash my network?
> Yes, you are right. I wonder why thrashing of the network has never been reported till now.

Not sure if you are being sarcastic or not :) But from what I've observed, sync operations seem to self-throttle; I've not seen them use more than 50% of bandwidth, and given most setups have a dedicated network for the servers, maybe they just don't notice if it takes a while?

> I still need to think about how best to solve this problem.

Set up an array of queues for self-healing, sorted by size maybe?

> Let me tell you a bit more about this issue: there are two processes which heal the VM images: 1) the self-heal daemon, and 2) the mount process. The self-heal daemon heals one VM image at a time, but the mount process triggers self-heals for all the opened files (a VM image is nothing but an opened file from the filesystem's perspective) when a brick goes down and comes back up.

Thanks, interesting to know.

> So we need to come up with a scheme to throttle self-heals on the mount point to prevent this issue. I will update you as soon as I come up with a fix. This should not be hard to do; I need some time to choose the best approach. Thanks a lot for bringing up this issue.

Thank you for looking at it!

Cheers,
--
Lindsay
Re: [Gluster-users] glusterfsd process thrashing CPU
On 11/18/2014 04:14 PM, Lindsay Mathieson wrote:
> On Tue, 18 Nov 2014 02:36:19 PM Pranith Kumar Karampuri wrote:
>> On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:
>>> On 18 November 2014 17:40, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
>>> However, given the files are tens of GB in size, won't it thrash my network?
>> Yes, you are right. I wonder why thrashing of the network has never been reported till now.
> Not sure if you are being sarcastic or not :) But from what I've observed, sync operations seem to self-throttle; I've not seen them use more than 50% of bandwidth, and given most setups have a dedicated network for the servers, maybe they just don't notice if it takes a while?

No, I was not being sarcastic :-). I am genuinely wondering why it has not been reported till now. Maybe Joe will have more inputs there; that is the reason I CCed him.

>> I still need to think about how best to solve this problem.
> Set up an array of queues for self-healing, sorted by size maybe?
>> Let me tell you a bit more about this issue: there are two processes which heal the VM images: 1) the self-heal daemon, and 2) the mount process. The self-heal daemon heals one VM image at a time, but the mount process triggers self-heals for all the opened files (a VM image is nothing but an opened file from the filesystem's perspective) when a brick goes down and comes back up.
> Thanks, interesting to know.
>> So we need to come up with a scheme to throttle self-heals on the mount point to prevent this issue. I will update you as soon as I come up with a fix. This should not be hard to do; I need some time to choose the best approach. Thanks a lot for bringing up this issue.
> Thank you for looking at it!
> Cheers,
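One existing knob is relevant to the throttling idea above: AFR clients already cap how many background self-heals they run concurrently. Whether it covers every heal the mount process triggers in 3.5.x is unverified, so treat this as a sketch rather than a fix; the option name is real, the value here is only an example:

# Limit the number of self-heals a client mount runs in the background at once
gluster volume set datastore1 cluster.background-self-heal-count 2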
Re: [Gluster-users] glusterfsd process thrashing CPU
ps. There is very little network traffic happening.

--
Lindsay
Re: [Gluster-users] glusterfsd process thrashing CPU
And it's happening on both nodes now; they have become nearly unusable.

On 18 November 2014 17:03, Lindsay Mathieson <lindsay.mathie...@gmail.com> wrote:
> ps. There is very little network traffic happening.
> --
> Lindsay

--
Lindsay
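For what it's worth, pegged CPU plus a quiet network is consistent with the diff self-heal described later in the thread: it reads and checksums both copies locally and only transfers blocks that differ. A quick check that the boxes are disk- and CPU-bound rather than network-bound, assuming the sysstat tools are installed:

iostat -x 2    # expect heavy reads on the brick disks while checksumming
sar -n DEV 2   # per-NIC throughput; expect it near idle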
Re: [Gluster-users] glusterfsd process thrashing CPU
glusterfsd is the filesystem daemon. You could try strace'ing it to see what it's doing.

On Tue, 2014-11-18 at 17:09 +1000, Lindsay Mathieson wrote:
> And it's happening on both nodes now; they have become nearly unusable.
>
> On 18 November 2014 17:03, Lindsay Mathieson <lindsay.mathie...@gmail.com> wrote:
>> ps. There is very little network traffic happening.
> --
> Lindsay
Re: [Gluster-users] glusterfsd process thrashing CPU
On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
> 2-node replicate setup. Everything had been stable for days until I had occasion to reboot one of the nodes. Since then (the past hour) glusterfsd has been pegging the CPU(s), with utilization ranging from 1% to 1000%! On average it's around 500%.
>
> This is a VM server, so there are only 27 VM images for a total of 800GB. It's an Intel E5-2620 (12 cores) with 32GB ECC RAM.
>
> - What does glusterfsd do?
> - What can I do to fix this?
>
> thanks,

Which version of glusterfs are you using? Do you have directories with lots of files?

Pranith
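The version question, and a quick view of the volume layout, can be read straight off the CLI; both are stock commands with nothing volume-specific assumed:

glusterfs --version   # package/daemon version
gluster volume info   # volume type, brick list, and any reconfigured options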
Re: [Gluster-users] glusterfsd process thrashing CPU
On 11/18/2014 01:05 PM, Pranith Kumar Karampuri wrote:
> On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
>> 2-node replicate setup. Everything had been stable for days until I had occasion to reboot one of the nodes. Since then (the past hour) glusterfsd has been pegging the CPU(s), with utilization ranging from 1% to 1000%! On average it's around 500%.
>> This is a VM server, so there are only 27 VM images for a total of 800GB. It's an Intel E5-2620 (12 cores) with 32GB ECC RAM.

Sorry, didn't see this one. I think this is happening because of the 'diff' based self-heal, which does full-file checksums; I believe that is the root cause. Could you execute 'gluster volume set <volname> cluster.data-self-heal-algorithm full' to prevent this issue in future? The option will only take effect for new self-heals triggered after the command is executed; ongoing ones will still use the old self-heal mode.

Pranith

>> - What does glusterfsd do?
>> - What can I do to fix this?
>> thanks,
> Which version of glusterfs are you using? Do you have directories with lots of files?
> Pranith
Re: [Gluster-users] glusterfsd process thrashing CPU
Gluster 3.5.2. Very few files; it's purely a VM image host: 27 files, 10-60GB in size.

It seems to be undergoing a heal:

root@vnb:~# gluster volume heal datastore1 info
Brick vnb:/mnt/gluster-brick1/datastore/
/images/108/vm-108-disk-1.qcow2 - Possibly undergoing heal
/images/105/vm-105-disk-1.qcow2 - Possibly undergoing heal
/images/100/vm-100-disk-1.qcow2 - Possibly undergoing heal
/images/401/vm-401-disk-1.qcow2 - Possibly undergoing heal
/images/201/vm-201-disk-1.qcow2 - Possibly undergoing heal
/images/204/vm-204-disk-1.qcow2 - Possibly undergoing heal
/images/102/vm-102-disk-1.qcow2 - Possibly undergoing heal
/images/501/vm-501-disk-1.qcow2 - Possibly undergoing heal
/images/203/vm-203-disk-1.qcow2 - Possibly undergoing heal
/images/106/vm-106-disk-1.qcow2 - Possibly undergoing heal
/images/400/vm-400-disk-1.qcow2 - Possibly undergoing heal
/images/107/vm-107-disk-1.qcow2 - Possibly undergoing heal
Number of entries: 12

Brick vng:/mnt/gluster-brick1/datastore/
<gfid:59489c91-2df1-4c7d-8dda-265020098b67> - Possibly undergoing heal
<gfid:ebb7692f-39ee-4b9b-b48a-1241c47c13bf> - Possibly undergoing heal
<gfid:8759bea0-ab64-4f7b-87b3-69217ebfee55> - Possibly undergoing heal
Number of entries: 3

What would the gfid entries be?

On 18 November 2014 17:35, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
> On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
>> 2-node replicate setup. Everything had been stable for days until I had occasion to reboot one of the nodes. Since then (the past hour) glusterfsd has been pegging the CPU(s), with utilization ranging from 1% to 1000%! On average it's around 500%.
>> This is a VM server, so there are only 27 VM images for a total of 800GB. It's an Intel E5-2620 (12 cores) with 32GB ECC RAM.
>> - What does glusterfsd do?
>> - What can I do to fix this?
>> thanks,
> Which version of glusterfs are you using? Do you have directories with lots of files?
> Pranith

--
Lindsay
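On the gfid entries: the self-heal daemon found those via the brick's index rather than by path. Every regular file on a brick has a hard link under .glusterfs/<first two hex chars>/<next two>/<full gfid>, so the path can be recovered by matching inodes. An untested sketch, using the first gfid and the brick path above:

BRICK=/mnt/gluster-brick1/datastore
GFID=59489c91-2df1-4c7d-8dda-265020098b67
# The .glusterfs entry and the real file share an inode; skip .glusterfs itself
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o \
  -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -print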
Re: [Gluster-users] glusterfsd process thrashing CPU
Sorry, meant to send to the list. strace attached.

On 18 November 2014 17:35, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
> On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
>> 2-node replicate setup. Everything had been stable for days until I had occasion to reboot one of the nodes. Since then (the past hour) glusterfsd has been pegging the CPU(s), with utilization ranging from 1% to 1000%! On average it's around 500%.
>> This is a VM server, so there are only 27 VM images for a total of 800GB. It's an Intel E5-2620 (12 cores) with 32GB ECC RAM.
>> - What does glusterfsd do?
>> - What can I do to fix this?
>> thanks,
> Which version of glusterfs are you using? Do you have directories with lots of files?
> Pranith

--
Lindsay

execve("/usr/sbin/glusterfsd", ["glusterfsd"], [/* 15 vars */]) = 0
brk(0) = 0x1e76000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f365dc72000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=32190, ...}) = 0
mmap(NULL, 32190, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f365dc6a000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\340\r\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14768, ...}) = 0
mmap(NULL, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f365d851000
mprotect(0x7f365d853000, 2097152, PROT_NONE) = 0
mmap(0x7f365da53000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f365da53000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libutil.so.1", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\20\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=10640, ...}) = 0
mmap(NULL, 2105608, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f365d64e000
mprotect(0x7f365d65, 2093056, PROT_NONE) = 0
mmap(0x7f365d84f000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7f365d84f000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\360\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=530736, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f365dc69000
mmap(NULL, 2625768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f365d3cc000
mprotect(0x7f365d44d000, 2093056, PROT_NONE) = 0
mmap(0x7f365d64c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8) = 0x7f365d64c000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/libpython2.7.so.1.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0`\217\4\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=3073448, ...}) = 0
mmap(NULL, 5242520, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f365cecc000
mprotect(0x7f365d15, 2093056, PROT_NONE) = 0
mmap(0x7f365d34f000, 438272, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x283000) = 0x7f365d34f000
mmap(0x7f365d3ba000, 73368, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f365d3ba000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libglusterfs.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\340Q\1\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=557592, ...}) = 0
mmap(NULL, 2666280, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f365cc41000
mprotect(0x7f365ccc7000, 2097152, PROT_NONE) = 0
mmap(0x7f365cec7000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x86000) = 0x7f365cec7000
mmap(0x7f365cec9000, 12072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f365cec9000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libgfrpc.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\360Z\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=105848, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f365dc68000
mmap(NULL, 2201016, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f365ca27000
Re: [Gluster-users] glusterfsd process thrashing CPU
Try strace -Ff -e file -p 'glusterfsd pid'

On Tue, 2014-11-18 at 17:42 +1000, Lindsay Mathieson wrote:
> Sorry, meant to send to the list. strace attached.
>
> On 18 November 2014 17:35, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
>> On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
>>> 2-node replicate setup. Everything had been stable for days until I had occasion to reboot one of the nodes. Since then (the past hour) glusterfsd has been pegging the CPU(s), with utilization ranging from 1% to 1000%! On average it's around 500%.
>>> This is a VM server, so there are only 27 VM images for a total of 800GB. It's an Intel E5-2620 (12 cores) with 32GB ECC RAM.
>>> - What does glusterfsd do?
>>> - What can I do to fix this?
>>> thanks,
>> Which version of glusterfs are you using? Do you have directories with lots of files?
>> Pranith
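Spelled out a little, assuming one brick process for the datastore1 volume on this node:

# Brick daemons embed the volume and brick path in their command line
PID=$(pgrep -f 'glusterfsd.*datastore1' | head -n1)
# -Ff follows forked children (-F is the older vfork flag); -e file limits
# output to syscalls that take a filename
strace -Ff -e file -p "$PID" -o /tmp/glusterfsd.trace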
Re: [Gluster-users] glusterfsd process thrashing CPU
On 18 November 2014 17:40, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
> Sorry, didn't see this one. I think this is happening because of the 'diff' based self-heal, which does full-file checksums; I believe that is the root cause. Could you execute 'gluster volume set <volname> cluster.data-self-heal-algorithm full' to prevent this issue in future? The option will only take effect for new self-heals triggered after the command is executed; ongoing ones will still use the old self-heal mode.

Thanks, makes sense. However, given the files are tens of GB in size, won't it thrash my network?