Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-21 Thread Adrian Kan
Actually I had the same experience when I was using 3.4.2

https://www.mail-archive.com/gluster-users@gluster.org/msg15850.html

 

If I understand correctly, I should be using FULL heal rather than DIFF for
large VM images?

I was not sure whether throttling was working in 3.4.2 or not. I attempted to
recover the entire volume, filled with VM images ranging in size from 10G to
500G.

I saw it was recovering two images at a time rather than all at once.

Thanks,

Adrian

From: gluster-users-boun...@gluster.org
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Pranith Kumar
Karampuri
Sent: Tuesday, November 18, 2014 7:02 PM
To: Lindsay Mathieson; gluster-users
Subject: Re: [Gluster-users] glusterfsd process thrashing CPU

 

 

On 11/18/2014 04:14 PM, Lindsay Mathieson wrote:

On Tue, 18 Nov 2014 02:36:19 PM Pranith Kumar Karampuri wrote:

On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:

On 18 November 2014 17:40, Pranith Kumar Karampuri pkara...@redhat.com wrote:

 
However given the files are tens of GB in size, won't it thrash my
network?

 
Yes, you are right. I wonder why thrashing of the network has never been
reported until now.

 
Not sure if you are being sarcastic or not :) But from what I've observed,
sync operations seem to self-throttle; I've not seen them use more than 50% of
bandwidth, and given that most setups have a dedicated network for the servers,
maybe they just don't notice if it takes a while?

No, I was not being sarcastic :-). I am genuinely wondering why it has not
been reported until now. Maybe Joe will have more input there; that is the
reason I CCed him.



 
 

I still need to think about how best to solve this problem.

 
Set up an array of queues for self-healing, sorted by size maybe?

Let me tell you a bit more about this issue:
there are two processes which heal the VM images:
1) self-heal-daemon. 2) Mount process.
The self-heal daemon heals one VM image at a time. But the mount process
triggers self-heals for all the opened files (a VM image is nothing but an
opened file from the filesystem's perspective) when a brick goes down and
comes back up.

 
 
Thanks, interesting to know.
 

So we need to come up with a scheme to throttle self-heals
on the mount point to prevent this issue. I will update you as soon as I
come up with a fix. This should not be hard to do. Need some time to
choose the best approach. Thanks a lot for bringing up this issue.

 
Thank you for looking at it!

Cheers,






___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-18 Thread Lindsay Mathieson
On 18 November 2014 17:46, Franco Broi franco.b...@iongeo.com wrote:

 Try strace -Ff -e file -p 'glusterfsd pid'

Thanks, Attached
Process 27115 attached with 25 threads - interrupt to quit
[pid 27122] stat(/mnt/gluster-brick1/datastore, {st_mode=S_IFDIR|0755, 
st_size=4, ...}) = 0
[pid 11840] lstat(/mnt/gluster-brick1/datastore/, {st_mode=S_IFDIR|0755, 
st_size=4, ...}) = 0
[pid 11840] lgetxattr(/mnt/gluster-brick1/datastore/, 
system.posix_acl_default, 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 11840] lgetxattr(/mnt/gluster-brick1/datastore/, 
system.posix_acl_access, 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 11840] lgetxattr(/mnt/gluster-brick1/datastore/, trusted.glusterfs.dht 
unfinished ...
[pid 29198] lstat(/mnt/gluster-brick1/datastore/, {st_mode=S_IFDIR|0755, 
st_size=4, ...}) = 0
[pid 29198] lgetxattr(/mnt/gluster-brick1/datastore/, 
system.posix_acl_default, 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29198] lgetxattr(/mnt/gluster-brick1/datastore/, 
system.posix_acl_access, 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29198] lgetxattr(/mnt/gluster-brick1/datastore/, trusted.glusterfs.dht 
unfinished ...
[pid 29197] lstat(/mnt/gluster-brick1/datastore/, {st_mode=S_IFDIR|0755, 
st_size=4, ...}) = 0
[pid 29197] lgetxattr(/mnt/gluster-brick1/datastore/, 
system.posix_acl_default, 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29197] lgetxattr(/mnt/gluster-brick1/datastore/, 
system.posix_acl_access, 0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29197] lgetxattr(/mnt/gluster-brick1/datastore/, trusted.glusterfs.dht 
unfinished ...
[pid 11840] ... lgetxattr resumed , 0x0, 0) = 16
[pid 11840] lgetxattr(/mnt/gluster-brick1/datastore/, 
trusted.glusterfs.dht, 
\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff, 16) = 16
[pid 11840] lgetxattr(/mnt/gluster-brick1/datastore/, missing-gfid-ESTALE, 
0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 11840] lgetxattr(/mnt/gluster-brick1/datastore/, 
trusted.afr.datastore1-client-0, 0x0, 0) = -1 ENODATA (No data available)
[pid 11840] lgetxattr(/mnt/gluster-brick1/datastore/, 
trusted.afr.datastore1-client-1, 0x0, 0) = -1 ENODATA (No data available)
[pid 11840] llistxattr(/mnt/gluster-brick1/datastore/, (nil), 0) = 63
[pid 11840] llistxattr(/mnt/gluster-brick1/datastore/, 0x7feae3cfda10, 63) = 
63
[pid 29198] ... lgetxattr resumed , 0x0, 0) = 16
[pid 29198] lgetxattr(/mnt/gluster-brick1/datastore/, 
trusted.glusterfs.dht, 
\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff, 16) = 16
[pid 29198] lgetxattr(/mnt/gluster-brick1/datastore/, missing-gfid-ESTALE, 
0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29198] lgetxattr(/mnt/gluster-brick1/datastore/, 
trusted.afr.datastore1-client-0, 0x0, 0) = -1 ENODATA (No data available)
[pid 29198] lgetxattr(/mnt/gluster-brick1/datastore/, 
trusted.afr.datastore1-client-1 unfinished ...
[pid 29197] ... lgetxattr resumed , 0x0, 0) = 16
[pid 29198] ... lgetxattr resumed , 0x0, 0) = -1 ENODATA (No data available)
[pid 29197] lgetxattr(/mnt/gluster-brick1/datastore/, trusted.glusterfs.dht 
unfinished ...
[pid 29198] llistxattr(/mnt/gluster-brick1/datastore/, (nil), 0) = 63
[pid 29198] llistxattr(/mnt/gluster-brick1/datastore/ unfinished ...
[pid 29197] ... lgetxattr resumed , 
\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff, 16) = 16
[pid 29198] ... llistxattr resumed , 0x7feae3ffea10, 63) = 63
[pid 29197] lgetxattr(/mnt/gluster-brick1/datastore/, missing-gfid-ESTALE, 
0x0, 0) = -1 EOPNOTSUPP (Operation not supported)
[pid 29197] lgetxattr(/mnt/gluster-brick1/datastore/, 
trusted.afr.datastore1-client-0, 0x0, 0) = -1 ENODATA (No data available)
[pid 29197] lgetxattr(/mnt/gluster-brick1/datastore/, 
trusted.afr.datastore1-client-1, 0x0, 0) = -1 ENODATA (No data available)
[pid 29197] llistxattr(/mnt/gluster-brick1/datastore/, (nil), 0) = 63
[pid 29197] llistxattr(/mnt/gluster-brick1/datastore/, 0x7feaf0487a10, 63) = 
63
[pid 11846] lstat(/mnt/gluster-brick1/datastore/images, 
{st_mode=S_IFDIR|0755, st_size=27, ...}) = 0
[pid 11846] lgetxattr(/mnt/gluster-brick1/datastore/images, trusted.gfid 
unfinished ...
[pid 11844] lstat(/mnt/gluster-brick1/datastore/images, 
{st_mode=S_IFDIR|0755, st_size=27, ...}) = 0
[pid 11844] lgetxattr(/mnt/gluster-brick1/datastore/images, trusted.gfid 
unfinished ...
[pid 11845] lstat(/mnt/gluster-brick1/datastore/images, 
{st_mode=S_IFDIR|0755, st_size=27, ...}) = 0
[pid 11845] lgetxattr(/mnt/gluster-brick1/datastore/images, trusted.gfid 
unfinished ...
[pid 11844] ... lgetxattr resumed , \xbe\x7fIlH\xb0C\xbd\xaaA=BJ6\xca\xb1, 
16) = 16
[pid 11846] ... lgetxattr resumed , \xbe\x7fIlH\xb0C\xbd\xaaA=BJ6\xca\xb1, 
16) = 16
[pid 11845] ... lgetxattr resumed , \xbe\x7fIlH\xb0C\xbd\xaaA=BJ6\xca\xb1, 
16) = 16
[pid 11846] lgetxattr(/mnt/gluster-brick1/datastore/images, 
system.posix_acl_default unfinished ...
[pid 11845] lgetxattr(/mnt/gluster-brick1/datastore/images, 

Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-18 Thread Franco Broi

Can't see how any of that could account for 1000% cpu unless it's just
stuck in a loop.

On Tue, 2014-11-18 at 18:00 +1000, Lindsay Mathieson wrote: 
 On 18 November 2014 17:46, Franco Broi franco.b...@iongeo.com wrote:
 
  Try strace -Ff -e file -p 'glusterfsd pid'
 
 Thanks, Attached
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-18 Thread Lindsay Mathieson
On 18 November 2014 18:05, Franco Broi franco.b...@iongeo.com wrote:

 Can't see how any of that could account for 1000% cpu unless it's just
 stuck in a loop.


Currently still varying between 400% and 950%

Can glusterfsd be killed without affecting the libgfapi clients? (KVMs)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-18 Thread Pranith Kumar Karampuri


On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:

On 18 November 2014 17:40, Pranith Kumar Karampuri pkara...@redhat.com wrote:

Sorry didn't see this one. I think this is happening because of 'diff' based
self-heal which does full file checksums, that I believe is the root cause.
Could you execute 'gluster volume set volname
cluster.data-self-heal-algorithm full' to prevent this issue in future. But
this option will be effective for the new self-heals that will be triggered
after the execution of the command. The ongoing ones will still use the old
mode of self-heal.

Thanks, makes sense.

However given the files are tens of GB in size, won't it thrash my network?
Yes, you are right. I wonder why thrashing of the network has never been
reported until now.
+JoeJulian, who also uses VMs on gluster (for 5 years now?). He uses this
option of full self-heal (that's what I saw in his bug reports).


I still need to think about how best to solve this problem.

Let me tell you a bit more about this issue:
there are two processes which heal the VM images:
1) self-heal-daemon. 2) Mount process.
The self-heal daemon heals one VM image at a time. But the mount process
triggers self-heals for all the opened files (a VM image is nothing but an
opened file from the filesystem's perspective) when a brick goes down and
comes back up. So we need to come up with a scheme to throttle self-heals
on the mount point to prevent this issue. I will update you as soon as I 
come up with a fix. This should not be hard to do. Need some time to 
choose the best approach. Thanks a lot for bringing up this issue.
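
(A hedged, untested sketch of an interim workaround, assuming the standard AFR
options behave this way in your release: stop the mount process from doing
these heals by disabling client-side self-heal, leaving the work to the
self-heal daemon, which already heals one image at a time. The volume name
datastore1 is taken from elsewhere in this thread:

    gluster volume set datastore1 cluster.data-self-heal off
    gluster volume set datastore1 cluster.metadata-self-heal off
    gluster volume set datastore1 cluster.entry-self-heal off
    gluster volume set datastore1 cluster.self-heal-daemon on

I am not certain whether the self-heal daemon also honours these options in
this version, so check that 'gluster volume heal datastore1 info' still shows
progress afterwards. This is only a stop-gap, not the throttling fix itself.)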


Pranith

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-18 Thread Lindsay Mathieson
On Tue, 18 Nov 2014 02:36:19 PM Pranith Kumar Karampuri wrote:
 On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:
  On 18 November 2014 17:40, Pranith Kumar Karampuri pkara...@redhat.com 
wrote:
  
  However given the files are tens of GB in size, won't it thrash my
  network?
 
 Yes, you are right. I wonder why thrashing of the network has never been
 reported until now.

Not sure if you are being sarcastic or not :) But from what I've observed,
sync operations seem to self-throttle; I've not seen them use more than 50% of
bandwidth, and given that most setups have a dedicated network for the servers,
maybe they just don't notice if it takes a while?

 I still need to think about how best to solve this problem.

Set up an array of queues for self-healing, sorted by size maybe?

 
 Let me tell you a bit more about this issue:
 there are two processes which heal the VM images:
 1) self-heal-daemon. 2) Mount process.
 The self-heal daemon heals one VM image at a time. But the mount process
 triggers self-heals for all the opened files (a VM image is nothing but an
 opened file from the filesystem's perspective) when a brick goes down and
 comes back up.


Thanks, interesting to know.

 So we need to come up with a scheme to throttle self-heals
 on the mount point to prevent this issue. I will update you as soon as I
 come up with a fix. This should not be hard to do. Need some time to
 choose the best approach. Thanks a lot for bringing up this issue.

Thank you for looking at it!

Cheers,


-- 
Lindsay

signature.asc
Description: This is a digitally signed message part.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-18 Thread Pranith Kumar Karampuri


On 11/18/2014 04:14 PM, Lindsay Mathieson wrote:

On Tue, 18 Nov 2014 02:36:19 PM Pranith Kumar Karampuri wrote:

On 11/18/2014 01:17 PM, Lindsay Mathieson wrote:

On 18 November 2014 17:40, Pranith Kumar Karampuri pkara...@redhat.com wrote:

However given the files are tens of GB in size, won't it thrash my
network?

Yes, you are right. I wonder why thrashing of the network has never been
reported until now.

Not sure if you are being sarcastic or not :) But from what I've observed,
sync operations seem to self-throttle; I've not seen them use more than 50% of
bandwidth, and given that most setups have a dedicated network for the servers,
maybe they just don't notice if it takes a while?
No, I was not being sarcastic :-). I am genuinely wondering why it has not
been reported until now. Maybe Joe will have more input there; that is the
reason I CCed him.



I still need to think about how best to solve this problem.

Set up an array of queues for self-healing, sorted by size maybe?


Let me tell you a bit more about this issue:
there are two processes which heal the VM images:
1) self-heal-daemon. 2) Mount process.
The self-heal daemon heals one VM image at a time. But the mount process
triggers self-heals for all the opened files (a VM image is nothing but an
opened file from the filesystem's perspective) when a brick goes down and
comes back up.


Thanks, interesting to know.


So we need to come up with a scheme to throttle self-heals
on the mount point to prevent this issue. I will update you as soon as I
come up with a fix. This should not be hard to do. Need some time to
choose the best approach. Thanks a lot for bringing up this issue.

Thank you for looking at it!

Cheers,




___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Lindsay Mathieson
ps. There is very little network traffic happening



-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Lindsay Mathieson
And it's happening on both nodes now; they have become nearly unusable.

On 18 November 2014 17:03, Lindsay Mathieson
lindsay.mathie...@gmail.com wrote:
 ps. There is very little network traffic happening



 --
 Lindsay



-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Franco Broi

glusterfsd is the filesystem daemon. You could try strace'ing it to
see what it's doing.

On Tue, 2014-11-18 at 17:09 +1000, Lindsay Mathieson wrote: 
 And its happening on both nodes now, they have become near unusable.
 
 On 18 November 2014 17:03, Lindsay Mathieson
 lindsay.mathie...@gmail.com wrote:
  ps. There is very little network traffic happening
 
 
 
  --
  Lindsay
 
 
 


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Pranith Kumar Karampuri


On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:

2 Node replicate setup,

Everything has been stable for days until I had occasion to reboot
one of the nodes. Since then (past hour) glusterfsd has been pegging
the CPU(s), utilization ranging from 1% to 1000%!

On average it's around 500%.

This is a VM server, so there are only 27 VM images for a total of
800GB. It's an Intel E5-2620 (12 cores) with 32GB ECC RAM.

- What does glusterfsd do?

- What can I do to fix this?
Which version of glusterfs are you using? Do you have directories with 
lots of files?


Pranith


thanks,



___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Pranith Kumar Karampuri


On 11/18/2014 01:05 PM, Pranith Kumar Karampuri wrote:


On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:

2 Node replicate setup,

Everything has been stable for days until I had occasion to reboot
one of the nodes. Since then (past hour) glusterfsd has been pegging
the CPU(s), utilization ranging from 1% to 1000%!

On average it's around 500%.

This is a VM server, so there are only 27 VM images for a total of
800GB. It's an Intel E5-2620 (12 cores) with 32GB ECC RAM.
Sorry, didn't see this one. I think this is happening because of the
'diff'-based self-heal, which does full-file checksums; I believe that is
the root cause. Could you execute 'gluster volume set volname
cluster.data-self-heal-algorithm full' to prevent this issue in the future?
This option will only be effective for new self-heals that are triggered
after the execution of the command. The ongoing ones will still use the old
mode of self-heal.
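
(For example, a hedged sketch using the volume name datastore1 that appears
later in this thread:

    gluster volume set datastore1 cluster.data-self-heal-algorithm full
    gluster volume info datastore1

The second command should list cluster.data-self-heal-algorithm: full under
"Options Reconfigured" once the option has taken effect.)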


Pranith


- What does glusterfsd do?

- What can I do to fix this?
Which version of glusterfs are you using? Do you have directories with 
lots of files?


Pranith


thanks,



___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Lindsay Mathieson
Gluster 3.5.2

Very few files - it's purely a VM image host, 27 files, 10-60GB in size.

Seems to be undergoing a heal:

root@vnb:~# gluster volume heal datastore1 info
Brick vnb:/mnt/gluster-brick1/datastore/
/images/108/vm-108-disk-1.qcow2 - Possibly undergoing heal
/images/105/vm-105-disk-1.qcow2 - Possibly undergoing heal
/images/100/vm-100-disk-1.qcow2 - Possibly undergoing heal
/images/401/vm-401-disk-1.qcow2 - Possibly undergoing heal
/images/201/vm-201-disk-1.qcow2 - Possibly undergoing heal
/images/204/vm-204-disk-1.qcow2 - Possibly undergoing heal
/images/102/vm-102-disk-1.qcow2 - Possibly undergoing heal
/images/501/vm-501-disk-1.qcow2 - Possibly undergoing heal
/images/203/vm-203-disk-1.qcow2 - Possibly undergoing heal
/images/106/vm-106-disk-1.qcow2 - Possibly undergoing heal
/images/400/vm-400-disk-1.qcow2 - Possibly undergoing heal
/images/107/vm-107-disk-1.qcow2 - Possibly undergoing heal
Number of entries: 12

Brick vng:/mnt/gluster-brick1/datastore/
gfid:59489c91-2df1-4c7d-8dda-265020098b67 - Possibly undergoing heal
gfid:ebb7692f-39ee-4b9b-b48a-1241c47c13bf - Possibly undergoing heal
gfid:8759bea0-ab64-4f7b-87b3-69217ebfee55 - Possibly undergoing heal
Number of entries: 3


What would the gfid entries be?
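
(Guessing: those look like files the self-heal daemon only knows by gfid. If
the usual .glusterfs hard-link layout applies to this brick, something like
the following, untested, might map one back to a real path, since the gfid
entry under .glusterfs is a hard link to the regular file:

    find /mnt/gluster-brick1/datastore -samefile \
        /mnt/gluster-brick1/datastore/.glusterfs/59/48/59489c91-2df1-4c7d-8dda-265020098b67
)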

On 18 November 2014 17:35, Pranith Kumar Karampuri pkara...@redhat.com wrote:

 On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:

 2 Node replicate setup,

 Everything has been stable for days until I had occasion to reboot
 one of the nodes. Since then (past hour) glusterfsd has been pegging
 the CPU(s), utilization ranging from 1% to 1000% !

 On average its around 500%

 This is a vm server, so there are only 27 VM images for a total of
 800GB. Its an Intel E5-2620 (12 Cores) with 32GB ECC RAM

 - What does glusterfsd do?

 - What can I do to fix this?

 Which version of glusterfs are you using? Do you have directories with lots
 of files?

 Pranith


 thanks,





-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Lindsay Mathieson
Sorry, meant to send to the list. strace attached.

On 18 November 2014 17:35, Pranith Kumar Karampuri pkara...@redhat.com wrote:

 On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:

 2 Node replicate setup,

 Everything has been stable for days until I had occasion to reboot
 one of the nodes. Since then (past hour) glusterfsd has been pegging
 the CPU(s), utilization ranging from 1% to 1000% !

 On average its around 500%

 This is a vm server, so there are only 27 VM images for a total of
 800GB. Its an Intel E5-2620 (12 Cores) with 32GB ECC RAM

 - What does glusterfsd do?

 - What can I do to fix this?

 Which version of glusterfs are you using? Do you have directories with lots
 of files?

 Pranith


 thanks,





-- 
Lindsay
execve(/usr/sbin/glusterfsd, [glusterfsd], [/* 15 vars */]) = 0
brk(0)  = 0x1e76000
access(/etc/ld.so.nohwcap, F_OK)  = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f365dc72000
access(/etc/ld.so.preload, R_OK)  = -1 ENOENT (No such file or directory)
open(/etc/ld.so.cache, O_RDONLY)  = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=32190, ...}) = 0
mmap(NULL, 32190, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f365dc6a000
close(3)= 0
access(/etc/ld.so.nohwcap, F_OK)  = -1 ENOENT (No such file or directory)
open(/lib/x86_64-linux-gnu/libdl.so.2, O_RDONLY) = 3
read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\340\r\0\0\0\0\0\0..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14768, ...}) = 0
mmap(NULL, 2109696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f365d851000
mprotect(0x7f365d853000, 2097152, PROT_NONE) = 0
mmap(0x7f365da53000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f365da53000
close(3)= 0
access(/etc/ld.so.nohwcap, F_OK)  = -1 ENOENT (No such file or directory)
open(/lib/x86_64-linux-gnu/libutil.so.1, O_RDONLY) = 3
read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\20\16\0\0\0\0\0\0..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=10640, ...}) = 0
mmap(NULL, 2105608, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f365d64e000
mprotect(0x7f365d65, 2093056, PROT_NONE) = 0
mmap(0x7f365d84f000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7f365d84f000
close(3)= 0
access(/etc/ld.so.nohwcap, F_OK)  = -1 ENOENT (No such file or directory)
open(/lib/x86_64-linux-gnu/libm.so.6, O_RDONLY) = 3
read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\360\0\0\0\0\0\0..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=530736, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f365dc69000
mmap(NULL, 2625768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f365d3cc000
mprotect(0x7f365d44d000, 2093056, PROT_NONE) = 0
mmap(0x7f365d64c000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8) = 0x7f365d64c000
close(3)= 0
access(/etc/ld.so.nohwcap, F_OK)  = -1 ENOENT (No such file or directory)
open(/usr/lib/libpython2.7.so.1.0, O_RDONLY) = 3
read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0`\217\4\0\0\0\0\0..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=3073448, ...}) = 0
mmap(NULL, 5242520, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f365cecc000
mprotect(0x7f365d15, 2093056, PROT_NONE) = 0
mmap(0x7f365d34f000, 438272, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x283000) = 0x7f365d34f000
mmap(0x7f365d3ba000, 73368, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f365d3ba000
close(3)= 0
access(/etc/ld.so.nohwcap, F_OK)  = -1 ENOENT (No such file or directory)
open(/usr/lib/x86_64-linux-gnu/libglusterfs.so.0, O_RDONLY) = 3
read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\340Q\1\0\0\0\0\0..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=557592, ...}) = 0
mmap(NULL, 2666280, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f365cc41000
mprotect(0x7f365ccc7000, 2097152, PROT_NONE) = 0
mmap(0x7f365cec7000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x86000) = 0x7f365cec7000
mmap(0x7f365cec9000, 12072, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f365cec9000
close(3)= 0
access(/etc/ld.so.nohwcap, F_OK)  = -1 ENOENT (No such file or directory)
open(/usr/lib/x86_64-linux-gnu/libgfrpc.so.0, O_RDONLY) = 3
read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\360Z\0\0\0\0\0\0..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=105848, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f365dc68000
mmap(NULL, 2201016, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f365ca27000

Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Franco Broi

Try strace -Ff -e file -p 'glusterfsd pid'
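
For instance, a hedged sketch (the pgrep assumes a single glusterfsd brick
process on the node; otherwise take the PID from 'gluster volume status' or
'ps aux | grep glusterfsd'):

    strace -Ff -e file -p "$(pgrep -o glusterfsd)"

That should show which files and directories the brick daemon is touching.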

On Tue, 2014-11-18 at 17:42 +1000, Lindsay Mathieson wrote: 
 Sorry, meant to send to the list. strace attached.
 
 On 18 November 2014 17:35, Pranith Kumar Karampuri pkara...@redhat.com 
 wrote:
 
  On 11/18/2014 12:32 PM, Lindsay Mathieson wrote:
 
  2 Node replicate setup,
 
  Everything has been stable for days until I had occasion to reboot
  one of the nodes. Since then (past hour) glusterfsd has been pegging
  the CPU(s), utilization ranging from 1% to 1000% !
 
  On average its around 500%
 
  This is a vm server, so there are only 27 VM images for a total of
  800GB. Its an Intel E5-2620 (12 Cores) with 32GB ECC RAM
 
  - What does glusterfsd do?
 
  - What can I do to fix this?
 
  Which version of glusterfs are you using? Do you have directories with lots
  of files?
 
  Pranith
 
 
  thanks,
 
 
 
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterfsd process thrashing CPU

2014-11-17 Thread Lindsay Mathieson
On 18 November 2014 17:40, Pranith Kumar Karampuri pkara...@redhat.com wrote:

 Sorry didn't see this one. I think this is happening because of 'diff' based
 self-heal which does full file checksums, that I believe is the root cause.
 Could you execute 'gluster volume set volname
 cluster.data-self-heal-algorithm full' to prevent this issue in future. But
 this option will be effective for the new self-heals that will be triggered
 after the execution of the command. The ongoing ones will still use the old
 mode of self-heal.

Thanks, makes sense.

However given the files are tens of GB in size, won't it thrash my network?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users