Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
OK, here the results go. I've taken 5 statedumps, 30 minutes apart, and recorded the memory usage right before each statedump.

Memory consumption:

1. root 1010 0.0 9.6 7538188 374864 ? Ssl чер07 0:16 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
2. root 1010 0.0 9.6 7825048 375312 ? Ssl чер07 0:16 /usr/sbin/glusterfs [same command line]
3. root 1010 0.0 9.6 7825048 375312 ? Ssl чер07 0:17 /usr/sbin/glusterfs [same command line]
4. root 1010 0.0 9.6 8202064 375892 ? Ssl чер07 0:17 /usr/sbin/glusterfs [same command line]
5. root 1010 0.0 9.6 8316808 376084 ? Ssl чер07 0:17 /usr/sbin/glusterfs [same command line]

As you can see, VIRT grows constantly (except between measurements 2 and 3, where it stays flat), and RSS grows as well, although its increase is considerably smaller.

Now let's take a look at the statedumps:

1. https://gist.github.com/3fa121c7531d05b210b84d9db763f359
2. https://gist.github.com/87f48b8ac8378262b84d448765730fd9
3. https://gist.github.com/f8780014d8430d67687c70cfd1df9c5c
4. https://gist.github.com/916ac788f806328bad9de5311ce319d7
5. https://gist.github.com/8ba5dbf27d2cc61c04ca954d7fb0a7fd

I'd go with comparing the first statedump against the last one, and here is the diff output:

https://gist.github.com/e94e7f17fe8b3688c6a92f49cbc15193

I see the numbers changing, but I cannot yet conclude what is meaningful and what is not. Pranith?

08.06.2016 10:06, Pranith Kumar Karampuri wrote:

> On Wed, Jun 8, 2016 at 12:33 PM, Oleksandr Natalenko wrote:
>
>> Yup, I can do that, but please note that RSS does not change. Will statedump show VIRT values?
>>
>> Also, I'm looking at the numbers now, and see that on each reconnect VIRT grows by ~24M (once per ~10–15 mins). Probably, that could give you some idea of what is going wrong.
>
> That's interesting. I've never seen something like this happen. I would still like to see if there are any clues in the statedumps when all this happens. Maybe what you said will be confirmed, that nothing new is allocated, but I would just like to confirm.
>
> [...]
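To make the diff easier to read, the dumps can first be reduced to just their per-type allocation sizes. A rough sketch of that (my own, not battle-tested; it assumes the usual statedump layout of `[...]` section headers followed by `size=` lines, and dump.first/dump.last are placeholder file names):

===
#!/bin/bash
# Reduce each statedump to "section <TAB> size" pairs, then diff those,
# so only allocation types whose size= value changed show up.
for f in dump.first dump.last; do
    awk -F '=' '/^\[/ { sec = $0 }
                $1 == "size" { print sec "\t" $2 }' "$f" > "$f.sizes"
done
diff --side-by-side --suppress-common-lines dump.first.sizes dump.last.sizes
===

The same trick applied to the num_allocs= lines would answer the question about which datatypes get allocated a lot.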
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
On Wed, Jun 8, 2016 at 12:33 PM, Oleksandr Natalenko <oleksa...@natalenko.name> wrote:

> Yup, I can do that, but please note that RSS does not change. Will statedump show VIRT values?
>
> Also, I'm looking at the numbers now, and see that on each reconnect VIRT grows by ~24M (once per ~10–15 mins). Probably, that could give you some idea of what is going wrong.

That's interesting. I've never seen something like this happen. I would still like to see if there are any clues in the statedumps when all this happens. Maybe what you said will be confirmed, that nothing new is allocated, but I would just like to confirm.

> 08.06.2016 09:50, Pranith Kumar Karampuri wrote:
>
>> Oleksandr,
>>
>> Could you take a statedump of the shd process once every 5-10 minutes and send maybe 5 samples of them when it starts to increase? This will help us find which datatypes are being allocated a lot and can lead to coming up with possible theories for the increase.
>>
>> [...]

--
Pranith
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Yup, I can do that, but please note that RSS does not change. Will statedump show VIRT values?

Also, I'm looking at the numbers now, and see that on each reconnect VIRT grows by ~24M (once per ~10–15 mins). Probably, that could give you some idea of what is going wrong.

08.06.2016 09:50, Pranith Kumar Karampuri wrote:

> Oleksandr,
>
> Could you take a statedump of the shd process once every 5-10 minutes and send maybe 5 samples of them when it starts to increase? This will help us find which datatypes are being allocated a lot and can lead to coming up with possible theories for the increase.
>
> [...]
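For the record, here is roughly how I'd script the collection. shd is an ordinary glusterfs process, so SIGUSR1 makes it write a statedump into /var/run/gluster (a sketch; the pidfile path is the one from the ps output above):

===
#!/bin/bash
# Take 5 statedumps of glustershd, 10 minutes apart. Each SIGUSR1 makes
# the process write /var/run/gluster/glusterdump.<pid>.dump.<timestamp>.
pid=$(cat /var/lib/glusterd/glustershd/run/glustershd.pid)
for i in 1 2 3 4 5; do
    kill -USR1 "$pid"
    sleep 600
done
ls -l /var/run/gluster/glusterdump."$pid".dump.*
===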
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Oleksandr,

Could you take a statedump of the shd process once every 5-10 minutes and send maybe 5 samples of them when it starts to increase? This will help us find which datatypes are being allocated a lot and can lead to coming up with possible theories for the increase.

On Wed, Jun 8, 2016 at 12:03 PM, Oleksandr Natalenko <oleksa...@natalenko.name> wrote:

> Also, I've checked the shd log files, and found out that for some reason shd constantly reconnects to the bricks: [1]
>
> Please note that the suggested fix [2] by Pranith does not help; the VIRT value still grows:
>
> ===
> root 1010 0.0 9.6 7415248 374688 ? Ssl чер07 0:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
> ===
>
> I do not know why it is reconnecting, but I suspect the leak happens on that reconnect.
>
> [1] http://termbin.com/brob
> [2] http://review.gluster.org/#/c/14053/
>
> [...]

--
Pranith
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Also, I've checked the shd log files, and found out that for some reason shd constantly reconnects to the bricks: [1]

Please note that the suggested fix [2] by Pranith does not help; the VIRT value still grows:

===
root 1010 0.0 9.6 7415248 374688 ? Ssl чер07 0:14 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
===

I do not know why it is reconnecting, but I suspect the leak happens on that reconnect.

CCing Pranith.

[1] http://termbin.com/brob
[2] http://review.gluster.org/#/c/14053/

06.06.2016 12:21, Kaushal M wrote:

> Has multi-threaded SHD been merged into 3.7.* by any chance? If not, what I'm saying below doesn't apply.
>
> We saw problems when encrypted transports were used, because the RPC layer was not reaping threads (doing pthread_join) when a connection ended. This led to similar observations of huge VIRT and relatively small RSS.
>
> I'm not sure how multi-threaded shd works, but it could be leaking threads in a similar way.
>
> [...]
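P.S. To tie the VIRT growth to those reconnects, I'd sample VmSize alongside the log (a quick sketch; 1010 is the shd PID from above, and glustershd.log is the log file shd already writes to):

===
#!/bin/bash
# Log glustershd VmSize/VmRSS once a minute with a timestamp, so VIRT
# jumps can be matched against reconnect entries in glustershd.log.
pid=1010   # glustershd PID from the ps output above
while sleep 60; do
    awk -v ts="$(date '+%F %T')" \
        '/^VmSize:|^VmRSS:/ { printf "%s %s %s kB\n", ts, $1, $2 }' \
        "/proc/$pid/status"
done
===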
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Also, I see lots of entries like these in the pmap output:

===
7ef9ff8f3000      4K -----    [ anon ]
7ef9ff8f4000   8192K rw---    [ anon ]
7efa000f4000      4K -----    [ anon ]
7efa000f5000   8192K rw---    [ anon ]
===

If I sum them up, I get the following:

===
# pmap 15109 | grep -F '[ anon ]' | grep 8192K | wc -l
9261
# echo "9261*(8192+4)" | bc
75903156
===

That is 75903156 KiB, roughly the 70G+ I have in VIRT.

06.06.2016 11:24, Oleksandr Natalenko wrote:

> Hello.
>
> We use v3.7.11, replica 2 setup between 2 nodes + 1 dummy node for keeping volume metadata.
>
> Now we observe huge VSZ (VIRT) usage by glustershd on the dummy node: [...]
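By the way, an 8192K rw mapping preceded by a 4K no-access page is what a default 8 MiB pthread stack with its guard page would look like (an assumption on my side, not something I've verified). If that is what these are, comparing the live thread count with the number of such mappings should show whether stacks of already-exited threads are piling up:

===
# Live threads in the process:
ls /proc/15109/task | wc -l

# 8 MiB rw anonymous mappings (candidate thread stacks):
pmap 15109 | grep -F '[ anon ]' | grep -c '8192K rw'
===

Far more candidate stacks than live threads would point at joinable threads that exited without ever being pthread_join()ed.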
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
I believe multi-threaded shd has not been merged into the 3.7 branch up to and including 3.7.11, because I've found this [1].

[1] https://www.gluster.org/pipermail/maintainers/2016-April/000628.html

06.06.2016 12:21, Kaushal M wrote:

> Has multi-threaded SHD been merged into 3.7.* by any chance? If not, what I'm saying below doesn't apply.
>
> We saw problems when encrypted transports were used, because the RPC layer was not reaping threads (doing pthread_join) when a connection ended. This led to similar observations of huge VIRT and relatively small RSS.
>
> I'm not sure how multi-threaded shd works, but it could be leaking threads in a similar way.
>
> [...]
Re: [Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Has multi-threaded SHD been merged into 3.7.* by any chance? If not, what I'm saying below doesn't apply.

We saw problems when encrypted transports were used, because the RPC layer was not reaping threads (doing pthread_join) when a connection ended. This led to similar observations of huge VIRT and relatively small RSS.

I'm not sure how multi-threaded shd works, but it could be leaking threads in a similar way.

On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko wrote:

> Hello.
>
> We use v3.7.11, replica 2 setup between 2 nodes + 1 dummy node for keeping volume metadata.
>
> Now we observe huge VSZ (VIRT) usage by glustershd on the dummy node: [...]
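If you want a quick look at where that VIRT lives without a full statedump, /proc gives the split between mapped and resident memory, plus the live thread count (a sketch; 15109 is the shd PID from the original report):

===
# Totals and live thread count:
grep -E '^(VmSize|VmRSS|Threads)' /proc/15109/status

# Cross-check: sum of all mappings vs. resident pages, from smaps:
awk '/^Size:/ { v += $2 } /^Rss:/ { r += $2 }
     END { printf "mapped %d kB, resident %d kB\n", v, r }' /proc/15109/smaps
===

A huge mapped figure with a small resident one, together with a thread count that does not account for it, would fit the unreaped-thread-stack picture.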
[Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Hello.

We use v3.7.11, a replica 2 setup between 2 nodes + 1 dummy node for keeping volume metadata.

Now we observe huge VSZ (VIRT) usage by glustershd on the dummy node:

===
root 15109 0.0 13.7 76552820 535272 ? Ssl тра26 2:11 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket --xlator-option *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
===

That is ~73G. RSS seems to be OK (~522M). Here is the statedump of the glustershd process: [1]

Also, here is the sum of the sizes presented in the statedump:

===
# cat /var/run/gluster/glusterdump.15109.dump.1465200139 | awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}'
353276406
===

That is ~337 MiB.

Also, here are the VIRT values from the 2 replica nodes:

===
root 24659 0.0 0.3 5645836 451796 ? Ssl тра24 3:28 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket --xlator-option *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
root 18312 0.0 0.3 6137500 477472 ? Ssl тра19 6:37 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket --xlator-option *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
===

Those are 5 to 6G, which is much less than the dummy node shows, but still looks too big to us.

Should we care about the huge VIRT value on the dummy node? And how would one debug it?

Regards,
Oleksandr.

[1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6

_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel