Re: [Gluster-devel] Problems with ec/nfs.t in regression tests
On 02/12/2015 08:15 PM, Xavier Hernandez wrote: I've done some more investigation and the problem seems worse. It seems that NFS sends a huge amount of requests without waiting for answers (I've had more than 1400 requests ongoing). There are probably many factors that influence the load this causes, and one of them could be ec, but it's not related exclusively to ec. I've repeated the test using replica 3 and replica 2 volumes and the problem still happens. The test basically writes a file to an NFS mount using 'dd'. The file has a size of 1GB. With a smaller file, the test passes successfully. Using NFS client and gluster NFS server on same machine with BIG file dd operations is known to cause hangs. anon-fd-quota.t used to give similar problems so we changed the test to not involve NFS mounts. Pranith One important thing to note is that I'm not using powerful servers (a dual core Intel Atom), but this problem shouldn't happen anyway. It can even happen on more powerful servers if they are busy doing other things (maybe this is what's happening on jenkins' slaves). I think that this causes some NFS requests to time out. This can be seen in /var/log/messages (there are many of these messages): Feb 12 15:18:45 celler01 kernel: nfs: server gf01.datalab.es not responding, timed out The nfs log also has many errors: [2015-02-12 14:18:45.132905] E [rpcsvc.c:1257:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7be78dbe, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server) [2015-02-12 14:18:45.133009] E [nfs3.c:565:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed Additionally, this causes disconnections from NFS that are not correctly handled, causing a thread to get stuck in an infinite loop (I haven't analyzed this problem deeply, but it seems like an attempt to use an already disconnected socket). After a while, I get this error in the nfs log: [2015-02-12 14:20:19.545429] C [rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-patchy-client-0: server 192.168.200.61:49152 has not responded in the last 42 seconds, disconnecting. The console executing the test shows this (nfs.t is creating a replica 3 instead of a dispersed volume): # ./run-tests.sh tests/basic/ec/nfs.t ... GlusterFS Test Framework ... Running tests in file ./tests/basic/ec/nfs.t [14:12:52] ./tests/basic/ec/nfs.t .. 8/10 dd: error writing ‘/mnt/nfs/0/test’: Input/output error [14:12:52] ./tests/basic/ec/nfs.t .. 9/10 not ok 9 [14:12:52] ./tests/basic/ec/nfs.t .. Failed 1/10 subtests [14:27:41] Test Summary Report --- ./tests/basic/ec/nfs.t (Wstat: 0 Tests: 10 Failed: 1) Failed test: 9 Files=1, Tests=10, 889 wallclock secs ( 0.13 usr 0.02 sys + 1.29 cusr 3.45 csys = 4.89 CPU) Result: FAIL Failed tests ./tests/basic/ec/nfs.t Note that the test takes almost 15 minutes to complete. Is there any way to limit the number of requests NFS sends without having an answer? Xavi On 02/11/2015 04:20 PM, Shyam wrote: On 02/11/2015 09:40 AM, Xavier Hernandez wrote: Hi, it seems that there are some failures in the ec/nfs.t test on regression runs. Doing some investigation I've found that before applying the multi-threaded epoll patch (commit 5e25569e) the problem does not seem to happen. This has an interesting history of failures: on the regression runs for the MT epoll patch, this test (i.e. ec/nfs.t) did not fail (there were others, but not nfs.t). The patch that allows configuration of MT epoll is where this started failing around Feb 5th (but later passed). 
(see patchset 7 failures on, http://review.gluster.org/#/c/9488/ ) I state the above, as it may help narrowing down the changes in EC (maybe) that could have caused it. Also in the latter commit, there was an error configuring the number of threads so all regression runs would have run with a single epoll thread (the MT epoll patch had this hard coded, so that would have run with 2 threads, but did not show up the issue (patch: http://review.gluster.org/#/c/3842/)). Again I state the above, as this should not be exposing a race/bug/problem due to the multi threaded nature of epoll, but of course needs investigation. I'm not sure if this patch is the cause or it has revealed some bug in ec or any other xlator. I guess we can reproduce this issue? If so I would try setting client.event-threads on master branch to 1, restarting the volume and then running the test (as a part of the test itself maybe) to eliminate the possibility that MT epoll is causing it. My belief on MT epoll causing it is in doubt as the runs failed on the http://review.gluster.org/#/c/9488/ (configuration patch), which had the thread count as 1 due to a bug in that code. I can try to identify it (any help will be appreciated), but it may take some time. Would it be better to remove the test in the meantime ? I am checking if this is reproducible on my machine, so that I can
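A minimal sketch of the check Shyam describes might look like the following (the volume name 'patchy' is taken from the client log messages above, and the option name should be confirmed with 'gluster volume set help' on the build under test):

# force a single epoll thread on the client side, then restart the volume
gluster volume set patchy client.event-threads 1
gluster volume stop patchy
gluster volume start patchy
# re-run only the suspect test
./run-tests.sh tests/basic/ec/nfs.t
# restore the default afterwards
gluster volume reset patchy client.event-threads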
Re: [Gluster-devel] Problems with ec/nfs.t in regression tests
On 02/12/2015 11:34 PM, Pranith Kumar Karampuri wrote: On 02/12/2015 08:15 PM, Xavier Hernandez wrote: I've made some more investigation and the problem seems worse. It seems that NFS sends a huge amount of requests without waiting for answers (I've had more than 1400 requests ongoing). Probably there will be many factors that can influence on the load that this causes, and one of them could be ec, but it's not related exclusively to ec. I've repeated the test using a replica 3 and a replica 2 volumes and the problem still happens. The test basically writes a file to an NFS mount using 'dd'. The file has a size of 1GB. With a smaller file, the test passes successfully. Using NFS client and gluster NFS server on same machine with BIG file dd operations is known to cause hangs. anon-fd-quota.t used to give similar problems so we changed the test to not involve NFS mounts. I don't re-collect the exact scenario. Avati found the deadlock of memory allocation, when I just joined gluster, in 2010. Raghavendra Bhat raised this bug then. CCed him to the thread as well if he knows the exact scenario. Pranith Pranith One important thing to note is that I'm not using powerful servers (a dual core Intel Atom), but this problem shouldn't happen anyway. It can even happen on more powerful servers if they are busy doing other things (maybe this is what's happening on jenkins' slaves). I think that this causes some NFS requests to timeout. This can be seen in /var/log/messages (there are many of these messages): Feb 12 15:18:45 celler01 kernel: nfs: server gf01.datalab.es not responding, timed out nfs log also has many errors: [2015-02-12 14:18:45.132905] E [rpcsvc.c:1257:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7be78dbe, Program: NFS3, ProgVers: 3, Proc: 7) to rpc -transport (socket.nfs-server) [2015-02-12 14:18:45.133009] E [nfs3.c:565:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed Additionally this causes disconnections from NFS that are not correctly handled causing that a thread gets stuck in an infinite loop (I haven't analyzed this problem deeply, but it seems like an attempt to use an already disconnected socket). After a while, I get this error on the nfs log: [2015-02-12 14:20:19.545429] C [rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-patchy-client-0: server 192.168.200.61:49152 has not responded in the last 42 seconds, disconnecting. The console executing the test shows this (nfs.t is creating a replica 3 instead of a dispersed volume): # ./run-tests.sh tests/basic/ec/nfs.t ... GlusterFS Test Framework ... Running tests in file ./tests/basic/ec/nfs.t [14:12:52] ./tests/basic/ec/nfs.t .. 8/10 dd: error writing ‘/mnt/nfs/0/test’: Input/output error [14:12:52] ./tests/basic/ec/nfs.t .. 9/10 not ok 9 [14:12:52] ./tests/basic/ec/nfs.t .. Failed 1/10 subtests [14:27:41] Test Summary Report --- ./tests/basic/ec/nfs.t (Wstat: 0 Tests: 10 Failed: 1) Failed test: 9 Files=1, Tests=10, 889 wallclock secs ( 0.13 usr 0.02 sys + 1.29 cusr 3.45 csys = 4.89 CPU) Result: FAIL Failed tests ./tests/basic/ec/nfs.t Note that the test takes almost 15 minutes to complete. Is there any way to limit the number of requests NFS sends without having an answer ? Xavi On 02/11/2015 04:20 PM, Shyam wrote: On 02/11/2015 09:40 AM, Xavier Hernandez wrote: Hi, it seems that there are some failures in ec/nfs.t test on regression tests. Doing some investigation I've found that before applying the multi-threaded patch (commit 5e25569e) the problem does not seem to happen. 
This has in interesting history in failures, on the regression runs for the MT epoll this (i.e ec/nfs.t) did not fail (there were others, but not nfs.t). The patch that allows configuration of MT epoll is where this started failing around Feb 5th (but later passed). (see patchset 7 failures on, http://review.gluster.org/#/c/9488/ ) I state the above, as it may help narrowing down the changes in EC (maybe) that could have caused it. Also in the latter commit, there was an error configuring the number of threads so all regression runs would have run with a single epoll thread (the MT epoll patch had this hard coded, so that would have run with 2 threads, but did not show up the issue (patch: http://review.gluster.org/#/c/3842/)). Again I state the above, as this should not be exposing a race/bug/problem due to the multi threaded nature of epoll, but of course needs investigation. I'm not sure if this patch is the cause or it has revealed some bug in ec or any other xlator. I guess we can reproduce this issue? If so I would try setting client.event-threads on master branch to 1, restarting the volume and then running the test (as a part of the test itself maybe) to eliminate the possibility that MT epoll is causing it. My belief on MT epoll causing it is in doubt as the runs failed on the http://review.gluster.org/#/c/9488/
Re: [Gluster-devel] Problems with ec/nfs.t in regression tests
On 02/12/2015 01:27 PM, Xavier Hernandez wrote: On 12.02.2015 19:09, Pranith Kumar Karampuri wrote: On 02/12/2015 11:34 PM, Pranith Kumar Karampuri wrote: On 02/12/2015 08:15 PM, Xavier Hernandez wrote: I've made some more investigation and the problem seems worse. It seems that NFS sends a huge amount of requests without waiting for answers (I've had more than 1400 requests ongoing). Probably there will be many factors that can influence on the load that this causes, and one of them could be ec, but it's not related exclusively to ec. I've repeated the test using a replica 3 and a replica 2 volumes and the problem still happens. The test basically writes a file to an NFS mount using 'dd'. The file has a size of 1GB. With a smaller file, the test passes successfully. Using NFS client and gluster NFS server on same machine with BIG file dd operations is known to cause hangs. anon-fd-quota.t used to give similar problems so we changed the test to not involve NFS mounts. I don't recollect the exact scenario. Avati found the deadlock of memory allocation, when I just joined gluster, in 2010. Raghavendra Bhat raised this bug then. CCed him to the thread as well if he knows the exact scenario. I've been doing some tests with Shyam and it seems that the root cause is the edge-triggered epoll introduced in the multi-threaded epoll patch. It has a side effect that makes the outstanding-rpc-limit option nearly useless, and gluster gets flooded with requests, causing timeouts and disconnections on slow/busy machines. Elaborating on this: the MT epoll patch makes the epoll edge triggered (ET), so on an epoll 'in' event we attempt to read as much as we can. If the client is able to supply 'n' RPCs before our read gets EAGAIN | EWOULDBLOCK, we will read all of them and not honor the server-side throttle. In the previous case we read RPC by RPC and the epoll was not ET, so when we reached the throttle limit we stopped reading from the socket. The network pipes would then fill up, and the client would not be able to write more RPCs either, so the number of outstanding (or ongoing) RPCs stayed limited. With the ET case in epoll we are breaking this. Shyam
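One way to see the effect Shyam describes is to tighten the server-side throttle and check whether it changes anything during the NFS write. A rough sketch, assuming the full option name is server.outstanding-rpc-limit and reusing the volume name, host and mount point from the logs above (mount options and dd sizes are also assumptions):

# lower the server-side cap on in-flight RPCs per connection
gluster volume set patchy server.outstanding-rpc-limit 16
gluster volume stop patchy
gluster volume start patchy
# repeat the 1GB write over the gluster NFS export
mount -t nfs -o vers=3 gf01.datalab.es:/patchy /mnt/nfs/0
dd if=/dev/zero of=/mnt/nfs/0/test bs=1024k count=1024 conv=fsync
# per the analysis above, with edge-triggered epoll the request flood is expected
# to look the same with or without the lower limit, i.e. the throttle is bypassed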
Re: [Gluster-devel] Problems with ec/nfs.t in regression tests
On 12.02.2015 19:09, Pranith Kumar Karampuri wrote: On 02/12/2015 11:34 PM, Pranith Kumar Karampuri wrote: On 02/12/2015 08:15 PM, Xavier Hernandez wrote: I've made some more investigation and the problem seems worse. It seems that NFS sends a huge amount of requests without waiting for answers (I've had more than 1400 requests ongoing). Probably there will be many factors that can influence on the load that this causes, and one of them could be ec, but it's not related exclusively to ec. I've repeated the test using a replica 3 and a replica 2 volumes and the problem still happens. The test basically writes a file to an NFS mount using 'dd'. The file has a size of 1GB. With a smaller file, the test passes successfully. Using NFS client and gluster NFS server on same machine with BIG file dd operations is known to cause hangs. anon-fd-quota.t used to give similar problems so we changed the test to not involve NFS mounts. I don't re-collect the exact scenario. Avati found the deadlock of memory allocation, when I just joined gluster, in 2010. Raghavendra Bhat raised this bug then. CCed him to the thread as well if he knows the exact scenario. I've been doing some tests with Shyam and it seems that the root cause is the edge-triggered epoll introduced in the multi-threaded epoll patch. It has a side effect that makes the outstanding-rpc-limit option near to useless and gluster gets overflowed of requests, causing timeouts and disconnections on slow/busy machines. I've opened bug #1192114 for this problem. Xavi ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] GlusterFS 4.0 Meeting Time Survey
+1 for Thursday (anytime). -1 for Friday. On Wed, Feb 11, 2015 at 7:22 PM, Niels de Vos nde...@redhat.com wrote: On Mon, Feb 09, 2015 at 05:34:04PM -0500, Jeff Darcy wrote: The inaugural GlusterFS 4.0 meeting on Friday was a great success. Thanks to all who attended. Minutes are here: http://meetbot.fedoraproject.org/gluster-meeting/2015-02-06/glusterfs_4.0.2015-02-06-12.05.html One action item was to figure out when IRC meetings (#gluster-meeting on Freenode) should occur. Meetings are likely to be every other week, starting February 26 or 27. Obviously, it would be good to pick a time that gets us maximum participation. If you would be interested in attending, please let me know: * Which day - Thursday or Friday * What time - 11:00, 12:00, or 13:00 UTC Thursday at 13:00 would work, other times not so much. Any of the proposed times on Friday are OK. Thanks, Niels Please try to keep in mind that our two main constituencies are in IST (+0530) and EST (-0500). With at least one developer in Europe and none so far on the US west coast, this is as close to a sweet spot as we can get. If we do pick up developers in time zones where this isn't sufficient, we could rotate or I would be willing to run two meetings per cycle on opposite schedules; let's cross that bridge when we get there. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] missing files
On 02/12/2015 03:05 PM, Pranith Kumar Karampuri wrote: On 02/12/2015 09:14 AM, Justin Clift wrote: On 12 Feb 2015, at 03:02, Shyam srang...@redhat.com wrote: On 02/11/2015 08:28 AM, David F. Robinson wrote: My base filesystem has 40-TB and the tar takes 19 minutes. I copied over 10-TB and it took the tar extraction from 1-minute to 7-minutes. My suspicion is that it is related to number of files and not necessarily file size. Shyam is looking into reproducing this behavior on a redhat system. I am able to reproduce the issue on a similar setup internally (at least at the surface it seems to be similar to what David is facing). I will continue the investigation for the root cause. Here is the initial analysis of my investigation: (Thanks for providing me with the setup shyam, keep the setup we may need it for further analysis) On bad volume: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop - --- --- --- 0.00 0.00 us 0.00 us 0.00 us 937104 FORGET 0.00 0.00 us 0.00 us 0.00 us 872478 RELEASE 0.00 0.00 us 0.00 us 0.00 us 23668 RELEASEDIR 0.00 41.86 us 23.00 us 86.00 us 92 STAT 0.01 39.40 us 24.00 us 104.00 us 218 STATFS 0.28 55.99 us 43.00 us1152.00 us 4065 SETXATTR 0.58 56.89 us 25.00 us4505.00 us 8236 OPENDIR 0.73 26.80 us 11.00 us 257.00 us 22238 FLUSH 0.77 152.83 us 92.00 us8819.00 us 4065 RMDIR 2.57 62.00 us 21.00 us 409.00 us 33643 WRITE 5.46 199.16 us 108.00 us 469938.00 us 22238 UNLINK 6.70 69.83 us 43.00 us.00 us 77809 LOOKUP 6.97 447.60 us 21.00 us 54875.00 us 12631 READDIRP 7.73 79.42 us 33.00 us1535.00 us 78909 SETATTR 14.112815.00 us 176.00 us 2106305.00 us 4065 MKDIR 54.091972.62 us 138.00 us 1520773.00 us 22238 CREATE On good volume: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop - --- --- --- 0.00 0.00 us 0.00 us 0.00 us 58870 FORGET 0.00 0.00 us 0.00 us 0.00 us 66016 RELEASE 0.00 0.00 us 0.00 us 0.00 us 16480 RELEASEDIR 0.00 61.50 us 58.00 us 65.00 us 2OPEN 0.01 39.56 us 16.00 us 112.00 us 71 STAT 0.02 41.29 us 27.00 us 79.00 us 163 STATFS 0.03 36.06 us 17.00 us 98.00 us 301 FSTAT 0.79 62.38 us 39.00 us 269.00 us 4065 SETXATTR 1.14 242.99 us 25.00 us 28636.00 us 1497 READ 1.54 59.76 us 25.00 us6325.00 us 8236 OPENDIR 1.70 133.75 us 89.00 us 374.00 us 4065 RMDIR 2.25 32.65 us 15.00 us 265.00 us 22006 FLUSH 3.37 265.05 us 172.00 us2349.00 us 4065 MKDIR 7.14 68.34 us 21.00 us 21902.00 us 33357 WRITE 11.00 159.68 us 107.00 us2567.00 us 22003 UNLINK 13.82 200.54 us 133.00 us 21762.00 us 22003 CREATE 17.85 448.85 us 22.00 us 54046.00 us 12697 READDIRP 18.37 76.12 us 45.00 us 294.00 us 77044 LOOKUP 20.95 85.54 us 35.00 us1404.00 us 78204 SETATTR As we can see here, FORGET/RELEASE are way more in the brick from full volume compared to the brick from empty volume. It seems to suggest that the inode-table on the volume with lots of data is carrying too many passive inodes in the table which need to be displaced to create new ones. Need to check if they come in the fop-path. Need to continue my investigations further, will let you know. Just to increase confidence performed one more test. Stopped the volumes and re-started. 
Now on both the volumes, the numbers are almost the same: [root@gqac031 gluster-mount]# time rm -rf boost_1_57_0 ; time tar xf boost_1_57_0.tar.gz real 1m15.074s user 0m0.550s sys 0m4.656s real 2m46.866s user 0m5.347s sys 0m16.047s [root@gqac031 gluster-mount]# cd /gluster-emptyvol/ [root@gqac031 gluster-emptyvol]# ls boost_1_57_0.tar.gz [root@gqac031 gluster-emptyvol]# time tar xf boost_1_57_0.tar.gz real 2m31.467s user 0m5.475s sys 0m15.471s gqas015.sbu.lab.eng.bos.redhat.com:testvol on /gluster-mount type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072) gqas015.sbu.lab.eng.bos.redhat.com:emotyvol on /gluster-emptyvol type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072) Pranith Thanks Shyam. :) + Justin
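The per-fop latency tables quoted above look like 'gluster volume profile' output; a rough sketch of how such numbers are typically collected (volume name taken from the mount output above, workload path assumed to be the fuse mount):

# turn on per-fop latency counters, run the workload, then dump the numbers
gluster volume profile testvol start
time tar xf boost_1_57_0.tar.gz
gluster volume profile testvol info      # cumulative per-brick fop latency table
gluster volume profile testvol stop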
Re: [Gluster-devel] missing files
On 02/12/2015 09:14 AM, Justin Clift wrote: On 12 Feb 2015, at 03:02, Shyam srang...@redhat.com wrote: On 02/11/2015 08:28 AM, David F. Robinson wrote: My base filesystem has 40-TB and the tar takes 19 minutes. I copied over 10-TB and it took the tar extraction from 1-minute to 7-minutes. My suspicion is that it is related to number of files and not necessarily file size. Shyam is looking into reproducing this behavior on a redhat system. I am able to reproduce the issue on a similar setup internally (at least at the surface it seems to be similar to what David is facing). I will continue the investigation for the root cause. Here is the initial analysis of my investigation: (Thanks for providing me with the setup shyam, keep the setup we may need it for further analysis) On bad volume: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop - --- --- --- 0.00 0.00 us 0.00 us 0.00 us 937104 FORGET 0.00 0.00 us 0.00 us 0.00 us 872478 RELEASE 0.00 0.00 us 0.00 us 0.00 us 23668 RELEASEDIR 0.00 41.86 us 23.00 us 86.00 us 92STAT 0.01 39.40 us 24.00 us 104.00 us 218 STATFS 0.28 55.99 us 43.00 us1152.00 us 4065SETXATTR 0.58 56.89 us 25.00 us4505.00 us 8236 OPENDIR 0.73 26.80 us 11.00 us 257.00 us 22238 FLUSH 0.77 152.83 us 92.00 us8819.00 us 4065 RMDIR 2.57 62.00 us 21.00 us 409.00 us 33643 WRITE 5.46 199.16 us 108.00 us 469938.00 us 22238 UNLINK 6.70 69.83 us 43.00 us.00 us 77809 LOOKUP 6.97 447.60 us 21.00 us 54875.00 us 12631READDIRP 7.73 79.42 us 33.00 us1535.00 us 78909 SETATTR 14.112815.00 us 176.00 us 2106305.00 us 4065 MKDIR 54.091972.62 us 138.00 us 1520773.00 us 22238 CREATE On good volume: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop - --- --- --- 0.00 0.00 us 0.00 us 0.00 us 58870 FORGET 0.00 0.00 us 0.00 us 0.00 us 66016 RELEASE 0.00 0.00 us 0.00 us 0.00 us 16480 RELEASEDIR 0.00 61.50 us 58.00 us 65.00 us 2OPEN 0.01 39.56 us 16.00 us 112.00 us 71STAT 0.02 41.29 us 27.00 us 79.00 us 163 STATFS 0.03 36.06 us 17.00 us 98.00 us 301 FSTAT 0.79 62.38 us 39.00 us 269.00 us 4065SETXATTR 1.14 242.99 us 25.00 us 28636.00 us 1497READ 1.54 59.76 us 25.00 us6325.00 us 8236 OPENDIR 1.70 133.75 us 89.00 us 374.00 us 4065 RMDIR 2.25 32.65 us 15.00 us 265.00 us 22006 FLUSH 3.37 265.05 us 172.00 us2349.00 us 4065 MKDIR 7.14 68.34 us 21.00 us 21902.00 us 33357 WRITE 11.00 159.68 us 107.00 us2567.00 us 22003 UNLINK 13.82 200.54 us 133.00 us 21762.00 us 22003 CREATE 17.85 448.85 us 22.00 us 54046.00 us 12697READDIRP 18.37 76.12 us 45.00 us 294.00 us 77044 LOOKUP 20.95 85.54 us 35.00 us1404.00 us 78204 SETATTR As we can see here, FORGET/RELEASE are way more in the brick from full volume compared to the brick from empty volume. It seems to suggest that the inode-table on the volume with lots of data is carrying too many passive inodes in the table which need to be displaced to create new ones. Need to check if they come in the fop-path. Need to continue my investigations further, will let you know. Pranith Thanks Shyam. :) + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
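To check the passive-inode theory directly, a brick statedump can be inspected for the size of the server-side inode table. A sketch, with 'testvol' as a placeholder volume name, assuming the default dump directory /var/run/gluster and that the counter names (active_size, lru_size) match the release in use:

# ask the bricks of the full volume to dump their state
gluster volume statedump testvol
# look at the inode table counters in the brick dumps
grep -iE 'itable|active_size|lru_size' /var/run/gluster/*.dump.*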
Re: [Gluster-devel] Problems with ec/nfs.t in regression tests
On 02/13/2015 12:07 AM, Niels de Vos wrote: On Thu, Feb 12, 2015 at 11:39:51PM +0530, Pranith Kumar Karampuri wrote: On 02/12/2015 11:34 PM, Pranith Kumar Karampuri wrote: On 02/12/2015 08:15 PM, Xavier Hernandez wrote: I've made some more investigation and the problem seems worse. It seems that NFS sends a huge amount of requests without waiting for answers (I've had more than 1400 requests ongoing). Probably there will be many factors that can influence on the load that this causes, and one of them could be ec, but it's not related exclusively to ec. I've repeated the test using a replica 3 and a replica 2 volumes and the problem still happens. The test basically writes a file to an NFS mount using 'dd'. The file has a size of 1GB. With a smaller file, the test passes successfully. Using NFS client and gluster NFS server on same machine with BIG file dd operations is known to cause hangs. anon-fd-quota.t used to give similar problems so we changed the test to not involve NFS mounts. I don't re-collect the exact scenario. Avati found the deadlock of memory allocation, when I just joined gluster, in 2010. Raghavendra Bhat raised this bug then. CCed him to the thread as well if he knows the exact scenario. This is a well know issue. When a system is under memory pressure, it will try to flush dirty pages from the VFS. The NFS-client will send the dirty pages over the network to the NFS-server. Unfortunately, the NFS-server needs to allocate memory for the handling or the WRITE procedures. This causes a loop and will most often get the system into a hang situation. Yes. This was it :-). Seems like Xavi and Shyam found the reason for the failure though, which is not this. Pranith Mounting with -o sync, or flushing outstanding I/O from the client side should normally be sufficient to prevent these issues. Nice, didn't know about this. Pranith Niels Pranith Pranith One important thing to note is that I'm not using powerful servers (a dual core Intel Atom), but this problem shouldn't happen anyway. It can even happen on more powerful servers if they are busy doing other things (maybe this is what's happening on jenkins' slaves). I think that this causes some NFS requests to timeout. This can be seen in /var/log/messages (there are many of these messages): Feb 12 15:18:45 celler01 kernel: nfs: server gf01.datalab.es not responding, timed out nfs log also has many errors: [2015-02-12 14:18:45.132905] E [rpcsvc.c:1257:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x7be78dbe, Program: NFS3, ProgVers: 3, Proc: 7) to rpc -transport (socket.nfs-server) [2015-02-12 14:18:45.133009] E [nfs3.c:565:nfs3svc_submit_reply] 0-nfs-nfsv3: Reply submission failed Additionally this causes disconnections from NFS that are not correctly handled causing that a thread gets stuck in an infinite loop (I haven't analyzed this problem deeply, but it seems like an attempt to use an already disconnected socket). After a while, I get this error on the nfs log: [2015-02-12 14:20:19.545429] C [rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-patchy-client-0: server 192.168.200.61:49152 has not responded in the last 42 seconds, disconnecting. The console executing the test shows this (nfs.t is creating a replica 3 instead of a dispersed volume): # ./run-tests.sh tests/basic/ec/nfs.t ... GlusterFS Test Framework ... Running tests in file ./tests/basic/ec/nfs.t [14:12:52] ./tests/basic/ec/nfs.t .. 8/10 dd: error writing ‘/mnt/nfs/0/test’: Input/output error [14:12:52] ./tests/basic/ec/nfs.t .. 
9/10 not ok 9 [14:12:52] ./tests/basic/ec/nfs.t .. Failed 1/10 subtests [14:27:41] Test Summary Report --- ./tests/basic/ec/nfs.t (Wstat: 0 Tests: 10 Failed: 1) Failed test: 9 Files=1, Tests=10, 889 wallclock secs ( 0.13 usr 0.02 sys + 1.29 cusr 3.45 csys = 4.89 CPU) Result: FAIL Failed tests ./tests/basic/ec/nfs.t Note that the test takes almost 15 minutes to complete. Is there any way to limit the number of requests NFS sends without having an answer ? Xavi On 02/11/2015 04:20 PM, Shyam wrote: On 02/11/2015 09:40 AM, Xavier Hernandez wrote: Hi, it seems that there are some failures in ec/nfs.t test on regression tests. Doing some investigation I've found that before applying the multi-threaded patch (commit 5e25569e) the problem does not seem to happen. This has in interesting history in failures, on the regression runs for the MT epoll this (i.e ec/nfs.t) did not fail (there were others, but not nfs.t). The patch that allows configuration of MT epoll is where this started failing around Feb 5th (but later passed). (see patchset 7 failures on, http://review.gluster.org/#/c/9488/ ) I state the above, as it may help narrowing down the changes in EC (maybe) that could have caused it. Also in the latter commit, there was an error configuring the number of threads so all regression runs would have run with a single epoll thread (the MT epoll patch had this hard coded, so
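A rough sketch of the mitigations Niels mentions for the loopback NFS case, reusing the host and mount point from the logs above (the exact mount options are an assumption):

# mount the gluster NFS export synchronously so dirty pages cannot pile up on the client
mount -t nfs -o vers=3,sync gf01.datalab.es:/patchy /mnt/nfs/0
# or keep the async mount but make the writer flush as it goes
dd if=/dev/zero of=/mnt/nfs/0/test bs=1024k count=1024 oflag=sync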
[Gluster-devel] coredumps in regression runs
Several coredumps are being generated in regression runs now [1]. Anyone had a chance to look into this? -Vijay [1] http://build.gluster.org/job/rackspace-regression-2GB-triggered/4233/consoleFull
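A rough triage sketch for such cores (the core file name and binary path are placeholders; the real locations depend on the slave's core_pattern and install prefix):

# find out which process dumped core, then collect all thread backtraces
file /core.12345
gdb -batch -q /usr/local/sbin/glusterfsd /core.12345 -ex 'thread apply all bt full' > backtrace.txt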
Re: [Gluster-devel] missing files
Shyam, You asked me to stop/start the slow volume to see if it fixed the timing issue. I stopped/started homegfs_backup (the production volume with 40+ TB) and it didn't make it faster. I didn't stop/start the fast volume to see if it made it slower. I just did that and sent out an email. I saw a similar result as Pranith. however, I tried this test below and saw no issues. So, i don't know why restart the older volume of test3brick slowed it down but the test below shows no slowdown. #... Create 2-new bricks gluster volume create test4brick gfsib01bkp.corvidtec.com:/data/brick01bkp/test4brick gfsib01bkp.corvidtec.com:/data/brick02bkp/test4brick gluster volume create test5brick gfsib01bkp.corvidtec.com:/data/brick01bkp/test5brick gfsib01bkp.corvidtec.com:/data/brick02bkp/test5brick gluster volume start test4brick gluster volume start test5brick mount /test4brick mount /test5brick cp /root/boost_1_57_0.tar /test4brick cp /root/boost_1_57_0.tar /test5brick #... Stop/start test4brick to see if this causes a timing issue umount /test4brick gluster volume stop test4brick gluster volume start test4brick mount /test4brick #... Run test on both new bricks cd /test4brick time tar -xPf boost_1_57_0.tar; time rm -rf boost_1_57_0 real1m29.712s user0m0.415s sys 0m2.772s real0m18.866s user0m0.087s sys 0m0.556s cd /test5brick time tar -xPf boost_1_57_0.tar; time rm -rf boost_1_57_0 real 1m28.243s user 0m0.366s sys 0m2.502s real 0m18.193s user 0m0.075s sys 0m0.543s #... Repeat again after stop/start of test4brick umount /test4brick gluster volume stop test4brick gluster volume start test4brick mount /test4brick cd /test4brick time tar -xPf boost_1_57_0.tar; time rm -rf boost_1_57_0 real1m25.277s user0m0.466s sys 0m3.107s real0m16.575s user0m0.084s sys 0m0.577s -- Original Message -- From: Shyam srang...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com; Justin Clift jus...@gluster.org Cc: Gluster Devel gluster-devel@gluster.org; David F. Robinson david.robin...@corvidtec.com Sent: 2/12/2015 10:46:14 AM Subject: Re: [Gluster-devel] missing files On 02/12/2015 06:22 AM, Pranith Kumar Karampuri wrote: On 02/12/2015 03:05 PM, Pranith Kumar Karampuri wrote: On 02/12/2015 09:14 AM, Justin Clift wrote: On 12 Feb 2015, at 03:02, Shyam srang...@redhat.com wrote: On 02/11/2015 08:28 AM, David F. Robinson wrote: Just to increase confidence performed one more test. Stopped the volumes and re-started. Now on both the volumes, the numbers are almost same: [root@gqac031 gluster-mount]# time rm -rf boost_1_57_0 ; time tar xf boost_1_57_0.tar.gz real 1m15.074s user 0m0.550s sys 0m4.656s real 2m46.866s user 0m5.347s sys 0m16.047s [root@gqac031 gluster-mount]# cd /gluster-emptyvol/ [root@gqac031 gluster-emptyvol]# ls boost_1_57_0.tar.gz [root@gqac031 gluster-emptyvol]# time tar xf boost_1_57_0.tar.gz real 2m31.467s user 0m5.475s sys 0m15.471s gqas015.sbu.lab.eng.bos.redhat.com:testvol on /gluster-mount type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072) gqas015.sbu.lab.eng.bos.redhat.com:emotyvol on /gluster-emptyvol type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072) If I remember right, we performed a similar test on David's setup, but I believe there was no significant performance gain there. David could you clarify? Just so we know where we are headed :) Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] missing files
On 02/12/2015 11:18 AM, David F. Robinson wrote: Shyam, You asked me to stop/start the slow volume to see if it fixed the timing issue. I stopped/started homegfs_backup (the production volume with 40+ TB) and it didn't make it faster. I didn't stop/start the fast volume to see if it made it slower. I just did that and sent out an email. I saw a similar result as Pranith. Just to be clear: even after a restart of the slow volume, we see ~19 minutes for the tar to complete, correct? Versus on the fast volume it is anywhere between 0:55 and 3:00 minutes, irrespective of start, fresh create, etc., correct? Shyam
Re: [Gluster-devel] missing files
On 12 Feb 2015, at 11:22, Pranith Kumar Karampuri pkara...@redhat.com wrote: <snip> Just to increase confidence performed one more test. Stopped the volumes and re-started. Now on both the volumes, the numbers are almost same: Oh. So it's a problem that turns up after a certain amount of activity has happened on a volume? E.g. a lot of intensive activity would show the effect quickly, but a less intense amount of activity would take longer to show it. Kaleb's long-running cluster might be useful to catch this kind of thing in future, depending on the workload running on it and the kind of pre/post tests we run (e.g. to catch performance regressions). + Justin
Re: [Gluster-devel] missing files
That is very interesting. I tried this test and received a similar result. Start/stopping the volume causes a timing issue on the blank volume. It seems like there is some parameter getting set when you create a volume and gets reset when you start/stop a volume. Or, something gets set during the start/stop operation that causes the problem. Is there a way to list all parameters that are set for a volume? gluster volume info only shows the ones that the user has changed from defaults. [root@gfs01bkp ~]# gluster volume stop test3brick Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y volume stop: test3brick: success [root@gfs01bkp ~]# gluster volume start test3brick volume start: test3brick: success [root@gfs01bkp ~]# mount /test3brick [root@gfs01bkp ~]# cd /test3brick/ [root@gfs01bkp test3brick]# date; time tar -xPf boost_1_57_0.tar ; time rm -rf boost_1_57_0 Thu Feb 12 10:42:43 EST 2015 real3m46.002s user0m0.421s sys 0m2.812s real0m15.406s user0m0.092s sys 0m0.549s -- Original Message -- From: Pranith Kumar Karampuri pkara...@redhat.com To: Justin Clift jus...@gluster.org; Shyam srang...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org; David F. Robinson david.robin...@corvidtec.com Sent: 2/12/2015 6:22:23 AM Subject: Re: [Gluster-devel] missing files On 02/12/2015 03:05 PM, Pranith Kumar Karampuri wrote: On 02/12/2015 09:14 AM, Justin Clift wrote: On 12 Feb 2015, at 03:02, Shyam srang...@redhat.com wrote: On 02/11/2015 08:28 AM, David F. Robinson wrote: My base filesystem has 40-TB and the tar takes 19 minutes. I copied over 10-TB and it took the tar extraction from 1-minute to 7-minutes. My suspicion is that it is related to number of files and not necessarily file size. Shyam is looking into reproducing this behavior on a redhat system. I am able to reproduce the issue on a similar setup internally (at least at the surface it seems to be similar to what David is facing). I will continue the investigation for the root cause. Here is the initial analysis of my investigation: (Thanks for providing me with the setup shyam, keep the setup we may need it for further analysis) On bad volume: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop - --- --- --- 0.00 0.00 us 0.00 us 0.00 us 937104 FORGET 0.00 0.00 us 0.00 us 0.00 us 872478 RELEASE 0.00 0.00 us 0.00 us 0.00 us 23668 RELEASEDIR 0.00 41.86 us 23.00 us 86.00 us 92 STAT 0.01 39.40 us 24.00 us 104.00 us 218 STATFS 0.28 55.99 us 43.00 us 1152.00 us 4065 SETXATTR 0.58 56.89 us 25.00 us 4505.00 us 8236 OPENDIR 0.73 26.80 us 11.00 us 257.00 us 22238 FLUSH 0.77 152.83 us 92.00 us 8819.00 us 4065 RMDIR 2.57 62.00 us 21.00 us 409.00 us 33643 WRITE 5.46 199.16 us 108.00 us 469938.00 us 22238 UNLINK 6.70 69.83 us 43.00 us .00 us 77809 LOOKUP 6.97 447.60 us 21.00 us 54875.00 us 12631 READDIRP 7.73 79.42 us 33.00 us 1535.00 us 78909 SETATTR 14.11 2815.00 us 176.00 us 2106305.00 us 4065 MKDIR 54.09 1972.62 us 138.00 us 1520773.00 us 22238 CREATE On good volume: %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop - --- --- --- 0.00 0.00 us 0.00 us 0.00 us 58870 FORGET 0.00 0.00 us 0.00 us 0.00 us 66016 RELEASE 0.00 0.00 us 0.00 us 0.00 us 16480 RELEASEDIR 0.00 61.50 us 58.00 us 65.00 us 2 OPEN 0.01 39.56 us 16.00 us 112.00 us 71 STAT 0.02 41.29 us 27.00 us 79.00 us 163 STATFS 0.03 36.06 us 17.00 us 98.00 us 301 FSTAT 0.79 62.38 us 39.00 us 269.00 us 4065 SETXATTR 1.14 242.99 us 25.00 us 28636.00 us 1497 READ 1.54 59.76 us 25.00 us 6325.00 us 8236 OPENDIR 1.70 133.75 us 89.00 us 374.00 us 4065 RMDIR 2.25 32.65 us 15.00 us 265.00 us 22006 FLUSH 3.37 265.05 us 172.00 us 2349.00 us 4065 MKDIR 7.14 68.34 us 21.00 us 21902.00 us 33357 WRITE 11.00 159.68 us 107.00 us 2567.00 us 22003 UNLINK 13.82 200.54 us 133.00 us 21762.00 us 22003 CREATE 17.85 448.85 us 22.00 us 54046.00 us 12697 READDIRP 18.37 76.12 us 45.00 us 294.00 us 77044 LOOKUP 20.95 85.54 us 35.00 us 1404.00 us 78204 SETATTR As we can see here, FORGET/RELEASE are way more in the brick from full volume compared to the brick from empty volume. It seems to suggest that the inode-table on the volume with lots of data is carrying too many passive inodes in the table which need to be displaced to create new ones. Need to check if they come in the fop-path. Need to continue my investigations further, will let you know. Just to increase confidence performed one more test. Stopped the volumes and re-started. Now on both the volumes, the numbers are almost same: [root@gqac031 gluster-mount]# time rm -rf boost_1_57_0 ; time tar xf boost_1_57_0.tar.gz real 1m15.074s
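Regarding the question above about listing all parameters for a volume rather than only the changed ones, two hedged options (the second subcommand only exists on newer releases, so its availability needs to be checked):

# list all settable options together with their default values and descriptions
gluster volume set help
# on releases that have it, print the effective value of every option for one volume
gluster volume get test3brick all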
Re: [Gluster-devel] missing files
-- Original Message -- From: Shyam srang...@redhat.com To: David F. Robinson david.robin...@corvidtec.com; Pranith Kumar Karampuri pkara...@redhat.com; Justin Clift jus...@gluster.org Cc: Gluster Devel gluster-devel@gluster.org Sent: 2/12/2015 11:26:51 AM Subject: Re: [Gluster-devel] missing files On 02/12/2015 11:18 AM, David F. Robinson wrote: Shyam, You asked me to stop/start the slow volume to see if it fixed the timing issue. I stopped/started homegfs_backup (the production volume with 40+ TB) and it didn't make it faster. I didn't stop/start the fast volume to see if it made it slower. I just did that and sent out an email. I saw a similar result as Pranith. Just to be clear even after restart of the slow volume, we see ~19 minutes for the tar to complete, correct? Correct Versus, on the fast volume it is anywhere between 00:55 - 3:00 minutes, irrespective of start, fresh create, etc. correct? Correct Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] missing files
FWIW, starting/stopping a volume that is fast doesn't consistently make it slow. I just tried it again on an older volume... It doesn't make it slow. I also went back and re-ran the test on test3brick and it isn't slow any longer. Maybe there is a time lag after stopping/starting a volume before it becomes fast. Either way, stopping/starting a fast volume only makes it slow for some period of time and it doesn't consistently make it slow. I don't think this is the issue. red-herring. [root@gfs01bkp /]# gluster volume stop test2brick Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y [root@gfs01bkp /]# gluster volume start test2brick volume start: test2brick: success [root@gfs01bkp /]# mount /test2brick [root@gfs01bkp /]# cd /test2brick [root@gfs01bkp test2brick]# time tar -xPf boost_1_57_0.tar; time rm -rf boost_1_57_0 real1m1.124s user0m0.432s sys 0m3.136s real0m16.630s user0m0.083s sys 0m0.570s #... Retest on test3brick after it has been up after a volume restart for 20-minutes... Compare this to running the test immediately after a restart which gave a time of 3.5-minutes. [root@gfs01bkp test3brick]# time tar -xPf boost_1_57_0.tar; time rm -rf boost_1_57_0 real1m17.786s user0m0.502s sys 0m3.278s real0m18.103s user0m0.101s sys 0m0.684s -- Original Message -- From: Shyam srang...@redhat.com To: David F. Robinson david.robin...@corvidtec.com; Pranith Kumar Karampuri pkara...@redhat.com; Justin Clift jus...@gluster.org Cc: Gluster Devel gluster-devel@gluster.org Sent: 2/12/2015 11:26:51 AM Subject: Re: [Gluster-devel] missing files On 02/12/2015 11:18 AM, David F. Robinson wrote: Shyam, You asked me to stop/start the slow volume to see if it fixed the timing issue. I stopped/started homegfs_backup (the production volume with 40+ TB) and it didn't make it faster. I didn't stop/start the fast volume to see if it made it slower. I just did that and sent out an email. I saw a similar result as Pranith. Just to be clear even after restart of the slow volume, we see ~19 minutes for the tar to complete, correct? Versus, on the fast volume it is anywhere between 00:55 - 3:00 minutes, irrespective of start, fresh create, etc. correct? Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
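One measurement precaution, not suggested in the thread but possibly useful given the run-to-run swing between ~1 and ~3.5 minutes on the same volume, is clearing client-side caches before each timing run (assumes root on the client; this only normalises client-side caching and does not touch the bricks):

# drop page cache, dentries and inodes on the client, then repeat the timing
sync
echo 3 > /proc/sys/vm/drop_caches
cd /test3brick && time tar -xPf boost_1_57_0.tar && time rm -rf boost_1_57_0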