[Gluster-infra] [Bug 1379228] smoke test fails with read/write failed (ENOTCONN)
https://bugzilla.redhat.com/show_bug.cgi?id=1379228

Nigel Babu changed:

What    |Removed                   |Added
CC      |gluster-infra@gluster.org |

--
You are receiving this mail because: You are on the CC list for the bug.
Unsubscribe from this bug https://bugzilla.redhat.com/token.cgi?t=lylEy1fKjp=cc_unsubscribe
___
Gluster-infra mailing list
Gluster-infra@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-infra
[Gluster-infra] [Bug 1379228] smoke test fails with read/write failed (ENOTCONN)
https://bugzilla.redhat.com/show_bug.cgi?id=1379228

Shyamsundar changed:

What    |Removed |Added
CC      |        |srang...@redhat.com

--- Comment #9 from Shyamsundar ---
My notes:

The script: https://github.com/gluster/glusterfs-patch-acceptance-tests/blob/master/smoke.sh

Success case: https://build.gluster.org/job/smoke/30870/console
=

10:47:56 + wait %3
---> This happens once the wait on %2 is complete, so dbench was done by this time and the script started waiting on %3 (IOW, the printed line is the one about to be executed).
10:48:51 All tests successful.
10:48:51 Files=191, Tests=1960, 129 wallclock secs ( 1.28 usr 0.36 sys + 9.57 cusr 7.43 csys = 18.64 CPU)
10:48:51 Result: PASS
---> %3 (compliance) completed (it took about 129 seconds; dbench takes about 71-72 seconds including the warmup), so the wait above was over and we proceed.
---> Cleanup starts:
10:48:51 + rm -rf clients
10:48:53 + cd -
10:48:53 /home/jenkins/root/workspace/smoke
10:48:53 + finish
10:48:53 + RET=0
---> NOTE: RET here takes the exit status of rm -rf clients; not sure if this is intended.
10:48:53 + '[' 0 -ne 0 ']'
10:48:53 + cleanup
---> cleanup is invoked by finish, and this possibly has the set -x enabled by the script (but the watchdog's cleanup call does not, as seen in the failed case).
10:48:53 + killall -15 glusterfs glusterfsd glusterd
---> All well!

Failure case: https://build.gluster.org/job/smoke/30852/console
=

00:03:16 All tests successful.
00:03:16 Files=191, Tests=1960, 93 wallclock secs ( 0.89 usr 0.26 sys + 5.46 cusr 3.30 csys = 9.91 CPU)
00:03:16 Result: PASS
00:11:36 Kicking in watchdog after 600 secs
---> Where are the watchdog cleanup calls logged? It appears that the watchdog is called before set -x, and hence cleanup is not logged here.
---> Assuming cleanup was called, it killed all gluster processes, dbench finally errored out in the read (no connection), and hence %2 completed.
00:11:36 + wait %3
---> The wait for %3 starts and returns immediately, as compliance had finished running about 8 minutes earlier (00:03:16).
00:11:36 + rm -rf clients
00:11:36 rm: cannot remove `clients': Transport endpoint is not connected
---> This rm -rf fails because the watchdog has already cleaned up the processes. (So we failed cleanup; is this an issue for the next run?)
00:11:36 + finish
00:11:36 + RET=1
---> rm -rf failed, so we caught that; is this what is intended?
00:11:36 + '[' 1 -ne 0 ']'
00:11:36 + cat /build/dbench-logs
---
00:11:36 10 cleanup 581 sec
---> dbench has been attempting cleanup for 580-odd seconds.
00:11:36 [643] read failed on handle 10007 (Transport endpoint is not connected)
---> Finally the dbench clients get an error: the watchdog shut down the processes, and hence the volume, so we get connection errors and dbench exits.
---
00:11:36 + cleanup
---> Called by finish; everything fails because the watchdog has already cleaned up.
00:11:36 + killall -15 glusterfs glusterfsd glusterd
00:11:36 glusterfs: no process killed
00:11:36 glusterfsd: no process killed
00:11:36 glusterd: no process killed

Root cause:
===

It looks like dbench got stuck at https://github.com/sahlberg/dbench/blob/master/fileio.c#L400 (or pread) and was never able to break out of it. As a result, dbench never completed until the volume and the mount were taken down, at which point it errored out. Why it got stuck there would be the next question, I guess.
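To make the RET question in comment #9 concrete, here is a minimal sketch, assuming `finish` begins with something like `RET=$?` (which is what the `+ RET=...` trace lines suggest). It is not the real smoke.sh: `fake_dbench`, `fake_compliance`, `fake_cleanup` and `demo` are hypothetical stand-ins. It shows that `$?` only reflects whichever command ran just before it (here a failing cleanup, like the failed `rm -rf clients`), and one way to keep each background job's own exit status by waiting on its PID.

```sh
# Hypothetical sketch, not the real smoke.sh.
fake_dbench()     { return 0; }  # stand-in: pretend dbench passed
fake_compliance() { return 0; }  # stand-in: pretend the compliance run passed
fake_cleanup()    { return 1; }  # stand-in: 'rm -rf clients' failing (e.g. ENOTCONN)

demo() {
    # Start both jobs in the background and remember their PIDs.
    fake_dbench &
    dbench_pid=$!
    fake_compliance &
    compliance_pid=$!

    # 'wait PID' returns that job's exit status; record it right away.
    wait "$dbench_pid";     dbench_rc=$?
    wait "$compliance_pid"; compliance_rc=$?

    fake_cleanup
    RET=$?   # same pattern as the trace: sees only the cleanup's status

    # One alternative: decide pass/fail from the recorded job statuses.
    if [ "$dbench_rc" -ne 0 ] || [ "$compliance_rc" -ne 0 ]; then
        verdict=FAIL
    else
        verdict=PASS
    fi
    echo "RET=$RET dbench=$dbench_rc compliance=$compliance_rc verdict=$verdict"
}

demo
```

Running it prints `RET=1 dbench=0 compliance=0 verdict=PASS`: RET reports a failure even though both jobs passed, because it only ever sees the last command's status.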
[Gluster-infra] [Bug 1379228] smoke test fails with read/write failed (ENOTCONN)
https://bugzilla.redhat.com/show_bug.cgi?id=1379228

--- Comment #8 from Nigel Babu ---
I'm guessing Pranith was looking for the /var/log/glusterfs files. We never did collect them for smoke; I've added the logic to grab them: https://github.com/gluster/glusterfs-patch-acceptance-tests/pull/62/files
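A hypothetical sketch of what grabbing the logs could look like. This is not the code from the linked pull request; `archive_logs` and the tarball path are made up for illustration. It bundles a log directory such as /var/log/glusterfs into a tarball that a CI job could keep as an artifact.

```sh
# Hypothetical helper, not taken from the linked pull request.
archive_logs() {
    log_dir=$1   # e.g. /var/log/glusterfs
    out_tar=$2   # e.g. "$WORKSPACE/glusterfs-logs.tar" (hypothetical path)
    if [ ! -d "$log_dir" ]; then
        echo "no log directory at $log_dir" >&2
        return 1
    fi
    # -C changes into the parent directory so the archive stores a
    # relative path (e.g. glusterfs/...) instead of an absolute one.
    tar -cf "$out_tar" -C "$(dirname "$log_dir")" "$(basename "$log_dir")"
}
```

Usage would be something like `archive_logs /var/log/glusterfs "$WORKSPACE/glusterfs-logs.tar"`, run from the failure path before the processes and mounts are torn down.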
[Gluster-infra] [Bug 1379228] smoke test fails with read/write failed (ENOTCONN)
https://bugzilla.redhat.com/show_bug.cgi?id=1379228

--- Comment #7 from Nigel Babu ---
From what I can see from rudimentary searches, "Transport endpoint is not connected" most likely means the FUSE mount has crashed. Surely this means an intermittent bug has slipped past us?
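For triage, a hypothetical pair of helpers (the names are made up and are not part of the test scripts). On Linux/glibc, "Transport endpoint is not connected" is the strerror() text for ENOTCONN, and I/O on a FUSE mount returns it once the user-space process serving the mount has gone away, whether it crashed or was killed. The first helper scans log text for that message; the second checks a mount point by running `stat` under the C locale so the English message can be matched.

```sh
# Hypothetical triage helpers.

# Reads log text on stdin, prints lines carrying the ENOTCONN message,
# and succeeds only if at least one such line was found.
find_enotconn() {
    grep -F 'Transport endpoint is not connected'
}

# Succeeds if stat on the given path fails with ENOTCONN, which is what
# a FUSE mount whose client process has disappeared typically returns.
mount_is_disconnected() {
    # Capture only stderr; LC_ALL=C keeps the message in English.
    err=$(LC_ALL=C stat -- "$1" 2>&1 >/dev/null) && return 1
    case $err in
        *"Transport endpoint is not connected"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Note that the message alone does not say why the client went away; comment #9 shows the same error appearing after the watchdog killed the gluster processes.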
[Gluster-infra] [Bug 1379228] smoke test fails with read/write failed (ENOTCONN)
https://bugzilla.redhat.com/show_bug.cgi?id=1379228

Nigel Babu changed:

What    |Removed |Added
CC      |        |ma...@redhat.com,
        |        |rjos...@redhat.com
Flags   |        |needinfo?(ma...@redhat.com)

--- Comment #5 from Nigel Babu ---
Michael and Rajesh, do you have any idea what's going on in these failures? Any pointers on where to start debugging would also be useful.
[Gluster-infra] [Bug 1379228] smoke test fails with read/write failed (ENOTCONN)
https://bugzilla.redhat.com/show_bug.cgi?id=1379228

--- Comment #4 from Nigel Babu ---
Data from the last 100 smoke jobs:

builtOn: slave26.cloud.gluster.org, url: http://build.gluster.org/job/smoke/30791/
builtOn: slave32.cloud.gluster.org, url: http://build.gluster.org/job/smoke/30822/
builtOn: builder1.rht.gluster.org, url: http://build.gluster.org/job/smoke/30821/
url: http://build.gluster.org/job/smoke/30815/
url: http://build.gluster.org/job/smoke/30782/
builtOn: slave20.cloud.gluster.org, url: http://build.gluster.org/job/smoke/30818/
builtOn: slave23.cloud.gluster.org, url: http://build.gluster.org/job/smoke/30750/
builtOn: slave34.cloud.gluster.org, url: http://build.gluster.org/job/smoke/30814/
url: http://build.gluster.org/job/smoke/30813/
url: http://build.gluster.org/job/smoke/30811/
url: http://build.gluster.org/job/smoke/30808/
url: http://build.gluster.org/job/smoke/30770/
url: http://build.gluster.org/job/smoke/30764/
url: http://build.gluster.org/job/smoke/30733/
url: http://build.gluster.org/job/smoke/30724/
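Tallying a job list like the one in comment #4 by builder is easy to script. A minimal sketch, assuming one `builtOn: <host>, url: <url>` or bare `url: <url>` entry per line; entries without `builtOn` are counted as `(unknown)`, and `count_by_builder` is a hypothetical helper name.

```sh
# Hypothetical helper: reads job entries on stdin and prints
# "<count> <builder>" lines, most frequent builder first.
count_by_builder() {
    awk '
        /builtOn:/ {
            sub(/.*builtOn: */, "")   # drop everything up to the host name
            sub(/,.*/, "")            # drop ", url: ..." after it
            n[$0]++
            next
        }
        /url:/ { n["(unknown)"]++ }   # entry with no builtOn field
        END { for (b in n) print n[b], b }
    ' | sort -k1,1nr -k2,2            # count descending, then name
}
```

Piping the entries above into `count_by_builder` gives one `<count> <host>` line per builder, which makes it quick to check whether the failures cluster on a single slave.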
[Gluster-infra] [Bug 1379228] smoke test fails with read/write failed (ENOTCONN)
https://bugzilla.redhat.com/show_bug.cgi?id=1379228

--- Comment #3 from Nigel Babu ---
Seemingly all these are on slave34. I'm going to quickly sample all the smoke failures to see if they're all happening on slave34.
[Gluster-infra] [Bug 1379228] smoke test fails with read/write failed (ENOTCONN)
https://bugzilla.redhat.com/show_bug.cgi?id=1379228

--- Comment #2 from Nigel Babu ---
These are the dbench failures.
[Gluster-infra] [Bug 1379228] smoke test fails with read/write failed (ENOTCONN)
https://bugzilla.redhat.com/show_bug.cgi?id=1379228

Nigel Babu changed:

What    |Removed         |Added
Status  |NEW             |ASSIGNED
CC      |                |nig...@redhat.com
Assignee|b...@gluster.org |nig...@redhat.com

--- Comment #1 from Nigel Babu ---
Checking now.