Re: [Gluster-devel] Regression test failures - Call for Action
On 05/05/2015 08:13 AM, Pranith Kumar Karampuri wrote:
> On 05/05/2015 08:10 AM, Jeff Darcy wrote:
>>> Jeff's patch failed again with same problem:
>>> http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console
>>
>> Wouldn't have expected anything different. This one looks like a
>> problem in the Jenkins/Gerrit infrastructure.
>
> Sorry for the mis-communication, I was referring to the same infra
> problem.

The situation seems much better now. Thanks everyone for your prompt
actions! We still seem to be some distance away from ensuring that our
regression runs are clean. Let us keep responding promptly to
regression failures to help prevent a lockdown of master for all
patches.

Regards,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regression test failures - Call for Action
On 5 May 2015, at 03:40, Jeff Darcy jda...@redhat.com wrote:
>> Jeff's patch failed again with same problem:
>> http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console
>
> Wouldn't have expected anything different. This one looks like a
> problem in the Jenkins/Gerrit infrastructure.

This kind of error message at the end of a failure log indicates the VM
has self-disconnected from Jenkins and needs rebooting. Haven't found
any other way to fix it. :/

Happens with both CentOS and NetBSD regression runs.

    [...]
    FATAL: Unable to delete script file /var/tmp/hudson8377790745169807524.sh
    hudson.util.IOException2: remote file operation failed: /var/tmp/hudson8377790745169807524.sh at hudson.remoting.Channel@2bae0315:nbslave72.cloud.gluster.org
        at hudson.FilePath.act(FilePath.java:900)
        at hudson.FilePath.act(FilePath.java:877)
        at hudson.FilePath.delete(FilePath.java:1262)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
    [...]

+ Justin

--
GlusterFS - http://www.gluster.org
An open source, distributed file system scaling to several petabytes,
and handling thousands of clients.
My personal twitter: twitter.com/realjustinclift
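The signature Justin describes could be checked for mechanically before anyone spends time on a failed run. The following is only a minimal sketch: the function name `disconnected_slave` and the two constants are made up for illustration, and it assumes the console log has already been fetched as text. It looks for the "Unable to delete script file" marker and pulls the slave name out of the `hudson.remoting.Channel@...:` part of the message quoted above.

```python
import re

# Marker line reported in the thread for a slave that dropped off Jenkins.
FATAL_MARKER = "FATAL: Unable to delete script file"
# Captures the slave name that follows "hudson.remoting.Channel@<id>:".
CHANNEL = re.compile(r"hudson\.remoting\.Channel@[0-9a-fA-F]+:(\S+)")


def disconnected_slave(console_log):
    """Return the slave name if the log shows the disconnect signature, else None.

    Hypothetical helper, not part of Jenkins or the Gluster test harness.
    """
    if FATAL_MARKER not in console_log:
        return None
    match = CHANNEL.search(console_log)
    return match.group(1) if match else None


# Excerpt copied from the failure log quoted in this thread.
SAMPLE_LOG = """\
FATAL: Unable to delete script file /var/tmp/hudson8377790745169807524.sh
hudson.util.IOException2: remote file operation failed: /var/tmp/hudson8377790745169807524.sh at hudson.remoting.Channel@2bae0315:nbslave72.cloud.gluster.org
    at hudson.FilePath.act(FilePath.java:900)
"""

print(disconnected_slave(SAMPLE_LOG))
```

A run flagged this way is an infrastructure failure (the slave needs a reboot) rather than a test failure, so it can be retriggered instead of debugged.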
Re: [Gluster-devel] Regression test failures - Call for Action
On 4 May 2015, at 08:06, Vijay Bellur vbel...@redhat.com wrote:
> Hi All,
>
> There has been a spate of regression test failures (due to broken
> tests or race conditions showing up) in the recent past [1] and I am
> inclined to block 3.7.0 GA along with acceptance of patches until we
> fix *all* regression test failures. We seem to have reached a point
> where this seems to be the only way to restore sanity to our
> regression runs.
>
> I plan to put this into effect 24 hours from now i.e. around 0700 UTC
> on 05/05.
>
> Thoughts?

Please do this. :)

+ Justin

> Thanks,
> Vijay
>
> [1] https://public.pad.fsfe.org/p/gluster-spurious-failures
Re: [Gluster-devel] Regression test failures - Call for Action
On 05/05/2015 12:58 AM, Justin Clift wrote:
> On 4 May 2015, at 08:06, Vijay Bellur vbel...@redhat.com wrote:
>> Hi All,
>>
>> There has been a spate of regression test failures (due to broken
>> tests or race conditions showing up) in the recent past [1] and I am
>> inclined to block 3.7.0 GA along with acceptance of patches until we
>> fix *all* regression test failures. We seem to have reached a point
>> where this seems to be the only way to restore sanity to our
>> regression runs.
>>
>> I plan to put this into effect 24 hours from now i.e. around 0700 UTC
>> on 05/05.
>>
>> Thoughts?
>
> Please do this. :)

What happened to NetBSD setup connection? Lot of them are failing with:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4528/console

Pranith

> + Justin
>
>> Thanks,
>> Vijay
>>
>> [1] https://public.pad.fsfe.org/p/gluster-spurious-failures
Re: [Gluster-devel] Regression test failures - Call for Action
On 05/05/2015 06:12 AM, Pranith Kumar Karampuri wrote:
> On 05/05/2015 12:58 AM, Justin Clift wrote:
>> On 4 May 2015, at 08:06, Vijay Bellur vbel...@redhat.com wrote:
>>> Hi All,
>>>
>>> There has been a spate of regression test failures (due to broken
>>> tests or race conditions showing up) in the recent past [1] and I
>>> am inclined to block 3.7.0 GA along with acceptance of patches
>>> until we fix *all* regression test failures. We seem to have
>>> reached a point where this seems to be the only way to restore
>>> sanity to our regression runs.
>>>
>>> I plan to put this into effect 24 hours from now i.e. around 0700
>>> UTC on 05/05.
>>>
>>> Thoughts?
>>
>> Please do this. :)
>
> What happened to NetBSD setup connection? Lot of them are failing with:
> http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4528/console

Jeff's patch failed again with same problem:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console

Pranith

> Pranith
>
>> + Justin
>>
>>> Thanks,
>>> Vijay
>>>
>>> [1] https://public.pad.fsfe.org/p/gluster-spurious-failures
Re: [Gluster-devel] Regression test failures - Call for Action
On 05/05/2015 08:10 AM, Jeff Darcy wrote:
>> Jeff's patch failed again with same problem:
>> http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console
>
> Wouldn't have expected anything different. This one looks like a
> problem in the Jenkins/Gerrit infrastructure.

Sorry for the mis-communication, I was referring to the same infra
problem.

Pranith
Re: [Gluster-devel] Regression test failures - Call for Action
Just saw two more failures in the same place for NetBSD regressions. I
am ignoring NetBSD status for the test fixes for now. I am not sure how
this needs to be fixed. Please help!

Pranith

On 05/05/2015 07:17 AM, Pranith Kumar Karampuri wrote:
> On 05/05/2015 06:12 AM, Pranith Kumar Karampuri wrote:
>> On 05/05/2015 12:58 AM, Justin Clift wrote:
>>> On 4 May 2015, at 08:06, Vijay Bellur vbel...@redhat.com wrote:
>>>> Hi All,
>>>>
>>>> There has been a spate of regression test failures (due to broken
>>>> tests or race conditions showing up) in the recent past [1] and I
>>>> am inclined to block 3.7.0 GA along with acceptance of patches
>>>> until we fix *all* regression test failures. We seem to have
>>>> reached a point where this seems to be the only way to restore
>>>> sanity to our regression runs.
>>>>
>>>> I plan to put this into effect 24 hours from now i.e. around 0700
>>>> UTC on 05/05.
>>>>
>>>> Thoughts?
>>>
>>> Please do this. :)
>>
>> What happened to NetBSD setup connection? Lot of them are failing with:
>> http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4528/console
>
> Jeff's patch failed again with same problem:
> http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console
>
> Pranith
>
>> Pranith
>>
>>> + Justin
>>>
>>>> Thanks,
>>>> Vijay
>>>>
>>>> [1] https://public.pad.fsfe.org/p/gluster-spurious-failures
Re: [Gluster-devel] Regression test failures - Call for Action
> Jeff's patch failed again with same problem:
> http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console

Wouldn't have expected anything different. This one looks like a
problem in the Jenkins/Gerrit infrastructure.
Re: [Gluster-devel] Regression test failures - Call for Action
> Also, one of us should go through the last however-many failures and
> determine the relative frequency of failures caused by each test, so
> we can prioritize.

I started doing this, and very quickly found a runaway winner -
data-self-heal.t, which also happens to be the very first test we run.
Hmmm. The failures seem to have a common signature:

    Running all the regression test cases (new way)
    mkdir: cannot create directory `/mnt/glusterfs/2': File exists
    rm: cannot remove `/mnt/glusterfs/2': Is a directory
    mkdir: cannot create directory `/mnt/glusterfs/2': File exists
    rm: cannot remove `/mnt/glusterfs/2': Is a directory
    mkdir: cannot create directory `/mnt/glusterfs/2': File exists
    [18:38:06] ./tests/basic/afr/data-self-heal.t .. Dubious, test returned 1 (wstat 256, 0x100)

That mkdir is the last thing in cleanup(). Because that's the last
thing each test script calls, that failure turns into a bad exit code
for the entire test. The problem is that cleanup() never unmounts that
directory, like it does for the others we use. There are only two tests
that use it, but if either of them should ever fail to unmount the
directory themselves then their failure will become rather persistent -
often across the next several runs. I'll be looking into why this
condition isn't *completely* permanent, as well as why those tests
aren't doing the unmount. Meanwhile, I've implemented a general
workaround.

http://review.gluster.org/#/c/10536/

With that, I think we'll see enough of a reduction in spurious failures
that further drastic action might be unnecessary.
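The mechanism behind the "Dubious, test returned 1" line can be reproduced in isolation: a shell script's exit status is that of its last command, so a failing mkdir at the very end masks an otherwise successful run. The sketch below is illustrative only. The helper `run_script` and the one-line scripts are made up, a temporary directory stands in for the leftover /mnt/glusterfs/2, and it does not reproduce the real cleanup() or the workaround in the review linked above. It assumes a POSIX `sh` and `mkdir` are available.

```python
import subprocess
import tempfile


def run_script(script, *args):
    """Run a POSIX shell snippet and return its exit status (hypothetical helper)."""
    completed = subprocess.run(
        ["sh", "-c", script, "sh", *args],
        stderr=subprocess.DEVNULL,  # hide the "File exists" message
    )
    return completed.returncode


# Stand-in for a directory that an earlier test left behind.
leftover = tempfile.mkdtemp()

# The "test" succeeds (true), but mkdir on the existing directory runs last,
# so its failure becomes the exit status of the whole script.
status_last_fails = run_script('true; mkdir "$1"', leftover)

# Same commands, but the failing mkdir is no longer the last one.
status_last_ok = run_script('mkdir "$1"; true', leftover)

print(status_last_fails, status_last_ok)
```

The first status is nonzero even though nothing in the test body failed, which matches the pattern reported for data-self-heal.t.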
Re: [Gluster-devel] Regression test failures - Call for Action
> There has been a spate of regression test failures (due to broken
> tests or race conditions showing up) in the recent past [1] and I am
> inclined to block 3.7.0 GA along with acceptance of patches until we
> fix *all* regression test failures. We seem to have reached a point
> where this seems to be the only way to restore sanity to our
> regression runs.
>
> I plan to put this into effect 24 hours from now i.e. around 0700 UTC
> on 05/05.
>
> Thoughts?

As a complement to this, I suggest that we stop the Jenkins queue and
make the slaves available to people debugging specific failures. We'll
probably need some way - e.g. an Etherpad somewhere - to coordinate
access so we don't step all over each other.

Also, one of us should go through the last however-many failures and
determine the relative frequency of failures caused by each test, so we
can prioritize. Any other volunteers before I spend hours doing it
myself?
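The per-test frequency count does not have to be done by hand. As a rough sketch, assuming the console logs of the failed runs have been saved as text and that failing tests show up as "<test> .. Dubious, test returned N" lines as in the data-self-heal.t log elsewhere in this thread, something like the following would rank tests by how often they fail. The function name `tally_failures` and the second and third sample lines are made up for illustration; ./tests/basic/example.t is not a real test.

```python
import re
from collections import Counter

# Matches lines such as:
# [18:38:06] ./tests/basic/afr/data-self-heal.t .. Dubious, test returned 1 (wstat 256, 0x100)
DUBIOUS = re.compile(r"(\./tests/\S+\.t)\s+\.+\s+Dubious, test returned \d+")


def tally_failures(console_logs):
    """Count how many times each test script is reported as failed (hypothetical helper)."""
    counts = Counter()
    for log in console_logs:
        for match in DUBIOUS.finditer(log):
            counts[match.group(1)] += 1
    return counts


# The first log line is from this thread; the other two are invented samples.
SAMPLE_LOGS = [
    "[18:38:06] ./tests/basic/afr/data-self-heal.t .. Dubious, test returned 1 (wstat 256, 0x100)\n",
    "[10:00:00] ./tests/basic/afr/data-self-heal.t .. Dubious, test returned 1 (wstat 256, 0x100)\n"
    "[10:05:00] ./tests/basic/example.t .. Dubious, test returned 1 (wstat 256, 0x100)\n",
]

for test, count in tally_failures(SAMPLE_LOGS).most_common():
    print(count, test)
```

Sorting by count puts the most frequent offender first, which is the prioritization the message asks for. Runs that died from infrastructure problems would need to be filtered out separately, since they never reach a "Dubious" line.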