Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-05 Thread Vijay Bellur

On 05/05/2015 08:13 AM, Pranith Kumar Karampuri wrote:


On 05/05/2015 08:10 AM, Jeff Darcy wrote:

Jeff's patch failed again with the same problem:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console


Wouldn't have expected anything different.  This one looks like a
problem in the Jenkins/Gerrit infrastructure.

Sorry for the miscommunication; I was referring to the same infra problem.



The situation seems much better now. Thanks, everyone, for your prompt 
actions!


We are still a little distance away from ensuring that our regression 
runs are clean. Let us keep responding promptly to regression 
failures to help prevent a lockdown of master for all patches.


Regards,
Vijay
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-05 Thread Justin Clift
On 5 May 2015, at 03:40, Jeff Darcy jda...@redhat.com wrote:
Jeff's patch failed again with the same problem:
 http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console
 
 Wouldn't have expected anything different.  This one looks like a
 problem in the Jenkins/Gerrit infrastructure.

This kind of error message at the end of a failure log indicates that
the VM has disconnected itself from Jenkins and needs rebooting.
I haven't found any other way to fix it. :/

Happens with both CentOS and NetBSD regression runs.

[...]
FATAL: Unable to delete script file /var/tmp/hudson8377790745169807524.sh
hudson.util.IOException2: remote file operation failed: /var/tmp/hudson8377790745169807524.sh at hudson.remoting.Channel@2bae0315:nbslave72.cloud.gluster.org
    at hudson.FilePath.act(FilePath.java:900)
    at hudson.FilePath.act(FilePath.java:877)
    at hudson.FilePath.delete(FilePath.java:1262)
    at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
    at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
[...]
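
As a rough aid (a sketch only, not something we currently run): the
disconnected slaves could probably be spotted from the command line,
assuming the standard Jenkins JSON API is enabled on build.gluster.org
and is readable anonymously (both of those are assumptions on my part):

  # Sketch: list Jenkins slaves that currently report themselves offline.
  # -g stops curl from treating the [] in the URL as a glob pattern.
  curl -sg 'http://build.gluster.org/computer/api/json?tree=computer[displayName,offline]' \
    | tr ',' '\n' \
    | grep -B1 '"offline":true' \
    | grep -o '"displayName":"[^"]*"'

Anything listed there would still need the manual reboot described above.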

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-04 Thread Justin Clift
On 4 May 2015, at 08:06, Vijay Bellur vbel...@redhat.com wrote:
 Hi All,
 
 There has been a spate of regression test failures (due to broken tests or 
 race conditions showing up) in the recent past [1], and I am inclined to block 
 3.7.0 GA, along with acceptance of patches, until we fix *all* regression test 
 failures. We seem to have reached a point where this is the only way 
 to restore sanity to our regression runs.
 
 I plan to put this into effect 24 hours from now, i.e. around 0700 UTC on 
 05/05. Thoughts?

Please do this. :)

+ Justin


 Thanks,
 Vijay
 
 [1] https://public.pad.fsfe.org/p/gluster-spurious-failures
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-04 Thread Pranith Kumar Karampuri


On 05/05/2015 12:58 AM, Justin Clift wrote:

On 4 May 2015, at 08:06, Vijay Bellur vbel...@redhat.com wrote:

Hi All,

There has been a spate of regression test failures (due to broken tests or race 
conditions showing up) in the recent past [1], and I am inclined to block 3.7.0 
GA, along with acceptance of patches, until we fix *all* regression test 
failures. We seem to have reached a point where this is the only way 
to restore sanity to our regression runs.

I plan to put this into effect 24 hours from now, i.e. around 0700 UTC on 05/05. 
Thoughts?

Please do this. :)
What happened to the NetBSD setup connection? A lot of them are failing with: 
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4528/console


Pranith


+ Justin



Thanks,
Vijay

[1] https://public.pad.fsfe.org/p/gluster-spurious-failures
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-04 Thread Pranith Kumar Karampuri


On 05/05/2015 06:12 AM, Pranith Kumar Karampuri wrote:


On 05/05/2015 12:58 AM, Justin Clift wrote:

On 4 May 2015, at 08:06, Vijay Bellur vbel...@redhat.com wrote:

Hi All,

There has been a spate of regression test failures (due to broken 
tests or race conditions showing up) in the recent past [1], and I am 
inclined to block 3.7.0 GA, along with acceptance of patches, until we 
fix *all* regression test failures. We seem to have reached a point 
where this is the only way to restore sanity to our 
regression runs.


I plan to put this into effect 24 hours from now, i.e. around 0700 
UTC on 05/05. Thoughts?

Please do this. :)
What happened to the NetBSD setup connection? A lot of them are failing 
with: 
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4528/console
Jeff's patch failed again with the same problem: 
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console


Pranith


Pranith


+ Justin



Thanks,
Vijay

[1] https://public.pad.fsfe.org/p/gluster-spurious-failures
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-04 Thread Pranith Kumar Karampuri


On 05/05/2015 08:10 AM, Jeff Darcy wrote:

Jeff's patch failed again with the same problem:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console

Wouldn't have expected anything different.  This one looks like a
problem in the Jenkins/Gerrit infrastructure.

Sorry for the miscommunication; I was referring to the same infra problem.

Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-04 Thread Pranith Kumar Karampuri
Just saw two more failures in the same place for NetBSD regressions. I 
am ignoring the NetBSD status for the test fixes for now. I am not sure 
how this needs to be fixed. Please help!


Pranith
On 05/05/2015 07:17 AM, Pranith Kumar Karampuri wrote:


On 05/05/2015 06:12 AM, Pranith Kumar Karampuri wrote:


On 05/05/2015 12:58 AM, Justin Clift wrote:

On 4 May 2015, at 08:06, Vijay Bellur vbel...@redhat.com wrote:

Hi All,

There has been a spate of regression test failures (due to broken 
tests or race conditions showing up) in the recent past [1], and I 
am inclined to block 3.7.0 GA, along with acceptance of patches, 
until we fix *all* regression test failures. We seem to have 
reached a point where this is the only way to restore 
sanity to our regression runs.


I plan to put this into effect 24 hours from now, i.e. around 0700 
UTC on 05/05. Thoughts?

Please do this. :)
What happened to the NetBSD setup connection? A lot of them are failing 
with: 
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4528/console
Jeff's patch failed again with the same problem: 
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console


Pranith


Pranith


+ Justin



Thanks,
Vijay

[1] https://public.pad.fsfe.org/p/gluster-spurious-failures
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-04 Thread Jeff Darcy
 Jeff's patch failed again with the same problem:
 http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console

Wouldn't have expected anything different.  This one looks like a
problem in the Jenkins/Gerrit infrastructure.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-04 Thread Jeff Darcy
 Also, one of us should
 go through the last however-many failures and determine the relative
 frequency of failures caused by each test, so we can prioritize.

I started doing this, and very quickly found a runaway winner -
data-self-heal.t, which also happens to be the very first test we
run.  Hmmm.  The failures seem to have a common signature:

  Running all the regression test cases (new way)
  mkdir: cannot create directory `/mnt/glusterfs/2': File exists
  rm: cannot remove `/mnt/glusterfs/2': Is a directory
  mkdir: cannot create directory `/mnt/glusterfs/2': File exists
  rm: cannot remove `/mnt/glusterfs/2': Is a directory
  mkdir: cannot create directory `/mnt/glusterfs/2': File exists
  [18:38:06] ./tests/basic/afr/data-self-heal.t .. 
  Dubious, test returned 1 (wstat 256, 0x100)

That mkdir is the last thing in cleanup().  Because that's the
last thing each test script calls, that failure turns into a bad
exit code for the entire test.  The problem is that cleanup()
never unmounts that directory, like it does for the others we
use.  There are only two tests that use it, but if either of
them ever fails to unmount the directory itself, then
its failure will become rather persistent - often across the
next several runs.  I'll be looking into why this condition
isn't *completely* permanent, as well as why those tests aren't
doing the unmount.  Meanwhile, I've implemented a general
workaround.

  http://review.gluster.org/#/c/10536/

With that, I think we'll see enough of a reduction in spurious
failures that further drastic action might be unnecessary.
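
For illustration only, here is roughly the shape such a workaround
could take, assuming cleanup() lives in the shared test harness and
that the mount point is the literal /mnt/glusterfs/2 from the log
above (the function name below is hypothetical, and this is not
necessarily what the review linked above does):

  # Sketch: force-unmount a mount point left behind by a previous run
  # before recreating it, so a stale mount cannot poison later tests.
  cleanup_mount_dir () {
      local mnt="$1"
      if mount | grep -q " ${mnt} "; then
          # Best effort; unmount flags differ between Linux and NetBSD.
          umount -f "${mnt}" 2>/dev/null || true
      fi
      rm -rf "${mnt}"
      mkdir -p "${mnt}"
  }

  cleanup_mount_dir /mnt/glusterfs/2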
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-04 Thread Jeff Darcy
 There has been a spate of regression test failures (due to broken tests
 or race conditions showing up) in the recent past [1], and I am inclined
 to block 3.7.0 GA, along with acceptance of patches, until we fix *all*
 regression test failures. We seem to have reached a point where this
 is the only way to restore sanity to our regression runs.
 
 I plan to put this into effect 24 hours from now, i.e. around 0700 UTC on
 05/05. Thoughts?

As a complement to this, I suggest that we stop the Jenkins queue and
make the slaves available to people debugging specific failures.  We'll
probably need some way - e.g. an Etherpad somewhere - to coordinate
access so we don't step all over each other.  Also, one of us should
go through the last however-many failures and determine the relative
frequency of failures caused by each test, so we can prioritize.  Any
other volunteers before I spend hours doing it myself?
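
To make the counting part less tedious, something along these lines
would probably do, assuming the console logs of the recent failed runs
have been saved locally first (the logs/ directory and *.txt naming
below are hypothetical):

  # Sketch: count how often each .t test is reported as Dubious across
  # the saved console logs, worst offender first.
  grep -h -B1 'Dubious, test returned' logs/*.txt \
    | grep -o '\./tests/[^ ]*\.t' \
    | sort | uniq -c | sort -rn | head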
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel