Re: [Gluster-devel] [Gluster-infra] NetBSD regression fixes

2016-01-20 Thread Rajesh Joseph


- Original Message -
> From: "Emmanuel Dreyfus" 
> To: "Niels de Vos" 
> Cc: gluster-in...@gluster.org, gluster-devel@gluster.org
> Sent: Sunday, January 17, 2016 10:23:16 AM
> Subject: Re: [Gluster-devel] [Gluster-infra] NetBSD regression fixes
> 
> Niels de Vos  wrote:
> 
> > > 2) Spurious failures
> > > I added a retry-failed-test-once feature so that we get fewer
> > > regression failures caused by spurious test failures. It is not used
> > > right now because it does not play nicely with the bad tests blacklist.
> > > 
> > > These changes will fix that:
> > > http://review.gluster.org/13245
> > > http://review.gluster.org/13247
> > > 
> > > I have been looping failure-free regression for a while with that trick.
> > 
> > Nice, thanks for these improvements!
> 
> But I just realized the change is wrong, since running tests the "new way"
> stops on the first failed test. My change just retries the failed test and
> considers the regression run good on success, without running the
> remaining tests.
> 
> I will post an update shortly.
> 

I think we should not take this approach. If the tests are unreliable, there
is no guarantee that they will pass on the next retry, and we should not rely
on luck here. Let's not run tests that are spurious in nature; we don't
consider their results anyway. Therefore I think we should consider the patch
sent by Talur (http://review.gluster.org/13173).

> > Could you send a pull request for the regression.sh script on
> > https://github.com/gluster/glusterfs-patch-acceptance-tests/ ? Or, if
> > you don't use GitHub, send the patch by email and we'll take care of
> > pushing it for you.
> 
> Sure, but let me settle on something that works first.
> 
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 


Re: [Gluster-devel] [Gluster-infra] NetBSD regression fixes

2016-01-16 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> But I just realized the change is wrong, since running tests the "new way"
> stops on the first failed test. My change just retries the failed test and
> considers the regression run good on success, without running the
> remaining tests.
> 
> I will post an update shortly.

Done:
http://review.gluster.org/13245
http://review.gluster.org/13247
-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-devel] [Gluster-infra] NetBSD regression fixes

2016-01-16 Thread Emmanuel Dreyfus
Niels de Vos  wrote:

> > 2) Spurious failures
> > I added a retry-failed-test-once feature so that we get fewer
> > regression failures caused by spurious test failures. It is not used
> > right now because it does not play nicely with the bad tests blacklist.
> > 
> > These changes will fix that:
> > http://review.gluster.org/13245
> > http://review.gluster.org/13247
> > 
> > I have been looping failure-free regression for a while with that trick.
> 
> Nice, thanks for these improvements!

But I just realized the change is wrong, since running tests the "new way"
stops on the first failed test. My change just retries the failed test and
considers the regression run good on success, without running the
remaining tests.

I will post an update shortly.
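
A rough sketch of the behavior described above (retry a failed test once,
but keep running the remaining tests either way). The function name
`run_with_retry` and its arguments are hypothetical, and this is not the
code from the reviews linked in this thread:

```shell
#!/bin/sh
# Hypothetical sketch: run every test, retry each failure once, and
# record failures instead of stopping at the first one.

# run_with_retry RUNNER TEST...
# RUNNER is any command or shell function that takes one test path and
# returns 0 on success.
run_with_retry() {
    runner=$1
    shift
    failed=""
    for t in "$@"; do
        if "$runner" "$t"; then
            continue
        fi
        echo "retrying $t once"
        if ! "$runner" "$t"; then
            # Failed twice: remember it, but go on to the next test.
            failed="$failed $t"
        fi
    done
    if [ -n "$failed" ]; then
        echo "failed after retry:$failed"
        return 1
    fi
    return 0
}
```

The run is reported as good only if every test passed on its first or second
attempt, so a successful retry no longer ends the run early.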

> Could you send a pull request for the regression.sh script on
> https://github.com/gluster/glusterfs-patch-acceptance-tests/ ? Or, if
> you don't use GitHub, send the patch by email and we'll take care of
> pushing it for you.

Sure, but let me settle on something that works first.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-devel] [Gluster-infra] NetBSD regression fixes

2016-01-16 Thread Niels de Vos
On Sat, Jan 16, 2016 at 06:55:49PM +0100, Emmanuel Dreyfus wrote:
> Hello all
> 
> Here are the problems identified in NetBSD regression so far:
> 
> 1) Before starting regression, the slave complains about "vnconfig:
> VNDIOCGET: Bad file descriptor" and fails the run.
> 
> These changes will fix that:
> http://review.gluster.org/13204
> http://review.gluster.org/13205
> 
> 
> 2) Spurious failures
> I added a retry-failed-test-once feature so that we get fewer
> regression failures caused by spurious test failures. It is not used
> right now because it does not play nicely with the bad tests blacklist.
> 
> These changes will fix that:
> http://review.gluster.org/13245
> http://review.gluster.org/13247
> 
> I have been looping failure-free regression for a while with that trick.

Nice, thanks for these improvements!

> 3) Stale state from previous regression
> We sometimes have processes stuck from a previous regression run,
> awaiting vnode locks for destroyed NFS filesystems. This causes the
> cleanup scripts to hang before regression starts, and we get a timeout.
> 
> I modified the slave's /opt/qa/regression.sh to check for stuck
> processes and reboot the system if we find them. That will fail the
> current regression run, but at least the next ones coming after the
> reboot will be safe.
> 
> This fix is not deployed yet; I am waiting for the fixes from point 2
> to be merged.

Could you send a pull request for the regression.sh script on
https://github.com/gluster/glusterfs-patch-acceptance-tests/ ? Or, if
you don't use GitHub, send the patch by email and we'll take care of
pushing it for you.
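
For reference, the stuck-process check from point 3 might look roughly like
the sketch below. It is only an illustration: the function names are
hypothetical, and which wait channel actually marks a process stuck on a
stale NFS vnode lock is an assumption that would have to be checked on the
slave itself.

```shell
#!/bin/sh
# Hypothetical sketch: find leftover processes sleeping on a given wait
# channel. The channel name and what to do with the result are assumptions,
# not the contents of the real /opt/qa/regression.sh.

# Reads "PID WCHAN COMMAND" lines on stdin and prints the PIDs whose
# WCHAN field equals $1.
filter_stuck() {
    awk -v chan="$1" '$2 == chan { print $1 }'
}

# list_stuck WCHAN: apply the same filter to the live process table.
list_stuck() {
    ps -ax -o pid= -o wchan= -o comm= | filter_stuck "$1"
}
```

A caller could treat any output from `list_stuck` as a reason to fail the
current run and reboot before accepting the next job.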

> 4) Jenkins starts concurrent runs on the same slave
> We observed that Jenkins sometimes runs two jobs on the same slave at
> once, which of course can only lead to horrible failure.
> 
> I modified the slave's /opt/qa/regression.sh to add a lock file so that
> this situation is detected early and reported. The second regression
> will fail, but the idea is to get a better understanding of how that
> can occur.
> 
> This fix is not deployed yet; I am waiting for the fixes from point 2
> to be merged.

Hmm, I have not seen that before, but it surely is something to be
concerned about :-/
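
A lock-file guard of the kind described in point 4 could be sketched as
follows. The function names, the lock path passed in, and the message are
assumptions for illustration, not the actual change to regression.sh:

```shell
#!/bin/sh
# Hypothetical sketch of a lock-file guard against concurrent runs.

# acquire_lock PATH: create PATH atomically, or report who holds it.
acquire_lock() {
    lockfile=$1
    # With noclobber (set -C), ">" fails if the file already exists, so
    # the existence check and the creation happen in a single step.
    if ( set -C; echo "$$" > "$lockfile" ) 2>/dev/null; then
        return 0
    fi
    echo "another run holds $lockfile (pid $(cat "$lockfile" 2>/dev/null))" >&2
    return 1
}

# release_lock PATH: remove the lock file.
release_lock() {
    rm -f "$1"
}
```

A script would typically call `acquire_lock` first, exit with an error if it
fails, and register `release_lock` in a `trap ... EXIT` so the lock is
removed when the run ends. Recording the PID in the file helps when
investigating how two jobs ended up on the same slave.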

Thanks,
Niels

