Re: [Gluster-devel] [Gluster-infra] Progress report for regression tests in Rackspace

2014-05-15 Thread Vijay Bellur

On 05/15/2014 09:08 PM, Luis Pabon wrote:

Should we create bugs for each of these, and divide-and-conquer?


That could be of help. The first level of consolidation that Justin has 
done (with the frequency of test failures) might be a good list to start 
with. If we observe more failures as part of ongoing regression runs, let 
us open new bugs for them and have them cleaned up.


-Vijay



- Luis

On 05/15/2014 10:27 AM, Niels de Vos wrote:

On Thu, May 15, 2014 at 06:05:00PM +0530, Vijay Bellur wrote:

On 04/30/2014 07:03 PM, Justin Clift wrote:

Hi us,

Was trying out the GlusterFS regression tests in Rackspace VMs last
night for each of the release-3.4, release-3.5, and master branches.

The regression test is just a run of "run-tests.sh", from a git
checkout of the appropriate branch.
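
(Roughly, on a throwaway VM and as root, something like the following; the
build-and-install step is an assumption here, on the basis that
run-tests.sh exercises the installed binaries rather than the source tree:
  # build/install before running is assumed; run as root
  $ git checkout release-3.5
  $ ./autogen.sh && ./configure && make && make install
  $ ./run-tests.sh
)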

The good news is we're adding a lot of testing code with each release:

  * release-3.4 -  6303 lines  (~30 mins to run test)
  * release-3.5 -  9776 lines  (~85 mins to run test)
  * master  - 11660 lines  (~90 mins to run test)

(lines counted using:
  $ find tests -type f -iname "*.t" -exec cat {} >> a \;; wc -l a; rm -f a)
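
(An equivalent count without the temporary file, if preferred:
  $ find tests -type f -iname "*.t" -exec cat {} + | wc -l
)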

The bad news is the tests only "kind of" pass now.  I say kind of because
although the regression run *can* pass for each of these branches, it's
inconsistent. :(

Results from testing overnight:

  * release-3.4 - 20 runs - 17 PASS, 3 FAIL. 85% success.
* bug-857330/normal.t failed in one run
* bug-887098-gmount-crash.t failed in one run
* bug-857330/normal.t failed in one run

  * release-3.5 - 20 runs, 18 PASS, 2 FAIL. 90% success.
* bug-857330/xml.t failed in one run
* bug-1004744.t failed in another run (same vm for both failures)

  * master - 20 runs, 6 PASS, 14 FAIL. 30% success.
* bug-1070734.t failed in one run
* bug-1087198.t & bug-860663.t failed in one run (same vm as bug-1070734.t failure above)
* bug-1087198.t & bug-857330/normal.t failed in one run (new vm, a subsequent run on same vm passed)
* bug-1087198.t & bug-948686.t failed in one run (new vm)
* bug-1070734.t & bug-1087198.t failed in one run (new vm)
* bug-860663.t failed in one run
* bug-1023974.t & bug-1087198.t & bug-948686.t failed in one run (new vm)
* bug-1004744.t & bug-1023974.t & bug-1087198.t & bug-948686.t failed in one run (new vm)
* bug-948686.t failed in one run (new vm)
* bug-1070734.t failed in one run (new vm)
* bug-1023974.t failed in one run (new vm)
* bug-1087198.t & bug-948686.t failed in one run (new vm)
* bug-1070734.t failed in one run (new vm)
* bug-1087198.t failed in one run (new vm)

The occasional failing tests aren't completely random, suggesting
something is going on.  Possible race conditions maybe? (no idea).

  * 8 failures - bug-1087198.t
  * 5 failures - bug-948686.t
  * 4 failures - bug-1070734.t
  * 3 failures - bug-1023974.t
  * 3 failures - bug-857330/normal.t
  * 2 failures - bug-860663.t
  * 2 failures - bug-1004744.t
  * 1 failure  - bug-857330/xml.t
  * 1 failure  - bug-887098-gmount-crash.t

Anyone have suggestions on how to make this work reliably?
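
(One way to chase the intermittent ones down is to loop a single suspect
test, as root on an otherwise idle VM, until it trips. A sketch, assuming
the individual .t files can be run on their own via prove, as run-tests.sh
does, and that the test in question lives under tests/bugs/:
  # test path below is an assumption
  $ for i in $(seq 1 20); do prove -v tests/bugs/bug-1087198.t || echo "run $i FAILED"; done
)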



I think it would be a good idea to arrive at a list of test cases that
are failing at random and assign owners to address them (default owner
being the submitter of the test case). In addition to these, I have
also seen tests like bd.t and xml.t fail pretty regularly.

Justin - can we publish a consolidated list of regression tests that
fail and owners for them on an etherpad or similar?

Fixing these test cases will enable us to bring in more jenkins
instances for parallel regression runs etc. and will also provide more
determinism for our regression tests. Your help to address the
regression test suite problems will be greatly appreciated!

Indeed, getting the regression tests stable seems like a blocker before
we can move to a scalable Jenkins solution. Unfortunately, it may not be
trivial to debug these test cases... Any suggestion on capturing useful
data that helps in figuring out why the test cases don't pass?

Thanks,
Niels


Re: [Gluster-devel] [Gluster-infra] Progress report for regression tests in Rackspace

2014-05-15 Thread Vijay Bellur

On 05/15/2014 07:57 PM, Niels de Vos wrote:

Indeed, getting the regression tests stable seems like a blocker before
we can move to a scalable Jenkins solution. Unfortunately, it may not be
trivial to debug these test cases... Any suggestion on capturing useful
data that helps in figuring out why the test cases don't pass?



To start with, obtaining the logs and cores from a failed regression run 
(/d/logs/...) on build.gluster.org would be useful. Once we start 
debugging a few problems and notice the need for more information, we can 
start collecting it for failed regression runs as well.
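
A minimal sketch of such a collection step on a regression slave (the
archive directory, naming, and core location here are all assumptions):

  # archive directory, run id format, and /core* location are assumptions
  $ RUN_ID=$(date +%Y%m%d-%H%M%S)
  $ mkdir -p /d/archived-runs
  $ tar czf /d/archived-runs/regression-$RUN_ID.tar.gz /d/logs /core* 2>/dev/null || true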


-Vijay


Re: [Gluster-devel] [Gluster-infra] Progress report for regression tests in Rackspace

2014-05-15 Thread Luis Pabon

Should we create bugs for each of these, and divide-and-conquer?

- Luis



Re: [Gluster-devel] [Gluster-infra] Progress report for regression tests in Rackspace

2014-05-15 Thread Niels de Vos
On Thu, May 15, 2014 at 06:05:00PM +0530, Vijay Bellur wrote:
> Fixing these test cases will enable us to bring in more jenkins
> instances for parallel regression runs etc. and will also provide more
> determinism for our regression tests. Your help to address the
> regression test suite problems will be greatly appreciated!

Indeed, getting the regression tests stable seems like a blocker before 
we can move to a scalable Jenkins solution. Unfortunately, it may not be 
trivial to debug these test cases... Any suggestion on capturing useful 
data that helps in figuring out why the test cases don't pass?

Thanks,
Niels


Re: [Gluster-devel] [Gluster-infra] Progress report for regression tests in Rackspace

2014-05-15 Thread Vijay Bellur

On 04/30/2014 07:03 PM, Justin Clift wrote:

Anyone have suggestions on how to make this work reliably?




I think it would be a good idea to arrive at a list of test cases that
are failing at random and assign owners to address them (default owner
being the submitter of the test case). In addition to these, I have also 
seen tests like bd.t and xml.t fail pretty regularly.


Justin - can we publish a consolidated list of regression tests that
fail and owners for them on an etherpad or similar?

Fixing these test cases will enable us to bring in more jenkins
instances for parallel regression runs etc. and will also provide more
determinism for our regression tests. Your help to address the
regression test suite problems will be greatly appreciated!

Thanks,
Vijay



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel