Re: [Gluster-devel] bug-1432542-mpx-restart-crash.t failures

2018-08-02 Thread Shyam Ranganathan
On 08/01/2018 11:10 PM, Nigel Babu wrote:
> Hi Shyam,
> 
> Amar and I sat down to debug this failure[1] this morning. There was a
> bit of fun looking at the logs. It looked like the test restarted
> itself. The first log entry is at 16:20:03. This test has a timeout of
> 400 seconds which is around 16:26:43.
> 
> However, if you account for the fact that we log from the second step or
> so, it looks like the test timed out and we restarted it. Since the first
> log entry is from a few steps in, this makes sense. I think your patch[2]
> to increase the timeout to 800 seconds is the right way forward.
> 
> The last step before the timeout is this:
> [2018-07-30 16:26:29.160943]  : volume stop patchy-vol17 : SUCCESS
> [2018-07-30 16:26:40.222688]  : volume delete patchy-vol17 : SUCCESS
> 
> There are 20 volumes, so it really needs at least a 90 second bump. I'm
> estimating 30 seconds per volume to clean up. You probably want to add some
> extra time so it passes on lcov as well. So right now the 800 second
> clean up looks good.

Unfortunately, the timeout bump still does not clear lcov; see:
https://build.gluster.org/job/line-coverage/401/console
https://build.gluster.org/job/line-coverage/400/console
https://build.gluster.org/job/line-coverage/406/console

The first test passes, then as a part of the full run it fails again.

Patch also pushes up the EXPECT_WITHIN to 120 seconds... :(

> 
> [1]: https://build.gluster.org/job/regression-test-burn-in/4051/
> [2]: https://review.gluster.org/#/c/20568/2
> -- 
> nigelb
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] bug-1432542-mpx-restart-crash.t failures

2018-08-01 Thread Nigel Babu
Hi Shyam,

Amar and I sat down to debug this failure[1] this morning. There was a bit
of fun looking at the logs. It looked like the test restarted itself. The
first log entry is at 16:20:03. This test has a timeout of 400 seconds
which is around 16:26:43.

However, if you account for the fact that we log from the second step or
so, it looks like the test timed out and we restarted it. Since the first
log entry is from a few steps in, this makes sense. I think your patch[2]
to increase the timeout to 800 seconds is the right way forward.

The last step before the timeout is this:
[2018-07-30 16:26:29.160943]  : volume stop patchy-vol17 : SUCCESS
[2018-07-30 16:26:40.222688]  : volume delete patchy-vol17 : SUCCESS

There are 20 volumes, so it really needs at least a 90 second bump. I'm
estimating 30 seconds per volume to clean up. You probably want to add some
extra time so it passes on lcov as well. So right now the 800 second clean
up looks good.
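
As a rough check of the arithmetic above, here is a minimal Python sketch.
The timestamps, the 400 second timeout, the 20 volumes and the 30 second
per-volume estimate come from this mail. The assumption that the volumes are
numbered 1 to 20 and cleaned up in ascending order (so that three remain after
patchy-vol17) is mine, as are all the variable names. The deadline is only
approximate, since the first log entry is a few steps into the test.

```python
from datetime import datetime, timedelta

# Values quoted in the mail.
first_log = datetime(2018, 7, 30, 16, 20, 3)
timeout = timedelta(seconds=400)
last_delete = datetime(2018, 7, 30, 16, 26, 40, 222688)  # patchy-vol17 deleted
total_volumes = 20
per_volume_cleanup = timedelta(seconds=30)  # estimate from the mail

# Assumption: volumes are cleaned in order 1..20, so vol17 is the 17th.
last_cleaned = 17

# Approximate deadline, measured from the first log entry.
deadline = first_log + timeout

# Time left between the last logged delete and the deadline.
slack = deadline - last_delete

# Volumes still to clean up, and the extra time they would need.
remaining = total_volumes - last_cleaned
extra_needed = remaining * per_volume_cleanup

print("deadline:", deadline.time())
print("slack at last delete:", slack.total_seconds(), "seconds")
print("volumes remaining:", remaining)
print("extra time needed:", extra_needed.total_seconds(), "seconds")
```

Under those assumptions the test had under three seconds left with three
volumes still to go, which is where the 90 second figure comes from.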

[1]: https://build.gluster.org/job/regression-test-burn-in/4051/
[2]: https://review.gluster.org/#/c/20568/2
-- 
nigelb