Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)
On Tue, Aug 7, 2018, 10:46 PM Shyam Ranganathan wrote:

> On 08/07/2018 02:58 PM, Yaniv Kaul wrote:
> > The intention is to stabilize master and not add more patches that may
> > destabilize it.
> >
> > https://review.gluster.org/#/c/20603/ has been merged.
> > As far as I can see, it has nothing to do with stabilization and should
> > be reverted.
>
> Posted this on the gerrit review as well:
>
> 4.1 does not have nightly tests, those run on master only.

That should change of course. We cannot strive for stability otherwise, AFAIK.

> Stability of master does not (and will not), in the near term, guarantee
> stability of release branches, unless patches that impact code already
> on release branches get fixes on master and are backported.
>
> Release branches get fixes backported (as is normal); this fix and its
> merge should not impact current master stability in any way, nor the
> stability of the 4.1 branch.
>
> The current hold is on master, not on release branches. I agree that
> merging further code changes on release branches (for example geo-rep
> issues that are backported (see [1]), as there are tests that fail
> regularly on master) may further destabilize the release branch. This
> patch is not one of those.

Two issues I have with the merge:
1. It just makes comparing the master branch to the release branch harder. For
   example, to understand if there's a test that fails on master but succeeds
   on the release branch, or vice versa.
2. It means we are not focused on stabilizing the master branch.
Y.

> Merging patches on release branches is allowed by release owners only,
> and the usual practice is keeping the backlog low (merging weekly) in
> these cases, as per the dashboard [1].
>
> Allowing for the above, this patch was found to be,
> - Not on master
> - Neither stabilizing nor destabilizing the release branch
> and hence was merged.
>
> If maintainers disagree I can revert the same.
>
> Shyam
>
> [1] Release 4.1 dashboard:
> https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:4-1-dashboard

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)
On 08/07/2018 02:58 PM, Yaniv Kaul wrote:
> The intention is to stabilize master and not add more patches that may
> destabilize it.
>
> https://review.gluster.org/#/c/20603/ has been merged.
> As far as I can see, it has nothing to do with stabilization and should
> be reverted.

Posted this on the gerrit review as well:

4.1 does not have nightly tests, those run on master only.

Stability of master does not (and will not), in the near term, guarantee
stability of release branches, unless patches that impact code already
on release branches get fixes on master and are backported.

Release branches get fixes backported (as is normal); this fix and its
merge should not impact current master stability in any way, nor the
stability of the 4.1 branch.

The current hold is on master, not on release branches. I agree that
merging further code changes on release branches (for example geo-rep
issues that are backported (see [1]), as there are tests that fail
regularly on master) may further destabilize the release branch. This
patch is not one of those.

Merging patches on release branches is allowed by release owners only,
and the usual practice is keeping the backlog low (merging weekly) in
these cases, as per the dashboard [1].

Allowing for the above, this patch was found to be,
- Not on master
- Neither stabilizing nor destabilizing the release branch
and hence was merged.

If maintainers disagree I can revert the same.

Shyam

[1] Release 4.1 dashboard:
https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:4-1-dashboard
Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)
On Mon, Aug 6, 2018 at 1:24 AM, Shyam Ranganathan wrote:

> On 07/31/2018 07:16 AM, Shyam Ranganathan wrote:
> > On 07/30/2018 03:21 PM, Shyam Ranganathan wrote:
> >> On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
> >>> 1) master branch health checks (weekly, till branching)
> >>>   - Expect every Monday a status update on various test runs
> >> See https://build.gluster.org/job/nightly-master/ for a report on
> >> various nightly and periodic jobs on master.
> > Thinking aloud, we may have to stop merges to master to get these test
> > failures addressed at the earliest and to continue maintaining them
> > GREEN for the health of the branch.
> >
> > I would give the above a week, before we lock down the branch to fix
> > the failures.
> >
> > Let's try and get line-coverage and nightly regression tests addressed
> > this week (leaving mux-regression open), and if addressed not lock the
> > branch down.
>
> Health on master as of the last nightly run [4] is still the same.
>
> Potential patches that rectify the situation (as in [1]) are bunched in
> a patch [2] that Atin and I have put through several regressions
> (mux, normal and line coverage) and these have also not passed.
>
> Till we rectify the situation we are locking down master branch commit
> rights to the following people: Amar, Atin, Shyam, Vijay.
>
> The intention is to stabilize master and not add more patches that may
> destabilize it.

https://review.gluster.org/#/c/20603/ has been merged.
As far as I can see, it has nothing to do with stabilization and should be
reverted.
Y.

> Test cases that are tracked as failures and need action are present here
> [3].
>
> @Nigel, request you to apply the commit rights change as you see this
> mail and let the list know regarding the same as well.
>
> Thanks,
> Shyam
>
> [1] Patches that address regression failures:
> https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
>
> [2] Bunched up patch against which regressions were run:
> https://review.gluster.org/#/c/20637
>
> [3] Failing tests list:
> https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit?usp=sharing
>
> [4] Nightly run dashboard: https://build.gluster.org/job/nightly-master/
Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)
On 07/31/2018 07:16 AM, Shyam Ranganathan wrote:
> On 07/30/2018 03:21 PM, Shyam Ranganathan wrote:
>> On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
>>> 1) master branch health checks (weekly, till branching)
>>>   - Expect every Monday a status update on various test runs
>> See https://build.gluster.org/job/nightly-master/ for a report on
>> various nightly and periodic jobs on master.
> Thinking aloud, we may have to stop merges to master to get these test
> failures addressed at the earliest and to continue maintaining them
> GREEN for the health of the branch.
>
> I would give the above a week, before we lock down the branch to fix
> the failures.
>
> Let's try and get line-coverage and nightly regression tests addressed
> this week (leaving mux-regression open), and if addressed not lock the
> branch down.

Health on master as of the last nightly run [4] is still the same.

Potential patches that rectify the situation (as in [1]) are bunched in
a patch [2] that Atin and I have put through several regressions
(mux, normal and line coverage) and these have also not passed.

Till we rectify the situation we are locking down master branch commit
rights to the following people: Amar, Atin, Shyam, Vijay.

The intention is to stabilize master and not add more patches that may
destabilize it.

Test cases that are tracked as failures and need action are present here
[3].

@Nigel, request you to apply the commit rights change as you see this
mail and let the list know regarding the same as well.

Thanks,
Shyam

[1] Patches that address regression failures:
https://review.gluster.org/#/q/starredby:srangana%2540redhat.com

[2] Bunched up patch against which regressions were run:
https://review.gluster.org/#/c/20637

[3] Failing tests list:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit?usp=sharing

[4] Nightly run dashboard: https://build.gluster.org/job/nightly-master/
Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)
Below is a summary of failures over the last 7 days on the nightly health
check jobs. This is one test per line, sorted in descending order of
occurrence (IOW, the most frequent failure is on top).

The list includes spurious failures as well (IOW, tests that passed on a
retry). This is because if we do not weed out the spurious errors, failures
may persist and make it difficult to gauge the health of the branch.

The numbers at the end of each test line are the Jenkins job numbers where
the test failed. The job numbers run as follows:
- https://build.gluster.org/job/regression-test-burn-in/ ID: 4048 - 4053
- https://build.gluster.org/job/line-coverage/ ID: 392 - 407
- https://build.gluster.org/job/regression-test-with-multiplex/ ID: 811 - 817

So to get to job 4051 (say), use the link
https://build.gluster.org/job/regression-test-burn-in/4051

Atin has called out some folks for attention to some tests. Consider this
a call out to others: if you see a test against your component, help with
root-causing and fixing it is needed.

tests/bugs/core/bug-1432542-mpx-restart-crash.t, 4049, 4051, 4052, 405, 404, 403, 396, 392
tests/00-geo-rep/georep-basic-dr-tarssh.t, 811, 814, 817, 4050, 4053
tests/bugs/bug-1368312.t, 815, 816, 811, 813, 403
tests/bugs/distribute/bug-1122443.t, 4050, 407, 403, 815, 816
tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t, 814, 816, 817, 812, 815
tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t, 4049, 812, 814, 405, 392
tests/bitrot/bug-1373520.t, 811, 816, 817, 813
tests/bugs/ec/bug-1236065.t, 812, 813, 815
tests/00-geo-rep/georep-basic-dr-rsync.t, 813, 4046
tests/basic/ec/ec-1468261.t, 817, 812
tests/bugs/glusterd/quorum-validation.t, 4049, 407
tests/bugs/quota/bug-1293601.t, 811, 812
tests/basic/afr/add-brick-self-heal.t, 407
tests/basic/afr/granular-esh/replace-brick.t, 392
tests/bugs/core/multiplex-limit-issue-151.t, 405
tests/bugs/distribute/bug-1042725.t, 405
tests/bugs/distribute/bug-1117851.t, 405
tests/bugs/glusterd/rebalance-operations-in-single-node.t, 405
tests/bugs/index/bug-1559004-EMLINK-handling.t, 405
tests/bugs/replicate/bug-1386188-sbrain-fav-child.t, 4048
tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t, 813

Thanks,
Shyam

On 07/30/2018 03:21 PM, Shyam Ranganathan wrote:
> On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
>> 1) master branch health checks (weekly, till branching)
>>   - Expect every Monday a status update on various test runs
>
> See https://build.gluster.org/job/nightly-master/ for a report on
> various nightly and periodic jobs on master.
>
> RED:
> 1. Nightly regression (3/6 failed)
> - Tests that reported failure:
> ./tests/00-geo-rep/georep-basic-dr-rsync.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/distribute/bug-1122443.t
>
> - Tests that needed a retry:
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t
> ./tests/bugs/glusterd/quorum-validation.t
>
> 2. Regression with multiplex (cores and test failures)
>
> 3. line-coverage (cores and test failures)
> - Tests that failed:
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t (patch
> https://review.gluster.org/20568 does not fix the timeout entirely, as
> can be seen in this run,
> https://build.gluster.org/job/line-coverage/401/consoleFull )
>
> Calling out to contributors to take a look at various failures, and post
> the same as bugs AND to the lists (so that duplication is avoided) to
> get this to a GREEN status.
>
> GREEN:
> 1. cpp-check
> 2. RPM builds
>
> IGNORE (for now):
> 1. clang scan (@nigel, this job requires clang warnings to be fixed to
> go green, right?)
>
> Shyam
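
The job-number ranges and the "test, job numbers" lines in the summary above can be worked with mechanically. What follows is only a minimal sketch, not a tool used by the project: the ranges and base URL are copied from the mail, while the names job_url, parse_failure_line and sort_by_occurrence are hypothetical helpers introduced here for illustration.

```python
from typing import List, Optional, Tuple

# Inclusive job-number ranges as quoted in the mail above.
JOB_RANGES = {
    "regression-test-burn-in": (4048, 4053),
    "line-coverage": (392, 407),
    "regression-test-with-multiplex": (811, 817),
}
BASE_URL = "https://build.gluster.org/job"


def job_url(job_id: int) -> Optional[str]:
    """Return the Jenkins link for a job number, or None if the number
    falls outside every range listed in the mail."""
    for job_name, (first, last) in JOB_RANGES.items():
        if first <= job_id <= last:
            return f"{BASE_URL}/{job_name}/{job_id}"
    return None


def parse_failure_line(line: str) -> Tuple[str, List[int]]:
    """Split 'path/to/test.t, 811, 814' into (test path, [job numbers])."""
    test, *ids = [field.strip() for field in line.split(",")]
    return test, [int(job_id) for job_id in ids]


def sort_by_occurrence(lines: List[str]) -> List[Tuple[str, List[int]]]:
    """Parse the lines and put the most frequently failing test first."""
    parsed = [parse_failure_line(line) for line in lines if line.strip()]
    return sorted(parsed, key=lambda entry: len(entry[1]), reverse=True)


if __name__ == "__main__":
    test, ids = parse_failure_line("tests/bugs/glusterd/quorum-validation.t, 4049, 407")
    for job_id in ids:
        print(test, job_url(job_id))
```

With these assumptions, job_url(4051) yields the burn-in link given in the mail. A number such as 4046, which appears against georep-basic-dr-rsync.t but lies outside the listed ranges, yields None rather than a guessed link.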
Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)
On 07/30/2018 03:21 PM, Shyam Ranganathan wrote:
> On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
>> 1) master branch health checks (weekly, till branching)
>>   - Expect every Monday a status update on various test runs
>
> See https://build.gluster.org/job/nightly-master/ for a report on
> various nightly and periodic jobs on master.

Thinking aloud, we may have to stop merges to master to get these test
failures addressed at the earliest and to continue maintaining them
GREEN for the health of the branch.

I would give the above a week, before we lock down the branch to fix the
failures.

Let's try and get line-coverage and nightly regression tests addressed
this week (leaving mux-regression open), and if addressed not lock the
branch down.

>
> RED:
> 1. Nightly regression (3/6 failed)
> - Tests that reported failure:
> ./tests/00-geo-rep/georep-basic-dr-rsync.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/distribute/bug-1122443.t
>
> - Tests that needed a retry:
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t
> ./tests/bugs/glusterd/quorum-validation.t
>
> 2. Regression with multiplex (cores and test failures)
>
> 3. line-coverage (cores and test failures)
> - Tests that failed:
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t (patch
> https://review.gluster.org/20568 does not fix the timeout entirely, as
> can be seen in this run,
> https://build.gluster.org/job/line-coverage/401/consoleFull )
>
> Calling out to contributors to take a look at various failures, and post
> the same as bugs AND to the lists (so that duplication is avoided) to
> get this to a GREEN status.
>
> GREEN:
> 1. cpp-check
> 2. RPM builds
>
> IGNORE (for now):
> 1. clang scan (@nigel, this job requires clang warnings to be fixed to
> go green, right?)
>
> Shyam
Re: [Gluster-devel] Release 5: Master branch health report (Week of 30th July)
On 07/24/2018 03:12 PM, Shyam Ranganathan wrote:
> 1) master branch health checks (weekly, till branching)
>   - Expect every Monday a status update on various test runs

See https://build.gluster.org/job/nightly-master/ for a report on
various nightly and periodic jobs on master.

RED:
1. Nightly regression (3/6 failed)
- Tests that reported failure:
./tests/00-geo-rep/georep-basic-dr-rsync.t
./tests/bugs/core/bug-1432542-mpx-restart-crash.t
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
./tests/bugs/distribute/bug-1122443.t

- Tests that needed a retry:
./tests/00-geo-rep/georep-basic-dr-tarssh.t
./tests/bugs/glusterd/quorum-validation.t

2. Regression with multiplex (cores and test failures)

3. line-coverage (cores and test failures)
- Tests that failed:
./tests/bugs/core/bug-1432542-mpx-restart-crash.t (patch
https://review.gluster.org/20568 does not fix the timeout entirely, as
can be seen in this run,
https://build.gluster.org/job/line-coverage/401/consoleFull )

Calling out to contributors to take a look at various failures, and post
the same as bugs AND to the lists (so that duplication is avoided) to
get this to a GREEN status.

GREEN:
1. cpp-check
2. RPM builds

IGNORE (for now):
1. clang scan (@nigel, this job requires clang warnings to be fixed to
go green, right?)

Shyam