Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down status (Wed, August 08th)

2018-08-10 Thread Kotresh Hiremath Ravishankar
Hi Shyam/Atin,

I have posted a patch [1] for the following geo-rep test case failures:
tests/00-geo-rep/georep-basic-dr-rsync.t
tests/00-geo-rep/georep-basic-dr-tarssh.t
tests/00-geo-rep/00-georep-verify-setup.t

Please include patch [1] when triggering tests.
The instrumentation patch [2], which was included, can now be removed.

[1]  https://review.gluster.org/#/c/glusterfs/+/20704/
[2]  https://review.gluster.org/#/c/glusterfs/+/20477/

Thanks,
Kotresh HR




On Fri, Aug 10, 2018 at 3:21 PM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:

>
>
> On Thu, Aug 9, 2018 at 4:02 PM Pranith Kumar Karampuri <pkara...@redhat.com> wrote:
>
>>
>>
>> On Thu, Aug 9, 2018 at 6:34 AM Shyam Ranganathan wrote:
>>
>>> Today's patch set 7 [1] included fixes provided till last evening IST,
>>> and its runs can be seen here [2] (yay! we can link to comments in
>>> Gerrit now).
>>>
>>> New failures (added to the spreadsheet):
>>> ./tests/bugs/protocol/bug-808400-repl.t (core dumped)
>>> ./tests/bugs/quick-read/bug-846240.t
>>>
>>> Older tests that had not recurred but failed today (moved up in the
>>> spreadsheet):
>>> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>>> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>>>
>>
>> The above test is timing out. I had to increase the timeout when adding
>> the .t so that it could create the maximum number of links that ext4
>> allows. Will re-check whether it is the same issue and get back.
>>
>
> This test is timing out with lcov. I bumped up the timeout to 30 minutes
> in https://review.gluster.org/#/c/glusterfs/+/20699. I am not happy that
> this test takes so long, but without it, it is difficult to catch
> regressions on ext4, which has limits on the number of hardlinks in a
> directory (the last time we introduced such a regression, it took us
> almost a year to find it). If there is a way to run this .t once per day
> and before each release, I will be happy to make it part of that. Let me
> know.
>
>
>>
>>
>>>
>>> Other issues:
>>> Test ./tests/basic/ec/ec-5-2.t core dumped again
>>> A few geo-rep failures; Kotresh should have more logs to look at with
>>> these runs
>>> Test ./tests/bugs/glusterd/quorum-validation.t dumped core again
>>>
>>> Atin/Amar, we may need to merge some of the patches that have proven to
>>> hold up and fix issues today, so that we do not leave
>>> everything to the last minute. Check and move them along or lmk.
>>>
>>> Shyam
>>>
>>> [1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
>>> [2] Runs against patch set 7 and its status (incomplete as some runs
>>> have not completed):
>>> https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
>>> (also updated in the spreadsheet)
>>>
>>> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
>>> > This deserves a new beginning; the threads on the other mail have gone
>>> > deep enough.
>>> >
>>> > NOTE: (5) below needs your attention; the rest is just process and data
>>> > on how to find failures.
>>> >
>>> > 1) We are running the tests using the patch [2].
>>> >
>>> > 2) Run details are extracted into a separate sheet in [3] named "Run
>>> > Failures"; use a search to find a failing test and the corresponding run
>>> > that it failed in.
>>> >
>>> > 3) Patches that are fixing issues can be found here [1]; if you think
>>> > you have a patch out there that is not in this list, shout out.
>>> >
>>> > 4) If you take ownership of a test case failure, update the spreadsheet
>>> > [3] with your name against the test, and also update other details as
>>> > needed (as comments, since edit rights to the sheet are restricted).
>>> >
>>> > 5) Current test failures
>>> > The following tests are still failing, some without any RCA or
>>> > attention (if something is incorrect, write back):
>>> >
>>> > ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
>>> > attention)
>>> > ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
>>> > ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>>> > (Atin)
>>> > ./tests/bugs/ec/bug-1236065.t (Ashish)
>>> > ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
>>> > ./tests/basic/ec/ec-1468261.t (needs attention)
>>> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
>>> > ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
>>> > ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
>>> > ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
>>> > ./tests/bugs/replicate/bug-1363721.t (Ravi)
>>> >
>>> > Here are some newer failures; they are mostly one-off, except for the
>>> > cores in ec-5-2.t. All of the following need attention, as these are new.
>>> >
>>> > ./tests/00-geo-rep/00-georep-verify-setup.t
>>> > ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
>>> > ./tests/basic/stats-dump.t
>>> > ./tests/bugs/bug-1110262.t
>>> > ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-
>>> 

2018-08-10 Thread Pranith Kumar Karampuri
On Thu, Aug 9, 2018 at 4:02 PM Pranith Kumar Karampuri wrote:

>
>
> On Thu, Aug 9, 2018 at 6:34 AM Shyam Ranganathan wrote:
>
>> Today's patch set 7 [1] included fixes provided till last evening IST,
>> and its runs can be seen here [2] (yay! we can link to comments in
>> Gerrit now).
>>
>> New failures (added to the spreadsheet):
>> ./tests/bugs/protocol/bug-808400-repl.t (core dumped)
>> ./tests/bugs/quick-read/bug-846240.t
>>
>> Older tests that had not recurred but failed today (moved up in the
>> spreadsheet):
>> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>>
>
> The above test is timing out. I had to increase the timeout when adding
> the .t so that it could create the maximum number of links that ext4
> allows. Will re-check whether it is the same issue and get back.
>

This test is timing out with lcov. I bumped up the timeout to 30 minutes in
https://review.gluster.org/#/c/glusterfs/+/20699. I am not happy that this
test takes so long, but without it, it is difficult to catch regressions on
ext4, which has limits on the number of hardlinks in a directory (the last
time we introduced such a regression, it took us almost a year to find it).
If there is a way to run this .t once per day and before each release, I
will be happy to make it part of that. Let me know.
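One way to act on the "once per day and before each release" idea is to tag slow tests and filter on the kind of run. The following is a minimal Python sketch under stated assumptions: `Trigger`, `SLOW_TESTS` and `should_run` are hypothetical names, not part of the Gluster test harness or its CI jobs, and how the run type would reach the harness is left open.

```python
import os
from enum import Enum


class Trigger(Enum):
    """Hypothetical kinds of regression runs."""
    PER_PATCH = "per-patch"
    NIGHTLY = "nightly"
    RELEASE = "release"


# Hypothetical list of tests considered too slow for every patch run.
SLOW_TESTS = frozenset({
    "tests/bugs/index/bug-1559004-EMLINK-handling.t",
})


def should_run(test_path: str, trigger: Trigger) -> bool:
    """Return True if `test_path` should run for the given trigger.

    Slow tests are skipped on per-patch runs and kept for nightly and
    pre-release runs; every other test runs on every trigger.
    """
    # Normalise "./tests/..." and "tests/..." to the same key.
    key = os.path.normpath(test_path)
    if key in SLOW_TESTS:
        return trigger in (Trigger.NIGHTLY, Trigger.RELEASE)
    return True
```

Keeping the slow list as explicit data rather than a per-test flag makes it easy to review in one place which tests are excluded from per-patch regressions.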


>
>
>>
>> Other issues:
>> Test ./tests/basic/ec/ec-5-2.t core dumped again
>> A few geo-rep failures; Kotresh should have more logs to look at with
>> these runs
>> Test ./tests/bugs/glusterd/quorum-validation.t dumped core again
>>
>> Atin/Amar, we may need to merge some of the patches that have proven to
>> hold up and fix issues today, so that we do not leave
>> everything to the last minute. Check and move them along or lmk.
>>
>> Shyam
>>
>> [1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
>> [2] Runs against patch set 7 and its status (incomplete as some runs
>> have not completed):
>>
>> https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
>> (also updated in the spreadsheet)
>>
>> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
>> > This deserves a new beginning; the threads on the other mail have gone
>> > deep enough.
>> >
>> > NOTE: (5) below needs your attention; the rest is just process and data
>> > on how to find failures.
>> >
>> > 1) We are running the tests using the patch [2].
>> >
>> > 2) Run details are extracted into a separate sheet in [3] named "Run
>> > Failures"; use a search to find a failing test and the corresponding run
>> > that it failed in.
>> >
>> > 3) Patches that are fixing issues can be found here [1]; if you think
>> > you have a patch out there that is not in this list, shout out.
>> >
>> > 4) If you take ownership of a test case failure, update the spreadsheet
>> > [3] with your name against the test, and also update other details as
>> > needed (as comments, since edit rights to the sheet are restricted).
>> >
>> > 5) Current test failures
>> > The following tests are still failing, some without any RCA or
>> > attention (if something is incorrect, write back):
>> >
>> > ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
>> > attention)
>> > ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
>> > ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
>> > (Atin)
>> > ./tests/bugs/ec/bug-1236065.t (Ashish)
>> > ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
>> > ./tests/basic/ec/ec-1468261.t (needs attention)
>> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
>> > ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
>> > ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
>> > ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
>> > ./tests/bugs/replicate/bug-1363721.t (Ravi)
>> >
>> > Here are some newer failures; they are mostly one-off, except for the
>> > cores in ec-5-2.t. All of the following need attention, as these are new.
>> >
>> > ./tests/00-geo-rep/00-georep-verify-setup.t
>> > ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
>> > ./tests/basic/stats-dump.t
>> > ./tests/bugs/bug-1110262.t
>> > ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
>> > ./tests/basic/ec/ec-data-heal.t
>> > ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>> > ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
>> > ./tests/basic/ec/ec-5-2.t
>> >
>> > 6) Tests that have been addressed or are no longer occurring:
>> >
>> > ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
>> > ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>> > ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
>> > ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>> > ./tests/bitrot/bug-1373520.t
>> > ./tests/bugs/distribute/bug-1117851.t
>> > 

2018-08-09 Thread Pranith Kumar Karampuri
On Thu, Aug 9, 2018 at 6:34 AM Shyam Ranganathan wrote:

> Today's patch set 7 [1] included fixes provided till last evening IST,
> and its runs can be seen here [2] (yay! we can link to comments in
> Gerrit now).
>
> New failures (added to the spreadsheet):
> ./tests/bugs/protocol/bug-808400-repl.t (core dumped)
> ./tests/bugs/quick-read/bug-846240.t
>
> Older tests that had not recurred but failed today (moved up in the
> spreadsheet):
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>

The above test is timing out. I had to increase the timeout when adding
the .t so that it could create the maximum number of links that ext4
allows. Will re-check whether it is the same issue and get back.
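The limit this test has to reach can be illustrated outside Gluster. Below is a minimal Python sketch, assuming a Linux host, that hardlinks one file repeatedly until `link(2)` fails with `EMLINK` or a cap is reached. On ext4 the link count of a single inode is capped at 65000 (`EXT4_LINK_MAX`), so reaching the limit takes tens of thousands of `link()` calls, which hints at why a test that drives a brick to that limit is slow, and slower still under lcov instrumentation. The helper name `count_links_until_emlink` is hypothetical, and this is not how the .t itself is written.

```python
import errno
import os
import tempfile
from typing import Tuple


def count_links_until_emlink(directory: str, cap: int) -> Tuple[int, bool]:
    """Hardlink one base file repeatedly inside `directory`.

    Stops when link(2) fails with EMLINK (the filesystem's per-inode
    link limit) or after `cap` links. Returns (links_created, hit_emlink).
    """
    base = os.path.join(directory, "base")
    with open(base, "w"):
        pass  # an empty file is enough; only its link count matters
    created = 0
    while created < cap:
        try:
            os.link(base, os.path.join(directory, f"link-{created}"))
        except OSError as exc:
            if exc.errno == errno.EMLINK:
                return created, True
            raise  # anything else (ENOSPC, EPERM, ...) is a real error
        created += 1
    return created, False


if __name__ == "__main__":
    # Small demo in a throwaway directory; raise `cap` above 65000 and
    # point it at an ext4 mount to actually hit EMLINK there.
    with tempfile.TemporaryDirectory() as tmp:
        n, hit = count_links_until_emlink(tmp, cap=100)
        print(f"created {n} links, EMLINK hit: {hit}")
```

Pointed at a fresh directory on an ext4 mount with a cap above 65000, it should stop with `hit_emlink` set; on a filesystem with a higher link limit it simply stops at the cap.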


>
> Other issues:
> Test ./tests/basic/ec/ec-5-2.t core dumped again
> A few geo-rep failures; Kotresh should have more logs to look at with
> these runs
> Test ./tests/bugs/glusterd/quorum-validation.t dumped core again
>
> Atin/Amar, we may need to merge some of the patches that have proven to
> hold up and fix issues today, so that we do not leave
> everything to the last minute. Check and move them along or lmk.
>
> Shyam
>
> [1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
> [2] Runs against patch set 7 and its status (incomplete as some runs
> have not completed):
>
> https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
> (also updated in the spreadsheet)
>
> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> > This deserves a new beginning; the threads on the other mail have gone
> > deep enough.
> >
> > NOTE: (5) below needs your attention; the rest is just process and data
> > on how to find failures.
> >
> > 1) We are running the tests using the patch [2].
> >
> > 2) Run details are extracted into a separate sheet in [3] named "Run
> > Failures"; use a search to find a failing test and the corresponding run
> > that it failed in.
> >
> > 3) Patches that are fixing issues can be found here [1]; if you think
> > you have a patch out there that is not in this list, shout out.
> >
> > 4) If you take ownership of a test case failure, update the spreadsheet
> > [3] with your name against the test, and also update other details as
> > needed (as comments, since edit rights to the sheet are restricted).
> >
> > 5) Current test failures
> > The following tests are still failing, some without any RCA or
> > attention (if something is incorrect, write back):
> >
> > ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> > attention)
> > ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> > ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> > (Atin)
> > ./tests/bugs/ec/bug-1236065.t (Ashish)
> > ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> > ./tests/basic/ec/ec-1468261.t (needs attention)
> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> > ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> > ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> > ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> > ./tests/bugs/replicate/bug-1363721.t (Ravi)
> >
> > Here are some newer failures; they are mostly one-off, except for the
> > cores in ec-5-2.t. All of the following need attention, as these are new.
> >
> > ./tests/00-geo-rep/00-georep-verify-setup.t
> > ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> > ./tests/basic/stats-dump.t
> > ./tests/bugs/bug-1110262.t
> > ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> > ./tests/basic/ec/ec-data-heal.t
> > ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> > ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> > ./tests/basic/ec/ec-5-2.t
> >
> > 6) Tests that have been addressed or are no longer occurring:
> >
> > ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> > ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> > ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> > ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> > ./tests/bitrot/bug-1373520.t
> > ./tests/bugs/distribute/bug-1117851.t
> > ./tests/bugs/glusterd/quorum-validation.t
> > ./tests/bugs/distribute/bug-1042725.t
> > ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> > ./tests/bugs/quota/bug-1293601.t
> > ./tests/bugs/bug-1368312.t
> > ./tests/bugs/distribute/bug-1122443.t
> > ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> >
> > Shyam (and Atin)
> >
> > On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
> >> Health on master as of the last nightly run [4] is still the same.
> >>
> >> Potential patches that rectify the situation (as in [1]) are bundled in
> >> a patch [2] that Atin and I have put through several regressions
> >> (mux, normal and line coverage), and these have also not passed.
> >>
> >> Till we rectify the 

2018-08-08 Thread Atin Mukherjee
On Thu, 9 Aug 2018 at 06:34, Shyam Ranganathan wrote:

> Today's patch set 7 [1] included fixes provided till last evening IST,
> and its runs can be seen here [2] (yay! we can link to comments in
> Gerrit now).
>
> New failures (added to the spreadsheet):
> ./tests/bugs/protocol/bug-808400-repl.t (core dumped)
> ./tests/bugs/quick-read/bug-846240.t
>
> Older tests that had not recurred but failed today (moved up in the
> spreadsheet):
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
>
> Other issues:
> Test ./tests/basic/ec/ec-5-2.t core dumped again
> A few geo-rep failures; Kotresh should have more logs to look at with
> these runs
> Test ./tests/bugs/glusterd/quorum-validation.t dumped core again


>
> Atin/Amar, we may need to merge some of the patches that have proven to
> hold up and fix issues today, so that we do not leave
> everything to the last minute. Check and move them along or lmk.


Ack. I’ll be merging those patches.


>
> Shyam
>
> [1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
> [2] Runs against patch set 7 and its status (incomplete as some runs
> have not completed):
>
> https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
> (also updated in the spreadsheet)
>
> On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> > This deserves a new beginning; the threads on the other mail have gone
> > deep enough.
> >
> > NOTE: (5) below needs your attention; the rest is just process and data
> > on how to find failures.
> >
> > 1) We are running the tests using the patch [2].
> >
> > 2) Run details are extracted into a separate sheet in [3] named "Run
> > Failures"; use a search to find a failing test and the corresponding run
> > that it failed in.
> >
> > 3) Patches that are fixing issues can be found here [1]; if you think
> > you have a patch out there that is not in this list, shout out.
> >
> > 4) If you take ownership of a test case failure, update the spreadsheet
> > [3] with your name against the test, and also update other details as
> > needed (as comments, since edit rights to the sheet are restricted).
> >
> > 5) Current test failures
> > The following tests are still failing, some without any RCA or
> > attention (if something is incorrect, write back):
> >
> > ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> > attention)
> > ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> > ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> > (Atin)
> > ./tests/bugs/ec/bug-1236065.t (Ashish)
> > ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> > ./tests/basic/ec/ec-1468261.t (needs attention)
> > ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> > ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> > ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> > ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> > ./tests/bugs/replicate/bug-1363721.t (Ravi)
> >
> > Here are some newer failures; they are mostly one-off, except for the
> > cores in ec-5-2.t. All of the following need attention, as these are new.
> >
> > ./tests/00-geo-rep/00-georep-verify-setup.t
> > ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> > ./tests/basic/stats-dump.t
> > ./tests/bugs/bug-1110262.t
> > ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> > ./tests/basic/ec/ec-data-heal.t
> > ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> > ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> > ./tests/basic/ec/ec-5-2.t
> >
> > 6) Tests that have been addressed or are no longer occurring:
> >
> > ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> > ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> > ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> > ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> > ./tests/bitrot/bug-1373520.t
> > ./tests/bugs/distribute/bug-1117851.t
> > ./tests/bugs/glusterd/quorum-validation.t
> > ./tests/bugs/distribute/bug-1042725.t
> > ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> > ./tests/bugs/quota/bug-1293601.t
> > ./tests/bugs/bug-1368312.t
> > ./tests/bugs/distribute/bug-1122443.t
> > ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> >
> > Shyam (and Atin)
> >
> > On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
> >> Health on master as of the last nightly run [4] is still the same.
> >>
> >> Potential patches that rectify the situation (as in [1]) are bundled in
> >> a patch [2] that Atin and I have put through several regressions
> >> (mux, normal and line coverage), and these have also not passed.
> >>
> >> Till we rectify the situation, we are locking down master branch commit
> >> rights to the following people: Amar, Atin, Shyam, Vijay.
> >>
> >> The intention is to stabilize master and not