Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-12 Thread Raghavendra Gowdappa
On Sun, Aug 12, 2018 at 9:11 AM, Raghavendra Gowdappa 
wrote:

>
>
> On Sat, Aug 11, 2018 at 10:33 PM, Shyam Ranganathan 
> wrote:
>
>> On 08/09/2018 10:58 PM, Raghavendra Gowdappa wrote:
>> >
>> >
>> > On Fri, Aug 10, 2018 at 1:38 AM, Shyam Ranganathan wrote:
>> >
>> > On 08/08/2018 09:04 PM, Shyam Ranganathan wrote:
>> > > Today's patch set 7 [1] included fixes provided till last evening IST,
>> > > and its runs can be seen here [2] (yay! we can link to comments in
>> > > gerrit now).
>> > >
>> > > New failures: (added to the spreadsheet)
>> > > ./tests/bugs/quick-read/bug-846240.t
>> >
>> > The above test always fails if a sleep of 10 is added at line 36.
>> >
>> > I tried to replicate this in my setup, and was able to do so 3/150 times
>> > and the failures were the same as the ones reported in the build logs
>> > (as below).
>> >
>> > Not finding any clear reason for the failure, I delayed the test (i.e.,
>> > added a sleep 10) after the open on M0 to see if the race is uncovered,
>> > and it was.
>> >
>> > Du, request you to take a look at the same, as the test is around
>> > quick-read but involves open-behind as well.
>> >
>> >
>> > Thanks for that information. I'll be working on this today.
>>
>> Heads up Du, failed again with the same pattern in run
>> https://build.gluster.org/job/regression-on-demand-full-run/46/consoleFull
>
>
> Sorry Shyam.
>
> I found the cause [1], but I am still deciding between a fix and removing
> the test, given the recent changes to open-behind from [1]. You'll have an
> answer by EOD today.
>

Fix submitted at https://review.gluster.org/#/c/glusterfs/+/20710/


> [1] https://review.gluster.org/20428
>
>
>>
>> >
>> >
>> > Failure snippet:
>> > 
>> > 23:41:24 [23:41:28] Running tests in file
>> > ./tests/bugs/quick-read/bug-846240.t
>> > 23:41:28 ./tests/bugs/quick-read/bug-846240.t ..
>> > 23:41:28 1..17
>> > 23:41:28 ok 1, LINENUM:9
>> > 23:41:28 ok 2, LINENUM:10
>> > 
>> > 23:41:28 ok 13, LINENUM:40
>> > 23:41:28 not ok 14 , LINENUM:50
>> > 23:41:28 FAILED COMMAND: [ 0 -ne 0 ]
>> >
>> > Shyam
>> >
>> >
>>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-11 Thread Raghavendra Gowdappa
On Sat, Aug 11, 2018 at 10:33 PM, Shyam Ranganathan 
wrote:

> On 08/09/2018 10:58 PM, Raghavendra Gowdappa wrote:
> >
> >
> > On Fri, Aug 10, 2018 at 1:38 AM, Shyam Ranganathan wrote:
> >
> > On 08/08/2018 09:04 PM, Shyam Ranganathan wrote:
> > > Today's patch set 7 [1] included fixes provided till last evening IST,
> > > and its runs can be seen here [2] (yay! we can link to comments in
> > > gerrit now).
> > >
> > > New failures: (added to the spreadsheet)
> > > ./tests/bugs/quick-read/bug-846240.t
> >
> > The above test always fails if a sleep of 10 is added at line 36.
> >
> > I tried to replicate this in my setup, and was able to do so 3/150 times
> > and the failures were the same as the ones reported in the build logs
> > (as below).
> >
> > Not finding any clear reason for the failure, I delayed the test (i.e.,
> > added a sleep 10) after the open on M0 to see if the race is uncovered,
> > and it was.
> >
> > Du, request you to take a look at the same, as the test is around
> > quick-read but involves open-behind as well.
> >
> >
> > Thanks for that information. I'll be working on this today.
>
> Heads up Du, failed again with the same pattern in run
> https://build.gluster.org/job/regression-on-demand-full-run/46/consoleFull


Sorry Shyam.

I found the cause [1], but I am still deciding between a fix and removing
the test, given the recent changes to open-behind from [1]. You'll have an
answer by EOD today.

[1] https://review.gluster.org/20428


>
> >
> >
> > Failure snippet:
> > 
> > 23:41:24 [23:41:28] Running tests in file
> > ./tests/bugs/quick-read/bug-846240.t
> > 23:41:28 ./tests/bugs/quick-read/bug-846240.t ..
> > 23:41:28 1..17
> > 23:41:28 ok 1, LINENUM:9
> > 23:41:28 ok 2, LINENUM:10
> > 
> > 23:41:28 ok 13, LINENUM:40
> > 23:41:28 not ok 14 , LINENUM:50
> > 23:41:28 FAILED COMMAND: [ 0 -ne 0 ]
> >
> > Shyam
> >
> >
>

Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-11 Thread Shyam Ranganathan
On 08/09/2018 10:58 PM, Raghavendra Gowdappa wrote:
> 
> 
> On Fri, Aug 10, 2018 at 1:38 AM, Shyam Ranganathan wrote:
> 
> On 08/08/2018 09:04 PM, Shyam Ranganathan wrote:
> > Today's patch set 7 [1] included fixes provided till last evening IST,
> > and its runs can be seen here [2] (yay! we can link to comments in
> > gerrit now).
> > 
> > New failures: (added to the spreadsheet)
> > ./tests/bugs/quick-read/bug-846240.t
> 
> The above test always fails if a sleep of 10 is added at line 36.
> 
> I tried to replicate this in my setup, and was able to do so 3/150 times
> and the failures were the same as the ones reported in the build logs
> (as below).
> 
> Not finding any clear reason for the failure, I delayed the test (i.e.,
> added a sleep 10) after the open on M0 to see if the race is uncovered,
> and it was.
> 
> Du, request you to take a look at the same, as the test is around
> quick-read but involves open-behind as well.
> 
> 
> Thanks for that information. I'll be working on this today.

Heads up Du, failed again with the same pattern in run
https://build.gluster.org/job/regression-on-demand-full-run/46/consoleFull

> 
> 
> Failure snippet:
> 
> 23:41:24 [23:41:28] Running tests in file
> ./tests/bugs/quick-read/bug-846240.t
> 23:41:28 ./tests/bugs/quick-read/bug-846240.t ..
> 23:41:28 1..17
> 23:41:28 ok 1, LINENUM:9
> 23:41:28 ok 2, LINENUM:10
> 
> 23:41:28 ok 13, LINENUM:40
> 23:41:28 not ok 14 , LINENUM:50
> 23:41:28 FAILED COMMAND: [ 0 -ne 0 ]
> 
> Shyam
> 
> 


Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-09 Thread Raghavendra Gowdappa
On Fri, Aug 10, 2018 at 1:38 AM, Shyam Ranganathan 
wrote:

> On 08/08/2018 09:04 PM, Shyam Ranganathan wrote:
> > Today's patch set 7 [1] included fixes provided till last evening IST,
> > and its runs can be seen here [2] (yay! we can link to comments in
> > gerrit now).
> >
> > New failures: (added to the spreadsheet)
> > ./tests/bugs/quick-read/bug-846240.t
>
> The above test always fails if a sleep of 10 is added at line 36.
>
> I tried to replicate this in my setup, and was able to do so 3/150 times
> and the failures were the same as the ones reported in the build logs
> (as below).
>
> Not finding any clear reason for the failure, I delayed the test (i.e.,
> added a sleep 10) after the open on M0 to see if the race is uncovered,
> and it was.
>
> Du, request you to take a look at the same, as the test is around
> quick-read but involves open-behind as well.
>

Thanks for that information. I'll be working on this today.


> Failure snippet:
> 
> 23:41:24 [23:41:28] Running tests in file
> ./tests/bugs/quick-read/bug-846240.t
> 23:41:28 ./tests/bugs/quick-read/bug-846240.t ..
> 23:41:28 1..17
> 23:41:28 ok 1, LINENUM:9
> 23:41:28 ok 2, LINENUM:10
> 
> 23:41:28 ok 13, LINENUM:40
> 23:41:28 not ok 14 , LINENUM:50
> 23:41:28 FAILED COMMAND: [ 0 -ne 0 ]
>
> Shyam
>

Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-09 Thread Shyam Ranganathan
On 08/08/2018 09:04 PM, Shyam Ranganathan wrote:
> Today's patch set 7 [1] included fixes provided till last evening IST,
> and its runs can be seen here [2] (yay! we can link to comments in
> gerrit now).
> 
> New failures: (added to the spreadsheet)
> ./tests/bugs/quick-read/bug-846240.t

The above test always fails if a sleep of 10 is added at line 36.

I tried to replicate this in my setup, and was able to do so 3/150 times
and the failures were the same as the ones reported in the build logs
(as below).

Not finding any clear reason for the failure, I delayed the test (i.e.,
added a sleep 10) after the open on M0 to see if the race is uncovered,
and it was.
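
For anyone who wants to repeat the 3/150 style of measurement, here is a
minimal sketch of a repeat-and-count loop. It is not the script that was
used for the numbers above; the helper name count_failures is hypothetical,
and the example command in the comment assumes the test is run with
`prove` from the root of the source tree.

```python
import subprocess


def count_failures(cmd, runs):
    """Run `cmd` `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        # Discard the output; only the exit status matters for the count.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Example call (assumption: `prove` is installed and the current directory
# is the root of the source tree):
#   failed = count_failures(
#       ["prove", "-vf", "./tests/bugs/quick-read/bug-846240.t"], 150)
#   print(f"{failed}/150 runs failed")
```

A low failure count from such a loop only says the race is rare, which is
why forcing the window open with a sleep was the more useful experiment.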

Du, request you to take a look at the same, as the test is around
quick-read but involves open-behind as well.

Failure snippet:

23:41:24 [23:41:28] Running tests in file
./tests/bugs/quick-read/bug-846240.t
23:41:28 ./tests/bugs/quick-read/bug-846240.t ..
23:41:28 1..17
23:41:28 ok 1, LINENUM:9
23:41:28 ok 2, LINENUM:10

23:41:28 ok 13, LINENUM:40
23:41:28 not ok 14 , LINENUM:50
23:41:28 FAILED COMMAND: [ 0 -ne 0 ]

Shyam


Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-08 Thread Shyam Ranganathan
Today's patch set 7 [1] included fixes provided till last evening IST,
and its runs can be seen here [2] (yay! we can link to comments in
gerrit now).

New failures: (added to the spreadsheet)
./tests/bugs/protocol/bug-808400-repl.t (core dumped)
./tests/bugs/quick-read/bug-846240.t

Older tests that had not recurred, but failed today: (moved up in the
spreadsheet)
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
./tests/bugs/index/bug-1559004-EMLINK-handling.t

Other issues:
Test ./tests/basic/ec/ec-5-2.t core dumped again
A few geo-rep failures; Kotresh should have more logs to look at with
these runs
Test ./tests/bugs/glusterd/quorum-validation.t dumped core again

Atin/Amar, we may need to merge some of the patches that have proven to
be holding up and fixing issues today, so that we do not leave
everything to the last minute. Check and move them along, or let me know.

Shyam

[1] Patch set 7: https://review.gluster.org/c/glusterfs/+/20637/7
[2] Runs against patch set 7 and its status (incomplete as some runs
have not completed):
https://review.gluster.org/c/glusterfs/+/20637/7#message-37bc68ce6f2157f2947da6fd03b361ab1b0d1a77
(also updated in the spreadsheet)

On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures"; use a search to find a failing test and the corresponding run
> in which it failed.
> 
> 3) Patches that are fixing issues can be found here [1]. If you think
> you have a patch out there that is not in this list, shout out.
> 
> 4) If you take ownership of a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing, some without any RCA or
> attention. (If something is incorrect, write back.)
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are no longer occurring:
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and I have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that may
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you see this
>> mail and let the list know regarding the same as well.
>>
>> Thanks,
>> Shyam
>>
>> [1] Patches that