Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down for stabilization (unlocking the same)

2018-08-12 Thread Pranith Kumar Karampuri
On Mon, Aug 13, 2018 at 6:05 AM Shyam Ranganathan 
wrote:

> Hi,
>
> So we have had master locked down for a week to ensure we only get fixes
> for failing tests in order to stabilize the code base, partly for
> release-5 branching as well.
>
> As of this weekend, we (Atin and myself) have been looking at the
> pass/fail rates on the tests, and whether we are discovering newer
> failures of more of the same.
>
> Our runs with patch sets 10->11->12 are looking better than where we
> started, and we have a list of tests that we need to still fix.
>
> But there are other issues and fixes that are needed in the code that
> are lagging behind due to the lock down. The plan going forward is as
> follows,
>
> - Unlock master, and ensure that we do not start seeing newer failures
> as we merge other patches in, if so raise them on the lists and as bugs
> and let's work towards ensuring these are addressed. *Maintainers*
> please pay special attention when merging patches.
>
> - Address the current pending set of tests that have been identified as
> failing, over the course of the next 2 weeks. *Contributors* continue
> the focus here, so that we do not have to end up with another drive
> towards the same in 2 weeks.
>
> - At the end of 2 weeks, reassess master and nightly test status, and
> see if we need another drive towards stabilizing master by locking down
> the same and focusing only on test and code stability around the same.
>

When will there be a discussion about coming up with guidelines to prevent
lock-downs in the future?

I think it is better to lock down specific components, by removing commit
access for the respective owners of those components, when a test in a
particular component starts to fail.


>
> Atin and Shyam
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
>


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down: RCA for tests (ec-1468261.t)

2018-08-12 Thread Ashish Pandey
Correction. 

RCA - https://lists.gluster.org/pipermail/gluster-devel/2018-August/055167.html 
Patch - Mohit is working on this patch (server side) which is yet to be merged. 

We can add an extra test to make sure the bricks are connected to shd before
the heal begins. Will send a patch for that. 

--- 
Ashish 

- Original Message -

From: "Ashish Pandey"  
To: "Shyam Ranganathan"  
Cc: "GlusterFS Maintainers" , "Gluster Devel" 
 
Sent: Monday, August 13, 2018 10:54:16 AM 
Subject: Re: [Gluster-devel] Master branch lock down: RCA for tests 
(ec-1468261.t) 


RCA - https://lists.gluster.org/pipermail/gluster-devel/2018-August/055167.html 
Patch - https://review.gluster.org/#/c/glusterfs/+/20657/ should also fix this 
issue. 

Checking if we can add an extra test to make sure the bricks are connected to
shd before the heal begins. Will send a patch for that. 

--- 
Ashish 

- Original Message -

From: "Shyam Ranganathan"  
To: "Gluster Devel" , "GlusterFS Maintainers" 
 
Sent: Monday, August 13, 2018 6:12:59 AM 
Subject: Re: [Gluster-devel] Master branch lock down: RCA for tests 
(testname.t) 

As a means of keeping the focus going and squashing the remaining tests 
that were failing sporadically, request each test/component owner to, 

- respond to this mail changing the subject (testname.t) to the test 
name that they are responding to (adding more than one in case they have 
the same RCA) 
- with the current RCA and status of the same 

List of tests and current owners as per the spreadsheet that we were 
tracking are: 

./tests/basic/distribute/rebal-all-nodes-migrate.t TBD 
./tests/basic/tier/tier-heald.t TBD 
./tests/basic/afr/sparse-file-self-heal.t TBD 
./tests/bugs/shard/bug-1251824.t TBD 
./tests/bugs/shard/configure-lru-limit.t TBD 
./tests/bugs/replicate/bug-1408712.t Ravi 
./tests/basic/afr/replace-brick-self-heal.t TBD 
./tests/00-geo-rep/00-georep-verify-setup.t Kotresh 
./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik 
./tests/basic/stats-dump.t TBD 
./tests/bugs/bug-1110262.t TBD 
./tests/basic/ec/ec-data-heal.t Mohit 
./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t Pranith 
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
 
TBD 
./tests/basic/ec/ec-5-2.t Sunil 
./tests/bugs/shard/bug-shard-discard.t TBD 
./tests/bugs/glusterd/remove-brick-testcases.t TBD 
./tests/bugs/protocol/bug-808400-repl.t TBD 
./tests/bugs/quick-read/bug-846240.t Du 
./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t Mohit 
./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh 
./tests/bugs/ec/bug-1236065.t Pranith 
./tests/00-geo-rep/georep-basic-dr-rsync.t Kotresh 
./tests/basic/ec/ec-1468261.t Ashish 
./tests/basic/afr/add-brick-self-heal.t Ravi 
./tests/basic/afr/granular-esh/replace-brick.t Pranith 
./tests/bugs/core/multiplex-limit-issue-151.t Sanju 
./tests/bugs/glusterd/validating-server-quorum.t Atin 
./tests/bugs/replicate/bug-1363721.t Ravi 
./tests/bugs/index/bug-1559004-EMLINK-handling.t Pranith 
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t Karthik 
./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t 
Atin 
./tests/bugs/glusterd/rebalance-operations-in-single-node.t TBD 
./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t TBD 
./tests/bitrot/bug-1373520.t Kotresh 
./tests/bugs/distribute/bug-1117851.t Shyam/Nigel 
./tests/bugs/glusterd/quorum-validation.t Atin 
./tests/bugs/distribute/bug-1042725.t Shyam 
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t 
Karthik 
./tests/bugs/quota/bug-1293601.t TBD 
./tests/bugs/bug-1368312.t Du 
./tests/bugs/distribute/bug-1122443.t Du 
./tests/bugs/core/bug-1432542-mpx-restart-crash.t 1608568 Nithya/Shyam 

Thanks, 
Shyam 
___ 
Gluster-devel mailing list 
Gluster-devel@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-devel 


___ 
Gluster-devel mailing list 
Gluster-devel@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-devel 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down: RCA for tests (ec-1468261.t)

2018-08-12 Thread Ashish Pandey

RCA - https://lists.gluster.org/pipermail/gluster-devel/2018-August/055167.html 
Patch - https://review.gluster.org/#/c/glusterfs/+/20657/ should also fix this 
issue. 

Checking if we can add an extra test to make sure the bricks are connected to
shd before the heal begins. Will send a patch for that. 

--- 
Ashish 

- Original Message -

From: "Shyam Ranganathan"  
To: "Gluster Devel" , "GlusterFS Maintainers" 
 
Sent: Monday, August 13, 2018 6:12:59 AM 
Subject: Re: [Gluster-devel] Master branch lock down: RCA for tests 
(testname.t) 

As a means of keeping the focus going and squashing the remaining tests 
that were failing sporadically, request each test/component owner to, 

- respond to this mail changing the subject (testname.t) to the test 
name that they are responding to (adding more than one in case they have 
the same RCA) 
- with the current RCA and status of the same 

List of tests and current owners as per the spreadsheet that we were 
tracking are: 

./tests/basic/distribute/rebal-all-nodes-migrate.t TBD 
./tests/basic/tier/tier-heald.t TBD 
./tests/basic/afr/sparse-file-self-heal.t TBD 
./tests/bugs/shard/bug-1251824.t TBD 
./tests/bugs/shard/configure-lru-limit.t TBD 
./tests/bugs/replicate/bug-1408712.t Ravi 
./tests/basic/afr/replace-brick-self-heal.t TBD 
./tests/00-geo-rep/00-georep-verify-setup.t Kotresh 
./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik 
./tests/basic/stats-dump.t TBD 
./tests/bugs/bug-1110262.t TBD 
./tests/basic/ec/ec-data-heal.t Mohit 
./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t Pranith 
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
 
TBD 
./tests/basic/ec/ec-5-2.t Sunil 
./tests/bugs/shard/bug-shard-discard.t TBD 
./tests/bugs/glusterd/remove-brick-testcases.t TBD 
./tests/bugs/protocol/bug-808400-repl.t TBD 
./tests/bugs/quick-read/bug-846240.t Du 
./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t Mohit 
./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh 
./tests/bugs/ec/bug-1236065.t Pranith 
./tests/00-geo-rep/georep-basic-dr-rsync.t Kotresh 
./tests/basic/ec/ec-1468261.t Ashish 
./tests/basic/afr/add-brick-self-heal.t Ravi 
./tests/basic/afr/granular-esh/replace-brick.t Pranith 
./tests/bugs/core/multiplex-limit-issue-151.t Sanju 
./tests/bugs/glusterd/validating-server-quorum.t Atin 
./tests/bugs/replicate/bug-1363721.t Ravi 
./tests/bugs/index/bug-1559004-EMLINK-handling.t Pranith 
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t Karthik 
./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t 
Atin 
./tests/bugs/glusterd/rebalance-operations-in-single-node.t TBD 
./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t TBD 
./tests/bitrot/bug-1373520.t Kotresh 
./tests/bugs/distribute/bug-1117851.t Shyam/Nigel 
./tests/bugs/glusterd/quorum-validation.t Atin 
./tests/bugs/distribute/bug-1042725.t Shyam 
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t 
Karthik 
./tests/bugs/quota/bug-1293601.t TBD 
./tests/bugs/bug-1368312.t Du 
./tests/bugs/distribute/bug-1122443.t Du 
./tests/bugs/core/bug-1432542-mpx-restart-crash.t 1608568 Nithya/Shyam 

Thanks, 
Shyam 
___ 
Gluster-devel mailing list 
Gluster-devel@gluster.org 
https://lists.gluster.org/mailman/listinfo/gluster-devel 

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down: RCA for tests ( ./tests/bugs/quick-read/bug-846240.t)

2018-08-12 Thread Raghavendra Gowdappa
Failure is tracked by bz:
https://bugzilla.redhat.com/show_bug.cgi?id=1615096



Earlier, this test did the following things on M0 and M1, two mounts of the
same volume:
1. create file M0/testfile
2. open an fd on M0/testfile
3. remove the file from M1 (M1/testfile)
4. echo "data" >> M0/testfile

The test expects appending data to M0/testfile to fail. However, the ">>"
redirection operator creates the file if it doesn't exist. So, the only
reason the test succeeded was that the lookup succeeded due to a stale stat
in md-cache. This hypothesis is verified by two experiments:
* Add a sleep of 10 seconds before the append operation. The md-cache entry
  expires, the lookup fails, the file is created afresh, and hence the
  append succeeds on the new file.
* Set the md-cache timeout to 600 seconds, and the test never fails even
  with a sleep of 10 before the append operation, because the stale stat in
  md-cache survives the sleep.

So, the spurious nature of the failure depended on whether the lookup was
done while the stat was still present in md-cache or not.

The actual test should have been to write to the fd opened in step 2
above. I've changed the test accordingly. Note that this patch also
remounts M0 after the initial file creation, as open-behind stops opening
behind once it witnesses a setattr on the inode, and touch involves a
setattr. After the remount, no create operation is done and hence the file
is opened behind.
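
For reference, a rough shell sketch of the scenario described above; the
mount paths and the fd number are placeholders, not the actual contents of
bug-846240.t:

M0=/mnt/glusterfs/0         # first fuse mount (assumed path)
M1=/mnt/glusterfs/1         # second fuse mount of the same volume (assumed path)

touch "$M0/testfile"        # 1. create the file via M0
exec 5<> "$M0/testfile"     # 2. keep an fd (fd 5 here) open on M0/testfile
rm -f "$M1/testfile"        # 3. remove the file via the other mount

# 4a. Fragile check: ">>" re-creates the file when the lookup misses, so the
#     expected failure only shows up while md-cache still serves a stale stat.
echo "data" >> "$M0/testfile"

# 4b. Stricter check, in the spirit of the fix: write to the fd opened in
#     step 2, which does not depend on lookup/md-cache state.
echo "data" >&5
exec 5>&-                   # close the fd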



Fix submitted at: https://review.gluster.org/#/c/glusterfs/+/20710/

regards,
Raghavendra

On Mon, Aug 13, 2018 at 6:12 AM, Shyam Ranganathan 
wrote:

> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
>
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
>
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
>
> ./tests/basic/distribute/rebal-all-nodes-migrate.t  TBD
> ./tests/basic/tier/tier-heald.t TBD
> ./tests/basic/afr/sparse-file-self-heal.t   TBD
> ./tests/bugs/shard/bug-1251824.tTBD
> ./tests/bugs/shard/configure-lru-limit.tTBD
> ./tests/bugs/replicate/bug-1408712.tRavi
> ./tests/basic/afr/replace-brick-self-heal.t TBD
> ./tests/00-geo-rep/00-georep-verify-setup.t Kotresh
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik
> ./tests/basic/stats-dump.t  TBD
> ./tests/bugs/bug-1110262.t  TBD
> ./tests/basic/ec/ec-data-heal.t Mohit
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>  Pranith
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-
> other-processes-accessing-mounted-path.t
> TBD
> ./tests/basic/ec/ec-5-2.t   Sunil
> ./tests/bugs/shard/bug-shard-discard.t  TBD
> ./tests/bugs/glusterd/remove-brick-testcases.t  TBD
> ./tests/bugs/protocol/bug-808400-repl.t TBD
> ./tests/bugs/quick-read/bug-846240.tDu
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t
>  Mohit
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh
> ./tests/bugs/ec/bug-1236065.t   Pranith
> ./tests/00-geo-rep/georep-basic-dr-rsync.t  Kotresh
> ./tests/basic/ec/ec-1468261.t   Ashish
> ./tests/basic/afr/add-brick-self-heal.t Ravi
> ./tests/basic/afr/granular-esh/replace-brick.t  Pranith
> ./tests/bugs/core/multiplex-limit-issue-151.t   Sanju
> ./tests/bugs/glusterd/validating-server-quorum.tAtin
> ./tests/bugs/replicate/bug-1363721.tRavi
> ./tests/bugs/index/bug-1559004-EMLINK-handling.tPranith
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>Karthik
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> Atin
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
>  TBD
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t   TBD
> ./tests/bitrot/bug-1373520.tKotresh
> ./tests/bugs/distribute/bug-1117851.t   Shyam/Nigel
> ./tests/bugs/glusterd/quorum-validation.t   Atin
> ./tests/bugs/distribute/bug-1042725.t   Shyam
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-
> txn-on-quorum-failure.t
> Karthik
> ./tests/bugs/quota/bug-1293601.tTBD
> ./tests/bugs/bug-1368312.t  Du
> ./tests/bugs/distribute/bug-1122443.t   Du
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t   1608568
> Nithya/Shyam
>
> Thanks,
> Shyam
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mai

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down: RCA for tests (bugs/distribute/bug-1122443.t)

2018-08-12 Thread Raghavendra Gowdappa
The initial RCA, pointing out that commit 7131de81f72dda0ef685ed60d0887c6e14289b8c
caused the issue, was done by Nithya. Following was the conversation:



With the latest master, I created a single brick volume and some files
inside it.

[root@rhgs313-6 ~]# umount -f /mnt/fuse1; mount -t glusterfs -s
192.168.122.6:/thunder /mnt/fuse1; ls -l /mnt/fuse1/; echo "Trying
again"; ls -l /mnt/fuse1
umount: /mnt/fuse1: not mounted
total 0
--. 0 root root 0 Jan  1  1970 file-1
--. 0 root root 0 Jan  1  1970 file-2
--. 0 root root 0 Jan  1  1970 file-3
--. 0 root root 0 Jan  1  1970 file-4
--. 0 root root 0 Jan  1  1970 file-5
d-. 0 root root 0 Jan  1  1970 subdir
Trying again
total 3
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-1
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-2
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-3
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-4
-rw-r--r--. 1 root root 33 Aug  3 14:06 file-5
d-. 0 root root  0 Jan  1  1970 subdir
[root@rhgs313-6 ~]#

The conversation can be followed on gluster-devel in the thread with subject
"tests/bugs/distribute/bug-1122443.t - spurious failure". git-bisect pointed
to this patch as the culprit.
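
For completeness, a hedged sketch of how such a bisection is typically
driven; the good revision below is a placeholder, and it assumes the
repository's run-tests.sh wrapper exits non-zero when the named test fails:

git bisect start
git bisect bad HEAD
git bisect good <last-known-good-commit>
git bisect run ./run-tests.sh tests/bugs/distribute/bug-1122443.t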


Commit 7131de81f72dda0ef685ed60d0887c6e14289b8c zeroed out all members of
iatt except ia_gfid and ia_type in certain scenarios (one case that led to
this bug was when a fresh inode - not yet linked - was picked up by
readdirplus). This led fuse_readdirp_cbk to wrongly think it had a valid
stat (due to the valid ia_gfid and ia_type) and hand the kernel zeroed-out
attributes, causing failures. The fix is included in
https://review.gluster.org/20639, which makes sure the kernel is told the
attributes are not valid in this scenario (and no longer zeroes out the
stats even if the inode picked up by readdirplus is not linked yet).

regards,
Raghavendra

On Mon, Aug 13, 2018 at 6:12 AM, Shyam Ranganathan 
wrote:

> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
>
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
>
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
>
> ./tests/basic/distribute/rebal-all-nodes-migrate.t  TBD
> ./tests/basic/tier/tier-heald.t TBD
> ./tests/basic/afr/sparse-file-self-heal.t   TBD
> ./tests/bugs/shard/bug-1251824.tTBD
> ./tests/bugs/shard/configure-lru-limit.tTBD
> ./tests/bugs/replicate/bug-1408712.tRavi
> ./tests/basic/afr/replace-brick-self-heal.t TBD
> ./tests/00-geo-rep/00-georep-verify-setup.t Kotresh
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik
> ./tests/basic/stats-dump.t  TBD
> ./tests/bugs/bug-1110262.t  TBD
> ./tests/basic/ec/ec-data-heal.t Mohit
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>  Pranith
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-
> other-processes-accessing-mounted-path.t
> TBD
> ./tests/basic/ec/ec-5-2.t   Sunil
> ./tests/bugs/shard/bug-shard-discard.t  TBD
> ./tests/bugs/glusterd/remove-brick-testcases.t  TBD
> ./tests/bugs/protocol/bug-808400-repl.t TBD
> ./tests/bugs/quick-read/bug-846240.tDu
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t
>  Mohit
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh
> ./tests/bugs/ec/bug-1236065.t   Pranith
> ./tests/00-geo-rep/georep-basic-dr-rsync.t  Kotresh
> ./tests/basic/ec/ec-1468261.t   Ashish
> ./tests/basic/afr/add-brick-self-heal.t Ravi
> ./tests/basic/afr/granular-esh/replace-brick.t  Pranith
> ./tests/bugs/core/multiplex-limit-issue-151.t   Sanju
> ./tests/bugs/glusterd/validating-server-quorum.tAtin
> ./tests/bugs/replicate/bug-1363721.tRavi
> ./tests/bugs/index/bug-1559004-EMLINK-handling.tPranith
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>Karthik
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> Atin
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
>  TBD
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t   TBD
> ./tests/bitrot/bug-1373520.tKotresh
> ./tests/bugs/distribute/bug-1117851.t   Shyam/Nigel
> ./tests/bugs/glusterd/quorum-validation.t   Atin
> ./tests/bugs/distribute/bug-1042725.t   Shyam
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-
> txn-on-quorum-failure.t
> Karthik
> ./tests/bugs/quota/bug-1293601.tTBD
> ./tests/bugs/bug-1368312.t  Du
> ./tests/bugs/distribute/bug-1122443.t   Du
> ./

Re: [Gluster-devel] gluster fuse comsumes huge memory

2018-08-12 Thread huting3






https://drive.google.com/file/d/1ZlttTzt4E56Qtk9j7b4I9GkZC2W3mJgp/view?usp=sharing

Hi experts:

I have uploaded the statedump file to the Google Drive link above. Could you
please help me find out what is causing the gluster fuse client to consume
huge amounts of memory? Thank you!

huting3
huti...@corp.netease.com

On 08/9/2018 13:46,huting3 wrote: 







I uploaded the dump file as the attachment.

huting3
huti...@corp.netease.com



On 08/9/2018 13:30,huting3 wrote: 






The data set is complicated. There are many big files as well as small
files. There is about 50T of data on the gluster server, so I do not know
exactly how many files are in the dataset. Can the inode cache consume such
huge memory? How can I limit the inode cache?

PS:
$ grep itable glusterdump.109182.dump.1533730324 | grep lru | wc -l
191728

When I dumped the process info, the fuse process had consumed about 30G of
memory.
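
(A minimal inspection sketch for the question above, reusing the statedump
file name from this thread; it only counts and prints the inode-table
entries recorded in the dump, using the key names visible later in this
mail.)

DUMP=glusterdump.109182.dump.1533730324   # statedump file from this thread

# entries on the fuse inode table's lru list (inodes kept around for the
# kernel's cache) versus the active list (inodes with live references)
grep itable "$DUMP" | grep lru    | wc -l
grep itable "$DUMP" | grep active | wc -l

# the summary counters, if the dump carries them
grep -E 'itable\.(lru|active)_size=' "$DUMP"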




   










huting3
huti...@corp.netease.com



On 08/9/2018 13:13,Raghavendra Gowdappa wrote: 


On Thu, Aug 9, 2018 at 10:36 AM, huting3  wrote:







grep count will output nothing, so I grep size; the results are:

$ grep itable glusterdump.109182.dump.1533730324 | grep lru | grep size
xlator.mount.fuse.itable.lru_size=191726

[Raghavendra Gowdappa, inline]: Kernel is holding too many inodes in its
cache. What's the data set like? Do you have too many directories? How many
files do you have?

$ grep itable glusterdump.109182.dump.1533730324 | grep active | grep size
xlator.mount.fuse.itable.active_size=17
huting3
huti...@corp.netease.com



On 08/9/2018 12:36,Raghavendra Gowdappa wrote: 


Can you get the output of following cmds?

# grep itable  | grep lru | grep count
# grep itable  | grep active | grep count

On Thu, Aug 9, 2018 at 9:25 AM, huting3  wrote:







Yes, I got the dump file and found there are many huge num_allocs, like the
following. The memusage of 4 variable types is extremely huge.

[protocol/client.gv0-client-0 - usage-type gf_common_mt_char memusage]
size=47202352
num_allocs=2030212
max_size=47203074
max_num_allocs=2030235
total_allocs=26892201

[protocol/client.gv0-client-0 - usage-type gf_common_mt_memdup memusage]
size=24362448
num_allocs=2030204
max_size=24367560
max_num_allocs=2030226
total_allocs=17830860

[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=2497947552
num_allocs=4578229
max_size=2459135680
max_num_allocs=7123206
total_allocs=41635232

[mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
size=4038730976
num_allocs=1
max_size=4294962264
max_num_allocs=37
total_allocs=150049981
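
(A small helper sketch for digging through such dumps: it assumes only the
memusage layout visible above — a "[... memusage]" header followed by
size=/num_allocs= lines — and ranks the allocation types by their current
size; the dump file name is the one from this thread.)

awk '/^\[.* memusage\]$/ { section = $0; next }
     /^\[/               { section = "" }
     section && /^size=/ { split($0, kv, "="); print kv[2], "-", section }' \
    glusterdump.109182.dump.1533730324 | sort -rn | head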
huting3
huti...@corp.netease.com



On 08/9/2018 11:36,Raghavendra Gowdappa wrote: 


On Thu, Aug 9, 2018 at 8:55 AM, huting3  wrote:







Hi experts:

I meet a problem when I use glusterfs: the fuse client consumes huge memory
when writing a lot of files (> a million) to the gluster volume, eventually
getting killed by the OS OOM killer. The memory the fuse process consumes
can go up to 100G! I wonder if there are memory leaks in the gluster fuse
process, or some other causes.

[Raghavendra Gowdappa, inline]: Can you get a statedump of the fuse process
consuming huge memory?

My gluster version is 3.13.2, the gluster volume info is listed as
following:

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 4a6f96f8-b3fb-4550-bd19-e1a5dffad4d0
Status: Started
Snapshot Count: 0
Number of Bricks: 19 x 3 = 57
Transport-type: tcp
Bricks:
B

Re: [Gluster-devel] [Gluster-Maintainers] Master branch lock down: RCA for tests (bug-1368312.t)

2018-08-12 Thread Raghavendra Gowdappa
Failure of this test is tracked by bz
https://bugzilla.redhat.com/show_bug.cgi?id=1608158.



I was trying to debug regression failures on [1] and observed that
split-brain-resolution.t was failing consistently.

=
TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
---
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
  Failed tests:  24-26, 28-36, 41-45


On probing deeper, I observed a curious fact - on most of the failures the
stat was not served from md-cache, but instead was wound down to afr,
which failed the stat with EIO as the file was in split brain. So, I did
another test:
* disabled md-cache
* mounted glusterfs with attribute-timeout 0 and entry-timeout 0

Now the test always fails. So, I think the test relied on stat requests
being absorbed either by the kernel attribute cache or md-cache. When that
is not happening, stats reach afr and result in failures of cmds like
getfattr etc. Thoughts?
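
(For reference, a hedged sketch of the two knobs used in that experiment;
"patchy" is the volume name from the test output above, the server name and
mount path are placeholders, and the option names are the usual ones — worth
double-checking against the glusterfs version in use.)

# take md-cache out of the picture on the volume under test
gluster volume set patchy performance.stat-prefetch off

# mount with kernel attribute/entry caching disabled, so every stat is wound
# down to afr instead of being absorbed by a cache; for a file in split
# brain that stat then fails with EIO, as described above
mount -t glusterfs -o attribute-timeout=0,entry-timeout=0 \
      server1:/patchy /mnt/glusterfs/0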

[1] https://review.gluster.org/#/c/20549/
tests/basic/afr/split-brain-resolution.t:
tests/bugs/bug-1368312.t:
tests/bugs/replicate/bug-1238398-split-brain-resolution.t:
tests/bugs/replicate/bug-1417522-block-split-brain-resolution.t

Discussion on this topic can be found on gluster-devel with subj:
regression failures on afr/split-brain-resolution



regards,
Raghavendra



On Mon, Aug 13, 2018 at 6:12 AM, Shyam Ranganathan 
wrote:

> As a means of keeping the focus going and squashing the remaining tests
> that were failing sporadically, request each test/component owner to,
>
> - respond to this mail changing the subject (testname.t) to the test
> name that they are responding to (adding more than one in case they have
> the same RCA)
> - with the current RCA and status of the same
>
> List of tests and current owners as per the spreadsheet that we were
> tracking are:
>
> ./tests/basic/distribute/rebal-all-nodes-migrate.t  TBD
> ./tests/basic/tier/tier-heald.t TBD
> ./tests/basic/afr/sparse-file-self-heal.t   TBD
> ./tests/bugs/shard/bug-1251824.tTBD
> ./tests/bugs/shard/configure-lru-limit.tTBD
> ./tests/bugs/replicate/bug-1408712.tRavi
> ./tests/basic/afr/replace-brick-self-heal.t TBD
> ./tests/00-geo-rep/00-georep-verify-setup.t Kotresh
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik
> ./tests/basic/stats-dump.t  TBD
> ./tests/bugs/bug-1110262.t  TBD
> ./tests/basic/ec/ec-data-heal.t Mohit
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
>  Pranith
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-
> other-processes-accessing-mounted-path.t
> TBD
> ./tests/basic/ec/ec-5-2.t   Sunil
> ./tests/bugs/shard/bug-shard-discard.t  TBD
> ./tests/bugs/glusterd/remove-brick-testcases.t  TBD
> ./tests/bugs/protocol/bug-808400-repl.t TBD
> ./tests/bugs/quick-read/bug-846240.tDu
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t
>  Mohit
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh
> ./tests/bugs/ec/bug-1236065.t   Pranith
> ./tests/00-geo-rep/georep-basic-dr-rsync.t  Kotresh
> ./tests/basic/ec/ec-1468261.t   Ashish
> ./tests/basic/afr/add-brick-self-heal.t Ravi
> ./tests/basic/afr/granular-esh/replace-brick.t  Pranith
> ./tests/bugs/core/multiplex-limit-issue-151.t   Sanju
> ./tests/bugs/glusterd/validating-server-quorum.tAtin
> ./tests/bugs/replicate/bug-1363721.tRavi
> ./tests/bugs/index/bug-1559004-EMLINK-handling.tPranith
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
>Karthik
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> Atin
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
>  TBD
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t   TBD
> ./tests/bitrot/bug-1373520.tKotresh
> ./tests/bugs/distribute/bug-1117851.t   Shyam/Nigel
> ./tests/bugs/glusterd/quorum-validation.t   Atin
> ./tests/bugs/distribute/bug-1042725.t   Shyam
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-
> txn-on-quorum-failure.t
> Karthik
> ./tests/bugs/quota/bug-1293601.tTBD
> ./tests/bugs/bug-1368312.t  Du
> ./tests/bugs/distribute/bug-1122443.t   Du
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t   1608568
> Nithya/Shyam
>
> Thanks,
> Shyam
> ___
> maintainers mailing list
> maintain...@gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/glus

Re: [Gluster-devel] [Gluster-Maintainers] bug-1368312.t

2018-08-12 Thread Raghavendra Gowdappa
Failure of this test is tracked by bz
https://bugzilla.redhat.com/show_bug.cgi?id=1608158.



I was trying to debug regression failures on [1] and observed that
split-brain-resolution.t was failing consistently.

=
TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
---
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
  Failed tests:  24-26, 28-36, 41-45


On probing deeper, I observed a curious fact - on most of the failures the
stat was not served from md-cache, but instead was wound down to afr,
which failed the stat with EIO as the file was in split brain. So, I did
another test:
* disabled md-cache
* mounted glusterfs with attribute-timeout 0 and entry-timeout 0

Now the test always fails. So, I think the test relied on stat requests
being absorbed either by the kernel attribute cache or md-cache. When that
is not happening, stats reach afr and result in failures of cmds like
getfattr etc. Thoughts?

[1] https://review.gluster.org/#/c/20549/
tests/basic/afr/split-brain-resolution.t:
tests/bugs/bug-1368312.t:
tests/bugs/replicate/bug-1238398-split-brain-resolution.t:
tests/bugs/replicate/bug-1417522-block-split-brain-resolution.t

Discussion on this topic can be found on gluster-devel with subj:
regression failures on afr/split-brain-resolution



regards,
Raghavendra
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Weekly Untriaged Bugs

2018-08-12 Thread jenkins
[...truncated 6 lines...]
https://bugzilla.redhat.com/1601356 / core: Problem with SSL/TLS encryption on 
Gluster 4.0 & 4.1
https://bugzilla.redhat.com/1605066 / disperse: RFE: Need to optimize on time 
taken by heal info to display o/p when large number of entries exist
https://bugzilla.redhat.com/1614310 / distribute: Lookup/find on nfs(using 
ganesha) mount failed for a directory which was recreated
https://bugzilla.redhat.com/1608305 / fuse: [glusterfs-3.6.9] Fuse-mount has 
been forced off
https://bugzilla.redhat.com/1612615 / fuse: glusterfs-client on armhf crashes 
writing files to disperse volumes
https://bugzilla.redhat.com/1603576 / fuse: glusterfs dying with SIGSEGV
https://bugzilla.redhat.com/1611945 / geo-replication: [geo-rep] Cannot 
configure ssh_port and ssh_command options in glusterfs georeplication
https://bugzilla.redhat.com/1613512 / glusterd: Backport glusterfs-client 
memory leak fix to 3.12.x
https://bugzilla.redhat.com/1602824 / libgfapi: SMBD crashes when streams_attr 
VFS is used with Gluster VFS
https://bugzilla.redhat.com/1614769 / packaging: Typo in build flag
https://bugzilla.redhat.com/1612655 / project-infrastructure: bugziller doesn't 
get an updated /opt/qa automatically
https://bugzilla.redhat.com/1611635 / project-infrastructure: infra: softserve 
machines, regression tests fails
https://bugzilla.redhat.com/1614145 / project-infrastructure: Install buildah 
in builders20 to builder 29
https://bugzilla.redhat.com/1613721 / project-infrastructure: regression logs 
not available
https://bugzilla.redhat.com/1614631 / project-infrastructure: Spurious smoke 
failure in build rpms
https://bugzilla.redhat.com/1609363 / project-infrastructure: the comment on 
github job should post full commit message to issue.
https://bugzilla.redhat.com/1611546 / rpc: Log file glustershd.log being filled 
with errors
https://bugzilla.redhat.com/1612617 / selfheal: glustershd on armhf crashes on 
disperse volumes
https://bugzilla.redhat.com/1615083 / stripe: generate_file_traditional() 
needlessly memset an array before writing into it
https://bugzilla.redhat.com/1615003 / tests: Not getting logs if test case is 
time out on build server
https://bugzilla.redhat.com/1613841 / website: Bare domain doesn't get SSL
https://bugzilla.redhat.com/1613842 / website: Bare domain doesn't get SSL
[...truncated 2 lines...]

build.log
Description: Binary data
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Master branch lock down: RCA for tests (testname.t)

2018-08-12 Thread Shyam Ranganathan
As a means of keeping the focus going and squashing the remaining tests
that were failing sporadically, request each test/component owner to,

- respond to this mail changing the subject (testname.t) to the test
name that they are responding to (adding more than one in case they have
the same RCA)
- with the current RCA and status of the same

List of tests and current owners as per the spreadsheet that we were
tracking are:

./tests/basic/distribute/rebal-all-nodes-migrate.t  TBD
./tests/basic/tier/tier-heald.t TBD
./tests/basic/afr/sparse-file-self-heal.t   TBD
./tests/bugs/shard/bug-1251824.tTBD
./tests/bugs/shard/configure-lru-limit.tTBD
./tests/bugs/replicate/bug-1408712.tRavi
./tests/basic/afr/replace-brick-self-heal.t TBD
./tests/00-geo-rep/00-georep-verify-setup.t Kotresh
./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t Karthik
./tests/basic/stats-dump.t  TBD
./tests/bugs/bug-1110262.t  TBD
./tests/basic/ec/ec-data-heal.t Mohit
./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t   Pranith
./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
TBD
./tests/basic/ec/ec-5-2.t   Sunil
./tests/bugs/shard/bug-shard-discard.t  TBD
./tests/bugs/glusterd/remove-brick-testcases.t  TBD
./tests/bugs/protocol/bug-808400-repl.t TBD
./tests/bugs/quick-read/bug-846240.tDu
./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t   Mohit
./tests/00-geo-rep/georep-basic-dr-tarssh.t Kotresh
./tests/bugs/ec/bug-1236065.t   Pranith
./tests/00-geo-rep/georep-basic-dr-rsync.t  Kotresh
./tests/basic/ec/ec-1468261.t   Ashish
./tests/basic/afr/add-brick-self-heal.t Ravi
./tests/basic/afr/granular-esh/replace-brick.t  Pranith
./tests/bugs/core/multiplex-limit-issue-151.t   Sanju
./tests/bugs/glusterd/validating-server-quorum.tAtin
./tests/bugs/replicate/bug-1363721.tRavi
./tests/bugs/index/bug-1559004-EMLINK-handling.tPranith
./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t 
Karthik
./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
Atin
./tests/bugs/glusterd/rebalance-operations-in-single-node.t TBD
./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t   TBD
./tests/bitrot/bug-1373520.tKotresh
./tests/bugs/distribute/bug-1117851.t   Shyam/Nigel
./tests/bugs/glusterd/quorum-validation.t   Atin
./tests/bugs/distribute/bug-1042725.t   Shyam
./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
Karthik
./tests/bugs/quota/bug-1293601.tTBD
./tests/bugs/bug-1368312.t  Du
./tests/bugs/distribute/bug-1122443.t   Du
./tests/bugs/core/bug-1432542-mpx-restart-crash.t   1608568 Nithya/Shyam

Thanks,
Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Master branch lock down for stabilization (unlocking the same)

2018-08-12 Thread Shyam Ranganathan
Hi,

So we have had master locked down for a week to ensure we only get fixes
for failing tests in order to stabilize the code base, partly for
release-5 branching as well.

As of this weekend, we (Atin and myself) have been looking at the
pass/fail rates on the tests, and whether we are discovering newer
failures of more of the same.

Our runs with patch sets 10->11->12 are looking better than where we
started, and we have a list of tests that we need to still fix.

But there are other issues and fixes that are needed in the code that
are lagging behind due to the lock down. The plan going forward is as
follows,

- Unlock master, and ensure that we do not start seeing newer failures
as we merge other patches in, if so raise them on the lists and as bugs
and let's work towards ensuring these are addressed. *Maintainers*
please pay special attention when merging patches.

- Address the current pending set of tests that have been identified as
failing, over the course of the next 2 weeks. *Contributors* continue
the focus here, so that we do not have to end up with another drive
towards the same in 2 weeks.

- At the end of 2 weeks, reassess master and nightly test status, and
see if we need another drive towards stabilizing master by locking down
the same and focusing only on test and code stability around the same.

Atin and Shyam
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Master branch lock down status (Aug 12th, 2018) (patchset 12)

2018-08-12 Thread Shyam Ranganathan
Patch set 12 results:

./tests/bugs/glusterd/quorum-validation.t (3 retries, 1 core)
./tests/bugs/glusterd/validating-server-quorum.t (1 core)
(NEW) ./tests/basic/distribute/rebal-all-nodes-migrate.t (1 retry)
./tests/basic/stats-dump.t (1 retry)
./tests/bugs/shard/bug-1251824.t (1 retry)
./tests/basic/ec/ec-5-2.t (1 core)
(NEW) ./tests/basic/tier/tier-heald.t (1 core) (Looks similar to,
./tests/bugs/glusterd/remove-brick-testcases.t (run: lcov#432))

Sheet updated here:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit#gid=522127663

Gerrit comment here:
https://review.gluster.org/c/glusterfs/+/20637/12#message-186adbee76d6999385022239cb2daba589f0a81f

Shyam
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that may
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you see this
>> mail and let the list know regarding the same as well.
>>
>> Thanks,
>> Shyam
>>
>> [1] Patches that address regression failures:
>> https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
>>
>> [2] Bunched up patch against which regressions were run:
>> https://review.gluster.org/#/c/20637
>>
>> [3] Failing tests list:
>> https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit?usp=sharing
>>
>> [4] Nightly run dashboard: https://build.gluster.org/job/nightly-master/
> ___
> Gluster-devel m

Re: [Gluster-devel] Master branch lock down status (Patch set 11, Aug 12, 2018)

2018-08-12 Thread Shyam Ranganathan
Patch set 11 report:

line coverage: 4/8 PASS, 7/8 with retries, 1 core
CentOS regression: 5/8 PASS, 8/8 PASS-With-RETRIES
Mux regression: 7/8 PASS, 1 core

No NEW failures, sheet [1] updated with run details, and so is the WIP
patch with the same data [2].

Cores:
- ./tests/bugs/glusterd/validating-server-quorum.t
- ./tests/basic/ec/ec-5-2.t

Other retries/failures:
- ./tests/bugs/shard/bug-shard-discard.t
- ./tests/basic/afr/replace-brick-self-heal.t
- ./tests/bugs/core/multiplex-limit-issue-151.t
- ./tests/00-geo-rep/georep-basic-dr-tarssh.t
- ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
- ./tests/bugs/shard/configure-lru-limit.t
- ./tests/bugs/glusterd/quorum-validation.t


[1] Sheet with failure and run data:
https://docs.google.com/spreadsheets/d/1IF9GhpKah4bto19RQLr0y_Kkw26E_-crKALHSaSjZMQ/edit#gid=1434742898

[2] Gerrit comment with the same information:
https://review.gluster.org/c/glusterfs/+/20637/12#message-1f8f94aaa88be276229f20eb25a650381bc37543
On 08/07/2018 07:37 PM, Shyam Ranganathan wrote:
> Deserves a new beginning, threads on the other mail have gone deep enough.
> 
> NOTE: (5) below needs your attention, rest is just process and data on
> how to find failures.
> 
> 1) We are running the tests using the patch [2].
> 
> 2) Run details are extracted into a separate sheet in [3] named "Run
> Failures" use a search to find a failing test and the corresponding run
> that it failed in.
> 
> 3) Patches that are fixing issues can be found here [1], if you think
> you have a patch out there, that is not in this list, shout out.
> 
> 4) If you own up a test case failure, update the spreadsheet [3] with
> your name against the test, and also update other details as needed (as
> comments, as edit rights to the sheet are restricted).
> 
> 5) Current test failures
> We still have the following tests failing and some without any RCA or
> attention, (If something is incorrect, write back).
> 
> ./tests/bugs/replicate/bug-1290965-detect-bitrotten-objects.t (needs
> attention)
> ./tests/00-geo-rep/georep-basic-dr-tarssh.t (Kotresh)
> ./tests/bugs/glusterd/add-brick-and-validate-replicated-volume-options.t
> (Atin)
> ./tests/bugs/ec/bug-1236065.t (Ashish)
> ./tests/00-geo-rep/georep-basic-dr-rsync.t (Kotresh)
> ./tests/basic/ec/ec-1468261.t (needs attention)
> ./tests/basic/afr/add-brick-self-heal.t (needs attention)
> ./tests/basic/afr/granular-esh/replace-brick.t (needs attention)
> ./tests/bugs/core/multiplex-limit-issue-151.t (needs attention)
> ./tests/bugs/glusterd/validating-server-quorum.t (Atin)
> ./tests/bugs/replicate/bug-1363721.t (Ravi)
> 
> Here are some newer failures, but mostly one-off failures except cores
> in ec-5-2.t. All of the following need attention as these are new.
> 
> ./tests/00-geo-rep/00-georep-verify-setup.t
> ./tests/basic/afr/gfid-mismatch-resolution-with-fav-child-policy.t
> ./tests/basic/stats-dump.t
> ./tests/bugs/bug-1110262.t
> ./tests/bugs/glusterd/mgmt-handshake-and-volume-sync-post-glusterd-restart.t
> ./tests/basic/ec/ec-data-heal.t
> ./tests/bugs/replicate/bug-1448804-check-quorum-type-values.t
> ./tests/bugs/snapshot/bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t
> ./tests/basic/ec/ec-5-2.t
> 
> 6) Tests that are addressed or are not occurring anymore are,
> 
> ./tests/bugs/glusterd/rebalance-operations-in-single-node.t
> ./tests/bugs/index/bug-1559004-EMLINK-handling.t
> ./tests/bugs/replicate/bug-1386188-sbrain-fav-child.t
> ./tests/bugs/replicate/bug-1433571-undo-pending-only-on-up-bricks.t
> ./tests/bitrot/bug-1373520.t
> ./tests/bugs/distribute/bug-1117851.t
> ./tests/bugs/glusterd/quorum-validation.t
> ./tests/bugs/distribute/bug-1042725.t
> ./tests/bugs/replicate/bug-1586020-mark-dirty-for-entry-txn-on-quorum-failure.t
> ./tests/bugs/quota/bug-1293601.t
> ./tests/bugs/bug-1368312.t
> ./tests/bugs/distribute/bug-1122443.t
> ./tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> Shyam (and Atin)
> 
> On 08/05/2018 06:24 PM, Shyam Ranganathan wrote:
>> Health on master as of the last nightly run [4] is still the same.
>>
>> Potential patches that rectify the situation (as in [1]) are bunched in
>> a patch [2] that Atin and myself have put through several regressions
>> (mux, normal and line coverage) and these have also not passed.
>>
>> Till we rectify the situation we are locking down master branch commit
>> rights to the following people, Amar, Atin, Shyam, Vijay.
>>
>> The intention is to stabilize master and not add more patches that may
>> destabilize it.
>>
>> Test cases that are tracked as failures and need action are present here
>> [3].
>>
>> @Nigel, request you to apply the commit rights change as you see this
>> mail and let the list know regarding the same as well.
>>
>> Thanks,
>> Shyam
>>
>> [1] Patches that address regression failures:
>> https://review.gluster.org/#/q/starredby:srangana%2540redhat.com
>>
>> [2] Bunched up patch against which regressions were run:
>> https://review.gluster.org/#/c/20637
>>
>> [3

Re: [Gluster-devel] Master branch lock down status (Wed, August 08th)

2018-08-12 Thread Raghavendra Gowdappa
On Sun, Aug 12, 2018 at 9:11 AM, Raghavendra Gowdappa 
wrote:

>
>
> On Sat, Aug 11, 2018 at 10:33 PM, Shyam Ranganathan 
> wrote:
>
>> On 08/09/2018 10:58 PM, Raghavendra Gowdappa wrote:
>> >
>> >
>> > On Fri, Aug 10, 2018 at 1:38 AM, Shyam Ranganathan > > > wrote:
>> >
>> > On 08/08/2018 09:04 PM, Shyam Ranganathan wrote:
>> > > Today's patch set 7 [1], included fixes provided till last
>> evening IST,
>> > > and its runs can be seen here [2] (yay! we can link to comments in
>> > > gerrit now).
>> > >
>> > > New failures: (added to the spreadsheet)
>> > > ./tests/bugs/quick-read/bug-846240.t
>> >
>> > The above test fails always if there is a sleep of 10 added at line
>> 36.
>> >
>> > I tried to replicate this in my setup, and was able to do so 3/150
>> times
>> > and the failures were the same as the ones reported in the build
>> logs
>> > (as below).
>> >
>> > Not finding any clear reason for the failure, I delayed the test
>> (i.e
>> > added a sleep 10) after the open on M0 to see if the race is
>> uncovered,
>> > and it was.
>> >
>> > Du, request you to take a look at the same, as the test is around
>> > quick-read but involves open-behind as well.
>> >
>> >
>> > Thanks for that information. I'll be working on this today.
>>
>> Heads up Du, failed again with the same pattern in run
>> https://build.gluster.org/job/regression-on-demand-full-run/46/consoleFull
>
>
> Sorry Shyam.
>
> I found out the cause [1]. But still thinking about the fix or to remove
> the test given recent changes to open-behind from [1]. You'll have an
> answer by EOD today.
>

Fix submitted at  https://review.gluster.org/#/c/glusterfs/+/20710/


> [1] https://review.gluster.org/20428
>
>
>>
>> >
>> >
>> > Failure snippet:
>> > 
>> > 23:41:24 [23:41:28] Running tests in file
>> > ./tests/bugs/quick-read/bug-846240.t
>> > 23:41:28 ./tests/bugs/quick-read/bug-846240.t ..
>> > 23:41:28 1..17
>> > 23:41:28 ok 1, LINENUM:9
>> > 23:41:28 ok 2, LINENUM:10
>> > 
>> > 23:41:28 ok 13, LINENUM:40
>> > 23:41:28 not ok 14 , LINENUM:50
>> > 23:41:28 FAILED COMMAND: [ 0 -ne 0 ]
>> >
>> > Shyam
>> >
>> >
>>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Trello may automatically add comments to GitHub Issues/PRs/...

2018-08-12 Thread Niels de Vos
Hi,

Today I just noticed that my Trello updates to a private board (for an
internal Red Hat project) added comments to GitHub PRs. In this case, it
was not really much of a problem, except that non-board members would
not have access to the linked card. This is obviously very community
unfriendly. These comments have now been removed.

In order to prevent this from happening in the future, I recommend
others who use private Trello boards to change the settings of the
GitHub Power-Up in Trello:

1. click "Show Menu" in the upper-right corner
2. click "Power-Ups" in the menu
3. click "Enabled" in the board-sized pop-up
4. click the gear-button on the GitHub pane
5. click "Edit Power-Up Settings" in the menu
6. unselect "Add a comment with a link to the Trello card when attaching
 commits, issues, and pull requests."

The last few steps are visible in the attached screenshot.

These settings seem to be user and board specific, so public Trello
boards can still have the option to post comments enabled.

Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel