Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Raghavendra Gowdappa
On Wed, Jul 25, 2018 at 10:25 AM, Ravishankar N 
wrote:

>
>
> On 07/24/2018 08:45 PM, Raghavendra Gowdappa wrote:
>
>
> I tried higher values of attribute-timeout and it's not helping. Are there
> any other similar split-brain related tests? Can I mark these tests bad for
> the time being, as the md-cache patch has a deadline?
>
>
> `git grep split-brain-status` on the tests folder returned the
> following:
> tests/basic/afr/split-brain-resolution.t:
> tests/bugs/bug-1368312.t:
> tests/bugs/replicate/bug-1238398-split-brain-resolution.t:
> tests/bugs/replicate/bug-1417522-block-split-brain-resolution.t
>
> I guess if it is blocking you, you can mark them as bad tests and assign
> the bug to me.
>

https://bugzilla.redhat.com/show_bug.cgi?id=1608158.

I will mark these tests as bad in the md-cache patch referred to in the first
mail.

-Ravi
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Ravishankar N



On 07/24/2018 08:45 PM, Raghavendra Gowdappa wrote:


I tried higher values of attribute-timeout and it's not helping. Are
there any other similar split-brain related tests? Can I mark these
tests bad for the time being, as the md-cache patch has a deadline?





`git grep split-brain-status ` on the tests folder returned the following:
tests/basic/afr/split-brain-resolution.t:
tests/bugs/bug-1368312.t:
tests/bugs/replicate/bug-1238398-split-brain-resolution.t:
tests/bugs/replicate/bug-1417522-block-split-brain-resolution.t

I guess if it is blocking you, you can mark them as bad tests and
assign the bug to me.

-Ravi

Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Ravishankar N



On 07/25/2018 09:06 AM, Raghavendra Gowdappa wrote:



On Tue, Jul 24, 2018 at 6:54 PM, Ravishankar N wrote:




On 07/24/2018 06:30 PM, Ravishankar N wrote:




On 07/24/2018 02:56 PM, Raghavendra Gowdappa wrote:

All,

I was trying to debug regression failures on [1] and observed
that split-brain-resolution.t was failing consistently.

=
TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
---
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45
Failed: 17)
  Failed tests:  24-26, 28-36, 41-45


On probing deeper, I observed a curious fact - on most of the
failures stat was not served from md-cache, but instead was
wound down to afr which failed stat with EIO as the file was in
split brain. So, I did another test:
* disabled md-cache
* mount glusterfs with attribute-timeout 0 and entry-timeout 0

Now the test always fails. So, I think the test relied on stat
requests being absorbed either by the kernel attribute cache or by
md-cache. When that does not happen, stats reach afr and
cause commands like getfattr to fail.


This indeed seems to be the case.  Is there any way we can avoid
the stat? When a getfattr is performed on the mount, aren't
lookup + getfattr the only fops that need to hit gluster?


Or should afr allow (f)stat even for replica-2 split-brains,
since it already allows lookup (the lookup cbk contains stat
information from one of its children)?


I think the question here should be what kind of access we have to
provide for files in split-brain. Once that broader question is
answered, we should think about which fops fall under that kind of
access. If setfattr/getfattr command access has to be provided, I guess
lookup, stat, setxattr and getxattr need to work on split-brain files.


Ideally, the only fop that should be allowed is checking whether
the file exists or not (i.e. lookup), subject to quorum checks. All
others should be denied. This is how it works today too, but we
(afr) overloaded setfattr and getfattr with virtual xattrs to allow
examining and resolving split-brain from the mount, which now fails
in the .t because the stat fails, as you pointed out. I think we
should allow (f)stat too for the replica-2 case, even when there are no
good copies (i.e. read_subvol), to support the mount-based split-brain
resolution method.  Pranith, what do you think?
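
A minimal sketch of the access question discussed above, as a toy model rather than afr code. The set names and the fop sequence for an uncached getfattr are assumptions made for illustration; the thread itself notes that the kernel's choice between lookup and stat is not fully understood.

```python
# Fops afr serves for a split-brain file today, per the description above.
# getxattr/setxattr are served only for the virtual split-brain xattrs.
ALLOWED_TODAY = frozenset({"lookup", "getxattr", "setxattr"})

# The proposal: additionally serve stat and fstat in the replica-2 case.
ALLOWED_PROPOSED = ALLOWED_TODAY | {"stat", "fstat"}


def command_succeeds(fops_issued, allowed):
    """A command works only if every fop it triggers is served."""
    return all(fop in allowed for fop in fops_issued)


# Hypothetical fop sequence for getfattr when no cache absorbs the stat.
GETFATTR_UNCACHED = ["lookup", "stat", "getxattr"]

print(command_succeeds(GETFATTR_UNCACHED, ALLOWED_TODAY))
print(command_succeeds(GETFATTR_UNCACHED, ALLOWED_PROPOSED))
```

Under these assumptions the uncached getfattr fails with today's set and succeeds once (f)stat is added, which is the behaviour the proposal is after.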


-Ravi




-Ravi

-Ravi


Thoughts?

[1] https://review.gluster.org/#/c/20549/















Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Raghavendra Gowdappa
On Tue, Jul 24, 2018 at 6:54 PM, Ravishankar N 
wrote:

>
>
> On 07/24/2018 06:30 PM, Ravishankar N wrote:
>
>
>
> On 07/24/2018 02:56 PM, Raghavendra Gowdappa wrote:
>
> All,
>
> I was trying to debug regression failures on [1] and observed that
> split-brain-resolution.t was failing consistently.
>
> =
> TEST 45 (line 88): 0 get_pending_heal_count patchy
> ./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
> ./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests
>
> Test Summary Report
> ---
> ./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
>   Failed tests:  24-26, 28-36, 41-45
>
>
> On probing deeper, I observed a curious fact - on most of the failures
> stat was not served from md-cache, but instead was wound down to afr which
> failed stat with EIO as the file was in split brain. So, I did another test:
> * disabled md-cache
> * mount glusterfs with attribute-timeout 0 and entry-timeout 0
>
> Now the test always fails. So, I think the test relied on stat requests
> being absorbed either by the kernel attribute cache or by md-cache. When that
> does not happen, stats reach afr and cause commands like getfattr to
> fail.
>
>
> This indeed seems to be the case.  Is there any way we can avoid the stat?
> When a getfattr is performed on the mount, aren't lookup + getfattr the
> only fops that need to hit gluster?
>
>
> Or should afr allow (f)stat even for replica-2 split-brains, since it
> already allows lookup (the lookup cbk contains stat information from one of
> its children)?
>

I think the question here should be what kind of access we have to provide
for files in split-brain. Once that broader question is answered, we should
think about which fops fall under that kind of access. If
setfattr/getfattr command access has to be provided, I guess lookup, stat,
setxattr and getxattr need to work on split-brain files.

-Ravi
>
> -Ravi
>
> Thoughts?
>
> [1] https://review.gluster.org/#/c/20549/
>
>
>
>
>
>
>
>
>

Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Raghavendra Gowdappa
On Tue, Jul 24, 2018 at 8:36 PM, Raghavendra Gowdappa 
wrote:

>
>
> On Tue, Jul 24, 2018 at 8:35 PM, Raghavendra Gowdappa wrote:
>
>>
>>
>> On Tue, Jul 24, 2018 at 6:30 PM, Ravishankar N 
>> wrote:
>>
>>>
>>>
>>> On 07/24/2018 02:56 PM, Raghavendra Gowdappa wrote:
>>>
>>> All,
>>>
>>> I was trying to debug regression failures on [1] and observed that
>>> split-brain-resolution.t was failing consistently.
>>>
>>> =
>>> TEST 45 (line 88): 0 get_pending_heal_count patchy
>>> ./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
>>> ./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests
>>>
>>> Test Summary Report
>>> ---
>>> ./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed:
>>> 17)
>>>   Failed tests:  24-26, 28-36, 41-45
>>>
>>>
>>> On probing deeper, I observed a curious fact - on most of the failures
>>> stat was not served from md-cache, but instead was wound down to afr which
>>> failed stat with EIO as the file was in split brain. So, I did another test:
>>> * disabled md-cache
>>> * mount glusterfs with attribute-timeout 0 and entry-timeout 0
>>>
>>> Now the test always fails. So, I think the test relied on stat requests
>>> being absorbed either by the kernel attribute cache or by md-cache. When that
>>> does not happen, stats reach afr and cause commands like getfattr to
>>> fail.
>>>
>>>
>>> This indeed seems to be the case.  Is there any way we can avoid the
>>> stat? When a getfattr is performed on the mount, aren't lookup + getfattr
>>> the only fops that need to hit gluster?
>>>
>>
>> It's a black box to me how the kernel decides whether to do a lookup or a
>> stat. But I guess, if only stat is needed and it's not available in the
>> cache, it would do a stat.
>>
>
> Another thing you can do is mount with a higher value of
> attribute-timeout. Let us know whether it works.
>

I tried higher values of attribute-timeout and it's not helping. Are there
any other similar split-brain related tests? Can I mark these tests bad for
the time being, as the md-cache patch has a deadline?


>
>> -Ravi
>>>
>>> Thoughts?
>>>
>>> [1] https://review.gluster.org/#/c/20549/
>>>
>>>
>>>
>>>
>>>
>>
>

Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Raghavendra Gowdappa
On Tue, Jul 24, 2018 at 8:35 PM, Raghavendra Gowdappa 
wrote:

>
>
> On Tue, Jul 24, 2018 at 6:30 PM, Ravishankar N 
> wrote:
>
>>
>>
>> On 07/24/2018 02:56 PM, Raghavendra Gowdappa wrote:
>>
>> All,
>>
>> I was trying to debug regression failures on [1] and observed that
>> split-brain-resolution.t was failing consistently.
>>
>> =
>> TEST 45 (line 88): 0 get_pending_heal_count patchy
>> ./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
>> ./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests
>>
>> Test Summary Report
>> ---
>> ./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed:
>> 17)
>>   Failed tests:  24-26, 28-36, 41-45
>>
>>
>> On probing deeper, I observed a curious fact - on most of the failures
>> stat was not served from md-cache, but instead was wound down to afr which
>> failed stat with EIO as the file was in split brain. So, I did another test:
>> * disabled md-cache
>> * mount glusterfs with attribute-timeout 0 and entry-timeout 0
>>
>> Now the test always fails. So, I think the test relied on stat requests
>> being absorbed either by the kernel attribute cache or by md-cache. When that
>> does not happen, stats reach afr and cause commands like getfattr to
>> fail.
>>
>>
>> This indeed seems to be the case.  Is there any way we can avoid the
>> stat? When a getfattr is performed on the mount, aren't lookup + getfattr
>> the only fops that need to hit gluster?
>>
>
> It's a black box to me how the kernel decides whether to do a lookup or a
> stat. But I guess, if only stat is needed and it's not available in the
> cache, it would do a stat.
>

Another thing you can do is mount with a higher value of
attribute-timeout. Let us know whether it works.


> -Ravi
>>
>> Thoughts?
>>
>> [1] https://review.gluster.org/#/c/20549/
>>
>>
>>
>>
>>
>

Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Raghavendra Gowdappa
On Tue, Jul 24, 2018 at 6:30 PM, Ravishankar N 
wrote:

>
>
> On 07/24/2018 02:56 PM, Raghavendra Gowdappa wrote:
>
> All,
>
> I was trying to debug regression failures on [1] and observed that
> split-brain-resolution.t was failing consistently.
>
> =
> TEST 45 (line 88): 0 get_pending_heal_count patchy
> ./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
> ./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests
>
> Test Summary Report
> ---
> ./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
>   Failed tests:  24-26, 28-36, 41-45
>
>
> On probing deeper, I observed a curious fact - on most of the failures
> stat was not served from md-cache, but instead was wound down to afr which
> failed stat with EIO as the file was in split brain. So, I did another test:
> * disabled md-cache
> * mount glusterfs with attribute-timeout 0 and entry-timeout 0
>
> Now the test always fails. So, I think the test relied on stat requests
> being absorbed either by the kernel attribute cache or by md-cache. When that
> does not happen, stats reach afr and cause commands like getfattr to
> fail.
>
>
> This indeed seems to be the case.  Is there any way we can avoid the stat?
> When a getfattr is performed on the mount, aren't lookup + getfattr the
> only fops that need to hit gluster?
>

It's a black box to me how the kernel decides whether to do a lookup or a
stat.  But I guess, if only stat is needed and it's not available in the
cache, it would do a stat.

-Ravi
>
> Thoughts?
>
> [1] https://review.gluster.org/#/c/20549/
>
>
>
>
>

Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Ravishankar N



On 07/24/2018 06:30 PM, Ravishankar N wrote:




On 07/24/2018 02:56 PM, Raghavendra Gowdappa wrote:

All,

I was trying to debug regression failures on [1] and observed that 
split-brain-resolution.t was failing consistently.


=
TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
---
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 
Failed: 17)

  Failed tests:  24-26, 28-36, 41-45


On probing deeper, I observed a curious fact - on most of the 
failures stat was not served from md-cache, but instead was wound 
down to afr which failed stat with EIO as the file was in split 
brain. So, I did another test:

* disabled md-cache
* mount glusterfs with attribute-timeout 0 and entry-timeout 0

Now the test always fails. So, I think the test relied on stat
requests being absorbed either by the kernel attribute cache or by
md-cache. When that does not happen, stats reach afr and
cause commands like getfattr to fail.


This indeed seems to be the case.  Is there any way we can avoid the
stat? When a getfattr is performed on the mount, aren't lookup +
getfattr the only fops that need to hit gluster?


Or should afr allow (f)stat even for replica-2 split-brains, since it
already allows lookup (the lookup cbk contains stat information from one
of its children)?

-Ravi

-Ravi


Thoughts?

[1] https://review.gluster.org/#/c/20549/









Re: [Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Ravishankar N



On 07/24/2018 02:56 PM, Raghavendra Gowdappa wrote:

All,

I was trying to debug regression failures on [1] and observed that 
split-brain-resolution.t was failing consistently.


=
TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
---
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
  Failed tests:  24-26, 28-36, 41-45


On probing deeper, I observed a curious fact - on most of the failures 
stat was not served from md-cache, but instead was wound down to afr 
which failed stat with EIO as the file was in split brain. So, I did 
another test:

* disabled md-cache
* mount glusterfs with attribute-timeout 0 and entry-timeout 0

Now the test always fails. So, I think the test relied on stat
requests being absorbed either by the kernel attribute cache or by
md-cache. When that does not happen, stats reach afr and
cause commands like getfattr to fail.


This indeed seems to be the case.  Is there any way we can avoid the
stat? When a getfattr is performed on the mount, aren't lookup +
getfattr the only fops that need to hit gluster?

-Ravi


Thoughts?

[1] https://review.gluster.org/#/c/20549/





[Gluster-devel] regression failures on afr/split-brain-resolution

2018-07-24 Thread Raghavendra Gowdappa
All,

I was trying to debug regression failures on [1] and observed that
split-brain-resolution.t was failing consistently.

=
TEST 45 (line 88): 0 get_pending_heal_count patchy
./tests/basic/afr/split-brain-resolution.t .. 45/45 RESULT 45: 1
./tests/basic/afr/split-brain-resolution.t .. Failed 17/45 subtests

Test Summary Report
---
./tests/basic/afr/split-brain-resolution.t (Wstat: 0 Tests: 45 Failed: 17)
  Failed tests:  24-26, 28-36, 41-45
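
As a quick cross-check of the summary above, the failed-test ranges can be expanded and counted. This is only an illustrative sketch; `expand_ranges` is a hypothetical helper, not part of the gluster test framework.

```python
def expand_ranges(spec):
    """Expand a range list such as '24-26, 28-36' into individual test numbers."""
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            numbers.extend(range(int(first), int(last) + 1))
        else:
            numbers.append(int(part))
    return numbers


failed = expand_ranges("24-26, 28-36, 41-45")
print(len(failed))  # 17, matching "Failed 17/45 subtests"
```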


On probing deeper, I observed a curious fact - on most of the failures stat
was not served from md-cache, but instead was wound down to afr which
failed stat with EIO as the file was in split brain. So, I did another test:
* disabled md-cache
* mount glusterfs with attribute-timeout 0 and entry-timeout 0

Now the test always fails. So, I think the test relied on stat requests
being absorbed either by the kernel attribute cache or by md-cache. When that
does not happen, stats reach afr and cause commands like getfattr to fail.
Thoughts?
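
The suspected dependency can be pictured with a toy model. This is a sketch under stated assumptions, not gluster code: `Afr` and `StatCache` are hypothetical stand-ins, and priming the cache from an earlier lookup is assumed for illustration.

```python
import errno


class Afr:
    """Toy stand-in for afr: stat on a split-brain file fails with EIO."""

    def __init__(self, split_brain_files):
        self.split_brain_files = set(split_brain_files)

    def stat(self, path):
        if path in self.split_brain_files:
            raise OSError(errno.EIO, "file is in split-brain", path)
        return {"path": path}


class StatCache:
    """Toy stand-in for a caching layer (kernel attribute cache or md-cache)."""

    def __init__(self, child, enabled=True):
        self.child = child
        self.enabled = enabled
        self.entries = {}

    def prime(self, path, attrs):
        # Assumed: attributes arrive with an earlier successful lookup.
        if self.enabled:
            self.entries[path] = attrs

    def stat(self, path):
        if self.enabled and path in self.entries:
            return self.entries[path]  # absorbed here, never reaches afr
        return self.child.stat(path)   # wound down to the next layer


def try_stat(layer, path):
    """Return 'ok' or the symbolic errno name of the failure."""
    try:
        layer.stat(path)
        return "ok"
    except OSError as exc:
        return errno.errorcode[exc.errno]


afr = Afr(["/file"])
cached = StatCache(afr, enabled=True)
cached.prime("/file", {"path": "/file"})
uncached = StatCache(afr, enabled=False)
uncached.prime("/file", {"path": "/file"})

print(try_stat(cached, "/file"))    # ok
print(try_stat(uncached, "/file"))  # EIO
```

With the cache enabled the stat is answered before it reaches the toy afr layer; with it disabled the same call fails with EIO, which mirrors the behaviour observed in the test.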

[1] https://review.gluster.org/#/c/20549/