Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Emmanuel Dreyfus
On Mon, Feb 08, 2016 at 03:26:54PM +0530, Milind Changire wrote:
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14089/consoleFull
> 
> 
> [08:44:20] ./tests/basic/afr/self-heald.t ..
> not ok 37 Got "0" instead of "1"
> not ok 52 Got "0" instead of "1"
> not ok 67
> Failed 4/83 subtests

There is a core, but it is from the NetBSD FUSE subsystem. The trace is
not helpful, but it suggests an abort() call caused by an unexpected
situation:

Core was generated by `perfused'.
Program terminated with signal SIGABRT, Aborted.
#0  0xbb7574b7 in _lwp_kill () from /usr/lib/libc.so.12
(gdb) bt
#0  0xbb7574b7 in _lwp_kill () from /usr/lib/libc.so.12

/var/log/messages has a hint:
Feb  8 08:43:15 nbslave7c perfused: file write grow without resize

Indeed, I have this assertion in NetBSD FUSE to catch a race condition.
I think this is the first time I have seen it raised, but I am unable to
conclude on the cause. Let us retrigger (I did it) and see if someone
else ever hits it again. The bug is more likely in NetBSD FUSE than
in glusterfs.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Michael Scherer
On Monday, 08 February 2016 at 16:22 +0530, Pranith Kumar Karampuri
wrote:
> 
> On 02/08/2016 04:16 PM, Ravishankar N wrote:
> > [Removing Milind, adding Pranith]
> >
> > On 02/08/2016 04:09 PM, Emmanuel Dreyfus wrote:
> >> On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote:
> >>> The patch to add it to bad tests has already been merged, so I guess this
> >>> .t's failure won't pop up again.
> >> IMO that was a bit too quick.
> > I guess Pranith merged it because of last week's complaint about the
> > same .t and not wanting to block other patches from being merged.
> 
> Yes, two people came to my desk and said their patches are blocked
> because of this. So I had to merge it until we figure out the problem.

I suspect it would be better if people used the list rather than
going to your desk, as that would help others who are absent, in
another office, or not even working in the same company to be aware of
the issue.

Next time this happens, can you direct people to gluster-devel?

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS





[Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Milind Changire
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14089/consoleFull


[08:44:20] ./tests/basic/afr/self-heald.t ..
not ok 37 Got "0" instead of "1"
not ok 52 Got "0" instead of "1"
not ok 67
Failed 4/83 subtests


Please advise.

--

Milind

Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Pranith Kumar Karampuri



On 02/08/2016 04:16 PM, Ravishankar N wrote:
> [Removing Milind, adding Pranith]
>
> On 02/08/2016 04:09 PM, Emmanuel Dreyfus wrote:
>> On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote:
>>> The patch to add it to bad tests has already been merged, so I guess this
>>> .t's failure won't pop up again.
>> IMO that was a bit too quick.
> I guess Pranith merged it because of last week's complaint about the
> same .t and not wanting to block other patches from being merged.

Yes, two people came to my desk and said their patches are blocked
because of this. So I had to merge it until we figure out the problem.

Pranith

>> What is the procedure to get out of the list?
> Usually, you just fix the problem with the testcase and send a patch
> with the fix that also removes it from bad_tests (for example,
> http://review.gluster.org/13233).






Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Emmanuel Dreyfus
On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote:
> The patch to add it to bad tests has already been merged, so I guess this
> .t's failure won't pop up again.

IMO that was a bit too quick. What is the procedure to get out of the
list?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Ravishankar N

On 02/08/2016 03:37 PM, Emmanuel Dreyfus wrote:
> On Mon, Feb 08, 2016 at 03:26:54PM +0530, Milind Changire wrote:
>> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14089/consoleFull
>>
>> [08:44:20] ./tests/basic/afr/self-heald.t ..
>> not ok 37 Got "0" instead of "1"
>> not ok 52 Got "0" instead of "1"
>> not ok 67
>> Failed 4/83 subtests
>
> There is a core, but it is from the NetBSD FUSE subsystem. The trace is
> not helpful, but it suggests an abort() call caused by an unexpected
> situation:
>
> Core was generated by `perfused'.
> Program terminated with signal SIGABRT, Aborted.
> #0  0xbb7574b7 in _lwp_kill () from /usr/lib/libc.so.12
> (gdb) bt
> #0  0xbb7574b7 in _lwp_kill () from /usr/lib/libc.so.12
>
> /var/log/messages has a hint:
> Feb  8 08:43:15 nbslave7c perfused: file write grow without resize
>
> Indeed, I have this assertion in NetBSD FUSE to catch a race condition.
> I think this is the first time I have seen it raised, but I am unable to
> conclude on the cause. Let us retrigger (I did it) and see if someone
> else ever hits it again. The bug is more likely in NetBSD FUSE than
> in glusterfs.

The .t has been added to bad tests for now @
http://review.gluster.org/#/c/13344/, so you can probably rebase your patch.
I'm not sure this is a problem with the test case; the same issue was
reported by Manikandan last week:
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/13895/consoleFull

Is it one of those vndconfig errors? The .t seems to have skipped a few 
tests:


---
./tests/basic/afr/self-heald.t (Wstat: 0 Tests: 82 Failed: 3)
  Failed tests:  37, 52, 67
  Parse errors: Tests out of sequence.  Found (31) but expected (30)
Tests out of sequence.  Found (32) but expected (31)
Tests out of sequence.  Found (33) but expected (32)
Tests out of sequence.  Found (34) but expected (33)
Tests out of sequence.  Found (35) but expected (34)
Displayed the first 5 of 54 TAP syntax errors.
Re-run prove with the -p option to see them all.
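
For anyone reading the report above: a minimal sketch of how a single missing result leads to a long run of "Tests out of sequence" messages, assuming the harness compares each reported test number against a running count of results seen so far. summarize_tap is a hypothetical helper written for illustration; it is not part of prove or of the gluster test framework, and it only handles numbered result lines.

```python
import re

# Matches TAP result lines such as 'ok 12' or 'not ok 37 Got "0" instead of "1"'.
# Result lines without an explicit test number are not handled in this sketch.
RESULT_RE = re.compile(r"^(not ok|ok)\s+(\d+)")


def summarize_tap(lines):
    """Return (failed, out_of_sequence) for an iterable of TAP output lines.

    failed          -- test numbers reported as 'not ok'
    out_of_sequence -- (found, expected) pairs where the reported number
                       differs from the running count of results seen
    """
    failed = []
    out_of_sequence = []
    seen = 0
    for line in lines:
        match = RESULT_RE.match(line.strip())
        if match is None:
            continue  # plan line, diagnostic, or other output
        seen += 1
        status, number = match.group(1), int(match.group(2))
        if number != seen:
            # Once one result is missing, every later result is off by one.
            out_of_sequence.append((number, seen))
        if status == "not ok":
            failed.append(number)
    return failed, out_of_sequence
```

With the input ["1..5", "ok 1", "ok 2", "ok 4", "not ok 5"], where the result for test 3 never arrives, the function reports test 5 as failed and both 4 and 5 as out of sequence, which is the same shape as the "Found (31) but expected (30)" lines above.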





Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Emmanuel Dreyfus
On Mon, Feb 08, 2016 at 10:26:22AM +, Emmanuel Dreyfus wrote:
> Indeed, same problem. But unfortunately it is not very reproducible since
> we need to make a full week of runs to see it again. I am tempted to
> just remove the assertion.

NB: this does not fail on a stock NetBSD release: the assertion is only there
because FUSE is built with -DDEBUG on the NetBSD slave VMs.

OTOH if it happens only in tests/basic/afr/self-heald.t, I may be able to
reproduce it by looping on the test for a while. I will try this on nbslave70.
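
A minimal sketch of such a loop, assuming the test can be started with a single command whose exit status is non-zero on failure. loop_until_failure is a hypothetical helper written for illustration, not an existing gluster script, and the command to run is left to the caller.

```python
import subprocess
import sys


def loop_until_failure(command, max_runs):
    """Run `command` up to `max_runs` times.

    Returns the 1-based run number of the first non-zero exit status,
    or None if every run succeeded.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(command)
        if result.returncode != 0:
            # Stop at the first failure so that cores and logs are still fresh.
            print(f"run {run} exited with status {result.returncode}",
                  file=sys.stderr)
            return run
    return None
```

One possible invocation, from the top of a glusterfs source tree, would be loop_until_failure(["prove", "-v", "tests/basic/afr/self-heald.t"], 500). The prove command line and the run count are assumptions and should be adjusted to however the test is normally run on the slave.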

In the meantime, if that one pops up too often and gets annoying, I can get
rid of it by just disabling debug mode.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Pranith Kumar Karampuri



On 02/08/2016 04:22 PM, Pranith Kumar Karampuri wrote:
> On 02/08/2016 04:16 PM, Ravishankar N wrote:
>> [Removing Milind, adding Pranith]
>>
>> On 02/08/2016 04:09 PM, Emmanuel Dreyfus wrote:
>>> On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote:
>>>> The patch to add it to bad tests has already been merged, so I guess this
>>>> .t's failure won't pop up again.
>>> IMO that was a bit too quick.
>> I guess Pranith merged it because of last week's complaint about the
>> same .t and not wanting to block other patches from being merged.
>
> Yes, two people came to my desk and said their patches are blocked
> because of this. So I had to merge it until we figure out the problem.

The patch is from last week, though.

Pranith

> Pranith
>
>>> What is the procedure to get out of the list?
>> Usually, you just fix the problem with the testcase and send a patch
>> with the fix that also removes it from bad_tests (for example,
>> http://review.gluster.org/13233).






Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Emmanuel Dreyfus
On Mon, Feb 08, 2016 at 03:44:43PM +0530, Ravishankar N wrote:
> The .t has been added to bad tests for now @

I am not sure this is relevant: does it fail again? I am very interested
to know whether it is reproducible.

> http://review.gluster.org/#/c/13344/, so you can probably rebase your patch.
> I'm not sure this is a problem with the test case; the same issue was
> reported by Manikandan last week:
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/13895/consoleFull

Indeed, same problem. But unfortunately it is not very reproducible since
we need to make a full week of runs to see it again. I am tempted to
just remove the assertion.

> Is it one of those vndconfig errors? The .t seems to have skipped a few
> tests:

This is because FUSE went away during the test.
The vnconfig problems are fixed now and should not happen anymore.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Ravishankar N

On 02/08/2016 04:00 PM, Emmanuel Dreyfus wrote:
> On Mon, Feb 08, 2016 at 10:26:22AM +, Emmanuel Dreyfus wrote:
>> Indeed, same problem. But unfortunately it is not very reproducible since
>> we need to make a full week of runs to see it again. I am tempted to
>> just remove the assertion.
>
> NB: this does not fail on a stock NetBSD release: the assertion is only there
> because FUSE is built with -DDEBUG on the NetBSD slave VMs.
>
> OTOH if it happens only in tests/basic/afr/self-heald.t, I may be able to
> reproduce it by looping on the test for a while. I will try this on nbslave70.

Thanks Emmanuel!

> In the meantime, if that one pops up too often and gets annoying, I can get
> rid of it by just disabling debug mode.

The patch to add it to bad tests has already been merged, so I guess
this .t's failure won't pop up again.




Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Ravishankar N

[Removing Milind, adding Pranith]

On 02/08/2016 04:09 PM, Emmanuel Dreyfus wrote:
> On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote:
>> The patch to add it to bad tests has already been merged, so I guess this
>> .t's failure won't pop up again.
> IMO that was a bit too quick.

I guess Pranith merged it because of last week's complaint about the
same .t and not wanting to block other patches from being merged.

> What is the procedure to get out of the list?

Usually, you just fix the problem with the testcase and send a patch
with the fix that also removes it from bad_tests (for example,
http://review.gluster.org/13233).




Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t

2016-02-08 Thread Pranith Kumar Karampuri



On 02/08/2016 05:04 PM, Michael Scherer wrote:
> On Monday, 08 February 2016 at 16:22 +0530, Pranith Kumar Karampuri
> wrote:
>> On 02/08/2016 04:16 PM, Ravishankar N wrote:
>>> [Removing Milind, adding Pranith]
>>>
>>> On 02/08/2016 04:09 PM, Emmanuel Dreyfus wrote:
>>>> On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote:
>>>>> The patch to add it to bad tests has already been merged, so I guess this
>>>>> .t's failure won't pop up again.
>>>> IMO that was a bit too quick.
>>> I guess Pranith merged it because of last week's complaint about the
>>> same .t and not wanting to block other patches from being merged.
>>
>> Yes, two people came to my desk and said their patches are blocked
>> because of this. So I had to merge it until we figure out the problem.
>
> I suspect it would be better if people used the list rather than
> going to your desk, as that would help others who are absent, in
> another office, or not even working in the same company to be aware of
> the issue.
>
> Next time this happens, can you direct people to gluster-devel?

Will do :-).

Pranith



