date:20150331

On 1 Apr 2015, at 05:04, Emmanuel Dreyfus  wrote:
> Justin Clift  wrote:
> 
>>> That, or perhaps we could have two verified fields?
>> 
>> Sure.  Whichever works. :)
>> 
>> Personally, I'm not sure how to do either yet.
> 
> In http://build.gluster.org/gerrit-trigger/ you have "Verdict
> categories" with CRVW (code review) and VRIF (verified), and there is a
> "add verdict category", which suggest this is something that can be
> done.
> 
> Of course the Gerrit side will need some configuration too, but if
> Jenkins can deal with more Gerrit fields, there must be a way to add
> fields in Gerrit.

Interesting.  Marcelo, this sounds like something you'd know
about.  Any ideas? :)

We're trying to add an extra "Verified" column to our Gerrit +
Jenkins setup.  We have an existing one for "Gluster Build System"
(which is our CentOS Regression testing).  Now we want to add one for
our NetBSD Regression testing.

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

Justin Clift  wrote:

> > That, or perhaps we could have two verified fields?
> 
> Sure.  Whichever works. :)
> 
> Personally, I'm not sure how to do either yet.

In http://build.gluster.org/gerrit-trigger/ you have "Verdict
categories" with CRVW (code review) and VRIF (verified), and there is a
"add verdict category", which suggest this is something that can be
done.

Of course the Gerrit side will need some configuration too, but if
Jenkins can deal with more Gerrit fields, there must be a way to add
fields in Gerrit.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

On 1 Apr 2015, at 04:07, Emmanuel Dreyfus  wrote:
> Justin Clift  wrote:
> 
>> It sounds like we need a solution to have both the NetBSD and CentOS
>> regressions run, and only give the +1 when both of them have successfully
>> finished.  If either of them fail, then it gets a -1.
> 
> That, or perhaps we could have two verified fields?

Sure.  Whichever works. :)

Personally, I'm not sure how to do either yet.

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

Justin Clift  wrote:

> It sounds like we need a solution to have both the NetBSD and CentOS
> regressions run, and only give the +1 when both of them have successfully
> finished.  If either of them fail, then it gets a -1.

That, or perhaps we could have two verified fields?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

On 1 Apr 2015, at 03:03, Emmanuel Dreyfus  wrote:
> Jeff Darcy  wrote:
> 
>> That's fine.  I left a note for you in the script, regarding what I
>> think it needs to do at that point.
> 
> Here is the comment:
> 
>> # We shouldn't be touching CR at all.  For V, we should set V+1 iff this
>> # test succeeded *and* the value was already 0 or 1, V-1 otherwise. I
>> # don't know how to do that, but the various smoke tests must be doing
>> # something similar/equivalent.  It's also possible that this part should
>> # be done as a post-build action instead.
> 
> The problem is indeed that we do now know how to retreive previous V
> value. I guess gerrit is the place where V combinations should be
> correctly handled.
> 
> What is the plan for NetBSD regression now? It will fail anything which
> has not been rebased after recent fixes were merged, but apart from that
> the thing is in rather good shape right now.

It sounds like we need a solution to have both the NetBSD and CentOS
regressions run, and only give the +1 when both of them have successfully
finished.  If either of them fail, then it gets a -1.

Research time. ;)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Extra overnight regression test run results

On 31 Mar 2015, at 17:43, Nithya Balachandran  wrote:

>  * 11 x tests/bugs/distribute/bug-1117851.t
>Failed test:  15
> 
>55% fail rate
> 
> Is the test output for the bug-1117851.t failure available anywhere? 

Not at the moment.  It would be really easy to setup a new VM with
a failure of this, and give you access to it, if that would help?

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Extra overnight regression test run results

On 31 Mar 2015, at 14:18, Shyam  wrote:

>> Also, most of the regression runs produced cores.  Here are
>> the first two:
>> 
>>   http://ded.ninja/gluster/blk0/
> 
> There are 4 cores here, 3 pointing to the (by now hopefully) famous bug 
> #1195415. One of the cores exhibit a different stack etc. Need more analysis 
> to see what the issue could be here, core file: core.16937
> 
>>   http://ded.ninja/gluster/blk1/
> 
> There is a single core here, pointing to the above bug again.

Both the blk0 and blk1 VM's are still online and available,
if that's helpful?

If not, please let me know and I'll nuke them. :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Does anyone care if GlusterFS 3.7 does not work on older distributions?

2015-03-31 Thread Dan Mons

On 27 March 2015 at 04:48, Niels de Vos  wrote:
> If you have a strong desire for GlusterFS 3.7 clients on older
> distributions, contact us as soon as possible (definitely within the
> next two/three weeks) so that we can look into the matter.

You mention RHEL5 and Ubuntu 12.04LTS Precise as two targets that are
potentially causing problems.

I'm assuming for "newer" targets you're referring to RHEL7, Ubuntu
14.04LTS Trusty, and similar current long-term releases?

My only objection is when software requires me to start running
non-LTS releases (Ubuntu short term releases, Fedora, etc).  That's a
recipe for pain and heartache in a busy production world.

-Dan

Dan Mons - R&D Sysadmin
Cutting Edge
http://cuttingedge.com.au
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

Jeff Darcy  wrote:

> That's fine.  I left a note for you in the script, regarding what I
> think it needs to do at that point.

Here is the comment:

> # We shouldn't be touching CR at all.  For V, we should set V+1 iff this
> # test succeeded *and* the value was already 0 or 1, V-1 otherwise. I
> # don't know how to do that, but the various smoke tests must be doing
> # something similar/equivalent.  It's also possible that this part should
> # be done as a post-build action instead.

The problem is indeed that we do now know how to retreive previous V
value. I guess gerrit is the place where V combinations should be
correctly handled.

What is the plan for NetBSD regression now? It will fail anything which
has not been rebased after recent fixes were merged, but apart from that
the thing is in rather good shape right now.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

> > > http://review.gluster.org/#/c/9970/ (Kotresh HR)
> > > extras: Fix stop-all-gluster-processes.sh script
> 
> Theses are the NetBSD regression failures for which we got fixes merged
> recently. Doesn't it just need to be rebased?

Quite possibly.  I wasn't looking at patch contents all that closely.

> I re-enabled NetBSD regression, with voting disabled until the mess is
> fixed.

That's fine.  I left a note for you in the script, regarding what I
think it needs to do at that point.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

Jeff Darcy  wrote:

> > http://review.gluster.org/#/c/9970/ (Kotresh HR)
> > extras: Fix stop-all-gluster-processes.sh script

Theses are the NetBSD regression failures for which we got fixes merged
recently. Doesn't it just need to be rebased?

I re-enabled NetBSD regression, with voting disabled until the mess is
fixed.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

> I've done the first one.  I'll leave the others for you, so you
> embed the skill :)

Done.  Thanks!  I also canceled the now-superfluous jobs.  Maybe
in my Copious Spare Time(tm) I'll write a script to do this more
easily for other obviously-spurious regression results.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

On 1 Apr 2015, at 00:48, Jeff Darcy  wrote:
>> The following Gerrit patchsets were affected:
>> 
>>http://review.gluster.org/#/c/9557/ (Nandaja Varma)
>>changelog: Fixing buffer overrun coverity issues
>> 
>>http://review.gluster.org/#/c/9981/ (Pranith Kumar Karampuri)
>>cluster/ec: Refactor inode-writev
>> 
>>http://review.gluster.org/#/c/9970/ (Kotresh HR)
>>extras: Fix stop-all-gluster-processes.sh script
>> 
>>http://review.gluster.org/#/c/10075/ (Jeff Darcy)
>>socket: use OpenSSL multi-threading interfaces
>>this one nuked a CR+1 (from Kaleb) as well as V+1
>> 
>> In the absence of any other obvious way to fix this up, I'll
>> start new jobs for these momentarily.
> 
> Found another one:
> 
>http://review.gluster.org/#/c/9859/ (Raghavendra Talur)
>libglusterfs/syncop: Add xdata to all syncop calls
> 
> Started a new job for that one too.

If you have a build.gluster.org login, fixing this is pretty
simple.  Doesn't need the job to be re-run. ;)

All you need to do is change to the jenkins user (on
build.gluster.org) then run the command that's at the bottom
of the regression test run.

For example, looking at the regression run for the first
issue you have in the list:

  
http://build.gluster.org/job/rackspace-regression-2GB-triggered/6198/consoleFull

At the very end of the regression run, it shows this:

  ssh bu...@review.gluster.org gerrit review --message 
''\''http://build.gluster.org/job/rackspace-regression-2GB-triggered/6198/consoleFull
 : SUCCESS'\''' --project=glusterfs --verified=+1 --code-review=0 
ab9bdb54f89a6f8080f8b338b32b23698e9de515

Running that command from the jenkins user on build.gluster.org
resend the SUCCESS message to Gerrit:

 [homepc]$ ssh build.gluster.org

 [justin@build]$ sudo su - jenkins

 [jenkins@build]$ ssh bu...@review.gluster.org gerrit review --message 
''\''http://build.gluster.org/job/rackspace-regression-2GB-triggered/6198/consoleFull
 : SUCCESS'\''' --project=glusterfs --verified=+1 --code-review=0 
ab9bdb54f89a6f8080f8b338b32b23698e9de515
 [jenkins@build]$

And it's done. ;)

I've done the first one.  I'll leave the others for you, so you
embed the skill :)

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

> The following Gerrit patchsets were affected:
> 
> http://review.gluster.org/#/c/9557/ (Nandaja Varma)
> changelog: Fixing buffer overrun coverity issues
> 
> http://review.gluster.org/#/c/9981/ (Pranith Kumar Karampuri)
> cluster/ec: Refactor inode-writev
> 
> http://review.gluster.org/#/c/9970/ (Kotresh HR)
> extras: Fix stop-all-gluster-processes.sh script
> 
> http://review.gluster.org/#/c/10075/ (Jeff Darcy)
> socket: use OpenSSL multi-threading interfaces
> this one nuked a CR+1 (from Kaleb) as well as V+1
> 
> In the absence of any other obvious way to fix this up, I'll
> start new jobs for these momentarily.

Found another one:

http://review.gluster.org/#/c/9859/ (Raghavendra Talur)
libglusterfs/syncop: Add xdata to all syncop calls

Started a new job for that one too.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

The following Gerrit patchsets were affected:

http://review.gluster.org/#/c/9557/ (Nandaja Varma)
changelog: Fixing buffer overrun coverity issues

http://review.gluster.org/#/c/9981/ (Pranith Kumar Karampuri)
cluster/ec: Refactor inode-writev

http://review.gluster.org/#/c/9970/ (Kotresh HR)
extras: Fix stop-all-gluster-processes.sh script

http://review.gluster.org/#/c/10075/ (Jeff Darcy)
socket: use OpenSSL multi-threading interfaces
this one nuked a CR+1 (from Kaleb) as well as V+1

In the absence of any other obvious way to fix this up, I'll
start new jobs for these momentarily.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] rackspace-netbsd7-regression-triggered has been disabled

It was improperly clearing previously-set V+1 flags, even on success.  That is 
counterproductive in the most literal sense of the word.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Major function rename needed for uuid_*() API

Manu noticed a very ugly issue related to different implementations/API
for uuid_*() functions. It seems that not all OS implementations of uuid
functions have the same API. This is not a major issue, Gluster carries
contrib/uuid/ for this.

However, applications that trigger loading of libglusterfs.so through a
dlopen() call, might have uuid_* symbols loaded already. On Linux this
problem is likely not noticible, because the symbols from libuuid and
libglusterfs do not conflict. Unfortunately, on NetBSD the libc library
provides the same uuid_* symbols, but these expect different parameters.

The plan to clean this up, and fix the dlopen() loading on NetBSD is
like this:

1. replace/rename all uuid_*() functions with gf_uuid_*()
   NetBSD can use the contrib/uuid (with gf_ prefix) symbols

2. glue the OS implementations of uuid_*() functions into libglusterfs,
   replacing the gf_uuid_*() functions from contrib/uuid
   - this can be done gradually, contrib/uuid will become unneeded when
 a glue layer is available

3. once all OS glue layers are in place, remove contrib/uuid completely


Please keep an eye out for patch #8 from Many:

http://review.gluster.org/10017

For tracking this particular issue, Bug 1206587 was opened. The patch
above should make it for the 3.7 release, but points 2 and 3 do not have
the same high priority.

Thanks,
Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Extra overnight regression test run results

On Tue, Mar 31, 2015 at 01:33:49PM +0100, Justin Clift wrote:
> Hi all,
> 
> Ran 20 x regression test jobs on (severely resource
> constrained) 1GB Rackspace VM's last night (in addition to the
> 20x normal VM's ones also run).
> 
> The 1GB VM's have much much slower disk, only one virtual CPU,
> and 1/2 the RAM of our "standard" 2GB testing VMs.
> 
> These are the failure results:
> 
>   * 20 x tests/basic/mount-nfs-auth.t
> Failed test:  40
> 
> 100% fail rate. ;)

Jiffin is working on improving this, should be ready soon:

http://review.gluster.org/10047

Cheers,
Niels

> 
>   * 20 x tests/basic/uss.t
> Failed tests:  149, 151-153, 157-159
> 
> 100% fail rate
> 
>   * 11 x tests/bugs/distribute/bug-1117851.t
> Failed test:  15
> 
> 55% fail rate
> 
>   * 2 x tests/performance/open-behind.t
> Failed test:  17
> 
> 10% fail rate
> 
>   * 1 x tests/basic/afr/self-heald.t
> Failed tests:  13-14, 16, 19-29, 32-50, 52-65,
>67-75, 77, 79-81
> 
> 5% fail rate
> 
>   * 1 x tests/basic/afr/entry-self-heal.t
> Failed tests:  127-128
> 
> 5% fail rate
> 
>   * 1 x tests/features/trash.t
> Failed test:  57
> 
> 5% fail rate
> 
> Wouldn't surprise me if some/many of the failures are due to
> time out of various sorts in tests.  Very slow VMs. ;)
> 
> Also, most of the regression runs produced cores.  Here are
> the first two:
> 
>   http://ded.ninja/gluster/blk0/
>   http://ded.ninja/gluster/blk1/
> 
> Hoping someone has some time to check those quickly and see
> if there's anything useful in them or not.
> 
> (the hosts are all still online atm, shortly to be nuked)
> 
> Regards and best wishes,
> 
> Justin Clift
> 
> --
> GlusterFS - http://www.gluster.org
> 
> An open source, distributed file system scaling to several
> petabytes, and handling thousands of clients.
> 
> My personal twitter: twitter.com/realjustinclift
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel


pgpuI72yYMmXM.pgp
Description: PGP signature
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Extra overnight regression test run results

2015-03-31 Thread Nithya Balachandran

- Original Message -
From: "Justin Clift" 
To: "Gluster Devel" 
Sent: Tuesday, 31 March, 2015 6:03:49 PM
Subject: [Gluster-devel] Extra overnight regression test run results

Hi all,

Ran 20 x regression test jobs on (severely resource
constrained) 1GB Rackspace VM's last night (in addition to the
20x normal VM's ones also run).

The 1GB VM's have much much slower disk, only one virtual CPU,
and 1/2 the RAM of our "standard" 2GB testing VMs.

These are the failure results:

  * 20 x tests/basic/mount-nfs-auth.t
Failed test:  40

100% fail rate. ;)

  * 20 x tests/basic/uss.t
Failed tests:  149, 151-153, 157-159

100% fail rate

  * 11 x tests/bugs/distribute/bug-1117851.t
Failed test:  15

55% fail rate

Is the test output for the bug-1117851.t failure available anywhere? 

Nithya

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] About split-brain-resolution.t

Anuradha Talur  wrote:

> 1) I send a patch today to revert the .t and send it again along with the fix.
> Or...
> 2) Can this be failure be ignored till the fix is merged in?

We can ignore: NetBSD regresssion skips the test for now.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Review request for patch: libglusterfs/syncop: Add xdata to all syncop calls

2015-03-31 Thread Raghavendra Talur


Hi,

I have sent updated patch which adds xdata support to all syncop calls.
It adds xdata in both request and response path of syncop.

Considering that this patch has changes in many files,
I request a quick review and merge to avoid rebase issues.

Patch link http://review.gluster.org/#/c/9859/
Bug Id: https://bugzilla.redhat.com/show_bug.cgi?id=1158621

Thanks,
Raghavendra Talur

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [HEADS UP] NetBSD regression voting enabled

On Tue, Mar 31, 2015 at 06:21:00AM +0200, Emmanuel Dreyfus wrote:
> On success: verified=0, code-review=0
> On failure: verified=0, code-review=-2

But unfortunately this approach is broken, as Linux regression 
overrides NetBSD's result, even if it does not cast a vote for
code review. 

It seems we need to have two different users in Gerrit to
report NetBSD and Linux regressions. Opinion, anyone?

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Extra overnight regression test run results

2015-03-31 Thread Vijay Bellur


On 03/31/2015 06:48 PM, Shyam wrote:

On 03/31/2015 08:33 AM, Justin Clift wrote:

Hi all,

Ran 20 x regression test jobs on (severely resource
constrained) 1GB Rackspace VM's last night (in addition to the
20x normal VM's ones also run).

The 1GB VM's have much much slower disk, only one virtual CPU,
and 1/2 the RAM of our "standard" 2GB testing VMs.

These are the failure results:

   * 20 x tests/basic/mount-nfs-auth.t
 Failed test:  40

 100% fail rate. ;)

   * 20 x tests/basic/uss.t
 Failed tests:  149, 151-153, 157-159

 100% fail rate

   * 11 x tests/bugs/distribute/bug-1117851.t
 Failed test:  15

 55% fail rate

   * 2 x tests/performance/open-behind.t
 Failed test:  17

 10% fail rate

   * 1 x tests/basic/afr/self-heald.t
 Failed tests:  13-14, 16, 19-29, 32-50, 52-65,
67-75, 77, 79-81

 5% fail rate

   * 1 x tests/basic/afr/entry-self-heal.t
 Failed tests:  127-128

 5% fail rate

   * 1 x tests/features/trash.t
 Failed test:  57

 5% fail rate

Wouldn't surprise me if some/many of the failures are due to
time out of various sorts in tests.  Very slow VMs. ;)

Also, most of the regression runs produced cores.  Here are
the first two:

   http://ded.ninja/gluster/blk0/


There are 4 cores here, 3 pointing to the (by now hopefully) famous bug
#1195415. One of the cores exhibit a different stack etc. Need more
analysis to see what the issue could be here, core file: core.16937



Adding Pranith as he mentioned a possible root cause for this now famous 
bug :).


-Vijay
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Extra overnight regression test run results

2015-03-31 Thread Shyam


On 03/31/2015 08:33 AM, Justin Clift wrote:

Hi all,

Ran 20 x regression test jobs on (severely resource
constrained) 1GB Rackspace VM's last night (in addition to the
20x normal VM's ones also run).

The 1GB VM's have much much slower disk, only one virtual CPU,
and 1/2 the RAM of our "standard" 2GB testing VMs.

These are the failure results:

   * 20 x tests/basic/mount-nfs-auth.t
 Failed test:  40

 100% fail rate. ;)

   * 20 x tests/basic/uss.t
 Failed tests:  149, 151-153, 157-159

 100% fail rate

   * 11 x tests/bugs/distribute/bug-1117851.t
 Failed test:  15

 55% fail rate

   * 2 x tests/performance/open-behind.t
 Failed test:  17

 10% fail rate

   * 1 x tests/basic/afr/self-heald.t
 Failed tests:  13-14, 16, 19-29, 32-50, 52-65,
67-75, 77, 79-81

 5% fail rate

   * 1 x tests/basic/afr/entry-self-heal.t
 Failed tests:  127-128

 5% fail rate

   * 1 x tests/features/trash.t
 Failed test:  57

 5% fail rate

Wouldn't surprise me if some/many of the failures are due to
time out of various sorts in tests.  Very slow VMs. ;)

Also, most of the regression runs produced cores.  Here are
the first two:

   http://ded.ninja/gluster/blk0/


There are 4 cores here, 3 pointing to the (by now hopefully) famous bug 
#1195415. One of the cores exhibit a different stack etc. Need more 
analysis to see what the issue could be here, core file: core.16937



   http://ded.ninja/gluster/blk1/


There is a single core here, pointing to the above bug again.



Hoping someone has some time to check those quickly and see
if there's anything useful in them or not.

(the hosts are all still online atm, shortly to be nuked)

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Extra overnight regression test run results

Hi all,

Ran 20 x regression test jobs on (severely resource
constrained) 1GB Rackspace VM's last night (in addition to the
20x normal VM's ones also run).

The 1GB VM's have much much slower disk, only one virtual CPU,
and 1/2 the RAM of our "standard" 2GB testing VMs.

These are the failure results:

  * 20 x tests/basic/mount-nfs-auth.t
Failed test:  40

100% fail rate. ;)

  * 20 x tests/basic/uss.t
Failed tests:  149, 151-153, 157-159

100% fail rate

  * 11 x tests/bugs/distribute/bug-1117851.t
Failed test:  15

55% fail rate

  * 2 x tests/performance/open-behind.t
Failed test:  17

10% fail rate

  * 1 x tests/basic/afr/self-heald.t
Failed tests:  13-14, 16, 19-29, 32-50, 52-65,
   67-75, 77, 79-81

5% fail rate

  * 1 x tests/basic/afr/entry-self-heal.t
Failed tests:  127-128

5% fail rate

  * 1 x tests/features/trash.t
Failed test:  57

5% fail rate

Wouldn't surprise me if some/many of the failures are due to
time out of various sorts in tests.  Very slow VMs. ;)

Also, most of the regression runs produced cores.  Here are
the first two:

  http://ded.ninja/gluster/blk0/
  http://ded.ninja/gluster/blk1/

Hoping someone has some time to check those quickly and see
if there's anything useful in them or not.

(the hosts are all still online atm, shortly to be nuked)

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] About split-brain-resolution.t

2015-03-31 Thread Anuradha Talur



- Original Message -
> From: "Pranith Kumar Karampuri" 
> To: "Emmanuel Dreyfus" 
> Cc: gluster-devel@gluster.org, "Anuradha Talur" 
> Sent: Monday, 30 March, 2015 6:09:58 PM
> Subject: Re: [Gluster-devel] About split-brain-resolution.t
> 
> 
> On 03/30/2015 06:01 PM, Emmanuel Dreyfus wrote:
> > On Mon, Mar 30, 2015 at 05:44:23PM +0530, Pranith Kumar Karampuri wrote:
> >> Problem here is that ' inode_forget' is coming even before it gets to
> >> inspect the file. We initially thought we should 'ref' the inode when the
> >> user specifies the choice and 'unref' it at the time of 'finalize' or
> >> 'abort' of the operation. But that may lead to un-necessary leaks when the
> >> user forgets to do either finalize/abort the operation. One way to get
> >> around it is to ref the inode for some 'pre-determined time' when 'choice'
> >> is given.
> > That suggests the design is not finalized ans the implementation ought to
> > have unwanted behaviors. IMO the test should be retired until the design
> > and implementation is completed.
> I will work with Anuradha tomorrow about this one and either send a
> patch to remove the .t file or send the fix which makes things right.
> 
> Pranith
Hi Emmanuel,

I spoke with Pranith about the issue. I'll need 2 days to send a fix.
One of the two things can be done :
Either..
1) I send a patch today to revert the .t and send it again along with the fix.
Or...
2) Can this be failure be ignored till the fix is merged in?
> >
> 
> 

-- 
Thanks,
Anuradha.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] feature/trash and NetBSD

2015-03-31 Thread Anoop C S



On 03/31/2015 02:49 PM, Emmanuel Dreyfus wrote:
> On Tue, Mar 31, 2015 at 10:57:12AM +0530, Anoop C S wrote:
>> The above mentioned patch for skipping extended truncate
>> [http://review.gluster.org/#/c/9984/] got merged yesterday. And some
>> portability fixes for trash.t was included in your recently merged patch
>> [http://review.gluster.org/#/c/10033/]. Now we expect trash.t to run
>> more smoothly than before on NetBSD. Feel free to reply with outstanding
>> failures.
> 
> There are other problems, many tieming issues that can be addressed
> using the appropriate wrappers (see patch below). However, it still fails
> on test 56 which is about restarting the volume:
> 

Thanks for the patch.

> TEST 56 (line 207): gluster --mode=script --wignore volume start patchy1 force
> [09:12:53] ./tests/features/trash.t .. 56/65 
> not ok 56 
> 
> Could you have a look? You will find the test ready to run with my
> latest patches on nbslave76.cloud.gluster.org:/autobuild/glusterfs
> 

Thanks for spending your valuable time on trash.t. I will login and
check now. By the way, what is the password for root login?

--Anoop C S.

> diff --git a/tests/features/trash.t b/tests/features/trash.t
> index cbcff23..4546b57 100755
> --- a/tests/features/trash.t
> +++ b/tests/features/trash.t
> @@ -7,7 +7,11 @@ cleanup
>  
>  test_mount() {
>  glusterfs -s $H0 --volfile-id $V0 $M0 --attribute-timeout=0
> -test -d $M0/.trashcan
> +timeout=0
> +while [ $timeout -lt $PROCESS_UP_TIMEOUT ] ; do
> + timeout=$(( $timeout + 1 ))
> +test -d $M0/.trashcan && break
> +done
>  }
>  
>  start_vol() {
> @@ -15,19 +19,23 @@ start_vol() {
>  test_mount
>  }
>  
> -stop_vol() {
> -umount $M0
> -$CLI volume stop $V0
> -}
> -
>  create_files() {
>  echo 'Hi' > $1
>  echo 'Hai' > $2
>  }
>  
> -file_exists() {
> -test -e $B0/${V0}1/$1 -o -e $B0/${V0}2/$1
> -test -e $B0/${V0}1/$2 -o -e $B0/${V0}2/$2
> +file_exists () {
> +vol=$1
> + shift
> + for file in `ls $B0/${vol}1/$@ 2>/dev/null` ; do
> +test -e ${file} && { echo "Y"; return 0; }
> +done
> + for file in `ls $B0/${vol}2/$@ 2>/dev/null` ; do
> +test -e ${file} && { echo "Y"; return 0; }
> +done
> +
> + echo "N"
> + return 1;
>  }
>  
>  unlink_op() {
> @@ -85,7 +93,7 @@ EXPECT 'on' volinfo_field $V0 'features.trash'
>  
>  # files directly under mount point [13]
>  create_files $M0/file1 $M0/file2
> -TEST file_exists file1 file2
> +TEST file_exists $V0 file1 file2
>  
>  # perform unlink [14]
>  TEST unlink_op file1
> @@ -96,7 +104,7 @@ TEST truncate_op file2 4
>  # create files directory hierarchy and check [16]
>  mkdir -p $M0/1/2/3
>  create_files $M0/1/2/3/foo1 $M0/1/2/3/foo2
> -TEST file_exists 1/2/3/foo1 1/2/3/foo2
> +TEST file_exists $V0 1/2/3/foo1 1/2/3/foo2
>  
>  # perform unlink [17]
>  TEST unlink_op 1/2/3/foo1
> @@ -113,7 +121,7 @@ EXPECT '/a' volinfo_field $V0 
> 'features.trash-eliminate-path'
>  
>  # create two files and check [21]
>  create_files $M0/a/test1 $M0/a/test2
> -TEST file_exists a/test1 a/test2
> +TEST file_exists $V0 a/test1 a/test2
>  
>  # remove from eliminate pattern [22]
>  rm -f $M0/a/test1
> @@ -131,7 +139,7 @@ EXPECT 'on' volinfo_field $V0 'features.trash-internal-op'
>  
>  # again create two files and check [28]
>  create_files $M0/inop1 $M0/inop2
> -TEST file_exists inop1 inop2
> +TEST file_exists $V0 inop1 inop2
>  
>  # perform unlink [29]
>  TEST unlink_op inop1
> @@ -141,11 +149,12 @@ TEST truncate_op inop2 4
>  
>  # remove one brick and restart the volume [31-33]
>  TEST $CLI volume remove-brick $V0 $H0:$B0/${V0}2 force
> -TEST stop_vol
> +EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0
> +$CLI volume stop $V0
>  TEST start_vol
>  # again create two files and check [34]
>  create_files $M0/rebal1 $M0/rebal2
> -TEST file_exists rebal1 rebal2
> +TEST file_exists $V0 rebal1 rebal2
>  
>  # add one brick [35-36]
>  TEST $CLI volume add-brick $V0 $H0:$B0/${V0}3
> @@ -158,7 +167,8 @@ sleep 3
>  # check whether rebalance was succesful [38-40]
>  TEST [ -e $B0/${V0}3/rebal2 ]
>  TEST [ -e $B0/${V0}1/.trashcan/internal_op/rebal2* ]
> -TEST stop_vol
> +EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0
> +$CLI volume stop $V0
>  
>  # create a replicated volume [41]
>  TEST $CLI volume create $V1 replica 2 $H0:$B0/${V1}{1,2}
> @@ -187,9 +197,10 @@ touch $M1/self
>  TEST [ -e $B0/${V1}1/self -a -e $B0/${V1}2/self ]
>  
>  # kill one brick and delete the file from mount point [55]
> -kill `ps aux| grep glusterfsd | awk '{print $2}' | head -1`
> +kill `ps auxww| grep glusterfsd | awk '{print $2}' | head -1`
>  sleep 2
>  rm -f $M1/self
> +sleep 1
>  TEST [ -e $M1/.trashcan/self* ]
>  
>  # force start the volume and trigger the self-heal manually [56]
> @@ -197,7 +208,7 @@ TEST $CLI volume start $V1 force
>  sleep 3
>  
>  # check for the removed file in trash

Re: [Gluster-devel] feature/trash and NetBSD

On Tue, Mar 31, 2015 at 10:57:12AM +0530, Anoop C S wrote:
> The above mentioned patch for skipping extended truncate
> [http://review.gluster.org/#/c/9984/] got merged yesterday. And some
> portability fixes for trash.t was included in your recently merged patch
> [http://review.gluster.org/#/c/10033/]. Now we expect trash.t to run
> more smoothly than before on NetBSD. Feel free to reply with outstanding
> failures.

There are other problems, many tieming issues that can be addressed
using the appropriate wrappers (see patch below). However, it still fails
on test 56 which is about restarting the volume:

TEST 56 (line 207): gluster --mode=script --wignore volume start patchy1 force
[09:12:53] ./tests/features/trash.t .. 56/65 
not ok 56 

Could you have a look? You will find the test ready to run with my
latest patches on nbslave76.cloud.gluster.org:/autobuild/glusterfs

diff --git a/tests/features/trash.t b/tests/features/trash.t
index cbcff23..4546b57 100755
--- a/tests/features/trash.t
+++ b/tests/features/trash.t
@@ -7,7 +7,11 @@ cleanup
 
 test_mount() {
 glusterfs -s $H0 --volfile-id $V0 $M0 --attribute-timeout=0
-test -d $M0/.trashcan
+timeout=0
+while [ $timeout -lt $PROCESS_UP_TIMEOUT ] ; do
+   timeout=$(( $timeout + 1 ))
+test -d $M0/.trashcan && break
+done
 }
 
 start_vol() {
@@ -15,19 +19,23 @@ start_vol() {
 test_mount
 }
 
-stop_vol() {
-umount $M0
-$CLI volume stop $V0
-}
-
 create_files() {
 echo 'Hi' > $1
 echo 'Hai' > $2
 }
 
-file_exists() {
-test -e $B0/${V0}1/$1 -o -e $B0/${V0}2/$1
-test -e $B0/${V0}1/$2 -o -e $B0/${V0}2/$2
+file_exists () {
+vol=$1
+   shift
+   for file in `ls $B0/${vol}1/$@ 2>/dev/null` ; do
+test -e ${file} && { echo "Y"; return 0; }
+done
+   for file in `ls $B0/${vol}2/$@ 2>/dev/null` ; do
+test -e ${file} && { echo "Y"; return 0; }
+done
+
+   echo "N"
+   return 1;
 }
 
 unlink_op() {
@@ -85,7 +93,7 @@ EXPECT 'on' volinfo_field $V0 'features.trash'
 
 # files directly under mount point [13]
 create_files $M0/file1 $M0/file2
-TEST file_exists file1 file2
+TEST file_exists $V0 file1 file2
 
 # perform unlink [14]
 TEST unlink_op file1
@@ -96,7 +104,7 @@ TEST truncate_op file2 4
 # create files directory hierarchy and check [16]
 mkdir -p $M0/1/2/3
 create_files $M0/1/2/3/foo1 $M0/1/2/3/foo2
-TEST file_exists 1/2/3/foo1 1/2/3/foo2
+TEST file_exists $V0 1/2/3/foo1 1/2/3/foo2
 
 # perform unlink [17]
 TEST unlink_op 1/2/3/foo1
@@ -113,7 +121,7 @@ EXPECT '/a' volinfo_field $V0 
'features.trash-eliminate-path'
 
 # create two files and check [21]
 create_files $M0/a/test1 $M0/a/test2
-TEST file_exists a/test1 a/test2
+TEST file_exists $V0 a/test1 a/test2
 
 # remove from eliminate pattern [22]
 rm -f $M0/a/test1
@@ -131,7 +139,7 @@ EXPECT 'on' volinfo_field $V0 'features.trash-internal-op'
 
 # again create two files and check [28]
 create_files $M0/inop1 $M0/inop2
-TEST file_exists inop1 inop2
+TEST file_exists $V0 inop1 inop2
 
 # perform unlink [29]
 TEST unlink_op inop1
@@ -141,11 +149,12 @@ TEST truncate_op inop2 4
 
 # remove one brick and restart the volume [31-33]
 TEST $CLI volume remove-brick $V0 $H0:$B0/${V0}2 force
-TEST stop_vol
+EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0
+$CLI volume stop $V0
 TEST start_vol
 # again create two files and check [34]
 create_files $M0/rebal1 $M0/rebal2
-TEST file_exists rebal1 rebal2
+TEST file_exists $V0 rebal1 rebal2
 
 # add one brick [35-36]
 TEST $CLI volume add-brick $V0 $H0:$B0/${V0}3
@@ -158,7 +167,8 @@ sleep 3
 # check whether rebalance was succesful [38-40]
 TEST [ -e $B0/${V0}3/rebal2 ]
 TEST [ -e $B0/${V0}1/.trashcan/internal_op/rebal2* ]
-TEST stop_vol
+EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" force_umount $M0
+$CLI volume stop $V0
 
 # create a replicated volume [41]
 TEST $CLI volume create $V1 replica 2 $H0:$B0/${V1}{1,2}
@@ -187,9 +197,10 @@ touch $M1/self
 TEST [ -e $B0/${V1}1/self -a -e $B0/${V1}2/self ]
 
 # kill one brick and delete the file from mount point [55]
-kill `ps aux| grep glusterfsd | awk '{print $2}' | head -1`
+kill `ps auxww| grep glusterfsd | awk '{print $2}' | head -1`
 sleep 2
 rm -f $M1/self
+sleep 1
 TEST [ -e $M1/.trashcan/self* ]
 
 # force start the volume and trigger the self-heal manually [56]
@@ -197,7 +208,7 @@ TEST $CLI volume start $V1 force
 sleep 3
 
 # check for the removed file in trashcan [57]
-TEST [ -e $B0/${V1}1/.trashcan/internal_op/self* -o -e 
$B0/${V1}2/.trashcan/internal_op/self* ]
+EXPECT_WITHIN $HEAL_TIMEOUT "Y" file_exists $V1 .trashcan/internal_op/self*
 
 # check renaming of trash directory through cli [58-62]
 TEST $CLI volume set $V0 trash-dir abc


-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Rebalance improvement design

2015-03-31 Thread Susant Palai

Hi,
Posted patch for rebalance improvement here:
http://review.gluster.org/#/c/9657/ .
You can find the feature page here:
http://www.gluster.org/community/documentation/index.php/Features/improve_rebalance_performance

The current patch address two part of the design proposed.
1. Rebalance multiple files in parallel
2. Crawl only bricks that belong to the current node

Brief design explanation for the above two points.

1. Rebalance multiple files in parallel:
-

The existing rebalance engine is single threaded. Hence, introduced
multiple threads which will be running parallel to the crawler.
The current rebalance migration is converted to a "Producer-Consumer"
frame work.
Where Producer is : Crawler
Consumer is : Migrating Threads

Crawler: Crawler is the main thread. The job of the crawler is now
limited to fix-layout of each directory and add the files
which are eligible for the migration to a global queue. Hence,
the crawler will not be "blocked" by migration process.

Producer: Producer will monitor the global queue. If any file is added
to this queue, it will dqueue that entry and migrate the file.
Currently 15 migration threads are spawned at the beginning of
the rebalance process. Hence, multiple file migration
happens in parallel.

2. Crawl only bricks that belong to the current node:
--

As rebalance process is spawned per node, it migrates only the files
that belongs to it's own node for the sake of load
balancing. But it also reads entries from the whole cluster, which
is not necessary as readdir hits other nodes.

New Design:
As part of the new design the rebalancer decides the subvols that
are local to the rebalancer node by checking the node-uuid of
root directory prior to the crawler starts. Hence, readdir won't hit
the whole cluster as it has already the context of
local subvols and also node-uuid request for each file can be
avoided. This makes the rebalance process "more scalable".

Requesting reviews asap.

Regards,
Susant

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Additional pre-post checks(WAS: Responsibilities and expectations of our maintainers)

2015-03-31 Thread Kaushal M

On Tue, Mar 31, 2015 at 12:49 PM, Niels de Vos  wrote:
>
> On Tue, Mar 31, 2015 at 12:14:29PM +0530, Vijay Bellur wrote:
> > On 03/28/2015 02:08 PM, Emmanuel Dreyfus wrote:
> > >Pranith Kumar Karampuri  wrote:
> > >
> > >>Emmanuel,
> > >>  What can we do to make it vote -2 when it fails? Things will
> > >>automatically fall in place if it gives -2.
> > >
> > >I will do this once I will have recovered. The changelog change broke
> > >regression for weeks, and now we have a fix for it I discover many other
> > >poblems have crop.
> > >
> > >While there, to anyone:
> > >- dd bs=1M is not portable. Use
> > >   dd bs=1024k
> > >- echo 3 > /proc/sys/vm/drop_caches is not portable. use instead this
> > >command that fails but flushes inodes first.
> > >   ( cd $M0 && umount $M0 )
> > >- umount $N0 brings many problems, use instead
> > >   EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" umount_nfs $N0
> > >
> >
> >
> > I wonder if we can add these as checks to flag errors in checkpatch.pl so
> > that we nip these problems off even before they appear for review?
>
> That would surely be good. I heard that Kaushal understands and can
> write Perl ;-)
>

This is not true. I can understand the hieroglyphics with some
difficulty, but I sure cannot write it.
But if needed, I could try.

>
> While on the topic of checkpatch.pl, having a check for empty commit
> messages and multi-line subjects would be nice too.
>
> Niels
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Additional pre-post checks(WAS: Responsibilities and expectations of our maintainers)

On Tue, Mar 31, 2015 at 12:14:29PM +0530, Vijay Bellur wrote:
> On 03/28/2015 02:08 PM, Emmanuel Dreyfus wrote:
> >Pranith Kumar Karampuri  wrote:
> >
> >>Emmanuel,
> >>  What can we do to make it vote -2 when it fails? Things will
> >>automatically fall in place if it gives -2.
> >
> >I will do this once I will have recovered. The changelog change broke
> >regression for weeks, and now we have a fix for it I discover many other
> >poblems have crop.
> >
> >While there, to anyone:
> >- dd bs=1M is not portable. Use
> >   dd bs=1024k
> >- echo 3 > /proc/sys/vm/drop_caches is not portable. use instead this
> >command that fails but flushes inodes first.
> >   ( cd $M0 && umount $M0 )
> >- umount $N0 brings many problems, use instead
> >   EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" umount_nfs $N0
> >
> 
> 
> I wonder if we can add these as checks to flag errors in checkpatch.pl so
> that we nip these problems off even before they appear for review?

That would surely be good. I heard that Kaushal understands and can
write Perl ;-)

While on the topic of checkpatch.pl, having a check for empty commit
messages and multi-line subjects would be nice too.

Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Security hardening RELRO & PIE flags