Re: [Gluster-devel] GlusterFS-3.4.4beta is slipping
On 05/15/2014 08:37 AM, Pranith Kumar Karampuri wrote:
>> ----- Original Message -----
>> From: "Kaleb S. KEITHLEY"
>> To: "Gluster Devel"
>> Sent: Wednesday, May 14, 2014 5:04:02 PM
>> Subject: [Gluster-devel] GlusterFS-3.4.4beta is slipping
>>
>> At last week's community meeting we tentatively agreed that today — May
>> 14th — we would ship 3.4.4beta. Three changes for 3.4.4 need to be
>> reviewed before they can be merged:
>>
>> 1. Ubuntu code audit results (blocking inclusion in Ubuntu Main repo):
>>    https://bugzilla.redhat.com/show_bug.cgi?id=1086460
>>    http://review.gluster.org/#/c/7583/
>>    (also http://review.gluster.org/#/c/7605/ for 3.5.1)
>
> Done with the review. Please address the comments and re-submit.
>
>> 2. Addition of a new server after an upgrade from 3.3 results in the
>>    peer being rejected:
>>    https://bugzilla.redhat.com/show_bug.cgi?id=1090298
>>    http://review.gluster.org/#/c/7729/
>
> Kp may have a better idea, CCed him.

Abandoned this patch. The reason and the workaround are detailed in the
review comments [1]. TL;DR fix: after upgrading all nodes to 3.4, and
before doing any new peer probes, do a dummy volume set operation, viz.
`gluster volume set <volname> brick-log-level INFO`, to update the
checksum.

The documentation for upgrading to 3.4 [2] seems to be pointing to Vijay's
blog. The above fix could be included in the steps mentioned there.
(CC'ing Vijay)

Regards,
Ravi

[1] http://review.gluster.org/#/c/7729/1/xlators/mgmt/glusterd/src/glusterd-store.c
[2] http://www.gluster.org/community/documentation/index.php/Main_Page#GlusterFS_3.4

>> 3. Disabling NFS causes E level errors in nfs.log:
>>    https://bugzilla.redhat.com/show_bug.cgi?id=1095330
>>    http://review.gluster.org/#/c/7699/
>
> Reviewed it and merged. I do not maintain the nfs component, but the
> patch is the same as the one in 3.5 except for the tabs-to-spaces
> conversion. I tested it on my machine before merging.
>
>> And Joe Julian added https://bugzilla.redhat.com/show_bug.cgi?id=1095596
>> as a blocker for 3.4.4 — "Stick to IANA standard while allocating brick
>> ports". This needs a port or rebase of
>> http://review.gluster.com/#/c/3339/
>>
>> --
>> Kaleb

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
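The TL;DR workaround above can be scripted. A minimal sketch follows; the
volume name and the GLUSTER override hook are illustrative assumptions, not
part of the original mail.

```shell
#!/bin/sh
# Sketch of the post-upgrade workaround described above.  Run once per
# volume, on any node, after ALL nodes are upgraded to 3.4 and BEFORE
# any new peer probe.  GLUSTER is an override hook for dry runs
# (an assumption here, not from the mail).
GLUSTER=${GLUSTER:-gluster}

refresh_volume_checksum() {
    # A harmless volume-set operation rewrites the volume info file and
    # its checksum, so a freshly probed 3.4 peer is not rejected.
    $GLUSTER volume set "$1" brick-log-level INFO
}
```

Usage would be `refresh_volume_checksum myvol` for each volume on the
cluster.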
Re: [Gluster-devel] Borking Gluster
On Fri, May 16, 2014 at 12:06 AM, Krishnan Parthasarathi wrote:
> Which version of gluster are you using?

3.5.0 on CentOS 6.5 x86_64. Sorry I forgot to mention it.

Cheers,
James

> thanks,
> Krish
>
> ----- Original Message -----
>> Due to some weird automation things, I noticed the following:
>>
>> Given a cluster of hosts A, B, C, D, it turns out that if you restart
>> glusterd on host B while you are running volume create on host A, this
>> can cause host B to be borked. This means: glusterd will refuse to
>> start, and the only fix I found was to delete the volume data from it
>> and re-create the volume. Not sure if this is useful or not, but
>> reproducing this is pretty easy in case this uncovers a code path that
>> isn't working properly.
>>
>> HTH,
>> James
Re: [Gluster-devel] Borking Gluster
Which version of gluster are you using?

thanks,
Krish

----- Original Message -----
> Due to some weird automation things, I noticed the following:
>
> Given a cluster of hosts A, B, C, D, it turns out that if you restart
> glusterd on host B while you are running volume create on host A, this
> can cause host B to be borked. This means: glusterd will refuse to
> start, and the only fix I found was to delete the volume data from it
> and re-create the volume. Not sure if this is useful or not, but
> reproducing this is pretty easy in case this uncovers a code path that
> isn't working properly.
>
> HTH,
> James
[Gluster-devel] Volume failed to create (but did)
Hi,

When automatically building volumes, a volume create failed:

    volume create: puppet: failed: Commit failed on ----. Please check log file for details.

The funny thing was that 'gluster volume info' showed a normal-looking
volume, and starting it worked fine. Attached are all the logs. Hopefully
someone can decipher this, and maybe kill a gluster bug.

HTH,
James

PS: Cluster was a two-host, Replica=2, single volume, with two disks per
host, all running in VMs.

Attachments: glusterfs.annex1.tar.gz, glusterfs.annex2.tar.gz (GNU Zip
compressed data)
[Gluster-devel] Borking Gluster
Due to some weird automation things, I noticed the following:

Given a cluster of hosts A, B, C, D, it turns out that if you restart
glusterd on host B while you are running volume create on host A, this can
cause host B to be borked. This means: glusterd will refuse to start, and
the only fix I found was to delete the volume data from it and re-create
the volume. Not sure if this is useful or not, but reproducing this is
pretty easy in case this uncovers a code path that isn't working properly.

HTH,
James
[Gluster-devel] Gluster SELinux attributes
Hey Gluster.

I realized that things might have changed at some point, so I was hoping
someone could help me get this straight. Who can enumerate, or provide a
reference to, a list of all the files that should have specific SELinux
attributes in Gluster, and what each of them should be?

Currently, /var/lib/glusterd/glusterd.info has the seluser set as
'system_u', but something keeps changing it to 'unconfined_u', so perhaps
that is now what is correct. (It didn't used to be.) What other files need
attrs, and what should they be?

Thanks,
James
Re: [Gluster-devel] Need inputs for command deprecation output
On 05/16/2014 07:23 AM, Pranith Kumar Karampuri wrote:
> ----- Original Message -----
> From: "Ravishankar N"
> To: "Pranith Kumar Karampuri", "Gluster Devel"
> Sent: Friday, May 16, 2014 7:15:58 AM
> Subject: Re: [Gluster-devel] Need inputs for command deprecation output
>
>> On 05/16/2014 06:25 AM, Pranith Kumar Karampuri wrote:
>>> Hi,
>>> As part of changing the behaviour of the 'volume heal' commands, I
>>> want the commands to show the following output. Any feedback on making
>>> them better would be awesome :-).
>>>
>>> root@pranithk-laptop - ~
>>> 06:20:10 :) ⚡ gluster volume heal r2 info healed
>>> This command has been deprecated
>>>
>>> root@pranithk-laptop - ~
>>> 06:20:13 :( ⚡ gluster volume heal r2 info heal-failed
>>> This command has been deprecated
>>
>> When a command is deprecated, it still works the way it did but gives
>> out a warning about it not being maintained and possible alternatives
>> to it. If I understand http://review.gluster.org/#/c/7766/ correctly,
>> we are not supporting these commands any more, in which case the right
>> message would be "Command not supported".
>
> I am wondering if we should even let the command be sent to
> self-heal-daemons from glusterd. How about
>
> 06:20:10 :) ⚡ gluster volume heal r2 info healed
> Command not supported.

Makes sense; +1

> Instead of
>
> 06:20:10 :) ⚡ gluster volume heal r2 info healed
> brick: brick-1
> status: Command not supported
> brick: brick-2
> status: Command not supported
>
> Pranith

-Ravi
Re: [Gluster-devel] Need inputs for command deprecation output
----- Original Message -----
> From: "Ravishankar N"
> To: "Pranith Kumar Karampuri", "Gluster Devel"
> Sent: Friday, May 16, 2014 7:15:58 AM
> Subject: Re: [Gluster-devel] Need inputs for command deprecation output
>
> On 05/16/2014 06:25 AM, Pranith Kumar Karampuri wrote:
>> Hi,
>> As part of changing the behaviour of the 'volume heal' commands, I want
>> the commands to show the following output. Any feedback on making them
>> better would be awesome :-).
>>
>> root@pranithk-laptop - ~
>> 06:20:10 :) ⚡ gluster volume heal r2 info healed
>> This command has been deprecated
>>
>> root@pranithk-laptop - ~
>> 06:20:13 :( ⚡ gluster volume heal r2 info heal-failed
>> This command has been deprecated
>
> When a command is deprecated, it still works the way it did but gives out
> a warning about it not being maintained and possible alternatives to it.
> If I understand http://review.gluster.org/#/c/7766/ correctly, we are
> not supporting these commands any more, in which case the right message
> would be "Command not supported".

I am wondering if we should even let the command be sent to
self-heal-daemons from glusterd. How about

06:20:10 :) ⚡ gluster volume heal r2 info healed
Command not supported.

Instead of

06:20:10 :) ⚡ gluster volume heal r2 info healed
brick: brick-1
status: Command not supported
brick: brick-2
status: Command not supported

Pranith

> -Ravi
>> Pranith.
Re: [Gluster-devel] Need inputs for command deprecation output
On 05/16/2014 06:25 AM, Pranith Kumar Karampuri wrote:
> Hi,
> As part of changing the behaviour of the 'volume heal' commands, I want
> the commands to show the following output. Any feedback on making them
> better would be awesome :-).
>
> root@pranithk-laptop - ~
> 06:20:10 :) ⚡ gluster volume heal r2 info healed
> This command has been deprecated
>
> root@pranithk-laptop - ~
> 06:20:13 :( ⚡ gluster volume heal r2 info heal-failed
> This command has been deprecated

When a command is deprecated, it still works the way it did but gives out a
warning about it not being maintained and possible alternatives to it. If I
understand http://review.gluster.org/#/c/7766/ correctly, we are not
supporting these commands any more, in which case the right message would
be "Command not supported".

-Ravi

> Pranith.
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
----- Original Message -----
> From: "Anand Avati"
> To: "Pranith Kumar Karampuri"
> Cc: "Gluster Devel"
> Sent: Friday, May 16, 2014 6:30:44 AM
> Subject: Re: [Gluster-devel] Spurious failures because of nfs and
> snapshots
>
> On Thu, May 15, 2014 at 5:49 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>> hi,
>> The latest build I fired for review.gluster.com/7766
>> (http://build.gluster.org/job/regression/4443/console) failed because
>> of a spurious failure. The script doesn't wait for the nfs export to
>> be available. I fixed that, but interestingly I found quite a few
>> scripts with the same problem. Some of the scripts rely on 'sleep 5',
>> which could also lead to spurious failures if the export is not
>> available in 5 seconds. We found that waiting for 20 seconds is
>> better, but 'sleep 20' would unnecessarily delay the build execution.
>> So if you are going to write any scripts which have to do nfs mounts,
>> please do it the following way:
>>
>> EXPECT_WITHIN 20 "1" is_nfs_export_available;
>> TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;
>
> Please always also add mount -o soft,intr in the regression scripts for
> mounting nfs. It becomes so much easier to clean up any "hung" mess. We
> probably need an NFS mounting helper function which can be called like:
>
> TEST mount_nfs $H0:/$V0 $N0;

Will do. There seem to be some extra options (noac etc.) for some of
these, so I will add one more argument for any extra options for the nfs
mount.

Pranith

> Thanks
> Avati
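The mount_nfs helper discussed above, with the extra options argument
Pranith proposes, could look like the sketch below. This is an
illustration, not the helper that actually landed in the test framework;
the MOUNT override hook is an assumption added so the function can be
exercised without root or a live NFS server.

```shell
#!/bin/sh
# Hedged sketch of a mount_nfs test helper: always adds soft,intr (per
# Avati's suggestion) plus vers=3, and accepts optional extra options
# such as noac as a third argument.
MOUNT=${MOUNT:-mount}   # override hook for dry runs (assumption)

mount_nfs() {
    # $1: export (host:/volume), $2: mount point, $3: optional extra opts
    _opts="soft,intr,vers=3"
    [ -n "$3" ] && _opts="$_opts,$3"
    $MOUNT -t nfs -o "$_opts" "$1" "$2"
}
```

A test would then call `TEST mount_nfs $H0:/$V0 $N0 noac` instead of
spelling out the mount options each time.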
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
On Thu, May 15, 2014 at 5:49 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:
> hi,
> The latest build I fired for review.gluster.com/7766
> (http://build.gluster.org/job/regression/4443/console) failed because of
> a spurious failure. The script doesn't wait for the nfs export to be
> available. I fixed that, but interestingly I found quite a few scripts
> with the same problem. Some of the scripts rely on 'sleep 5', which
> could also lead to spurious failures if the export is not available in
> 5 seconds. We found that waiting for 20 seconds is better, but
> 'sleep 20' would unnecessarily delay the build execution. So if you are
> going to write any scripts which have to do nfs mounts, please do it
> the following way:
>
> EXPECT_WITHIN 20 "1" is_nfs_export_available;
> TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;

Please always also add mount -o soft,intr in the regression scripts for
mounting nfs. It becomes so much easier to clean up any "hung" mess. We
probably need an NFS mounting helper function which can be called like:

TEST mount_nfs $H0:/$V0 $N0;

Thanks
Avati
[Gluster-devel] Need inputs for command deprecation output
Hi,

As part of changing the behaviour of the 'volume heal' commands, I want
the commands to show the following output. Any feedback on making them
better would be awesome :-).

root@pranithk-laptop - ~
06:20:10 :) ⚡ gluster volume heal r2 info healed
This command has been deprecated

root@pranithk-laptop - ~
06:20:13 :( ⚡ gluster volume heal r2 info heal-failed
This command has been deprecated

Pranith.
[Gluster-devel] Spurious failures because of nfs and snapshots
hi,

The latest build I fired for review.gluster.com/7766
(http://build.gluster.org/job/regression/4443/console) failed because of a
spurious failure. The script doesn't wait for the nfs export to be
available. I fixed that, but interestingly I found quite a few scripts with
the same problem. Some of the scripts rely on 'sleep 5', which could also
lead to spurious failures if the export is not available in 5 seconds. We
found that waiting for 20 seconds is better, but 'sleep 20' would
unnecessarily delay the build execution. So if you are going to write any
scripts which have to do nfs mounts, please do it the following way:

EXPECT_WITHIN 20 "1" is_nfs_export_available;
TEST mount -t nfs -o vers=3 $H0:/$V0 $N0;

Please review http://review.gluster.com/7773 :-)

I saw one more spurious failure in a snapshot-related script,
tests/bugs/bug-1090042.t, on the next build fired by Niels. Joseph (CCed)
is debugging it. He agreed to share what he finds so that we won't
introduce similar bugs in future. I encourage you to share what you fix to
prevent spurious failures in future.

Thanks
Pranith
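The idea behind EXPECT_WITHIN, as opposed to a fixed sleep, is to poll a
predicate once per second up to a timeout. A minimal sketch follows; the
real EXPECT_WITHIN in the test framework may differ, and the function name
`expect_within` here is ours for illustration.

```shell
#!/bin/sh
# Hedged sketch of the polling pattern: succeed as soon as the command
# prints the expected value, fail only after the timeout expires.
expect_within() {
    # expect_within <timeout-secs> <expected-output> <command...>
    _t=$1 _want=$2
    shift 2
    while [ "$_t" -gt 0 ]; do
        # Re-run the check each second instead of sleeping a fixed 20s.
        [ "$("$@")" = "$_want" ] && return 0
        sleep 1
        _t=$((_t - 1))
    done
    return 1
}
```

With a predicate that prints 1 when the export is up, the call mirrors the
mail: `expect_within 20 1 is_nfs_export_available` returns immediately once
the export appears, rather than always burning 20 seconds.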
Re: [Gluster-devel] [Gluster-infra] Progress report for regression tests in Rackspace
On 05/15/2014 09:08 PM, Luis Pabon wrote:
> Should we create bugs for each of these, and divide-and-conquer?

That could be of help. The first level of consolidation done by Justin
(with the frequency of test failures) might be a good list to start with.
If we observe more failures as part of ongoing regression runs, let us open
new bugs and have them cleaned up.

-Vijay

> - Luis
>
> On 05/15/2014 10:27 AM, Niels de Vos wrote:
>> On Thu, May 15, 2014 at 06:05:00PM +0530, Vijay Bellur wrote:
>>> On 04/30/2014 07:03 PM, Justin Clift wrote:
>>>> Hi us,
>>>>
>>>> Was trying out the GlusterFS regression tests in Rackspace VMs last
>>>> night for each of the release-3.4, release-3.5, and master branches.
>>>>
>>>> The regression test is just a run of "run-tests.sh", from a git
>>>> checkout of the appropriate branch.
>>>>
>>>> The good news is we're adding a lot of testing code with each
>>>> release:
>>>>
>>>>  * release-3.4 - 6303 lines (~30 mins to run test)
>>>>  * release-3.5 - 9776 lines (~85 mins to run test)
>>>>  * master - 11660 lines (~90 mins to run test)
>>>>
>>>> (lines counted using:
>>>>  $ find tests -type f -iname "*.t" -exec cat {} >> a \;; wc -l a; rm -f a)
>>>>
>>>> The bad news is the tests only "kind of" pass now. I say kind of
>>>> because although the regression run *can* pass for each of these
>>>> branches, it's inconsistent. :(
>>>>
>>>> Results from testing overnight:
>>>>
>>>>  * release-3.4 - 20 runs - 17 PASS, 3 FAIL. 85% success.
>>>>    * bug-857330/normal.t failed in one run
>>>>    * bug-887098-gmount-crash.t failed in one run
>>>>    * bug-857330/normal.t failed in one run
>>>>
>>>>  * release-3.5 - 20 runs, 18 PASS, 2 FAIL. 90% success.
>>>>    * bug-857330/xml.t failed in one run
>>>>    * bug-1004744.t failed in another run (same vm for both failures)
>>>>
>>>>  * master - 20 runs, 6 PASS, 14 FAIL. 30% success.
>>>>    * bug-1070734.t failed in one run
>>>>    * bug-1087198.t & bug-860663.t failed in one run (same vm as
>>>>      bug-1070734.t failure above)
>>>>    * bug-1087198.t & bug-857330/normal.t failed in one run (new vm,
>>>>      a subsequent run on same vm passed)
>>>>    * bug-1087198.t & bug-948686.t failed in one run (new vm)
>>>>    * bug-1070734.t & bug-1087198.t failed in one run (new vm)
>>>>    * bug-860663.t failed in one run
>>>>    * bug-1023974.t & bug-1087198.t & bug-948686.t failed in one run
>>>>      (new vm)
>>>>    * bug-1004744.t & bug-1023974.t & bug-1087198.t & bug-948686.t
>>>>      failed in one run (new vm)
>>>>    * bug-948686.t failed in one run (new vm)
>>>>    * bug-1070734.t failed in one run (new vm)
>>>>    * bug-1023974.t failed in one run (new vm)
>>>>    * bug-1087198.t & bug-948686.t failed in one run (new vm)
>>>>    * bug-1070734.t failed in one run (new vm)
>>>>    * bug-1087198.t failed in one run (new vm)
>>>>
>>>> The occasional failing tests aren't completely random, suggesting
>>>> something is going on. Possible race conditions maybe? (no idea).
>>>>
>>>>  * 8 failures - bug-1087198.t
>>>>  * 5 failures - bug-948686.t
>>>>  * 4 failures - bug-1070734.t
>>>>  * 3 failures - bug-1023974.t
>>>>  * 3 failures - bug-857330/normal.t
>>>>  * 2 failures - bug-860663.t
>>>>  * 2 failures - bug-1004744.t
>>>>  * 1 failure - bug-857330/xml.t
>>>>  * 1 failure - bug-887098-gmount-crash.t
>>>>
>>>> Anyone have suggestions on how to make this work reliably?
>>>
>>> I think it would be a good idea to arrive at a list of test cases that
>>> are failing at random and assign owners to address them (default owner
>>> being the submitter of the test case). In addition to these, I have
>>> also seen tests like bd.t and xml.t fail pretty regularly.
>>>
>>> Justin - can we publish a consolidated list of regression tests that
>>> fail and owners for them on an etherpad or similar?
>>>
>>> Fixing these test cases will enable us to bring in more jenkins
>>> instances for parallel regression runs etc. and will also provide
>>> more determinism for our regression tests. Your help to address the
>>> regression test suite problems will be greatly appreciated!
>>
>> Indeed, getting the regression tests stable seems like a blocker before
>> we can move to a scalable Jenkins solution. Unfortunately, it may not
>> be trivial to debug these test cases... Any suggestion on capturing
>> useful data that helps in figuring out why the test cases don't pass?
>>
>> Thanks,
>> Niels
Re: [Gluster-devel] [Gluster-infra] Progress report for regression tests in Rackspace
On 05/15/2014 07:57 PM, Niels de Vos wrote:
> On Thu, May 15, 2014 at 06:05:00PM +0530, Vijay Bellur wrote:
>> On 04/30/2014 07:03 PM, Justin Clift wrote:
>>> Hi us,
>>>
>>> Was trying out the GlusterFS regression tests in Rackspace VMs last
>>> night for each of the release-3.4, release-3.5, and master branches.
>>>
>>> The regression test is just a run of "run-tests.sh", from a git
>>> checkout of the appropriate branch.
>>>
>>> The good news is we're adding a lot of testing code with each release:
>>>
>>>  * release-3.4 - 6303 lines (~30 mins to run test)
>>>  * release-3.5 - 9776 lines (~85 mins to run test)
>>>  * master - 11660 lines (~90 mins to run test)
>>>
>>> (lines counted using:
>>>  $ find tests -type f -iname "*.t" -exec cat {} >> a \;; wc -l a; rm -f a)
>>>
>>> The bad news is the tests only "kind of" pass now. I say kind of
>>> because although the regression run *can* pass for each of these
>>> branches, it's inconsistent. :(
>>>
>>> Results from testing overnight:
>>>
>>>  * release-3.4 - 20 runs - 17 PASS, 3 FAIL. 85% success.
>>>    * bug-857330/normal.t failed in one run
>>>    * bug-887098-gmount-crash.t failed in one run
>>>    * bug-857330/normal.t failed in one run
>>>
>>>  * release-3.5 - 20 runs, 18 PASS, 2 FAIL. 90% success.
>>>    * bug-857330/xml.t failed in one run
>>>    * bug-1004744.t failed in another run (same vm for both failures)
>>>
>>>  * master - 20 runs, 6 PASS, 14 FAIL. 30% success.
>>>    * bug-1070734.t failed in one run
>>>    * bug-1087198.t & bug-860663.t failed in one run (same vm as
>>>      bug-1070734.t failure above)
>>>    * bug-1087198.t & bug-857330/normal.t failed in one run (new vm, a
>>>      subsequent run on same vm passed)
>>>    * bug-1087198.t & bug-948686.t failed in one run (new vm)
>>>    * bug-1070734.t & bug-1087198.t failed in one run (new vm)
>>>    * bug-860663.t failed in one run
>>>    * bug-1023974.t & bug-1087198.t & bug-948686.t failed in one run
>>>      (new vm)
>>>    * bug-1004744.t & bug-1023974.t & bug-1087198.t & bug-948686.t
>>>      failed in one run (new vm)
>>>    * bug-948686.t failed in one run (new vm)
>>>    * bug-1070734.t failed in one run (new vm)
>>>    * bug-1023974.t failed in one run (new vm)
>>>    * bug-1087198.t & bug-948686.t failed in one run (new vm)
>>>    * bug-1070734.t failed in one run (new vm)
>>>    * bug-1087198.t failed in one run (new vm)
>>>
>>> The occasional failing tests aren't completely random, suggesting
>>> something is going on. Possible race conditions maybe? (no idea).
>>>
>>>  * 8 failures - bug-1087198.t
>>>  * 5 failures - bug-948686.t
>>>  * 4 failures - bug-1070734.t
>>>  * 3 failures - bug-1023974.t
>>>  * 3 failures - bug-857330/normal.t
>>>  * 2 failures - bug-860663.t
>>>  * 2 failures - bug-1004744.t
>>>  * 1 failure - bug-857330/xml.t
>>>  * 1 failure - bug-887098-gmount-crash.t
>>>
>>> Anyone have suggestions on how to make this work reliably?
>>
>> I think it would be a good idea to arrive at a list of test cases that
>> are failing at random and assign owners to address them (default owner
>> being the submitter of the test case). In addition to these, I have
>> also seen tests like bd.t and xml.t fail pretty regularly.
>>
>> Justin - can we publish a consolidated list of regression tests that
>> fail and owners for them on an etherpad or similar?
>>
>> Fixing these test cases will enable us to bring in more jenkins
>> instances for parallel regression runs etc. and will also provide more
>> determinism for our regression tests. Your help to address the
>> regression test suite problems will be greatly appreciated!
>
> Indeed, getting the regression tests stable seems like a blocker before
> we can move to a scalable Jenkins solution. Unfortunately, it may not be
> trivial to debug these test cases... Any suggestion on capturing useful
> data that helps in figuring out why the test cases don't pass?

To start with, obtaining the logs and cores from a failed regression run
(/d/logs/...) of build.gluster.org would be useful. Once we start debugging
a few problems and notice the necessity for more information, we can start
collecting them for a failed regression run.

-Vijay
Re: [Gluster-devel] [Gluster-infra] Progress report for regression tests in Rackspace
Should we create bugs for each of these, and divide-and-conquer?

- Luis

On 05/15/2014 10:27 AM, Niels de Vos wrote:
> On Thu, May 15, 2014 at 06:05:00PM +0530, Vijay Bellur wrote:
>> On 04/30/2014 07:03 PM, Justin Clift wrote:
>>> Hi us,
>>>
>>> Was trying out the GlusterFS regression tests in Rackspace VMs last
>>> night for each of the release-3.4, release-3.5, and master branches.
>>>
>>> The regression test is just a run of "run-tests.sh", from a git
>>> checkout of the appropriate branch.
>>>
>>> The good news is we're adding a lot of testing code with each release:
>>>
>>>  * release-3.4 - 6303 lines (~30 mins to run test)
>>>  * release-3.5 - 9776 lines (~85 mins to run test)
>>>  * master - 11660 lines (~90 mins to run test)
>>>
>>> (lines counted using:
>>>  $ find tests -type f -iname "*.t" -exec cat {} >> a \;; wc -l a; rm -f a)
>>>
>>> The bad news is the tests only "kind of" pass now. I say kind of
>>> because although the regression run *can* pass for each of these
>>> branches, it's inconsistent. :(
>>>
>>> Results from testing overnight:
>>>
>>>  * release-3.4 - 20 runs - 17 PASS, 3 FAIL. 85% success.
>>>    * bug-857330/normal.t failed in one run
>>>    * bug-887098-gmount-crash.t failed in one run
>>>    * bug-857330/normal.t failed in one run
>>>
>>>  * release-3.5 - 20 runs, 18 PASS, 2 FAIL. 90% success.
>>>    * bug-857330/xml.t failed in one run
>>>    * bug-1004744.t failed in another run (same vm for both failures)
>>>
>>>  * master - 20 runs, 6 PASS, 14 FAIL. 30% success.
>>>    * bug-1070734.t failed in one run
>>>    * bug-1087198.t & bug-860663.t failed in one run (same vm as
>>>      bug-1070734.t failure above)
>>>    * bug-1087198.t & bug-857330/normal.t failed in one run (new vm, a
>>>      subsequent run on same vm passed)
>>>    * bug-1087198.t & bug-948686.t failed in one run (new vm)
>>>    * bug-1070734.t & bug-1087198.t failed in one run (new vm)
>>>    * bug-860663.t failed in one run
>>>    * bug-1023974.t & bug-1087198.t & bug-948686.t failed in one run
>>>      (new vm)
>>>    * bug-1004744.t & bug-1023974.t & bug-1087198.t & bug-948686.t
>>>      failed in one run (new vm)
>>>    * bug-948686.t failed in one run (new vm)
>>>    * bug-1070734.t failed in one run (new vm)
>>>    * bug-1023974.t failed in one run (new vm)
>>>    * bug-1087198.t & bug-948686.t failed in one run (new vm)
>>>    * bug-1070734.t failed in one run (new vm)
>>>    * bug-1087198.t failed in one run (new vm)
>>>
>>> The occasional failing tests aren't completely random, suggesting
>>> something is going on. Possible race conditions maybe? (no idea).
>>>
>>>  * 8 failures - bug-1087198.t
>>>  * 5 failures - bug-948686.t
>>>  * 4 failures - bug-1070734.t
>>>  * 3 failures - bug-1023974.t
>>>  * 3 failures - bug-857330/normal.t
>>>  * 2 failures - bug-860663.t
>>>  * 2 failures - bug-1004744.t
>>>  * 1 failure - bug-857330/xml.t
>>>  * 1 failure - bug-887098-gmount-crash.t
>>>
>>> Anyone have suggestions on how to make this work reliably?
>>
>> I think it would be a good idea to arrive at a list of test cases that
>> are failing at random and assign owners to address them (default owner
>> being the submitter of the test case). In addition to these, I have
>> also seen tests like bd.t and xml.t fail pretty regularly.
>>
>> Justin - can we publish a consolidated list of regression tests that
>> fail and owners for them on an etherpad or similar?
>>
>> Fixing these test cases will enable us to bring in more jenkins
>> instances for parallel regression runs etc. and will also provide more
>> determinism for our regression tests. Your help to address the
>> regression test suite problems will be greatly appreciated!
>
> Indeed, getting the regression tests stable seems like a blocker before
> we can move to a scalable Jenkins solution. Unfortunately, it may not be
> trivial to debug these test cases... Any suggestion on capturing useful
> data that helps in figuring out why the test cases don't pass?
>
> Thanks,
> Niels
Re: [Gluster-devel] [Gluster-infra] Progress report for regression tests in Rackspace
On Thu, May 15, 2014 at 06:05:00PM +0530, Vijay Bellur wrote:
> On 04/30/2014 07:03 PM, Justin Clift wrote:
>> Hi us,
>>
>> Was trying out the GlusterFS regression tests in Rackspace VMs last
>> night for each of the release-3.4, release-3.5, and master branches.
>>
>> The regression test is just a run of "run-tests.sh", from a git
>> checkout of the appropriate branch.
>>
>> The good news is we're adding a lot of testing code with each release:
>>
>>  * release-3.4 - 6303 lines (~30 mins to run test)
>>  * release-3.5 - 9776 lines (~85 mins to run test)
>>  * master - 11660 lines (~90 mins to run test)
>>
>> (lines counted using:
>>  $ find tests -type f -iname "*.t" -exec cat {} >> a \;; wc -l a; rm -f a)
>>
>> The bad news is the tests only "kind of" pass now. I say kind of
>> because although the regression run *can* pass for each of these
>> branches, it's inconsistent. :(
>>
>> Results from testing overnight:
>>
>>  * release-3.4 - 20 runs - 17 PASS, 3 FAIL. 85% success.
>>    * bug-857330/normal.t failed in one run
>>    * bug-887098-gmount-crash.t failed in one run
>>    * bug-857330/normal.t failed in one run
>>
>>  * release-3.5 - 20 runs, 18 PASS, 2 FAIL. 90% success.
>>    * bug-857330/xml.t failed in one run
>>    * bug-1004744.t failed in another run (same vm for both failures)
>>
>>  * master - 20 runs, 6 PASS, 14 FAIL. 30% success.
>>    * bug-1070734.t failed in one run
>>    * bug-1087198.t & bug-860663.t failed in one run (same vm as
>>      bug-1070734.t failure above)
>>    * bug-1087198.t & bug-857330/normal.t failed in one run (new vm, a
>>      subsequent run on same vm passed)
>>    * bug-1087198.t & bug-948686.t failed in one run (new vm)
>>    * bug-1070734.t & bug-1087198.t failed in one run (new vm)
>>    * bug-860663.t failed in one run
>>    * bug-1023974.t & bug-1087198.t & bug-948686.t failed in one run
>>      (new vm)
>>    * bug-1004744.t & bug-1023974.t & bug-1087198.t & bug-948686.t
>>      failed in one run (new vm)
>>    * bug-948686.t failed in one run (new vm)
>>    * bug-1070734.t failed in one run (new vm)
>>    * bug-1023974.t failed in one run (new vm)
>>    * bug-1087198.t & bug-948686.t failed in one run (new vm)
>>    * bug-1070734.t failed in one run (new vm)
>>    * bug-1087198.t failed in one run (new vm)
>>
>> The occasional failing tests aren't completely random, suggesting
>> something is going on. Possible race conditions maybe? (no idea).
>>
>>  * 8 failures - bug-1087198.t
>>  * 5 failures - bug-948686.t
>>  * 4 failures - bug-1070734.t
>>  * 3 failures - bug-1023974.t
>>  * 3 failures - bug-857330/normal.t
>>  * 2 failures - bug-860663.t
>>  * 2 failures - bug-1004744.t
>>  * 1 failure - bug-857330/xml.t
>>  * 1 failure - bug-887098-gmount-crash.t
>>
>> Anyone have suggestions on how to make this work reliably?
>
> I think it would be a good idea to arrive at a list of test cases that
> are failing at random and assign owners to address them (default owner
> being the submitter of the test case). In addition to these, I have also
> seen tests like bd.t and xml.t fail pretty regularly.
>
> Justin - can we publish a consolidated list of regression tests that
> fail and owners for them on an etherpad or similar?
>
> Fixing these test cases will enable us to bring in more jenkins
> instances for parallel regression runs etc. and will also provide more
> determinism for our regression tests. Your help to address the
> regression test suite problems will be greatly appreciated!

Indeed, getting the regression tests stable seems like a blocker before we
can move to a scalable Jenkins solution. Unfortunately, it may not be
trivial to debug these test cases... Any suggestion on capturing useful
data that helps in figuring out why the test cases don't pass?

Thanks,
Niels
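The temp-file pipeline quoted above ("find ... -exec cat {} >> a \;; wc -l
a; rm -f a") can be written without the temporary file. The helper below is
an illustrative rewrite, not something from the mail; the function name is
ours.

```shell
#!/bin/sh
# Count total lines across all *.t test scripts under a directory,
# equivalent to the quoted pipeline but with no scratch file "a".
count_test_lines() {
    # -exec cat {} + batches the files into as few cat invocations as
    # possible; wc -l then counts the concatenated stream.
    find "${1:-tests}" -type f -iname '*.t' -exec cat {} + | wc -l
}
```

Running `count_test_lines tests` from a branch checkout reproduces the
per-branch line counts listed in the report.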
[Gluster-devel] portability
Hi

I have not built master for a while, and now find GNU-specific
extensions that are not portable. Since this is not the first time I
have raised them, I would like to send a reminder about it:

1) bash-specific syntax

Do not write:
    test $foo == "bar"
But instead write:
    test $foo = "bar"

The = operator is POSIX compliant. The == operator works in bash
and ksh.

2) GNU sed specific flag

Do not write:
    sed -i 's/foo/bar/' buz
But instead write:
    sed 's/foo/bar/' buz > buz.new && mv buz.new buz

The -i flag is a GNU extension that is not implemented in BSD sed.

3) GNU make specific syntax

Do not write:
    foo.c: foo.x foo.h
            ${RPCGEN} $<
But instead write:
    foo.c: foo.x foo.h
            ${RPCGEN} foo.x
Or even:
    foo.c: foo.x foo.h
            ${RPCGEN} ${@:.c=.x}

$< does not work everywhere in non-GNU make. If I understand the
autoconf documentation correctly, it does not work outside of generic
rules (.c.o:).

--
Emmanuel Dreyfus
m...@netbsd.org
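The portable idioms above can be checked quickly in any POSIX shell. A small runnable demonstration; the file name and contents are made up for illustration:

```shell
#!/bin/sh
# Demonstration of the portable idioms from the reminder above.
tmp=$(mktemp)
echo 'foo' > "$tmp"

# POSIX string comparison: a single '=', with quoted operands.
if [ "$(cat "$tmp")" = "foo" ]; then
    echo "match"
fi

# Portable replacement for 'sed -i': write to a temporary file, then
# rename it over the original.
sed 's/foo/bar/' "$tmp" > "$tmp.new" && mv "$tmp.new" "$tmp"
cat "$tmp"    # -> bar

rm -f "$tmp"
```

Both forms behave identically under bash, ksh, dash, and the BSD /bin/sh variants, which is the point of the exercise.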
Re: [Gluster-devel] cluster/ec: Added the erasure code xlator
The cli changes for disperse volumes are ready for review:

http://review.gluster.org/7782/

Xavi

On Thursday 15 May 2014 10:11:49 Xavier Hernandez wrote:
> Hi Kaushal,
>
> On Tuesday 13 May 2014 20:22:06 Kaushal M wrote:
> > The syntax looks good. If you need help with the cli and glusterd
> > changes, I'll be happy to help.
>
> Thanks. It will be really appreciated.
>
> I think I've a working modification. Not sure if there's something
> else that has to be modified. I'll push it for review very soon and
> add you as a reviewer.
>
> I also decided to change the volume option 'size' to 'redundancy', and
> its format. It seems more intuitive now. The 'size' option had the
> format 'N:R', where N was the total number of subvolumes and R the
> redundancy. Since the number of subvolumes can be directly calculated
> from the 'subvolumes' keyword, only the redundancy is really needed.
>
> Xavi
>
> > On Tue, May 13, 2014 at 8:08 PM, Xavier Hernandez wrote:
> > > I'm trying to modify the cli to allow the creation of dispersed
> > > volumes.
> > >
> > > Current syntax for volume creation is like this:
> > >
> > >   volume create <volname> [stripe <count>] [replica <count>]
> > >     [transport <tcp|rdma>] <brick> ... [force]
> > >
> > > I propose to use this modified syntax:
> > >
> > >   volume create <volname> [stripe <count>] [replica <count>]
> > >     [disperse <count>] [redundancy <count>]
> > >     [transport <tcp|rdma>] <brick> ... [force]
> > >
> > > If 'disperse' is specified and 'redundancy' is not, 1 is assumed
> > > for redundancy.
> > >
> > > If 'redundancy' is specified and 'disperse' is not, the disperse
> > > count is taken from the number of bricks.
> > >
> > > If 'disperse' is specified and the number of bricks is greater than
> > > the number indicated (and is a multiple of it), a
> > > distributed-dispersed volume is created.
> > >
> > > 'disperse' and 'redundancy' cannot be combined with 'stripe' or
> > > 'replica'.
> > >
> > > Would this syntax be ok?
> > > Xavi
> > >
> > > On Tuesday 13 May 2014 12:29:34 Xavier Hernandez wrote:
> > > > I forgot to say that performance is not good, however there are
> > > > some optimizations not yet incorporated that may improve it. They
> > > > will be added in following patches.
> > > >
> > > > Xavi
> > > >
> > > > On Tuesday 13 May 2014 12:23:15 Xavier Hernandez wrote:
> > > > > Hello,
> > > > >
> > > > > I've just added the cluster/ec translator for review [1].
> > > > >
> > > > > It's a rewrite that does not use any additional translator or
> > > > > library. It's still a work in progress with some bugs to solve,
> > > > > but its architecture should be stable. The main missing feature
> > > > > is self-heal, which will be added once the main code is
> > > > > stabilized and reviewed.
> > > > >
> > > > > Feel free to review it and send any comment you think
> > > > > appropriate.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Xavi
> > > > >
> > > > > [1] http://review.gluster.org/7749
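The disperse/redundancy rules Xavi proposes can be sketched as a small validation function. This is only an illustration of the rules stated in the thread, not the actual glusterd validation code; the function name, the redundancy-bound check, and the output format are invented:

```shell
#!/bin/sh
# Hypothetical sketch of the proposed parameter rules: given a disperse
# count, an optional redundancy (default 1, as proposed), and a brick
# count, decide what kind of volume would be created.
check_disperse () {
    disperse=$1 redundancy=${2:-1} bricks=$3

    # Assumed sanity check: redundancy must leave at least one data
    # fragment per subvolume.
    if [ "$redundancy" -ge "$disperse" ]; then
        echo "error: redundancy must be smaller than the disperse count"
        return 1
    fi
    # Brick count must be a multiple of the disperse count.
    if [ $((bricks % disperse)) -ne 0 ]; then
        echo "error: brick count must be a multiple of $disperse"
        return 1
    fi
    if [ "$bricks" -gt "$disperse" ]; then
        echo "distributed-dispersed ($((bricks / disperse)) x $disperse)"
    else
        echo "dispersed ($disperse, redundancy $redundancy)"
    fi
}

check_disperse 3 1 3    # -> dispersed (3, redundancy 1)
check_disperse 3 1 6    # -> distributed-dispersed (2 x 3)
```

The second call illustrates the rule that a brick count which is a larger multiple of the disperse count yields a distributed-dispersed volume.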
Re: [Gluster-devel] [Gluster-infra] Progress report for regression tests in Rackspace
On 04/30/2014 07:03 PM, Justin Clift wrote:

Hi us,

Was trying out the GlusterFS regression tests in Rackspace VMs last
night, for each of the release-3.4, release-3.5, and master branches.
The regression test is just a run of "run-tests.sh", from a git
checkout of the appropriate branch.

The good news is we're adding a lot of testing code with each release:

 * release-3.4 -  6303 lines (~30 mins to run test)
 * release-3.5 -  9776 lines (~85 mins to run test)
 * master      - 11660 lines (~90 mins to run test)

(lines counted using:
    $ find tests -type f -iname "*.t" -exec cat {} >> a \; ; wc -l a ; rm -f a
)

The bad news is the tests only "kind of" pass now. I say kind of
because although a regression run *can* pass for each of these
branches, it's inconsistent. :(

Results from testing overnight:

 * release-3.4 - 20 runs - 17 PASS, 3 FAIL. 85% success.
   * bug-857330/normal.t failed in one run
   * bug-887098-gmount-crash.t failed in one run
   * bug-857330/normal.t failed in another run

 * release-3.5 - 20 runs - 18 PASS, 2 FAIL. 90% success.
   * bug-857330/xml.t failed in one run
   * bug-1004744.t failed in another run (same vm for both failures)

 * master - 20 runs - 6 PASS, 14 FAIL. 30% success.
 * bug-1070734.t failed in one run
 * bug-1087198.t & bug-860663.t failed in one run (same vm as
   bug-1070734.t failure above)
 * bug-1087198.t & bug-857330/normal.t failed in one run (new vm, a
   subsequent run on same vm passed)
 * bug-1087198.t & bug-948686.t failed in one run (new vm)
 * bug-1070734.t & bug-1087198.t failed in one run (new vm)
 * bug-860663.t failed in one run
 * bug-1023974.t & bug-1087198.t & bug-948686.t failed in one run (new vm)
 * bug-1004744.t & bug-1023974.t & bug-1087198.t & bug-948686.t failed
   in one run (new vm)
 * bug-948686.t failed in one run (new vm)
 * bug-1070734.t failed in one run (new vm)
 * bug-1023974.t failed in one run (new vm)
 * bug-1087198.t & bug-948686.t failed in one run (new vm)
 * bug-1070734.t failed in one run (new vm)
 * bug-1087198.t failed in one run (new vm)

The occasionally failing tests aren't completely random, suggesting
something is going on. Possible race conditions maybe? (no idea)

 * 8 failures - bug-1087198.t
 * 5 failures - bug-948686.t
 * 4 failures - bug-1070734.t
 * 3 failures - bug-1023974.t
 * 3 failures - bug-857330/normal.t
 * 2 failures - bug-860663.t
 * 2 failures - bug-1004744.t
 * 1 failure  - bug-857330/xml.t
 * 1 failure  - bug-887098-gmount-crash.t

Anyone have suggestions on how to make this work reliably?

I think it would be a good idea to arrive at a list of test cases that
are failing at random and assign owners to address them (the default
owner being the submitter of the test case). In addition to these, I
have also seen tests like bd.t and xml.t fail pretty regularly.

Justin - can we publish a consolidated list of regression tests that
fail, and owners for them, on an etherpad or similar?

Fixing these test cases will enable us to bring in more jenkins
instances for parallel regression runs etc. and will also provide more
determinism for our regression tests. Your help to address the
regression test suite problems will be greatly appreciated!
Thanks,
Vijay
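The per-test failure tally in Justin's report could be generated mechanically from a list of failing test names, one per line. A sketch; the one-name-per-line log format is an assumption:

```shell
#!/bin/sh
# Hypothetical helper: read failing .t file names on stdin and print a
# per-test failure tally, most frequent first, in the same shape as the
# report above.
tally_failures () {
    sort | uniq -c | sort -rn |
        awk '{ printf "%s failures - %s\n", $1, $2 }'
}

printf '%s\n' bug-1087198.t bug-948686.t bug-1087198.t | tally_failures
# -> 2 failures - bug-1087198.t
#    1 failures - bug-948686.t
```

Appending each run's failing tests to a shared log and piping it through this would keep the etherpad list Vijay asks for up to date with little effort.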
Re: [Gluster-devel] cluster/ec: Added the erasure code xlator
Hi Kaushal,

On Tuesday 13 May 2014 20:22:06 Kaushal M wrote:
> The syntax looks good. If you need help with the cli and glusterd
> changes, I'll be happy to help.

Thanks. It will be really appreciated.

I think I've a working modification. Not sure if there's something else
that has to be modified. I'll push it for review very soon and add you
as a reviewer.

I also decided to change the volume option 'size' to 'redundancy', and
its format. It seems more intuitive now. The 'size' option had the
format 'N:R', where N was the total number of subvolumes and R the
redundancy. Since the number of subvolumes can be directly calculated
from the 'subvolumes' keyword, only the redundancy is really needed.

Xavi

> On Tue, May 13, 2014 at 8:08 PM, Xavier Hernandez wrote:
> > I'm trying to modify the cli to allow the creation of dispersed
> > volumes.
> >
> > Current syntax for volume creation is like this:
> >
> >   volume create <volname> [stripe <count>] [replica <count>]
> >     [transport <tcp|rdma>] <brick> ... [force]
> >
> > I propose to use this modified syntax:
> >
> >   volume create <volname> [stripe <count>] [replica <count>]
> >     [disperse <count>] [redundancy <count>]
> >     [transport <tcp|rdma>] <brick> ... [force]
> >
> > If 'disperse' is specified and 'redundancy' is not, 1 is assumed for
> > redundancy.
> >
> > If 'redundancy' is specified and 'disperse' is not, the disperse
> > count is taken from the number of bricks.
> >
> > If 'disperse' is specified and the number of bricks is greater than
> > the number indicated (and is a multiple of it), a
> > distributed-dispersed volume is created.
> >
> > 'disperse' and 'redundancy' cannot be combined with 'stripe' or
> > 'replica'.
> >
> > Would this syntax be ok?
> >
> > Xavi
> >
> > On Tuesday 13 May 2014 12:29:34 Xavier Hernandez wrote:
> > > I forgot to say that performance is not good, however there are
> > > some optimizations not yet incorporated that may improve it. They
> > > will be added in following patches.
> > > Xavi
> > >
> > > On Tuesday 13 May 2014 12:23:15 Xavier Hernandez wrote:
> > > > Hello,
> > > >
> > > > I've just added the cluster/ec translator for review [1].
> > > >
> > > > It's a rewrite that does not use any additional translator or
> > > > library. It's still a work in progress with some bugs to solve,
> > > > but its architecture should be stable. The main missing feature
> > > > is self-heal, which will be added once the main code is
> > > > stabilized and reviewed.
> > > >
> > > > Feel free to review it and send any comment you think
> > > > appropriate.
> > > >
> > > > Thanks,
> > > >
> > > > Xavi
> > > >
> > > > [1] http://review.gluster.org/7749