[Gluster-devel] Regarding regression failure in rackspace-regression-2GB machine
Pranith,

The core's backtrace [1] is not analysable: it doesn't show function names and displays "?? ()" for all the frames across all threads. It would be helpful if we had the glusterd logs corresponding to the cluster.rc setup. These logs are missing too.

thanks,
Krish

[1] glusterd core file can be found here - http://build.gluster.org/job/rackspace-regression-2GB/250/consoleFull
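For reference, a minimal sketch of how a symbolised backtrace could be pulled from such a core on the build slave, assuming a glusterfs-debuginfo package matching the installed build is available (the core path below is a placeholder, not the actual location in the job workspace):

    CORE=/path/to/glusterd.core              # placeholder - take the real path from the job output
    gdb -batch \
        -ex 'set pagination off' \
        -ex 'thread apply all bt full' \
        /usr/sbin/glusterd "$CORE" > glusterd-backtrace.txt

Without the matching debuginfo installed, gdb will still print only "?? ()" frames like the ones seen here.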
Re: [Gluster-devel] Regarding regression failure in rackspace-regression-2GB machine
On 19/06/2014, at 11:07 AM, Justin Clift wrote:
> On 19/06/2014, at 10:52 AM, Krishnan Parthasarathi wrote:
>> Pranith,
>> The core's backtrace [1] is not analysable. It doesn't show function names
>> and displays "?? ()" for all the frames across all threads. It would be
>> helpful if we had the glusterd logs corresponding to the cluster.rc setup.
>> These logs are missing too.
>
> Is there something we can do on the slaves to make it work properly? Some
> sort of config change maybe?
>
> I'm happy to give you remote SSH access to the slaves too if that's helpful.
> (Just let me know if you change stuff, so I can apply the same change to the
> others.)

As an extra thought, this is the regression test scripting code:

  https://forge.gluster.org/gluster-patch-acceptance-tests/gluster-patch-acceptance-tests/trees/master

Feel free to suggest improvements, send merge requests, etc. :)

+ Justin

--
GlusterFS - http://www.gluster.org
An open source, distributed file system scaling to several petabytes, and
handling thousands of clients.

My personal twitter: twitter.com/realjustinclift
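One possible slave-side change, sketched here on the assumption that the slaves are Fedora/RHEL machines and that the cores currently lack symbols (the package name and archive path are assumptions, not part of the existing setup):

    # install matching debug symbols so gdb can resolve frame names
    debuginfo-install -y glusterfs
    # allow full core dumps in the shell that runs the regression job
    ulimit -c unlimited
    # collect cores in a predictable place for the job to archive
    mkdir -p /archives/cores
    echo '/archives/cores/core.%e.%p' > /proc/sys/kernel/core_pattern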
Re: [Gluster-devel] Automating spurious failure status
On 06/19/2014 06:14 PM, Justin Clift wrote:
> On 19/06/2014, at 1:23 PM, Pranith Kumar Karampuri wrote:
>> hi,
>> I was told that Justin and I were given permission to mark a patch as
>> verified+1 when the tests that failed are spurious failures. I think this
>> process can be automated as well. I already have a script to parse the
>> console log and identify the tests that failed (I send mails using this;
>> the mailing part is yet to be automated). What we need to do now is the
>> following:
>>
>> 1) Find the list of tests that were modified/added as part of the commit.
>> 2) Parse the list of tests that failed the full regression (I already have
>>    this script). Run 'prove' on these files separately, say 5-10 times. If
>>    a particular test fails every time, it is more likely a real failure;
>>    otherwise it is a spurious failure. If a file that is added as a new
>>    test fails even a single time, let's accept the patch only after fixing
>>    the failures. Otherwise we can give +1 on it, instead of Justin or me
>>    doing it manually.
>
> Sounds good to me. :)
>
> + Justin
>
>> Also send a mail to gluster-devel about the failures for each test.
>
> We might want to make that weekly or something? There are several failures
> every day. :/

Agreed.

Pranith
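A rough sketch of the rerun step described above, assuming the list of failed .t files has already been extracted from the console log (the variable names and the 5-run threshold are illustrative only, not taken from Pranith's script):

    #!/bin/bash
    # Re-run each failed test a few times and classify it as real or spurious.
    RUNS=5
    for t in "$@"; do                        # e.g. ./classify.sh tests/bugs/bug-123456.t
        fails=0
        for i in $(seq 1 "$RUNS"); do
            prove "$t" > /dev/null 2>&1 || fails=$((fails + 1))
        done
        if [ "$fails" -eq "$RUNS" ]; then
            echo "$t: failed $fails/$RUNS reruns - probably a real failure"
        elif [ "$fails" -gt 0 ]; then
            echo "$t: failed $fails/$RUNS reruns - intermittent, likely spurious"
        else
            echo "$t: passed all reruns - the regression failure looks spurious"
        fi
    done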
Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories
On 2014-06-19 13:48, Susant Palai wrote:
> Adding Susant

Unfortunately things don't go so well here. With --brick-log-level=DEBUG I get
very weird results (probably because the first brick is slower to respond while
it's printing debug info); I suspect I'm triggering some timing-related bug.
I attach my test script and a log of 20 runs (with 02777 flags). The really
worrisome thing here is:

  backing: 0 0:0 /data/disk2/gluster/test/dir1

which means that the backing store has an unreadable dir, and that gets
propagated to clients...

/Anders

- Original Message -
From: Anders Blomdell anders.blomd...@control.lth.se
To: Shyamsundar Ranganathan srang...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Wednesday, 18 June, 2014 9:33:04 PM
Subject: Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories

On 2014-06-17 18:47, Anders Blomdell wrote:
> On 2014-06-17 17:49, Shyamsundar Ranganathan wrote:
>> You may be looking at the problem being fixed here [1]. On a lookup, an
>> attribute mismatch was not being healed across directories, and this patch
>> attempts to address the same. Currently the version of the patch does not
>> heal the S_ISUID and S_ISGID bits, which is work in progress (but easy
>> enough to incorporate and test based on the patch at [1]).
>
> Thanks, will look into it tomorrow.
>
>> On a separate note, add-brick just adds a brick to the cluster; the lookup
>> is where the heal (or creation of the directory across all subvolumes in
>> the DHT xlator) is being done.
>
> Thanks for the clarification (I guess that a rebalance would trigger it as
> well?)

Attached slightly modified version of patch [1] seems to work correctly after a
rebalance that is allowed to run to completion on its own. If directories are
traversed during the rebalance, some dirs show spurious 01777 or 0 permissions,
and sometimes end up with the wrong permission. Continuing debug tomorrow...

>> Shyam
>>
>> [1] http://review.gluster.org/#/c/6983/

- Original Message -
From: Anders Blomdell anders.blomd...@control.lth.se
To: Gluster Devel gluster-devel@gluster.org
Sent: Tuesday, June 17, 2014 10:53:52 AM
Subject: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories

With glusterfs-3.5.1-0.3.beta2.fc20.x86_64 and a reverted
3dc56cbd16b1074d7ca1a4fe4c5bf44400eb63ff (due to local lack of IPv4 addresses),
I get weird behaviour if I:

1. Create a directory with suid/sgid/sticky bit set (/mnt/gluster/test)
2. Make a subdirectory of #1 (/mnt/gluster/test/dir1)
3. Do an add-brick

Before add-brick
  755  /mnt/gluster
  7775 /mnt/gluster/test
  2755 /mnt/gluster/test/dir1

After add-brick
  755  /mnt/gluster
  1775 /mnt/gluster/test
  755  /mnt/gluster/test/dir1

On the server it looks like this:
  7775 /data/disk1/gluster/test
  2755 /data/disk1/gluster/test/dir1
  1775 /data/disk2/gluster/test
  755  /data/disk2/gluster/test/dir1

Filed as bug: https://bugzilla.redhat.com/show_bug.cgi?id=1110262

If somebody can point me to where the logic of add-brick is placed, I can give
it a shot (a find/grep on mkdir didn't immediately point me to the right place).

/Anders

--
Anders Blomdell                    Email: anders.blomd...@control.lth.se
Department of Automatic Control
Lund University                    Phone: +46 46 222 4625
P.O. Box 118                       Fax:   +46 46 138118
SE-221 00 Lund, Sweden

bug-add-brick.sh
Description: application/shellscript

volume create: testvol: success: please start the volume to access data
volume start: testvol: success
mounted: 755 0:0 /mnt/gluster
mounted: 2777 0:1600 /mnt/gluster/test
mounted: 2755 247:1600 /mnt/gluster/test/dir1
Before add-brick
755 /mnt/gluster
2777 /mnt/gluster/test
2755 /mnt/gluster/test/dir1
volume add-brick: success
volume set: success
Files /tmp/tmp.3lK6STezID and /tmp/tmp.Z2Pr46kVu1 differ
## Differ tor jun 19 15:30:01 CEST 2014
-mounted: 755 0:0 /mnt/gluster
-mounted: 2777 0:1600 /mnt/gluster/test
-mounted: 2755 247:1600 /mnt/gluster/test/dir1
-mounted: 2755 247:1600 /mnt/gluster/test/dir1/dir2
+755 0:0 /mnt/gluster
+2777 0:1600 /mnt/gluster/test
+2755 247:1600 /mnt/gluster/test/dir1
+2755 247:1600 /mnt/gluster/test/dir1/dir2
## TIMEOUT tor jun 19 15:30:06 CEST 2014
mounted: 755 0:0 /mnt/gluster
mounted: 2777 0:1600 /mnt/gluster/test
mounted: 2755 247:1600 /mnt/gluster/test/dir1
mounted: 2755 247:1600 /mnt/gluster/test/dir1/dir2
backing: 2777 0:1600 /data/disk1/gluster/test
backing: 2755 247:1600 /data/disk1/gluster/test/dir1
backing: 2755 247:1600 /data/disk1/gluster/test/dir1/dir2
volume create: testvol: success: please start the volume to access data
volume start: testvol: success
mounted: 755 0:0 /mnt/gluster
mounted: 2777 0:1600 /mnt/gluster/test
mounted: 2755 247:1600 /mnt/gluster/test/dir1
Before add-brick
755 /mnt/gluster
2777 /mnt/gluster/test
2755 /mnt/gluster/test/dir1
volume add-brick: success
volume set: success
Files /tmp/tmp.5DWFQY6fus and /tmp/tmp.p7BxWShXLg differ
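For readers without the attached script, the reported steps boil down to something like the following sketch (the hostname, brick paths and volume name are placeholders, not taken from the attachment):

    gluster volume create testvol host1:/data/disk1/gluster force
    gluster volume start testvol
    mount -t glusterfs host1:/testvol /mnt/gluster

    mkdir /mnt/gluster/test
    chmod 7775 /mnt/gluster/test               # suid + sgid + sticky
    mkdir /mnt/gluster/test/dir1               # picks up sgid -> 2755
    stat -c '%a %n' /mnt/gluster/test /mnt/gluster/test/dir1    # before add-brick

    gluster volume add-brick testvol host1:/data/disk2/gluster force
    stat -c '%a %n' /mnt/gluster/test /mnt/gluster/test/dir1    # after add-brick the
                                                                # suid/sgid bits are reported lost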
Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories
On 06/19/2014 03:39 PM, Anders Blomdell wrote:
> On 2014-06-19 13:48, Susant Palai wrote:
>> Adding Susant
>
> Unfortunately things don't go so well here. With --brick-log-level=DEBUG I
> get very weird results (probably because the first brick is slower to
> respond while it's printing debug info); I suspect I'm triggering some
> timing-related bug. I attach my test script and a log of 20 runs (with 02777
> flags). The really worrisome thing here is:
>
>   backing: 0 0:0 /data/disk2/gluster/test/dir1
>
> which means that the backing store has an unreadable dir, and that gets
> propagated to clients...

I have an embryo of a theory of what happens:

1. Directories are created on the first brick.
2. FUSE starts to read directories from the first brick.
3. A getdents64 or fstatat64 call to the first brick takes too long, and is
   redirected to the second brick.
4. Self-heal is initiated on the second brick.

On Monday I will see if I can come up with some clever firewall tricks to
trigger this behaviour in a reliable way.

/Anders

--
Anders Blomdell                    Email: anders.blomd...@control.lth.se
Department of Automatic Control
Lund University                    Phone: +46 46 222 4625
P.O. Box 118
SE-221 00 Lund, Sweden
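One way such a delay could be injected artificially, sketched on the assumption that the first brick's traffic goes through eth0 on port 49152 (the interface, port and script name are guesses, not measured values):

    # add 500 ms of latency to everything leaving eth0 while the test runs
    tc qdisc add dev eth0 root netem delay 500ms
    ./bug-add-brick.sh
    tc qdisc del dev eth0 root

    # or, more selectively, randomly drop packets headed to the first brick's port
    iptables -A INPUT -p tcp --dport 49152 -m statistic --mode random --probability 0.2 -j DROP
    ./bug-add-brick.sh
    iptables -D INPUT -p tcp --dport 49152 -m statistic --mode random --probability 0.2 -j DROP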
Re: [Gluster-devel] 3.5.1 beta 2 Sanity tests
On Tue, Jun 17, 2014 at 7:54 PM, Benjamin Turner bennytu...@gmail.com wrote:
> Yup,
>
> On Jun 17, 2014 7:45 PM, Justin Clift jus...@gluster.org wrote:
>> On 17/06/2014, at 11:33 PM, Benjamin Turner wrote:
>>> Here are the tests that failed. Note that n0 is a generated name, name255
>>> is a 255-character string, and path1023 is a 1023-character path:
>>>
>>>   /opt/qa/tools/posix-testsuite/tests/link/02.t  (Wstat: 0 Tests: 10 Failed: 2)
>>>     Failed tests: 4, 6
>>>       expect 0 link ${n0} ${name255}     # 4
>>>       expect 0 unlink ${n0}              # 5 - this passed
>>>       expect 0 unlink ${name255}         # 6
>>>
>>>   /opt/qa/tools/posix-testsuite/tests/link/03.t  (Wstat: 0 Tests: 16 Failed: 2)
>>>     Failed tests: 8-9
>>>       expect 0 link ${n0} ${path1023}    # 8
>>>       expect 0 unlink ${path1023}        # 9
>>>
>>> I gotta go for the day, I'll try to repro outside the script tomorrow.
>>
>> As a data point, people have occasionally mentioned to me in IRC and via
>> email that these posix tests fail for them... even when run against a
>> (non-glustered) ext4/xfs filesystem. So, it _could_ be just some weird
>> spurious thing. If you figure out what though, that'd be cool. :)
>>
>> + Justin
>
> I went through these a while back and removed anything that wasn't valid for
> GlusterFS. This test was passing on 3.4.59 when it was released; I'm thinking
> it may have something to do with a symlink-to-the-same-directory bz I found a
> while back? Idk, I'll get it sorted tomorrow.

I got this sorted: I needed to add a sleep between the file create and the
link. I ran through it manually and it worked every time; it took me a few
goes to think of a timing issue. I didn't need this on 3.4.0.59 - is there
anything that needs investigating?

-b
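Roughly, the workaround described above in the style of the quoted test snippets (this only illustrates where the sleep goes; it is not the exact change that was applied to 02.t):

    expect 0 create ${n0} 0644
    sleep 1                                  # settle time before linking; not needed on 3.4.0.59
    expect 0 link ${n0} ${name255}
    expect 0 unlink ${n0}
    expect 0 unlink ${name255}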
Re: [Gluster-devel] 3.5.1 beta 2 Sanity tests
On 19/06/2014, at 6:55 PM, Benjamin Turner wrote:
<snip>
> I went through these a while back and removed anything that wasn't valid for
> GlusterFS. This test was passing on 3.4.59 when it was released, i am thinking
> it may have something to do with a sym link to the same directory bz i found
> a while back? Idk, I'll get it sorted tomorrow.
>
> I got this sorted, I needed to add a sleep between the file create and the
> link. I ran through it manually and it worked every time, took me a few goes
> to think of timing issue. I didn't need this on 3.4.0.59, is there anything
> that needs investigated?

Any ideas? :)

+ Justin