[Gluster-devel] Regarding regression failure in rackspace-regression-2GB machine

2014-06-19 Thread Krishnan Parthasarathi
Pranith,

The core's backtrace [1] is not 'analysable'. It doesn't show function names
and displays ?? () for all the frames across all threads. It would also be helpful
if we had the glusterd logs corresponding to the cluster.rc setup, but those logs
are missing too.

thanks,
Krish


[1] The glusterd core file can be found here:
http://build.gluster.org/job/rackspace-regression-2GB/250/consoleFull
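
For reference, one way the core could be loaded with matching symbols (a sketch
only; the debuginfo package names and the glusterd/core paths below are
assumptions, not verified against the Rackspace slaves):

  # Install debug symbols first, otherwise gdb prints ?? () for every frame.
  debuginfo-install -y glusterfs glusterfs-server

  # Open the core against the exact binary that produced it and dump all threads.
  gdb /usr/sbin/glusterd /path/to/glusterd.core \
      -ex 'set pagination off' \
      -ex 'thread apply all bt full' \
      -ex quit > glusterd-backtrace.txt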


Re: [Gluster-devel] Regarding regression failure in rackspace-regression-2GB machine

2014-06-19 Thread Justin Clift
On 19/06/2014, at 11:07 AM, Justin Clift wrote:
 On 19/06/2014, at 10:52 AM, Krishnan Parthasarathi wrote:
 Pranith,
 
 The core's backtrace [1] is not 'analysable'. It doesn't show function names
 and displays ?? () for all the frames across all threads. It would also be helpful
 if we had the glusterd logs corresponding to the cluster.rc setup, but those logs
 are missing too.
 
 Is there something we can do on the slaves to make it work properly?  Some
 sort of config change maybe?
 
 I'm happy to give you remote SSH to the slaves too if that's helpful.
 
 (just let me know if you change stuff, so I can apply the same change to
 the others)


As an extra thought, this is the regression test scripting code:

  
https://forge.gluster.org/gluster-patch-acceptance-tests/gluster-patch-acceptance-tests/trees/master

Feel free to suggest improvements, send merge requests, etc. :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift



Re: [Gluster-devel] Automating spurious failure status

2014-06-19 Thread Pranith Kumar Karampuri


On 06/19/2014 06:14 PM, Justin Clift wrote:

On 19/06/2014, at 1:23 PM, Pranith Kumar Karampuri wrote:

hi,
  I was told that Justin and I were given permission to mark a patch as
verified+1 when the tests that failed are spurious failures. I think this
process can be automated as well. I already have a script that parses the console
log to identify the tests that failed (I send mails using it; the mailing part is
yet to be automated). What we need to do now is the following:
1) Find the list of tests that were modified/added as part of the commit.
2) Parse the list of tests that failed the full regression (I already have this
script).

Run 'prove' on these files separately, say 5-10 times. If a particular test
fails every time, it is more likely a real failure; otherwise it is probably
a spurious failure.
If a file that is added as a new test fails even a single time, let's accept the
patch only after the failures are fixed.
Otherwise we can give it +1, instead of Justin or me doing it manually.
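
A minimal sketch of that retry loop, assuming the modified and failed tests are
already collected one per line in two files (the file names, retry count and
+1/-1 output below are illustrative, not the actual script):

  #!/bin/bash
  MODIFIED_TESTS=modified-tests.txt   # tests added/changed by the commit (step 1)
  FAILED_TESTS=failed-tests.txt       # tests parsed from the regression console log (step 2)
  RETRIES=5
  verdict="+1"
  while read -r t; do
      fails=0
      for i in $(seq "$RETRIES"); do
          prove "$t" > /dev/null 2>&1 || fails=$((fails + 1))
      done
      if grep -qxF "$t" "$MODIFIED_TESTS" && [ "$fails" -gt 0 ]; then
          verdict="-1"                # new/modified test failed at least once: real failure
      elif [ "$fails" -eq "$RETRIES" ]; then
          verdict="-1"                # failed on every retry: likely a real failure
      fi                              # anything else is treated as spurious
  done < "$FAILED_TESTS"
  echo "verified: $verdict"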

Sounds good to me. :)

+ Justin



Also send a mail to gluster-devel about the failures for each test.


We might want to make that weekly or something?  There are several failures
every day. :/

Agreed.

Pranith


+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift





Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories

2014-06-19 Thread Anders Blomdell
On 2014-06-19 13:48, Susant Palai wrote:
 Adding Susant
Unfortunately things don't go so well here: with --brick-log-level=DEBUG I get very
weird results (probably because the first brick is slower to respond while it is
printing debug info), so I suspect I am triggering some timing-related bug.

I am attaching my test script and a log of 20 runs (with mode 02777).

The really worrisome thing here is:

  backing: 0 0:0 /data/disk2/gluster/test/dir1

which means that the backing store has an unreadable directory, and that gets
propagated to clients...

/Anders
 
 - Original Message -
 From: Anders Blomdell anders.blomd...@control.lth.se
 To: Shyamsundar Ranganathan srang...@redhat.com
 Cc: Gluster Devel gluster-devel@gluster.org
 Sent: Wednesday, 18 June, 2014 9:33:04 PM
 Subject: Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on 
 directories
 
 On 2014-06-17 18:47, Anders Blomdell wrote:
 On 2014-06-17 17:49, Shyamsundar Ranganathan wrote:
 You may be looking at the problem being fixed here, [1].

 On a lookup, an attribute mismatch was not being healed across
 directories, and this patch attempts to address that. Currently
 the version of the patch does not heal the S_ISUID and S_ISGID bits;
 that is work in progress (but easy enough to incorporate and test
 based on the patch at [1]).
 Thanks, will look into it tomorrow.

 On a separate note, add-brick just adds a brick to the cluster; the
 lookup is where the heal (or creation of the directory across all
 subvolumes in the DHT xlator) is done.
 Thanks for the clarification (I guess that a rebalance would trigger it as
 well?)
 The attached, slightly modified version of patch [1] seems to work correctly after
 a rebalance that is allowed to run to completion on its own; if directories are
 traversed during the rebalance, some dirs show spurious 01777 or 0 modes and
 sometimes end up with the wrong permission.
 
 Continuing debug tomorrow...


 Shyam

 [1] http://review.gluster.org/#/c/6983/

 - Original Message -
 From: Anders Blomdell anders.blomd...@control.lth.se To:
 Gluster Devel gluster-devel@gluster.org Sent: Tuesday, June 17,
 2014 10:53:52 AM Subject: [Gluster-devel] 3.5.1-beta2 Problems with
 suid and sgid bits on  directories

 With a glusterfs-3.5.1-0.3.beta2.fc20.x86_64 with a reverted 
 3dc56cbd16b1074d7ca1a4fe4c5bf44400eb63ff (due to local lack of IPv4 
 addresses), I get weird behavior if I:

 1. Create a directory with suid/sgid/sticky bits set (/mnt/gluster/test)
 2. Make a subdirectory of #1 (/mnt/gluster/test/dir1)
 3. Do an add-brick

 Before add-brick

 755  /mnt/gluster
 7775 /mnt/gluster/test
 2755 /mnt/gluster/test/dir1

 After add-brick

 755  /mnt/gluster
 1775 /mnt/gluster/test
 755  /mnt/gluster/test/dir1

 On the server it looks like this:

 7775 /data/disk1/gluster/test
 2755 /data/disk1/gluster/test/dir1
 1775 /data/disk2/gluster/test
 755  /data/disk2/gluster/test/dir1
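
 A rough reproduction sketch of those steps (the volume name, brick paths and
 the single-server layout are assumptions taken from the listings above, not
 the attached script):

   # Assumes two brick directories on one server and a local fuse mount.
   gluster volume create testvol $HOSTNAME:/data/disk1/gluster force
   gluster volume start testvol
   mount -t glusterfs $HOSTNAME:/testvol /mnt/gluster

   mkdir /mnt/gluster/test      && chmod 7775 /mnt/gluster/test        # suid+sgid+sticky
   mkdir /mnt/gluster/test/dir1 && chmod 2755 /mnt/gluster/test/dir1
   stat -c '%a %n' /mnt/gluster/test /mnt/gluster/test/dir1            # expect 7775 / 2755

   gluster volume add-brick testvol $HOSTNAME:/data/disk2/gluster force
   stat -c '%a %n' /mnt/gluster/test /mnt/gluster/test/dir1            # observed: 1775 / 755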

 Filed as bug:

 https://bugzilla.redhat.com/show_bug.cgi?id=1110262

 If somebody can point me to where the logic of add-brick is placed, I
 can give it a shot (a find/grep on mkdir didn't immediately point me
 to the right place).


 /Anders

 /Anders
 


-- 
Anders Blomdell  Email: anders.blomd...@control.lth.se
Department of Automatic Control
Lund University  Phone:+46 46 222 4625
P.O. Box 118 Fax:  +46 46 138118
SE-221 00 Lund, Sweden



bug-add-brick.sh
Description: application/shellscript
volume create: testvol: success: please start the volume to access data
volume start: testvol: success
mounted: 755 0:0 /mnt/gluster
mounted: 2777 0:1600 /mnt/gluster/test
mounted: 2755 247:1600 /mnt/gluster/test/dir1

Before add-brick
755 /mnt/gluster
2777 /mnt/gluster/test
2755 /mnt/gluster/test/dir1
volume add-brick: success
volume set: success
Files /tmp/tmp.3lK6STezID and /tmp/tmp.Z2Pr46kVu1 differ
## Differ tor jun 19 15:30:01 CEST 2014
-mounted: 755 0:0 /mnt/gluster
-mounted: 2777 0:1600 /mnt/gluster/test
-mounted: 2755 247:1600 /mnt/gluster/test/dir1
-mounted: 2755 247:1600 /mnt/gluster/test/dir1/dir2
+755 0:0 /mnt/gluster
+2777 0:1600 /mnt/gluster/test
+2755 247:1600 /mnt/gluster/test/dir1
+2755 247:1600 /mnt/gluster/test/dir1/dir2

## TIMEOUT tor jun 19 15:30:06 CEST 2014
mounted: 755 0:0 /mnt/gluster
mounted: 2777 0:1600 /mnt/gluster/test
mounted: 2755 247:1600 /mnt/gluster/test/dir1
mounted: 2755 247:1600 /mnt/gluster/test/dir1/dir2
backing: 2777 0:1600 /data/disk1/gluster/test
backing: 2755 247:1600 /data/disk1/gluster/test/dir1
backing: 2755 247:1600 /data/disk1/gluster/test/dir1/dir2
volume create: testvol: success: please start the volume to access data
volume start: testvol: success
mounted: 755 0:0 /mnt/gluster
mounted: 2777 0:1600 /mnt/gluster/test
mounted: 2755 247:1600 /mnt/gluster/test/dir1

Before add-brick
755 /mnt/gluster
2777 /mnt/gluster/test
2755 /mnt/gluster/test/dir1
volume add-brick: success
volume set: success
Files /tmp/tmp.5DWFQY6fus and /tmp/tmp.p7BxWShXLg differ

Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories

2014-06-19 Thread Anders Blomdell

On 06/19/2014 03:39 PM, Anders Blomdell wrote:

On 2014-06-19 13:48, Susant Palai wrote:

Adding Susant

Unfortunately things don't go so well here: with --brick-log-level=DEBUG I get very
weird results (probably because the first brick is slower to respond while it is
printing debug info), so I suspect I am triggering some timing-related bug.

I am attaching my test script and a log of 20 runs (with mode 02777).

The really worrisome thing here is:

   backing: 0 0:0 /data/disk2/gluster/test/dir1

which means that the backing store has an unreadable directory, and that gets
propagated to clients...

I have the embryo of a theory of what happens:

1. directories are created on the first brick.
2. fuse starts to read directories from the first brick.
3. a getdents64 or fstatat64 call to the first brick takes too long, and
   is redirected to the second brick.
4. self-heal is initiated on the second brick.

On Monday, I will see if I can come up with some clever firewall tricks
to trigger this behaviour in a reliable way.
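
One way such a delay could be injected deliberately, as a sketch (the brick
port and the loopback device are assumptions; the real port would have to be
looked up with 'gluster volume status'):

  # Add ~500 ms latency to traffic towards the first brick's port so that
  # directory reads against it are slow and DHT falls back to the second brick.
  BRICK_PORT=49152                    # assumed; check 'gluster volume status'
  tc qdisc add dev lo root handle 1: prio
  tc filter add dev lo parent 1: protocol ip u32 \
      match ip dport $BRICK_PORT 0xffff flowid 1:1
  tc qdisc add dev lo parent 1:1 handle 10: netem delay 500ms

  # ... run the directory traversal test here ...

  tc qdisc del dev lo root            # clean up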


/Anders

--
Anders Blomdell  Email: anders.blomd...@control.lth.se
Department of Automatic Control
Lund University  Phone:+46 46 222 4625
P.O. Box 118
SE-221 00 Lund, Sweden



Re: [Gluster-devel] 3.5.1 beta 2 Sanity tests

2014-06-19 Thread Benjamin Turner
On Tue, Jun 17, 2014 at 7:54 PM, Benjamin Turner bennytu...@gmail.com
wrote:

 Yup,
 On Jun 17, 2014 7:45 PM, Justin Clift jus...@gluster.org wrote:
 
  On 17/06/2014, at 11:33 PM, Benjamin Turner wrote:
   Here are the tests that failed.  Note that n0 is a generated name,
 name255 is a 255-character string, and path1023 is a 1023-character-long path

  
   /opt/qa/tools/posix-testsuite/tests/link/02.t(Wstat: 0 Tests: 10
 Failed: 2)
 Failed tests:  4, 6
  
   expect 0 link ${n0} ${name255}   #4
   expect 0 unlink ${n0} #5   - this passed
   expect 0 unlink ${name255}   #6
  
   /opt/qa/tools/posix-testsuite/tests/link/03.t(Wstat: 0 Tests: 16
 Failed: 2)
 Failed tests:  8-9
  
   expect 0 link ${n0} ${path1023}  #8
   expect 0 unlink ${path1023}   #9
  
   I gotta go for the day, I'll try to repro outside the script tomorrow.
 
  As a data point, people have occasionally mentioned to me in IRC
  and via email that these posix tests fail for them... even when
  run against a (non-glustered) ext4/xfs filesystem.  So, it _could_
  be just some weird spurious thing.  If you figure out what though,
  that'd be cool. :)
 
  + Justin
 
  --
  GlusterFS - http://www.gluster.org
 
  An open source, distributed file system scaling to several
  petabytes, and handling thousands of clients.
 
  My personal twitter: twitter.com/realjustinclift
 
  I went through these a while back and removed anything that wasn't valid
  for GlusterFS.  This test was passing on 3.4.59 when it was released; I am
  thinking it may have something to do with a symlink-to-the-same-directory
  BZ I found a while back? I don't know, I'll get it sorted tomorrow.

   I got this sorted: I needed to add a sleep between the file create and
the link.  I ran through it manually and it worked every time; it took me a
few goes to realise it was a timing issue.  I didn't need this on 3.4.0.59, so is
there anything that needs to be investigated?
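
For illustration, roughly what the adjusted sequence looks like (the file names
and the one-second value are assumptions, not the actual change to the test):

  cd /mnt/gluster
  touch testfile              # file create
  sleep 1                     # workaround: without this the link failed intermittently
  ln testfile testfile.link   # hard link that previously failed
  stat -c '%h %n' testfile    # expect a link count of 2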

-b


Re: [Gluster-devel] 3.5.1 beta 2 Sanity tests

2014-06-19 Thread Justin Clift
On 19/06/2014, at 6:55 PM, Benjamin Turner wrote:
snip
 
 I went through these a while back and removed anything that wasn't valid for
 GlusterFS.  This test was passing on 3.4.59 when it was released; I am
 thinking it may have something to do with a symlink-to-the-same-directory BZ
 I found a while back? I don't know, I'll get it sorted tomorrow.

 I got this sorted: I needed to add a sleep between the file create and the
 link.  I ran through it manually and it worked every time; it took me a few goes
 to realise it was a timing issue.  I didn't need this on 3.4.0.59, so is there anything
 that needs to be investigated?

Any ideas? :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift
