Re: [Gluster-devel] bug-857330/normal.t failure
- Original Message -
> On 22/05/2014, at 1:34 PM, Kaushal M wrote:
> > Thanks Justin, I found the problem. The VM can be deleted now.
>
> Done. :)
>
> > Turns out, there was more than enough time for the rebalance to complete.
> > But we hit a race, which caused a command to fail.
> >
> > The particular test that failed is waiting for rebalance to finish. It does
> > this by doing a 'gluster volume rebalance <> status' command and checking
> > the result. The EXPECT_WITHIN function runs this command till we have a
> > match, the command fails or the timeout happens.
> >
> > For a rebalance status command, glusterd sends a request to the rebalance
> > process (as a brick_op) to get the latest stats. It had done the same in
> > this case as well. But while glusterd was waiting for the reply, the
> > rebalance completed and the process stopped itself. This caused the rpc
> > connection between glusterd and the rebalance process to close, which caused
> > all pending requests to be unwound as failures, which in turn led to the
> > command failing.
> >
> > I cannot think of a way to avoid this race from within glusterd. For this
> > particular test, we could avoid using the 'rebalance status' command if we
> > directly checked the rebalance process state using its pid etc. I don't
> > particularly approve of this approach, as I think I used the 'rebalance
> > status' command for a reason. But I currently cannot recall the reason,
> > and if I cannot come up with it soon, I wouldn't mind changing the test to
> > avoid rebalance status.

I think it's the rebalance daemon's life cycle that is problematic. It makes it inconvenient, if not impossible, for glusterd to gather progress/status deterministically. The rebalance process could wait for the rebalance-commit subcommand before terminating. No other daemon managed by glusterd has this kind of life cycle. I don't see any good reason why rebalance should kill itself on completion of data migration. Thoughts?
~Krish > Hmmm, is it the kind of thing where the "rebalance status" command > should retry, if its connection gets closed by a just-completed > rebalance (as happened here)? > > Or would that not work as well? > > + Justin > > -- > Open Source and Standards @ Red Hat > > twitter.com/realjustinclift > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-devel > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
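Justin's retry suggestion above can be sketched as a thin shell wrapper outside glusterd: re-run the status command a few times when it fails, so that a single failure caused by a just-closed RPC connection is retried instead of being surfaced. This is only an illustrative sketch; `retry_status`, the retry count, and the one-second delay are made-up names and values, not existing glusterd or test-framework behaviour.

```shell
# Hedged sketch: retry a command up to $1 times, sleeping between attempts,
# so a one-shot failure (e.g. a just-closed RPC connection) is retried
# rather than reported to the caller. Retry count and delay are arbitrary.
retry_status() {
    retries=${1:?usage: retry_status <retries> <cmd...>}; shift
    attempt=0
    until "$@"; do
        attempt=$((attempt + 1))
        [ "$attempt" -ge "$retries" ] && return 1
        sleep 1
    done
    return 0
}

# Usage (hypothetical): retry_status 3 gluster volume rebalance "$V0" status
```

Whether masking the failure this way is acceptable is exactly the open question in this thread: a retry hides the disconnect, while Krish's point is that the daemon's life cycle itself is the problem.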
Re: [Gluster-devel] bug-857330/normal.t failure
- Original Message - > From: "Kaushal M" > To: "Justin Clift" , "Gluster Devel" > > Sent: Thursday, May 22, 2014 6:04:29 PM > Subject: Re: [Gluster-devel] bug-857330/normal.t failure > > Thanks Justin, I found the problem. The VM can be deleted now. > > Turns out, there was more than enough time for the rebalance to complete. But > we hit a race, which caused a command to fail. > > The particular test that failed is waiting for rebalance to finish. It does > this by doing a 'gluster volume rebalance <> status' command and checking > the result. The EXPECT_WITHIN function runs this command till we have a > match, the command fails or the timeout happens. > > For a rebalance status command, glusterd sends a request to the rebalance > process (as a brick_op) to get the latest stats. It had done the same in > this case as well. But while glusterd was waiting for the reply, the > rebalance completed and the process stopped itself. This caused the rpc > connection between glusterd and rebalance proc to close. This caused all > pending requests to be unwound as failures. Which in turn led to the command > failing. Do you think we can print the status of the process as 'not-responding' when such a thing happens, instead of failing the command? Pranith > > I cannot think of a way to avoid this race from within glusterd. For this > particular test, we could avoid using the 'rebalance status' command if we > directly checked the rebalance process state using its pid etc. I don't > particularly approve of this approach, as I think I used the 'rebalance > status' command for a reason. But I currently cannot recall the reason, and > if I cannot come up with it soon, I wouldn't mind changing the test to avoid > rebalance status. > > ~kaushal > > > > On Thu, May 22, 2014 at 5:22 PM, Justin Clift < jus...@gluster.org > wrote: > > > > On 22/05/2014, at 12:32 PM, Kaushal M wrote: > > I haven't yet. But I will. > > > > Justin, > > Can I take a peek inside the VM? > > Sure. 
> > IP: 23.253.57.20 > User: root > Password: foobar123 > > The stdout log from the regression test is in /tmp/regression.log. > > The GlusterFS git repo is in /root/glusterfs. Um, you should be > able to find everything else pretty easily. > > Btw, this is just a temp VM, so feel free to do anything you want > with it. When you're finished with it let me know so I can delete > it. :) > > + Justin > > > > ~kaushal > > > > > > On Thu, May 22, 2014 at 4:53 PM, Pranith Kumar Karampuri < > > pkara...@redhat.com > wrote: > > Kaushal, > > Rebalance status command seems to be failing sometimes. I sent a mail about > > such spurious failure earlier today. Did you get a chance to look at the > > logs and confirm that rebalance didn't fail and it is indeed a timeout? > > > > Pranith > > - Original Message - > > > From: "Kaushal M" < kshlms...@gmail.com > > > > To: "Pranith Kumar Karampuri" < pkara...@redhat.com > > > > Cc: "Justin Clift" < jus...@gluster.org >, "Gluster Devel" < > > > gluster-devel@gluster.org > > > > Sent: Thursday, May 22, 2014 4:40:25 PM > > > Subject: Re: [Gluster-devel] bug-857330/normal.t failure > > > > > > The test is waiting for rebalance to finish. This is a rebalance with > > > some > > > actual data so it could have taken a long time to finish. I did set a > > > pretty high timeout, but it seems like it's not enough for the new VMs. > > > > > > Possible options are, > > > - Increase this timeout further > > > - Reduce the amount of data. Currently this is 100 directories with 10 > > > files each of size between 10-500KB > > > > > > ~kaushal > > > > > > > > > On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri < > > > pkara...@redhat.com > wrote: > > > > > > > Kaushal has more context about these CCed. Keep the setup until he > > > > responds so that he can take a look. 
> > > > > > > > Pranith > > > > - Original Message - > > > > > From: "Justin Clift" < jus...@gluster.org > > > > > > To: "Pranith Kumar Karampuri" < pkara...@redhat.com > > > > > > Cc: "Gluster Devel" < gluster-devel@gluster.org > > > > > > Sent: Thursday, May 22, 2014 3:54:46 PM > > > > > Subject: bug-857330/normal.t failure > > > > > > > > > > Hi Pranith, > > > > > > > > > > Ran a few VM's with your Gerrit CR 7835 applied, and in "DEBUG" > > > > > mode (I think). > > > > > > > > > > One of the VM's had a failure in bug-857330/normal.t: > > > > > > > > > > Test Summary Report > > > > > --- > > > > > ./tests/basic/rpm.t (Wstat: 0 Tests: 0 > > > > Failed: > > > > > 0) > > > > > Parse errors: Bad plan. You planned 8 tests but ran 0. > > > > > ./tests/bugs/bug-857330/normal.t (Wstat: 0 Tests: 24 > > > > Failed: > > > > > 1) > > > > > Failed test: 13 > > > > > Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr 1.73 sys + > > > > 941.82 > > > > > cusr 645.54 csys = 1591.22 CPU) > > > > > Result: FAIL > > > > > > > > > > Seems to be this test: > > > > > > > > > > COMMAND="volume rebal
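The EXPECT_WITHIN helper that this thread keeps referring to polls a command until its output matches a pattern, or gives up at a deadline. A minimal standalone approximation of that idea (the function body below is a sketch, not the actual test-framework code, and the `gluster` invocation in the comment is illustrative):

```shell
# Approximate sketch of an EXPECT_WITHIN-style poller: re-run a command
# until its output contains the pattern, or give up after $timeout seconds.
expect_within() {
    timeout=$1 pattern=$2; shift 2
    deadline=$(( $(date +%s) + timeout ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        out=$("$@" 2>/dev/null)
        case $out in
            *"$pattern"*) return 0 ;;   # match found: success
        esac
        sleep 1
    done
    return 1   # deadline passed without a match
}

# Usage (illustrative):
# expect_within 300 completed gluster volume rebalance "$V0" status
```

Note that a poller like this treats a failed command the same as a non-matching one and simply retries until the deadline, whereas the race described above caused the real test to see the command itself fail.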
Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr
http://review.gluster.com/#/c/7823/ - the fix here On Thu, May 22, 2014 at 1:41 PM, Harshavardhana wrote: > Here are the important locations in the XFS tree coming from 2.6.32 branch > > STATIC int > xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl) > { > struct xfs_inode *ip = XFS_I(inode); > unsigned char *ea_name; > int error; > > if (S_ISLNK(inode->i_mode)) > I would > generally think this is the issue. > return -EOPNOTSUPP; > > STATIC long > xfs_vn_fallocate( > struct inode *inode, > int mode, > loff_t offset, > loff_t len) > { > long error; > loff_t new_size = 0; > xfs_flock64_t bf; > xfs_inode_t *ip = XFS_I(inode); > int cmd = XFS_IOC_RESVSP; > int attr_flags = XFS_ATTR_NOLOCK; > > if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE)) > return -EOPNOTSUPP; > > STATIC int > xfs_ioc_setxflags( > xfs_inode_t *ip, > struct file *filp, > void __user *arg) > { > struct fsxattr fa; > unsigned int flags; > unsigned int mask; > int error; > > if (copy_from_user(&flags, arg, sizeof(flags))) > return -EFAULT; > > if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \ > FS_NOATIME_FL | FS_NODUMP_FL | \ > FS_SYNC_FL)) > return -EOPNOTSUPP; > > Perhaps some sort of system-level ACLs are being propagated by us > over symlinks()? - perhaps this is related to the same issue of > following symlinks? 
> > On Sun, May 18, 2014 at 10:48 AM, Pranith Kumar Karampuri > wrote: >> Sent the following patch to remove the special treatment of ENOTSUP here: >> http://review.gluster.org/7788 >> >> Pranith >> - Original Message - >>> From: "Kaleb KEITHLEY" >>> To: gluster-devel@gluster.org >>> Sent: Tuesday, May 13, 2014 8:01:53 PM >>> Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for >>> setxattr >>> >>> On 05/13/2014 08:00 AM, Nagaprasad Sathyanarayana wrote: >>> > On 05/07/2014 03:44 PM, Pranith Kumar Karampuri wrote: >>> >> >>> >> - Original Message - >>> >>> From: "Raghavendra Gowdappa" >>> >>> To: "Pranith Kumar Karampuri" >>> >>> Cc: "Vijay Bellur" , gluster-devel@gluster.org, >>> >>> "Anand Avati" >>> >>> Sent: Wednesday, May 7, 2014 3:42:16 PM >>> >>> Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP >>> >>> for setxattr >>> >>> >>> >>> I think with "repetitive log message suppression" patch being merged, we >>> >>> don't really need gf_log_occasionally (except if they are logged in >>> >>> DEBUG or >>> >>> TRACE levels). >>> >> That definitely helps. But still, setxattr calls are not supposed to >>> >> fail with ENOTSUP on FS where we support gluster. If there are special >>> >> keys which fail with ENOTSUPP, we can conditionally log setxattr >>> >> failures only when the key is something new? >>> >>> I know this is about EOPNOTSUPP (a.k.a. ENOTSUPP) returned by >>> setxattr(2) for legitimate attrs. >>> >>> But I can't help but wondering if this isn't related to other bugs we've >>> had with, e.g., lgetxattr(2) called on invalid xattrs? >>> >>> E.g. see https://bugzilla.redhat.com/show_bug.cgi?id=765202. We have a >>> hack where xlators communicate with each other by getting (and setting?) >>> invalid xattrs; the posix xlator has logic to filter out invalid >>> xattrs, but due to bugs this hasn't always worked perfectly. >>> >>> It would be interesting to know which xattrs are getting errors and on >>> which fs types. 
>>> >>> FWIW, in a quick perusal of a fairly recent (3.14.3) kernel, in xfs >>> there are only six places where EOPNOTSUPP is returned, none of them >>> related to xattrs. In ext[34] EOPNOTSUPP can be returned if the >>> user_xattr option is not enabled (enabled by default in ext4.) And in >>> the higher level vfs xattr code there are many places where EOPNOTSUPP >>> _might_ be returned, primarily only if subordinate function calls aren't >>> invoked which would clear the default or return a different error. >>> >>> -- >>> >>> Kaleb >>> >>> >>> >>> >>> >>> ___ >>> Gluster-devel mailing list >>> Gluster-devel@gluster.org >>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel >>> >> ___ >> Gluster-devel mailing list >> Gluster-devel@gluster.org >> http://supercolony.gluster.org/mailman/listinfo/gluster-devel > > > > -- > Religious confuse piety with mere ritual, the virtuous confuse > regulation with outcomes -- Religious confuse piety with mere ritual, the virtuous confuse regulation with outcomes ___ Gluster-d
Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr
Here are the important locations in the XFS tree coming from the 2.6.32 branch:

STATIC int
xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl)
{
	struct xfs_inode *ip = XFS_I(inode);
	unsigned char *ea_name;
	int error;

	if (S_ISLNK(inode->i_mode))    <- I would generally think this is the issue.
		return -EOPNOTSUPP;

STATIC long
xfs_vn_fallocate(
	struct inode	*inode,
	int		mode,
	loff_t		offset,
	loff_t		len)
{
	long		error;
	loff_t		new_size = 0;
	xfs_flock64_t	bf;
	xfs_inode_t	*ip = XFS_I(inode);
	int		cmd = XFS_IOC_RESVSP;
	int		attr_flags = XFS_ATTR_NOLOCK;

	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
		return -EOPNOTSUPP;

STATIC int
xfs_ioc_setxflags(
	xfs_inode_t	*ip,
	struct file	*filp,
	void		__user *arg)
{
	struct fsxattr	fa;
	unsigned int	flags;
	unsigned int	mask;
	int		error;

	if (copy_from_user(&flags, arg, sizeof(flags)))
		return -EFAULT;

	if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \
		      FS_NOATIME_FL | FS_NODUMP_FL | \
		      FS_SYNC_FL))
		return -EOPNOTSUPP;

Perhaps some sort of system-level ACLs are being propagated by us over symlinks()? - perhaps this is related to the same issue of following symlinks? 
On Sun, May 18, 2014 at 10:48 AM, Pranith Kumar Karampuri wrote: > Sent the following patch to remove the special treatment of ENOTSUP here: > http://review.gluster.org/7788 > > Pranith > - Original Message - >> From: "Kaleb KEITHLEY" >> To: gluster-devel@gluster.org >> Sent: Tuesday, May 13, 2014 8:01:53 PM >> Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for >> setxattr >> >> On 05/13/2014 08:00 AM, Nagaprasad Sathyanarayana wrote: >> > On 05/07/2014 03:44 PM, Pranith Kumar Karampuri wrote: >> >> >> >> - Original Message - >> >>> From: "Raghavendra Gowdappa" >> >>> To: "Pranith Kumar Karampuri" >> >>> Cc: "Vijay Bellur" , gluster-devel@gluster.org, >> >>> "Anand Avati" >> >>> Sent: Wednesday, May 7, 2014 3:42:16 PM >> >>> Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP >> >>> for setxattr >> >>> >> >>> I think with "repetitive log message suppression" patch being merged, we >> >>> don't really need gf_log_occasionally (except if they are logged in >> >>> DEBUG or >> >>> TRACE levels). >> >> That definitely helps. But still, setxattr calls are not supposed to >> >> fail with ENOTSUP on FS where we support gluster. If there are special >> >> keys which fail with ENOTSUPP, we can conditionally log setxattr >> >> failures only when the key is something new? >> >> I know this is about EOPNOTSUPP (a.k.a. ENOTSUPP) returned by >> setxattr(2) for legitimate attrs. >> >> But I can't help but wondering if this isn't related to other bugs we've >> had with, e.g., lgetxattr(2) called on invalid xattrs? >> >> E.g. see https://bugzilla.redhat.com/show_bug.cgi?id=765202. We have a >> hack where xlators communicate with each other by getting (and setting?) >> invalid xattrs; the posix xlator has logic to filter out invalid >> xattrs, but due to bugs this hasn't always worked perfectly. >> >> It would be interesting to know which xattrs are getting errors and on >> which fs types. 
>> >> FWIW, in a quick perusal of a fairly recent (3.14.3) kernel, in xfs >> there are only six places where EOPNOTSUPP is returned, none of them >> related to xattrs. In ext[34] EOPNOTSUPP can be returned if the >> user_xattr option is not enabled (enabled by default in ext4.) And in >> the higher level vfs xattr code there are many places where EOPNOTSUPP >> _might_ be returned, primarily only if subordinate function calls aren't >> invoked which would clear the default or return a different error. >> >> -- >> >> Kaleb >> >> >> >> >> >> ___ >> Gluster-devel mailing list >> Gluster-devel@gluster.org >> http://supercolony.gluster.org/mailman/listinfo/gluster-devel >> > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://supercolony.gluster.org/mailman/listinfo/gluster-devel -- Religious confuse piety with mere ritual, the virtuous confuse regulation with outcomes ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
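Separately from the kernel-side analysis above, the setxattr(2) behaviour under discussion can be probed from a shell. A hedged sketch (it assumes the `setfattr` CLI from the attr package is installed; the `user.glusterfs.probe` key name is made up for illustration): attempting to set and remove a user xattr shows whether the underlying filesystem accepts them, the same code path that returns ENOTSUP/EOPNOTSUPP when, e.g., ext3/4 is mounted without the user_xattr option.

```shell
# Hedged sketch: probe whether the filesystem holding $1 accepts user.*
# xattrs. Assumes setfattr (attr package) is available; the key name is
# illustrative. Prints "supported" or "unsupported".
probe_user_xattr() {
    target=$1
    if setfattr -n user.glusterfs.probe -v 1 "$target" 2>/dev/null; then
        setfattr -x user.glusterfs.probe "$target" 2>/dev/null  # clean up
        echo supported
    else
        echo unsupported
    fi
}

f=$(mktemp)
probe_user_xattr "$f"
rm -f "$f"
```

A probe like this only distinguishes "works" from "fails"; per Kaleb's point, logging the specific key and errno at the posix xlator would still be needed to tell a legitimate ENOTSUP apart from a filtered-out invalid xattr.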
Re: [Gluster-devel] Gluster driver for Archipelago - Development process
On 05/22/2014 02:10 AM, Alex Pyrgiotis wrote:
> On 02/17/2014 06:22 PM, Vijay Bellur wrote:
>> On 02/17/2014 05:11 PM, Alex Pyrgiotis wrote:
>>> On 02/10/2014 07:06 PM, Vijay Bellur wrote:
>>>> On 02/05/2014 04:10 PM, Alex Pyrgiotis wrote:
>>>>> Hi all,
>>>>> Just wondering, do we have any news on that?
>>>>
>>>> Hi Alex,
>>>> I have started some work on this. The progress has been rather slow owing to the 3.5 release cycle amongst other things. I intend to propose this as a feature for 3.6 and will keep you posted as we have something more to get you going.
>>>
>>> Hi Vijay,
>>> That sounds good. I suppose that if it gets included in 3.6, we will see it in this page [1], right?
>>
>> Hi Alex,
>> Yes, that is correct.
>> Thanks, Vijay
>
> Hi Vijay,
> On the planning page for 3.6 [1], I see that Archipelago is included (great!) and that the feature freeze was due on the 21st of May. So, do we have any news on which features will get included in 3.6, as well as more info about the Archipelago integration?

Yes, the gfapi and related changes needed by Archipelago are planned for inclusion in 3.6. The feature freeze was moved by a month after a discussion in yesterday's GlusterFS community meeting. I will ping you back once we have something tangible to get started with integration testing.

Cheers,
Vijay
___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Changes needing review before a glusterfs-3.5.1 Beta is released
On Wed, May 21, 2014 at 06:40:57PM +0200, Niels de Vos wrote:
> A lot of work has been done on getting blockers resolved for the next
> 3.5 release. We're not there yet, but we're definitely getting close to
> releasing a 1st beta.
>
> Humble will follow up with an email related to the documentation that is
> still missing for features introduced with 3.5. We will not hold back on
> the Beta if the documentation is STILL incomplete, but it is seen as a
> major blocker for the final 3.5.1 release.
>
> The following list is based on the bugs that have been requested as
> blockers¹:
>
> * 1089054 gf-error-codes.h is missing from source tarball
>   Depends on 1038391 for getting the changes reviewed and included in
>   the master branch first:
>   - http://review.gluster.org/7714
>   - http://review.gluster.org/7786

These have been reviewed and merged in the master branch. Backports have been posted for review:
- http://review.gluster.org/7850
- http://review.gluster.org/7851

> * 1096425 i/o error when one user tries to access RHS volume over NFS
>   with 100+
>   Patches for 3.5 posted for review:
>   - http://review.gluster.org/7829
>   - http://review.gluster.org/7830

Review of the backport http://review.gluster.org/7830 is still pending.

> * 1099878 Need support for handle based Ops to fetch/modify extended
>   attributes of a file
>   Patch for 3.5 posted for review:
>   - http://review.gluster.org/7825

Got reviewed and merged!

New addition, confirmed yesterday:

* 1081016 glusterd needs xfsprogs and e2fsprogs packages (don't leave
  zombies if required programs aren't installed)
  Needs review+merging in master: http://review.gluster.org/7361
  After approval for master, a backport for release-3.5 can be sent.

Thanks, Niels
___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] bug-857330/normal.t failure
On 22/05/2014, at 1:34 PM, Kaushal M wrote: > Thanks Justin, I found the problem. The VM can be deleted now. Done. :) > Turns out, there was more than enough time for the rebalance to complete. But > we hit a race, which caused a command to fail. > > The particular test that failed is waiting for rebalance to finish. It does > this by doing a 'gluster volume rebalance <> status' command and checking the > result. The EXPECT_WITHIN function runs this command till we have a match, > the command fails or the timeout happens. > > For a rebalance status command, glusterd sends a request to the rebalance > process (as a brick_op) to get the latest stats. It had done the same in this > case as well. But while glusterd was waiting for the reply, the rebalance > completed and the process stopped itself. This caused the rpc connection > between glusterd and rebalance proc to close. This caused all pending > requests to be unwound as failures. Which in turn led to the command failing. > > I cannot think of a way to avoid this race from within glusterd. For this > particular test, we could avoid using the 'rebalance status' command if we > directly checked the rebalance process state using its pid etc. I don't > particularly approve of this approach, as I think I used the 'rebalance > status' command for a reason. But I currently cannot recall the reason, and > if I cannot come up with it soon, I wouldn't mind changing the test to avoid > rebalance status. Hmmm, is it the kind of thing where the "rebalance status" command should retry, if its connection gets closed by a just-completed rebalance (as happened here)? Or would that not work as well? + Justin -- Open Source and Standards @ Red Hat twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] bug-857330/normal.t failure
Thanks Justin, I found the problem. The VM can be deleted now.

Turns out, there was more than enough time for the rebalance to complete. But we hit a race, which caused a command to fail.

The particular test that failed is waiting for rebalance to finish. It does this by doing a 'gluster volume rebalance <> status' command and checking the result. The EXPECT_WITHIN function runs this command till we have a match, the command fails or the timeout happens.

For a rebalance status command, glusterd sends a request to the rebalance process (as a brick_op) to get the latest stats. It had done the same in this case as well. But while glusterd was waiting for the reply, the rebalance completed and the process stopped itself. This caused the rpc connection between glusterd and the rebalance process to close, which caused all pending requests to be unwound as failures, which in turn led to the command failing.

I cannot think of a way to avoid this race from within glusterd. For this particular test, we could avoid using the 'rebalance status' command if we directly checked the rebalance process state using its pid etc. I don't particularly approve of this approach, as I think I used the 'rebalance status' command for a reason. But I currently cannot recall the reason, and if I cannot come up with it soon, I wouldn't mind changing the test to avoid rebalance status.

~kaushal

On Thu, May 22, 2014 at 5:22 PM, Justin Clift wrote: > On 22/05/2014, at 12:32 PM, Kaushal M wrote: > > I haven't yet. But I will. > > > > Justin, > > Can I take a peek inside the VM? > > Sure. > > IP: 23.253.57.20 > User: root > Password: foobar123 > > The stdout log from the regression test is in /tmp/regression.log. > > The GlusterFS git repo is in /root/glusterfs. Um, you should be > able to find everything else pretty easily. > > Btw, this is just a temp VM, so feel free to do anything you want > with it. When you're finished with it let me know so I can delete > it. 
:) > > + Justin > > > > ~kaushal > > > > > > On Thu, May 22, 2014 at 4:53 PM, Pranith Kumar Karampuri < > pkara...@redhat.com> wrote: > > Kaushal, > >Rebalance status command seems to be failing sometimes. I sent a mail > about such spurious failure earlier today. Did you get a chance to look at > the logs and confirm that rebalance didn't fail and it is indeed a timeout? > > > > Pranith > > - Original Message - > > > From: "Kaushal M" > > > To: "Pranith Kumar Karampuri" > > > Cc: "Justin Clift" , "Gluster Devel" < > gluster-devel@gluster.org> > > > Sent: Thursday, May 22, 2014 4:40:25 PM > > > Subject: Re: [Gluster-devel] bug-857330/normal.t failure > > > > > > The test is waiting for rebalance to finish. This is a rebalance with > some > > > actual data so it could have taken a long time to finish. I did set a > > > pretty high timeout, but it seems like it's not enough for the new VMs. > > > > > > Possible options are, > > > - Increase this timeout further > > > - Reduce the amount of data. Currently this is 100 directories with 10 > > > files each of size between 10-500KB > > > > > > ~kaushal > > > > > > > > > On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri < > > > pkara...@redhat.com> wrote: > > > > > > > Kaushal has more context about these CCed. Keep the setup until he > > > > responds so that he can take a look. > > > > > > > > Pranith > > > > - Original Message - > > > > > From: "Justin Clift" > > > > > To: "Pranith Kumar Karampuri" > > > > > Cc: "Gluster Devel" > > > > > Sent: Thursday, May 22, 2014 3:54:46 PM > > > > > Subject: bug-857330/normal.t failure > > > > > > > > > > Hi Pranith, > > > > > > > > > > Ran a few VM's with your Gerrit CR 7835 applied, and in "DEBUG" > > > > > mode (I think). > > > > > > > > > > One of the VM's had a failure in bug-857330/normal.t: > > > > > > > > > > Test Summary Report > > > > > --- > > > > > ./tests/basic/rpm.t (Wstat: 0 Tests: > 0 > > > > Failed: > > > > > 0) > > > > > Parse errors: Bad plan. 
You planned 8 tests but ran 0. > > > > > ./tests/bugs/bug-857330/normal.t(Wstat: 0 Tests: > 24 > > > > Failed: > > > > > 1) > > > > > Failed test: 13 > > > > > Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr 1.73 sys + > > > > 941.82 > > > > > cusr 645.54 csys = 1591.22 CPU) > > > > > Result: FAIL > > > > > > > > > > Seems to be this test: > > > > > > > > > > COMMAND="volume rebalance $V0 status" > > > > > PATTERN="completed" > > > > > EXPECT_WITHIN 300 $PATTERN get-task-status > > > > > > > > > > Is this one on your radar already? > > > > > > > > > > Btw, this VM is still online. Can give you access to retrieve logs > > > > > if useful. > > > > > > > > > > + Justin > > > > > > > > > > -- > > > > > Open Source and Standards @ Red Hat > > > > > > > > > > twitter.com/realjustinclift > > > > > > > > > > > > > > ___ > > > > Gluster-devel mailing list > >
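Kaushal's alternative of checking the rebalance process state directly by pid, rather than via 'rebalance status', can be sketched with `kill -0`, which probes process existence without delivering a signal. The pidfile path in the usage comment is a made-up placeholder, not the real glusterd layout:

```shell
# Sketch: poll until the process with the given pid has exited (i.e. the
# daemon stopped itself after finishing), or give up after $timeout seconds.
# kill -0 sends no signal; it only checks that the process still exists.
wait_for_exit() {
    pid=$1 timeout=$2
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process gone: finished
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1   # still running at the deadline
}

# Usage (hypothetical pidfile path):
# wait_for_exit "$(cat /var/run/gluster/rebalance.pid)" 300
```

This only tells you the daemon exited, not whether the rebalance succeeded, which may be part of why the test used 'rebalance status' in the first place.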
Re: [Gluster-devel] [Gluster-users] Guidelines for Maintainers
[Adding the right alias for gluster-devel this time around] On 05/22/2014 05:29 PM, Vijay Bellur wrote: Hi All, Given the addition of new sub-maintainers & release maintainers to the community [1], I have felt the need to publish a set of guidelines for all categories of maintainers, so that we have an unambiguous operational state. A first cut of one such document can be found at [2]. I would love to hear your thoughts and feedback to make the proposal very clear to everybody. We can convert this draft to a real set of guidelines once there is consensus. Cheers, Vijay [1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6249 [2] http://www.gluster.org/community/documentation/index.php/Guidelines_For_Maintainers ___ Gluster-users mailing list gluster-us...@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] bug-857330/normal.t failure
I haven't yet. But I will. Justin, Can I take a peek inside the VM? ~kaushal On Thu, May 22, 2014 at 4:53 PM, Pranith Kumar Karampuri < pkara...@redhat.com> wrote: > Kaushal, > Rebalance status command seems to be failing sometimes. I sent a mail > about such spurious failure earlier today. Did you get a chance to look at > the logs and confirm that rebalance didn't fail and it is indeed a timeout? > > Pranith > - Original Message - > > From: "Kaushal M" > > To: "Pranith Kumar Karampuri" > > Cc: "Justin Clift" , "Gluster Devel" < > gluster-devel@gluster.org> > > Sent: Thursday, May 22, 2014 4:40:25 PM > > Subject: Re: [Gluster-devel] bug-857330/normal.t failure > > > > The test is waiting for rebalance to finish. This is a rebalance with > some > > actual data so it could have taken a long time to finish. I did set a > > pretty high timeout, but it seems like it's not enough for the new VMs. > > > > Possible options are, > > - Increase this timeout further > > - Reduce the amount of data. Currently this is 100 directories with 10 > > files each of size between 10-500KB > > > > ~kaushal > > > > > > On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri < > > pkara...@redhat.com> wrote: > > > > > Kaushal has more context about these CCed. Keep the setup until he > > > responds so that he can take a look. > > > > > > Pranith > > > - Original Message - > > > > From: "Justin Clift" > > > > To: "Pranith Kumar Karampuri" > > > > Cc: "Gluster Devel" > > > > Sent: Thursday, May 22, 2014 3:54:46 PM > > > > Subject: bug-857330/normal.t failure > > > > > > > > Hi Pranith, > > > > > > > > Ran a few VM's with your Gerrit CR 7835 applied, and in "DEBUG" > > > > mode (I think). > > > > > > > > One of the VM's had a failure in bug-857330/normal.t: > > > > > > > > Test Summary Report > > > > --- > > > > ./tests/basic/rpm.t (Wstat: 0 Tests: 0 > > > Failed: > > > > 0) > > > > Parse errors: Bad plan. You planned 8 tests but ran 0. 
> > > > ./tests/bugs/bug-857330/normal.t(Wstat: 0 Tests: 24 > > > Failed: > > > > 1) > > > > Failed test: 13 > > > > Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr 1.73 sys + > > > 941.82 > > > > cusr 645.54 csys = 1591.22 CPU) > > > > Result: FAIL > > > > > > > > Seems to be this test: > > > > > > > > COMMAND="volume rebalance $V0 status" > > > > PATTERN="completed" > > > > EXPECT_WITHIN 300 $PATTERN get-task-status > > > > > > > > Is this one on your radar already? > > > > > > > > Btw, this VM is still online. Can give you access to retrieve logs > > > > if useful. > > > > > > > > + Justin > > > > > > > > -- > > > > Open Source and Standards @ Red Hat > > > > > > > > twitter.com/realjustinclift > > > > > > > > > > > ___ > > > Gluster-devel mailing list > > > Gluster-devel@gluster.org > > > http://supercolony.gluster.org/mailman/listinfo/gluster-devel > > > > > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] bug-857330/normal.t failure
Kaushal, Rebalance status command seems to be failing sometimes. I sent a mail about such spurious failure earlier today. Did you get a chance to look at the logs and confirm that rebalance didn't fail and it is indeed a timeout? Pranith - Original Message - > From: "Kaushal M" > To: "Pranith Kumar Karampuri" > Cc: "Justin Clift" , "Gluster Devel" > > Sent: Thursday, May 22, 2014 4:40:25 PM > Subject: Re: [Gluster-devel] bug-857330/normal.t failure > > The test is waiting for rebalance to finish. This is a rebalance with some > actual data so it could have taken a long time to finish. I did set a > pretty high timeout, but it seems like it's not enough for the new VMs. > > Possible options are, > - Increase this timeout further > - Reduce the amount of data. Currently this is 100 directories with 10 > files each of size between 10-500KB > > ~kaushal > > > On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri < > pkara...@redhat.com> wrote: > > > Kaushal has more context about these CCed. Keep the setup until he > > responds so that he can take a look. > > > > Pranith > > - Original Message - > > > From: "Justin Clift" > > > To: "Pranith Kumar Karampuri" > > > Cc: "Gluster Devel" > > > Sent: Thursday, May 22, 2014 3:54:46 PM > > > Subject: bug-857330/normal.t failure > > > > > > Hi Pranith, > > > > > > Ran a few VM's with your Gerrit CR 7835 applied, and in "DEBUG" > > > mode (I think). > > > > > > One of the VM's had a failure in bug-857330/normal.t: > > > > > > Test Summary Report > > > --- > > > ./tests/basic/rpm.t (Wstat: 0 Tests: 0 > > Failed: > > > 0) > > > Parse errors: Bad plan. You planned 8 tests but ran 0. 
> > > ./tests/bugs/bug-857330/normal.t(Wstat: 0 Tests: 24 > > Failed: > > > 1) > > > Failed test: 13 > > > Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr 1.73 sys + > > 941.82 > > > cusr 645.54 csys = 1591.22 CPU) > > > Result: FAIL > > > > > > Seems to be this test: > > > > > > COMMAND="volume rebalance $V0 status" > > > PATTERN="completed" > > > EXPECT_WITHIN 300 $PATTERN get-task-status > > > > > > Is this one on your radar already? > > > > > > Btw, this VM is still online. Can give you access to retrieve logs > > > if useful. > > > > > > + Justin > > > > > > -- > > > Open Source and Standards @ Red Hat > > > > > > twitter.com/realjustinclift > > > > > > > > ___ > > Gluster-devel mailing list > > Gluster-devel@gluster.org > > http://supercolony.gluster.org/mailman/listinfo/gluster-devel > > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] bug-857330/normal.t failure
The test is waiting for rebalance to finish. This is a rebalance with some
actual data, so it could have taken a long time to finish. I did set a
pretty high timeout, but it seems like it's not enough for the new VMs.

Possible options are:
- Increase this timeout further
- Reduce the amount of data. Currently this is 100 directories with 10
  files each, of size between 10KB and 500KB

~kaushal

On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> Kaushal has more context about these, CCed. Keep the setup until he
> responds so that he can take a look.
>
> Pranith
> ----- Original Message -----
> > From: "Justin Clift"
> > To: "Pranith Kumar Karampuri"
> > Cc: "Gluster Devel"
> > Sent: Thursday, May 22, 2014 3:54:46 PM
> > Subject: bug-857330/normal.t failure
> >
> > Hi Pranith,
> >
> > Ran a few VMs with your Gerrit CR 7835 applied, and in "DEBUG"
> > mode (I think).
> >
> > One of the VMs had a failure in bug-857330/normal.t:
> >
> > Test Summary Report
> > ---
> > ./tests/basic/rpm.t (Wstat: 0 Tests: 0 Failed: 0)
> >   Parse errors: Bad plan. You planned 8 tests but ran 0.
> > ./tests/bugs/bug-857330/normal.t (Wstat: 0 Tests: 24 Failed: 1)
> >   Failed test: 13
> > Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr 1.73 sys +
> > 941.82 cusr 645.54 csys = 1591.22 CPU)
> > Result: FAIL
> >
> > Seems to be this test:
> >
> > COMMAND="volume rebalance $V0 status"
> > PATTERN="completed"
> > EXPECT_WITHIN 300 $PATTERN get-task-status
> >
> > Is this one on your radar already?
> >
> > Btw, this VM is still online. Can give you access to retrieve logs
> > if useful.
> >
> > + Justin
> >
> > --
> > Open Source and Standards @ Red Hat
> >
> > twitter.com/realjustinclift
Re: [Gluster-devel] bug-857330/normal.t failure
Kaushal has more context about these, CCed. Keep the setup until he
responds so that he can take a look.

Pranith

----- Original Message -----
> From: "Justin Clift"
> To: "Pranith Kumar Karampuri"
> Cc: "Gluster Devel"
> Sent: Thursday, May 22, 2014 3:54:46 PM
> Subject: bug-857330/normal.t failure
>
> Hi Pranith,
>
> Ran a few VMs with your Gerrit CR 7835 applied, and in "DEBUG"
> mode (I think).
>
> One of the VMs had a failure in bug-857330/normal.t:
>
> Test Summary Report
> ---
> ./tests/basic/rpm.t (Wstat: 0 Tests: 0 Failed: 0)
>   Parse errors: Bad plan. You planned 8 tests but ran 0.
> ./tests/bugs/bug-857330/normal.t (Wstat: 0 Tests: 24 Failed: 1)
>   Failed test: 13
> Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr 1.73 sys + 941.82
> cusr 645.54 csys = 1591.22 CPU)
> Result: FAIL
>
> Seems to be this test:
>
> COMMAND="volume rebalance $V0 status"
> PATTERN="completed"
> EXPECT_WITHIN 300 $PATTERN get-task-status
>
> Is this one on your radar already?
>
> Btw, this VM is still online. Can give you access to retrieve logs
> if useful.
>
> + Justin
>
> --
> Open Source and Standards @ Red Hat
>
> twitter.com/realjustinclift
[Gluster-devel] bug-857330/normal.t failure
Hi Pranith,

Ran a few VMs with your Gerrit CR 7835 applied, and in "DEBUG"
mode (I think).

One of the VMs had a failure in bug-857330/normal.t:

Test Summary Report
---
./tests/basic/rpm.t (Wstat: 0 Tests: 0 Failed: 0)
  Parse errors: Bad plan. You planned 8 tests but ran 0.
./tests/bugs/bug-857330/normal.t (Wstat: 0 Tests: 24 Failed: 1)
  Failed test: 13
Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr 1.73 sys + 941.82
cusr 645.54 csys = 1591.22 CPU)
Result: FAIL

Seems to be this test:

COMMAND="volume rebalance $V0 status"
PATTERN="completed"
EXPECT_WITHIN 300 $PATTERN get-task-status

Is this one on your radar already?

Btw, this VM is still online. Can give you access to retrieve logs
if useful.

+ Justin

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift
Re: [Gluster-devel] Spurious failures because of nfs and snapshots
I have posted a patch that fixes this issue:
http://review.gluster.org/#/c/7842/

Thanks,
Vijay

On Thursday 22 May 2014 11:35 AM, Vijay Bellur wrote:
> On 05/21/2014 08:50 PM, Vijaikumar M wrote:
> > KP, Atin and myself did some debugging and found that there was a
> > deadlock in glusterd. When creating a volume snapshot, the back-end
> > operations 'taking an lvm_snapshot and starting the brick' for each
> > brick are executed in parallel using the synctask framework.
> >
> > brick_start was releasing the big_lock around brick_connect and
> > acquiring it again afterwards. This caused a deadlock in some race
> > conditions, where the main thread is waiting for one of the synctask
> > threads to finish, while the synctask thread is waiting for the
> > big_lock.
> >
> > We are working on fixing this issue.
>
> If this fix is going to take more time, can we please log a bug to track
> this problem and remove the test cases that need to be addressed from the
> test unit? This way other valid patches will not be blocked by the
> failure of the snapshot test unit. We can introduce these tests again as
> part of the fix for the problem.
>
> -Vijay
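The circular wait described above (the main thread holds the big_lock and
waits on a worker, while the worker needs the big_lock to finish) can be
reproduced in miniature with flock(1). This is purely an illustration of the
lock pattern, not glusterd code; glusterd's big_lock is an in-process mutex,
and the 2-second timeout here stands in for what would otherwise be a
permanent hang:

```shell
#!/usr/bin/env bash
# Miniature illustration of the reported deadlock shape using flock(1).
# All names are illustrative.
demo_deadlock () {
    local lock
    lock=$(mktemp)
    (
        flock 9                    # "main thread" takes big_lock
        (
            # "synctask": needs big_lock to proceed, but main still holds
            # it. Without the -w timeout this would block forever.
            flock -w 2 8 || echo "synctask: stuck waiting for big_lock"
        ) 8>"$lock" &
        wait $!                    # main waits for synctask -> circular wait
        echo "main: synctask finished"
    ) 9>"$lock"
    rm -f "$lock"
}
demo_deadlock
```

Releasing the big_lock for the duration of the wait (or not taking it inside
the synctask path) breaks the cycle, which is essentially what any fix has to
arrange.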
Re: [Gluster-devel] Regression testing results for master branch
/glusterd-backend%N.log maybe?

On 22/05/2014, at 8:03 AM, Kaushal M wrote:
> The glusterds spawned using cluster.rc store their logs at
> /d/backends//glusterd.log . But the cleanup() function cleans
> /d/backends/, so those logs are lost before we can archive.
>
> cluster.rc should be fixed to use a better location for the logs.
>
> ~kaushal

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift
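An alternative to relocating the logs is to copy them out of the backends
tree before cleanup() wipes it. A hedged sketch, assuming a per-node layout
like the one described in the mail; the helper name and destination layout
are invented for illustration, not taken from cluster.rc:

```shell
#!/usr/bin/env bash
# Sketch: archive per-node glusterd logs out of a backends tree before it
# is wiped. archive_node_logs and the naming scheme are assumptions.
archive_node_logs () {
    local src="$1" dest="$2"
    mkdir -p "$dest"
    find "$src" -name 'glusterd.log' -print0 2>/dev/null |
        while IFS= read -r -d '' f; do
            # Name each copy after its per-node directory, e.g. glusterd-1.log
            node=$(basename "$(dirname "$f")")
            cp "$f" "$dest/glusterd-$node.log"
        done
}
```

Called as, say, `archive_node_logs /d/backends /var/log/glusterfs/regression`
at the top of cleanup(), this would let the regression job archive the logs
regardless of where cluster.rc writes them.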
Re: [Gluster-devel] Regression testing results for master branch
The glusterds spawned using cluster.rc store their logs at
/d/backends//glusterd.log . But the cleanup() function cleans /d/backends/,
so those logs are lost before we can archive them.

cluster.rc should be fixed to use a better location for the logs.

~kaushal

On Thu, May 22, 2014 at 11:45 AM, Kaushal M wrote:

> It should be possible. I'll check and do the change.
>
> ~kaushal
>
> On Thu, May 22, 2014 at 8:14 AM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> ----- Original Message -----
>> > From: "Pranith Kumar Karampuri"
>> > To: "Justin Clift"
>> > Cc: "Gluster Devel"
>> > Sent: Thursday, May 22, 2014 6:23:16 AM
>> > Subject: Re: [Gluster-devel] Regression testing results for master
>> > branch
>> >
>> > ----- Original Message -----
>> > > From: "Justin Clift"
>> > > To: "Pranith Kumar Karampuri"
>> > > Cc: "Gluster Devel"
>> > > Sent: Wednesday, May 21, 2014 11:01:36 PM
>> > > Subject: Re: [Gluster-devel] Regression testing results for master
>> > > branch
>> > >
>> > > On 21/05/2014, at 6:17 PM, Justin Clift wrote:
>> > > > Hi all,
>> > > >
>> > > > Kicked off 21 VMs in Rackspace earlier today, running the
>> > > > regression tests against master branch.
>> > > >
>> > > > Only 3 VMs failed out of the 21 (86% PASS, 14% FAIL), with all
>> > > > three being for the same test:
>> > > >
>> > > > Test Summary Report
>> > > > ---
>> > > > ./tests/bugs/bug-948686.t (Wstat: 0 Tests: 20 Failed: 2)
>> > > >   Failed tests: 13-14
>> > > > Files=230, Tests=4373, 5601 wallclock secs ( 2.09 usr 1.58 sys +
>> > > > 1012.66 cusr 688.80 csys = 1705.13 CPU)
>> > > > Result: FAIL
>> > >
>> > > Interestingly, this one looks like a simple time-based thing
>> > > too. The failed tests are the ones after the sleep:
>> > >
>> > > ...
>> > > #modify volume config to see change in volume-sync
>> > > TEST $CLI_1 volume set $V0 write-behind off
>> > > #add some files to the volume to see effect of volume-heal cmd
>> > > TEST touch $M0/{1..100};
>> > > TEST $CLI_1 volume stop $V0;
>> > > TEST $glusterd_3;
>> > > sleep 3;
>> > > TEST $CLI_3 volume start $V0;
>> > > TEST $CLI_2 volume stop $V0;
>> > > TEST $CLI_2 volume delete $V0;
>> > >
>> > > Do you already have this one on your radar?
>> >
>> > It wasn't, thanks for bringing it on my radar :-). Sent
>> > http://review.gluster.org/7837 to address this.
>>
>> Kaushal,
>> I made this fix based on the assumption that the script seems to be
>> waiting for all glusterds to be online. I could not check the logs
>> because the glusterds spawned by cluster.rc seem to store their logs
>> outside the default location. Do you think we can make changes to the
>> script so that we can get logs from glusterds spawned by cluster.rc as
>> well?
>>
>> Pranith
>>
>> > Pranith
>> >
>> > > + Justin
>> > >
>> > > --
>> > > Open Source and Standards @ Red Hat
>> > >
>> > > twitter.com/realjustinclift
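The `sleep 3` in the quoted test is the race: glusterd_3 may not have
finished coming up and syncing volume state when $CLI_3 runs. The usual cure
is to wait on an observable condition instead of a fixed delay. A sketch of
that idea (`wait_for_value` is a stand-in for the harness's EXPECT_WITHIN
helpers, and `peer_count` is a hypothetical probe; this is not the actual
content of review 7837):

```shell
#!/usr/bin/env bash
# Stand-in for EXPECT_WITHIN-style waiting: poll a command until it prints
# the expected value, instead of sleeping a fixed number of seconds.
wait_for_value () {
    local timeout="$1" expect="$2"; shift 2
    local end=$(( SECONDS + timeout ))
    while (( SECONDS < end )); do
        [ "$("$@")" = "$expect" ] && return 0
        sleep 1
    done
    return 1
}

# In the test, the fixed delay could become something like:
#   wait_for_value 20 2 peer_count    # hypothetical: both peers connected
# Demo: a throwaway probe that flips from 1 to 2 after ~1 second.
marker=$(mktemp -u)
( sleep 1; touch "$marker" ) &
probe () { [ -e "$marker" ] && echo 2 || echo 1; }
wait_for_value 10 2 probe && echo "condition reached"
rm -f "$marker"
```

The condition-based wait both removes the spurious failure on slow VMs and
finishes sooner than the fixed sleep on fast ones.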