Re: [Gluster-devel] bug-857330/normal.t failure

2014-05-22 Thread Krishnan Parthasarathi

- Original Message -
> On 22/05/2014, at 1:34 PM, Kaushal M wrote:
> > Thanks Justin, I found the problem. The VM can be deleted now.
> 
> Done. :)
> 
> 
> > Turns out, there was more than enough time for the rebalance to complete.
> > But we hit a race, which caused a command to fail.
> > 
> > The particular test that failed is waiting for rebalance to finish. It does
> > this by doing a 'gluster volume rebalance <> status' command and checking
> > the result. The EXPECT_WITHIN function runs this command till we have a
> > match, the command fails or the timeout happens.
> > 
> > For a rebalance status command, glusterd sends a request to the rebalance
> > process (as a brick_op) to get the latest stats. It had done the same in
> > this case as well. But while glusterd was waiting for the reply, the
> > rebalance completed and the process stopped itself. This caused the rpc
> > connection between glusterd and the rebalance proc to close. This caused
> > all pending requests to be unwound as failures, which in turn led to the
> > command failing.
> > 
> > I cannot think of a way to avoid this race from within glusterd. For this
> > particular test, we could avoid using the 'rebalance status' command if we
> > directly checked the rebalance process state using its pid etc. I don't
> > particularly approve of this approach, as I think I used the 'rebalance
> > status' command for a reason. But I currently cannot recall the reason,
> > and if I cannot come up with it soon, I wouldn't mind changing the test to
> > avoid rebalance status.
> 

I think it's the rebalance daemon's life cycle that is problematic. It makes it
inconvenient, if not impossible, for glusterd to gather progress/status 
deterministically.
The rebalance process could instead wait for the rebalance-commit subcommand
before terminating.
No other daemon managed by glusterd has this kind of life cycle.
I don't see any good reason why rebalance should kill itself on completion
of data migration.

Thoughts?

~Krish

> Hmmm, is it the kind of thing where the "rebalance status" command
> should retry if its connection gets closed by a just-completed-
> rebalance (as happened here)?
> 
> Or would that not work as well?
> 
> + Justin
> 
> --
> Open Source and Standards @ Red Hat
> 
> twitter.com/realjustinclift
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] bug-857330/normal.t failure

2014-05-22 Thread Pranith Kumar Karampuri


- Original Message -
> From: "Kaushal M" 
> To: "Justin Clift" , "Gluster Devel" 
> 
> Sent: Thursday, May 22, 2014 6:04:29 PM
> Subject: Re: [Gluster-devel] bug-857330/normal.t failure
> 
> Thanks Justin, I found the problem. The VM can be deleted now.
> 
> Turns out, there was more than enough time for the rebalance to complete. But
> we hit a race, which caused a command to fail.
> 
> The particular test that failed is waiting for rebalance to finish. It does
> this by doing a 'gluster volume rebalance <> status' command and checking
> the result. The EXPECT_WITHIN function runs this command till we have a
> match, the command fails or the timeout happens.
> 
> For a rebalance status command, glusterd sends a request to the rebalance
> process (as a brick_op) to get the latest stats. It had done the same in
> this case as well. But while glusterd was waiting for the reply, the
> rebalance completed and the process stopped itself. This caused the rpc
> connection between glusterd and the rebalance proc to close. This caused all
> pending requests to be unwound as failures, which in turn led to the command
> failing.

Do you think we can print the status of the process as 'not-responding' when 
such a thing happens, instead of failing the command?

Pranith

> 
> I cannot think of a way to avoid this race from within glusterd. For this
> particular test, we could avoid using the 'rebalance status' command if we
> directly checked the rebalance process state using its pid etc. I don't
> particularly approve of this approach, as I think I used the 'rebalance
> status' command for a reason. But I currently cannot recall the reason, and
> if I cannot come up with it soon, I wouldn't mind changing the test to avoid
> rebalance status.
> 
> ~kaushal
> 
> 
> 
> On Thu, May 22, 2014 at 5:22 PM, Justin Clift < jus...@gluster.org > wrote:
> 
> 
> 
> On 22/05/2014, at 12:32 PM, Kaushal M wrote:
> > I haven't yet. But I will.
> > 
> > Justin,
> > Can I take a peek inside the VM?
> 
> Sure.
> 
> IP: 23.253.57.20
> User: root
> Password: foobar123
> 
> The stdout log from the regression test is in /tmp/regression.log.
> 
> The GlusterFS git repo is in /root/glusterfs. Um, you should be
> able to find everything else pretty easily.
> 
> Btw, this is just a temp VM, so feel free to do anything you want
> with it. When you're finished with it let me know so I can delete
> it. :)
> 
> + Justin
> 
> 
> > ~kaushal
> > 
> > 
> > On Thu, May 22, 2014 at 4:53 PM, Pranith Kumar Karampuri <
> > pkara...@redhat.com > wrote:
> > Kaushal,
> > Rebalance status command seems to be failing sometimes. I sent a mail about
> > such spurious failure earlier today. Did you get a chance to look at the
> > logs and confirm that rebalance didn't fail and it is indeed a timeout?
> > 
> > Pranith
> > - Original Message -
> > > From: "Kaushal M" < kshlms...@gmail.com >
> > > To: "Pranith Kumar Karampuri" < pkara...@redhat.com >
> > > Cc: "Justin Clift" < jus...@gluster.org >, "Gluster Devel" <
> > > gluster-devel@gluster.org >
> > > Sent: Thursday, May 22, 2014 4:40:25 PM
> > > Subject: Re: [Gluster-devel] bug-857330/normal.t failure
> > > 
> > > The test is waiting for rebalance to finish. This is a rebalance with
> > > some
> > > actual data so it could have taken a long time to finish. I did set a
> > > pretty high timeout, but it seems like it's not enough for the new VMs.
> > > 
> > > Possible options are,
> > > - Increase this timeout further
> > > - Reduce the amount of data. Currently this is 100 directories with 10
> > > files each of size between 10-500KB
> > > 
> > > ~kaushal
> > > 
> > > 
> > > On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri <
> > > pkara...@redhat.com > wrote:
> > > 
> > > > Kaushal has more context about these CCed. Keep the setup until he
> > > > responds so that he can take a look.
> > > > 
> > > > Pranith
> > > > - Original Message -
> > > > > From: "Justin Clift" < jus...@gluster.org >
> > > > > To: "Pranith Kumar Karampuri" < pkara...@redhat.com >
> > > > > Cc: "Gluster Devel" < gluster-devel@gluster.org >
> > > > > Sent: Thursday, May 22, 2014 3:54:46 PM
> > > > > Subject: bug-857330/normal.t failure
> > > > > 
> > > > > Hi Pranith,
> > > > > 
> > > > > Ran a few VM's with your Gerrit CR 7835 applied, and in "DEBUG"
> > > > > mode (I think).
> > > > > 
> > > > > One of the VM's had a failure in bug-857330/normal.t:
> > > > > 
> > > > > Test Summary Report
> > > > > ---
> > > > > ./tests/basic/rpm.t (Wstat: 0 Tests: 0
> > > > Failed:
> > > > > 0)
> > > > > Parse errors: Bad plan. You planned 8 tests but ran 0.
> > > > > ./tests/bugs/bug-857330/normal.t (Wstat: 0 Tests: 24
> > > > Failed:
> > > > > 1)
> > > > > Failed test: 13
> > > > > Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr 1.73 sys +
> > > > 941.82
> > > > > cusr 645.54 csys = 1591.22 CPU)
> > > > > Result: FAIL
> > > > > 
> > > > > Seems to be this test:
> > > > > 
> > > > > COMMAND="volume rebal

Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr

2014-05-22 Thread Harshavardhana
http://review.gluster.com/#/c/7823/ - the fix here

On Thu, May 22, 2014 at 1:41 PM, Harshavardhana
 wrote:
> Here are the important locations in the XFS tree coming from 2.6.32 branch
>
> STATIC int
> xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl)
> {
> struct xfs_inode *ip = XFS_I(inode);
> unsigned char *ea_name;
> int error;
>
> if (S_ISLNK(inode->i_mode))  <-- I would generally think this is the issue.
> return -EOPNOTSUPP;
>
> STATIC long
> xfs_vn_fallocate(
> struct inode *inode,
> int mode,
> loff_t  offset,
> loff_t  len)
> {
> long error;
> loff_t  new_size = 0;
> xfs_flock64_t   bf;
> xfs_inode_t *ip = XFS_I(inode);
> int cmd = XFS_IOC_RESVSP;
> int attr_flags = XFS_ATTR_NOLOCK;
>
> if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> return -EOPNOTSUPP;
>
> STATIC int
> xfs_ioc_setxflags(
> xfs_inode_t *ip,
> struct file *filp,
> void __user *arg)
> {
> struct fsxattr  fa;
> unsigned int flags;
> unsigned int mask;
> int error;
>
> if (copy_from_user(&flags, arg, sizeof(flags)))
> return -EFAULT;
>
> if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \
>   FS_NOATIME_FL | FS_NODUMP_FL | \
>   FS_SYNC_FL))
> return -EOPNOTSUPP;
>
> Perhaps some sort of system-level ACLs are being propagated by us
> over symlinks? Perhaps this is related to the same issue of
> following symlinks?
>
> On Sun, May 18, 2014 at 10:48 AM, Pranith Kumar Karampuri
>  wrote:
>> Sent the following patch to remove the special treatment of ENOTSUP here: 
>> http://review.gluster.org/7788
>>
>> Pranith
>> - Original Message -
>>> From: "Kaleb KEITHLEY" 
>>> To: gluster-devel@gluster.org
>>> Sent: Tuesday, May 13, 2014 8:01:53 PM
>>> Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for 
>>>   setxattr
>>>
>>> On 05/13/2014 08:00 AM, Nagaprasad Sathyanarayana wrote:
>>> > On 05/07/2014 03:44 PM, Pranith Kumar Karampuri wrote:
>>> >>
>>> >> - Original Message -
>>> >>> From: "Raghavendra Gowdappa" 
>>> >>> To: "Pranith Kumar Karampuri" 
>>> >>> Cc: "Vijay Bellur" , gluster-devel@gluster.org,
>>> >>> "Anand Avati" 
>>> >>> Sent: Wednesday, May 7, 2014 3:42:16 PM
>>> >>> Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP
>>> >>> for setxattr
>>> >>>
>>> >>> I think with "repetitive log message suppression" patch being merged, we
>>> >>> don't really need gf_log_occasionally (except if they are logged in
>>> >>> DEBUG or
>>> >>> TRACE levels).
>>> >> That definitely helps. But still, setxattr calls are not supposed to
>>> >> fail with ENOTSUP on FS where we support gluster. If there are special
>>> >> keys which fail with ENOTSUPP, we can conditionally log setxattr
>>> >> failures only when the key is something new?
>>>
>>> I know this is about EOPNOTSUPP (a.k.a. ENOTSUPP) returned by
>>> setxattr(2) for legitimate attrs.
>>>
>>> But I can't help but wondering if this isn't related to other bugs we've
>>> had with, e.g., lgetxattr(2) called on invalid xattrs?
>>>
>>> E.g. see https://bugzilla.redhat.com/show_bug.cgi?id=765202. We have a
>>> hack where xlators communicate with each other by getting (and setting?)
>>> invalid xattrs; the posix xlator has logic to filter out  invalid
>>> xattrs, but due to bugs this hasn't always worked perfectly.
>>>
>>> It would be interesting to know which xattrs are getting errors and on
>>> which fs types.
>>>
>>> FWIW, in a quick perusal of a fairly recent (3.14.3) kernel, in xfs
>>> there are only six places where EOPNOTSUPP is returned, none of them
>>> related to xattrs. In ext[34] EOPNOTSUPP can be returned if the
>>> user_xattr option is not enabled (enabled by default in ext4.) And in
>>> the higher level vfs xattr code there are many places where EOPNOTSUPP
>>> _might_ be returned, primarily only if subordinate function calls aren't
>>> invoked which would clear the default or return a different error.
>>>
>>> --
>>>
>>> Kaleb
>>>
>>>
>>>
>>>
>>>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>
>
>
> --
> Religious confuse piety with mere ritual, the virtuous confuse
> regulation with outcomes



-- 
Religious confuse piety with mere ritual, the virtuous confuse
regulation with outcomes
___
Gluster-d

Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr

2014-05-22 Thread Harshavardhana
Here are the important locations in the XFS tree coming from 2.6.32 branch

STATIC int
xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl)
{
struct xfs_inode *ip = XFS_I(inode);
unsigned char *ea_name;
int error;

if (S_ISLNK(inode->i_mode))  <-- I would generally think this is the issue.
return -EOPNOTSUPP;

STATIC long
xfs_vn_fallocate(
struct inode *inode,
int mode,
loff_t  offset,
loff_t  len)
{
long error;
loff_t  new_size = 0;
xfs_flock64_t   bf;
xfs_inode_t *ip = XFS_I(inode);
int cmd = XFS_IOC_RESVSP;
int attr_flags = XFS_ATTR_NOLOCK;

if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
return -EOPNOTSUPP;

STATIC int
xfs_ioc_setxflags(
xfs_inode_t *ip,
struct file *filp,
void __user *arg)
{
struct fsxattr  fa;
unsigned int flags;
unsigned int mask;
int error;

if (copy_from_user(&flags, arg, sizeof(flags)))
return -EFAULT;

if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \
  FS_NOATIME_FL | FS_NODUMP_FL | \
  FS_SYNC_FL))
return -EOPNOTSUPP;

Perhaps some sort of system-level ACLs are being propagated by us
over symlinks? Perhaps this is related to the same issue of
following symlinks?
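
(If it helps narrow this down, one quick way to see which xattr keys a given
backend filesystem rejects, and whether symlinks behave differently, is to
probe it directly with setfattr. A rough sketch; the key names are only
examples and not the keys gluster actually uses, trusted.* needs root, and
failures can also be EPERM or EINVAL rather than "Operation not supported":)

  #!/bin/bash
  # Rough probe for a brick backend: report the error (if any) returned when
  # setting a few example xattr keys on a regular file and on a symlink
  # itself (-h = do not follow the link).
  backend=${1:-.}
  f="$backend/xattr-probe-$$"
  l="$backend/xattr-probe-link-$$"
  touch "$f" && ln -s "$f" "$l"

  for key in user.probe trusted.probe security.probe; do
      for mode in file symlink; do
          if [ "$mode" = file ]; then
              err=$(setfattr -n "$key" -v 1 "$f" 2>&1 >/dev/null)
          else
              err=$(setfattr -h -n "$key" -v 1 "$l" 2>&1 >/dev/null)
          fi
          printf '%-8s %-16s -> %s\n' "$mode" "$key" "${err:-ok}"
      done
  done

  rm -f "$f" "$l"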

On Sun, May 18, 2014 at 10:48 AM, Pranith Kumar Karampuri
 wrote:
> Sent the following patch to remove the special treatment of ENOTSUP here: 
> http://review.gluster.org/7788
>
> Pranith
> - Original Message -
>> From: "Kaleb KEITHLEY" 
>> To: gluster-devel@gluster.org
>> Sent: Tuesday, May 13, 2014 8:01:53 PM
>> Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for  
>>  setxattr
>>
>> On 05/13/2014 08:00 AM, Nagaprasad Sathyanarayana wrote:
>> > On 05/07/2014 03:44 PM, Pranith Kumar Karampuri wrote:
>> >>
>> >> - Original Message -
>> >>> From: "Raghavendra Gowdappa" 
>> >>> To: "Pranith Kumar Karampuri" 
>> >>> Cc: "Vijay Bellur" , gluster-devel@gluster.org,
>> >>> "Anand Avati" 
>> >>> Sent: Wednesday, May 7, 2014 3:42:16 PM
>> >>> Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP
>> >>> for setxattr
>> >>>
>> >>> I think with "repetitive log message suppression" patch being merged, we
>> >>> don't really need gf_log_occasionally (except if they are logged in
>> >>> DEBUG or
>> >>> TRACE levels).
>> >> That definitely helps. But still, setxattr calls are not supposed to
>> >> fail with ENOTSUP on FS where we support gluster. If there are special
>> >> keys which fail with ENOTSUPP, we can conditionally log setxattr
>> >> failures only when the key is something new?
>>
>> I know this is about EOPNOTSUPP (a.k.a. ENOTSUPP) returned by
>> setxattr(2) for legitimate attrs.
>>
>> But I can't help but wondering if this isn't related to other bugs we've
>> had with, e.g., lgetxattr(2) called on invalid xattrs?
>>
>> E.g. see https://bugzilla.redhat.com/show_bug.cgi?id=765202. We have a
>> hack where xlators communicate with each other by getting (and setting?)
>> invalid xattrs; the posix xlator has logic to filter out  invalid
>> xattrs, but due to bugs this hasn't always worked perfectly.
>>
>> It would be interesting to know which xattrs are getting errors and on
>> which fs types.
>>
>> FWIW, in a quick perusal of a fairly recent (3.14.3) kernel, in xfs
>> there are only six places where EOPNOTSUPP is returned, none of them
>> related to xattrs. In ext[34] EOPNOTSUPP can be returned if the
>> user_xattr option is not enabled (enabled by default in ext4.) And in
>> the higher level vfs xattr code there are many places where EOPNOTSUPP
>> _might_ be returned, primarily only if subordinate function calls aren't
>> invoked which would clear the default or return a different error.
>>
>> --
>>
>> Kaleb
>>
>>
>>
>>
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel



-- 
Religious confuse piety with mere ritual, the virtuous confuse
regulation with outcomes
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster driver for Archipelago - Development process

2014-05-22 Thread Vijay Bellur

On 05/22/2014 02:10 AM, Alex Pyrgiotis wrote:

On 02/17/2014 06:22 PM, Vijay Bellur wrote:

On 02/17/2014 05:11 PM, Alex Pyrgiotis wrote:

On 02/10/2014 07:06 PM, Vijay Bellur wrote:

On 02/05/2014 04:10 PM, Alex Pyrgiotis wrote:

Hi all,

Just wondering, do we have any news on that?


Hi Alex,

I have started some work on this. The progress has been rather slow
owing to the 3.5 release cycle, amongst other things. I intend to propose
this as a feature for 3.6 and will keep you posted once we have something
more to get you going.



Hi Vijay,

That sounds good. I suppose that if it gets included in 3.6, we will see
it in this page [1], right?



Hi Alex,

Yes, that is correct.

Thanks,
Vijay



Hi Vijay,

On the planning page for 3.6 [1], I see that Archipelago is included
(great!) and that the feature freeze was due on the 21st of May. So, do we
have any news on which features will get included on 3.6, as well as
more info about the Archipelago integration?



Yes, the gfapi and related changes needed by Archipelago are planned for 
inclusion in 3.6.


The feature freeze was moved by a month after a discussion in 
yesterday's GlusterFS community meeting. I will ping you back once we 
have something tangible to get started with integration testing.


Cheers,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Changes needing review before a glusterfs-3.5.1 Beta is released

2014-05-22 Thread Niels de Vos
On Wed, May 21, 2014 at 06:40:57PM +0200, Niels de Vos wrote:
> A lot of work has been done on getting blockers resolved for the next 
> 3.5 release. We're not there yet, but we're definitely getting close to
> releasing a 1st beta.
> 
> Humble will follow up with an email related to the documentation that is 
> still missing for features introduced with 3.5. We will not hold back on
> the Beta if the documentation is STILL incomplete, but it is seen as a
> major blocker for the final 3.5.1 release.
> 
> The following list is based on the bugs that have been requested as 
> blockers¹:
> 
> * 1089054 gf-error-codes.h is missing from source tarball
>   Depends on 1038391 for getting the changes reviewed and included in 
>   the master branch first:
>   - http://review.gluster.org/7714
>   - http://review.gluster.org/7786

These have been reviewed and merged in the master branch. Backports have 
been posted for review:
- http://review.gluster.org/7850
- http://review.gluster.org/7851

> * 1096425 i/o error when one user tries to access RHS volume over NFS
>   with 100+
>   Patches for 3.5 posted for review:
>   - http://review.gluster.org/7829
>   - http://review.gluster.org/7830

Review of the backport http://review.gluster.org/7830 is still pending.

> * 1099878 Need support for handle based Ops to fetch/modify extended
>   attributes of a file
>   Patch for 3.5 posted for review:
>   - http://review.gluster.org/7825

Got reviewed and merged!

New addition, confirmed yesterday:

* 1081016 glusterd needs xfsprogs and e2fsprogs packages
  (don't leave zombies if required programs aren't installed)
  Needs review+merging in master: http://review.gluster.org/7361
  After approval for master, a backport for release-3.5 can be sent.

Thanks,
Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] bug-857330/normal.t failure

2014-05-22 Thread Justin Clift
On 22/05/2014, at 1:34 PM, Kaushal M wrote:
> Thanks Justin, I found the problem. The VM can be deleted now.

Done. :)


> Turns out, there was more than enough time for the rebalance to complete. But 
> we hit a race, which caused a command to fail.
> 
> The particular test that failed is waiting for rebalance to finish. It does 
> this by doing a 'gluster volume rebalance <> status' command and checking the 
> result. The EXPECT_WITHIN function runs this command till we have a match, 
> the command fails or the timeout happens.
> 
> For a rebalance status command, glusterd sends a request to the rebalance 
> process (as a brick_op) to get the latest stats. It had done the same in this 
> case as well. But while glusterd was waiting for the reply, the rebalance 
> completed and the process stopped itself. This caused the rpc connection 
> between glusterd and the rebalance proc to close. This caused all pending 
> requests to be unwound as failures, which in turn led to the command failing.
> 
> I cannot think of a way to avoid this race from within glusterd. For this 
> particular test, we could avoid using the 'rebalance status' command if we 
> directly checked the rebalance process state using its pid etc. I don't 
> particularly approve of this approach, as I think I used the 'rebalance 
> status' command for a reason. But I currently cannot recall the reason, and 
> if I cannot come up with it soon, I wouldn't mind changing the test to avoid 
> rebalance status.

Hmmm, is it the kind of thing where the "rebalance status" command
should retry if its connection gets closed by a just-completed-
rebalance (as happened here)?

Or would that not work as well?
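
(Purely as a sketch of that retry idea, not the actual test framework code;
the retry count, the sleep and the wrapper name are arbitrary:)

  # Illustrative only: retry 'rebalance status' a few times if the command
  # itself fails, e.g. because the rebalance process exited between the
  # request and the reply.
  rebal_status_with_retry () {
      local vol=$1 tries=0
      while [ $tries -lt 3 ]; do
          if out=$(gluster volume rebalance "$vol" status 2>&1); then
              echo "$out"
              return 0
          fi
          tries=$((tries + 1))
          sleep 1
      done
      return 1
  }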

+ Justin

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] bug-857330/normal.t failure

2014-05-22 Thread Kaushal M
Thanks Justin, I found the problem. The VM can be deleted now.

Turns out, there was more than enough time for the rebalance to complete.
But we hit a race, which caused a command to fail.

The particular test that failed is waiting for rebalance to finish. It does
this by doing a 'gluster volume rebalance <> status' command and checking
the result. The EXPECT_WITHIN function runs this command till we have a
match, the command fails or the timeout happens.
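
(For anyone not familiar with the test framework: EXPECT_WITHIN is essentially
a poll-until-match-or-timeout loop. A simplified sketch of the idea, not the
framework's actual implementation:)

  # Simplified idea of EXPECT_WITHIN (not the real implementation): run the
  # given check command repeatedly until its output matches the expected
  # pattern or the timeout expires.
  expect_within_sketch () {
      local timeout=$1 pattern=$2 waited=0
      shift 2
      while [ $waited -lt $timeout ]; do
          "$@" 2>/dev/null | grep -q "$pattern" && return 0
          sleep 1
          waited=$((waited + 1))
      done
      return 1
  }

  # roughly how the failing test uses it:
  #   expect_within_sketch 300 "completed" gluster volume rebalance $V0 status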

For a rebalance status command, glusterd sends a request to the rebalance
process (as a brick_op) to get the latest stats. It had done the same in
this case as well. But while glusterd was waiting for the reply, the
rebalance completed and the process stopped itself. This caused the rpc
connection between glusterd and the rebalance proc to close. This caused all
pending requests to be unwound as failures, which in turn led to the
command failing.

I cannot think of a way to avoid this race from within glusterd. For this
particular test, we could avoid using the 'rebalance status' command if we
directly checked the rebalance process state using its pid etc. I don't
particularly approve of this approach, as I think I used the 'rebalance
status' command for a reason. But I currently cannot recall the reason, and
if I cannot come up with it soon, I wouldn't mind changing the test to avoid
rebalance status.
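
(Roughly, such a pid-based check could look like the sketch below. The
pid-file path is an assumption, it would have to be whatever glusterd
actually writes for the rebalance daemon of $V0, so treat this as an outline
of the approach rather than working test code:)

  # Sketch only: poll the rebalance daemon directly instead of asking
  # glusterd. REBAL_PIDFILE is an assumed location, not necessarily where
  # glusterd really keeps the rebalance pid for $V0.
  REBAL_PIDFILE="/var/lib/glusterd/vols/$V0/rebalance.pid"

  rebalance_daemon_state () {
      local pid
      pid=$(cat "$REBAL_PIDFILE" 2>/dev/null) || { echo "gone"; return 0; }
      if kill -0 "$pid" 2>/dev/null; then
          echo "running"
      else
          echo "gone"
      fi
  }

  # the test could then wait for the daemon to exit, e.g.:
  #   EXPECT_WITHIN 300 "gone" rebalance_daemon_state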

~kaushal



On Thu, May 22, 2014 at 5:22 PM, Justin Clift  wrote:

> On 22/05/2014, at 12:32 PM, Kaushal M wrote:
> > I haven't yet. But I will.
> >
> > Justin,
> > Can I take a peek inside the VM?
>
> Sure.
>
>   IP: 23.253.57.20
>   User: root
>   Password: foobar123
>
> The stdout log from the regression test is in /tmp/regression.log.
>
> The GlusterFS git repo is in /root/glusterfs.  Um, you should be
> able to find everything else pretty easily.
>
> Btw, this is just a temp VM, so feel free to do anything you want
> with it.  When you're finished with it let me know so I can delete
> it. :)
>
> + Justin
>
>
> > ~kaushal
> >
> >
> > On Thu, May 22, 2014 at 4:53 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
> > Kaushal,
> >Rebalance status command seems to be failing sometimes. I sent a mail
> about such spurious failure earlier today. Did you get a chance to look at
> the logs and confirm that rebalance didn't fail and it is indeed a timeout?
> >
> > Pranith
> > - Original Message -
> > > From: "Kaushal M" 
> > > To: "Pranith Kumar Karampuri" 
> > > Cc: "Justin Clift" , "Gluster Devel" <
> gluster-devel@gluster.org>
> > > Sent: Thursday, May 22, 2014 4:40:25 PM
> > > Subject: Re: [Gluster-devel] bug-857330/normal.t failure
> > >
> > > The test is waiting for rebalance to finish. This is a rebalance with
> some
> > > actual data so it could have taken a long time to finish. I did set a
> > > pretty high timeout, but it seems like it's not enough for the new VMs.
> > >
> > > Possible options are,
> > > - Increase this timeout further
> > > - Reduce the amount of data. Currently this is 100 directories with 10
> > > files each of size between 10-500KB
> > >
> > > ~kaushal
> > >
> > >
> > > On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri <
> > > pkara...@redhat.com> wrote:
> > >
> > > > Kaushal has more context about these CCed. Keep the setup until he
> > > > responds so that he can take a look.
> > > >
> > > > Pranith
> > > > - Original Message -
> > > > > From: "Justin Clift" 
> > > > > To: "Pranith Kumar Karampuri" 
> > > > > Cc: "Gluster Devel" 
> > > > > Sent: Thursday, May 22, 2014 3:54:46 PM
> > > > > Subject: bug-857330/normal.t failure
> > > > >
> > > > > Hi Pranith,
> > > > >
> > > > > Ran a few VM's with your Gerrit CR 7835 applied, and in "DEBUG"
> > > > > mode (I think).
> > > > >
> > > > > One of the VM's had a failure in bug-857330/normal.t:
> > > > >
> > > > >   Test Summary Report
> > > > >   ---
> > > > >   ./tests/basic/rpm.t (Wstat: 0 Tests:
> 0
> > > > Failed:
> > > > >   0)
> > > > > Parse errors: Bad plan.  You planned 8 tests but ran 0.
> > > > >   ./tests/bugs/bug-857330/normal.t(Wstat: 0 Tests:
> 24
> > > > Failed:
> > > > >   1)
> > > > > Failed test:  13
> > > > >   Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr  1.73 sys +
> > > > 941.82
> > > > >   cusr 645.54 csys = 1591.22 CPU)
> > > > >   Result: FAIL
> > > > >
> > > > > Seems to be this test:
> > > > >
> > > > >   COMMAND="volume rebalance $V0 status"
> > > > >   PATTERN="completed"
> > > > >   EXPECT_WITHIN 300 $PATTERN get-task-status
> > > > >
> > > > > Is this one on your radar already?
> > > > >
> > > > > Btw, this VM is still online.  Can give you access to retrieve logs
> > > > > if useful.
> > > > >
> > > > > + Justin
> > > > >
> > > > > --
> > > > > Open Source and Standards @ Red Hat
> > > > >
> > > > > twitter.com/realjustinclift
> > > > >
> > > > >
> > > > ___
> > > > Gluster-devel mailing list
> > 

Re: [Gluster-devel] [Gluster-users] Guidelines for Maintainers

2014-05-22 Thread Vijay Bellur

[Adding the right alias for gluster-devel this time around]

On 05/22/2014 05:29 PM, Vijay Bellur wrote:

Hi All,

Given the addition of new sub-maintainers & release maintainers to the
community [1], I have felt the need to publish a set of guidelines so that
all categories of maintainers have an unambiguous operational state.
A first cut of one such document can be found at [2]. I would love to
hear your thoughts and feedback to make the proposal very clear to
everybody. We can convert this draft to a real set of guidelines once
there is consensus.

Cheers,
Vijay

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6249

[2]
http://www.gluster.org/community/documentation/index.php/Guidelines_For_Maintainers

___
Gluster-users mailing list
gluster-us...@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] bug-857330/normal.t failure

2014-05-22 Thread Kaushal M
I haven't yet. But I will.

Justin,
Can I take a peek inside the VM?

~kaushal


On Thu, May 22, 2014 at 4:53 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> Kaushal,
>Rebalance status command seems to be failing sometimes. I sent a mail
> about such spurious failure earlier today. Did you get a chance to look at
> the logs and confirm that rebalance didn't fail and it is indeed a timeout?
>
> Pranith
> - Original Message -
> > From: "Kaushal M" 
> > To: "Pranith Kumar Karampuri" 
> > Cc: "Justin Clift" , "Gluster Devel" <
> gluster-devel@gluster.org>
> > Sent: Thursday, May 22, 2014 4:40:25 PM
> > Subject: Re: [Gluster-devel] bug-857330/normal.t failure
> >
> > The test is waiting for rebalance to finish. This is a rebalance with
> some
> > actual data so it could have taken a long time to finish. I did set a
> > pretty high timeout, but it seems like it's not enough for the new VMs.
> >
> > Possible options are,
> > - Increase this timeout further
> > - Reduce the amount of data. Currently this is 100 directories with 10
> > files each of size between 10-500KB
> >
> > ~kaushal
> >
> >
> > On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri <
> > pkara...@redhat.com> wrote:
> >
> > > Kaushal has more context about these CCed. Keep the setup until he
> > > responds so that he can take a look.
> > >
> > > Pranith
> > > - Original Message -
> > > > From: "Justin Clift" 
> > > > To: "Pranith Kumar Karampuri" 
> > > > Cc: "Gluster Devel" 
> > > > Sent: Thursday, May 22, 2014 3:54:46 PM
> > > > Subject: bug-857330/normal.t failure
> > > >
> > > > Hi Pranith,
> > > >
> > > > Ran a few VM's with your Gerrit CR 7835 applied, and in "DEBUG"
> > > > mode (I think).
> > > >
> > > > One of the VM's had a failure in bug-857330/normal.t:
> > > >
> > > >   Test Summary Report
> > > >   ---
> > > >   ./tests/basic/rpm.t (Wstat: 0 Tests: 0
> > > Failed:
> > > >   0)
> > > > Parse errors: Bad plan.  You planned 8 tests but ran 0.
> > > >   ./tests/bugs/bug-857330/normal.t(Wstat: 0 Tests: 24
> > > Failed:
> > > >   1)
> > > > Failed test:  13
> > > >   Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr  1.73 sys +
> > > 941.82
> > > >   cusr 645.54 csys = 1591.22 CPU)
> > > >   Result: FAIL
> > > >
> > > > Seems to be this test:
> > > >
> > > >   COMMAND="volume rebalance $V0 status"
> > > >   PATTERN="completed"
> > > >   EXPECT_WITHIN 300 $PATTERN get-task-status
> > > >
> > > > Is this one on your radar already?
> > > >
> > > > Btw, this VM is still online.  Can give you access to retrieve logs
> > > > if useful.
> > > >
> > > > + Justin
> > > >
> > > > --
> > > > Open Source and Standards @ Red Hat
> > > >
> > > > twitter.com/realjustinclift
> > > >
> > > >
> > > ___
> > > Gluster-devel mailing list
> > > Gluster-devel@gluster.org
> > > http://supercolony.gluster.org/mailman/listinfo/gluster-devel
> > >
> >
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] bug-857330/normal.t failure

2014-05-22 Thread Pranith Kumar Karampuri
Kaushal,
   Rebalance status command seems to be failing sometimes. I sent a mail about 
such spurious failure earlier today. Did you get a chance to look at the logs 
and confirm that rebalance didn't fail and it is indeed a timeout?

Pranith
- Original Message -
> From: "Kaushal M" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Justin Clift" , "Gluster Devel" 
> 
> Sent: Thursday, May 22, 2014 4:40:25 PM
> Subject: Re: [Gluster-devel] bug-857330/normal.t failure
> 
> The test is waiting for rebalance to finish. This is a rebalance with some
> actual data so it could have taken a long time to finish. I did set a
> pretty high timeout, but it seems like it's not enough for the new VMs.
> 
> Possible options are,
> - Increase this timeout further
> - Reduce the amount of data. Currently this is 100 directories with 10
> files each of size between 10-500KB
> 
> ~kaushal
> 
> 
> On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
> 
> > Kaushal has more context about these CCed. Keep the setup until he
> > responds so that he can take a look.
> >
> > Pranith
> > - Original Message -
> > > From: "Justin Clift" 
> > > To: "Pranith Kumar Karampuri" 
> > > Cc: "Gluster Devel" 
> > > Sent: Thursday, May 22, 2014 3:54:46 PM
> > > Subject: bug-857330/normal.t failure
> > >
> > > Hi Pranith,
> > >
> > > Ran a few VM's with your Gerrit CR 7835 applied, and in "DEBUG"
> > > mode (I think).
> > >
> > > One of the VM's had a failure in bug-857330/normal.t:
> > >
> > >   Test Summary Report
> > >   ---
> > >   ./tests/basic/rpm.t (Wstat: 0 Tests: 0
> > Failed:
> > >   0)
> > > Parse errors: Bad plan.  You planned 8 tests but ran 0.
> > >   ./tests/bugs/bug-857330/normal.t(Wstat: 0 Tests: 24
> > Failed:
> > >   1)
> > > Failed test:  13
> > >   Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr  1.73 sys +
> > 941.82
> > >   cusr 645.54 csys = 1591.22 CPU)
> > >   Result: FAIL
> > >
> > > Seems to be this test:
> > >
> > >   COMMAND="volume rebalance $V0 status"
> > >   PATTERN="completed"
> > >   EXPECT_WITHIN 300 $PATTERN get-task-status
> > >
> > > Is this one on your radar already?
> > >
> > > Btw, this VM is still online.  Can give you access to retrieve logs
> > > if useful.
> > >
> > > + Justin
> > >
> > > --
> > > Open Source and Standards @ Red Hat
> > >
> > > twitter.com/realjustinclift
> > >
> > >
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-devel
> >
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] bug-857330/normal.t failure

2014-05-22 Thread Kaushal M
The test is waiting for rebalance to finish. This is a rebalance with some
actual data so it could have taken a long time to finish. I did set a
pretty high timeout, but it seems like it's not enough for the new VMs.

Possible options are,
- Increase this timeout further
- Reduce the amount of data. Currently this is 100 directories with 10
files each, sized between 10KB and 500KB (see the sketch below).
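
(For reference, a dataset of that shape could be generated with something
like the following; this is illustrative only, not necessarily what the test
actually does, and $M0 is assumed to be the client mount point:)

  # Illustrative data generator: 100 directories, 10 files each, with sizes
  # between 10KB and 500KB. $M0 (the client mount point) is an assumption.
  for d in $(seq 1 100); do
      mkdir -p "$M0/dir$d"
      for f in $(seq 1 10); do
          kb=$(( (RANDOM % 491) + 10 ))   # 10..500 KB
          dd if=/dev/urandom of="$M0/dir$d/file$f" bs=1024 count=$kb 2>/dev/null
      done
  done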

~kaushal


On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> Kaushal has more context about these CCed. Keep the setup until he
> responds so that he can take a look.
>
> Pranith
> - Original Message -
> > From: "Justin Clift" 
> > To: "Pranith Kumar Karampuri" 
> > Cc: "Gluster Devel" 
> > Sent: Thursday, May 22, 2014 3:54:46 PM
> > Subject: bug-857330/normal.t failure
> >
> > Hi Pranith,
> >
> > Ran a few VM's with your Gerrit CR 7835 applied, and in "DEBUG"
> > mode (I think).
> >
> > One of the VM's had a failure in bug-857330/normal.t:
> >
> >   Test Summary Report
> >   ---
> >   ./tests/basic/rpm.t (Wstat: 0 Tests: 0
> Failed:
> >   0)
> > Parse errors: Bad plan.  You planned 8 tests but ran 0.
> >   ./tests/bugs/bug-857330/normal.t(Wstat: 0 Tests: 24
> Failed:
> >   1)
> > Failed test:  13
> >   Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr  1.73 sys +
> 941.82
> >   cusr 645.54 csys = 1591.22 CPU)
> >   Result: FAIL
> >
> > Seems to be this test:
> >
> >   COMMAND="volume rebalance $V0 status"
> >   PATTERN="completed"
> >   EXPECT_WITHIN 300 $PATTERN get-task-status
> >
> > Is this one on your radar already?
> >
> > Btw, this VM is still online.  Can give you access to retrieve logs
> > if useful.
> >
> > + Justin
> >
> > --
> > Open Source and Standards @ Red Hat
> >
> > twitter.com/realjustinclift
> >
> >
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] bug-857330/normal.t failure

2014-05-22 Thread Pranith Kumar Karampuri
Kaushal (CCed) has more context about these. Keep the setup until he responds so
that he can take a look.

Pranith
- Original Message -
> From: "Justin Clift" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Gluster Devel" 
> Sent: Thursday, May 22, 2014 3:54:46 PM
> Subject: bug-857330/normal.t failure
> 
> Hi Pranith,
> 
> Ran a few VM's with your Gerrit CR 7835 applied, and in "DEBUG"
> mode (I think).
> 
> One of the VM's had a failure in bug-857330/normal.t:
> 
>   Test Summary Report
>   ---
>   ./tests/basic/rpm.t (Wstat: 0 Tests: 0 Failed:
>   0)
> Parse errors: Bad plan.  You planned 8 tests but ran 0.
>   ./tests/bugs/bug-857330/normal.t(Wstat: 0 Tests: 24 Failed:
>   1)
> Failed test:  13
>   Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr  1.73 sys + 941.82
>   cusr 645.54 csys = 1591.22 CPU)
>   Result: FAIL
> 
> Seems to be this test:
> 
>   COMMAND="volume rebalance $V0 status"
>   PATTERN="completed"
>   EXPECT_WITHIN 300 $PATTERN get-task-status
> 
> Is this one on your radar already?
> 
> Btw, this VM is still online.  Can give you access to retrieve logs
> if useful.
> 
> + Justin
> 
> --
> Open Source and Standards @ Red Hat
> 
> twitter.com/realjustinclift
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] bug-857330/normal.t failure

2014-05-22 Thread Justin Clift
Hi Pranith,

Ran a few VM's with your Gerrit CR 7835 applied, and in "DEBUG"
mode (I think).

One of the VM's had a failure in bug-857330/normal.t:

  Test Summary Report
  ---
  ./tests/basic/rpm.t (Wstat: 0 Tests: 0 Failed: 0)
Parse errors: Bad plan.  You planned 8 tests but ran 0.
  ./tests/bugs/bug-857330/normal.t(Wstat: 0 Tests: 24 Failed: 1)
Failed test:  13
  Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr  1.73 sys + 941.82 cusr 
645.54 csys = 1591.22 CPU)
  Result: FAIL

Seems to be this test:

  COMMAND="volume rebalance $V0 status"
  PATTERN="completed"
  EXPECT_WITHIN 300 $PATTERN get-task-status

Is this one on your radar already?

Btw, this VM is still online.  Can give you access to retrieve logs
if useful.

+ Justin

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failures because of nfs and snapshots

2014-05-22 Thread Vijaikumar M

I have posted a patch that fixes this issue:
http://review.gluster.org/#/c/7842/

Thanks,
Vijay


On Thursday 22 May 2014 11:35 AM, Vijay Bellur wrote:

On 05/21/2014 08:50 PM, Vijaikumar M wrote:

KP, Atin and myself did some debugging and found that there was a
deadlock in glusterd.

When creating a volume snapshot, the back-end operations 'take an
lvm_snapshot and start the brick' are executed in parallel for each
brick using the synctask framework.

brick_start was releasing the big_lock around brick_connect and taking
the lock again afterwards.
This caused a deadlock in some race conditions where the main thread was
waiting for one of the synctask threads to finish while the
synctask thread was waiting for the big_lock.


We are working on fixing this issue.



If this fix is going to take more time, can we please log a bug to 
track this problem and remove the test cases that need to be addressed 
from the test unit? This way other valid patches will not be blocked 
by the failure of the snapshot test unit.


We can introduce these tests again as part of the fix for the problem.

-Vijay



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression testing results for master branch

2014-05-22 Thread Justin Clift
/glusterd-backend%N.log maybe?

On 22/05/2014, at 8:03 AM, Kaushal M wrote:
> The glusterds spawned using cluster.rc store their logs at 
> /d/backends//glusterd.log . But the cleanup() function cleans 
> /d/backends/, so those logs are lost before we can archive.
> 
> cluster.rc should be fixed to use a better location for the logs.
> 
> ~kaushal

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression testing results for master branch

2014-05-22 Thread Kaushal M
The glusterds spawned using cluster.rc store their logs at
/d/backends//glusterd.log . But the cleanup() function cleans
/d/backends/, so those logs are lost before we can archive.

cluster.rc should be fixed to use a better location for the logs.
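
(One possible shape for that fix, only a sketch: point each spawned glusterd
at a log file outside /d/backends so cleanup() cannot wipe it before the logs
are archived. LOGDIR and the helper name are placeholders, and whatever
options cluster.rc already passes to glusterd would stay as they are:)

  # Sketch only: give each glusterd spawned by cluster.rc an explicit log
  # file outside /d/backends. LOGDIR is a placeholder location.
  LOGDIR=${LOGDIR:-/var/log/glusterfs/cluster-test}
  mkdir -p "$LOGDIR"

  launch_glusterd_sketch () {
      local n=$1
      shift
      # glusterd accepts the standard glusterfs --log-file option.
      glusterd --log-file="$LOGDIR/glusterd-$n.log" "$@"
  }

  # e.g. launch_glusterd_sketch 1 <whatever options cluster.rc passes today>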

~kaushal


On Thu, May 22, 2014 at 11:45 AM, Kaushal M  wrote:

> It should be possible. I'll check and do the change.
>
> ~kaushal
>
>
> On Thu, May 22, 2014 at 8:14 AM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>>
>>
>> - Original Message -
>> > From: "Pranith Kumar Karampuri" 
>> > To: "Justin Clift" 
>> > Cc: "Gluster Devel" 
>> > Sent: Thursday, May 22, 2014 6:23:16 AM
>> > Subject: Re: [Gluster-devel] Regression testing results for master
>> branch
>> >
>> >
>> >
>> > - Original Message -
>> > > From: "Justin Clift" 
>> > > To: "Pranith Kumar Karampuri" 
>> > > Cc: "Gluster Devel" 
>> > > Sent: Wednesday, May 21, 2014 11:01:36 PM
>> > > Subject: Re: [Gluster-devel] Regression testing results for master
>> branch
>> > >
>> > > On 21/05/2014, at 6:17 PM, Justin Clift wrote:
>> > > > Hi all,
>> > > >
>> > > > Kicked off 21 VM's in Rackspace earlier today, running the
>> regression
>> > > > tests
>> > > > against master branch.
>> > > >
>> > > > Only 3 VM's failed out of the 21 (86% PASS, 14% FAIL), with all
>> three
>> > > > being
>> > > > for the same test:
>> > > >
>> > > > Test Summary Report
>> > > > ---
>> > > > ./tests/bugs/bug-948686.t   (Wstat: 0 Tests: 20
>> > > > Failed:
>> > > > 2)
>> > > >  Failed tests:  13-14
>> > > > Files=230, Tests=4373, 5601 wallclock secs ( 2.09 usr  1.58 sys +
>> 1012.66
>> > > > cusr 688.80 csys = 1705.13 CPU)
>> > > > Result: FAIL
>> > >
>> > >
>> > > Interestingly, this one looks like a simple time based thing
>> > > too.  The failed tests are the ones after the sleep:
>> > >
>> > >   ...
>> > >   #modify volume config to see change in volume-sync
>> > >   TEST $CLI_1 volume set $V0 write-behind off
>> > >   #add some files to the volume to see effect of volume-heal cmd
>> > >   TEST touch $M0/{1..100};
>> > >   TEST $CLI_1 volume stop $V0;
>> > >   TEST $glusterd_3;
>> > >   sleep 3;
>> > >   TEST $CLI_3 volume start $V0;
>> > >   TEST $CLI_2 volume stop $V0;
>> > >   TEST $CLI_2 volume delete $V0;
>> > >
>> > > Do you already have this one on your radar?
>> >
>> > It wasn't, thanks for bringing it on my radar :-). Sent
>> > http://review.gluster.org/7837 to address this.
>>
>> Kaushal,
>> I made this fix based on the assumption that the script seems to be
>> waiting for all glusterds to be online. I could not check the logs because
>> glusterds spawned by cluster.rc do not seem to store their logs in the
>> default location. Do you think we can make changes to the script so that we
>> can get logs from glusterds spawned by cluster.rc as well?
>>
>> Pranith
>>
>> >
>> > Pranith
>> >
>> > >
>> > > + Justin
>> > >
>> > > --
>> > > Open Source and Standards @ Red Hat
>> > >
>> > > twitter.com/realjustinclift
>> > >
>> > >
>> > ___
>> > Gluster-devel mailing list
>> > Gluster-devel@gluster.org
>> > http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>> >
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel