Re: [Gluster-devel] autodelete in snapshots

2014-06-03 Thread Kaushal M
I agree as well. We shouldn't be deleting any data without the
explicit consent of the user.

The approach proposed by MS is better than the earlier approach.

~kaushal

On Tue, Jun 3, 2014 at 1:02 AM, M S Vishwanath Bhat  wrote:
>
>
>
> On 2 June 2014 20:22, Vijay Bellur  wrote:
>>
>> On 04/23/2014 05:50 AM, Vijay Bellur wrote:
>>>
>>> On 04/20/2014 11:42 PM, Lalatendu Mohanty wrote:

 On 04/16/2014 11:39 AM, Avra Sengupta wrote:
>
> The whole purpose of introducing the soft-limit is, that at any point
> of time the number of
> snaps should not exceed the hard limit. If we trigger auto-delete on
> hitting hard-limit, then
> the purpose itself is lost, because at that point we would be taking a
> snap, making the limit
> hard-limit + 1, and then triggering auto-delete, which violates the
> sanctity of the hard-limit.
> Also what happens when we are at hard-limit + 1, and another snap is
> issued, while auto-delete
> is yet to process the first delete. At that point we end up at
> hard-limit + 1. Also what happens
> if for a particular snap the auto-delete fails.
>
> We should see the hard-limit, as something set by the admin keeping in
> mind the resource consumption
> and at no-point should we cross this limit, come what may. If we hit
> this limit, the create command
> should fail asking the user to delete snaps using the "snapshot
> delete" command.
>
> The two options Raghavendra mentioned are applicable for the
> soft-limit only, in which cases on
> hitting the soft-limit
>
> 1. Trigger auto-delete
>
> or
>
> 2. Log a warning-message, for the user saying the number of snaps is
> exceeding the snap-limit and
> display the number of available snaps
>
> Now which of these should happen also depends on the user, because the
> auto-delete option
> is configurable.
>
> So if the auto-delete option is set as true, auto-delete should be
> triggered and the above message
> should also be logged.
>
> But if the option is set as false, only the message should be logged.
>
> This is the behaviour as designed. Adding Rahul, and Seema in the
> mail, to reflect upon the
> behaviour as well.
>
> Regards,
> Avra


 This sounds correct. However, we need to make sure that the usage or
 documentation around this is good enough, so that users understand each
 of the limits correctly.

>>>
>>> It might be better to avoid the usage of the term "soft-limit".
>>> soft-limit as used in quota and other places generally has an alerting
>>> connotation. Something like "auto-deletion-limit" might be better.
>>>
>>
>> I still see references to "soft-limit" and auto deletion seems to get
>> triggered upon reaching soft-limit.
>>
>> Why is the ability to auto delete not configurable? It does seem pretty
>> nasty to go about deleting snapshots without obtaining explicit consent from
>> the user.
>
>
> I agree with Vijay here. It's not good to delete a snap (even if it is the
> oldest) without the explicit consent of the user.
>
> FYI, it took me more than 2 weeks to figure out that my snaps were getting
> auto-deleted after reaching the "soft-limit". As far as I knew, I had not
> done anything, and my snap restores were failing.
>
> I propose to remove the terms "soft" and "hard" limit. I believe there
> should be a limit (just "limit") after which all snapshot creates should
> fail with proper error messages. And there can be a water-mark after which
> the user should get warning messages. So below is my proposal.
>
> auto-delete + snap-limit: If the snap-limit is set to n, the next snap
> create (the (n+1)th) will succeed only if auto-delete is set to on/true/1,
> and the oldest snap will get deleted automatically. If auto-delete is set
> to off/false/0, the (n+1)th snap create will fail with a proper error
> message from the gluster CLI command. But again, by default auto-delete
> should be off.
>
> snap-water-mark: This should come into the picture only if auto-delete is
> turned off. It should not have any meaning if auto-delete is turned ON.
> Basically, its purpose is to warn the user that the limit is almost
> reached and it is time for the admin to decide which snaps should be
> deleted (or which should be kept).
>
> *my two cents*
>
> -MS
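
A minimal C sketch of the behaviour MS proposes above (the function and option
names are invented for illustration; this is not glusterd code):

    #include <stdio.h>

    enum snap_action { SNAP_ALLOW, SNAP_ALLOW_DELETE_OLDEST, SNAP_REJECT };

    /* single limit + warning water-mark + user-configurable auto-delete */
    static enum snap_action
    snap_create_check (int count, int snap_limit, int water_mark, int auto_delete)
    {
        if (count >= snap_limit) {
            if (auto_delete)
                return SNAP_ALLOW_DELETE_OLDEST; /* create, then drop the oldest snap */
            return SNAP_REJECT;                  /* fail with a clear error message   */
        }
        if (!auto_delete && count >= water_mark)
            printf ("warning: %d of %d snapshots used\n", count, snap_limit);
        return SNAP_ALLOW;
    }

    int main (void)
    {
        /* limit 256, warn at 230, auto-delete off (the proposed default) */
        printf ("%d\n", snap_create_check (231, 256, 230, 0)); /* warns, allows          */
        printf ("%d\n", snap_create_check (256, 256, 230, 0)); /* rejects                */
        printf ("%d\n", snap_create_check (256, 256, 230, 1)); /* allows, deletes oldest */
        return 0;
    }
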
>
>>
>>
>> Cheers,
>>
>> Vijay
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] inode lru limit

2014-06-03 Thread Raghavendra Gowdappa
> >> Hi,
> >>
> >> But as of now the inode table is bound to bound_xl which is associated
> >> with
> >> the client_t object for the client being connected. As part of fops we can
> >> get the bound_xl (thus the inode table) from the rpc request
> >> (req->trans->xl_private). But in reconfigure we get just the xlator
> >> pointer
> >> of protocol/server and dict containing new options.
> >>
> >> So what I am planning is this. If the xprt_list (the transport list
> >> corresponding to the mounted clients) is empty, then just set the private
> >> structure's variable for the lru limit (which will be used to create the
> >> inode table when a client mounts). If the xprt_list of protocol/server's
> >> private structure is not empty, then get one of the transports from that
> >> list and get the client_t object corresponding to the transport, from
> >> which bound_xl is obtained (all the client_t objects share the same inode
> >> table). Then from bound_xl the pointer to the inode table is obtained, its
> >> lru limit variable is set to the value specified via the CLI, and
> >> inode_table_prune is called to purge the extra inodes.
> > In the above proposal, if there are no active clients, the lru limit of the
> > itable is not reconfigured. Here are two options to improve the correctness
> > of your proposal.


> If there are no active clients, then there will not be any itable. The
> itable will be created when the 1st client connects to the brick. And while
> creating the itable we use the inode_lru_limit variable present in
> protocol/server's private structure, and the inode table that is created
> also saves the same value.

A zero current client count doesn't mean that itables are absent in bound_xl.
There can be previous connections which resulted in itable creation.

> > 1. On a successful handshake, you check whether the lru_limit of the itable
> > is equal to the configured value. If not equal, set it to the configured
> > value and prune the itable. The cost is that you check the inode table's
> > lru limit on every client connection.
> On a successful handshake, for the 1st client the inode table will be
> created with the lru_limit value saved in protocol/server's private
> structure. For further handshakes, since the inode table is already there,
> new inode tables will not be created. So instead of waiting for a new
> handshake to happen to set the lru_limit and purge the inode table, I think
> it's better to do it as part of reconfigure itself.
> >
> > 2. Traverse through the list of all xlators (since there is no easy way of
> > finding potential candidates for bound_xl other than peeking into options
> > specific to authentication) and, if there is an itable associated with that
> > xlator, set its lru limit and prune it. The cost here is traversing the
> > list of xlators. However, our xlator list in a brick process is relatively
> > small, so this shouldn't have much performance impact.
> >
> > Comments are welcome.
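
As a rough illustration of option 2 (not the actual protocol/server code), the
walk could look like the sketch below; the structs and names are simplified
stand-ins for glusterfs' xlator_t and inode_table_t internals:

    #include <stdio.h>

    struct itable { int lru_limit; int lru_size; };
    struct xl     { const char *name; struct itable *itable; struct xl *next; };

    /* stand-in for the real eviction logic */
    static void itable_prune (struct itable *t)
    {
        if (t->lru_size > t->lru_limit)
            t->lru_size = t->lru_limit;
    }

    /* walk every translator; reconfigure only those that already own an itable */
    static void reconfigure_lru_limit (struct xl *list, int new_limit)
    {
        struct xl *xl;

        for (xl = list; xl; xl = xl->next) {
            if (!xl->itable)
                continue;            /* no client has ever mounted through this one */
            xl->itable->lru_limit = new_limit;
            itable_prune (xl->itable);
            printf ("%s: lru_limit=%d lru_size=%d\n",
                    xl->name, xl->itable->lru_limit, xl->itable->lru_size);
        }
    }

    int main (void)
    {
        struct itable t1 = { 16384, 20000 };
        struct xl b2 = { "patchy-posix",  NULL, NULL };
        struct xl b1 = { "patchy-server", &t1,  &b2 };

        reconfigure_lru_limit (&b1, 1024);   /* e.g. the new value set via the CLI */
        return 0;
    }
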
> 
> Regards,
> Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] doubts in posix_handle_path and posix_handle_pump

2014-06-03 Thread Xavier Hernandez
Hi Pranith,

I'm not Avati but I think I can explain the purpose of these functions.

As I've interpreted it, posix_handle_path() should return the path to a "real"
entry of the filesystem. It can be called for all kinds of files, but it seems
that basename is != NULL only for entries where the gfid refers to a directory.
That seems good.

If the specified gfid refers to anything other than a directory, since they 
are stored inside .glusterfs as hard links, the returned path should simply be 
/.glusterfs/xx/yy/.

For gfid's that specify directories, since they are stored inside .glusterfs 
as symlinks, they need to be dereferenced before returning (to point to a real 
directory). That is the purpose of line 362: only if the lstat succeeded (it
basically means that the entry exists), the path refers to a symbolic link,
and it has exactly 1 hard link (to differentiate a symlink of a directory,
with 1 hard link, from a "normal" symlink, with at least 2 hard links) does it
try to replace the symlink with its referenced real directory.

This is done using the function posix_handle_pump(). This function reads the 
symlink and replaces it into the path:

  If the original path were: /cd/47/cd473c26-b6d6-42a6-a0a4-28e04de13915
  And cd473c26-b6d6-42a6-a0a4-28e04de13915 points to:
  ../../0e/5d/0e5d04a1-f418-4388-90db-d1d7ebb8a071/dir1
  The result is: /0e/5d/0e5d04a1-f418-4388-90db-d1d7ebb8a071/dir1

The returned path points to a directory, so it should be OK. However, since
this path can be composed of other symlinks (in this case 0e5d04a1-
f418-4388-90db-d1d7ebb8a071 will be another symlink), the system needs to
resolve them every time it accesses the entry. If there are too many of them
(for example an entry very deep in the directory hierarchy) the system calls
will return ELOOP (even if there isn't any symlink loop, the system has a
limit on the number of symlinks it will resolve for any request). This is what
is tested in the lstat() call at line 374 and the following test at 375: while 
ELOOP is returned by lstat, posix_handle_pump() is called to remove another 
level of symlink indirection. This guarantees that the returned path points to 
a directory *and* it won't fail on system calls with ELOOP.

It's very easy to check this:

  # cd 
  # mkdir -p d/d/d/d/d/d/d/d/d/d/d/d/d/d/d/d
  # cd 
  # getfattr -m. -e hex -d d/d/d/d/d/d/d/d/d/d/d/d/d/d/d/d
  # file: d/d/d/d/d/d/d/d/d/d/d/d/d/d/d/d
  trusted.gfid=0xf6e98d542c4f48259976861b7269a396
  trusted.glusterfs.dht=0x0001
  # ls .glusterfs/f6/e9/f6e98d54-2c4f-4825-9976-861b7269a396
  ls: cannot access .glusterfs/f6/e9/f6e98d54-2c4f-4825-9976-861b7269a396: Too 
many levels of symbolic links

The loop on posix_handle_pump() avoids this problem.
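
The same limit can be reproduced outside gluster with a small standalone C
program (an illustration, not gluster code): it builds a long chain of
symlinks, shows that stat() fails with ELOOP even though there is no loop, and
then resolves the chain one level per iteration, which is essentially what the
posix_handle_pump() loop does for the handle path:

    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main (void)
    {
        char cur[PATH_MAX], next[PATH_MAX], tgt[PATH_MAX];
        struct stat st;
        int i;

        mkdir ("eloop-demo", 0755);
        mkdir ("eloop-demo/realdir", 0755);

        /* link0 -> realdir, link1 -> link0, ... link59 -> link58: no loop at all */
        snprintf (cur, sizeof (cur), "realdir");
        for (i = 0; i < 60; i++) {
            snprintf (next, sizeof (next), "eloop-demo/link%d", i);
            unlink (next);
            symlink (cur, next);
            snprintf (cur, sizeof (cur), "link%d", i);
        }

        snprintf (cur, sizeof (cur), "eloop-demo/link59");
        if (stat (cur, &st) == -1 && errno == ELOOP)
            printf ("stat(%s) fails with ELOOP, as expected\n", cur);

        /* collapse one level of indirection per iteration, like the pump loop */
        while (lstat (cur, &st) == 0 && S_ISLNK (st.st_mode)) {
            ssize_t n = readlink (cur, tgt, sizeof (tgt) - 1);
            if (n < 0)
                break;
            tgt[n] = '\0';
            snprintf (cur, sizeof (cur), "eloop-demo/%s", tgt);
        }
        printf ("resolved to %s (is dir: %d)\n", cur, (int) S_ISDIR (st.st_mode));
        return 0;
    }
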

The possible problem I see is that the comments say this function returns a
path to an IA_IFDIR (i.e. an lstat on it will return IA_IFDIR); however, if
one of the symlinks is missing or anything else fails, it won't return an
error *but* it will still return a path to an existing file. An lstat on this
path will return IA_IFLNK instead of IA_IFDIR. I don't know if this can be a
problem in some places.

Xavi

On Monday 02 June 2014 02:00:30 Pranith Kumar Karampuri wrote:
> hi Avati,
>Could you please explain why posix_handle_pump fixes ELOOP.
> 
> posix_handle_path seems to return gfid-path when the lstat on the base-gfid
> path fails. I am not sure what the behaviour is supposed to be.
> 
> I added the snippet of code for reference below.
> 
> 346         base_len = (priv->base_path_length + SLEN(GF_HIDDEN_PATH) + 45);
> 347         base_str = alloca (base_len + 1);
> 348         base_len = snprintf (base_str, base_len + 1, "%s/%s/%02x/%02x/%s",
> 349                              priv->base_path, GF_HIDDEN_PATH, gfid[0], gfid[1],
> 350                              uuid_str);
> 351
> 352         pfx_len = priv->base_path_length + 1 + SLEN(GF_HIDDEN_PATH) + 1;
> 353
> 354         if (basename) {
> 355                 len = snprintf (buf, maxlen, "%s/%s", base_str, basename);
> 356         } else {
> 357                 len = snprintf (buf, maxlen, "%s", base_str);
> 358         }
> 359
> 360         ret = lstat (base_str, &stat);
> 361
> 362         if (!(ret == 0 && S_ISLNK(stat.st_mode) && stat.st_nlink == 1))   <<<-
> 363                 goto out;
> 
> Pranith.
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] op-version issues in the 3.5.1 beta release

2014-06-03 Thread Niels de Vos
On Tue, Jun 03, 2014 at 11:01:25AM +0530, Kaushal M wrote:
> Niels,
> This approach will work well when the cluster is uniform, i.e. of
> the same (major) version. It could lead to problems in mixed
> clusters when using volume set. Volume set compares the op-versions of
> the options being set and will reject the set operation when the
> op-versions are different. So, if a user were to run a mixed cluster
> of gluster-3.5.1 and gluster-3.6.0, he wouldn't be able to set
> server.manage-gids as its op-versions would be different.

Thanks Kaushal, that is exactly my expectation.

> But I don't expect anyone to be running such a cluster always. It
> would mostly be during upgrades, during which users shouldn't be doing
> volume operations.

Yes, I agree. And it should be easy enough to diagnose 'volume set' 
issues when glusterfs versions are different.

Would you like me to send a patch for the master branch that makes the 
changes mentioned below, or is that something you can do soon? I'll be 
waiting for the change to be merged before 3.5.1 can get the updated 
op-version for server.manage-gids.

Thanks,
Niels

> 
> ~kaushal
> 
> On Mon, Jun 2, 2014 at 8:28 PM, Niels de Vos  wrote:
> > Hi,
> >
> > today on IRC we had a discussion about the op-version for the current
> > 3.5.1 beta. This beta includes a backport that introduces a new volume
> > option (server.manage-gids) and needed to increase the op-version to
> > prevent issues with systems that do not know about this new option.
> >
> > Currently, the op-version in 3.5.1 is (seems to be) hard-coded to '3':
> >
> >   libglusterfs/src/globals.h:#define GD_OP_VERSION_MAX  3
> >
> > Now, the new option required a glusterd with op-version=4. This worked
> > fine when setting the option, and glusterd.info got updated too.
> > Unfortunately, a restart of glusterd fails, because the op-version from
> > the configuration is greater than the GD_OP_VERSION_MAX.
> >
> > Increasing GD_OP_VERSION_MAX is not really suitable, because
> > op-version=4 would make other systems assume that the 3.5.1 release has
> > all the op-version=4 features (incorrect, because the upcoming 3.6 has
> > op-version=4).
> >
> > I see one option to fix this issue, that allows stable branches to
> > include backports of volume options and similar, without conflicting
> > with the development branch or newer versions:
> >
> > 1. define an op-version as multi-digit value, with gaps for stable
> >releases
> > 2. stable releases may only include backports of volume options that are
> >in the development branch and newer versions
> > 3. stable maintainers should pay extra care when new volume options are
> >being backported
> >
> > The idea is the following:
> >
> > - update the hard-coded op-version in libglusterfs/src/globals.h in the
> >   master branch to 360 (based on the 3.6 release for easier matching)
> > - update any options that have op-version >= 4 to 360 (master branch)
> > - update the op-version in libglusterfs/src/globals.h in the release-3.5
> >   branch to 351
> > - update the op-version of server.manage-gids option in 3.5.1 to 351
> >
> >
> > The only issue would be that current 3.6 packages in testing have
> > a lower op-version than the new 3.5.1 packages. I hope it is not
> > a common practise to have systems installed with packages from the
> > master-branch in the same environment as 3.5.1 servers.
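
To make the numbers concrete, here is a toy sketch of the proposed scheme; the
check is a simplified stand-in for glusterd's comparison against
GD_OP_VERSION_MAX, and the 351/360 values come from the proposal above:

    #include <stdio.h>

    #define GD_OP_VERSION_MAX_351 351   /* release-3.5 branch, per the proposal */
    #define GD_OP_VERSION_MAX_360 360   /* master / 3.6 branch, per the proposal */

    /* simplified stand-in for the restart-time op-version sanity check */
    static int can_start (int stored_op_version, int op_version_max)
    {
        return stored_op_version <= op_version_max;
    }

    int main (void)
    {
        /* old scheme: setting the option bumped glusterd.info to 4 while MAX was 3 */
        printf ("old 3.5.1:  %s\n", can_start (4, 3) ? "starts" : "refuses to start");

        /* proposed scheme: server.manage-gids carries 351 on the release-3.5 branch */
        printf ("new 3.5.1:  %s\n", can_start (351, GD_OP_VERSION_MAX_351) ?
                "starts" : "refuses to start");
        printf ("3.6 master: %s\n", can_start (351, GD_OP_VERSION_MAX_360) ?
                "starts" : "refuses to start");
        return 0;
    }
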
> >
> > Any ideas, suggestions or thoughts?
> >
> > If this can not be solved in a similar easy way, I will be forced to
> > revert the 3.5.1 server.manage-gids option. Users were expecting this to
> > be present so that deployments with many (ca. 93+) secondary groups have
> > permissions working as expected.
> >
> > Thanks,
> > Niels
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] autodelete in snapshots

2014-06-03 Thread Rajesh Joseph


- Original Message -
From: "M S Vishwanath Bhat" 
To: "Vijay Bellur" 
Cc: "Seema Naik" , "Gluster Devel" 

Sent: Tuesday, June 3, 2014 1:02:08 AM
Subject: Re: [Gluster-devel] autodelete in snapshots




On 2 June 2014 20:22, Vijay Bellur < vbel...@redhat.com > wrote: 



On 04/23/2014 05:50 AM, Vijay Bellur wrote: 


On 04/20/2014 11:42 PM, Lalatendu Mohanty wrote: 


On 04/16/2014 11:39 AM, Avra Sengupta wrote: 


The whole purpose of introducing the soft-limit is, that at any point 
of time the number of 
snaps should not exceed the hard limit. If we trigger auto-delete on 
hitting hard-limit, then 
the purpose itself is lost, because at that point we would be taking a 
snap, making the limit 
hard-limit + 1, and then triggering auto-delete, which violates the 
sanctity of the hard-limit. 
Also what happens when we are at hard-limit + 1, and another snap is 
issued, while auto-delete 
is yet to process the first delete. At that point we end up at 
hard-limit + 1. Also what happens 
if for a particular snap the auto-delete fails. 

We should see the hard-limit, as something set by the admin keeping in 
mind the resource consumption 
and at no-point should we cross this limit, come what may. If we hit 
this limit, the create command 
should fail asking the user to delete snaps using the "snapshot 
delete" command. 

The two options Raghavendra mentioned are applicable for the 
soft-limit only, in which cases on 
hitting the soft-limit 

1. Trigger auto-delete 

or 

2. Log a warning-message, for the user saying the number of snaps is 
exceeding the snap-limit and 
display the number of available snaps 

Now which of these should happen also depends on the user, because the 
auto-delete option 
is configurable. 

So if the auto-delete option is set as true, auto-delete should be 
triggered and the above message 
should also be logged. 

But if the option is set as false, only the message should be logged. 

This is the behaviour as designed. Adding Rahul, and Seema in the 
mail, to reflect upon the 
behaviour as well. 

Regards, 
Avra 

This sounds correct. However, we need to make sure that the usage or
documentation around this is good enough, so that users understand each of
the limits correctly.


It might be better to avoid the usage of the term "soft-limit". 
soft-limit as used in quota and other places generally has an alerting 
connotation. Something like "auto-deletion-limit" might be better. 


I still see references to "soft-limit" and auto deletion seems to get triggered 
upon reaching soft-limit. 

Why is the ability to auto delete not configurable? It does seem pretty nasty 
to go about deleting snapshots without obtaining explicit consent from the 
user. 

I agree with Vijay here. It's not good to delete a snap (even if it is the
oldest) without the explicit consent of the user.

FYI, it took me more than 2 weeks to figure out that my snaps were getting
auto-deleted after reaching the "soft-limit". As far as I knew, I had not done
anything, and my snap restores were failing.

I propose to remove the terms "soft" and "hard" limit. I believe there should
be a limit (just "limit") after which all snapshot creates should fail with
proper error messages. And there can be a water-mark after which the user
should get warning messages. So below is my proposal.

auto-delete + snap-limit: If the snap-limit is set to n, the next snap create
(the (n+1)th) will succeed only if auto-delete is set to on/true/1, and the
oldest snap will get deleted automatically. If auto-delete is set to
off/false/0, the (n+1)th snap create will fail with a proper error message
from the gluster CLI command. But again, by default auto-delete should be off.

snap-water-mark: This should come into the picture only if auto-delete is
turned off. It should not have any meaning if auto-delete is turned ON.
Basically, its purpose is to warn the user that the limit is almost reached
and it is time for the admin to decide which snaps should be deleted (or
which should be kept).

*my two cents* 

-MS 


The reason for having a hard-limit is to stop snapshot creation once we reach
this limit. This helps to have control over resource consumption. Therefore,
if we only have this one limit (as snap-limit), then there is no question of
auto-delete: auto-delete can only be triggered once the count crosses the
limit. Therefore we introduced the concept of a soft-limit and a hard-limit.
As the name suggests, once the hard-limit is reached no more snaps will be
created.

So the idea is to keep the number of snapshots always less than the hard-limit.
To do so we introduced the concept of the soft-limit, wherein we allow
snapshots even when this limit is crossed, and once the snapshot is taken we
delete the oldest snap. If you consider this definition, then the names
soft-limit and hard-limit look OK to me.
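
In rough C, the behaviour described above would look like the following
(illustration only, not the snapshot code):

    #include <stdio.h>

    /* returns the new snap count, or -1 if creation is refused */
    static int snap_create (int count, int soft_limit, int hard_limit, int auto_delete)
    {
        if (count >= hard_limit) {
            printf ("create failed: hard-limit (%d) reached, delete snaps first\n",
                    hard_limit);
            return -1;
        }
        count++;                                 /* the snapshot is taken here */
        if (count >= soft_limit) {
            if (auto_delete) {
                printf ("soft-limit (%d) crossed: deleting the oldest snap\n",
                        soft_limit);
                count--;
            } else {
                printf ("warning: %d snaps, soft-limit is %d\n", count, soft_limit);
            }
        }
        return count;
    }

    int main (void)
    {
        snap_create (90, 90, 100, 1);    /* taken, oldest removed  */
        snap_create (90, 90, 100, 0);    /* taken, warning only    */
        snap_create (100, 90, 100, 1);   /* refused: at hard-limit */
        return 0;
    }
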

In phase II we are planning to have auto-delete feature configurable with 
different policies, e.g. delete oldest, delete with more space con

Re: [Gluster-devel] doubts in posix_handle_path and posix_handle_pump

2014-06-03 Thread Pranith Kumar Karampuri


On 06/03/2014 02:42 PM, Xavier Hernandez wrote:

Hi Pranith,

I'm not Avati but I think I can explain the purpose of these functions.

As I've interpreted it, posix_handle_path() should return the path to a "real"
entry of the filesystem. It can be called for all kind of files, but it seems
that basename is != NULL only for entries where gfid refers to a directory.
That seems good.

If the specified gfid refers to anything other than a directory, since they
are stored inside .glusterfs as hard links, the returned path should simply be
/.glusterfs/xx/yy/.

For gfid's that specify directories, since they are stored inside .glusterfs
as symlinks, they need to be dereferenced before returning (to point to a real
directory). That is the purpose of line 362: only if the lstat succeeded (it
basically means that the entry exists), the path refers to a symbolic link,
and it has exactly 1 hard link (to differentiate a symlink of a directory,
with 1 hard link, from a "normal" symlink, with at least 2 hard links) does it
try to replace the symlink with its referenced real directory.

This is done using the function posix_handle_pump(). This function reads the
symlink and replaces it into the path:

   If the original path were: /cd/47/cd473c26-b6d6-42a6-a0a4-28e04de13915
   And cd473c26-b6d6-42a6-a0a4-28e04de13915 points to:
   ../../0e/5d/0e5d04a1-f418-4388-90db-d1d7ebb8a071/dir1
   The result is: /0e/5d/0e5d04a1-f418-4388-90db-d1d7ebb8a071/dir1

The returned path points to a directory, so it should be ok. However since
this path can be composed by other symlinks (in this case 0e5d04a1-
f418-4388-90db-d1d7ebb8a071 will be another symlink), the system needs to
resolve them every time it accesses the entry. If there are too many of them
(for example an entry very deep in the directory hierarchy) the system calls
will return ELOOP (even if there isn't any symlink loop, the system has a
limit in the number of symlinks it will resolve for any request). This is what
is tested in the lstat() call at line 374 and the following test at 375: while
ELOOP is returned by lstat, posix_handle_pump() is called to remove another
level of symlink indirection. This guarantees that the returned path points to
a directory *and* it won't fail on system calls with ELOOP.

It's very easy to check this:

   # cd 
   # mkdir -p d/d/d/d/d/d/d/d/d/d/d/d/d/d/d/d
   # cd 
   # getfattr -m. -e hex -d d/d/d/d/d/d/d/d/d/d/d/d/d/d/d/d
   # file: d/d/d/d/d/d/d/d/d/d/d/d/d/d/d/d
   trusted.gfid=0xf6e98d542c4f48259976861b7269a396
   trusted.glusterfs.dht=0x0001
   # ls .glusterfs/f6/e9/f6e98d54-2c4f-4825-9976-861b7269a396
   ls: cannot access .glusterfs/f6/e9/f6e98d54-2c4f-4825-9976-861b7269a396: Too
many levels of symbolic links

The loop on posix_handle_pump() avoids this problem.


Wonderful explanation! :-) Thanks! Now I understand why there is that
while loop. I will add this part of the mail as a comment for that
function when I re-submit the code.




The possible problem I see is that the comments say this function returns a
path to an IA_IFDIR (i.e. an lstat on it will return IA_IFDIR); however, if
one of the symlinks is missing or anything else fails, it won't return an
error *but* it will still return a path to an existing file. An lstat on this
path will return IA_IFLNK instead of IA_IFDIR. I don't know if this can be a
problem in some places.
This is exactly what I was referring to. I don't see an easy way to find
out if there is any failure in the function. One needs to do an extra lstat
or a 'path'-based syscall like getxattr on the returned path to check
whether it returned a good path. So do you think the best thing is to
ignore the return value of the function call and instead depend on an lstat
or a path-based syscall on the path?


Xavi

On Monday 02 June 2014 02:00:30 Pranith Kumar Karampuri wrote:

hi Avati,
Could you please explain why posix_handle_pump fixes ELOOP.

posix_handle_path seems to return gfid-path when the lstat on the base-gfid
path fails. I am not sure what the behaviour is supposed to be.

I added the snippet of code for reference below.

346         base_len = (priv->base_path_length + SLEN(GF_HIDDEN_PATH) + 45);
347         base_str = alloca (base_len + 1);
348         base_len = snprintf (base_str, base_len + 1, "%s/%s/%02x/%02x/%s",
349                              priv->base_path, GF_HIDDEN_PATH, gfid[0], gfid[1],
350                              uuid_str);
351
352         pfx_len = priv->base_path_length + 1 + SLEN(GF_HIDDEN_PATH) + 1;
353
354         if (basename) {
355                 len = snprintf (buf, maxlen, "%s/%s", base_str, basename);
356         } else {
357                 len = snprintf (buf, maxlen, "%s", base_str);
358         }
359
360         ret = lstat (base_str, &stat);
361
362         if (!(ret == 0 && S_ISLNK(stat.st_mode) && stat.st_nlink == 1))   <<<-
363                 goto out;

Pranith.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] doubts in posix_handle_path and posix_handle_pump

2014-06-03 Thread Xavier Hernandez
On Tuesday 03 June 2014 15:42:19 Pranith Kumar Karampuri wrote:
> On 06/03/2014 02:42 PM, Xavier Hernandez wrote:
> > The possible problem I see is that in the comments it says that this
> > function returns a path to an IA_IFDIR (it will return IA_IFDIR on an
> > lstat), however if one of the symlinks is missing or anything else fails,
> > it won't return an error *but* it will return a path to an existing file.
> > An lstat on this path will return IA_IFLNK instead of IA_IFDIR. I don't
> > know if this can be a problem in some places.
> 
> This is exactly what I was referring to. I don't see an easy way to find
> out if there is any failure in the function. One needs to do an extra lstat
> or a 'path'-based syscall like getxattr on the returned path to check
> whether it returned a good path. So do you think the best thing is to
> ignore the return value of the function call and instead depend on an lstat
> or a path-based syscall on the path?
> 
The only point to consider is for gfid's representing directories. Other types
of files do not have any problem (the returned path can be considered valid
even if lstat() fails). For directories there are 3 places where things can
fail:

At line 360: I think this is not a problem. If lstat() fails (basically 
because it does not exist), the returned path can be considered valid.

At line 367: If posix_handle_pump() fails, it could mean:

* The symlink is not a valid directory symlink:
  * it's a corrupted one: any operation on this file should be denied
  * it's a normal symlink that has lost one of the hard-links: though it's bad 
to have damaged gfid's, the returned path can be considered valid.

* readlink() failed: this would be very weird. Access to the file should be
denied.

At line 374: If lstat() fails, probably it means that the symlink of one of 
the parents of the directory is missing. The returned path won't fail on 
lstat(), but it should because it will return symlink information instead of 
directory information.

I think that it's very hard to determine if something went wrong only by 
inspecting the returned path. I think the best approach would be that 
posix_handle_path() return -1 if posix_handle_pump() or lstat() at line 374 
fail, and each caller decide what to do in case of failure.
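
Until something like that is in place, the caller-side check Pranith mentions
could look roughly like this (illustration only, not posix xlator code): an
extra lstat() on the returned path tells whether we really ended up on a
directory or were left on an unresolved symlink:

    #include <errno.h>
    #include <stdio.h>
    #include <sys/stat.h>

    /* 0 if the handle path really is a directory, negative errno-style code otherwise */
    static int handle_path_is_good_dir (const char *path)
    {
        struct stat st;

        if (lstat (path, &st) != 0)
            return -errno;           /* the path does not even exist           */
        if (S_ISLNK (st.st_mode))
            return -ELOOP;           /* still a symlink: dereferencing failed  */
        if (!S_ISDIR (st.st_mode))
            return -ENOTDIR;
        return 0;
    }

    int main (int argc, char *argv[])
    {
        if (argc > 1)
            printf ("%s -> %d\n", argv[1], handle_path_is_good_dir (argv[1]));
        return 0;
    }
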

However I don't know all the details of the posix xlator, so maybe I'm wrong 
and this is not necessary. Let's see if there is someone else with more 
experience on it to see what he thinks.

Xavi
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] All builds are failing with BUILD ERROR

2014-06-03 Thread Pranith Kumar Karampuri

Guys, it's failing again with the same error:

Please proceed with configuring, compiling, and installing.
rm: cannot remove `/build/install/var/run/gluster/patchy': Device or resource 
busy
+ RET=1
+ '[' 1 '!=' 0 ']'
+ VERDICT='BUILD FAILURE'


Pranith

On 06/02/2014 09:08 PM, Justin Clift wrote:

On 02/06/2014, at 7:04 AM, Kaleb KEITHLEY wrote:


someone cleaned the loopback devices. I deleted 500 unix domain sockets in 
/d/install/var/run and requeued the regressions.

Interesting.  The extra sockets problem is what prompted me
to rewrite the cleanup function.  The sockets are being
created by glusterd during each test startup, but aren't
removed by the existing cleanup function.  (so, substantial
build up over time)



I'm not sure which of those two things was the solution.


_Probably_ the loopback device thing.  The extra sockets seem
to be "messy" but (so far) I haven't seen them break anything.

+ Justin

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] op-version issues in the 3.5.1 beta release

2014-06-03 Thread Kaushal M
I've sent out a draft/rfc patch on the master branch
http://review.gluster.org/7963.

~kaushal

On Tue, Jun 3, 2014 at 3:20 PM, Niels de Vos  wrote:
> On Tue, Jun 03, 2014 at 11:01:25AM +0530, Kaushal M wrote:
>> Niels,
>> This approach will work well when the cluster is uniform, i.e. of
>> the same (major) version. It could lead to problems in mixed
>> clusters when using volume set. Volume set compares the op-versions of
>> the options being set and will reject the set operation when the
>> op-versions are different. So, if a user were to run a mixed cluster
>> of gluster-3.5.1 and gluster-3.6.0, he wouldn't be able to set
>> server.manage-gids as its op-versions would be different.
>
> Thanks Kaushal, that is exactly my expectation.
>
>> But I don't expect anyone to be running such a cluster always. It
>> would mostly be during upgrades, during which users shouldn't be doing
>> volume operations.
>
> Yes, I agree. And it should be easy enough to diagnose 'volume set'
> issues when glusterfs versions are different.
>
> Would you like me to send a patch for the master branch that makes the
> changes mentioned below, or is that something you can do soon? I'll be
> waiting for the change to be merged before 3.5.1 can get the updated
> op-version for server.manage-gids.
>
> Thanks,
> Niels
>
>>
>> ~kaushal
>>
>> On Mon, Jun 2, 2014 at 8:28 PM, Niels de Vos  wrote:
>> > Hi,
>> >
>> > today on IRC we had a discussion about the op-version for the current
>> > 3.5.1 beta. This beta includes a backport that introduces a new volume
>> > option (server.manage-gids) and needed to increase the op-version to
>> > prevent issues with systems that do not know about this new option.
>> >
>> > Currently, the op-version in 3.5.1 is (seems to be) hard-coded to '3':
>> >
>> >   libglusterfs/src/globals.h:#define GD_OP_VERSION_MAX  3
>> >
>> > Now, the new option required a glusterd with op-version=4. This worked
>> > fine when setting the option, and glusterd.info got updated too.
>> > Unfortunately, a restart of glusterd fails, because the op-version from
>> > the configuration is greater than the GD_OP_VERSION_MAX.
>> >
>> > Increasing GD_OP_VERSION_MAX is not really suitable, because
>> > op-version=4 would make other systems assume that the 3.5.1 release has
>> > all the op-version=4 features (incorrect, because the upcoming 3.6 has
>> > op-version=4).
>> >
>> > I see one option to fix this issue, that allows stable branches to
>> > include backports of volume options and similar, without conflicting
>> > with the development branch or newer versions:
>> >
>> > 1. define an op-version as multi-digit value, with gaps for stable
>> >releases
>> > 2. stable releases may only include backports of volume options that are
>> >in the development branch and newer versions
>> > 3. stable maintainers should pay extra care when new volume options are
>> >being backported
>> >
>> > The idea is the following:
>> >
>> > - update the hard-coded op-version in libglusterfs/src/globals.h in the
>> >   master branch to 360 (based on the 3.6 release for easier matching)
>> > - update any options that have op-version >= 4 to 360 (master branch)
>> > - update the op-version in libglusterfs/src/globals.h in the release-3.5
>> >   branch to 351
>> > - update the op-version of server.manage-gids option in 3.5.1 to 351
>> >
>> >
>> > The only issue would be that current 3.6 packages in testing have
>> > a lower op-version than the new 3.5.1 packages. I hope it is not
>> > a common practise to have systems installed with packages from the
>> > master-branch in the same environment as 3.5.1 servers.
>> >
>> > Any ideas, suggestions or thoughts?
>> >
>> > If this can not be solved in a similar easy way, I will be forced to
>> > revert the 3.5.1 server.manage-gids option. Users were expecting this to
>> > be present so that deployments with many (ca. 93+) secondary groups have
>> > permissions working as expected.
>> >
>> > Thanks,
>> > Niels
>> > ___
>> > Gluster-devel mailing list
>> > Gluster-devel@gluster.org
>> > http://supercolony.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Erasure coding doubts session

2014-06-03 Thread Pranith Kumar Karampuri

hi Xavier,
Some of the developers are reading the code you submitted for
erasure coding. We want to know if you would be available on Friday (IST) so
that we can have a discussion and doubt-clarification session on IRC.
Could you tell us which time is good for you?


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Erasure coding doubts session

2014-06-03 Thread Xavier Hernandez
Hi Pranith,

On Tuesday 03 June 2014 17:04:12 Pranith Kumar Karampuri wrote:
> hi Xavier,
>  Some of the developers are reading the code you submitted for
> erasure code. We want to know if you would be available on Friday IST so
> that we can have a discussion and doubt clarification session on IRC.
> Could you tell which time is good for you.
Sure. I can get some time between 12:30 PM IST and 6:30 PM IST. Tell me which
time you prefer.

Xavi
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] autodelete in snapshots

2014-06-03 Thread M S Vishwanath Bhat
On 3 June 2014 15:21, Rajesh Joseph  wrote:

>
>
> - Original Message -
> From: "M S Vishwanath Bhat" 
> To: "Vijay Bellur" 
> Cc: "Seema Naik" , "Gluster Devel" <
> gluster-devel@gluster.org>
> Sent: Tuesday, June 3, 2014 1:02:08 AM
> Subject: Re: [Gluster-devel] autodelete in snapshots
>
>
>
>
> On 2 June 2014 20:22, Vijay Bellur < vbel...@redhat.com > wrote:
>
>
>
> On 04/23/2014 05:50 AM, Vijay Bellur wrote:
>
>
> On 04/20/2014 11:42 PM, Lalatendu Mohanty wrote:
>
>
> On 04/16/2014 11:39 AM, Avra Sengupta wrote:
>
>
> The whole purpose of introducing the soft-limit is, that at any point
> of time the number of
> snaps should not exceed the hard limit. If we trigger auto-delete on
> hitting hard-limit, then
> the purpose itself is lost, because at that point we would be taking a
> snap, making the limit
> hard-limit + 1, and then triggering auto-delete, which violates the
> sanctity of the hard-limit.
> Also what happens when we are at hard-limit + 1, and another snap is
> issued, while auto-delete
> is yet to process the first delete. At that point we end up at
> hard-limit + 1. Also what happens
> if for a particular snap the auto-delete fails.
>
> We should see the hard-limit, as something set by the admin keeping in
> mind the resource consumption
> and at no-point should we cross this limit, come what may. If we hit
> this limit, the create command
> should fail asking the user to delete snaps using the "snapshot
> delete" command.
>
> The two options Raghavendra mentioned are applicable for the
> soft-limit only, in which cases on
> hitting the soft-limit
>
> 1. Trigger auto-delete
>
> or
>
> 2. Log a warning-message, for the user saying the number of snaps is
> exceeding the snap-limit and
> display the number of available snaps
>
> Now which of these should happen also depends on the user, because the
> auto-delete option
> is configurable.
>
> So if the auto-delete option is set as true, auto-delete should be
> triggered and the above message
> should also be logged.
>
> But if the option is set as false, only the message should be logged.
>
> This is the behaviour as designed. Adding Rahul, and Seema in the
> mail, to reflect upon the
> behaviour as well.
>
> Regards,
> Avra
>
> This sounds correct. However, we need to make sure that the usage or
> documentation around this is good enough, so that users understand each
> of the limits correctly.
>
>
> It might be better to avoid the usage of the term "soft-limit".
> soft-limit as used in quota and other places generally has an alerting
> connotation. Something like "auto-deletion-limit" might be better.
>
>
> I still see references to "soft-limit" and auto deletion seems to get
> triggered upon reaching soft-limit.
>
> Why is the ability to auto delete not configurable? It does seem pretty
> nasty to go about deleting snapshots without obtaining explicit consent
> from the user.
>
> I agree with Vijay here. It's not good to delete a snap (even if it is the
> oldest) without the explicit consent of the user.
>
> FYI, it took me more than 2 weeks to figure out that my snaps were getting
> auto-deleted after reaching the "soft-limit". As far as I knew, I had not
> done anything, and my snap restores were failing.
>
> I propose to remove the terms "soft" and "hard" limit. I believe there
> should be a limit (just "limit") after which all snapshot creates should
> fail with proper error messages. And there can be a water-mark after which
> the user should get warning messages. So below is my proposal.
>
> auto-delete + snap-limit: If the snap-limit is set to n, the next snap
> create (the (n+1)th) will succeed only if auto-delete is set to on/true/1,
> and the oldest snap will get deleted automatically. If auto-delete is set
> to off/false/0, the (n+1)th snap create will fail with a proper error
> message from the gluster CLI command. But again, by default auto-delete
> should be off.
>
> snap-water-mark: This should come into the picture only if auto-delete is
> turned off. It should not have any meaning if auto-delete is turned ON.
> Basically, its purpose is to warn the user that the limit is almost
> reached and it is time for the admin to decide which snaps should be
> deleted (or which should be kept).
>
> *my two cents*
>
> -MS
>
>
> The reason for having a hard-limit is to stop snapshot creation once we
> reach this limit. This helps to have control over resource consumption.
> Therefore, if we only have this one limit (as snap-limit), then there is no
> question of auto-delete: auto-delete can only be triggered once the count
> crosses the limit. Therefore we introduced the concept of a soft-limit and
> a hard-limit. As the name suggests, once the hard-limit is reached no more
> snaps will be created.
>

Perhaps I could have been clearer. The auto-delete value only comes into the
picture when the limit is reached.

There is a limit 'n' (snap-limit), and when we reach this limit, what
happens to the next snap create depends on the value of auto-delete (which
should be user configurable).

Re: [Gluster-devel] All builds are failing with BUILD ERROR

2014-06-03 Thread Justin Clift
On 03/06/2014, at 12:12 PM, Pranith Kumar Karampuri wrote:
> Guys its failing again with the same error:
> 
> Please proceed with configuring, compiling, and installing.
> rm: cannot remove `/build/install/var/run/gluster/patchy': Device or resource 
> busy
> + RET=1
> + '[' 1 '!=' 0 ']'
> + VERDICT='BUILD FAILURE'


You have a login to build.gluster.org don't you? :)

+ Justin

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Erasure coding doubts session

2014-06-03 Thread Vijay Bellur

On 06/03/2014 05:31 PM, Xavier Hernandez wrote:

Hi Pranith,

On Tuesday 03 June 2014 17:04:12 Pranith Kumar Karampuri wrote:

hi Xavier,
  Some of the developers are reading the code you submitted for
erasure code. We want to know if you would be available on Friday IST so
that we can have a discussion and doubt clarification session on IRC.
Could you tell which time is good for you.

Sure. I can get some time between 12:30 PM IST and 6:30 PM IST. Tell me which
time you prefer.



Would 2:00 - 3:00 PM IST work for you? We could hang out in 
#gluster-meeting.


-Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Erasure coding doubts session

2014-06-03 Thread Xavier Hernandez
On Tuesday 03 June 2014 19:12:30 Vijay Bellur wrote:
> > Sure. I can get some time between 12:30 PM IST and 6:30 PM IST. Tell me
> > which time you prefer.
> 
> Would 2:00 - 3:00 PM IST work for you? We could hang out in
> #gluster-meeting.
> 
Yes, no problem. I'll be there.

Xavi
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] All builds are failing with BUILD ERROR

2014-06-03 Thread Pranith Kumar Karampuri


On 06/03/2014 07:00 PM, Justin Clift wrote:

On 03/06/2014, at 12:12 PM, Pranith Kumar Karampuri wrote:

Guys its failing again with the same error:

Please proceed with configuring, compiling, and installing.
rm: cannot remove `/build/install/var/run/gluster/patchy': Device or resource 
busy
+ RET=1
+ '[' 1 '!=' 0 ']'
+ VERDICT='BUILD FAILURE'


You have a login to build.gluster.org don't you? :)
I only have a login to launch builds. No permission to inspect the machine
:-(.


Pranith


+ Justin

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] All builds are failing with BUILD ERROR

2014-06-03 Thread Justin Clift
On 03/06/2014, at 3:04 PM, Pranith Kumar Karampuri wrote:
> On 06/03/2014 07:00 PM, Justin Clift wrote:
>> On 03/06/2014, at 12:12 PM, Pranith Kumar Karampuri wrote:
>>> Guys its failing again with the same error:
>>> 
>>> Please proceed with configuring, compiling, and installing.
>>> rm: cannot remove `/build/install/var/run/gluster/patchy': Device or 
>>> resource busy
>>> + RET=1
>>> + '[' 1 '!=' 0 ']'
>>> + VERDICT='BUILD FAILURE'
>> 
>> You have a login to build.gluster.org don't you? :)
> I only have login to launch builds. No permission to inspect the machine :-(.


Just checked, and you're definitely in the sudo setup, with the same permissions
as me.  eg you can "sudo su -" to root

If it asks for a password, its your own password.  If you don't know it, let me
know, and I'll set an initial one for you. :)

+ Justin

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] All builds are failing with BUILD ERROR

2014-06-03 Thread Pranith Kumar Karampuri


On 06/03/2014 07:45 PM, Justin Clift wrote:

On 03/06/2014, at 3:04 PM, Pranith Kumar Karampuri wrote:

On 06/03/2014 07:00 PM, Justin Clift wrote:

On 03/06/2014, at 12:12 PM, Pranith Kumar Karampuri wrote:

Guys its failing again with the same error:

Please proceed with configuring, compiling, and installing.
rm: cannot remove `/build/install/var/run/gluster/patchy': Device or resource 
busy
+ RET=1
+ '[' 1 '!=' 0 ']'
+ VERDICT='BUILD FAILURE'

You have a login to build.gluster.org don't you? :)

I only have login to launch builds. No permission to inspect the machine :-(.


Just checked, and you're definitely in the sudo setup, with the same permissions
as me.  eg you can "sudo su -" to root

If it asks for a password, its your own password.  If you don't know it, let me
know, and I'll set an initial one for you. :)
Justin fixed the login for me. Feel free to reach out to me as well if
there is any problem with the build machine that needs fixing.


Pranith


+ Justin

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Fwd: New Defects reported by Coverity Scan for GlusterFS

2014-06-03 Thread Lalatendu Mohanty


If you are interested in fixing these Coverity issues, check the link below
for guidelines:

http://www.gluster.org/community/documentation/index.php/Fixing_Issues_Reported_By_Tools_For_Static_Code_Analysis#Coverity

Thanks,
Lala
 Original Message 
Subject:New Defects reported by Coverity Scan for GlusterFS
Date:   Tue, 03 Jun 2014 08:22:38 -0700
From:   scan-ad...@coverity.com



Hi,


Please find the latest report on new defect(s) introduced to GlusterFS found 
with Coverity Scan.

Defect(s) Reported-by: Coverity Scan
Showing 14 of 14 defect(s)


** CID 1220068:  Missing parentheses  (CONSTANT_EXPRESSION_RESULT)
/xlators/features/snapview-server/src/snapview-server.c: 1265 in svs_fgetxattr()

** CID 1220067:  Missing parentheses  (CONSTANT_EXPRESSION_RESULT)
/xlators/features/snapview-server/src/snapview-server.c: 1158 in svs_getxattr()

** CID 1220066:  Logically dead code  (DEADCODE)
/xlators/features/snapview-server/src/snapview-server.c: 1268 in svs_fgetxattr()

** CID 1220065:  Logically dead code  (DEADCODE)
/xlators/features/snapview-server/src/snapview-server.c: 1160 in svs_getxattr()

** CID 1220064:  Logically dead code  (DEADCODE)
/xlators/features/snapview-server/src/snapview-server.c: 594 in 
svs_lookup_entry_point()

** CID 1220060:  Resource leak  (RESOURCE_LEAK)
/xlators/features/snapview-server/src/snapview-server.c: 1500 in 
svs_get_snapshot_list()

** CID 1220059:  Resource leak  (RESOURCE_LEAK)
/xlators/features/snapview-server/src/snapview-server.c: 1500 in 
svs_get_snapshot_list()

** CID 1220058:  Resource leak  (RESOURCE_LEAK)
/xlators/features/snapview-client/src/snapview-client.c: 1315 in svc_readdirp()

** CID 1220057:  Resource leak  (RESOURCE_LEAK)
/xlators/features/snapview-server/src/snapview-server.c: 860 in 
svs_lookup_entry()

** CID 1220063:  Improper use of negative value  (NEGATIVE_RETURNS)
/xlators/features/snapview-server/src/snapview-server.c: 1264 in svs_fgetxattr()

** CID 1220062:  Improper use of negative value  (NEGATIVE_RETURNS)
/xlators/features/snapview-server/src/snapview-server.c: 1157 in svs_getxattr()

** CID 1220061:  Array compared against 0  (NO_EFFECT)
/xlators/features/snapview-server/src/snapview-server.c: 693 in 
svs_lookup_gfid()

** CID 1220056:  Unused pointer value  (UNUSED_VALUE)
/xlators/features/snapview-server/src/snapview-server.c: 999 in svs_lookup()

** CID 1220055:  Use after free  (USE_AFTER_FREE)
/xlators/features/snapview-server/src/snapview-server.c: 1319 in svs_fgetxattr()
/xlators/features/snapview-server/src/snapview-server.c: 1319 in svs_fgetxattr()



*** CID 1220068:  Missing parentheses  (CONSTANT_EXPRESSION_RESULT)
/xlators/features/snapview-server/src/snapview-server.c: 1265 in svs_fgetxattr()
1259 op_errno = EINVAL;
1260 goto out;
1261 }
1262
1263 if (inode_ctx->type == SNAP_VIEW_VIRTUAL_INODE) {
1264 size = glfs_fgetxattr (glfd, name, NULL, 0);

CID 1220068:  Missing parentheses  (CONSTANT_EXPRESSION_RESULT)
"!size == -1" is always false regardless of the values of its operands. Did 
you intend to either negate the entire comparison expression, in which case parentheses 
would be required around the entire comparison expression to force that interpretation, 
or negate the sense of the comparison (that is, use '!=' rather than '==')? This occurs 
as the logical operand of if.

1265                 if (!size == -1) {
1266                         gf_log (this->name, GF_LOG_ERROR, "getxattr on %s "
1267                                 "failed (key: %s)", uuid_utoa (fd->inode->gfid),
1268                                 name);
1269                         op_ret = -1;
1270                         op_errno = errno;

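
Both CONSTANT_EXPRESSION_RESULT reports boil down to a C operator-precedence
slip; a tiny standalone example (not the snapview-server code) shows why the
flagged test can never fire and what the intended check looks like:

    #include <stdio.h>

    int main (void)
    {
        long size = -1;          /* e.g. a failed getxattr() return value */

        if (!size == -1)         /* parsed as (!size) == -1: always false */
            printf ("never reached\n");

        if (size == -1)          /* the intended check */
            printf ("getxattr failed\n");

        return 0;
    }
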

*** CID 1220067:  Missing parentheses  (CONSTANT_EXPRESSION_RESULT)
/xlators/features/snapview-server/src/snapview-server.c: 1158 in svs_getxattr()
1152 op_errno = EINVAL;
1153 goto out;
1154 } else if (inode_ctx->type == SNAP_VIEW_VIRTUAL_INODE) {
1155 fs = inode_ctx->fs;
1156 object = inode_ctx->object;
1157 size = glfs_h_getxattrs (fs, object, name, NULL, 0);

CID 1220067:  Missing parentheses  (CONSTANT_EXPRESSION_RESULT)
"!size == -1" is always false regardless of the values of its operands. Did 
you intend to either negate the entire comparison expression, in which case parentheses 
would be required around the entire comparison expression to force that interpretation, 
or negate the sense of the comparison (that is, use '!=' rather than '==')? This occurs 
as the logical operand of if.

1158 if (!size == -1) {
1159  

Re: [Gluster-devel] [Gluster-users] Need testers for GlusterFS 3.4.4

2014-06-03 Thread Justin Clift
On 03/06/2014, at 9:05 PM, Ben Turner wrote:
>> From: "Justin Clift" 
>> Sent: Thursday, May 29, 2014 6:12:40 PM

>> Excellent Ben!  Please send feedback to gluster-devel. :)
> 
> So far so good on 3.4.4, sorry for the delay here.  I had to fix my 
> downstream test suites to run outside of RHS / downstream gluster.  I did 
> basic sanity testing on glusterfs mounts including:
> 
> FSSANITY_TEST_LIST: arequal bonnie glusterfs_build compile_kernel dbench dd 
> ffsb fileop fsx fs_mark iozone locks ltp multiple_files posix_compliance 
> postmark read_large rpc syscallbench tiobench
> 
> I am starting on NFS now, I'll have results tonight or tomorrow morning.  
> I'll look at updating the component scripts to work and run them as well.


Thanks Ben. :)

+ Justin

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] [RFC] GlusterFS Operations Guide

2014-06-03 Thread Paul Cuzner
This is a really good initiative Lala. 

Anything that helps Operations folks always gets my vote :) 

I've added a few items to the etherpad. 

Cheers, 

PC 

- Original Message -

> From: "Lalatendu Mohanty" 
> To: gluster-us...@gluster.org, gluster-devel@gluster.org
> Sent: Friday, 30 May, 2014 11:33:35 PM
> Subject: Re: [Gluster-users] [Gluster-devel] [RFC] GlusterFS Operations Guide

> On 05/30/2014 04:50 PM, Lalatendu Mohanty wrote:

> > I think it is time to create an operations/ops guide for GlusterFS.
> > Operations guide should address issues which administrators face while
> > running/maintaining GlusterFS storage nodes. Openstack project has an
> > operations guide [2] which try to address similar issues and it is pretty
> > cool.
> 

> > IMO these are the typical/example questions which an operations guide
> > should try to address.
> >
> > * Maintenance, Failures, and Debugging
> >   * What are the steps for planned maintenance of a GlusterFS node?
> >   * Steps for replacing a failed node?
> >   * Steps to decommission a brick?
> >
> > * Logging and Monitoring
> >   * Where are the log files?
> >   * How to find out if self-heal is working properly?
> >   * Which log files to monitor for detecting failures?

> > The operations guide needs a good amount of work, hence we all need to come
> > together for this. You can contribute in either of the following ways:
> >
> > * Let others know about the questions you want answered in the operations
> >   guide (I have set up an etherpad for this [1])
> > * Answer the questions/issues raised by others.

> > Comments, suggestions?
> 
> > Should this be part of gluster code base i.e. /doc or somewhere else?
> 

> > [1] http://titanpad.com/op-guide
> 
> > [2] http://docs.openstack.org/ops/oreilly-openstack-ops-guide.pdf
> 

> > Thanks,
> 
> > Lala
> 
> > #lalatenduM on freenode
> 

> > ___
> 
> > Gluster-devel mailing list Gluster-devel@gluster.org
> > http://supercolony.gluster.org/mailman/listinfo/gluster-devel
> 

> Resending after fixing few typos.

> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] All builds are failing with BUILD ERROR

2014-06-03 Thread Kaleb KEITHLEY

On 06/03/2014 04:42 PM, Pranith Kumar Karampuri wrote:

Guys its failing again with the same error:

Please proceed with configuring, compiling, and installing.
rm: cannot remove `/build/install/var/run/gluster/patchy': Device or
resource busy
+ RET=1
+ '[' 1 '!=' 0 ']'
+ VERDICT='BUILD FAILURE'



Has someone changed the way builds are cleaned up?

Recent regressions are now failing with

...
Running automake...
Running autogen.sh in argp-standalone ...

Please proceed with configuring, compiling, and installing.
configure: error: source directory already configured; run "make 
distclean" there first

+ RET=1
+ '[' 1 '!=' 0 ']'
+ VERDICT='BUILD FAILURE'
...



I looked and found that /var/lib/jenkins/jobs/regression/workspace 
contained what looked like a _previous_ successful regression build.


I manually cleaned the directory (moved it and created a new workspace 
dir actually) and now a regression is running.


What's going on? !!!

--

Kaleb


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Need testers for GlusterFS 3.4.4

2014-06-03 Thread Pranith Kumar Karampuri


On 06/04/2014 01:35 AM, Ben Turner wrote:

- Original Message -

From: "Justin Clift" 
To: "Ben Turner" 
Cc: "James" , gluster-us...@gluster.org, "Gluster Devel" 

Sent: Thursday, May 29, 2014 6:12:40 PM
Subject: Re: [Gluster-users] [Gluster-devel] Need testers for GlusterFS 3.4.4

On 29/05/2014, at 8:04 PM, Ben Turner wrote:

From: "James" 
Sent: Wednesday, May 28, 2014 5:21:21 PM
On Wed, May 28, 2014 at 5:02 PM, Justin Clift  wrote:

Hi all,

Are there any Community members around who can test the GlusterFS 3.4.4
beta (rpms are available)?

I've provided all the tools and how-to to do this yourself. Should
probably take about ~20 min.

Old example:

https://ttboj.wordpress.com/2014/01/16/testing-glusterfs-during-glusterfest/

Same process should work, except base your testing on the latest
vagrant article:

https://ttboj.wordpress.com/2014/05/13/vagrant-on-fedora-with-libvirt-reprise/

If you haven't set it up already.

I can help out here, I'll have a chance to run through some stuff this
weekend.  Where should I post feedback?


Excellent Ben!  Please send feedback to gluster-devel. :)

So far so good on 3.4.4, sorry for the delay here.  I had to fix my downstream 
test suites to run outside of RHS / downstream gluster.  I did basic sanity 
testing on glusterfs mounts including:

FSSANITY_TEST_LIST: arequal bonnie glusterfs_build compile_kernel dbench dd 
ffsb fileop fsx fs_mark iozone locks ltp multiple_files posix_compliance 
postmark read_large rpc syscallbench tiobench

I am starting on NFS now, I'll have results tonight or tomorrow morning.  I'll 
look at updating the component scripts to work and run them as well.

Thanks a lot for this, Ben.

Justin, Ben,
 Do you think we can automate running of these scripts without a 
lot of human intervention? If yes, how can I help?


We can use that just before making any release in the future :-).

Pranith



-b
  

+ Justin

--
Open Source and Standards @ Red Hat

twitter.com/realjustinclift



___
Gluster-users mailing list
gluster-us...@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] autodelete in snapshots

2014-06-03 Thread Rajesh Joseph


- Original Message -
> From: "M S Vishwanath Bhat" 
> To: "Rajesh Joseph" 
> Cc: "Vijay Bellur" , "Seema Naik" , 
> "Gluster Devel"
> 
> Sent: Tuesday, June 3, 2014 5:55:27 PM
> Subject: Re: [Gluster-devel] autodelete in snapshots
> 
> On 3 June 2014 15:21, Rajesh Joseph  wrote:
> 
> >
> >
> > - Original Message -
> > From: "M S Vishwanath Bhat" 
> > To: "Vijay Bellur" 
> > Cc: "Seema Naik" , "Gluster Devel" <
> > gluster-devel@gluster.org>
> > Sent: Tuesday, June 3, 2014 1:02:08 AM
> > Subject: Re: [Gluster-devel] autodelete in snapshots
> >
> >
> >
> >
> > On 2 June 2014 20:22, Vijay Bellur < vbel...@redhat.com > wrote:
> >
> >
> >
> > On 04/23/2014 05:50 AM, Vijay Bellur wrote:
> >
> >
> > On 04/20/2014 11:42 PM, Lalatendu Mohanty wrote:
> >
> >
> > On 04/16/2014 11:39 AM, Avra Sengupta wrote:
> >
> >
> > The whole purpose of introducing the soft-limit is, that at any point
> > of time the number of
> > snaps should not exceed the hard limit. If we trigger auto-delete on
> > hitting hard-limit, then
> > the purpose itself is lost, because at that point we would be taking a
> > snap, making the limit
> > hard-limit + 1, and then triggering auto-delete, which violates the
> > sanctity of the hard-limit.
> > Also what happens when we are at hard-limit + 1, and another snap is
> > issued, while auto-delete
> > is yet to process the first delete. At that point we end up at
> > hard-limit + 1. Also what happens
> > if for a particular snap the auto-delete fails.
> >
> > We should see the hard-limit, as something set by the admin keeping in
> > mind the resource consumption
> > and at no-point should we cross this limit, come what may. If we hit
> > this limit, the create command
> > should fail asking the user to delete snaps using the "snapshot
> > delete" command.
> >
> > The two options Raghavendra mentioned are applicable for the
> > soft-limit only, in which cases on
> > hitting the soft-limit
> >
> > 1. Trigger auto-delete
> >
> > or
> >
> > 2. Log a warning-message, for the user saying the number of snaps is
> > exceeding the snap-limit and
> > display the number of available snaps
> >
> > Now which of these should happen also depends on the user, because the
> > auto-delete option
> > is configurable.
> >
> > So if the auto-delete option is set as true, auto-delete should be
> > triggered and the above message
> > should also be logged.
> >
> > But if the option is set as false, only the message should be logged.
> >
> > This is the behaviour as designed. Adding Rahul, and Seema in the
> > mail, to reflect upon the
> > behaviour as well.
> >
> > Regards,
> > Avra
> >
> > This sounds correct. However we need to make sure that the usage or
> > documentation around this should be good enough , so that users
> > understand the each of the limits correctly.
> >
> >
> > It might be better to avoid the usage of the term "soft-limit".
> > soft-limit as used in quota and other places generally has an alerting
> > connotation. Something like "auto-deletion-limit" might be better.
> >
> >
> > I still see references to "soft-limit" and auto deletion seems to get
> > triggered upon reaching soft-limit.
> >
> > Why is the ability to auto delete not configurable? It does seem pretty
> > nasty to go about deleting snapshots without obtaining explicit consent
> > from the user.
> >
> > I agree with Vijay here. It's not good to delete a snap (even though it is
> > oldest) without the explicit consent from user.
> >
> > FYI It took me more than 2 weeks to figure out that my snaps were getting
> > autodeleted after reaching "soft-limit". For all I know I had not done
> > anything and my snap restore were failing.
> >
> > I propose to remove the terms "soft" and "hard" limit. I believe there
> > should be a limit (just "limit") after which all snapshot creates should
> > fail with proper error messages. And there can be a water-mark after which
> > user should get warning messages. So below is my proposal.
> >
> > auto-delete + snap-limit: If the snap-limit is set to n , next snap create
> > (the (n+1)th) will succeed only if auto-delete is set to on/true/1, and the oldest
> > snap will get deleted automatically. If auto-delete is set to off/false/0,
> > the (n+1)th snap create will fail with a proper error message from the gluster CLI
> > command. But again, by default auto-delete should be off.
> >
> > snap-water-mark: This should come into the picture only if auto-delete is turned
> > off. It should not have any meaning if auto-delete is turned ON. Basically
> > its purpose is to warn the user that the limit is almost reached and that
> > it is time for the admin to decide which snaps should be deleted (or which
> > should be kept).
> >
> > *my two cents*
> >
> > -MS
> >
> >
> > The reason for having a hard-limit is to stop snapshot creation once we
> > have reached this limit. This helps to keep control over resource
> > consumption. Therefore, if we only have this limit (as snap-limit), then
> > there is no question of auto-delet
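
For illustration only, here is a minimal C sketch of the create-time decision being
debated in this thread. None of this is glusterd code: the names (snap_config,
snap_limit, water_mark, auto_delete, snap_create, delete_oldest_snapshot) are
hypothetical stand-ins, and the behaviour follows MS's proposal above (a single
limit, a warning water-mark that matters only when auto-delete is off, and
auto-delete off by default).

<code>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical settings, only to illustrate the proposal. */
struct snap_config {
    int  snap_limit;   /* "just a limit": creates beyond this fail           */
    int  water_mark;   /* warn once the snap count crosses this value        */
    bool auto_delete;  /* if true, delete the oldest snap instead of failing */
};

static int snap_create(const struct snap_config *cfg, int snap_count)
{
    if (snap_count >= cfg->snap_limit) {
        if (!cfg->auto_delete) {
            fprintf(stderr,
                    "snapshot create failed: limit of %d reached; "
                    "delete snapshots with 'snapshot delete' first\n",
                    cfg->snap_limit);
            return -1;
        }
        /* Auto-delete is on: make room by removing the oldest snapshot
         * first. delete_oldest_snapshot() is a stand-in for whatever
         * the snapshot module would really call. */
        printf("limit %d reached: deleting oldest snapshot\n",
               cfg->snap_limit);
        /* delete_oldest_snapshot(); */
    } else if (!cfg->auto_delete && snap_count >= cfg->water_mark) {
        /* The water-mark only matters when auto-delete is off: it is
         * purely a warning, not an enforcement point. */
        fprintf(stderr,
                "warning: %d of %d snapshots used; consider deleting "
                "old snapshots\n", snap_count, cfg->snap_limit);
    }

    printf("snapshot created\n");
    return 0;
}

int main(void)
{
    struct snap_config cfg = { .snap_limit = 256, .water_mark = 200,
                               .auto_delete = false };
    /* At 200 existing snaps the create succeeds but logs the warning. */
    return snap_create(&cfg, 200) == 0 ? 0 : 1;
}
</code>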

[Gluster-devel] Regarding doing away with refkeeper in locks xlator

2014-06-03 Thread Krutika Dhananjay
Hi, 

Recently there was a crash in locks translator (BZ 1103347, BZ 1097102) with 
the following backtrace: 
(gdb) bt
#0  uuid_unpack (in=0x8 <Address 0x8 out of bounds>, uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44
#1  0x7feeba9e19d6 in uuid_unparse_x (uu=<value optimized out>, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9",
    fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55
#2  0x7feeba9be837 in uuid_utoa (uuid=0x8 <Address 0x8 out of bounds>) at common-utils.c:2138
#3  0x7feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396
#4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
#5  0x7feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=<value optimized out>) at posix.c:2550
#6  0x7feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368
#7  0x7feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=<value optimized out>) at server-helpers.c:354
#8  0x7feeab77ae2c in server_rpc_notify (rpc=<value optimized out>, xl=0x2316390, event=<value optimized out>, data=0x2bf51c0) at server.c:527
#9  0x7feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720
#10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=<value optimized out>, event=<value optimized out>, data=0x2bf51c0) at rpcsvc.c:758
#11 0x7feeba778638 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
#12 0x7feeb115e971 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:1071
#13 socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:2240
#14 0x7feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384
#15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445
#16 0x00407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023
(gdb) f 4
#4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
428             pl_inodelk_log_cleanup (l);
(gdb) p l->pl_inode->refkeeper
$1 = (inode_t *) 0x0
(gdb)

pl_inode->refkeeper was found to be NULL even when there were some blocked 
inodelks in a certain domain of the inode, 
which when dereferenced by the epoll thread in the cleanup codepath led to a 
crash. 

On inspecting the code (for want of a consistent reproducer), three things were 
found: 

1. The function where the crash happens (pl_inodelk_log_cleanup()) makes an 
attempt to resolve the inode to a path, as can be seen below. But the way 
inode_path() itself 
works is to first construct the path based on the given inode's ancestry and 
place it in the buffer provided. And if all else fails, the gfid of the inode 
is placed in a certain format ("<gfid:...>"). 
This eliminates the need for statements from line 4 through 7 below, thereby 
"preventing" dereferencing of pl_inode->refkeeper. 
Now, although this change prevents the crash altogether, it still does not fix 
the race that led to pl_inode->refkeeper becoming NULL, and comes at the cost 
of 
printing "(null)" in the log message on line 9 every time pl_inode->refkeeper 
is found to be NULL, rendering the logged messages somewhat useless. 

<code>
0 pl_inode = lock->pl_inode; 
1 
2 inode_path (pl_inode->refkeeper, NULL, &path); 
3 
4 if (path) 
5 file = path; 
6 else 
7 file = uuid_utoa (pl_inode->refkeeper->gfid); 
8 
9 gf_log (THIS->name, GF_LOG_WARNING, 
10 "releasing lock on %s held by " 
11 "{client=%p, pid=%"PRId64" lk-owner=%s}", 
12 file, lock->client, (uint64_t) lock->client_pid, 
13 lkowner_utoa (&lock->owner)); 
<\code> 

2. There is at least one codepath found that can lead to this crash: 
Imagine an inode on which an inodelk operation is attempted by a client and is 
successfully granted too. 
Now, between the time the lock was granted and pl_update_refkeeper() was called 
by this thread, the client could send a DISCONNECT event, 
causing the cleanup codepath to be executed, where the epoll thread crashes on 
dereferencing pl_inode->refkeeper which is STILL NULL at this point. 

Besides, there are still places in locks xlator where the refkeeper is NOT 
updated whenever the lists are modified - for instance in the cleanup codepath 
from a DISCONNECT. 

3. Also, pl_update_refkeeper() seems to be not taking into account blocked 
locks on the inode in the __pl_inode_is_empty() check, when it should, as there 
could still be cases 
where the granted list could be empty but not the blocked list at the time of 
updating the refkeeper, in which case pl_inode must still take ref on the 
inode. 
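
To make 3. concrete, here is a rough, self-contained sketch (made-up structure and 
names, not the xlator's actual ones) of why an "inode is empty" check that looks 
only at granted locks would drop the ref too early, while one that also considers 
blocked locks would not:

<code>
#include <stdbool.h>

/* Illustrative only: stand-ins for list heads and the pl_inode state. */
struct lock_list { int count; };

struct pl_inode_sketch {
    struct lock_list granted;   /* currently held locks         */
    struct lock_list blocked;   /* locks waiting to be granted  */
};

/* Buggy variant: ignores blocked locks, so the ref keeping the inode
 * alive can be released while waiters still point at this inode. */
static bool is_empty_granted_only(const struct pl_inode_sketch *pl)
{
    return pl->granted.count == 0;
}

/* What point 3 argues for: the inode is only really unused when both
 * the granted and the blocked lists are empty. */
static bool is_empty(const struct pl_inode_sketch *pl)
{
    return pl->granted.count == 0 && pl->blocked.count == 0;
}

int main(void)
{
    /* No granted locks, one blocked lock: the granted-only check says
     * "empty" (and would drop the ref); the combined check does not. */
    struct pl_inode_sketch pl = { .granted = { 0 }, .blocked = { 1 } };
    return is_empty_granted_only(&pl) && !is_empty(&pl) ? 0 : 1;
}
</code>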

Proposed solution to 2/3: 

1. Do away with refkeeper in pl_inode_t altogether. 
2. Let every lock object (inodelk/entrylk/posix_lock) have an inode_t * member 
to act as a placeholder for the associated inode object on which it is locking. 
3. Let each lock object hold a ref on the inode at the time of its creation and 
unref the inode during its destruction. 
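
A minimal sketch of what 2. and 3. could look like, with the three lock types 
collapsed into one stand-in pl_lock_t and simple stand-in inode_ref()/inode_unref() 
helpers (none of these names or signatures are the actual GlusterFS ones):

<code>
#include <stdlib.h>

/* Stand-in for the real inode_t and its ref-counting API; the point is
 * only the ownership rule: every lock object keeps its inode alive. */
typedef struct inode {
    int refcount;
} inode_t;

static inode_t *inode_ref(inode_t *inode)
{
    inode->refcount++;
    return inode;
}

static void inode_unref(inode_t *inode)
{
    if (--inode->refcount == 0)
        free(inode);
}

/* A lock object (inodelk/entrylk/posix lock) carries its own inode_t *
 * instead of relying on a shared pl_inode->refkeeper. */
typedef struct pl_lock {
    inode_t *inode;   /* ref held for the lifetime of the lock object */
    /* ... owner, client, range, domain, list heads, etc. ...         */
} pl_lock_t;

static pl_lock_t *pl_lock_new(inode_t *inode)
{
    pl_lock_t *lock = calloc(1, sizeof(*lock));
    if (!lock)
        return NULL;
    lock->inode = inode_ref(inode);  /* ref taken at creation ...     */
    return lock;
}

static void pl_lock_destroy(pl_lock_t *lock)
{
    inode_unref(lock->inode);  /* ... and dropped only at destruction,
                                * so cleanup paths never see a NULL or
                                * prematurely-released inode.          */
    free(lock);
}

int main(void)
{
    inode_t *inode = calloc(1, sizeof(*inode));
    if (!inode)
        return 1;
    inode->refcount = 1;             /* ref held by the inode table    */

    pl_lock_t *lock = pl_lock_new(inode);
    if (!lock) {
        inode_unref(inode);
        return 1;
    }
    pl_lock_destroy(lock);           /* inode survives: table ref left */

    inode_unref(inode);
    return 0;
}
</code>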

Please let me know what you think of the above. 

-Krutika 


___

Re: [Gluster-devel] All builds are failing with BUILD ERROR

2014-06-03 Thread Kaleb KEITHLEY

And since doing this, regression runs seem to be proceeding without issues.



On 06/04/2014 09:59 AM, Kaleb KEITHLEY wrote:

On 06/03/2014 04:42 PM, Pranith Kumar Karampuri wrote:

Guys, it's failing again with the same error:

Please proceed with configuring, compiling, and installing.
rm: cannot remove `/build/install/var/run/gluster/patchy': Device or
resource busy
+ RET=1
+ '[' 1 '!=' 0 ']'
+ VERDICT='BUILD FAILURE'



Has someone changed the way builds are cleaned up?

Recent regressions are now failing with

...
Running automake...
Running autogen.sh in argp-standalone ...

Please proceed with configuring, compiling, and installing.
configure: error: source directory already configured; run "make
distclean" there first
+ RET=1
+ '[' 1 '!=' 0 ']'
+ VERDICT='BUILD FAILURE'
...



I looked and found that /var/lib/jenkins/jobs/regression/workspace
contained what looked like a _previous_ successful regression build.

I manually cleaned the directory (moved it and created a new workspace
dir actually) and now a regression is running.

What's going on? !!!

--

Kaleb




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator

2014-06-03 Thread Pranith Kumar Karampuri


On 06/04/2014 11:37 AM, Krutika Dhananjay wrote:

Hi,

Recently there was a crash in locks translator (BZ 1103347, BZ 
1097102) with the following backtrace:

(gdb) bt
#0  uuid_unpack (in=0x8 <Address 0x8 out of bounds>, uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44
#1  0x7feeba9e19d6 in uuid_unparse_x (uu=<value optimized out>, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9",
    fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55
#2  0x7feeba9be837 in uuid_utoa (uuid=0x8 <Address 0x8 out of bounds>) at common-utils.c:2138
#3  0x7feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396
#4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
#5  0x7feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=<value optimized out>) at posix.c:2550
#6  0x7feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368
#7  0x7feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=<value optimized out>) at server-helpers.c:354
#8  0x7feeab77ae2c in server_rpc_notify (rpc=<value optimized out>, xl=0x2316390, event=<value optimized out>, data=0x2bf51c0) at server.c:527
#9  0x7feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720
#10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=<value optimized out>, event=<value optimized out>, data=0x2bf51c0) at rpcsvc.c:758
#11 0x7feeba778638 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
#12 0x7feeb115e971 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:1071
#13 socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:2240
#14 0x7feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384
#15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445
#16 0x00407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023
(gdb) f 4
#4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
428             pl_inodelk_log_cleanup (l);
(gdb) p l->pl_inode->refkeeper
$1 = (inode_t *) 0x0
(gdb)

pl_inode->refkeeper was found to be NULL even when there were some 
blocked inodelks in a certain domain of the inode,
which when dereferenced by the epoll thread in the cleanup codepath 
led to a crash.


On inspecting the code (for want of a consistent reproducer), three 
things were found:


1. The function where the crash happens (pl_inodelk_log_cleanup()), 
makes an attempt to resolve the inode to path as can be seen below. 
But the way inode_path() itself
works is to first construct the path based on the given inode's 
ancestry and place it in the buffer provided. And if all else fails, 
the gfid of the inode is placed in a certain format ("").
This eliminates the need for statements from line 4 through 7 
below, thereby "preventing" dereferencing of pl_inode->refkeeper.
Now, although this change prevents the crash altogether, it still 
does not fix the race that led to pl_inode->refkeeper becoming NULL, 
and comes at the cost of
printing "(null)" in the log message on line 9 every time 
pl_inode->refkeeper is found to be NULL, rendering the logged messages 
somewhat useless.



  0 pl_inode = lock->pl_inode;
  1
  2 inode_path (pl_inode->refkeeper, NULL, &path);
  3
  4 if (path)
  5 file = path;
  6 else
  7 file = uuid_utoa (pl_inode->refkeeper->gfid);
8
  9 gf_log (THIS->name, GF_LOG_WARNING,
 10 "releasing lock on %s held by "
 11 "{client=%p, pid=%"PRId64" lk-owner=%s}",
 12 file, lock->client, (uint64_t) lock->client_pid,
 13 lkowner_utoa (&lock->owner));
<\code>
I think this logging code is from the days when the gfid handle concept was 
not there, so it wasn't returning <gfid:...> in cases where the path is 
not present in the dentries. I believe the else block can be deleted 
safely now.
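
For concreteness, a sketch of what the cleanup-time logging could look like if the 
else branch is dropped; this is just the snippet quoted above minus lines 4 through 
7 (logging the path directly instead of the intermediate "file" variable), not a 
tested patch:

<code>
pl_inode = lock->pl_inode;

inode_path (pl_inode->refkeeper, NULL, &path);

/* No else branch: inode_path() already falls back to a gfid-based name
 * when it cannot build a path, so refkeeper->gfid is never dereferenced
 * here.  Note that until the refkeeper race itself is fixed, path can
 * still come back NULL and the message would log "(null)", as pointed
 * out earlier in the thread. */
gf_log (THIS->name, GF_LOG_WARNING,
        "releasing lock on %s held by "
        "{client=%p, pid=%"PRId64" lk-owner=%s}",
        path, lock->client, (uint64_t) lock->client_pid,
        lkowner_utoa (&lock->owner));
</code>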


Pranith


2. There is at least one codepath found that can lead to this crash:
Imagine an inode on which an inodelk operation is attempted by a 
client and is successfully granted too.
   Now, between the time the lock was granted and 
pl_update_refkeeper() was called by this thread, the client could send 
a DISCONNECT event,
   causing cleanup codepath to be executed, where the epoll thread 
crashes on dereferencing pl_inode->refkeeper which is STILL NULL at 
this point.


   Besides, there are still places in locks xlator where the refkeeper 
is NOT updated whenever the lists are modified - for instance in the 
cleanup codepath from a DISCONNECT.


3. Also, pl_update_refkeeper() seems to be not taking into account 
blocked locks on the inode in the __pl_inode_is_empty() check, when it 
should, as there could still be cases
where the granted list could be empty but not the blocked list at 
the time of udpating the refkeeper, in which c

Re: [Gluster-devel] Regarding doing away with refkeeper in locks xlator

2014-06-03 Thread Pranith Kumar Karampuri


On 06/04/2014 12:02 PM, Pranith Kumar Karampuri wrote:


On 06/04/2014 11:37 AM, Krutika Dhananjay wrote:

Hi,

Recently there was a crash in locks translator (BZ 1103347, BZ 
1097102) with the following backtrace:

(gdb) bt
#0  uuid_unpack (in=0x8 <Address 0x8 out of bounds>, uu=0x7fffea6c6a60) at ../../contrib/uuid/unpack.c:44
#1  0x7feeba9e19d6 in uuid_unparse_x (uu=<value optimized out>, out=0x2350fc0 "081bbc7a-7551-44ac-85c7-aad5e2633db9",
    fmt=0x7feebaa08e00 "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x") at ../../contrib/uuid/unparse.c:55
#2  0x7feeba9be837 in uuid_utoa (uuid=0x8 <Address 0x8 out of bounds>) at common-utils.c:2138
#3  0x7feeb06e8a58 in pl_inodelk_log_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:396
#4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
#5  0x7feeb06ddf3a in pl_client_disconnect_cbk (this=0x230d910, client=<value optimized out>) at posix.c:2550
#6  0x7feeba9fa2dd in gf_client_disconnect (client=0x27724a0) at client_t.c:368
#7  0x7feeab77ed48 in server_connection_cleanup (this=0x2316390, client=0x27724a0, flags=<value optimized out>) at server-helpers.c:354
#8  0x7feeab77ae2c in server_rpc_notify (rpc=<value optimized out>, xl=0x2316390, event=<value optimized out>, data=0x2bf51c0) at server.c:527
#9  0x7feeba775155 in rpcsvc_handle_disconnect (svc=0x2325980, trans=0x2bf51c0) at rpcsvc.c:720
#10 0x7feeba776c30 in rpcsvc_notify (trans=0x2bf51c0, mydata=<value optimized out>, event=<value optimized out>, data=0x2bf51c0) at rpcsvc.c:758
#11 0x7feeba778638 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
#12 0x7feeb115e971 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:1071
#13 socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x2bf51c0, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:2240
#14 0x7feeba9fc6a7 in event_dispatch_epoll_handler (event_pool=0x22e2d00) at event-epoll.c:384
#15 event_dispatch_epoll (event_pool=0x22e2d00) at event-epoll.c:445
#16 0x00407e93 in main (argc=19, argv=0x7fffea6c7f88) at glusterfsd.c:2023
(gdb) f 4
#4  pl_inodelk_client_cleanup (this=0x230d910, ctx=0x7fee700f0c60) at inodelk.c:428
428             pl_inodelk_log_cleanup (l);
(gdb) p l->pl_inode->refkeeper
$1 = (inode_t *) 0x0
(gdb)

pl_inode->refkeeper was found to be NULL even when there were some 
blocked inodelks in a certain domain of the inode,
which when dereferenced by the epoll thread in the cleanup codepath 
led to a crash.


On inspecting the code (for want of a consistent reproducer), three 
things were found:


1. The function where the crash happens (pl_inodelk_log_cleanup()), 
makes an attempt to resolve the inode to path as can be seen below. 
But the way inode_path() itself
works is to first construct the path based on the given inode's 
ancestry and place it in the buffer provided. And if all else fails, 
the gfid of the inode is placed in a certain format ("").
This eliminates the need for statements from line 4 through 7 
below, thereby "preventing" dereferencing of pl_inode->refkeeper.
Now, although this change prevents the crash altogether, it still 
does not fix the race that led to pl_inode->refkeeper becoming NULL, 
and comes at the cost of
printing "(null)" in the log message on line 9 every time 
pl_inode->refkeeper is found to be NULL, rendering the logged 
messages somewhat useless.



  0 pl_inode = lock->pl_inode;
  1
  2 inode_path (pl_inode->refkeeper, NULL, &path);
  3
  4 if (path)
  5 file = path;
  6 else
  7 file = uuid_utoa (pl_inode->refkeeper->gfid);
8
  9 gf_log (THIS->name, GF_LOG_WARNING,
 10 "releasing lock on %s held by "
 11 "{client=%p, pid=%"PRId64" lk-owner=%s}",
 12 file, lock->client, (uint64_t) lock->client_pid,
 13 lkowner_utoa (&lock->owner));
<\code>
I think this logging code is from the days when the gfid handle concept 
was not there, so it wasn't returning <gfid:...> in cases where the 
path is not present in the dentries. I believe the else block can be 
deleted safely now.


Pranith


2. There is at least one codepath found that can lead to this crash:
Imagine an inode on which an inodelk operation is attempted by a 
client and is successfully granted too.
   Now, between the time the lock was granted and 
pl_update_refkeeper() was called by this thread, the client could 
send a DISCONNECT event,
   causing cleanup codepath to be executed, where the epoll thread 
crashes on dereferencing pl_inode->refkeeper which is STILL NULL at 
this point.


   Besides, there are still places in locks xlator where the 
refkeeper is NOT updated whenever the lists are modified - for 
instance in the cleanup codepath from a DISCONNECT.


3. Also, pl_update_refkeeper() seems to be not taking into account 
blocked locks on the inode in the __pl_inode_is_empty() check, when 
it should, as there could still be cases
where the granted list could be empty but not the blocke