[Gluster-devel] 'Reviewed-by' tag for commits

2016-09-30 Thread Pranith Kumar Karampuri
hi,
 At the moment the 'Reviewed-by' tag is added only if a +1 is given on the
final version of the patch. But for most of the patches, different people
would spend time on different versions making the patch better, and they may
not get time to do the review for every version of the patch. Is it
possible to change the gerrit script to add 'Reviewed-by' for all the
people who participated in the review?

Alternatively, removing the 'Reviewed-by' tag completely would also help to make sure it
doesn't give skewed counts.

-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] RPC, DHT and AFR logging errors that are expected, reduce log level?

2016-09-30 Thread Pranith Kumar Karampuri
On Thu, Sep 29, 2016 at 7:51 PM, Niels de Vos  wrote:

> Hello,
>
> When NFS-Ganesha does an UNLINK of a filename on an inode, it does a
> follow-up check to see if the inode has been deleted or if there are
> still other filenames linked (like hardlinks) to it.
>
> Users are getting confused about the errors that are logged by RPC, DHT
> and AFR. The file is missing (which is often perfectly expected from a
> NFS-Ganesha point of view) and this causes a flood of messages.
>
> From https://bugzilla.redhat.com/show_bug.cgi?id=1328581#c5 :
>
> > If we reduce the log level for
> > client-rpc-fops.c:2974:client3_3_lookup_cbk there would be the
> > following entries left:
> >
> > 2x dht-helper.c:1179:dht_migration_complete_check_task
> > 2x afr-read-txn.c:250:afr_read_txn
> >
> > it would reduce the logging for this non-error with 10 out of 14
> > messages. We need to know from the AFR and DHT team if these messages
> > are sufficient for them to identify potential issues.
>

Updated the bug from the perspective of AFR as well:
https://bugzilla.redhat.com/show_bug.cgi?id=1328581#c12

"I am not sure how an inode which is not in split-brain is linked as 'no
read-subvolumes' case. That is something to debug."
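
To make the log-level idea concrete, here is a minimal, self-contained sketch
of the kind of gating being discussed; the helper name and the levels are
hypothetical illustrations, not the actual client3_3_lookup_cbk code:

#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: demote "expected" lookup failures (ENOENT/ESTALE
 * right after an UNLINK) to a debug level, keep real errors at warning. */
enum log_level { LOG_DEBUG, LOG_WARNING };

static enum log_level
lookup_failure_level (int op_errno)
{
        if (op_errno == ENOENT || op_errno == ESTALE)
                return LOG_DEBUG;
        return LOG_WARNING;
}

int
main (void)
{
        int op_errno = ENOENT;

        printf ("lookup failed (%s): log at %s\n", strerror (op_errno),
                lookup_failure_level (op_errno) == LOG_DEBUG ?
                "DEBUG" : "WARNING");
        return 0;
}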


>
> Thanks,
> Niels
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Dht readdir filtering out names

2016-09-30 Thread Pranith Kumar Karampuri
What if the lower xlators want to set entry->inode to NULL and clear
entry->d_stat to force a lookup on the name, e.g. for gfid split-brain or
ia_type mismatches?
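
For context, below is a small self-contained sketch of the filtering rule
described in the quoted discussion that follows; the struct and the
is_linkto/from_hashed_subvol fields are hypothetical stand-ins for what
dht_readdirp_cbk actually inspects:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified model of a dentry returned by readdirp. */
struct dentry {
        const char *name;
        char        ia_type;            /* 'f' file, 'd' directory, 0 unknown */
        bool        is_linkto;          /* dht linkto file */
        bool        from_hashed_subvol; /* response came from hashed subvol */
};

/* Simplified version of the rule: pass data files, pass directories only
 * from the hashed subvol, skip everything else.  Note that an entry with
 * unknown ia_type (e.g. stat not sent by lower xlators) is also skipped,
 * which is exactly the case being questioned here. */
static bool
should_pass_to_application (const struct dentry *e)
{
        if (e->ia_type == 'f')
                return !e->is_linkto;
        if (e->ia_type == 'd')
                return e->from_hashed_subvol;
        return false;   /* ia_type unknown: currently filtered out */
}

int
main (void)
{
        struct dentry e = { "file-with-missing-gfid", 0, false, true };

        printf ("%s passed to application: %s\n", e.name,
                should_pass_to_application (&e) ? "yes" : "no");
        return 0;
}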

On Fri, Sep 30, 2016 at 10:00 AM, Raghavendra Gowdappa 
wrote:

>
>
> - Original Message -
> > From: "Raghavendra Gowdappa" 
> > To: "Pranith Kumar Karampuri" 
> > Cc: "Shyam Ranganathan" , "Nithya Balachandran" <
> nbala...@redhat.com>, "Gluster Devel"
> > 
> > Sent: Friday, September 30, 2016 9:58:34 AM
> > Subject: Re: Dht readdir filtering out names
> >
> >
> >
> > - Original Message -
> > > From: "Pranith Kumar Karampuri" 
> > > To: "Raghavendra Gowdappa" 
> > > Cc: "Shyam Ranganathan" , "Nithya Balachandran"
> > > , "Gluster Devel"
> > > 
> > > Sent: Friday, September 30, 2016 9:53:44 AM
> > > Subject: Re: Dht readdir filtering out names
> > >
> > > On Fri, Sep 30, 2016 at 9:50 AM, Raghavendra Gowdappa <
> rgowd...@redhat.com>
> > > wrote:
> > >
> > > >
> > > >
> > > > - Original Message -
> > > > > From: "Pranith Kumar Karampuri" 
> > > > > To: "Raghavendra Gowdappa" 
> > > > > Cc: "Shyam Ranganathan" , "Nithya
> Balachandran" <
> > > > nbala...@redhat.com>, "Gluster Devel"
> > > > > 
> > > > > Sent: Friday, September 30, 2016 9:15:04 AM
> > > > > Subject: Re: Dht readdir filtering out names
> > > > >
> > > > > On Fri, Sep 30, 2016 at 9:13 AM, Raghavendra Gowdappa <
> > > > rgowd...@redhat.com>
> > > > > wrote:
> > > > >
> > > > > > dht_readdirp_cbk has different behaviour for directories and
> files.
> > > > > >
> > > > > > 1. If file, pick the dentry (passed from subvols as part of
> readdirp
> > > > > > response) if the it corresponds to data file.
> > > > > > 2. If directory pick the dentry if readdirp response is from
> > > > hashed-subvol.
> > > > > >
> > > > > > In all other cases, the dentry is skipped and not passed to
> higher
> > > > > > layers/application. To elaborate, the dentries which are ignored
> are:
> > > > > > 1. dentries corresponding to linkto files.
> > > > > > 2. dentries from non-hashed subvols corresponding to directories.
> > > > > >
> > > > > > Since the behaviour is different for different filesystem
> objects,
> > > > > > dht
> > > > > > needs ia_type to choose its behaviour.
> > > > > >
> > > > > > - Original Message -
> > > > > > > From: "Pranith Kumar Karampuri" 
> > > > > > > To: "Shyam Ranganathan" , "Raghavendra
> > > > Gowdappa" <
> > > > > > rgowd...@redhat.com>, "Nithya Balachandran"
> > > > > > > 
> > > > > > > Cc: "Gluster Devel" 
> > > > > > > Sent: Friday, September 30, 2016 8:39:28 AM
> > > > > > > Subject: Dht readdir filtering out names
> > > > > > >
> > > > > > > hi,
> > > > > > >In dht_readdirp_cbk() there is a check about skipping
> files
> > > > > > without
> > > > > > > ia_type. Could you help me understand why this check is added?
> > > > > > > There
> > > > are
> > > > > > > times when users have to delete gfid of the entries and trigger
> > > > something
> > > > > > > like 'find . | xargs stat' to heal the gfids. This case would
> fail
> > > > if we
> > > > > > > skip entries without gfid, if the lower xlators don't send stat
> > > > > > information
> > > > > > > for them.
> > > > > >
> > > > > > Probably we can make readdirp_cbk not rely on ia_type and pass
> _all_
> > > > > > dentries received by subvols to application without filtering.
> > > > > > However
> > > > we
> > > > > > should make this behaviour optional and use this only for
> recovery
> > > > setups.
> > &

Re: [Gluster-devel] [Gluster-Maintainers] 'Reviewed-by' tag for commits

2016-10-02 Thread Pranith Kumar Karampuri
On Fri, Sep 30, 2016 at 8:50 PM, Ravishankar N 
wrote:

> On 09/30/2016 06:38 PM, Niels de Vos wrote:
>
> On Fri, Sep 30, 2016 at 07:11:51AM +0530, Pranith Kumar Karampuri wrote:
>
> hi,
>  At the moment 'Reviewed-by' tag comes only if a +1 is given on the
> final version of the patch. But for most of the patches, different people
> would spend time on different versions making the patch better, they may
> not get time to do the review for every version of the patch. Is it
> possible to change the gerrit script to add 'Reviewed-by' for all the
> people who participated in the review?
>
> +1 to this. For the argument that this *might* encourage me-too +1s, it
> only exposes
> such persons in bad light.
>
> Or removing 'Reviewed-by' tag completely would also help to make sure it
> doesn't give skewed counts.
>
> I'm not going to lie, for me, that takes away the incentive of doing any
> reviews at all.
>

Could you elaborate why? Maybe you should also talk about your primary
motivation for doing reviews.

I would not feel comfortable automatically adding Reviewed-by tags for
> people that did not review the last version. They may not agree with the
> last version, so adding their "approved stamp" on it may not be correct.
> See the description of Reviewed-by in the Linux kernel sources [0].
>
> While the Linux kernel model is the poster child for projects to draw
> standards
> from, IMO, their email based review system is certainly not one to
> emulate. It
> does not provide a clean way to view patch-set diffs, does not present a
> single
> URL based history that tracks all review comments, relies on the sender to
> provide information on what changed between versions, allows a variety of
> 'Komedians' [1] to add random tags which may or may not be picked up
> by the maintainer who takes patches in etc.
>
> Maybe we can add an additional tag that mentions all the people that
> did do reviews of older versions of the patch. Not sure what the tag
> would be, maybe just CC?
>
> It depends on what tags would be processed to obtain statistics on review
> contributions.
> I agree that not all reviewers might be okay with the latest revision but
> that
> % might be miniscule (zero, really) compared to the normal case where the
> reviewer spent
> considerable time and effort to provide feedback (and an eventual +1) on
> previous
> revisions. If converting all +1s into 'Reviewed-by's is not feasible in
> gerrit
> or is not considered acceptable, then the maintainer could wait for a
> reasonable
> time for reviewers to give +1 for the final revision before he/she goes
> ahead
> with a +2 and merges it. While we cannot wait indefinitely for all acks, a
> comment
> like 'LGTM, will wait for a day for other acks before I go ahead and
> merge' would be
> appreciated.
>
> Enough of bike-shedding from my end I suppose.:-)
> Ravi
>
> [1] https://lwn.net/Articles/503829/
>
> Niels
>
> 0. 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches#n552
>
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
>
>


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] 'Reviewed-by' tag for commits

2016-10-02 Thread Pranith Kumar Karampuri
On Mon, Oct 3, 2016 at 6:41 AM, Pranith Kumar Karampuri  wrote:

>
>
> On Fri, Sep 30, 2016 at 8:50 PM, Ravishankar N 
> wrote:
>
>> On 09/30/2016 06:38 PM, Niels de Vos wrote:
>>
>> On Fri, Sep 30, 2016 at 07:11:51AM +0530, Pranith Kumar Karampuri wrote:
>>
>> hi,
>>  At the moment 'Reviewed-by' tag comes only if a +1 is given on the
>> final version of the patch. But for most of the patches, different people
>> would spend time on different versions making the patch better, they may
>> not get time to do the review for every version of the patch. Is it
>> possible to change the gerrit script to add 'Reviewed-by' for all the
>> people who participated in the review?
>>
>> +1 to this. For the argument that this *might* encourage me-too +1s, it
>> only exposes
>> such persons in bad light.
>>
>> Or removing 'Reviewed-by' tag completely would also help to make sure it
>> doesn't give skewed counts.
>>
>> I'm not going to lie, for me, that takes away the incentive of doing any
>> reviews at all.
>>
>
> Could you elaborate why? May be you should also talk about your primary
> motivation for doing reviews.
>

I guess it is probably because the effort needs to be recognized? Since
there is an option to recognize it, it is probably not a good idea to
remove the tag.


>
> I would not feel comfortable automatically adding Reviewed-by tags for
>> people that did not review the last version. They may not agree with the
>> last version, so adding their "approved stamp" on it may not be correct.
>> See the description of Reviewed-by in the Linux kernel sources [0].
>>
>> While the Linux kernel model is the poster child for projects to draw
>> standards
>> from, IMO, their email based review system is certainly not one to
>> emulate. It
>> does not provide a clean way to view patch-set diffs, does not present a
>> single
>> URL based history that tracks all review comments, relies on the sender to
>> provide information on what changed between versions, allows a variety of
>> 'Komedians' [1] to add random tags which may or may not be picked up
>> by the maintainer who takes patches in etc.
>>
>> Maybe we can add an additional tag that mentions all the people that
>> did do reviews of older versions of the patch. Not sure what the tag
>> would be, maybe just CC?
>>
>> It depends on what tags would be processed to obtain statistics on review
>> contributions.
>> I agree that not all reviewers might be okay with the latest revision but
>> that
>> % might be miniscule (zero, really) compared to the normal case where the
>> reviewer spent
>> considerable time and effort to provide feedback (and an eventual +1) on
>> previous
>> revisions. If converting all +1s into 'Reviewed-by's is not feasible in
>> gerrit
>> or is not considered acceptable, then the maintainer could wait for a
>> reasonable
>> time for reviewers to give +1 for the final revision before he/she goes
>> ahead
>> with a +2 and merges it. While we cannot wait indefinitely for all acks,
>> a comment
>> like 'LGTM, will wait for a day for other acks before I go ahead and
>> merge' would be
>> appreciated.
>>
>> Enough of bike-shedding from my end I suppose.:-)
>> Ravi
>>
>> [1] https://lwn.net/Articles/503829/
>>
>> Niels
>>
>> 0. 
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches#n552
>>
>>
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>>
>
>
> --
> Pranith
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] 'Reviewed-by' tag for commits

2016-10-02 Thread Pranith Kumar Karampuri
On Mon, Oct 3, 2016 at 7:23 AM, Ravishankar N 
wrote:

> On 10/03/2016 06:58 AM, Pranith Kumar Karampuri wrote:
>
>
>
> On Mon, Oct 3, 2016 at 6:41 AM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>>
>>
>> On Fri, Sep 30, 2016 at 8:50 PM, Ravishankar N 
>> wrote:
>>
>>> On 09/30/2016 06:38 PM, Niels de Vos wrote:
>>>
>>> On Fri, Sep 30, 2016 at 07:11:51AM +0530, Pranith Kumar Karampuri wrote:
>>>
>>> hi,
>>>  At the moment 'Reviewed-by' tag comes only if a +1 is given on the
>>> final version of the patch. But for most of the patches, different people
>>> would spend time on different versions making the patch better, they may
>>> not get time to do the review for every version of the patch. Is it
>>> possible to change the gerrit script to add 'Reviewed-by' for all the
>>> people who participated in the review?
>>>
>>> +1 to this. For the argument that this *might* encourage me-too +1s, it
>>> only exposes
>>> such persons in bad light.
>>>
>>> Or removing 'Reviewed-by' tag completely would also help to make sure it
>>> doesn't give skewed counts.
>>>
>>> I'm not going to lie, for me, that takes away the incentive of doing any
>>> reviews at all.
>>>
>>
>> Could you elaborate why? May be you should also talk about your primary
>> motivation for doing reviews.
>>
>
> I guess it is probably because the effort needs to be recognized? I think
> there is an option to recognize it so it is probably not a good idea to
> remove the tag I guess.
>
>
> Yes, numbers provide good motivation for me:
> Motivation for looking at patches and finding bugs for known components
> even though I am not its maintainer.
> Motivation to learning new components because a bug and a fix is usually
> when I look at code for unknown components.
> Motivation to level-up when statistics indicate I'm behind my peers.
>
> I think even you said some time back in an ML thread that what can be
> measured can be improved.
>

I am still not sure how to tell a good review from a bad one, so I am not
sure how it can be measured and thus improved. I guess at this point getting
more eyes on the patches is good enough.


>
> -Ravi
>
>
>
>>
>> I would not feel comfortable automatically adding Reviewed-by tags for
>>> people that did not review the last version. They may not agree with the
>>> last version, so adding their "approved stamp" on it may not be correct.
>>> See the description of Reviewed-by in the Linux kernel sources [0].
>>>
>>> While the Linux kernel model is the poster child for projects to draw
>>> standards
>>> from, IMO, their email based review system is certainly not one to
>>> emulate. It
>>> does not provide a clean way to view patch-set diffs, does not present a
>>> single
>>> URL based history that tracks all review comments, relies on the sender
>>> to
>>> provide information on what changed between versions, allows a variety of
>>> 'Komedians' [1] to add random tags which may or may not be picked up
>>> by the maintainer who takes patches in etc.
>>>
>>> Maybe we can add an additional tag that mentions all the people that
>>> did do reviews of older versions of the patch. Not sure what the tag
>>> would be, maybe just CC?
>>>
>>> It depends on what tags would be processed to obtain statistics on
>>> review contributions.
>>> I agree that not all reviewers might be okay with the latest revision
>>> but that
>>> % might be miniscule (zero, really) compared to the normal case where
>>> the reviewer spent
>>> considerable time and effort to provide feedback (and an eventual +1) on
>>> previous
>>> revisions. If converting all +1s into 'Reviewed-by's is not feasible in
>>> gerrit
>>> or is not considered acceptable, then the maintainer could wait for a
>>> reasonable
>>> time for reviewers to give +1 for the final revision before he/she goes
>>> ahead
>>> with a +2 and merges it. While we cannot wait indefinitely for all acks,
>>> a comment
>>> like 'LGTM, will wait for a day for other acks before I go ahead and
>>> merge' would be
>>> appreciated.
>>>
>>> Enough of bike-shedding from my end I suppose.:-)
>>> Ravi
>>>
>>> [1] https://lwn.net/Articles/503829/
>>>
>>> Niels
>>>
>>> 0. 
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches#n552
>>>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>
>>> --
>> Pranith
>>
> --
> Pranith
>
>


-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-Maintainers] 'Reviewed-by' tag for commits

2016-10-03 Thread Pranith Kumar Karampuri
On Mon, Oct 3, 2016 at 12:17 PM, Joe Julian  wrote:

> If you get credit for +1, shouldn't you also get credit for -1? It seems
> to me that catching a fault is at least as valuable if not more so.
>

Yes, when I said review, I meant it could be either a +1, -1, or +2.


>
> On October 3, 2016 3:58:32 AM GMT+02:00, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>>
>>
>>
>> On Mon, Oct 3, 2016 at 7:23 AM, Ravishankar N 
>> wrote:
>>
>>> On 10/03/2016 06:58 AM, Pranith Kumar Karampuri wrote:
>>>
>>>
>>>
>>> On Mon, Oct 3, 2016 at 6:41 AM, Pranith Kumar Karampuri <
>>> pkara...@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Sep 30, 2016 at 8:50 PM, Ravishankar N 
>>>> wrote:
>>>>
>>>>> On 09/30/2016 06:38 PM, Niels de Vos wrote:
>>>>>
>>>>> On Fri, Sep 30, 2016 at 07:11:51AM +0530, Pranith Kumar Karampuri wrote:
>>>>>
>>>>> hi,
>>>>>  At the moment 'Reviewed-by' tag comes only if a +1 is given on the
>>>>> final version of the patch. But for most of the patches, different people
>>>>> would spend time on different versions making the patch better, they may
>>>>> not get time to do the review for every version of the patch. Is it
>>>>> possible to change the gerrit script to add 'Reviewed-by' for all the
>>>>> people who participated in the review?
>>>>>
>>>>> +1 to this. For the argument that this *might* encourage me-too +1s,
>>>>> it only exposes
>>>>> such persons in bad light.
>>>>>
>>>>> Or removing 'Reviewed-by' tag completely would also help to make sure it
>>>>> doesn't give skewed counts.
>>>>>
>>>>> I'm not going to lie, for me, that takes away the incentive of doing
>>>>> any reviews at all.
>>>>>
>>>>
>>>> Could you elaborate why? May be you should also talk about your primary
>>>> motivation for doing reviews.
>>>>
>>>
>>> I guess it is probably because the effort needs to be recognized? I
>>> think there is an option to recognize it so it is probably not a good idea
>>> to remove the tag I guess.
>>>
>>>
>>> Yes, numbers provide good motivation for me:
>>> Motivation for looking at patches and finding bugs for known components
>>> even though I am not its maintainer.
>>> Motivation to learning new components because a bug and a fix is usually
>>> when I look at code for unknown components.
>>> Motivation to level-up when statistics indicate I'm behind my peers.
>>>
>>> I think even you said some time back in an ML thread that what can be
>>> measured can be improved.
>>>
>>
>> I am still not sure how to quantify good review from a bad one. So not
>> sure how it can be measured thus improved. I guess at this point getting
>> more eyes on the patches is good enough.
>>
>>
>>>
>>> -Ravi
>>>
>>>
>>>
>>>>
>>>> I would not feel comfortable automatically adding Reviewed-by tags for
>>>>> people that did not review the last version. They may not agree with the
>>>>> last version, so adding their "approved stamp" on it may not be correct.
>>>>> See the description of Reviewed-by in the Linux kernel sources [0].
>>>>>
>>>>> While the Linux kernel model is the poster child for projects to draw
>>>>> standards
>>>>> from, IMO, their email based review system is certainly not one to
>>>>> emulate. It
>>>>> does not provide a clean way to view patch-set diffs, does not present
>>>>> a single
>>>>> URL based history that tracks all review comments, relies on the
>>>>> sender to
>>>>> provide information on what changed between versions, allows a variety
>>>>> of
>>>>> 'Komedians' [1] to add random tags which may or may not be picked up
>>>>> by the maintainer who takes patches in etc.
>>>>>
>>>>> Maybe we can add an additional tag that mentions all the people that
>>>>> did do reviews of older versions of the patch. Not sure what the tag
>>>>> would be, maybe just CC?
>>>>>
>>>>> It depends on what tags would

Re: [Gluster-devel] Regression caused to gfapi applications with enabling client-io-threads by default

2016-10-05 Thread Pranith Kumar Karampuri
On Wed, Oct 5, 2016 at 2:00 PM, Soumya Koduri  wrote:

> Hi,
>
> With http://review.gluster.org/#/c/15051/, performance/client-io-threads
> is enabled by default. But with that we see regression caused to
> nfs-ganesha application trying to un/re-export any glusterfs volume. This
> shall be the same case with any gfapi application using glfs_fini().
>
> More details and the RCA can be found at [1].
>
> In short, iot-worker threads spawned  (when the above option is enabled)
> are not cleaned up as part of io-threads-xlator->fini() and those threads
> could end up accessing invalid/freed memory post glfs_fini().
>
> The actual fix is to address io-threads-xlator->fini() to cleanup those
> threads before exiting. But since those threads' IDs are currently not
> stored, the fix could be very intricate and take a while. So till then to
> avoid all existing applications crash, I suggest to keep this option
> disabled by default and update this known_issue with enabling this option
> in the release-notes.
>
> I sent a patch to revert the commit - http://review.gluster.org/#/c/15616/
> [2]
>

Good catch! I think the correct fix would be to make sure all threads die
as part of PARENT_DOWN then?
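
For illustration, a minimal, self-contained sketch of the kind of cleanup
being suggested: record the worker thread IDs when they are spawned so that a
fini()/PARENT_DOWN-style teardown can signal them and pthread_join() them
before any xlator memory is freed. The names here are hypothetical, not the
io-threads xlator code:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_WORKERS 4

struct worker_pool {
        pthread_t       ids[NUM_WORKERS];  /* stored so fini() can join them */
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        bool            stopping;
};

static void *
worker (void *arg)
{
        struct worker_pool *pool = arg;

        pthread_mutex_lock (&pool->lock);
        while (!pool->stopping)        /* a real worker would pick up requests here */
                pthread_cond_wait (&pool->cond, &pool->lock);
        pthread_mutex_unlock (&pool->lock);
        return NULL;
}

/* fini()/PARENT_DOWN-style teardown: ask the workers to stop, then join
 * them so no thread can touch xlator memory after it is freed. */
static void
pool_fini (struct worker_pool *pool)
{
        pthread_mutex_lock (&pool->lock);
        pool->stopping = true;
        pthread_cond_broadcast (&pool->cond);
        pthread_mutex_unlock (&pool->lock);

        for (int i = 0; i < NUM_WORKERS; i++)
                pthread_join (pool->ids[i], NULL);
}

int
main (void)
{
        struct worker_pool pool = { .stopping = false };

        pthread_mutex_init (&pool.lock, NULL);
        pthread_cond_init (&pool.cond, NULL);
        for (int i = 0; i < NUM_WORKERS; i++)
                pthread_create (&pool.ids[i], NULL, worker, &pool);

        pool_fini (&pool);
        printf ("all workers joined; safe to tear down\n");
        return 0;
}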


> Comments/Suggestions are welcome.
>
> Thanks,
> Soumya
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1380619#c11
> [2] http://review.gluster.org/#/c/15616/
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] new spurious regressions

2014-11-08 Thread Pranith Kumar Karampuri

hi,
 The following tests keep failing spuriously nowadays. I CCed the glusterd 
folks, the original author (Kritika), and the last-change author (Emmanuel).

You can check 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/2497/consoleFull
 for full logs.

volume create: patchy: failed: parent directory /d/backends is already part of 
a volume
volume add-brick: failed: Volume patchy does not exist
volume stop: patchy: failed: Volume patchy does not exist
[03:22:09] ./tests/bugs/bug-948729/bug-948729-mode-script.t 

not ok 19
not ok 23
Failed 2/23 subtests
volume create: patchy: failed: parent directory /d/backends is already part of 
a volume
volume add-brick: failed: Volume patchy does not exist
volume stop: patchy: failed: Volume patchy does not exist
[03:22:22] ./tests/bugs/bug-948729/bug-948729.t 

not ok 19
not ok 23

Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] new spurious regressions

2014-11-09 Thread Pranith Kumar Karampuri


On 11/10/2014 01:04 AM, Emmanuel Dreyfus wrote:

Justin Clift  wrote:


I've just used that page to disconnect slave25, so you're fine to
investigate there (same login credentials as before).  Please reconnect
it when you're done. :)

Since I could spot nothing from, I reconnected it. I will try by
submitting a change with set -x for that script.
It was consistently happening with my change, just on the regression machine. 
So I added set -x and submitted the change. Let's see what the results 
will be.


Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] new spurious regressions

2014-11-09 Thread Pranith Kumar Karampuri


On 11/10/2014 10:58 AM, Emmanuel Dreyfus wrote:

Pranith Kumar Karampuri  wrote:


Since I could spot nothing from, I reconnected it. I will try by
submitting a change with set -x for that script.

It was consistently happening with my change just on regression machine.
So I added set -x and submitted the change. Lets see what the results
will be.

I did that too and submitted a possible fix:
http://review.gluster.com/9081

Cool, that's great. Thanks for looking into this one :-)

Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] info heal-failed shown as gfid

2014-11-10 Thread Pranith Kumar Karampuri


On 11/10/2014 11:21 AM, Vijay Bellur wrote:

On 11/08/2014 08:19 AM, Peter Auyeung wrote:

I have a node down while gfs still open for writing.

Got tons of heal-failed on a replicated volume showing as gfid.

Tried gfid-resolver and got the following:

# ./gfid-resolver.sh /brick02/gfs/ 88417c43-7d0f-4ec5-8fcd-f696617b5bc1
88417c43-7d0f-4ec5-8fcd-f696617b5bc1==File:11/07/14 18:47:19
[ /root/scripts ]

Any one has clue how to resolve and fix these heal-failed entries??
Heal-failed problems are generally fixed in subsequent heals, so there is 
nothing to worry about there. Check if there are any split-brains; only in 
that case does something need to be done manually.


Pranith




Procedure detailed in [1] might be useful.

-Vijay

[1] 
https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md



___
Gluster-users mailing list
gluster-us...@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression testing report: Gluster v3.6.1 on CentOS 6.6

2014-11-11 Thread Pranith Kumar Karampuri


On 11/11/2014 03:13 PM, Kiran Patil wrote:

Test Summary Report
--
./tests/basic/quota-anon-fd-nfs.t  (Wstat: 0 Tests: 16 
Failed: 1)

  Failed test:  16
This is a spurious failure, at least on master. Could you run it 2-3 
times to see if it is a consistent failure on CentOS?
./tests/bugs/886998/strict-readdir.t   (Wstat: 0 Tests: 30 
Failed: 2)

  Failed tests:  10, 24

What is the underlying backend filesystem?
./tests/bugs/bug-1112559.t (Wstat: 0 Tests: 11 
Failed: 2)

  Failed tests:  9, 11

CC Joseph Fernandez
Files=277, Tests=7908, 8046 wallclock secs ( 4.54 usr  0.98 sys + 
902.74 cusr 644.05 csys = 1552.31 CPU)

Result: FAIL



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression testing report: Gluster v3.6.1 on CentOS 6.6

2014-11-12 Thread Pranith Kumar Karampuri


On 11/11/2014 05:25 PM, Kiran Patil wrote:
I have installed gluster v3.6.1 from 
http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-6/ 



The /tests/bugs/bug-1112559.t testcase passed in all 3 runs, and the other 
two tests, quota-anon-fd-nfs.t and 
/tests/bugs/886998/strict-readdir.t, failed in all 3 runs.
Thanks for this mail Kiran. http://review.gluster.org/8201 will fix 
/tests/bugs/886998/strict-readdir.t


CCed Vijaikumar to look into quota-anon-fd-nfs.t.

Pranith


Ondisk filesystem is ext4.

Thanks for quick feedback.

On Tue, Nov 11, 2014 at 3:33 PM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote:



On 11/11/2014 03:13 PM, Kiran Patil wrote:

Test Summary Report
--
./tests/basic/quota-anon-fd-nfs.t(Wstat: 0 Tests: 16 Failed: 1)
  Failed test:  16

This is a spurious failure at least on master. Could you run it
2-3 times to see if it is a consistent failure on cent-os.

./tests/bugs/886998/strict-readdir.t (Wstat: 0 Tests: 30
Failed: 2)
  Failed tests:  10, 24

What is the underlying backend filesystem?

./tests/bugs/bug-1112559.t (Wstat: 0 Tests: 11 Failed: 2)
  Failed tests:  9, 11

CC Joseph fernandez

Files=277, Tests=7908, 8046 wallclock secs ( 4.54 usr  0.98 sys +
902.74 cusr 644.05 csys = 1552.31 CPU)
Result: FAIL



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] IMPORTANT - Adding further volume types to our smoke tests

2014-11-12 Thread Pranith Kumar Karampuri


On 11/13/2014 03:23 AM, Justin Clift wrote:

Hi all,

At the moment, our smoke tests in Jenkins only run on a
replicated volume.  Extending that out to other volume types
should (in theory :>) help catch other simple gotchas.

Xavi has put together a patch for doing just this, which I'd
like to apply and get us running:

   
https://forge.gluster.org/gluster-patch-acceptance-tests/gluster-patch-acceptance-tests/merge_requests/4

What are people's thoughts on the general idea, and on the
above proposed patch?  (The Forge isn't using Gerrit, so
review/comments back here please :>)
This is a good initiative. How about running these tests with different 
backend filesystems as well, like ext4 and btrfs?


Pranith


Regards and best wishes,

Justin Clift



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] IMPORTANT - Adding further volume types to our smoke tests

2014-11-13 Thread Pranith Kumar Karampuri


On 11/13/2014 03:51 AM, Jeff Darcy wrote:

At the moment, our smoke tests in Jenkins only run on a
replicated volume.  Extending that out to other volume types
should (in theory :>) help catch other simple gotchas.

Xavi has put together a patch for doing just this, which I'd
like to apply and get us running:

   
https://forge.gluster.org/gluster-patch-acceptance-tests/gluster-patch-acceptance-tests/merge_requests/4

What are people's thoughts on the general idea, and on the
above proposed patch?  (The Forge isn't using Gerrit, so
review/comments back here please :>)

I'm ambivalent.  On the one hand, I think this is an important
step in the right direction.  Sometimes we need to be able to
run *all* of our existing tests with some feature enabled, not
just a few feature-specific tests.  SSL is an example of this,
and transport (or other forms of) multi-threading will be as
well.

On the other hand, I'm not sure smoke is the place to do this.
Smoke is supposed to be a *quick* test to catch *basic* errors
(e.g. source fails to build) before we devote hours to a full
regression test.  How much does this change throughput on the
smoke-test queue?  Should we be doing this in regression
instead, or in a third testing tier between the two we have?
That makes sense. Should we have daily regression runs which 
contain a lot more things that need to be tested on a regular basis? 
Running regressions per disk filesystem type is something that we need to do. We 
can improve them going forward with long-running tests like disk 
replacement, rebalance, and geo-rep tests. Let me know your 
thoughts on this.


Pranith


My gut feel is that we need to think more about how to run
a matrix of M tests across N configurations, instead of just
putting feature/regression tests and configuration tests into
one big bucket.  Or maybe that's a longer-term thing.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] quota and snapshot testcase failure (zfs on CentOS 6.6)

2014-11-18 Thread Pranith Kumar Karampuri


On 11/12/2014 04:52 PM, Kiran Patil wrote:
I have create zpool with name d and mnt and they appear in filesystem 
as follows.


d on /d type zfs (rw,xattr)
mnt on /mnt type zfs (rw,xattr)

Debug enabled output of quota.t testcase is at http://ur1.ca/irbt1.

CC vijaikumar


On Wed, Nov 12, 2014 at 3:22 PM, Kiran Patil wrote:


Hi,

Gluster suite report,

Gluster version: glusterfs 3.6.1

On disk filesystem: Zfs 0.6.3-1.1

Operating system: CentOS release 6.6 (Final)

We are seeing quota and snapshot testcase failures.

We are not sure why quota is failing since quotas worked fine on
gluster 3.4.

Test Summary Report
---
./tests/basic/quota-anon-fd-nfs.t  (Wstat: 0 Tests: 16 Failed: 1)
  Failed test:  16
./tests/basic/quota.t  (Wstat: 0 Tests: 73 Failed: 4)
  Failed tests:  24, 28, 32, 65
./tests/basic/uss.t  (Wstat: 0 Tests: 147 Failed: 78)
  Failed tests:  8-11, 16-25, 28-29, 31-32, 39-40, 45-47
49-57, 60-61, 63-64, 71-72, 78-87, 90-91
93-94, 101-102, 107-115, 118-119, 121-122
129-130, 134, 136-137, 139-140, 142-143
145-146
./tests/basic/volume-snapshot.t  (Wstat: 0 Tests: 30 Failed: 12)
  Failed tests:  11-18, 21-24
./tests/basic/volume-status.t  (Wstat: 0 Tests: 14 Failed: 1)
  Failed test:  14
./tests/bugs/bug-1023974.t   (Wstat: 0 Tests: 15 Failed: 1)
  Failed test:  12
./tests/bugs/bug-1038598.t   (Wstat: 0 Tests: 28 Failed: 6)
  Failed tests:  17, 21-22, 26-28
./tests/bugs/bug-1045333.t   (Wstat: 0 Tests: 16 Failed: 9)
  Failed tests:  7-15
./tests/bugs/bug-1049834.t   (Wstat: 0 Tests: 18 Failed: 7)
  Failed tests:  11-14, 16-18
./tests/bugs/bug-1087203.t   (Wstat: 0 Tests: 43 Failed: 2)
  Failed tests:  31, 41
./tests/bugs/bug-1090042.t   (Wstat: 0 Tests: 12 Failed: 3)
  Failed tests:  9-11
./tests/bugs/bug-1109770.t   (Wstat: 0 Tests: 19 Failed: 4)
  Failed tests:  8-11
./tests/bugs/bug-1109889.t   (Wstat: 0 Tests: 20 Failed: 4)
  Failed tests:  8-11
./tests/bugs/bug-1112559.t   (Wstat: 0 Tests: 11 Failed: 3)
  Failed tests:  8-9, 11
./tests/bugs/bug-1112613.t   (Wstat: 0 Tests: 22 Failed: 5)
  Failed tests:  12-14, 17-18
./tests/bugs/bug-1113975.t   (Wstat: 0 Tests: 13 Failed: 4)
  Failed tests:  8-9, 11-12
./tests/bugs/bug-847622.t  (Wstat: 0 Tests: 10 Failed: 1)
  Failed test:  8
./tests/bugs/bug-861542.t  (Wstat: 0 Tests: 13 Failed: 7)
  Failed tests:  7-13
./tests/features/ssl-authz.t   (Wstat: 0 Tests: 18 Failed: 1)
  Failed test:  18
Files=277, Tests=7908, 8147 wallclock secs ( 4.56 usr  0.78 sys +
774.74 cusr 666.97 csys = 1447.05 CPU)
Result: FAIL

Thanks,
Kiran.




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] quota and snapshot testcase failure (zfs on CentOS 6.6)

2014-11-18 Thread Pranith Kumar Karampuri


On 11/19/2014 10:30 AM, Atin Mukherjee wrote:


On 11/18/2014 10:35 PM, Pranith Kumar Karampuri wrote:

On 11/12/2014 04:52 PM, Kiran Patil wrote:

I have create zpool with name d and mnt and they appear in filesystem
as follows.

d on /d type zfs (rw,xattr)
mnt on /mnt type zfs (rw,xattr)

Debug enabled output of quota.t testcase is at http://ur1.ca/irbt1.

CC vijaikumar

quota-anon-fd-nfs.t spurious failure fix is addressed by
http://review.gluster.org/#/c/9108/

This is just quota.t in tests/basic, not the anon-fd one

Pranith

~Atin

On Wed, Nov 12, 2014 at 3:22 PM, Kiran Patil <ki...@fractalio.com> wrote:

 Hi,

 Gluster suite report,

 Gluster version: glusterfs 3.6.1

 On disk filesystem: Zfs 0.6.3-1.1

 Operating system: CentOS release 6.6 (Final)

 We are seeing quota and snapshot testcase failures.

 We are not sure why quota is failing since quotas worked fine on
 gluster 3.4.

 Test Summary Report
 ---
 ./tests/basic/quota-anon-fd-nfs.t  (Wstat: 0 Tests: 16 Failed: 1)
   Failed test:  16
 ./tests/basic/quota.t  (Wstat: 0 Tests: 73 Failed: 4)
   Failed tests:  24, 28, 32, 65
 ./tests/basic/uss.t  (Wstat: 0 Tests: 147 Failed: 78)
   Failed tests:  8-11, 16-25, 28-29, 31-32, 39-40, 45-47
 49-57, 60-61, 63-64, 71-72, 78-87, 90-91
 93-94, 101-102, 107-115, 118-119, 121-122
 129-130, 134, 136-137, 139-140, 142-143
 145-146
 ./tests/basic/volume-snapshot.t  (Wstat: 0 Tests: 30 Failed: 12)
   Failed tests:  11-18, 21-24
 ./tests/basic/volume-status.t  (Wstat: 0 Tests: 14 Failed: 1)
   Failed test:  14
 ./tests/bugs/bug-1023974.t   (Wstat: 0 Tests: 15 Failed: 1)
   Failed test:  12
 ./tests/bugs/bug-1038598.t   (Wstat: 0 Tests: 28 Failed: 6)
   Failed tests:  17, 21-22, 26-28
 ./tests/bugs/bug-1045333.t   (Wstat: 0 Tests: 16 Failed: 9)
   Failed tests:  7-15
 ./tests/bugs/bug-1049834.t   (Wstat: 0 Tests: 18 Failed: 7)
   Failed tests:  11-14, 16-18
 ./tests/bugs/bug-1087203.t   (Wstat: 0 Tests: 43 Failed: 2)
   Failed tests:  31, 41
 ./tests/bugs/bug-1090042.t   (Wstat: 0 Tests: 12 Failed: 3)
   Failed tests:  9-11
 ./tests/bugs/bug-1109770.t   (Wstat: 0 Tests: 19 Failed: 4)
   Failed tests:  8-11
 ./tests/bugs/bug-1109889.t   (Wstat: 0 Tests: 20 Failed: 4)
   Failed tests:  8-11
 ./tests/bugs/bug-1112559.t   (Wstat: 0 Tests: 11 Failed: 3)
   Failed tests:  8-9, 11
 ./tests/bugs/bug-1112613.t   (Wstat: 0 Tests: 22 Failed: 5)
   Failed tests:  12-14, 17-18
 ./tests/bugs/bug-1113975.t   (Wstat: 0 Tests: 13 Failed: 4)
   Failed tests:  8-9, 11-12
 ./tests/bugs/bug-847622.t  (Wstat: 0 Tests: 10 Failed: 1)
   Failed test:  8
 ./tests/bugs/bug-861542.t  (Wstat: 0 Tests: 13 Failed: 7)
   Failed tests:  7-13
 ./tests/features/ssl-authz.t   (Wstat: 0 Tests: 18 Failed: 1)
   Failed test:  18
 Files=277, Tests=7908, 8147 wallclock secs ( 4.56 usr  0.78 sys +
 774.74 cusr 666.97 csys = 1447.05 CPU)
 Result: FAIL

 Thanks,
 Kiran.




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to resolve gfid (and .glusterfs symlink) for a deleted file

2014-11-21 Thread Pranith Kumar Karampuri


On 11/21/2014 09:04 PM, Nux! wrote:

Hi,

I deleted a file by mistake in a brick. I never managed to find out its gfid so 
now I have a rogue symlink in .glusterfs pointing to it (if I got how it works).
Any way I can discover which is this file and get rid of it?
Symlinks exist in .glusterfs only for directories; for files they are 
hardlinks. If the volume is replicated, you still have the file on the 
other brick of the replica, so you can find the gfid there. Accessing this 
file from the mount will try to re-create it, but the linking will 
fail because the rogue link still exists.


If your system doesn't have hardlinks, you can probably do a 'find' for 
files with a link count of '1'.
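
If it helps, a small self-contained C equivalent of such a 'find' (it walks
.glusterfs and reports regular files whose link count has dropped to 1); the
brick path below is just an example:

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Print regular files under .glusterfs whose link count is 1, i.e. gfid
 * hardlinks whose brick-side name has been removed. */
static int
check_entry (const char *path, const struct stat *sb, int typeflag,
             struct FTW *ftwbuf)
{
        (void) ftwbuf;
        if (typeflag == FTW_F && sb->st_nlink == 1)
                printf ("orphaned gfid entry: %s\n", path);
        return 0;       /* keep walking */
}

int
main (void)
{
        /* equivalent of: find /brick/.glusterfs -type f -links 1 */
        return nftw ("/brick/.glusterfs", check_entry, 16, FTW_PHYS) == 0 ? 0 : 1;
}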


Pranith


--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to resolve gfid (and .glusterfs symlink) for a deleted file

2014-11-21 Thread Pranith Kumar Karampuri


On 11/21/2014 09:50 PM, Ben England wrote:

Nux,

Those thousands of entries all would match "-links 2" but not "-links 1"  The only entry in 
.glusterfs that would match is the entry where you deleted the file from the brick.  That's how hardlinks work - when 
you create a regular file, the link count is increased to 1 (since the directory entry now references the inode), and 
when you create an additional hard link to the same file, the link count is increased to 2.   Try this with the 
"stat your-file" command and look at the link count, watch how it changes.  The "find" command that 
I gave you just tracks down the one hardlink that you want and nothing else.
Does this filter out symlinks? Because gfid symlinks of directories will 
have a link count of 1, maybe the command should filter out symlinks?


Pranith


-ben

- Original Message -

From: "Nux!" 
To: "Ben England" 
Cc: "Gluster Devel" 
Sent: Friday, November 21, 2014 11:03:46 AM
Subject: Re: [Gluster-devel] How to resolve gfid (and .glusterfs symlink) for a 
deleted file

Hi Ben,

I have thousands of entries under /your/brick/directory/.glusterfs .. find
would return too many results.
How do I find the one I'm looking for? :-)

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -

From: "Ben England" 
To: "Nux!" 
Cc: "Gluster Devel" 
Sent: Friday, 21 November, 2014 16:00:40
Subject: Re: [Gluster-devel] How to resolve gfid (and .glusterfs symlink)
for a   deleted file
first of all, links in .glusterfs are HARD links not symlinks.   So the
file is
not actually deleted, since the local filesystem keeps a count of
references to
the inode and won't release the inode until the ref count reaches zero.   I
tried this, it turns out you can find it with

# find /your/brick/directory/.glusterfs -links 1 -type f

You use "type f" because it's a hard link to a file, and you don't want to
look
at directories or "." or ".." .  Once you find the link, you can copy the
file
off somewhere, and then delete the link.  At that point, regular self-heal
could repair it (i.e. just do "ls" on the file from a Gluster mountpoint).

- Original Message -

From: "Nux!" 
To: "Gluster Devel" 
Sent: Friday, November 21, 2014 10:34:09 AM
Subject: [Gluster-devel] How to resolve gfid (and .glusterfs symlink) for
a
deleted file

Hi,

I deleted a file by mistake in a brick. I never managed to find out its
gfid
so now I have a rogue symlink in .glusterfs pointing to it (if I got how
it
works).
Any way I can discover which is this file and get rid of it?

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to resolve gfid (and .glusterfs symlink) for a deleted file

2014-11-21 Thread Pranith Kumar Karampuri


On 11/22/2014 12:10 PM, Pranith Kumar Karampuri wrote:


On 11/21/2014 09:50 PM, Ben England wrote:

Nux,

Those thousands of entries all would match "-links 2" but not "-links 
1"  The only entry in .glusterfs that would match is the entry where 
you deleted the file from the brick.  That's how hardlinks work - 
when you create a regular file, the link count is increased to 1 
(since the directory entry now references the inode), and when you 
create an additional hard link to the same file, the link count is 
increased to 2.   Try this with the "stat your-file" command and look 
at the link count, watch how it changes.  The "find" command that I 
gave you just tracks down the one hardlink that you want and nothing 
else.
Does this filter out symlinks? because gfid-symlinks of directories 
will have link-count 1. So may be the command should filter out symlinks?

Ah, you gave '-type f'. Sorry, I missed that part :-)

Pranith


Pranith


-ben

- Original Message -

From: "Nux!" 
To: "Ben England" 
Cc: "Gluster Devel" 
Sent: Friday, November 21, 2014 11:03:46 AM
Subject: Re: [Gluster-devel] How to resolve gfid (and .glusterfs 
symlink) for adeleted file


Hi Ben,

I have thousands of entries under /your/brick/directory/.glusterfs 
.. find

would return too many results.
How do I find the one I'm looking for? :-)

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -

From: "Ben England" 
To: "Nux!" 
Cc: "Gluster Devel" 
Sent: Friday, 21 November, 2014 16:00:40
Subject: Re: [Gluster-devel] How to resolve gfid (and .glusterfs 
symlink)

for adeleted file
first of all, links in .glusterfs are HARD links not symlinks.   So 
the

file is
not actually deleted, since the local filesystem keeps a count of
references to
the inode and won't release the inode until the ref count reaches 
zero.   I

tried this, it turns out you can find it with

# find /your/brick/directory/.glusterfs -links 1 -type f

You use "type f" because it's a hard link to a file, and you don't 
want to

look
at directories or "." or ".." .  Once you find the link, you can 
copy the

file
off somewhere, and then delete the link.  At that point, regular 
self-heal
could repair it (i.e. just do "ls" on the file from a Gluster 
mountpoint).


- Original Message -

From: "Nux!" 
To: "Gluster Devel" 
Sent: Friday, November 21, 2014 10:34:09 AM
Subject: [Gluster-devel] How to resolve gfid (and .glusterfs 
symlink) for

a
deleted file

Hi,

I deleted a file by mistake in a brick. I never managed to find 
out its

gfid
so now I have a rogue symlink in .glusterfs pointing to it (if I 
got how

it
works).
Any way I can discover which is this file and get rid of it?

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Proposal for more sub-maintainers

2014-12-04 Thread Pranith Kumar Karampuri


On 12/04/2014 08:32 PM, Niels de Vos wrote:

On Fri, Nov 28, 2014 at 01:08:29PM +0530, Vijay Bellur wrote:

Hi All,

To supplement our ongoing effort of better patch management, I am proposing
the addition of more sub-maintainers for various components. The rationale
behind this proposal & the responsibilities of maintainers continue to be
the same as discussed in these lists a while ago [1]. Here is the proposed
list:

Build - Kaleb Keithley & Niels de Vos

DHT   - Raghavendra Gowdappa & Shyam Ranganathan

docs  - Humble Chirammal & Lalatendu Mohanty

gfapi - Niels de Vos & Shyam Ranganathan

index & io-threads - Pranith Karampuri

posix - Pranith Karampuri & Raghavendra Bhat

I'm wondering if there are any volunteers for maintaining the FUSE
component?

I am interested in this work if you can guide me.

Pranith


And maybe rewrite it to use libgfapi and drop the mount.glusterfs
script?

Niels


We intend to update Gerrit with this list by 8th of December. Please let us
know if you have objections, concerns or feedback on this process by then.

Thanks,
Vijay

[1] http://gluster.org/pipermail/gluster-devel/2014-April/025425.html

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] bit rot

2015-01-07 Thread Pranith Kumar Karampuri


On 01/07/2015 12:48 PM, Raghavendra Bhat wrote:


Hi,

As per the design dicussion it was mentioned that, there will be one 
BitD running per node which will take care of all the bricks of all 
the volumes running on that node. But, here once thing that becomes 
important is doing graph changes for the BitD process upon 
enabling/disabling of bit-rot functionality for the volumes. With more 
and more graph changes, there is more chance of BitD running out of 
memory (as of now the older graphs in glusterfs are not cleaned up).
Both the NFS and SHD processes have the same problem, but upon graph switch those 
daemons are restarted. Is there any reason this approach is not taken?


Pranith


So for now it will be better to have one BitD per volume per node. In 
this case, there will not be graph changes in BitD. It will be started 
for a volume upon enabling bit-rot functionality for that volume and 
will be brought down when bit-rot is disabled for a volume.


Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Improvement of eager locking

2015-01-15 Thread Pranith Kumar Karampuri


On 01/15/2015 10:53 PM, Xavier Hernandez wrote:

Hi,

currently eager locking is implemented by checking the open-fd-count 
special xattr for each write. If there's more than one open on the 
same file, eager locking is disabled to avoid starvation.


This works quite well for file writes, but makes eager locking 
unusable for other request types that do not involve an open fd (in 
fact, this method is only for writes on regular files, not reads or 
directories). This may cause a performance problem for other 
operations, like metadata.


To be able to use eager locking for other purposes, what do you think 
about this proposal:


Instead of implementing open-fd-count on posix xlator, do something 
similar but in locks xlator. The difference will be that locks xlator 
can use the pending locking information to determine if there are 
other processes waiting for a resource. If so, set a flag in the cbk 
xdata to let high level xlators know that they should not use eager 
locking (this can be done only upon request by xdata).


I think this way provides a more precise way to avoid starvation and 
maximize performance at the same time, and it can be used for any 
request even if it doesn't depend on an fd.


Another advantage is that if one file has been opened multiple times 
but all of them from the same glusterfs client, that client could use 
a single inodelk to manage all the accesses, not needing to release 
the lock. Current implementation in posix xlator cannot differentiate 
from opens from the same client or different clients.


What do you think ?
I like the idea. So basically we can propagate the list_empty information of the 
'blocking_locks' list. And for sending locks, we need to use an lk-owner 
based on the gfid so that locks from the same client (i.e. lkowner+transport) are 
granted irrespective of conflicting locks. The respective xlators need to 
make sure to order the fops so that they don't step on each other within a 
single process. This can be used for entry locks also.
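
A minimal, self-contained sketch of the proposal (the names are hypothetical,
not the locks xlator code): the locks layer answers each request with an
'others are waiting' hint derived from whether the blocked-locks list is
empty, and the caller uses that hint to decide whether to keep the eager lock:

#include <stdbool.h>
#include <stdio.h>
#include <sys/queue.h>

/* Hypothetical per-inode lock state: a queue of blocked waiters,
 * mirroring the 'blocking_locks' list idea. */
struct waiter {
        int client_id;
        TAILQ_ENTRY (waiter) link;
};

struct inode_locks {
        TAILQ_HEAD (, waiter) blocked;
};

/* The hint the locks layer would piggyback on the cbk xdata: eager
 * locking is fine only while nobody else is queued on this inode. */
static bool
keep_eager_lock (struct inode_locks *l)
{
        return TAILQ_EMPTY (&l->blocked);
}

int
main (void)
{
        struct inode_locks l;
        struct waiter w = { .client_id = 2 };

        TAILQ_INIT (&l.blocked);
        printf ("no waiters -> keep eager lock: %d\n", keep_eager_lock (&l));

        TAILQ_INSERT_TAIL (&l.blocked, &w, link);
        printf ("one waiter -> keep eager lock: %d\n", keep_eager_lock (&l));
        return 0;
}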


Pranith


Xavi


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Improvement of eager locking

2015-01-22 Thread Pranith Kumar Karampuri


On 01/16/2015 05:40 PM, Xavier Hernandez wrote:

On 01/16/2015 04:58 AM, Pranith Kumar Karampuri wrote:


On 01/15/2015 10:53 PM, Xavier Hernandez wrote:

Hi,

currently eager locking is implemented by checking the open-fd-count
special xattr for each write. If there's more than one open on the
same file, eager locking is disabled to avoid starvation.

This works quite well for file writes, but makes eager locking
unusable for other request types that do not involve an open fd (in
fact, this method is only for writes on regular files, not reads or
directories). This may cause a performance problem for other
operations, like metadata.

To be able to use eager locking for other purposes, what do you think
about this proposal:

Instead of implementing open-fd-count on posix xlator, do something
similar but in locks xlator. The difference will be that locks xlator
can use the pending locking information to determine if there are
other processes waiting for a resource. If so, set a flag in the cbk
xdata to let high level xlators know that they should not use eager
locking (this can be done only upon request by xdata).

I think this way provides a more precise way to avoid starvation and
maximize performance at the same time, and it can be used for any
request even if it doesn't depend on an fd.

Another advantage is that if one file has been opened multiple times
but all of them from the same glusterfs client, that client could use
a single inodelk to manage all the accesses, not needing to release
the lock. Current implementation in posix xlator cannot differentiate
from opens from the same client or different clients.

What do you think ?

I like the idea. So basically we can propagate list_empty information of
'blocking_locks' list. And for sending locks, we need to use lk-owner
based on gfid so that locks from same client i.e. lkowner+transport are
granted irrespective of conflicting locks. The respective xls need to
make sure to order the fops so that they don't step on each other in a
single process. This can be used for entry-locks also.


I don't understand what are the benefits of checking for 
lkowner+transport to grant a lock bypassing conflicts. It seems 
dangerous and I don't see exactly how this can help the upper xlator. 
If this xlator already needs to take care of fop ordering for each 
inode, it can share the same lock. It seems there's no need to do 
additional locking calls. I may be missing some detail though.
AFR at the time of 3.2 or 3.3 used to take a full-file lock for doing 
self-heal, but this scheme was useless for VM healing. So we had to 
migrate the locking in a backward-compatible way, and this strategy was 
employed: healing takes a 128k chunk lock at a time, heals that 
chunk and moves to the next chunk, while at no point allowing another 
self-heal to start. So the locking scheme came to be: take a full-file 
lock, find good/bad copies, take a lock on chunk-1, unlock the full lock, 
heal chunk-1, then take a lock on chunk-2, unlock the lock on chunk-1, etc. 
To have this we needed the locks to be granted even when there were 
conflicting locks, so we chose (lk-owner + transport) being the same as the way 
to grant conflicting locks. We found that this can lead to another 
problem where a truncate fop etc. can hang, so we are moving to a different 
mechanism now. You can find the complete lock evolution document here: 
https://github.com/gluster/glusterfs/blob/master/doc/code/xlators/cluster/afr/afr-locks-evolution.md
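
For reference, a compact, self-contained sketch of that chunk-lock handover
(the lock/unlock/heal helpers are stubs standing in for the real inodelk
calls, and the sizes are illustrative):

#include <stdio.h>

#define CHUNK (128 * 1024L)
#define FSIZE (512 * 1024L)

/* Stub lock primitives; len 0 means "till end of file", like inodelk. */
static void lock_range (long off, long len)   { printf ("lock   off=%ld len=%ld\n", off, len); }
static void unlock_range (long off, long len) { printf ("unlock off=%ld len=%ld\n", off, len); }
static void heal_chunk (long off)             { printf ("heal   chunk at %ld\n", off); }

int
main (void)
{
        /* 1. full-file lock: decide good/bad copies, keep other healers out. */
        lock_range (0, 0);

        /* 2. overlapping handover: the next lock is always taken before the
         * previous one is released, so another self-heal can never start in
         * between.  This is also why locks from the same lk-owner+transport
         * had to be granted despite conflicting with the lock already held. */
        lock_range (0, CHUNK);
        unlock_range (0, 0);

        for (long off = 0; off < FSIZE; off += CHUNK) {
                heal_chunk (off);
                if (off + CHUNK < FSIZE)
                        lock_range (off + CHUNK, CHUNK);
                unlock_range (off, CHUNK);
        }
        return 0;
}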




Thinking a litle more about the way to detect multiple accesses to the 
same inode using the list of pending locks, there's a case where some 
more logic must be added to avoid unnecessary delays.


Suppose you receive a request for an inode from one client. If there 
isn't anyone else waiting, a flag is set into the answer indicating 
that there's no conflict. After that the caller begins an eager lock 
timer because there isn't anyone else waiting. During that timeout, 
another client tries to access the same inode. It will block until the 
eager lock timer expires (at this time it will release the inode lock) 
or another request from the first client arrives (in this case the 
request is served and the result will indicate that it should release 
the lock since there are other clients waiting). When the lock is 
released, it will be granted to the other client. It's possible that 
this client completes the request before the first one tries to 
acquire the lock again (because it had more requests pending), causing 
that the second client initiates another eager lock timer because 
there were no other client waiting at the moment of executing the 
request. This is an unnecessary delay.


To avoid this problem, we could add a flag in the inodelk/entrylk 
calls to indicate that the lock is released to let other clients to 
proceed, but we will want the lock again as soon as possible. It would 
be as a combined unlo

Re: [Gluster-devel] v3.6.2

2015-01-26 Thread Pranith Kumar Karampuri


On 01/26/2015 09:41 PM, Justin Clift wrote:

On 26 Jan 2015, at 14:50, David F. Robinson  
wrote:

I have a server with v3.6.2 from which I cannot mount using NFS.  The FUSE 
mount works, however, I cannot get the NFS mount to work. From /var/log/message:
  
Jan 26 09:27:28 gfs01bkp mount[2810]: mount to NFS server 'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying

Jan 26 09:27:53 gfs01bkp mount[4456]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:29:28 gfs01bkp mount[2810]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:29:53 gfs01bkp mount[4456]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:31:28 gfs01bkp mount[2810]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:31:53 gfs01bkp mount[4456]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:33:28 gfs01bkp mount[2810]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:33:53 gfs01bkp mount[4456]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:35:28 gfs01bkp mount[2810]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:35:53 gfs01bkp mount[4456]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
  
  
I also am continually getting the following errors in /var/log/glusterfs:
  
[root@gfs01bkp glusterfs]# tail -f etc-glusterfs-glusterd.vol.log

[2015-01-26 14:41:51.260827] W [socket.c:611:__socket_rwv] 0-management: readv 
on /var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid argument)
[2015-01-26 14:41:54.261240] W [socket.c:611:__socket_rwv] 0-management: readv 
on /var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid argument)
[2015-01-26 14:41:57.261642] W [socket.c:611:__socket_rwv] 0-management: readv 
on /var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid argument)
[2015-01-26 14:42:00.262073] W [socket.c:611:__socket_rwv] 0-management: readv 
on /var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid argument)
[2015-01-26 14:42:03.262504] W [socket.c:611:__socket_rwv] 0-management: readv 
on /var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid argument)
[2015-01-26 14:42:06.262935] W [socket.c:611:__socket_rwv] 0-management: readv 
on /var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid argument)
[2015-01-26 14:42:09.263334] W [socket.c:611:__socket_rwv] 0-management: readv 
on /var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid argument)
[2015-01-26 14:42:12.263761] W [socket.c:611:__socket_rwv] 0-management: readv 
on /var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid argument)
[2015-01-26 14:42:15.264177] W [socket.c:611:__socket_rwv] 0-management: readv 
on /var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid argument)
[2015-01-26 14:42:18.264623] W [socket.c:611:__socket_rwv] 0-management: readv 
on /var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid argument)
[2015-01-26 14:42:21.265053] W [socket.c:611:__socket_rwv] 0-management: readv 
on /var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid argument)
[2015-01-26 14:42:24.265504] W [socket.c:611:__socket_rwv] 0-management: readv 
on /var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid argument)
I believe this error message comes when the socket file is not present. 
I see the following commit which changed the location of the sockets. 
Maybe Atin knows more about this: +Atin.


Pranith

^C
  
Also, when I try to NFS mount my gluster volume, I am getting

Any chance there's a network or host based firewall stopping some of the ports?

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] v3.6.2

2015-01-26 Thread Pranith Kumar Karampuri


On 01/27/2015 07:33 AM, Pranith Kumar Karampuri wrote:


On 01/26/2015 09:41 PM, Justin Clift wrote:
On 26 Jan 2015, at 14:50, David F. Robinson 
 wrote:
I have a server with v3.6.2 from which I cannot mount using NFS.  
The FUSE mount works, however, I cannot get the NFS mount to work. 
From /var/log/message:
  Jan 26 09:27:28 gfs01bkp mount[2810]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:27:53 gfs01bkp mount[4456]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:29:28 gfs01bkp mount[2810]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:29:53 gfs01bkp mount[4456]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:31:28 gfs01bkp mount[2810]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:31:53 gfs01bkp mount[4456]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:33:28 gfs01bkp mount[2810]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:33:53 gfs01bkp mount[4456]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:35:28 gfs01bkp mount[2810]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
Jan 26 09:35:53 gfs01bkp mount[4456]: mount to NFS server 
'gfsib01bkp.corvidtec.com' failed: Connection refused, retrying
I also am continually getting the following errors in 
/var/log/glusterfs:

  [root@gfs01bkp glusterfs]# tail -f etc-glusterfs-glusterd.vol.log
[2015-01-26 14:41:51.260827] W [socket.c:611:__socket_rwv] 
0-management: readv on 
/var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid 
argument)
[2015-01-26 14:41:54.261240] W [socket.c:611:__socket_rwv] 
0-management: readv on 
/var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid 
argument)
[2015-01-26 14:41:57.261642] W [socket.c:611:__socket_rwv] 
0-management: readv on 
/var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid 
argument)
[2015-01-26 14:42:00.262073] W [socket.c:611:__socket_rwv] 
0-management: readv on 
/var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid 
argument)
[2015-01-26 14:42:03.262504] W [socket.c:611:__socket_rwv] 
0-management: readv on 
/var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid 
argument)
[2015-01-26 14:42:06.262935] W [socket.c:611:__socket_rwv] 
0-management: readv on 
/var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid 
argument)
[2015-01-26 14:42:09.263334] W [socket.c:611:__socket_rwv] 
0-management: readv on 
/var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid 
argument)
[2015-01-26 14:42:12.263761] W [socket.c:611:__socket_rwv] 
0-management: readv on 
/var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid 
argument)
[2015-01-26 14:42:15.264177] W [socket.c:611:__socket_rwv] 
0-management: readv on 
/var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid 
argument)
[2015-01-26 14:42:18.264623] W [socket.c:611:__socket_rwv] 
0-management: readv on 
/var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid 
argument)
[2015-01-26 14:42:21.265053] W [socket.c:611:__socket_rwv] 
0-management: readv on 
/var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid 
argument)
[2015-01-26 14:42:24.265504] W [socket.c:611:__socket_rwv] 
0-management: readv on 
/var/run/1f0cee5a2d074e39b32ee5a81c70e68c.socket failed (Invalid 
argument)
I believe this error message comes when the socket file is not 
present. I see the following commit which changed the location of the 
sockets. May be Atin may know more. about this: +Atin.
Ignore this mail above. I see that the commit is only present on Master: 
http://review.gluster.org/9423


Pranith


Pranith

^C
  Also, when I try to NFS mount my gluster volume, I am getting
Any chance there's a network or host based firewall stopping some of 
the ports?


+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-users mailing list
gluster-us...@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Regarding multi-threaded epoll and notify

2015-01-29 Thread Pranith Kumar Karampuri

hi,
With the addition of the multi-threaded epoll functionality, 
handling CHILD_UP/CHILD_DOWN in cluster xlators gets tricky. For example, let us 
assume one of the bricks in a replica is down while the other one is 
up. Now, if in parallel the brick that went down comes up and the one 
that was up goes down, there is a possibility of a wrong CHILD_UP/DOWN 
reaching the parent of afr; the same applies to dht. One way to fix it is for each 
cluster xlator to handle this correctly, but that entails calling 'notify' 
on the parent translator while holding mutex locks. Another way to fix it is 
to add the CHILD_UP/CHILD_DOWN related epoll events from the rpc 
layer into a queue and have a dedicated thread process those events, 
similar to how the timer thread processes events. If this gets in, 
we get the best of both worlds: CHILD_UP/DOWN will be single-threaded, 
so xlators don't have to worry about it, while the fops get 
multi-threaded behaviour.
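
To make the second option concrete, here is a rough sketch of what such a 
dedicated notify thread could look like (notify_event, notify_enqueue and 
notify_worker are names made up for this example, not anything in the 
codebase): the epoll threads only enqueue, and a single worker dequeues 
and delivers the events in arrival order, so parents always see a 
serialized stream of CHILD_UP/CHILD_DOWN.

/*
 * Sketch of a serialized notify queue: multi-threaded epoll handlers
 * enqueue CHILD_UP/CHILD_DOWN events, one dedicated thread delivers
 * them in arrival order.  Illustrative only; names and the delivery
 * step are not taken from the Gluster codebase.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

enum notify_type { CHILD_UP, CHILD_DOWN };

struct notify_event {
        enum notify_type     type;
        int                  child_id;
        struct notify_event *next;
};

static struct notify_event *head, *tail;
static pthread_mutex_t      lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t       cond = PTHREAD_COND_INITIALIZER;

/* Called from any epoll thread: just queue the event and wake the worker. */
static void notify_enqueue(enum notify_type type, int child_id)
{
        struct notify_event *ev = calloc(1, sizeof(*ev));
        ev->type     = type;
        ev->child_id = child_id;

        pthread_mutex_lock(&lock);
        if (tail)
                tail->next = ev;
        else
                head = ev;
        tail = ev;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
}

/* Single worker: delivers events one by one, in arrival order. */
static void *notify_worker(void *arg)
{
        (void)arg;
        for (;;) {
                pthread_mutex_lock(&lock);
                while (!head)
                        pthread_cond_wait(&cond, &lock);
                struct notify_event *ev = head;
                head = ev->next;
                if (!head)
                        tail = NULL;
                pthread_mutex_unlock(&lock);

                /* A real implementation would call the parent xlator's
                 * notify() here; printf stands in for it. */
                printf("child %d is %s\n", ev->child_id,
                       ev->type == CHILD_UP ? "UP" : "DOWN");
                free(ev);
        }
        return NULL;
}

int main(void)
{
        pthread_t t;
        pthread_create(&t, NULL, notify_worker, NULL);

        notify_enqueue(CHILD_DOWN, 0);   /* e.g. from epoll thread A */
        notify_enqueue(CHILD_UP, 1);     /* e.g. from epoll thread B */

        sleep(1);                        /* let the worker drain the queue */
        return 0;
}

Compile with -pthread; the ordering guarantee comes purely from having a 
single consumer.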


I like the second approach. What do you guys suggest?

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Some thing is wrong with review.gluster.org

2015-02-01 Thread Pranith Kumar Karampuri

hi,
    I get the following errors when I try to do a git fetch:
pk1@localhost - ~/workspace/gerrit-repo (ec-notify-1)
15:48:44 :( ⚡ git fetch && git rebase origin/master
fatal: internal server error
remote: internal server error
fatal: protocol error: bad pack header

What is worrisome is that when I click on any of the first 5 patch URLs 
it gives the following exception:
org.eclipse.jgit.errors.MissingObjectException: Missing unknown 
5a276a3fa2e7d09a43f8e9334d7453d14888c0a3


Some of the patch URLs are opening properly; only the first 5 patches 
at review.gluster.org are giving problems at the moment. Does anyone have 
any clue?


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Error in dht-common.c?

2015-02-01 Thread Pranith Kumar Karampuri


On 01/31/2015 10:49 PM, Dennis Schafroth wrote:

Hi

when compiling dht-common.c with clang (on mac, but I dont think that 
matters) some warnings seem to reveal an error:


  CC   dht-common.lo
dht-common.c:2997:57: warning: size argument in 'strncmp' call is a comparison [-Wmemsize-comparison]
                 strlen (GF_XATTR_LOCKINFO_KEY) != 0))) {
                                                ~~~^~~~
dht-common.c:2996:17: note: did you mean to compare the result of 'strncmp' instead?
            && (strncmp (key, GF_XATTR_LOCKINFO_KEY,
                ^
dht-common.c:2997:26: note: explicitly cast the argument to size_t to silence this warning
                 strlen (GF_XATTR_LOCKINFO_KEY) != 0))) {
                         ^
                         (size_t)(  )

I believe that a parenthesis is misplaced, so the code is effectively doing

strncmp(key, GF_XATTR_LOCKINFO_KEY, 1)

since the size argument becomes the result of the '!= 0' comparison. I think
the following patch moves the parenthesis to the correct place:

--- a/xlators/cluster/dht/src/dht-common.c
+++ b/xlators/cluster/dht/src/dht-common.c
@@ -2994,7 +2994,7 @@ dht_fgetxattr (call_frame_t *frame, xlator_t *this,
 if ((fd->inode->ia_type == IA_IFDIR)
&& key
 && (strncmp (key, GF_XATTR_LOCKINFO_KEY,
- strlen (GF_XATTR_LOCKINFO_KEY) != 0))) {
+ strlen (GF_XATTR_LOCKINFO_KEY)) != 0)) {
 cnt = local->call_cnt = layout->cnt;
 } else {
 cnt = local->call_cnt  = 1;
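
To see the effect of the misplaced parenthesis in isolation, here is a tiny 
standalone example (the key strings are made up; this is not the dht code): 
with the parenthesis in the wrong place the size argument collapses to the 
boolean result of the comparison, i.e. 1, so only the first byte is compared.

#include <stdio.h>
#include <string.h>

int main(void)
{
        const char *key    = "user.some.key";
        const char *prefix = "user.some.prefix";   /* made-up strings */

        /* Buggy form: strlen(prefix) != 0 evaluates to 1, so strncmp
         * compares a single byte and almost everything "matches". */
        printf("buggy:   %d\n", strncmp(key, prefix, strlen(prefix) != 0));

        /* Intended form: compare strlen(prefix) bytes, then test != 0. */
        printf("correct: %d\n", strncmp(key, prefix, strlen(prefix)) != 0);

        return 0;
}

Compiling and running this prints buggy: 0 and correct: 1, which is exactly 
the difference the clang warning is pointing at.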
Thanks for submitting the patch. Gave +1 already. One of the maintainers 
of dht will pick it up.


Pranith



cheers,
:-Dennis



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Some thing is wrong with review.gluster.org

2015-02-01 Thread Pranith Kumar Karampuri

Thanks Vijay, it works now.

Pranith
On 02/01/2015 04:27 PM, Vijay Bellur wrote:

On 02/01/2015 11:24 AM, Pranith Kumar Karampuri wrote:

hi,
I get following errors, when I try to do git fetch
pk1@localhost - ~/workspace/gerrit-repo (ec-notify-1)
15:48:44 :( ⚡ git fetch && git rebase origin/master
fatal: internal server error
remote: internal server error
fatal: protocol error: bad pack header

What is worrisome is that when I click on any of the first 5 patch URLs
it gives the following exception:
org.eclipse.jgit.errors.MissingObjectException: Missing unknown
5a276a3fa2e7d09a43f8e9334d7453d14888c0a3

Some of the patch URLs are opening properly, only first 5 patches
@review.gluster.org at the moment are giving problems. Does anyone have
any clue?


Not sure why this could have happened. I did run `git gc` on the 
server repository and we seem to be doing fine now.


Thanks,
Vijay



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] submitted but merge pending issue seems to be back :-(

2015-02-02 Thread Pranith Kumar Karampuri

I see the following two patches in that state:
http://review.gluster.com/9456
http://review.gluster.com/9409

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] 3.6.2 volume heal

2015-02-02 Thread Pranith Kumar Karampuri


On 02/03/2015 12:13 PM, Raghavendra Bhat wrote:

On Monday 02 February 2015 09:07 PM, David F. Robinson wrote:
I upgraded one of my bricks from 3.6.1 to 3.6.2 and I can no longer 
do a 'gluster volume heal homegfs info'.  It hangs and never returns 
any information.
I was trying to ensure that gfs01a had finished healing before 
upgrading the other machines (gfs01b, gfs02a, gfs02b) in my 
configuration (see below).

'gluster volume homegfs statistics' still works fine.
Do I need to upgrade my other bricks to get the 'gluster volume heal 
homegfs info' working?  Or, should I fix this issue before upgrading 
my other machines?

Volume Name: homegfs
Type: Distributed-Replicate
Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
Options Reconfigured:
performance.io-thread-count: 32
performance.cache-size: 128MB
performance.write-behind-window-size: 128MB
server.allow-insecure: on
network.ping-timeout: 10
storage.owner-gid: 100
geo-replication.indexing: off
geo-replication.ignore-pid-check: on
changelog.changelog: on
changelog.fsync-interval: 3
changelog.rollover-time: 15
server.manage-gids: on


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


CCing Pranith, the maintainer of replicate. In the meantime can you 
please provide the logs from the machine where you have upgraded?
Anuradha already followed up with David. It seems he got out of this 
situation by upgrading the other node and removing some files which went 
into split-brain.
The next step we are following up on with David is to check whether the heal 
info command was run from the 3.6.1 nodes or the 3.6.2 nodes.


Pranith


Regards,
Raghavendra Bhat


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] failed heal

2015-02-03 Thread Pranith Kumar Karampuri


On 02/02/2015 03:34 AM, David F. Robinson wrote:
I have several files that gluster says it cannot heal.  I deleted the 
files from all of the bricks 
(/data/brick0*/hpc_shared/motorsports/gmics/Raven/p3/*) and ran a full 
heal using 'gluster volume heal homegfs full'.  Even after the full 
heal, the entries below still show up.

How do I clear these?
3.6.1 had an issue where files undergoing I/O would also be shown in the 
output of 'gluster volume heal <volname> info'; we addressed that in 
3.6.2. Is this output from 3.6.1 by any chance?


Pranith

[root@gfs01a ~]# gluster volume heal homegfs info
Gathering list of entries to be healed on volume homegfs has been 
successful

Brick gfsib01a.corvidtec.com:/data/brick01a/homegfs
Number of entries: 10
/hpc_shared/motorsports/gmics/Raven/p3/70_rke/Movies







/hpc_shared/motorsports/gmics/Raven/p3/70_rke/.Convrg.swp
/hpc_shared/motorsports/gmics/Raven/p3/70_rke
Brick gfsib01b.corvidtec.com:/data/brick01b/homegfs
Number of entries: 2

/hpc_shared/motorsports/gmics/Raven/p3/70_rke
Brick gfsib01a.corvidtec.com:/data/brick02a/homegfs
Number of entries: 7

/hpc_shared/motorsports/gmics/Raven/p3/70_rke/PICTURES/.tmpcheck
/hpc_shared/motorsports/gmics/Raven/p3/70_rke/PICTURES
/hpc_shared/motorsports/gmics/Raven/p3/70_rke/Movies



Brick gfsib01b.corvidtec.com:/data/brick02b/homegfs
Number of entries: 0
Brick gfsib02a.corvidtec.com:/data/brick01a/homegfs
Number of entries: 0
Brick gfsib02b.corvidtec.com:/data/brick01b/homegfs
Number of entries: 0
Brick gfsib02a.corvidtec.com:/data/brick02a/homegfs
Number of entries: 0
Brick gfsib02b.corvidtec.com:/data/brick02b/homegfs
Number of entries: 0
===
David F. Robinson, Ph.D.
President - Corvid Technologies
704.799.6944 x101 [office]
704.252.1310 [cell]
704.799.7974 [fax]
david.robin...@corvidtec.com 
http://www.corvidtechnologies.com


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] failed heal

2015-02-04 Thread Pranith Kumar Karampuri


On 02/04/2015 11:52 PM, David F. Robinson wrote:

I don't recall if that was before or after my upgrade.
I'll forward you an email thread for the current heal issues which are 
after the 3.6.2 upgrade...
This was executed after the upgrade on just one machine. 3.6.2 entry 
locks are not compatible with versions <= 3.5.3 and 3.6.1; that is the 
reason. With 3.5.4 and releases >= 3.6.2 it should work fine.


Pranith

David
-- Original Message --
From: "Pranith Kumar Karampuri" <mailto:pkara...@redhat.com>>
To: "David F. Robinson" <mailto:david.robin...@corvidtec.com>>; "gluster-us...@gluster.org" 
mailto:gluster-us...@gluster.org>>; 
"Gluster Devel" <mailto:gluster-devel@gluster.org>>

Sent: 2/4/2015 2:33:20 AM
Subject: Re: [Gluster-devel] failed heal


On 02/02/2015 03:34 AM, David F. Robinson wrote:
I have several files that gluster says it cannot heal. I deleted the 
files from all of the bricks 
(/data/brick0*/hpc_shared/motorsports/gmics/Raven/p3/*) and ran a 
full heal using 'gluster volume heal homegfs full'.  Even after the 
full heal, the entries below still show up.

How do I clear these?
3.6.1 Had an issue where files undergoing I/O will also be shown in 
the output of 'gluster volume heal  info', we addressed that 
in 3.6.2. Is this output from 3.6.1 by any chance?


Pranith

[root@gfs01a ~]# gluster volume heal homegfs info
Gathering list of entries to be healed on volume homegfs has been 
successful

Brick gfsib01a.corvidtec.com:/data/brick01a/homegfs
Number of entries: 10
/hpc_shared/motorsports/gmics/Raven/p3/70_rke/Movies







/hpc_shared/motorsports/gmics/Raven/p3/70_rke/.Convrg.swp
/hpc_shared/motorsports/gmics/Raven/p3/70_rke
Brick gfsib01b.corvidtec.com:/data/brick01b/homegfs
Number of entries: 2

/hpc_shared/motorsports/gmics/Raven/p3/70_rke
Brick gfsib01a.corvidtec.com:/data/brick02a/homegfs
Number of entries: 7

/hpc_shared/motorsports/gmics/Raven/p3/70_rke/PICTURES/.tmpcheck
/hpc_shared/motorsports/gmics/Raven/p3/70_rke/PICTURES
/hpc_shared/motorsports/gmics/Raven/p3/70_rke/Movies



Brick gfsib01b.corvidtec.com:/data/brick02b/homegfs
Number of entries: 0
Brick gfsib02a.corvidtec.com:/data/brick01a/homegfs
Number of entries: 0
Brick gfsib02b.corvidtec.com:/data/brick01b/homegfs
Number of entries: 0
Brick gfsib02a.corvidtec.com:/data/brick02a/homegfs
Number of entries: 0
Brick gfsib02b.corvidtec.com:/data/brick02b/homegfs
Number of entries: 0
===
David F. Robinson, Ph.D.
President - Corvid Technologies
704.799.6944 x101 [office]
704.252.1310 [cell]
704.799.7974 [fax]
david.robin...@corvidtec.com <mailto:david.robin...@corvidtec.com>
http://www.corvidtechnologies.com <http://www.corvidtechnologies.com/>


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] missing files

2015-02-05 Thread Pranith Kumar Karampuri
I believe David already fixed this. I hope this is the same permissions 
issue he told us about.


Pranith
On 02/05/2015 03:44 PM, Xavier Hernandez wrote:

Is the failure repeatable ? with the same directories ?

It's very weird that the directories appear on the volume when you do 
an 'ls' on the bricks. Could it be that you only did a single 'ls' on the 
fuse mount, which did not show the directory ? Is it possible that this 
'ls' triggered a self-heal that repaired the problem, whatever it was, 
and when you did another 'ls' on the fuse mount after the 'ls' on the 
bricks, the directories were there ?


The first 'ls' could have healed the files, causing the following 
'ls' on the bricks to show the files as if nothing were damaged. If 
that's the case, it's possible that there were some disconnections 
during the copy.


Added Pranith because he knows better replication and self-heal details.

Xavi

On 02/04/2015 07:23 PM, David F. Robinson wrote:

Distributed/replicated

Volume Name: homegfs
Type: Distributed-Replicate
Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
Options Reconfigured:
performance.io-thread-count: 32
performance.cache-size: 128MB
performance.write-behind-window-size: 128MB
server.allow-insecure: on
network.ping-timeout: 10
storage.owner-gid: 100
geo-replication.indexing: off
geo-replication.ignore-pid-check: on
changelog.changelog: on
changelog.fsync-interval: 3
changelog.rollover-time: 15
server.manage-gids: on


-- Original Message --
From: "Xavier Hernandez" 
To: "David F. Robinson" ; "Benjamin
Turner" 
Cc: "gluster-us...@gluster.org" ; "Gluster
Devel" 
Sent: 2/4/2015 6:03:45 AM
Subject: Re: [Gluster-devel] missing files


On 02/04/2015 01:30 AM, David F. Robinson wrote:

Sorry. Thought about this a little more. I should have been clearer.
The files were on both bricks of the replica, not just one side. So,
both bricks had to have been up... The files/directories just don't 
show

up on the mount.
I was reading and saw a related bug
(https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it
suggested to run:
 find  -d -exec getfattr -h -n trusted.ec.heal {} \;


This command is specific for a dispersed volume. It won't do anything
(aside from the error you are seeing) on a replicated volume.

I think you are using a replicated volume, right ?

In this case I'm not sure what can be happening. Is your volume a pure
replicated one or a distributed-replicated ? on a pure replicated it
doesn't make sense that some entries do not show in an 'ls' when the
file is in both replicas (at least without any error message in the
logs). On a distributed-replicated it could be caused by some problem
while combining contents of each replica set.

What's the configuration of your volume ?

Xavi



I get a bunch of errors for operation not supported:
[root@gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n
trusted.ec.heal {} \;
find: warning: the -d option is deprecated; please use -depth instead,
because the latter is a POSIX-compliant feature.
wks_backup/homer_backup/backup: trusted.ec.heal: Operation not 
supported
wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: 
Operation

not supported
wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: 
Operation

not supported
wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: 
Operation

not supported
wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: 
Operation

not supported
wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: 
Operation

not supported
wks_backup/homer_backup/logs: trusted.ec.heal: Operation not supported
wks_backup/homer_backup: trusted.ec.heal: Operation not supported
-- Original Message --
From: "Benjamin Turner" mailto:bennytu...@gmail.com>>
To: "David F. Robinson" mailto:david.robin...@corvidtec.com>>
Cc: "Gluster Devel" mailto:gluster-devel@gluster.org>>; "gluster-us...@gluster.org"
mailto:gluster-us...@gluster.org>>
Sent: 2/3/2015 7:12:34 PM
Subject: Re: [Gluster-devel] missing files
It sounds to me like the files were only copied to one replica, 
werent

there for the initial ls which triggered a self heal,
and were there for the last ls because they were healed. Is there any
chance that one of the replicas was down during the rsync? It could
be that you lost a brick during copy or something like that. To
confirm I would look for disconnects in the brick logs as well as
checking glusterfshd.log to verify the missing files were actu

Re: [Gluster-devel] [Gluster-users] missing files

2015-02-05 Thread Pranith Kumar Karampuri


On 02/05/2015 03:48 PM, Pranith Kumar Karampuri wrote:
I believe David already fixed this. I hope this is the same issue he 
told about permissions issue.

Oops, it is not. I will take a look.

Pranith


Pranith
On 02/05/2015 03:44 PM, Xavier Hernandez wrote:

Is the failure repeatable ? with the same directories ?

It's very weird that the directories appear on the volume when you do 
an 'ls' on the bricks. Could it be that you only made a single 'ls' 
on fuse mount which not showed the directory ? Is it possible that 
this 'ls' triggered a self-heal that repaired the problem, whatever 
it was, and when you did another 'ls' on the fuse mount after the 
'ls' on the bricks, the directories were there ?


The first 'ls' could have healed the files, causing that the 
following 'ls' on the bricks showed the files as if nothing were 
damaged. If that's the case, it's possible that there were some 
disconnections during the copy.


Added Pranith because he knows better replication and self-heal details.

Xavi

On 02/04/2015 07:23 PM, David F. Robinson wrote:

Distributed/replicated

Volume Name: homegfs
Type: Distributed-Replicate
Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
Options Reconfigured:
performance.io-thread-count: 32
performance.cache-size: 128MB
performance.write-behind-window-size: 128MB
server.allow-insecure: on
network.ping-timeout: 10
storage.owner-gid: 100
geo-replication.indexing: off
geo-replication.ignore-pid-check: on
changelog.changelog: on
changelog.fsync-interval: 3
changelog.rollover-time: 15
server.manage-gids: on


-- Original Message --
From: "Xavier Hernandez" 
To: "David F. Robinson" ; "Benjamin
Turner" 
Cc: "gluster-us...@gluster.org" ; "Gluster
Devel" 
Sent: 2/4/2015 6:03:45 AM
Subject: Re: [Gluster-devel] missing files


On 02/04/2015 01:30 AM, David F. Robinson wrote:

Sorry. Thought about this a little more. I should have been clearer.
The files were on both bricks of the replica, not just one side. So,
both bricks had to have been up... The files/directories just 
don't show

up on the mount.
I was reading and saw a related bug
(https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it
suggested to run:
 find  -d -exec getfattr -h -n trusted.ec.heal {} \;


This command is specific for a dispersed volume. It won't do anything
(aside from the error you are seeing) on a replicated volume.

I think you are using a replicated volume, right ?

In this case I'm not sure what can be happening. Is your volume a pure
replicated one or a distributed-replicated ? on a pure replicated it
doesn't make sense that some entries do not show in an 'ls' when the
file is in both replicas (at least without any error message in the
logs). On a distributed-replicated it could be caused by some problem
while combining contents of each replica set.

What's the configuration of your volume ?

Xavi



I get a bunch of errors for operation not supported:
[root@gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n
trusted.ec.heal {} \;
find: warning: the -d option is deprecated; please use -depth 
instead,

because the latter is a POSIX-compliant feature.
wks_backup/homer_backup/backup: trusted.ec.heal: Operation not 
supported
wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: 
Operation

not supported
wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: 
Operation

not supported
wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: 
Operation

not supported
wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: 
Operation

not supported
wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: 
Operation

not supported
wks_backup/homer_backup/logs: trusted.ec.heal: Operation not 
supported

wks_backup/homer_backup: trusted.ec.heal: Operation not supported
-- Original Message --
From: "Benjamin Turner" mailto:bennytu...@gmail.com>>
To: "David F. Robinson" mailto:david.robin...@corvidtec.com>>
Cc: "Gluster Devel" mailto:gluster-devel@gluster.org>>; "gluster-us...@gluster.org"
mailto:gluster-us...@gluster.org>>
Sent: 2/3/2015 7:12:34 PM
Subject: Re: [Gluster-devel] missing files
It sounds to me like the files were only copied to one replica, 
werent
there for the initial for the initial ls which triggered

Re: [Gluster-devel] failed heal

2015-02-05 Thread Pranith Kumar Karampuri


- Original Message -
> From: "Niels de Vos" 
> To: "Pranith Kumar Karampuri" 
> Cc: gluster-us...@gluster.org, "Gluster Devel" 
> Sent: Friday, February 6, 2015 2:32:36 AM
> Subject: Re: [Gluster-devel] failed heal
> 
> On Thu, Feb 05, 2015 at 11:21:58AM +0530, Pranith Kumar Karampuri wrote:
> > 
> > On 02/04/2015 11:52 PM, David F. Robinson wrote:
> > >I don't recall if that was before or after my upgrade.
> > >I'll forward you an email thread for the current heal issues which are
> > >after the 3.6.2 upgrade...
> > This is executed after the upgrade on just one machine. 3.6.2 entry locks
> > are not compatible with versions <= 3.5.3 and 3.6.1 that is the reason.
> > From
> > 3.5.4 and releases >=3.6.2 it should work fine.
> 
> Oh, I was not aware of this requirement. Does it mean we should not mix
> deployments with these versions (what about 3.4?) any longer? 3.5.4 has
> not been released yet, so anyone with a mixed 3.5/3.6.2 environment will
> hit these issues? Is this only for the self-heal daemon, or are the
> triggered/stat self-heal procedures affected too?
> 
> It should be noted *very* clearly in the release notes, and I think an
> announcement (email+blog) as a warning/reminder would be good. Could you
> get some details and advice written down, please?
Will do today.

Pranith
> 
> Thanks,
> Niels
> 
> 
> > 
> > Pranith
> > >David
> > >-- Original Message --
> > >From: "Pranith Kumar Karampuri"  > ><mailto:pkara...@redhat.com>>
> > >To: "David F. Robinson"  > ><mailto:david.robin...@corvidtec.com>>; "gluster-us...@gluster.org"
> > >mailto:gluster-us...@gluster.org>>; "Gluster
> > >Devel" mailto:gluster-devel@gluster.org>>
> > >Sent: 2/4/2015 2:33:20 AM
> > >Subject: Re: [Gluster-devel] failed heal
> > >>
> > >>On 02/02/2015 03:34 AM, David F. Robinson wrote:
> > >>>I have several files that gluster says it cannot heal. I deleted the
> > >>>files from all of the bricks
> > >>>(/data/brick0*/hpc_shared/motorsports/gmics/Raven/p3/*) and ran a full
> > >>>heal using 'gluster volume heal homegfs full'.  Even after the full
> > >>>heal, the entries below still show up.
> > >>>How do I clear these?
> > >>3.6.1 Had an issue where files undergoing I/O will also be shown in the
> > >>output of 'gluster volume heal  info', we addressed that in
> > >>3.6.2. Is this output from 3.6.1 by any chance?
> > >>
> > >>Pranith
> > >>>[root@gfs01a ~]# gluster volume heal homegfs info
> > >>>Gathering list of entries to be healed on volume homegfs has been
> > >>>successful
> > >>>Brick gfsib01a.corvidtec.com:/data/brick01a/homegfs
> > >>>Number of entries: 10
> > >>>/hpc_shared/motorsports/gmics/Raven/p3/70_rke/Movies
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>/hpc_shared/motorsports/gmics/Raven/p3/70_rke/.Convrg.swp
> > >>>/hpc_shared/motorsports/gmics/Raven/p3/70_rke
> > >>>Brick gfsib01b.corvidtec.com:/data/brick01b/homegfs
> > >>>Number of entries: 2
> > >>>
> > >>>/hpc_shared/motorsports/gmics/Raven/p3/70_rke
> > >>>Brick gfsib01a.corvidtec.com:/data/brick02a/homegfs
> > >>>Number of entries: 7
> > >>>
> > >>>/hpc_shared/motorsports/gmics/Raven/p3/70_rke/PICTURES/.tmpcheck
> > >>>/hpc_shared/motorsports/gmics/Raven/p3/70_rke/PICTURES
> > >>>/hpc_shared/motorsports/gmics/Raven/p3/70_rke/Movies
> > >>>
> > >>>
> > >>>
> > >>>Brick gfsib01b.corvidtec.com:/data/brick02b/homegfs
> > >>>Number of entries: 0
> > >>>Brick gfsib02a.corvidtec.com:/data/brick01a/homegfs
> > >>>Number of entries: 0
> > >>>Brick gfsib02b.corvidtec.com:/data/brick01b/homegfs
> > >>>Number of entries: 0
> > >>>Brick gfsib02a.corvidtec.com:/data/brick02a/homegfs
> > >>>Number of entries: 0
> > >>>Brick gfsib02b.corvidtec.com:/data/brick02b/homegfs
> > >>>Number of entries: 0
> > >>>===
> > >>>David F. Robinson, Ph.D.
> > >>>President - Corvid Technologies
> > >>>704.799.6944 x101 [office]
> > >>>704.252.1310 [cell]
> > >>>704.799.7974 [fax]
> > >>>david.robin...@corvidtec.com <mailto:david.robin...@corvidtec.com>
> > >>>http://www.corvidtechnologies.com <http://www.corvidtechnologies.com/>
> > >>>
> > >>>
> > >>>___
> > >>>Gluster-devel mailing list
> > >>>Gluster-devel@gluster.org
> > >>>http://www.gluster.org/mailman/listinfo/gluster-devel
> > >>
> > 
> 
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] missing files

2015-02-05 Thread Pranith Kumar Karampuri


- Original Message -
> From: "Ben Turner" 
> To: "Pranith Kumar Karampuri" , "David F. Robinson" 
> 
> Cc: "Xavier Hernandez" , "Benjamin Turner" 
> , gluster-us...@gluster.org,
> "Gluster Devel" 
> Sent: Friday, February 6, 2015 3:25:28 AM
> Subject: Re: [Gluster-users] [Gluster-devel] missing files
> 
> - Original Message -
> > From: "Pranith Kumar Karampuri" 
> > To: "Xavier Hernandez" , "David F. Robinson"
> > , "Benjamin Turner"
> > 
> > Cc: gluster-us...@gluster.org, "Gluster Devel" 
> > Sent: Thursday, February 5, 2015 5:30:04 AM
> > Subject: Re: [Gluster-users] [Gluster-devel] missing files
> > 
> > 
> > On 02/05/2015 03:48 PM, Pranith Kumar Karampuri wrote:
> > > I believe David already fixed this. I hope this is the same issue he
> > > told about permissions issue.
> > Oops, it is not. I will take a look.
> 
> Yes David exactly like these:
> 
> data-brick02a-homegfs.log:[2015-02-03 19:09:34.568842] I
> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection
> from
> gfs02a.corvidtec.com-18563-2015/02/03-19:07:58:519134-homegfs-client-2-0-0
> data-brick02a-homegfs.log:[2015-02-03 19:09:41.286551] I
> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection
> from
> gfs01a.corvidtec.com-12804-2015/02/03-19:09:38:497808-homegfs-client-2-0-0
> data-brick02a-homegfs.log:[2015-02-03 19:16:35.906412] I
> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection
> from
> gfs02b.corvidtec.com-27190-2015/02/03-19:15:53:458467-homegfs-client-2-0-0
> data-brick02a-homegfs.log:[2015-02-03 19:51:22.761293] I
> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection
> from
> gfs01a.corvidtec.com-25926-2015/02/03-19:51:02:89070-homegfs-client-2-0-0
> data-brick02a-homegfs.log:[2015-02-03 20:54:02.772180] I
> [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection
> from
> gfs01b.corvidtec.com-4175-2015/02/02-16:44:31:179119-homegfs-client-2-0-1
> 
> You can 100% verify my theory if you can correlate the time on the
> disconnects to the time that the missing files were healed.  Can you have a
> look at /var/log/glusterfs/glustershd.log?  That has all of the healed files
> + timestamps, if we can see a disconnect during the rsync and a self heal of
> the missing file I think we can safely assume that the disconnects may have
> caused this.  I'll try this on my test systems, how much data did you rsync?
> What size ish of files / an idea of the dir layout?
> 
> @Pranith - Could bricks flapping up and down during the rsync cause the files
> to be missing on the first ls(written to 1 subvol but not the other cause it
> was down), the ls triggered SH, and thats why the files were there for the
> second ls be a possible cause here?

No, that would be a bug. AFR should serve the directory contents from the brick 
with those files.

> 
> -b
> 
>  
> > Pranith
> > >
> > > Pranith
> > > On 02/05/2015 03:44 PM, Xavier Hernandez wrote:
> > >> Is the failure repeatable ? with the same directories ?
> > >>
> > >> It's very weird that the directories appear on the volume when you do
> > >> an 'ls' on the bricks. Could it be that you only made a single 'ls'
> > >> on fuse mount which not showed the directory ? Is it possible that
> > >> this 'ls' triggered a self-heal that repaired the problem, whatever
> > >> it was, and when you did another 'ls' on the fuse mount after the
> > >> 'ls' on the bricks, the directories were there ?
> > >>
> > >> The first 'ls' could have healed the files, causing that the
> > >> following 'ls' on the bricks showed the files as if nothing were
> > >> damaged. If that's the case, it's possible that there were some
> > >> disconnections during the copy.
> > >>
> > >> Added Pranith because he knows better replication and self-heal details.
> > >>
> > >> Xavi
> > >>
> > >> On 02/04/2015 07:23 PM, David F. Robinson wrote:
> > >>> Distributed/replicated
> > >>>
> > >>> Volume Name: homegfs
> > >>> Type: Distributed-Replicate
> > >>> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
> > >>> Status: Started
> > >>> Number of Bricks: 4 x 2 = 8
> > >>> Transport-type: tcp
> > >>> Bric

Re: [Gluster-devel] missing files

2015-02-05 Thread Pranith Kumar Karampuri
- 2 sgilbert sbir 2974120 Jan 22 09:15 FEASABILITY STUDY.docx
-rwxrw 2 streadway sbir 3826704 Jan 21 14:57 FEASABILITY STUDY.one

/data/brick02b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: 


total 0
drwxrws--- 2 root root 10 Feb 4 18:12 .
drwxrws--x 6 root root 95 Feb 4 18:12 ..

[root@gfs02a ~]# ls -alR 
/data/brick0*/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References
/data/brick01a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: 


total 0
drwxrws--- 3 root root 41 Feb 4 18:12 .
drwxrws--x 7 root root 118 Feb 4 18:12 ..
drwxrws--- 2 streadway sbir 80 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR

/data/brick01a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR: 


total 72
drwxrws--- 2 streadway sbir 80 Jan 23 14:46 .
drwxrws--- 3 root root 41 Feb 4 18:12 ..
-rwxrw 2 streadway sbir 17248 Jun 19 2014 COMPARISON OF SOLUTIONS.one
-rwxrw 2 streadway sbir 49736 Jan 21 13:18 GIVEN TRADE SPACE.one

/data/brick02a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: 


total 0
drwxrws--- 3 root root 41 Feb 4 18:12 .
drwxrws--x 7 root root 118 Feb 4 18:12 ..
drwxrws--- 2 streadway sbir 79 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR

/data/brick02a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR: 


total 84
drwxrws--- 2 streadway sbir 79 Jan 23 14:46 .
drwxrws--- 3 root root 41 Feb 4 18:12 ..
-rwxrw 2 streadway sbir 42440 Jun 19 2014 ARMOR PACKAGES.one
-rwxrw 2 streadway sbir 38184 Jun 19 2014 CURRENT STANDARD 
ARMORING.one


[root@gfs02b ~]# ls -alR 
/data/brick0*/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References
/data/brick01b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: 


total 0
drwxrws--- 3 root root 41 Feb 4 18:12 .
drwxrws--x 7 root root 118 Feb 4 18:12 ..
drwxrws--- 2 streadway sbir 80 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR

/data/brick01b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR: 


total 72
drwxrws--- 2 streadway sbir 80 Jan 23 14:46 .
drwxrws--- 3 root root 41 Feb 4 18:12 ..
-rwxrw 2 streadway sbir 17248 Jun 19 2014 COMPARISON OF SOLUTIONS.one
-rwxrw 2 streadway sbir 49736 Jan 21 13:18 GIVEN TRADE SPACE.one

/data/brick02b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: 


total 0
drwxrws--- 3 root root 41 Feb 4 18:12 .
drwxrws--x 7 root root 118 Feb 4 18:12 ..
drwxrws--- 2 streadway sbir 79 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR

/data/brick02b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR: 


total 84
drwxrws--- 2 streadway sbir 79 Jan 23 14:46 .
drwxrws--- 3 root root 41 Feb 4 18:12 ..
-rwxrw 2 streadway sbir 42440 Jun 19 2014 ARMOR PACKAGES.one
-rwxrw 2 streadway sbir 38184 Jun 19 2014 CURRENT STANDARD 
ARMORING.one






-- Original Message --
From: "Xavier Hernandez" 
To: "David F. Robinson" ; "Benjamin 
Turner" ; "Pranith Kumar Karampuri" 

Cc: "gluster-us...@gluster.org" ; "Gluster 
Devel" 

Sent: 2/5/2015 5:14:22 AM
Subject: Re: [Gluster-devel] missing files


Is the failure repeatable ? with the same directories ?

It's very weird that the directories appear on the volume when you do 
an 'ls' on the bricks. Could it be that you only made a single 'ls' 
on fuse mount which not showed the directory ? Is it possible that 
this 'ls' triggered a self-heal that repaired the problem, whatever 
it was, and when you did another 'ls' on the fuse mount after the 
'ls' on the bricks, the directories were there ?


The first 'ls' could have healed the files, causing that the 
following 'ls' on the bricks showed the files as if nothing were 
damaged. If that's the case, it's possible that there were some 
disconnections during the copy.


Added Pranith because he knows better replication and self-heal details.

Xavi

On 02/04/2015 07:23 PM, David F. Robinson wrote:

Distributed/replicated

Volume Name: homegfs
Type: Distributed-Replicate
Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
Brick8: gfsib02b.corvidtec.com:/data/bri

Re: [Gluster-devel] GlusterFS 4.0 Call For Participation

2015-02-11 Thread Pranith Kumar Karampuri


On 02/10/2015 03:42 AM, Jeff Darcy wrote:

Interest in 4.0 seems to be increasing.  So is developer activity, but
all of the developers involved in 4.0 are stretched a bit thin.  As a
result, some sub-projects still don't have anyone who's working on them
often enough to make significant progress.  The full list is here:

http://www.gluster.org/community/documentation/index.php/Planning40

In particular, the following sub-projects could benefit from more
volunteers:

* Multi-network support

* Composite operations (small-file performance)
I was thinking of an xlator that does something similar. I will be happy 
to take this part, though only after February; is that fine?


Pranith


* All of the "other" stuff except code generation

I'm not going to pretend that any of these will be easy to pick up, but
I'd be glad to work with any volunteers to establish the necessary
knowledge baseline.  If you want to get in early and make your mark on
the codebase that will eventually replace some of that hoary old 3.x
cruft, please respond here or let me know some other way.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Fw: Re[2]: missing files

2015-02-11 Thread Pranith Kumar Karampuri


On 02/11/2015 08:36 AM, Shyam wrote:

Did some analysis with David today on this here is a gist for the list,

1) Volumes classified as slow (i.e with a lot of pre-existing data) 
and fast (new volumes carved from the same backend file system that 
slow bricks are on, with little or no data)


2) We ran an strace of tar and also collected io-stats outputs from 
these volumes, both show that create and mkdir is slower on slow as 
compared to the fast volume. This seems to be the overall reason for 
slowness.
Did you happen to do an strace of the brick when this happened? If not, 
David, can we get that information as well?


Pranith


3) The tarball extraction is to a new directory on the gluster mount, 
so all lookups etc. happen within this new name space on the volume


4) Checked memory footprints of the slow bricks and fast bricks etc. 
nothing untoward noticed there


5) Restarted the slow volume, just as a test case to do things from 
scratch, no improvement in performance.


Currently attempting to reproduce this on a local system to see if the 
same behavior is seen so that it becomes easier to debug etc.


Others on the list can chime in as they see fit.

Thanks,
Shyam

On 02/10/2015 09:58 AM, David F. Robinson wrote:

Forwarding to devel list as recommended by Justin...

David


-- Forwarded Message --
From: "David F. Robinson" 
To: "Justin Clift" 
Sent: 2/10/2015 9:49:09 AM
Subject: Re[2]: [Gluster-devel] missing files

Bad news... I don't think it is the old linkto files. Bad because if
that was the issue, cleaning up all of bad linkto files would have fixed
the issue. It seems like the system just gets slower as you add data.

First, I setup a new clean volume (test2brick) on the same system as the
old one (homegfs_bkp). See 'gluster v info' below. I ran my simple tar
extraction test on the new volume and it took 58-seconds to complete
(which, BTW, is 10-seconds faster than my old non-gluster system, so
kudos). The time on homegfs_bkp is 19-minutes.

Next, I copied 10-terabytes of data over to test2brick and re-ran the
test which then took 7-minutes. I created a test3brick and ran the test
and it took 53-seconds.

To confirm all of this, I deleted all of the data from test2brick and
re-ran the test. It took 51-seconds!!!

BTW. I also checked the .glusterfs for stale linkto files (find . -type
f -size 0 -perm 1000 -exec ls -al {} \;). There are many, many thousands
of these types of files on the old volume and none on the new one, so I
don't think this is related to the performance issue.

Let me know how I should proceed. Send this to devel list? Pranith?
others? Thanks...

[root@gfs01bkp .glusterfs]# gluster volume info homegfs_bkp
Volume Name: homegfs_bkp
Type: Distribute
Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp

[root@gfs01bkp .glusterfs]# gluster volume info test2brick
Volume Name: test2brick
Type: Distribute
Volume ID: 123259b2-3c61-4277-a7e8-27c7ec15e550
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test2brick
Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test2brick

[root@gfs01bkp glusterfs]# gluster volume info test3brick
Volume Name: test3brick
Type: Distribute
Volume ID: 9b1613fc-f7e5-4325-8f94-e3611a5c3701
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test3brick
Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test3brick


 From homegfs_bkp:
# find . -type f -size 0 -perm 1000 -exec ls -al {} \;
T 2 gmathur pme_ics 0 Jan 9 16:59
./00/16/00169a69-1a7a-44c9-b2d8-991671ee87c4
-T 3 jcowan users 0 Jan 9 17:51
./00/16/0016a0a0-fd22-4fb5-b6fb-5d7f9024ab74
-T 2 morourke sbir 0 Jan 9 18:17
./00/16/0016b36f-32fc-4f2c-accd-e36be2f6c602
-T 2 carpentr irl 0 Jan 9 18:52
./00/16/00163faf-741c-4e40-8081-784786b3cc71
-T 3 601 raven 0 Jan 9 22:49
./00/16/00163385-a332-4050-8104-1b1af6cd8249
-T 3 bangell sbir 0 Jan 9 22:56
./00/16/00167803-0244-46de-8246-d9c382dd3083
-T 2 morourke sbir 0 Jan 9 23:17
./00/16/00167bc5-fc56-42ee-9e3f-1e238f3828f4
-T 3 morourke sbir 0 Jan 9 23:34
./00/16/0016a71e-89cf-4a86-9575-49c7e9d216c6
-T 2 gmathur users 0 Jan 9 23:47
./00/16/00168aa2-d069-4a77-8790-e36431324ca5
-T 2 bangell users 0 Jan 22 09:24
./00/16/0016e720-a190-4e43-962f-aa3e4216e5f5
-T 2 root root 0 Jan 22 09:26
./00/16/00169e95-64b7-455c-82dc-d9940ee7fe43
-T 2 dfrobins users 0 Jan 22 09:27
./00/16/00161b04-1612-4fba-99a4-2a2b54062fdb
-T 2 mdick users 0 Jan 22 09:27
./00/16/0016ba60-310a-4bee-968a-36eb290e8c9e
-T 2 dfrobins users 0 Jan 22 09:43
./00/16/00160315-1533-4290-8c1a-72e2fbb1962a
 From test2brick:
find . -type f -size 0 -perm 1000 -exec ls -al {} \;






Re: [Gluster-devel] Fw: Re[2]: missing files

2015-02-11 Thread Pranith Kumar Karampuri


On 02/11/2015 06:49 PM, Pranith Kumar Karampuri wrote:


On 02/11/2015 08:36 AM, Shyam wrote:

Did some analysis with David today on this here is a gist for the list,

1) Volumes classified as slow (i.e with a lot of pre-existing data) 
and fast (new volumes carved from the same backend file system that 
slow bricks are on, with little or no data)


2) We ran an strace of tar and also collected io-stats outputs from 
these volumes, both show that create and mkdir is slower on slow as 
compared to the fast volume. This seems to be the overall reason for 
slowness.
Did you happen to do strace of the brick when this happened? If not, 
David, can we get that information as well?
It would be nice to compare the difference in syscalls on the bricks of the 
two volumes to see if there are any extra syscalls that are adding to the 
delay.


Pranith


Pranith


3) The tarball extraction is to a new directory on the gluster mount, 
so all lookups etc. happen within this new name space on the volume


4) Checked memory footprints of the slow bricks and fast bricks etc. 
nothing untoward noticed there


5) Restarted the slow volume, just as a test case to do things from 
scratch, no improvement in performance.


Currently attempting to reproduce this on a local system to see if 
the same behavior is seen so that it becomes easier to debug etc.


Others on the list can chime in as they see fit.

Thanks,
Shyam

On 02/10/2015 09:58 AM, David F. Robinson wrote:

Forwarding to devel list as recommended by Justin...

David


-- Forwarded Message --
From: "David F. Robinson" 
To: "Justin Clift" 
Sent: 2/10/2015 9:49:09 AM
Subject: Re[2]: [Gluster-devel] missing files

Bad news... I don't think it is the old linkto files. Bad because if
that was the issue, cleaning up all of bad linkto files would have 
fixed

the issue. It seems like the system just gets slower as you add data.

First, I setup a new clean volume (test2brick) on the same system as 
the

old one (homegfs_bkp). See 'gluster v info' below. I ran my simple tar
extraction test on the new volume and it took 58-seconds to complete
(which, BTW, is 10-seconds faster than my old non-gluster system, so
kudos). The time on homegfs_bkp is 19-minutes.

Next, I copied 10-terabytes of data over to test2brick and re-ran the
test which then took 7-minutes. I created a test3brick and ran the test
and it took 53-seconds.

To confirm all of this, I deleted all of the data from test2brick and
re-ran the test. It took 51-seconds!!!

BTW. I also checked the .glusterfs for stale linkto files (find . -type
f -size 0 -perm 1000 -exec ls -al {} \;). There are many, many 
thousands

of these types of files on the old volume and none on the new one, so I
don't think this is related to the performance issue.

Let me know how I should proceed. Send this to devel list? Pranith?
others? Thanks...

[root@gfs01bkp .glusterfs]# gluster volume info homegfs_bkp
Volume Name: homegfs_bkp
Type: Distribute
Volume ID: 96de8872-d957-4205-bf5a-076e3f35b294
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/homegfs_bkp
Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/homegfs_bkp

[root@gfs01bkp .glusterfs]# gluster volume info test2brick
Volume Name: test2brick
Type: Distribute
Volume ID: 123259b2-3c61-4277-a7e8-27c7ec15e550
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test2brick
Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test2brick

[root@gfs01bkp glusterfs]# gluster volume info test3brick
Volume Name: test3brick
Type: Distribute
Volume ID: 9b1613fc-f7e5-4325-8f94-e3611a5c3701
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/test3brick
Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/test3brick


 From homegfs_bkp:
# find . -type f -size 0 -perm 1000 -exec ls -al {} \;
T 2 gmathur pme_ics 0 Jan 9 16:59
./00/16/00169a69-1a7a-44c9-b2d8-991671ee87c4
-T 3 jcowan users 0 Jan 9 17:51
./00/16/0016a0a0-fd22-4fb5-b6fb-5d7f9024ab74
-T 2 morourke sbir 0 Jan 9 18:17
./00/16/0016b36f-32fc-4f2c-accd-e36be2f6c602
-T 2 carpentr irl 0 Jan 9 18:52
./00/16/00163faf-741c-4e40-8081-784786b3cc71
-T 3 601 raven 0 Jan 9 22:49
./00/16/00163385-a332-4050-8104-1b1af6cd8249
-T 3 bangell sbir 0 Jan 9 22:56
./00/16/00167803-0244-46de-8246-d9c382dd3083
-T 2 morourke sbir 0 Jan 9 23:17
./00/16/00167bc5-fc56-42ee-9e3f-1e238f3828f4
-T 3 morourke sbir 0 Jan 9 23:34
./00/16/0016a71e-89cf-4a86-9575-49c7e9d216c6
-T 2 gmathur users 0 Jan 9 23:47
./00/16/00168aa2-d069-4a77-8790-e36431324ca5
-T 2 bangell users 0 Jan 22 09:24
./00/16/0016e720-a190-4e43-962f-aa3e4216e5f5
-T 2 root root 0 Jan 22 09:26
./00/16/00169e95-64b7-455c-82dc-d9940ee7fe43
-T 2 dfrobins users 0 Jan 22 09:27
./00/16/00161b04-1612-4

Re: [Gluster-devel] missing files

2015-02-12 Thread Pranith Kumar Karampuri


On 02/12/2015 09:14 AM, Justin Clift wrote:

On 12 Feb 2015, at 03:02, Shyam  wrote:

On 02/11/2015 08:28 AM, David F. Robinson wrote:

My base filesystem has 40-TB and the tar takes 19 minutes. I copied over 10-TB 
and it took the tar extraction from 1-minute to 7-minutes.

My suspicion is that it is related to number of files and not necessarily file 
size. Shyam is looking into reproducing this behavior on a redhat system.

I am able to reproduce the issue on a similar setup internally (at least at the 
surface it seems to be similar to what David is facing).

I will continue the investigation for the root cause.
Here is the initial analysis from my investigation. (Thanks for providing
me with the setup, Shyam; please keep the setup, we may need it for further
analysis.)


On bad volume:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ---
  0.00   0.00 us   0.00 us   0.00 us 937104  FORGET
  0.00   0.00 us   0.00 us   0.00 us 872478 RELEASE
  0.00   0.00 us   0.00 us   0.00 us  23668   RELEASEDIR
  0.00  41.86 us  23.00 us  86.00 us     92   STAT
  0.01  39.40 us  24.00 us 104.00 us 218  STATFS
  0.28  55.99 us  43.00 us1152.00 us 4065SETXATTR
  0.58  56.89 us  25.00 us4505.00 us 8236 OPENDIR
  0.73  26.80 us  11.00 us 257.00 us 22238   FLUSH
  0.77 152.83 us  92.00 us8819.00 us 4065   RMDIR
  2.57  62.00 us  21.00 us 409.00 us 33643   WRITE
  5.46 199.16 us 108.00 us  469938.00 us 22238  UNLINK
  6.70  69.83 us  43.00 us.00 us 77809  LOOKUP
  6.97 447.60 us  21.00 us   54875.00 us 12631READDIRP
  7.73  79.42 us  33.00 us1535.00 us 78909 SETATTR
 14.112815.00 us 176.00 us 2106305.00 us 4065   MKDIR
 54.091972.62 us 138.00 us 1520773.00 us 22238  CREATE

On good volume:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ---
  0.00   0.00 us   0.00 us   0.00 us 58870  FORGET
  0.00   0.00 us   0.00 us   0.00 us 66016 RELEASE
  0.00   0.00 us   0.00 us   0.00 us  16480   RELEASEDIR
  0.00  61.50 us  58.00 us  65.00 us      2   OPEN
  0.01  39.56 us  16.00 us 112.00 us     71   STAT
  0.02  41.29 us  27.00 us  79.00 us 163  STATFS
  0.03  36.06 us  17.00 us  98.00 us 301   FSTAT
  0.79  62.38 us  39.00 us 269.00 us 4065SETXATTR
  1.14 242.99 us  25.00 us   28636.00 us 1497READ
  1.54  59.76 us  25.00 us6325.00 us 8236 OPENDIR
  1.70 133.75 us  89.00 us 374.00 us 4065   RMDIR
  2.25  32.65 us  15.00 us 265.00 us 22006   FLUSH
  3.37 265.05 us 172.00 us2349.00 us 4065   MKDIR
  7.14  68.34 us  21.00 us   21902.00 us 33357   WRITE
 11.00 159.68 us 107.00 us2567.00 us 22003  UNLINK
 13.82 200.54 us 133.00 us   21762.00 us 22003  CREATE
 17.85 448.85 us  22.00 us   54046.00 us 12697READDIRP
 18.37  76.12 us  45.00 us 294.00 us 77044  LOOKUP
 20.95  85.54 us  35.00 us1404.00 us 78204 SETATTR

As we can see here, FORGET/RELEASE counts are far higher on the brick from
the full volume than on the brick from the empty volume. This seems to
suggest that the inode table on the volume with lots of data is carrying
too many passive inodes, which need to be displaced to create new ones.
I need to check whether these forgets happen in the fop path. I will
continue my investigation and let you know.


Pranith

Thanks Shyam. :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] missing files

2015-02-12 Thread Pranith Kumar Karampuri


On 02/12/2015 03:05 PM, Pranith Kumar Karampuri wrote:


On 02/12/2015 09:14 AM, Justin Clift wrote:

On 12 Feb 2015, at 03:02, Shyam  wrote:

On 02/11/2015 08:28 AM, David F. Robinson wrote:
My base filesystem has 40-TB and the tar takes 19 minutes. I copied 
over 10-TB and it took the tar extraction from 1-minute to 7-minutes.


My suspicion is that it is related to number of files and not 
necessarily file size. Shyam is looking into reproducing this 
behavior on a redhat system.
I am able to reproduce the issue on a similar setup internally (at 
least at the surface it seems to be similar to what David is facing).


I will continue the investigation for the root cause.
Here is the initial analysis of my investigation: (Thanks for 
providing me with the setup shyam, keep the setup we may need it for 
further analysis)


On bad volume:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ---
  0.00   0.00 us   0.00 us   0.00 us 937104 FORGET
  0.00   0.00 us   0.00 us   0.00 us 872478 RELEASE
  0.00   0.00 us   0.00 us   0.00 us 23668 RELEASEDIR
  0.00  41.86 us  23.00 us  86.00 us 92 STAT
  0.01  39.40 us  24.00 us 104.00 us 218 STATFS
  0.28  55.99 us  43.00 us1152.00 us 4065 SETXATTR
  0.58  56.89 us  25.00 us4505.00 us 8236 OPENDIR
  0.73  26.80 us  11.00 us 257.00 us 22238 FLUSH
  0.77 152.83 us  92.00 us8819.00 us 4065 RMDIR
  2.57  62.00 us  21.00 us 409.00 us 33643 WRITE
  5.46 199.16 us 108.00 us  469938.00 us 22238 UNLINK
  6.70  69.83 us  43.00 us.00 us 77809 LOOKUP
  6.97 447.60 us  21.00 us   54875.00 us 12631 READDIRP
  7.73  79.42 us  33.00 us1535.00 us 78909 SETATTR
 14.112815.00 us 176.00 us 2106305.00 us 4065 MKDIR
 54.091972.62 us 138.00 us 1520773.00 us 22238 CREATE

On good volume:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ---
  0.00   0.00 us   0.00 us   0.00 us 58870 FORGET
  0.00   0.00 us   0.00 us   0.00 us 66016 RELEASE
  0.00   0.00 us   0.00 us   0.00 us 16480 RELEASEDIR
  0.00  61.50 us  58.00 us  65.00 us 2OPEN
  0.01  39.56 us  16.00 us 112.00 us 71 STAT
  0.02  41.29 us  27.00 us  79.00 us 163 STATFS
  0.03  36.06 us  17.00 us  98.00 us 301 FSTAT
  0.79  62.38 us  39.00 us 269.00 us 4065 SETXATTR
  1.14 242.99 us  25.00 us   28636.00 us 1497 READ
  1.54  59.76 us  25.00 us6325.00 us 8236 OPENDIR
  1.70 133.75 us  89.00 us 374.00 us 4065 RMDIR
  2.25  32.65 us  15.00 us 265.00 us 22006 FLUSH
  3.37 265.05 us 172.00 us2349.00 us 4065 MKDIR
  7.14  68.34 us  21.00 us   21902.00 us 33357 WRITE
 11.00 159.68 us 107.00 us2567.00 us 22003 UNLINK
 13.82 200.54 us 133.00 us   21762.00 us 22003 CREATE
 17.85 448.85 us  22.00 us   54046.00 us 12697 READDIRP
 18.37  76.12 us  45.00 us 294.00 us 77044 LOOKUP
 20.95  85.54 us  35.00 us1404.00 us 78204 SETATTR

As we can see here, FORGET/RELEASE are way more in the brick from full 
volume compared to the brick from empty volume. It seems to suggest 
that the inode-table on the volume with lots of data is carrying too 
many passive inodes in the table which need to be displaced to create 
new ones. Need to check if they come in the fop-path. Need to continue 
my investigations further, will let you know.
Just to increase confidence, I performed one more test: stopped the volumes
and restarted them. Now the numbers on both volumes are almost the same:


[root@gqac031 gluster-mount]# time rm -rf boost_1_57_0 ; time tar xf 
boost_1_57_0.tar.gz


real1m15.074s
user0m0.550s
sys 0m4.656s

real2m46.866s
user0m5.347s
sys 0m16.047s

[root@gqac031 gluster-mount]# cd /gluster-emptyvol/
[root@gqac031 gluster-emptyvol]# ls
boost_1_57_0.tar.gz
[root@gqac031 gluster-emptyvol]# time tar xf boost_1_57_0.tar.gz

real2m31.467s
user0m5.475s
sys 0m15.471s

gqas015.sbu.lab.eng.bos.redhat.com:testvol on /gluster-mount type 
fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
gqas015.sbu.lab.eng.bos.redhat.com:emotyvol on /gluster-emptyvol type 
fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)


Pranith


Pranith

Thanks Shyam. :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

Re: [Gluster-devel] missing files

2015-02-12 Thread Pranith Kumar Karampuri


On 02/12/2015 04:52 PM, Pranith Kumar Karampuri wrote:


On 02/12/2015 03:05 PM, Pranith Kumar Karampuri wrote:


On 02/12/2015 09:14 AM, Justin Clift wrote:

On 12 Feb 2015, at 03:02, Shyam  wrote:

On 02/11/2015 08:28 AM, David F. Robinson wrote:
My base filesystem has 40-TB and the tar takes 19 minutes. I 
copied over 10-TB and it took the tar extraction from 1-minute to 
7-minutes.


My suspicion is that it is related to number of files and not 
necessarily file size. Shyam is looking into reproducing this 
behavior on a redhat system.
I am able to reproduce the issue on a similar setup internally (at 
least at the surface it seems to be similar to what David is facing).


I will continue the investigation for the root cause.
Here is the initial analysis of my investigation: (Thanks for 
providing me with the setup shyam, keep the setup we may need it for 
further analysis)


On bad volume:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ---
  0.00   0.00 us   0.00 us   0.00 us 937104 FORGET
  0.00   0.00 us   0.00 us   0.00 us 872478 RELEASE
  0.00   0.00 us   0.00 us   0.00 us 23668 RELEASEDIR
  0.00  41.86 us  23.00 us  86.00 us 92 STAT
  0.01  39.40 us  24.00 us 104.00 us 218 STATFS
  0.28  55.99 us  43.00 us1152.00 us 4065 SETXATTR
  0.58  56.89 us  25.00 us4505.00 us 8236 OPENDIR
  0.73  26.80 us  11.00 us 257.00 us 22238 FLUSH
  0.77 152.83 us  92.00 us8819.00 us 4065 RMDIR
  2.57  62.00 us  21.00 us 409.00 us 33643 WRITE
  5.46 199.16 us 108.00 us  469938.00 us 22238 UNLINK
  6.70  69.83 us  43.00 us.00 us 77809 LOOKUP
  6.97 447.60 us  21.00 us   54875.00 us 12631 READDIRP
  7.73  79.42 us  33.00 us1535.00 us 78909 SETATTR
 14.112815.00 us 176.00 us 2106305.00 us 4065 MKDIR
 54.091972.62 us 138.00 us 1520773.00 us 22238 CREATE

On good volume:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls   Fop
 ---------   -----------   -----------   -----------   ------------   ---
  0.00   0.00 us   0.00 us   0.00 us 58870 FORGET
  0.00   0.00 us   0.00 us   0.00 us 66016 RELEASE
  0.00   0.00 us   0.00 us   0.00 us 16480 RELEASEDIR
  0.00  61.50 us  58.00 us  65.00 us 2 OPEN
  0.01  39.56 us  16.00 us 112.00 us 71 STAT
  0.02  41.29 us  27.00 us  79.00 us 163 STATFS
  0.03  36.06 us  17.00 us  98.00 us 301 FSTAT
  0.79  62.38 us  39.00 us 269.00 us 4065 SETXATTR
  1.14 242.99 us  25.00 us   28636.00 us 1497 READ
  1.54  59.76 us  25.00 us6325.00 us 8236 OPENDIR
  1.70 133.75 us  89.00 us 374.00 us 4065 RMDIR
  2.25  32.65 us  15.00 us 265.00 us 22006 FLUSH
  3.37 265.05 us 172.00 us2349.00 us 4065 MKDIR
  7.14  68.34 us  21.00 us   21902.00 us 33357 WRITE
 11.00 159.68 us 107.00 us2567.00 us 22003 UNLINK
 13.82 200.54 us 133.00 us   21762.00 us 22003 CREATE
 17.85 448.85 us  22.00 us   54046.00 us 12697 READDIRP
 18.37  76.12 us  45.00 us 294.00 us 77044 LOOKUP
 20.95  85.54 us  35.00 us1404.00 us 78204 SETATTR

As we can see here, FORGET/RELEASE are way more in the brick from 
full volume compared to the brick from empty volume. It seems to 
suggest that the inode-table on the volume with lots of data is 
carrying too many passive inodes in the table which need to be 
displaced to create new ones. Need to check if they come in the 
fop-path. Need to continue my investigations further, will let you know.
Just to increase confidence performed one more test. Stopped the 
volumes and re-started. Now on both the volumes, the numbers are 
almost same:


[root@gqac031 gluster-mount]# time rm -rf boost_1_57_0 ; time tar xf 
boost_1_57_0.tar.gz


real1m15.074s
user0m0.550s
sys 0m4.656s

real2m46.866s
user0m5.347s
sys 0m16.047s

[root@gqac031 gluster-mount]# cd /gluster-emptyvol/
[root@gqac031 gluster-emptyvol]# ls
boost_1_57_0.tar.gz
[root@gqac031 gluster-emptyvol]# time tar xf boost_1_57_0.tar.gz

real2m31.467s
user0m5.475s
sys 0m15.471s

gqas015.sbu.lab.eng.bos.redhat.com:testvol on /gluster-mount type 
fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
gqas015.sbu.lab.eng.bos.redhat.com:emotyvol on /gluster-emptyvol type 
fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
I just checked that inode_link links the inode and calls 
inode_table_prune which triggers these inode_forgets as a synchronous 
operation in the fop path.
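A simplified sketch of that path (modelled on libglusterfs' inode.c; details
and error handling are elided, so treat this as an illustration rather than
the actual code):

/* Illustration of the path described above: inode_link() ends up calling
 * inode_table_prune(), so a lookup against a "full" inode table pays,
 * synchronously, for the forgets of the inodes being evicted. */
#include "inode.h"   /* glusterfs header; compiles only inside the source tree */

static inode_t *
inode_link_sketch (inode_t *inode, inode_t *parent, const char *name,
                   struct iatt *iatt)
{
        inode_table_t *table  = inode->table;
        inode_t       *linked = NULL;

        pthread_mutex_lock (&table->lock);
        {
                /* hash the inode (or return an already linked inode with
                 * the same gfid) and move it onto the lru list */
                linked = __inode_link (inode, parent, name, iatt);
        }
        pthread_mutex_unlock (&table->lock);

        /* if lru_size has crossed lru_limit, victims are picked from the
         * lru tail and forget() is issued for each of them right here, in
         * the caller's fop thread; with ~937k forgets on the "bad" brick
         * this is where the extra latency comes from */
        inode_table_prune (table);

        return linked;
}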


Pranith


Pranith


Pranith

Thanks Shyam

Re: [Gluster-devel] Problems with ec/nfs.t in regression tests

2015-02-12 Thread Pranith Kumar Karampuri


On 02/12/2015 08:15 PM, Xavier Hernandez wrote:

I've made some more investigation and the problem seems worse.

It seems that NFS sends a huge amount of requests without waiting for 
answers (I've had more than 1400 requests ongoing). Probably there 
will be many factors that can influence on the load that this causes, 
and one of them could be ec, but it's not related exclusively to ec. 
I've repeated the test using a replica 3 and a replica 2 volumes and 
the problem still happens.


The test basically writes a file to an NFS mount using 'dd'. The file 
has a size of 1GB. With a smaller file, the test passes successfully.
Using an NFS client and the gluster NFS server on the same machine with
big-file dd operations is known to cause hangs. anon-fd-quota.t used to give
similar problems, so we changed that test to not involve NFS mounts.


Pranith


One important thing to note is that I'm not using powerful servers (a 
dual core Intel Atom), but this problem shouldn't happen anyway. It 
can even happen on more powerful servers if they are busy doing other 
things (maybe this is what's happening on jenkins' slaves).


I think that this causes some NFS requests to timeout. This can be 
seen in /var/log/messages (there are many of these messages):


Feb 12 15:18:45 celler01 kernel: nfs: server gf01.datalab.es not 
responding, timed out


nfs log also has many errors:

[2015-02-12 14:18:45.132905] E [rpcsvc.c:1257:rpcsvc_submit_generic] 
0-rpc-service: failed to submit message (XID: 0x7be78dbe, Program: 
NFS3, ProgVers: 3, Proc: 7) to rpc

-transport (socket.nfs-server)
[2015-02-12 14:18:45.133009] E [nfs3.c:565:nfs3svc_submit_reply] 
0-nfs-nfsv3: Reply submission failed


Additionally this causes disconnections from NFS that are not 
correctly handled causing that a thread gets stuck in an infinite loop 
(I haven't analyzed this problem deeply, but it seems like an attempt 
to use an already disconnected socket). After a while, I get this 
error on the nfs log:


[2015-02-12 14:20:19.545429] C 
[rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-patchy-client-0: 
server 192.168.200.61:49152 has not responded in the last 42 seconds, 
disconnecting.


The console executing the test shows this (nfs.t is creating a replica 
3 instead of a dispersed volume):


# ./run-tests.sh tests/basic/ec/nfs.t

... GlusterFS Test Framework ...

Running tests in file ./tests/basic/ec/nfs.t
[14:12:52] ./tests/basic/ec/nfs.t .. 8/10 dd: error writing 
‘/mnt/nfs/0/test’: Input/output error

[14:12:52] ./tests/basic/ec/nfs.t .. 9/10
not ok 9
[14:12:52] ./tests/basic/ec/nfs.t .. Failed 1/10 subtests
[14:27:41]

Test Summary Report
---
./tests/basic/ec/nfs.t (Wstat: 0 Tests: 10 Failed: 1)
  Failed test:  9
Files=1, Tests=10, 889 wallclock secs ( 0.13 usr  0.02 sys +  1.29 
cusr  3.45 csys =  4.89 CPU)

Result: FAIL
Failed tests  ./tests/basic/ec/nfs.t

Note that the test takes almost 15 minutes to complete.

Is there any way to limit the number of requests NFS sends without 
having an answer ?


Xavi

On 02/11/2015 04:20 PM, Shyam wrote:

On 02/11/2015 09:40 AM, Xavier Hernandez wrote:

Hi,

it seems that there are some failures in ec/nfs.t test on regression
tests. Doing some investigation I've found that before applying the
multi-threaded patch (commit 5e25569e) the problem does not seem to
happen.


This has an interesting history of failures: on the regression runs for
the MT epoll this (i.e. ec/nfs.t) did not fail (there were others, but
not nfs.t).

The patch that allows configuration of MT epoll is where this started
failing around Feb 5th (but later passed). (see patchset 7 failures on,
http://review.gluster.org/#/c/9488/ )

I state the above, as it may help narrowing down the changes in EC
(maybe) that could have caused it.

Also in the latter commit, there was an error configuring the number of
threads so all regression runs would have run with a single epoll thread
(the MT epoll patch had this hard coded, so that would have run with 2
threads, but did not show up the issue (patch:
http://review.gluster.org/#/c/3842/)).

Again I state the above, as this should not be exposing a
race/bug/problem due to the multi threaded nature of epoll, but of
course needs investigation.



I'm not sure if this patch is the cause or it has revealed some bug in
ec or any other xlator.


I guess we can reproduce this issue? If so I would try setting
client.event-threads on master branch to 1, restarting the volume and
then running the test (as a part of the test itself maybe) to eliminate
the possibility that MT epoll is causing it.

My belief on MT epoll causing it is in doubt as the runs failed on the
http://review.gluster.org/#/c/9488/ (configuration patch), which had the
thread count as 1 due to a bug in that code.



I can try to identify it (any help will be appreciated), but it may 
take

some time. Would it be better to remove the test in the meantime ?


I am checking if this is reproducible on my machine, so that I can
possib

Re: [Gluster-devel] Problems with ec/nfs.t in regression tests

2015-02-12 Thread Pranith Kumar Karampuri


On 02/12/2015 11:34 PM, Pranith Kumar Karampuri wrote:


On 02/12/2015 08:15 PM, Xavier Hernandez wrote:

I've made some more investigation and the problem seems worse.

It seems that NFS sends a huge amount of requests without waiting for 
answers (I've had more than 1400 requests ongoing). Probably there 
will be many factors that can influence on the load that this causes, 
and one of them could be ec, but it's not related exclusively to ec. 
I've repeated the test using a replica 3 and a replica 2 volumes and 
the problem still happens.


The test basically writes a file to an NFS mount using 'dd'. The file 
has a size of 1GB. With a smaller file, the test passes successfully.
Using NFS client and gluster NFS server on same machine with BIG file 
dd operations is known to cause hangs. anon-fd-quota.t used to give 
similar problems so we changed the test to not involve NFS mounts.
I don't recollect the exact scenario. Avati found the memory-allocation
deadlock back in 2010, when I had just joined Gluster, and Raghavendra Bhat
raised the bug then. I have CCed him on the thread in case he knows the
exact scenario.


Pranith


Pranith


One important thing to note is that I'm not using powerful servers (a 
dual core Intel Atom), but this problem shouldn't happen anyway. It 
can even happen on more powerful servers if they are busy doing other 
things (maybe this is what's happening on jenkins' slaves).


I think that this causes some NFS requests to timeout. This can be 
seen in /var/log/messages (there are many of these messages):


Feb 12 15:18:45 celler01 kernel: nfs: server gf01.datalab.es not 
responding, timed out


nfs log also has many errors:

[2015-02-12 14:18:45.132905] E [rpcsvc.c:1257:rpcsvc_submit_generic] 
0-rpc-service: failed to submit message (XID: 0x7be78dbe, Program: 
NFS3, ProgVers: 3, Proc: 7) to rpc

-transport (socket.nfs-server)
[2015-02-12 14:18:45.133009] E [nfs3.c:565:nfs3svc_submit_reply] 
0-nfs-nfsv3: Reply submission failed


Additionally this causes disconnections from NFS that are not 
correctly handled causing that a thread gets stuck in an infinite 
loop (I haven't analyzed this problem deeply, but it seems like an 
attempt to use an already disconnected socket). After a while, I get 
this error on the nfs log:


[2015-02-12 14:20:19.545429] C 
[rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-patchy-client-0: 
server 192.168.200.61:49152 has not responded in the last 42 seconds, 
disconnecting.


The console executing the test shows this (nfs.t is creating a 
replica 3 instead of a dispersed volume):


# ./run-tests.sh tests/basic/ec/nfs.t

... GlusterFS Test Framework ...

Running tests in file ./tests/basic/ec/nfs.t
[14:12:52] ./tests/basic/ec/nfs.t .. 8/10 dd: error writing 
‘/mnt/nfs/0/test’: Input/output error

[14:12:52] ./tests/basic/ec/nfs.t .. 9/10
not ok 9
[14:12:52] ./tests/basic/ec/nfs.t .. Failed 1/10 subtests
[14:27:41]

Test Summary Report
---
./tests/basic/ec/nfs.t (Wstat: 0 Tests: 10 Failed: 1)
Failed test: 9
Files=1, Tests=10, 889 wallclock secs ( 0.13 usr 0.02 sys + 1.29 cusr 
3.45 csys = 4.89 CPU)

Result: FAIL
Failed tests ./tests/basic/ec/nfs.t

Note that the test takes almost 15 minutes to complete.

Is there any way to limit the number of requests NFS sends without 
having an answer ?


Xavi

On 02/11/2015 04:20 PM, Shyam wrote:

On 02/11/2015 09:40 AM, Xavier Hernandez wrote:

Hi,

it seems that there are some failures in ec/nfs.t test on regression
tests. Doing some investigation I've found that before applying the
multi-threaded patch (commit 5e25569e) the problem does not seem to
happen.


This has in interesting history in failures, on the regression runs for
the MT epoll this (i.e ec/nfs.t) did not fail (there were others, but
not nfs.t).

The patch that allows configuration of MT epoll is where this started
failing around Feb 5th (but later passed). (see patchset 7 failures on,
http://review.gluster.org/#/c/9488/ )

I state the above, as it may help narrowing down the changes in EC
(maybe) that could have caused it.

Also in the latter commit, there was an error configuring the number of
threads so all regression runs would have run with a single epoll 
thread

(the MT epoll patch had this hard coded, so that would have run with 2
threads, but did not show up the issue (patch:
http://review.gluster.org/#/c/3842/)).

Again I state the above, as this should not be exposing a
race/bug/problem due to the multi threaded nature of epoll, but of
course needs investigation.



I'm not sure if this patch is the cause or it has revealed some bug in
ec or any other xlator.


I guess we can reproduce this issue? If so I would try setting
client.event-threads on master branch to 1, restarting the volume and
then running the test (as a part of the test itself maybe) to eliminate
the possibility that MT epoll is causing it.

My belief on MT epoll causing it is

Re: [Gluster-devel] Problems with ec/nfs.t in regression tests

2015-02-12 Thread Pranith Kumar Karampuri


On 02/13/2015 12:07 AM, Niels de Vos wrote:

On Thu, Feb 12, 2015 at 11:39:51PM +0530, Pranith Kumar Karampuri wrote:

On 02/12/2015 11:34 PM, Pranith Kumar Karampuri wrote:

On 02/12/2015 08:15 PM, Xavier Hernandez wrote:

I've made some more investigation and the problem seems worse.

It seems that NFS sends a huge amount of requests without waiting for
answers (I've had more than 1400 requests ongoing). Probably there will
be many factors that can influence on the load that this causes, and one
of them could be ec, but it's not related exclusively to ec. I've
repeated the test using a replica 3 and a replica 2 volumes and the
problem still happens.

The test basically writes a file to an NFS mount using 'dd'. The file
has a size of 1GB. With a smaller file, the test passes successfully.

Using NFS client and gluster NFS server on same machine with BIG file dd
operations is known to cause hangs. anon-fd-quota.t used to give similar
problems so we changed the test to not involve NFS mounts.

I don't re-collect the exact scenario. Avati found the deadlock of memory
allocation, when I just joined gluster, in 2010. Raghavendra Bhat raised
this bug then. CCed him to the thread as well if he knows the exact
scenario.

This is a well know issue. When a system is under memory pressure, it
will try to flush dirty pages from the VFS. The NFS-client will send the
dirty pages over the network to the NFS-server. Unfortunately, the
NFS-server needs to allocate memory for the handling or the WRITE
procedures. This causes a loop and will most often get the system into a
hang situation.
Yes. This was it :-). Seems like Xavi and Shyam found the reason for the 
failure though, which is not this.


Pranith


Mounting with "-o sync", or flushing outstanding I/O from the client
side should normally be sufficient to prevent these issues.

Nice, didn't know about this.

Pranith


Niels


Pranith

Pranith

One important thing to note is that I'm not using powerful servers (a
dual core Intel Atom), but this problem shouldn't happen anyway. It can
even happen on more powerful servers if they are busy doing other things
(maybe this is what's happening on jenkins' slaves).

I think that this causes some NFS requests to timeout. This can be seen
in /var/log/messages (there are many of these messages):

Feb 12 15:18:45 celler01 kernel: nfs: server gf01.datalab.es not
responding, timed out

nfs log also has many errors:

[2015-02-12 14:18:45.132905] E [rpcsvc.c:1257:rpcsvc_submit_generic]
0-rpc-service: failed to submit message (XID: 0x7be78dbe, Program: NFS3,
ProgVers: 3, Proc: 7) to rpc
-transport (socket.nfs-server)
[2015-02-12 14:18:45.133009] E [nfs3.c:565:nfs3svc_submit_reply]
0-nfs-nfsv3: Reply submission failed

Additionally this causes disconnections from NFS that are not correctly
handled causing that a thread gets stuck in an infinite loop (I haven't
analyzed this problem deeply, but it seems like an attempt to use an
already disconnected socket). After a while, I get this error on the nfs
log:

[2015-02-12 14:20:19.545429] C
[rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-patchy-client-0:
server 192.168.200.61:49152 has not responded in the last 42 seconds,
disconnecting.

The console executing the test shows this (nfs.t is creating a replica 3
instead of a dispersed volume):

# ./run-tests.sh tests/basic/ec/nfs.t

... GlusterFS Test Framework ...

Running tests in file ./tests/basic/ec/nfs.t
[14:12:52] ./tests/basic/ec/nfs.t .. 8/10 dd: error writing
‘/mnt/nfs/0/test’: Input/output error
[14:12:52] ./tests/basic/ec/nfs.t .. 9/10
not ok 9
[14:12:52] ./tests/basic/ec/nfs.t .. Failed 1/10 subtests
[14:27:41]

Test Summary Report
---
./tests/basic/ec/nfs.t (Wstat: 0 Tests: 10 Failed: 1)
Failed test: 9
Files=1, Tests=10, 889 wallclock secs ( 0.13 usr 0.02 sys + 1.29 cusr
3.45 csys = 4.89 CPU)
Result: FAIL
Failed tests ./tests/basic/ec/nfs.t

Note that the test takes almost 15 minutes to complete.

Is there any way to limit the number of requests NFS sends without
having an answer ?

Xavi

On 02/11/2015 04:20 PM, Shyam wrote:

On 02/11/2015 09:40 AM, Xavier Hernandez wrote:

Hi,

it seems that there are some failures in ec/nfs.t test on regression
tests. Doing some investigation I've found that before applying the
multi-threaded patch (commit 5e25569e) the problem does not seem to
happen.

This has in interesting history in failures, on the regression runs for
the MT epoll this (i.e ec/nfs.t) did not fail (there were others, but
not nfs.t).

The patch that allows configuration of MT epoll is where this started
failing around Feb 5th (but later passed). (see patchset 7 failures on,
http://review.gluster.org/#/c/9488/ )

I state the above, as it may help narrowing down the changes in EC
(maybe) that could have caused it.

Also in the latter commit, there was an error configuring the number of
threads so all regression runs would

Re: [Gluster-devel] How can we prevent GlusterFS packaging installation/update issues in future?

2015-02-19 Thread Pranith Kumar Karampuri


On 02/19/2015 02:30 PM, Niels de Vos wrote:

Hey Pranith!

Thanks for putting this topic on my radar. Uncommunicated packaging
changes have indeed been a pain for non-RPM distributions on several
occasions. We should try to inform other packagers about required
changes in the packaging scripts or upgrade/installation process better.

On Thu, Feb 19, 2015 at 12:26:33PM +0530, Pranith Kumar Karampuri wrote:

https://bugzilla.redhat.com/show_bug.cgi?id=1113778
https://bugzilla.redhat.com/show_bug.cgi?id=1191176

How can we make the process of giving good packages for things other than
RPMs?

My guess is that we need to announce packaging changes very clearly.
Maybe it makes sense to have a very low-traffic packag...@gluster.org
mailinglist where all packagers from all distributions are subscribed?

I've added all packagers that I could track on CC, and am interested in
their preferences and ideas.

Thanks Niels!
The first thing we need to get going is making the other packaging
infrastructures handle gluster upgrades. That is, on upgrade glusterd needs
to be restarted: first it needs to start with the 'upgrade' option turned
on, so that it regenerates the configuration files and exits, and then
'glusterd' needs to be started again normally. How do we get this done in
other distributions, if it is not done already?


Pranith


Thanks,
Niels


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression host hung on tests/basic/afr/split-brain-healing.t

2015-02-26 Thread Pranith Kumar Karampuri


On 02/26/2015 02:54 AM, Justin Clift wrote:

Anyone have an interest in a regression test VM that's (presently) hung on
tests/basic/afr/split-brain-healing.t?  Likely to be a spurious error.

I can either reboot the VM and put it back into service, or I can leave it
for someone to log into and figure out why it's hung.

Trying to decide which way to go. :)

Justin,
I copied others who are working on afr as well so that they can 
take a look.


Pranith


+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Spurious failure report for master branch - 2015-03-03

2015-03-06 Thread Pranith Kumar Karampuri


On 03/04/2015 09:57 AM, Justin Clift wrote:

Ran 20 x regression tests on our GlusterFS master branch code
as of a few hours ago, commit 95d5e60afb29aedc29909340e7564d54a6a247c2.

5 of them were successful (25%), 15 of them failed in various ways
(75%).

We need to get this down to about 5% or less (preferably 0%), as it's
killing our development iteration speed.  We're wasting huge amounts
of time working around this. :(


Spurious failures
*

   * 5 x tests/bugs/distribute/bug-1117851.t
   (Wstat: 0 Tests: 24 Failed: 1)
 Failed test:  15

 This one is causing a 25% failure rate all by itself. :(

 This needs fixing soon. :)


   * 3 x tests/bugs/geo-replication/bug-877293.t
   (Wstat: 0 Tests: 15 Failed: 1)
 Failed test:  11

Nice catch by regression. Fix: http://review.gluster.org/9817

Pranith


   * 2 x tests/basic/afr/entry-self-heal.t  
   (Wstat: 0 Tests: 180 Failed: 2)
 Failed tests:  127-128

   * 1 x tests/basic/ec/ec-12-4.t   
   (Wstat: 0 Tests: 541 Failed: 2)
 Failed tests:  409, 441

   * 1 x tests/basic/fops-sanity.t  
   (Wstat: 0 Tests: 11 Failed: 1)
 Failed test:  10

   * 1 x tests/basic/uss.t  
   (Wstat: 0 Tests: 160 Failed: 1)
 Failed test:  26

   * 1 x tests/performance/open-behind.t
   (Wstat: 0 Tests: 17 Failed: 1)
 Failed test:  17

   * 1 x tests/bugs/distribute/bug-884455.t 
   (Wstat: 0 Tests: 22 Failed: 1)
 Failed test:  11

   * 1 x tests/bugs/fuse/bug-1126048.t  
   (Wstat: 0 Tests: 12 Failed: 1)
 Failed test:  10

   * 1 x tests/bugs/quota/bug-1038598.t 
   (Wstat: 0 Tests: 28 Failed: 1)
 Failed test:  28


2 x Coredumps
*

   * http://mirror.salasaga.org/gluster/master/2015-03-03/bulk5/

 IP - 104.130.74.142

 This coredump run also failed on:

   * tests/basic/fops-sanity.t  
   (Wstat: 0 Tests: 11 Failed: 1)
 Failed test:  10

   * tests/bugs/glusterfs-server/bug-861542.t   
   (Wstat: 0 Tests: 13 Failed: 1)
 Failed test:  10

   * tests/performance/open-behind.t
   (Wstat: 0 Tests: 17 Failed: 1)
 Failed test:  17

   * http://mirror.salasaga.org/gluster/master/2015-03-03/bulk8/

 IP - 104.130.74.143

 This coredump run also failed on:

   * tests/basic/afr/entry-self-heal.t  
   (Wstat: 0 Tests: 180 Failed: 2)
 Failed tests:  127-128

   * tests/bugs/glusterfs-server/bug-861542.t   
   (Wstat: 0 Tests: 13 Failed: 1)
 Failed test:  10

Both VMs are also online, in case they're useful to log into
for investigation (root / the jenkins slave pw).

If they're not, please let me know so I can blow them away. :)


1 x hung host
*

Hung on tests/bugs/posix/bug-1113960.t

root  12497  1290  0 Mar03 ?  S  0:00  \_ /bin/bash /opt/qa/regression.sh
root  12504 12497  0 Mar03 ?  S  0:00  \_ /bin/bash ./run-tests.sh
root  12519 12504  0 Mar03 ?  S  0:03  \_ /usr/bin/perl /usr/bin/prove 
-rf --timer ./tests
root  22018 12519  0 00:17 ?  S  0:00  \_ /bin/bash 
./tests/bugs/posix/bug-1113960.t
root  30002 22018  0 01:57 ?  S  0:00  \_ mv 
/mnt/glusterfs/0/longernamedir1/longernamedir2/longernamedir3/

This VM (23.253.53.111) is still online + untouched (still hung),
if someone wants to log in to investigate.  (root / the jenkins
slave pw)

Hope that's helpful. :)

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] tests/bugs/replicate/bug-918437-sh-mtime.t failing for a lot of regression runs

2015-03-11 Thread Pranith Kumar Karampuri


On 03/11/2015 11:16 AM, Kaushal M wrote:

Hey Pranith,
The above test is failing for a lot of (almost all) regression runs. I
think this is due to the changes introduced to the heal command. Can you
please take a look quickly?
Ravi already faced this issue; it seems to be something to do with NFS. KP
sent a patch yesterday: http://review.gluster.org/9851/


Pranith


~kaushal


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Responsibilities and expectations of our maintainers

2015-03-27 Thread Pranith Kumar Karampuri


On 03/25/2015 07:18 PM, Emmanuel Dreyfus wrote:

On Wed, Mar 25, 2015 at 02:04:10PM +0100, Niels de Vos wrote:

1. Who is going to maintain the new features?
2. Maintainers should be active in responding to users
3. What about reported bugs, there is the Bug Triaging in place?
4. Maintainers should keep an eye on open bugs affecting their component
5. Maintainers are expected to be responsive on patch reviews
6. Maintainers should try to attend IRC meetings

May I suggest a personnal item:
  7. Check your feature does not break NetBSD regression

NetBSD regression does not vote but is reported in gerrit. Please seek
help resolving breakage before merging.

Emmanuel,
What can we do to make it vote -2 when it fails? Things will 
automatically fall in place if it gives -2.


Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Responsibilities and expectations of our maintainers

2015-03-28 Thread Pranith Kumar Karampuri


On 03/28/2015 02:08 PM, Emmanuel Dreyfus wrote:

Pranith Kumar Karampuri  wrote:


Emmanuel,
  What can we do to make it vote -2 when it fails? Things will
automatically fall in place if it gives -2.

I will do this once I will have recovered. The changelog change broke
regression for weeks, and now we have a fix for it I discover many other
poblems have crop.
By which time some more problems may creep in; it will be a
chicken-and-egg problem. Force a -2. Everybody will work just on NetBSD for
a while, but after that things should be just like they are on Linux. It
would probably be a good idea to decide a date on which this forcing would
happen.


Pranith


While there, to anyone:
- dd bs=1M is not portable. Use
   dd bs=1024k
- echo 3 > /proc/sys/vm/drop_caches is not portable. use instead this
command that fails but flushes inodes first.
   ( cd $M0 && umount $M0 )
- umount $N0 brings many problems, use instead
   EXPECT_WITHIN $UMOUNT_TIMEOUT "Y" umount_nfs $N0




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] About split-brain-resolution.t

2015-03-28 Thread Pranith Kumar Karampuri


On 03/28/2015 09:51 PM, Emmanuel Dreyfus wrote:

Hi

I see split-brain-resolution.t uses attribute replica.split-brain-choice
to choose what replica should be used. This attribute is not in
privilegied space (trusted. prefixed). Is it on purpose?
Yes, these are used as internal commands to make a choice when a file is
in split-brain.


While there, just in case someone has an idea on this: on NetBSD setting
this attribute is ignored. The brick gets some request but does not set
the attribute. Not yet investigated, but just in case someone has an
idea...

Yes, it is treated as a command and not set on the file.




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] About split-brain-resolution.t

2015-03-30 Thread Pranith Kumar Karampuri


On 03/29/2015 02:23 PM, Emmanuel Dreyfus wrote:

Pranith Kumar Karampuri  wrote:


I see split-brain-resolution.t uses attribute replica.split-brain-choice
to choose what replica should be used. This attribute is not in
privilegied space (trusted. prefixed). Is it on purpose?

Yes, these are used as internal commands to make a choice when file is
in split-brain.

Here is how the feature is broken on NetBSD:
setting split-brain-resolution.t  causes afr inode context's spb_choice
to be set to the desired source. That works.

But when I try to read from the file, spb_choice is -1. This is because
in the meantime, the context has been destroyed by afr_destroy()
and reallocated with sbp_choice set to default value -1.

Since spb_choice is not saved as an attribute for the file on the
bricks, it cannot be recovered when the context is reallocated. Either
that "save" feature has been forgotten, or going to afr_destroy() here
is a bug. Here is the backtrace leading there:
This is a known issue :-(. I will need to talk to Anuradha about it; she is
not in today. I will let you know about the decision.


Pranith


0xbb751313 <_gf_msg_backtrace_nomem+0xc0> at 
/autobuild/install/lib/libglusterfs.so.0
0xb9b441af  at 
/autobuild/install/lib/glusterfs/3.7dev/xlator/cluster/replicate.so
0xbb771801  at 
/autobuild/install/lib/libglusterfs.so.0
0xbb7718af  at 
/autobuild/install/lib/libglusterfs.so.0
0xbb773a38  at 
/autobuild/install/lib/libglusterfs.so.0
0xbb771dd4  at /autobuild/install/lib/libglusterfs.so.0
0xbb277518  at 
/autobuild/install/lib/glusterfs/3.7dev/xlator/mount/fuse.so
0xbb277639  at 
/autobuild/install/lib/glusterfs/3.7dev/xlator/mount/fuse.so
0xbb28d70f  at 
/autobuild/install/lib/glusterfs/3.7dev/xlator/mount/fuse.so
0xbb6a2bca <__libc_thr_exit+0x1f8> at /usr/lib/libpthread.so.1





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] About split-brain-resolution.t

2015-03-30 Thread Pranith Kumar Karampuri


On 03/30/2015 06:34 PM, Emmanuel Dreyfus wrote:

Pranith Kumar Karampuri  wrote:


Since spb_choice is not saved as an attribute for the file on the
bricks, it cannot be recovered when the context is reallocated. Either
that "save" feature has been forgotten, or going to afr_destroy() here
is a bug. Here is the backtrace leading there:

This is a known issue :-(. I will need to talk to Anuradha once about
this issue. She is not in today. Will let you know about the decision.

It seems the issue arises because a thread quits and decides to clean up
stuff. Do we have an idea what this thread is? For the test to pass we
need to keep the thread alive.

Of course that works around a real problem. Why don't we immediately
clear the pending xattrs when replica.split-brain-choice is set? That would
clear the split-brain state.

This is how the feature is supposed to work:
https://github.com/gluster/glusterfs/blob/master/doc/features/heal-info-and-split-brain-resolution.md

Basically, the choice is given so that one can inspect the 'data' of the
file. Then one can finalize the choice, which will clear the pending xattrs
after resolving the split-brain.
The problem here is that the 'inode_forget' comes even before the user gets
to inspect the file. We initially thought we should 'ref' the inode when
the user specifies the choice and 'unref' it at the time of 'finalize' or
'abort' of the operation. But that may lead to unnecessary leaks when the
user forgets to either finalize or abort the operation. One way to get
around it is to ref the inode for some pre-determined time when the
'choice' is given.
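A minimal sketch of that 'ref on choice, unref after a pre-determined time'
idea; schedule_after() is a hypothetical stand-in for a timer facility and
the inode-context handling is elided, so this only shows the shape of the
workaround:

/* Sketch only: take a ref when the user sets replica.split-brain-choice so
 * the inode (and the afr context holding the choice) is not forgotten while
 * the user inspects the file, and drop the ref after a fixed window in case
 * finalize/abort never arrives.  schedule_after() is hypothetical. */

#define SPB_CHOICE_TIMEOUT_SEC 300        /* assumed window: 5 minutes */

static void
spb_choice_timeout_cbk (void *data)
{
        inode_t *inode = data;

        inode_unref (inode);              /* release the ref taken below */
}

static int
afr_set_split_brain_choice_sketch (xlator_t *this, inode_t *inode, int choice)
{
        /* record the chosen replica in the inode context (elided) */

        inode_ref (inode);                /* keep the context alive */
        schedule_after (SPB_CHOICE_TIMEOUT_SEC, spb_choice_timeout_cbk, inode);

        return 0;
}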


Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] About split-brain-resolution.t

2015-03-30 Thread Pranith Kumar Karampuri


On 03/30/2015 06:01 PM, Emmanuel Dreyfus wrote:

On Mon, Mar 30, 2015 at 05:44:23PM +0530, Pranith Kumar Karampuri wrote:

Problem here is that ' inode_forget' is coming even before it gets to
inspect the file. We initially thought we should 'ref' the inode when the
user specifies the choice and 'unref' it at the time of 'finalize' or
'abort' of the operation. But that may lead to un-necessary leaks when the
user forgets to do either finalize/abort the operation. One way to get
around it is to ref the inode for some 'pre-determined time' when 'choice'
is given.

That suggests the design is not finalized and the implementation is bound
to have unwanted behaviors. IMO the test should be retired until the design
and implementation are completed.
I will work with Anuradha tomorrow about this one and either send a 
patch to remove the .t file or send the fix which makes things right.


Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-02 Thread Pranith Kumar Karampuri


On 04/02/2015 12:27 AM, Raghavendra Talur wrote:



On Wed, Apr 1, 2015 at 10:34 PM, Justin Clift > wrote:


On 1 Apr 2015, at 10:57, Emmanuel Dreyfus mailto:m...@netbsd.org>> wrote:
> Hi
>
> crypt.t was recently broken in NetBSD regression. The glusterfs
returns
> a node with file type invalid to FUSE, and that breaks the test.
>
> After running a git bisect, I found the offending commit after which
> this behavior appeared:
>8a2e2b88fc21dc7879f838d18cd0413dd88023b7
>mem-pool: invalidate memory on GF_FREE to aid debugging
>
> This means the bug has always been there, but this debugging aid
> caused it to be reliable.

Sounds like that commit is a good win then. :)

Harsha/Pranith/Lala, your names are on the git blame for crypt.c...
any ideas? :)


I found one issue that local is not allocated using GF_CALLOC and with 
a mem-type.

This is a patch which *might* fix it.

diff --git a/xlators/encryption/crypt/src/crypt-mem-types.h 
b/xlators/encryption/crypt/src/crypt-mem-types.h

index 2eab921..c417b67 100644
--- a/xlators/encryption/crypt/src/crypt-mem-types.h
+++ b/xlators/encryption/crypt/src/crypt-mem-types.h
@@ -24,6 +24,7 @@ enum gf_crypt_mem_types_ {
gf_crypt_mt_key,
gf_crypt_mt_iovec,
gf_crypt_mt_char,
+gf_crypt_mt_local,
gf_crypt_mt_end,
 };
diff --git a/xlators/encryption/crypt/src/crypt.c 
b/xlators/encryption/crypt/src/crypt.c

index ae8cdb2..63c0977 100644
--- a/xlators/encryption/crypt/src/crypt.c
+++ b/xlators/encryption/crypt/src/crypt.c
@@ -48,7 +48,7 @@ static crypt_local_t *crypt_alloc_local(call_frame_t 
*frame, xlator_t *this,

 {
crypt_local_t *local = NULL;
-   local = mem_get0(this->local_pool);
+local = GF_CALLOC (sizeof (*local), 1, gf_crypt_mt_local);
local was getting its memory from the pool earlier (i.e. with mem_get0()),
which seems OK to me. Changing it this way will put a memory allocation in
the fop I/O path, which is why xlators generally use the mem-pool approach
instead.


Pranith

if (!local) {
gf_log(this->name, GF_LOG_ERROR, "out of memory");
return NULL;


Niels should be able to recognize if this is sufficient fix or not.

Thanks,
Raghavendra Talur

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift


___
Gluster-devel mailing list
Gluster-devel@gluster.org 
http://www.gluster.org/mailman/listinfo/gluster-devel




--
*Raghavendra Talur *



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-02 Thread Pranith Kumar Karampuri


On 04/02/2015 07:27 PM, Raghavendra Bhat wrote:

On Thursday 02 April 2015 05:50 PM, Jeff Darcy wrote:
I think, crypt xlator should do a mem_put of local after doing 
STACK_UNWIND

like other xlators which also use mem_get for local (such as AFR). I am
suspecting crypt not doing mem_put might be the reason for the bug
mentioned.

My understanding was that mem_put should be called automatically from
FRAME_DESTROY, which is itself called from STACK_DESTROY when the fop
completes (e.g. at FUSE or GFAPI).  On the other hand, I see that AFR
and others call mem_put themselves, without zeroing the local pointer.
In my (possibly no longer relevant) experience, freeing local myself
without zeroing the pointer would lead to a double free, and I don't
see why that's not the case here.  What am I missing?


As per my understanding, the xlators which get their local via mem_get
should do the following in the callback function just before unwinding:


1) save frame->local pointer (i.e. local = frame->local);
2) STACK_UNWIND
3) mem_put (local)

After STACK_UNWIND and before mem_put, any reference to an fd, inode, or
dict that might be present in the local should be unrefed (and any other
allocated resources in the local should be freed), so mem_put is done last.
To avoid a double free in FRAME_DESTROY, frame->local is set to NULL before
doing STACK_UNWIND.
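A sketch of that three-step pattern applied to a lookup callback (the
local field names are illustrative; crypt's actual local layout and the
exact fix may differ):

/* Sketch of the pattern described above; not the actual crypt fix. */
int32_t
crypt_lookup_cbk_sketch (call_frame_t *frame, void *cookie, xlator_t *this,
                         int32_t op_ret, int32_t op_errno, inode_t *inode,
                         struct iatt *buf, dict_t *xdata,
                         struct iatt *postparent)
{
        crypt_local_t *local = frame->local;

        /* 1) save the pointer and detach it so FRAME_DESTROY does not
         *    try to release it a second time */
        frame->local = NULL;

        /* 2) unwind first ... */
        STACK_UNWIND_STRICT (lookup, frame, op_ret, op_errno,
                             inode, buf, xdata, postparent);

        /* 3) ... then drop whatever the local still holds, mem_put last */
        if (local) {
                if (local->fd)
                        fd_unref (local->fd);
                if (local->xdata)
                        dict_unref (local->xdata);
                mem_put (local);
        }

        return 0;
}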


I suspect not doing 1 of the above three operations (may be either 1st 
or 3rd) in crypt xlator might be the reason for the bug.
I still don't understand why http://review.gluster.org/10109 is working.
Does anyone know the reason? How are you re-creating the crash? I ran
crypt.t but got no crashes on my laptop. Could someone help me re-create
this issue?


Pranith


Regards,
Raghavendra Bhat


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-02 Thread Pranith Kumar Karampuri


On 04/01/2015 03:27 PM, Emmanuel Dreyfus wrote:

Hi

crypt.t was recently broken in NetBSD regression. The glusterfs returns
a node with file type invalid to FUSE, and that breaks the test.

After running a git bisect, I found the offending commit after which
this behavior appeared:
 8a2e2b88fc21dc7879f838d18cd0413dd88023b7
 mem-pool: invalidate memory on GF_FREE to aid debugging

This means the bug has always been there, but this debugging aid
caused it to be reliable.

With the help of an assertion, I can detect when inode->ia_type gets
a corrupted value. It gives me this backtrace where in frame 4,
inode = 0xb9611880 and inode->ia_type = 12475 (which is wrong).
inode value comes from FUSE state->loc->inode and we get it from
frame 20 which is in crypt.c:

#4  0xb9bd2adf in mdc_inode_iatt_get (this=0xbb1df030,
 inode=0xb9611880, iatt=0xbf7fdfa0) at md-cache.c:471
#5  0xb9bd34e1 in mdc_lookup (frame=0xb9aa82b0, this=0xbb1df030,
 loc=0xb9608840, xdata=0x0) at md-cache.c:847
#6  0xb9bc216e in io_stats_lookup (frame=0xb9aa8200, this=0xbb1e0030,
 loc=0xb9608840, xdata=0x0) at io-stats.c:1934
#7  0xbb76755f in default_lookup (frame=0xb9aa8200, this=0xbb1d0030,
 loc=0xb9608840, xdata=0x0) at defaults.c:2138
#8  0xb9ba69cd in meta_lookup (frame=0xb9aa8200, this=0xbb1d0030,
 loc=0xb9608840, xdata=0x0) at meta.c:49
#9  0xbb277365 in fuse_lookup_resume (state=0xb9608830) at fuse-bridge.c:607
#10 0xbb276e07 in fuse_fop_resume (state=0xb9608830) at fuse-bridge.c:569
#11 0xbb274969 in fuse_resolve_done (state=0xb9608830) at fuse-resolve.c:644
#12 0xbb274a29 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:671
#13 0xbb274941 in fuse_resolve (state=0xb9608830) at fuse-resolve.c:635
#14 0xbb274a06 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:667
#15 0xbb274a8e in fuse_resolve_continue (state=0xb9608830) at fuse-resolve.c:687
#16 0xbb2731f4 in fuse_resolve_entry_cbk (frame=0xb9609688,
 cookie=0xb96140a0, this=0xbb193030, op_ret=0, op_errno=0,
 inode=0xb9611880, buf=0xb961e558, xattr=0xbb18a1a0,
 postparent=0xb961e628) at fuse-resolve.c:81
#17 0xb9bbd0c1 in io_stats_lookup_cbk (frame=0xb96140a0,
 cookie=0xb9614150, this=0xbb1e0030, op_ret=0, op_errno=0,
 inode=0xb9611880, buf=0xb961e558, xdata=0xbb18a1a0,
 postparent=0xb961e628) at io-stats.c:1512
#18 0xb9bd33ff in mdc_lookup_cbk (frame=0xb9614150, cookie=0xb9614410,
 this=0xbb1df030, op_ret=0, op_errno=0,
 inode=0xb9611880, stbuf=0xb961e558, dict=0xbb18a1a0,
  postparent=0xb961e628) at md-cache.c:816
#19 0xb9be2b10 in ioc_lookup_cbk (frame=0xb9614410, cookie=0xb96144c0,
 this=0xbb1de030, op_ret=0, op_errno=0,
 inode=0xb9611880, stbuf=0xb961e558, xdata=0xbb18a1a0,
 postparent=0xb961e628) at io-cache.c:260
#20 0xbb227fb5 in load_file_size (frame=0xb96144c0, cookie=0xb9aa8200,
 this=0xbb1db030, op_ret=0, op_errno=0,
 dict=0xbb18a470, xdata=0x0) at crypt.c:3830

In frame 20:
 case GF_FOP_LOOKUP:
STACK_UNWIND_STRICT(lookup,
frame,
op_ret,
op_errno,
op_ret >= 0 ? local->inode : NULL,
op_ret >= 0 ? &local->buf : NULL,
local->xdata,
op_ret >= 0 ? &local->postbuf : NULL);
  
Here is the problem, local->inode is not the 0xb9611880 value anymore,

which means local got corrupted:

(gdb) print local->inode
$2 = (inode_t *) 0x1db030de

I now suspect local has been freed, but I do not find where in crypt.c
this operation is done. There is a local = mem_get0(this->local_pool)
in crypt_alloc_local, but where is that structure freed? There is
no mem_put() call in crypt xlator.
I joined this thread after seeing Raghavendra Talur's patch which fixed the
issue, which seemed extremely odd to me. I just checked this mail from you:
local->inode in crypt need not be the same as state->loc->inode, because
inode_link in fuse_resolve_entry_cbk will return the address of an
already-linked inode with the same gfid if one exists. I see
hardlink-related commands in crypt.t, so this could be part of looking up an
extra link, maybe, which resolves to an older inode that is already linked.
It is still some memory problem, but it may not have anything to do with
crypt. Could you let me know the details of the setup where you saw this
issue? I can take a look.


Pranith





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] regarding sharding

2015-04-05 Thread Pranith Kumar Karampuri

hi,
  As I am not able to spend much time on sharding, Kritika is handling it
completely now; I am only doing reviews. Just letting everyone know so that
future communication can happen directly with the active developer :-).


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Possibly root cause for the Gluster regression test cores?

2015-04-08 Thread Pranith Kumar Karampuri


On 04/08/2015 06:20 PM, Justin Clift wrote:

Hi Pranith,

Hagarth mentioned in the weekly IRC meeting that you have an
idea what might be causing the regression tests to generate
cores?

Can you outline that quickly, as Jeff has some time and might
be able to help narrow it down further. :)

(and these core files are really annoying :/)
I feel it is a lot like
https://bugzilla.redhat.com/show_bug.cgi?id=1184417. The clear-locks command
is not handled properly after we did the client_t refactor. I believe that
is the reason for the crashes, but I could be wrong; after looking at the
code I feel there is a high probability that this is the issue. I didn't
find it easy to fix: we would need to change the lock structure list
maintenance heavily. An easier thing would be to disable the clear-locks
functionality tests in the regression, as it is not something that is used
by users IMO, and see if it indeed is the same issue. There are two tests
using this command:

18:34:00 :) ⚡ git grep clear-locks tests
tests/bugs/disperse/bug-1179050.t:TEST $CLI volume clear-locks $V0 / 
kind all inode
tests/bugs/glusterd/bug-824753-file-locker.c: "gluster volume 
clear-locks %s /%s kind all posix 0,7-1 |"


If it still fails even after disabling these two tests, then we will need
to look again. I think Jeff's patch, which will find the test that triggered
the core, should help here.


Pranith


Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Possibly root cause for the Gluster regression test cores?

2015-04-09 Thread Pranith Kumar Karampuri


On 04/08/2015 07:08 PM, Justin Clift wrote:

On 8 Apr 2015, at 14:13, Pranith Kumar Karampuri  wrote:

On 04/08/2015 06:20 PM, Justin Clift wrote:



Hagarth mentioned in the weekly IRC meeting that you have an
idea what might be causing the regression tests to generate
cores?

Can you outline that quickly, as Jeff has some time and might
be able to help narrow it down further. :)

(and these core files are really annoying :/)

I feel it is a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1184417. 
clear-locks command is not handled properly after we did the client_t refactor. 
I believe that is the reason for the crashes but I could be wrong. But After 
looking at the code I feel there is high probability that this is the issue. I 
didn't find it easy to fix. We will need to change the lock structure list 
maintenance heavily. Easier thing would be to disable clear-locks functionality 
tests in the regression as it is not something that is used by the users IMO 
and see if it indeed is the same issue. There are 2 tests using this command:
18:34:00 :) ⚡ git grep clear-locks tests
tests/bugs/disperse/bug-1179050.t:TEST $CLI volume clear-locks $V0 / kind all 
inode
tests/bugs/glusterd/bug-824753-file-locker.c: "gluster volume clear-locks %s /%s 
kind all posix 0,7-1 |"

If even after disabling these two tests it fails then we will need to look 
again. I think jeff's patch which will find the test which triggered the core 
should help here.

Thanks Pranith. :)

Is this other "problem when disconnecting" BZ possibly related, or is that a
different thing?

   https://bugzilla.redhat.com/show_bug.cgi?id=1195415

I feel 1195415 could be a duplicate of 1184417.

Pranith


+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] cluster syncop framework

2015-04-21 Thread Pranith Kumar Karampuri

hi,
For implementing directory healing in ec, I needed to generalize the
cluster syncop implementation done in afr-v2, which makes it easy to
implement something like self-heal. The patch is at
http://review.gluster.org/10240
Please feel free to let me know your comments.
http://review.gluster.org/10298 uses this framework to implement
directory/name self-heal in ec. Metadata self-heal in ec has also been
re-implemented using this framework.


Most important things to look at are the following macros:
FOP_ONLIST - Performs the fop on the list provided in parallel
FOP_SEQ - Performs the fop on the list provided sequentially
FOP_CBK - Common cbk implementation which stores the replies from each 
of the subvolumes.
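To make the intent of FOP_ONLIST/FOP_CBK concrete, here is a tiny
self-contained illustration (plain pthreads, deliberately not the real
macros or the syncop/barrier machinery from the patch) of winding one fop to
every subvolume in parallel and recording each reply:

/* Conceptual illustration only; the actual framework uses syncop barriers. */
#include <pthread.h>
#include <stdio.h>

#define NSUBVOLS 3

struct reply { int op_ret; int op_errno; };
struct task  { int subvol; struct reply *reply; };

/* stand-in for winding one fop to one subvolume */
static void *
wind_one (void *arg)
{
        struct task *t = arg;

        t->reply->op_ret   = 0;   /* pretend the fop succeeded */
        t->reply->op_errno = 0;
        return NULL;
}

int
main (void)
{
        pthread_t    threads[NSUBVOLS];
        struct reply replies[NSUBVOLS];
        struct task  tasks[NSUBVOLS];
        int          i;

        for (i = 0; i < NSUBVOLS; i++) {        /* FOP_ONLIST: parallel fan-out */
                tasks[i].subvol = i;
                tasks[i].reply  = &replies[i];
                pthread_create (&threads[i], NULL, wind_one, &tasks[i]);
        }
        for (i = 0; i < NSUBVOLS; i++)          /* wait for every reply */
                pthread_join (threads[i], NULL);

        for (i = 0; i < NSUBVOLS; i++)          /* FOP_CBK stored these per subvol */
                printf ("subvol %d: op_ret=%d op_errno=%d\n",
                        i, replies[i].op_ret, replies[i].op_errno);
        return 0;
}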


I have CCed the people who I know have used the barrier framework, which is
used to implement this.


One interesting thought for the future is to use this framework in the I/O
path and measure the performance difference. If the difference is not
large, we could probably use this heavily, because it makes things really
easy.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD regression status update

2015-04-29 Thread Pranith Kumar Karampuri


On 04/30/2015 08:44 AM, Emmanuel Dreyfus wrote:

Hi

Here is NetBSD regression status update for broken tests:

- tests/basic/afr/split-brain-resolution.t
Anuradha Talur is working on it, the change being still under review
http://review.gluster.org/10134

- tests/basic/ec/
This works, but with rare spurious failures. Nobody is working on it.
This is not specific to NetBSD; it also happens on Linux. I am looking 
into them one at a time (at the moment, ec-3-1.t). I will post updates.
On a related note, I see glupy is failing spuriously as well: 
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4080/consoleFull, 
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4007/consoleFull


Know anything about it?

Pranith
  
- tests/basic/quota-anon-fd-nfs.t

Jiffin Tony Thottan is working on it

- tests/basic/mgmt_v3-locks.t
This was fixed; the changes are awaiting merge:
http://review.gluster.org/10425
http://review.gluster.org/10426
  
- tests/basic/tier/tier.t

With the help of Dan Lambright, two bugs were fixed (change merged). A
third one awaits review for master (release-3.7 not yet submitted)
http://review.gluster.org/10411

NB: This change was merged on release-3.7 but not on master:
http://review.gluster.org/10407

- tests/bugs
Mostly uncharted territory; we will not work on it for release-3.7

- tests/geo-rep
I started investigating and await input from Kotresh Hiremath
Ravishankar.

- tests/features/trash.t
Anoop C S, Jiffin Tony Thottan and I fixed it, changes are merged.




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD regression status update

2015-04-29 Thread Pranith Kumar Karampuri


On 04/30/2015 10:02 AM, Emmanuel Dreyfus wrote:

Pranith Kumar Karampuri  wrote:


On a related note, I see glupy is failing spuriously as well:
Know anything about it?

glupy.t used to work and was broken quite recently. My investigation led
to a free on an invalid pointer, but I am not able to reproduce it
reliably.
Do you mind giving +1 from NetBSD regression side for 
http://review.gluster.com/10391 in that case?


Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] NetBSD regression status update

2015-04-29 Thread Pranith Kumar Karampuri


On 04/30/2015 10:18 AM, Emmanuel Dreyfus wrote:

Pranith Kumar Karampuri  wrote:


Do you mind giving +1 from NetBSD regression side for
http://review.gluster.com/10391 in that case?

Sure, but you can also rebase: glupy.t result is now ignored by
run-tests.sh (regardless of the OS).


Will do. Thanks

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] ec spurious regression failures

2015-04-30 Thread Pranith Kumar Karampuri

hi,
 I see that the ec tests are failing because of a 'df -h' test failure. 
It fails because df -h hits a stale quota aux mount, from the looks 
of it.


df: `/var/run/gluster/patchy': Transport endpoint is not connected 
-

[07:38:37] ./tests/basic/ec/ec-3-1.t .. not ok 11
 I see that the ec tests are failing on the 'df -h' check, which fails 
on the aux mount that quota creates.


I see from the test tests/bugs/quota/bug-1049323.t
gluster volume stop should stop the quota mount, so I sent 
http://review.gluster.org/10480


I think it is better to unmount it in 'cleanup' as well, but I am not 
sure how to get the 'run' directory in a generic way. On my laptop it is 
mounted on '/run/gluster/' instead of '/var/run/gluster/'
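
For what it's worth, a minimal sketch of such a cleanup step, assuming the 
aux mount only ever lives under /run/gluster or /var/run/gluster and that 
$V0 is the volume name used by the test framework (umount -l is the Linux 
lazy unmount; other platforms would fall back to a plain umount):

    for d in /run/gluster /var/run/gluster; do
        if [ -d "$d/$V0" ]; then
            umount -l "$d/$V0" 2>/dev/null || umount "$d/$V0" 2>/dev/null
        fi
    done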


Sachin/Vijai,
Could you guys do the necessary things to get this in 'cleanup'?

Thanks
Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious regression failures for ./tests/basic/fops-sanity.t

2015-05-01 Thread Pranith Kumar Karampuri

hi,
I see the following logs when the failure happens:
[2015-05-01 10:37:44.157477] E 
[dht-helper.c:900:dht_migration_complete_check_task] 0-patchy-dht: 
(null): failed to get the 'linkto' xattr No data available
[2015-05-01 10:37:44.157504] W [fuse-bridge.c:2190:fuse_readv_cbk] 
0-glusterfs-fuse: 25: READ => -1 (No data available)


Then the program fails with the following message:
read failed: No data available
read returning junk
fd based file operation 1 failed
read failed: No data available
read returning junk
fstat failed : No data available
fd based file operation 2 failed
read failed: No data available
read returning junk
dup fd based file operation failed
not ok 10

Could you let us know when this can happen and post a patch which will 
fix it? Please let us know who is going to fix it.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Spurious test failure in tests/bugs/distribute/bug-1122443.t

2015-05-01 Thread Pranith Kumar Karampuri

hi,
  Found the reason for this too:
ok 8
not ok 9 Got "in" instead of "completed"
FAILED COMMAND: completed remove_brick_status_completed_field patchy 
pranithk-laptop:/d/backends/patchy0
volume remove-brick commit: failed: use 'force' option as migration is 
in progress

not ok 10
FAILED COMMAND: gluster --mode=script --wignore volume remove-brick 
patchy pranithk-laptop:/d/backends/patchy0 commit

ok 11
ok 12
Failed 2/12 subtests

Test Summary Report
---
tests/bugs/distribute/bug-1122443.t (Wstat: 0 Tests: 12 Failed: 2)
  Failed tests:  9-10

Here is the fix:
http://review.gluster.org/10487
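
For context, one plausible shape for such a fix (an assumption, not a 
description of the patch itself) is to wait for the remove-brick status to 
reach 'completed' instead of checking it once, using the test framework's 
EXPECT_WITHIN helper and the usual $V0/$H0/$B0 variables; the 60-second 
timeout is arbitrary:

    # wait until migration finishes before committing the remove-brick
    EXPECT_WITHIN 60 "completed" remove_brick_status_completed_field \
            $V0 "$H0:$B0/${V0}0"
    # only then is the commit expected to succeed
    TEST $CLI volume remove-brick $V0 $H0:$B0/${V0}0 commit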

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] netbsd regression logs

2015-05-01 Thread Pranith Kumar Karampuri

hi Emmanuel,
 I was not able to re-create the glupy failure. I see that NetBSD 
is not archiving logs the way the Linux regression does. Do you mind adding 
that? I think Kaushal and Vijay did this for the Linux regressions, so CCing them.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious regression failures for ./tests/basic/fops-sanity.t

2015-05-01 Thread Pranith Kumar Karampuri


On 05/01/2015 10:05 PM, Nithya Balachandran wrote:

Hi,

Can you point me to a Jenkins run with this failure?
I don't have one. But it is very easy to re-create. Just run the 
following in your workspace

while prove -rfv tests/basic/fops-sanity.t; do :; done
At least on my machine this failed in 5-10 minutes. Very consistent 
failure :-)


Pranith


Regards,
Nithya



- Original Message -

From: "Pranith Kumar Karampuri" 
To: "Shyam" , "Raghavendra Gowdappa" , 
"Nithya Balachandran"
, "Susant Palai" 
Cc: "Gluster Devel" 
Sent: Friday, 1 May, 2015 5:07:12 PM
Subject: spurious regression failures for ./tests/basic/fops-sanity.t

hi,
  I see the following logs when the failure happens:
[2015-05-01 10:37:44.157477] E
[dht-helper.c:900:dht_migration_complete_check_task] 0-patchy-dht:
(null): failed to get the 'linkto' xattr No data available
[2015-05-01 10:37:44.157504] W [fuse-bridge.c:2190:fuse_readv_cbk]
0-glusterfs-fuse: 25: READ => -1 (No data available)

Then the program fails with following message:
read failed: No data available
read returning junk
fd based file operation 1 failed
read failed: No data available
read returning junk
fstat failed : No data available
fd based file operation 2 failed
read failed: No data available
read returning junk
dup fd based file operation failed
not ok 10

Could you let us know when this can happen and post a patch which will
fix it? Please let us know who is going to fix it.

Pranith



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] netbsd regression logs

2015-05-01 Thread Pranith Kumar Karampuri

Seems like a glusterd failure from the looks of it; +glusterd folks.

Running tests in file ./tests/basic/cdc.t
volume delete: patchy: failed: Another transaction is in progress for patchy. 
Please try again after sometime.
[18:16:40] ./tests/basic/cdc.t ..
not ok 52
not ok 53 Got "Started" instead of "Stopped"
not ok 54
not ok 55
Failed 4/55 subtests
[18:16:40]

Pranith

On 05/02/2015 01:23 AM, Emmanuel Dreyfus wrote:

Justin Clift  wrote:


They are archived, in /archives/logs/ on the regressions VM. It's just
that you have to get them through sftp.

Is it easy to add web access for them?

It was really easy:
http://nbslave76.cloud.gluster.org/archives/logs/glusterfs-logs-20150501182952.tgz

Now the script in Jenkins needs to be tweaked to give the URL instead of 
host:/path.
I am going offline; feel free to beat me to fixing this.



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failures in tests/basic/afr/sparse-file-self-heal.t

2015-05-01 Thread Pranith Kumar Karampuri

hi,
 As per the etherpad: 
https://public.pad.fsfe.org/p/gluster-spurious-failures


 * tests/basic/afr/sparse-file-self-heal.t (Wstat: 0 Tests: 64 Failed: 35)

 * Failed tests:  1-6, 11, 20-30, 33-34, 36, 41, 50-61, 64

 * Happens in master (Mon 30th March - git commit id
   3feaf1648528ff39e23748ac9004a77595460c9d)

 * (hasn't yet been added to BZs)

If glusterd itself fails to come up, of course the test will fail :-). 
Is it still happening?


Pranith


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in tests/basic/afr/sparse-file-self-heal.t

2015-05-02 Thread Pranith Kumar Karampuri


On 05/02/2015 10:14 AM, Krishnan Parthasarathi wrote:

If glusterd itself fails to come up, of course the test will fail :-). Is it
still happening?

Pranith,

Did you get a chance to see glusterd logs and find why glusterd didn't come up?
Please paste the relevant logs in this thread.

No :-(. The etherpad doesn't have any links :-(.
Justin, any help here?

Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] New glusterd crash, at least consistent on my laptop

2015-05-03 Thread Pranith Kumar Karampuri

Execute the following command on a replicate volume:
root@pranithk-laptop - ~
17:23:02 :( ⚡ gluster v set r2 cluster.client-log-level 0
Connection failed. Please check if gluster daemon is operational.

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0038e480c860 in pthread_spin_lock () from /lib64/libpthread.so.0
(gdb) bt
#0 0x0038e480c860 in pthread_spin_lock () from /lib64/libpthread.so.0
#1 0x7f80cb27ca32 in __gf_free (free_ptr=0x7f80a81ad760) at 
mem-pool.c:303
#2 0x7f80c197a13a in gd_sync_task_begin (op_ctx=0x7f80a800358c, 
req=0x7f80b0005ffc) at glusterd-syncop.c:1826
#3 0x7f80c197a1d0 in glusterd_op_begin_synctask (req=0x7f80b0005ffc, 
op=GD_OP_SET_VOLUME, dict=0x7f80a800358c) at glusterd-syncop.c:1846
#4 0x7f80c18dca39 in __glusterd_handle_set_volume 
(req=0x7f80b0005ffc) at glusterd-handler.c:1871
#5 0x7f80c18d784a in glusterd_big_locked_handler 
(req=0x7f80b0005ffc, actor_fn=0x7f80c18dc532 <__glusterd_handle_set_volume>)

at glusterd-handler.c:83
#6 0x7f80c18dcb2c in glusterd_handle_set_volume (req=0x7f80b0005ffc) 
at glusterd-handler.c:1893
#7 0x7f80cb28ef11 in synctask_wrap (old_task=0x7f80b0006b00) at 
syncop.c:375

#8 0x0038e4047a00 in ?? () from /lib64/libc.so.6
#9 0x in ?? ()


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] regarding spurious failure tests/bugs/snapshot/bug-1162498.t

2015-05-03 Thread Pranith Kumar Karampuri

hi Vijai,
  I am not sure if you are maintaining this now, but git blame 
gives your name, so I am sending this mail to you. Could you please take a 
look at 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8148/consoleFull 
where the failure happened. If someone else is looking into this, please 
add him/her to the thread.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Need help with snapshot regression failures

2015-05-03 Thread Pranith Kumar Karampuri

hi Rajesh/Avra,
 I do not have a good understanding of snapshot, so I couldn't 
investigate any of the snapshot-related spurious failures present in 
https://public.pad.fsfe.org/p/gluster-spurious-failures. Could you guys 
help out?


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Need help with snapshot regression failures

2015-05-04 Thread Pranith Kumar Karampuri


On 05/04/2015 01:44 PM, Avra Sengupta wrote:

Hi Pranith,

Could you please provide a regression instance where the snapshot 
tests failed. I had a look at 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8148/consoleFull 
but the logs for bug-1162498.t are not present for that instance. Similarly, 
other instances recorded in the etherpad either don't have the regression 
instance, or the logs for those instances are not present.

Do 'while prove -rfv ; do :; done'
After some runs you will see the failure. For the test that took the 
longest, I had to wait around half an hour.


Pranith


Regards,
Avra

On 05/04/2015 11:27 AM, Pranith Kumar Karampuri wrote:

hi Rajesh/Avra,
 I do not have good understanding of snapshot, so couldn't 
investigate any of the snapshot related spurious failures present in 
https://public.pad.fsfe.org/p/gluster-spurious-failures. Could you 
guys help out?


Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-04 Thread Pranith Kumar Karampuri


On 05/05/2015 12:58 AM, Justin Clift wrote:

On 4 May 2015, at 08:06, Vijay Bellur  wrote:

Hi All,

There has been a spate of regression test failures (due to broken tests or race 
conditions showing up) in the recent past [1] and I am inclined to block 3.7.0 
GA along with acceptance of patches until we fix *all* regression test 
failures. We seem to have reached a point where this seems to be the only way 
to restore sanity to our regression runs.

I plan to put this into effect 24 hours from now i.e. around 0700 UTC on 05/05. 
Thoughts?

Please do this. :)
What happened to the NetBSD setup connection? A lot of them are failing with: 
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4528/console


Pranith


+ Justin



Thanks,
Vijay

[1] https://public.pad.fsfe.org/p/gluster-spurious-failures
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] bitrot spurious test tests/bugs/bitrot/1207029-bitrot-daemon-should-start-on-valid-node.t

2015-05-04 Thread Pranith Kumar Karampuri

hi,
 I fixed it along with the patch on which this test failed 
@http://review.gluster.org/10391. Letting everyone know in case they 
face the same issue.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-04 Thread Pranith Kumar Karampuri


On 05/05/2015 06:12 AM, Pranith Kumar Karampuri wrote:


On 05/05/2015 12:58 AM, Justin Clift wrote:

On 4 May 2015, at 08:06, Vijay Bellur  wrote:

Hi All,

There has been a spate of regression test failures (due to broken 
tests or race conditions showing up) in the recent past [1] and I am 
inclined to block 3.7.0 GA along with acceptance of patches until we 
fix *all* regression test failures. We seem to have reached a point 
where this seems to be the only way to restore sanity to our 
regression runs.


I plan to put this into effect 24 hours from now i.e. around 0700 
UTC on 05/05. Thoughts?

Please do this. :)
What happened to NetBSD setup connection? Lot of them are failing 
with: 
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4528/console
Jeff's patch failed again with the same problem: 
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console


Pranith


Pranith


+ Justin



Thanks,
Vijay

[1] https://public.pad.fsfe.org/p/gluster-spurious-failures
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-04 Thread Pranith Kumar Karampuri
Just saw two more failures in the same place for the NetBSD regressions. I 
am ignoring the NetBSD status for the test fixes for now. I am not sure how 
this needs to be fixed. Please help!


Pranith
On 05/05/2015 07:17 AM, Pranith Kumar Karampuri wrote:


On 05/05/2015 06:12 AM, Pranith Kumar Karampuri wrote:


On 05/05/2015 12:58 AM, Justin Clift wrote:

On 4 May 2015, at 08:06, Vijay Bellur  wrote:

Hi All,

There has been a spate of regression test failures (due to broken 
tests or race conditions showing up) in the recent past [1] and I 
am inclined to block 3.7.0 GA along with acceptance of patches 
until we fix *all* regression test failures. We seem to have 
reached a point where this seems to be the only way to restore 
sanity to our regression runs.


I plan to put this into effect 24 hours from now i.e. around 0700 
UTC on 05/05. Thoughts?

Please do this. :)
What happened to NetBSD setup connection? Lot of them are failing 
with: 
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4528/console
Jeff's patch failed again with same problem: 
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console


Pranith


Pranith


+ Justin



Thanks,
Vijay

[1] https://public.pad.fsfe.org/p/gluster-spurious-failures
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression test failures - Call for Action

2015-05-04 Thread Pranith Kumar Karampuri


On 05/05/2015 08:10 AM, Jeff Darcy wrote:

Jeff's patch failed again with same problem:
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/4531/console

Wouldn't have expected anything different.  This one looks like a
problem in the Jenkins/Gerrit infrastructure.

Sorry for the miscommunication; I was referring to the same infra problem.

Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] ec spurious regression failures

2015-05-04 Thread Pranith Kumar Karampuri

Vijai/Sachin,
   Did you get a chance to work on this? 
http://review.gluster.com/10166 just failed again in ec, because 
http://review.gluster.org/10069, which was merged yesterday, can lead to 
the same problem. I sent http://review.gluster.org/10539 to address the 
issue for now. Please look into this.


Pranith
On 05/01/2015 11:30 AM, Pranith Kumar Karampuri wrote:

hi,
 I see that the ec tests are failing because of a 'df -h' test failure. 
It fails because df -h hits a stale quota aux mount, from the looks 
of it.


df: `/var/run/gluster/patchy': Transport endpoint is not connected 
<<<<-

[07:38:37] ./tests/basic/ec/ec-3-1.t .. not ok 11
 I see that the ec tests are failing on the 'df -h' check, which fails 
on the aux mount that quota creates.


I see from the test tests/bugs/quota/bug-1049323.t
gluster volume stop should stop the quota mount, so I sent 
http://review.gluster.org/10480


I think it is better to unmount it in 'cleanup' as well, but I am not 
sure how to get the 'run' directory in a generic way. On my laptop it 
is mounted on '/run/gluster/' instead of 
'/var/run/gluster/'


Sachin/Vijai,
Could you guys do the necessary things to get this in 'cleanup'?

Thanks
Pranith



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failure in quota-nfs.t

2015-05-04 Thread Pranith Kumar Karampuri

hi Vijai/Sachin,
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8268/console
Doesn't seem like an obvious failure. Know anything about it?

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] ec spurious regression failures

2015-05-04 Thread Pranith Kumar Karampuri


On 05/05/2015 08:39 AM, Pranith Kumar Karampuri wrote:

Vijai/Sachin,
   Did you get a chance to work on this? 
http://review.gluster.com/10166 just failed again in ec, because 
http://review.gluster.org/10069, which was merged yesterday, can lead to 
the same problem. I sent http://review.gluster.org/10539 to address the 
issue for now. Please look into this.
I sent http://review.gluster.org/10540 to address it completely. Not 
sure if it works on NetBSD. Emmanuel, help!


Pranith


Pranith
On 05/01/2015 11:30 AM, Pranith Kumar Karampuri wrote:

hi,
 I see that the ec tests are failing because of a 'df -h' test failure. 
It fails because df -h hits a stale quota aux mount, from the looks 
of it.


df: `/var/run/gluster/patchy': Transport endpoint is not connected 
<<<<-

[07:38:37] ./tests/basic/ec/ec-3-1.t .. not ok 11
 I see that the ec tests are failing on the 'df -h' check, which fails 
on the aux mount that quota creates.


I see from the test tests/bugs/quota/bug-1049323.t
gluster volume stop should stop the quota mount, so I sent 
http://review.gluster.org/10480


I think it is better to unmount it in 'cleanup' as well, but I am not 
sure how to get the 'run' directory in a generic way. On my laptop it 
is mounted on '/run/gluster/' instead of 
'/var/run/gluster/'


Sachin/Vijai,
Could you guys do the necessary things to get this in 'cleanup'?

Thanks
Pranith



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failure in tests/geo-rep/georep-rsync-changelog.t

2015-05-04 Thread Pranith Kumar Karampuri

hi,
   Doesn't seem like an obvious failure. It does say there is 
a version mismatch; I wonder how? Could you look into it?


Gluster version mismatch between master and slave.
Geo-replication session between master and slave21.cloud.gluster.org::slave 
does not exist.
[08:27:15] ./tests/geo-rep/georep-rsync-changelog.t ..
Dubious, test returned 1 (wstat 256, 0x100)
Failed 14/17 subtests

 * ./tests/geo-rep/georep-rsync-changelog.t (Wstat: 256 Tests: 3 Failed: 0)

 * Non-zero exit status: 1

 * http://build.gluster.org/job/rackspace-regression-2GB-triggered/8168/console

Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failures in tests/bugs/snapshot/bug-1112559.t

2015-05-04 Thread Pranith Kumar Karampuri

Avra,
  Is it reproducible on your setup? If not, do you want to move it 
to the end of the page in 
https://public.pad.fsfe.org/p/gluster-spurious-failures


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failures in tests/basic/volume-snapshot-clone.t

2015-05-04 Thread Pranith Kumar Karampuri

hi Avra/Rajesh,
Any update on this test?

 * tests/basic/volume-snapshot-clone.t

 * http://review.gluster.org/#/c/10053/

 * Came back on April 9

 * http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/



Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failure in tests/geo-rep/georep-rsync-changelog.t

2015-05-04 Thread Pranith Kumar Karampuri


On 05/05/2015 10:31 AM, Kotresh Hiremath Ravishankar wrote:

Geo-rep runs /usr/local/libexec/glusterfs/gverify.sh to compare the
gluster version between the master and the slave volume. It runs the
following command
gluster --version | head -1 | cut -f2 -d " "
locally on the master and over ssh on the slave.
You can probably debug this by improving the output, so that you can see 
what the check is actually getting and make the appropriate changes?
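
For illustration, a quick way to see what the check is actually getting 
(purely a sketch; $slave_host stands in for however gverify.sh reaches the 
slave, and the real comparison lives inside that script):

    master_ver=$(gluster --version | head -1 | cut -f2 -d " ")
    slave_ver=$(ssh "$slave_host" 'gluster --version | head -1 | cut -f2 -d " "')
    # log the raw values before comparing, so an empty string becomes visible
    echo "gverify: master='$master_ver' slave='$slave_ver'" >&2
    if [ "$master_ver" != "$slave_ver" ]; then
        echo "Gluster version mismatch between master and slave." >&2
    fi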


Pranith


It could happen if, for some reason, the version returned is an empty
string. But I don't see any reason for that to happen for two volumes
running on the same node.

Is there any scenario where the above command gives an empty string?


Thanks and Regards,
Kotresh H R

- Original Message -

From: "Pranith Kumar Karampuri" 
To: "Aravinda Vishwanathapura Krishna Murthy" , "Kotresh 
Hiremath Ravishankar"

Cc: "Gluster Devel" 
Sent: Tuesday, May 5, 2015 8:57:26 AM
Subject: spurious failure in tests/geo-rep/georep-rsync-changelog.t

hi,
 Doesn't seem like obvious failure. It does say there is
version mismatch, I wonder how? Could you look into it.

Gluster version mismatch between master and slave.
Geo-replication session between master and slave21.cloud.gluster.org::slave
does not exist.
[08:27:15] ./tests/geo-rep/georep-rsync-changelog.t ..
Dubious, test returned 1 (wstat 256, 0x100)
Failed 14/17 subtests

   * ./tests/geo-rep/georep-rsync-changelog.t (Wstat: 256 Tests: 3 Failed: 0)

   * Non-zero exit status: 1

   *
   http://build.gluster.org/job/rackspace-regression-2GB-triggered/8168/console

Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in tests/basic/volume-snapshot-clone.t

2015-05-04 Thread Pranith Kumar Karampuri


On 05/05/2015 10:32 AM, Avra Sengupta wrote:

Hi,

As already discussed, if you encounter failures in this or any other 
snapshot test, it would be great to provide the regression run instance so 
that we can have a look at the logs, if there are any. Also, I tried 
running the test in a loop as you suggested. After an hour and a half 
I stopped it so that I could use my machines to work on some patches. So 
please let us know whenever this or any snapshot test fails for anyone 
and we will look into it asap.

Please read the mail again to find the link which has the logs.

./tests/basic/volume-snapshot-clone.t   
(Wstat: 0 Tests: 41 Failed: 3)
  Failed tests:  36, 38, 40



Pranith


Regards,
Avra

On 05/05/2015 09:01 AM, Pranith Kumar Karampuri wrote:

hi Avra/Rajesh,
Any update on this test?

  * tests/basic/volume-snapshot-clone.t

  * http://review.gluster.org/#/c/10053/

  * Came back on April 9

  * http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/



Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures in tests/basic/volume-snapshot-clone.t

2015-05-04 Thread Pranith Kumar Karampuri


On 05/05/2015 10:48 AM, Avra Sengupta wrote:

On 05/05/2015 10:43 AM, Pranith Kumar Karampuri wrote:


On 05/05/2015 10:32 AM, Avra Sengupta wrote:

Hi,

As already discussed, if you encounter this or any other snapshot 
tests, it would be great to provide the regression run instance so 
that we can have a look at the logs if there are any. Also I tried 
running the test in a loop as you suggested. After an hour and a 
half I stopped it so that I can use my machines to work on some 
patches. So please let us know when this or any snapshot tests fails 
for anyone and we will look into it asap.

Please read the mail again to find the link which has the logs.
./tests/basic/volume-snapshot-clone.t   
(Wstat: 0 Tests: 41 Failed: 3)
   Failed tests:  36, 38, 40
As I have said repeatedly, older regression runs don't have the logs any 
more. Please find the link and try to fetch the logs, and tell me 
if I am missing something here.


[root@VM1 lab]# wget 
http://slave33.cloud.gluster.org/logs/glusterfs-logs-20150409:09:27:03.tgz 
.
--2015-05-05 10:47:18-- 
http://slave33.cloud.gluster.org/logs/glusterfs-logs-20150409:09:27:03.tgz

Resolving slave33.cloud.gluster.org... 104.130.217.7
Connecting to slave33.cloud.gluster.org|104.130.217.7|:80... failed: 
Connection refused.

--2015-05-05 10:47:19-- http://./
Resolving  failed: No address associated with hostname.
wget: unable to resolve host address “.”
[root@VM1 lab]#

Ah! my bad, will let you know if it happens again.

Pranith


Regards,
Avra



Pranith


Regards,
Avra

On 05/05/2015 09:01 AM, Pranith Kumar Karampuri wrote:

hi Avra/Rajesh,
Any update on this test?

  * tests/basic/volume-snapshot-clone.t

  * http://review.gluster.org/#/c/10053/

  * Came back on April 9

  * http://build.gluster.org/job/rackspace-regression-2GB-triggered/6658/



Pranith








___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] ec spurious regression failures

2015-05-05 Thread Pranith Kumar Karampuri


On 05/05/2015 01:35 PM, Vijay Bellur wrote:

On 05/05/2015 11:40 AM, Emmanuel Dreyfus wrote:

Emmanuel Dreyfus  wrote:


I sent http://review.gluster.org/10540 to address it completely. Not
sure if it works on netBSD. Emmanuel help!!


I launched test runs in a loop on nbslave70. More later.


Failed on first pass:
Test Summary Report
---
./tests/basic/ec/ec-3-1.t(Wstat: 0 Tests: 217 Failed: 4)
   Failed tests:  133-134, 138-139
./tests/basic/ec/ec-4-1.t(Wstat: 0 Tests: 253 Failed: 6)
   Failed tests:  152-153, 157-158, 162-163
./tests/basic/ec/ec-5-1.t(Wstat: 0 Tests: 289 Failed: 8)
   Failed tests:  171-172, 176-177, 181-182, 186-187
./tests/basic/ec/ec-readdir.t(Wstat: 0 Tests: 9 Failed: 1)
   Failed test:  9
./tests/basic/ec/quota.t (Wstat: 0 Tests: 24 Failed: 1)
   Failed test:  24





In addition ec-12-4.t has started failing again [1]. Have added a note 
about this to the etherpad.
Already updated the status about this in the earlier mail. 
http://review.gluster.org/10539 is the fix.


-Vijay

[1] 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8312/consoleFull


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] ec spurious regression failures

2015-05-05 Thread Pranith Kumar Karampuri


On 05/05/2015 01:54 PM, Emmanuel Dreyfus wrote:

On Tue, May 05, 2015 at 01:45:03PM +0530, Pranith Kumar Karampuri wrote:

Already updated the status about this in the earlier mail.
http://review.gluster.org/10539 is the fix.

That one only touches bug-1202244-support-inode-quota.t ...

RCA: http://www.gluster.org/pipermail/gluster-devel/2015-May/044799.html

Pranith




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] ec spurious regression failures

2015-05-05 Thread Pranith Kumar Karampuri

I am yet to debug why it is failing in these new tests on NetBSD.

Pranith
On 05/05/2015 01:54 PM, Emmanuel Dreyfus wrote:

On Tue, May 05, 2015 at 01:45:03PM +0530, Pranith Kumar Karampuri wrote:

Already updated the status about this in the earlier mail.
http://review.gluster.org/10539 is the fix.

That one only touches bug-1202244-support-inode-quota.t ...



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] spurious failure in tests/bugs/cli/bug-1087487.t

2015-05-05 Thread Pranith Kumar Karampuri

Gaurav,
 Please look into 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8409/console


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] core while running tests/bugs/snapshot/bug-1112559.t

2015-05-05 Thread Pranith Kumar Karampuri

hi,
Could you please look at this issue: 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8456/consoleFull


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

