Re: [Gluster-devel] glusterd crashing

2016-03-05 Thread Joseph Fernandes
http://www.gluster.org/community/documentation/index.php/Archives/Development_Work_Flow

http://www.gluster.org/community/documentation/index.php/Simplified_dev_workflow

Leaving the fun of exploration to you :)

~Joe
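
In short, the flow on those wiki pages boils down to roughly the following
(a rough sketch only; the pages above are the authoritative reference, and the
branch name and bug id below are placeholders):

   # clone the source and work on a topic branch
   git clone https://github.com/gluster/glusterfs.git
   cd glusterfs
   git checkout -b my-fix origin/master
   # ... make the change, commit with a proper message and a "BUG: <bugzilla-id>" tag ...
   ./rfc.sh   # pushes the change to review.gluster.org (Gerrit) for review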

- Original Message -
> From: "Ajil Abraham" 
> To: "Atin Mukherjee" 
> Cc: "Joseph Fernandes" , "Gluster Devel" 
> 
> Sent: Saturday, March 5, 2016 10:06:37 PM
> Subject: Re: [Gluster-devel] glusterd crashing
> 
> Sure, Atin. I am itching to contribute code, but I'm worried because of my lack
> of experience in sending patches. Can somebody please send across instructions
> on how to do this? Consider me a total newbie and please be as descriptive as
> possible :).
> 
> -Ajil
> 
> On Sat, Mar 5, 2016 at 12:46 PM, Atin Mukherjee 
> wrote:
> 
> > -Atin
> > Sent from one plus one
> > On 05-Mar-2016 11:46 am, "Ajil Abraham"  wrote:
> > >
> > > Thanks for all the support. After handling the input validation in my
> > > code, Glusterd no longer crashes. I am still waiting for clearance from my
> > > superior to pass on all the details; I expect him to get back to me by this
> > > Sunday.
> > Great to know, and we appreciate your contribution. If you happen to find
> > any issues, feel free to send patches :)
> > >
> > > - Ajil
> > >
> > > On Fri, Mar 4, 2016 at 10:20 AM, Joseph Fernandes 
> > wrote:
> > >>
> > >> Well, that may not be completely correct!
> > >>
> > >> It's "gluster volume status all": unlike volume maintenance operations,
> > >> which are rare, status can be issued multiple times a day or put in a
> > >> script/cron job to check the health of the cluster.
> > >> But anyway, the fix is ready, as the bug says.
> > >>
> > >> The crash is what we need to worry about.
> > >>
> > >> ~Joe
> > >>
> > >> - Original Message -
> > >> > From: "Atin Mukherjee" 
> > >> > To: "Joseph Fernandes" , "Atin Mukherjee" <
> > atin.mukherje...@gmail.com>
> > >> > Cc: "Gluster Devel" , "Ajil Abraham" <
> > ajil95.abra...@gmail.com>
> > >> > Sent: Friday, March 4, 2016 9:37:43 AM
> > >> > Subject: Re: [Gluster-devel] glusterd crashing
> > >> >
> > >> >
> > >> >
> > >> > On 03/04/2016 07:10 AM, Joseph Fernandes wrote:
> > >> > > Maybe this bug can give some context on the mem-leak (the fix was
> > >> > > recently merged on master but not on 3.7.x):
> > >> > >
> > >> > > https://bugzilla.redhat.com/show_bug.cgi?id=1287517
> > >> > Yes, this is what we'd be fixing in 3.7.x too, but if you refer to [1],
> > >> > the increase is seen only when a command is run in a loop, which is
> > >> > typically not a use case in any production setup.
> > >> >
> > >> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1287517#c15
> > >> > >
> > >> > > ~Joe
> > >> > >
> > >> > >
> > >> > > - Original Message -
> > >> > >> From: "Atin Mukherjee" 
> > >> > >> To: "Joseph Fernandes" 
> > >> > >> Cc: "Gluster Devel" , "Ajil Abraham"
> > >> > >> 
> > >> > >> Sent: Friday, March 4, 2016 7:01:54 AM
> > >> > >> Subject: Re: [Gluster-devel] glusterd crashing
> > >> > >>
> > >> > >> -Atin
> > >> > >> Sent from one plus one
> > >> > >> On 04-Mar-2016 6:12 am, "Joseph Fernandes" 
> > wrote:
> > >> > >>>
> > >> > >>> Hi Ajil,
> > >> > >>>
> > >> > >>> Well, a few things:
> > >> > >>>
> > >> > >>> 1. Whenever you see a crash, it's better to send across the
> > >> > >>> backtrace (BT) using gdb and attach the log files (or share them
> > >> > >>> via some cloud drive).
> > >> > >>>
> > >> > >>> 2. About the memory leak: what kind of tools are you using for
> > >> > >>> profiling memory, valgrind? If so, please attach the valgrind reports.
> > >> > >>>    $> glusterd --xlator-option *.run-with-valgrind=yes
> > >> > >>>
> > >> > >>> 3. I am not sure whether glusterd uses any of the mempools the way
> > >> > >>> we do in the client and brick processes; Atin can shed some light on
> > >> > >>> this. In that case you can use the statedump mechanism to check for
> > >> > >>> mem-leaks; see glusterfs/doc/debugging/statedump.md.
> > >> > >> GlusterD does use mempools and it has the infra for capturing
> > >> > >> statedumps as well.
> > >> > >> I am aware of a few bytes of memory leaking in a few paths, which is
> > >> > >> really not a huge concern, but it shouldn't crash.
> > >> > >>>
> > >> > >>> Hope this helps
> > >> > >>>
> > >> > >>> ~Joe
> > >> > >>>
> > >> > >>>
> > >> > >>> - Original Message -
> > >> >  From: "Ajil Abraham" 
> > >> >  To: "Atin Mukherjee" 
> > >> >  Cc: "Gluster Devel" 
> > >> >  Sent: Thursday, March 3, 2016 10:48:56 PM
> > >> >  Subject: Re: [Gluster-devel] 
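
To make Joe's three suggestions above concrete, here is a rough sketch of the
corresponding commands (default install paths and signals assumed; adjust for
your setup):

   # 1. capture a backtrace from a glusterd core file with gdb
   gdb -batch -ex 'thread apply all bt full' /usr/sbin/glusterd /path/to/core > glusterd-bt.txt

   # 2. run glusterd in the foreground under valgrind to profile memory
   valgrind --leak-check=full --log-file=glusterd-valgrind.log glusterd -N

   # 3. trigger a statedump of the running glusterd (written to /var/run/gluster by default)
   kill -USR1 $(pidof glusterd)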

Re: [Gluster-devel] glusterd crashing

2016-03-05 Thread Ajil Abraham
Sure, Atin. I am itching to contribute code, but I'm worried because of my lack of
experience in sending patches. Can somebody please send across instructions on how
to do this? Consider me a total newbie and please be as descriptive as possible
:).

-Ajil

On Sat, Mar 5, 2016 at 12:46 PM, Atin Mukherjee 
wrote:

> -Atin
> Sent from one plus one
> On 05-Mar-2016 11:46 am, "Ajil Abraham"  wrote:
> >
> > Thanks for all the support. After handling the input validation in my
> > code, Glusterd no longer crashes. I am still waiting for clearance from my
> > superior to pass on all the details; I expect him to get back to me by this Sunday.
> Great to know, and we appreciate your contribution. If you happen to find
> any issues, feel free to send patches :)
> >
> > - Ajil
> >
> > On Fri, Mar 4, 2016 at 10:20 AM, Joseph Fernandes 
> wrote:
> >>
> >> Well, that may not be completely correct!
> >>
> >> It's "gluster volume status all": unlike volume maintenance operations,
> >> which are rare, status can be issued multiple times a day or put in a
> >> script/cron job to check the health of the cluster.
> >> But anyway, the fix is ready, as the bug says.
> >>
> >> The crash is what we need to worry about.
> >>
> >> ~Joe
> >>
> >> - Original Message -
> >> > From: "Atin Mukherjee" 
> >> > To: "Joseph Fernandes" , "Atin Mukherjee" <
> atin.mukherje...@gmail.com>
> >> > Cc: "Gluster Devel" , "Ajil Abraham" <
> ajil95.abra...@gmail.com>
> >> > Sent: Friday, March 4, 2016 9:37:43 AM
> >> > Subject: Re: [Gluster-devel] glusterd crashing
> >> >
> >> >
> >> >
> >> > On 03/04/2016 07:10 AM, Joseph Fernandes wrote:
> >> > > Maybe this bug can give some context on the mem-leak (the fix was
> >> > > recently merged on master but not on 3.7.x):
> >> > >
> >> > > https://bugzilla.redhat.com/show_bug.cgi?id=1287517
> >> > Yes, this is what we'd be fixing in 3.7.x too, but if you refer to [1],
> >> > the increase is seen only when a command is run in a loop, which is
> >> > typically not a use case in any production setup.
> >> >
> >> > [1] https://bugzilla.redhat.com/show_bug.cgi?id=1287517#c15
> >> > >
> >> > > ~Joe
> >> > >
> >> > >
> >> > > - Original Message -
> >> > >> From: "Atin Mukherjee" 
> >> > >> To: "Joseph Fernandes" 
> >> > >> Cc: "Gluster Devel" , "Ajil Abraham"
> >> > >> 
> >> > >> Sent: Friday, March 4, 2016 7:01:54 AM
> >> > >> Subject: Re: [Gluster-devel] glusterd crashing
> >> > >>
> >> > >> -Atin
> >> > >> Sent from one plus one
> >> > >> On 04-Mar-2016 6:12 am, "Joseph Fernandes" 
> wrote:
> >> > >>>
> >> > >>> Hi Ajil,
> >> > >>>
> >> > >>> Well, a few things:
> >> > >>>
> >> > >>> 1. Whenever you see a crash, it's better to send across the
> >> > >>> backtrace (BT) using gdb and attach the log files (or share them
> >> > >>> via some cloud drive).
> >> > >>>
> >> > >>> 2. About the memory leak: what kind of tools are you using for
> >> > >>> profiling memory, valgrind? If so, please attach the valgrind reports.
> >> > >>>    $> glusterd --xlator-option *.run-with-valgrind=yes
> >> > >>>
> >> > >>> 3. I am not sure whether glusterd uses any of the mempools the way
> >> > >>> we do in the client and brick processes; Atin can shed some light on
> >> > >>> this. In that case you can use the statedump mechanism to check for
> >> > >>> mem-leaks; see glusterfs/doc/debugging/statedump.md.
> >> > >> GlusterD does use mempools and it has the infra for capturing
> >> > >> statedumps as well.
> >> > >> I am aware of a few bytes of memory leaking in a few paths, which is
> >> > >> really not a huge concern, but it shouldn't crash.
> >> > >>>
> >> > >>> Hope this helps
> >> > >>>
> >> > >>> ~Joe
> >> > >>>
> >> > >>>
> >> > >>> - Original Message -
> >> >  From: "Ajil Abraham" 
> >> >  To: "Atin Mukherjee" 
> >> >  Cc: "Gluster Devel" 
> >> >  Sent: Thursday, March 3, 2016 10:48:56 PM
> >> >  Subject: Re: [Gluster-devel] glusterd crashing
> >> > 
> >> >  Hi Atin,
> >> > 
> >> >  The inputs I use are as per the requirements of a project I am working
> >> >  on for one of the large financial institutions in Dubai. I will try to
> >> >  handle the input validation within my code. I uncovered some of the
> >> >  issues while doing thorough testing of my code.
> >> > 
> >> >  I tried with 3.7.6 and also with my own build from the master branch. I
> >> >  will check with my superiors before sending you the backtrace and other
> >> >  details. So far, I have seen memory leaks in the hundreds of KBs.
> >> > 
> >> >  -Ajil
> >> > 
> >> > 
> >> >  On Thu, Mar 3, 2016 at 10:17 PM, Atin Mukherjee <
> >> > 

[Gluster-devel] Jenkins regression for release-3.7 messed up?

2016-03-05 Thread Ravishankar N

'brick_up_status' is used by the following .t tests in release-3.7:

[root@ravi1 glusterfs]# git grep -w brick_up_status

tests/bugs/bitrot/bug-1288490.t:EXPECT_WITHIN $PROCESS_UP_TIMEOUT "Y" brick_up_status $V0 $H0 $B0/brick0
tests/bugs/bitrot/bug-1288490.t:EXPECT_WITHIN $PROCESS_UP_TIMEOUT "Y" brick_up_status $V0 $H0 $B0/brick1
tests/bugs/glusterd/bug-1225716-brick-online-validation-remove-brick.t:EXPECT_WITHIN $PROCESS_UP_TIMEOUT "Y" brick_up_status $V0 $H0 $B0/${V0}1
tests/bugs/glusterd/bug-857330/normal.t:EXPECT_WITHIN $PROCESS_UP_TIMEOUT "Y" brick_up_status $V0 $H0 $B0/${V0}3
tests/bugs/glusterd/bug-857330/xml.t:EXPECT_WITHIN $PROCESS_UP_TIMEOUT "Y" brick_up_status $V0 $H0 $B0/${V0}3


There seems to be a bug in this function. (It is another matter that the
function is different on master, but let us ignore that for now.) So all
these tests should fail on release-3.7, and they do fail on my machine.
But for some reason they succeed on Jenkins. Why is that? They are not in
bad_tests on 3.7 either.

This fixes the function:
diff --git a/tests/volume.rc b/tests/volume.rc
index 9bd9eca..6040c5f 100644
--- a/tests/volume.rc
+++ b/tests/volume.rc
@@ -24,7 +24,7 @@ function brick_up_status {
 local host=$2
 local brick=$3
 brick_pid=$(get_brick_pid $vol $host $brick)
-gluster volume status | grep $brick_pid | awk '{print $4}'
+gluster volume status | grep $brick_pid | awk '{print $5}'
 }
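
With the one-line change applied, the whole helper reads roughly as follows
(reconstructed from the hunk above; the 'local vol=$1' line is implied by the
context, and the column shift is presumably because the extra RDMA Port column
in 3.7's "gluster volume status" output moved the Online flag to the 5th field):

    function brick_up_status {
            local vol=$1
            local host=$2
            local brick=$3
            brick_pid=$(get_brick_pid $vol $host $brick)
            gluster volume status | grep $brick_pid | awk '{print $5}'
    }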

All the tests pass with the fix on my machine. I had sent the fix as
part of a patch [1] and it *fails* on Jenkins. Why?
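
For anyone who wants to reproduce this locally, a sketch of running one of the
affected tests straight from a release-3.7 source checkout (the regression
tests expect a built and installed glusterfs, plus root privileges):

   prove -v tests/bugs/glusterd/bug-857330/normal.t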


Thanks,
Ravi


[1] http://review.gluster.org/#/c/13609/2




Re: [Gluster-devel] Default quorum for 2 way replication

2016-03-05 Thread Pranith Kumar Karampuri



On 03/04/2016 08:36 PM, Shyam wrote:

On 03/04/2016 07:30 AM, Pranith Kumar Karampuri wrote:



On 03/04/2016 05:47 PM, Bipin Kunal wrote:

HI Pranith,

Thanks for starting this mail thread.

Looking at it from a user perspective, the most important thing is to get a
"good copy" of the data. I agree that people use replication for HA, but
having stale data with HA will not have any value.
So I would suggest making auto quorum the default configuration even
for 2-way replication.

If a user is willing to risk losing data for the sake of HA, he always has
the option to disable it. But the default preference should be data and its
integrity.


I think we need to consider *maintenance* activities on the volume, 
like replacing a brick in a replica pair, or upgrading one half of the 
replica and then the other. At such times the replica group would 
function read-only if we choose 'auto' in a 2-way replicated state. 
Is this correct?


Yes.



Having said the above, we already have the option in place, right? I.e., 
admins can already choose 'auto'; it is just the default that we are 
discussing. This could also be tackled via documentation/best 
practices ("yeah, right! who reads those again?" is a valid comment here).


Yes. I just sent a reply to Jeff where I said it is better to have an 
interactive question at the time of creating a 2-way replica volume which 
conveys this information :-).




I guess we need to be clear (in documentation or otherwise) about what they 
get when they choose one over the other (like the HA point below, and 
also upgrade concerns etc.), irrespective of how this discussion ends 
(just my 2 c's).


Totally agree. Along with the interactive question above, we will give a link 
that provides a detailed explanation.






That is the point. There is an illusion of choice between data integrity
and HA. But we are not *really* giving HA, are we? HA will be there only
if the second brick in the replica pair goes down. In your typical

@Pranith, can you elaborate on this? I am not so AFR savvy, so I am unable 
to comprehend why HA is available only when the second brick goes 
down and not when the first does. It just helps in understanding the 
issue at hand.


Because it is client-side replication, there is a fixed *leader*, i.e. the 1st 
brick.


As a side note: we recently had a discussion with the NSR team (Jeff, Avra). 
We will be using some of NSR's infra to implement server-side AFR as well, 
with leader election etc.


Pranith



deployment, we can't really give any guarantees about which brick will go
down and when. So I am not sure we can consider it HA. But I would
love to hear what others have to say about this as well. If the majority of
users say they need it to be auto, you will definitely see a patch :-).

Pranith


Thanks,
Bipin Kunal

On Fri, Mar 4, 2016 at 5:43 PM, Ravishankar N 
wrote:

On 03/04/2016 05:26 PM, Pranith Kumar Karampuri wrote:

hi,
  So far the default quorum for 2-way replication is 'none' (i.e.
files/directories may go into split-brain), and for 3-way replication and
arbiter-based replication it is 'auto' (files/directories won't go into
split-brain). There are requests to make the default 'auto' for 2-way
replication as well. The line of reasoning is that people value data
integrity (files not going into split-brain) more than HA (operation of
the mount even when bricks go down), and admins should explicitly change
it to 'none' when they are fine with split-brains in 2-way replication. We
were wondering if you have any inputs about what is a sane default for
2-way replication.

I like the default to be 'none'. Reason: If we have 'auto' as quorum
for
2-way replication and first brick dies, there is no HA.



+1. Quorum does not make sense when there are only 2 parties: there is no
majority voting. Arbiter volumes are a better option.
If someone wants some background, please see the 'Client quorum' and
'Replica 2 and Replica 3 volumes' sections of
http://gluster.readthedocs.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/




-Ravi

If users are fine with it, it is better to use a plain distribute volume
rather than replication with quorum as 'auto'. What are your thoughts on the
matter? Please guide us in the right direction.

Pranith







[Gluster-devel] Arbiter brick size estimation

2016-03-05 Thread Oleksandr Natalenko
In order to estimate the GlusterFS arbiter brick size, I've deployed a test setup 
with a replica 3 arbiter 1 volume within one node. Each brick is located on a 
separate HDD (XFS with inode size == 512), using GlusterFS v3.7.6 + memleak 
patches. Volume options are kept at their defaults.

Here is the script that creates files and folders in mounted volume: [1]

The script creates 1M files of random size (between 1 and 32768 bytes) and 
some number of folders. After running it I got 1036637 folders, so in 
total there are 2036637 files and folders.

The initial used space on each brick is 42M. After running the script I got:

replica bricks 1 and 2: 19867168 kbytes == 19G
arbiter brick: 1872308 kbytes == 1.8G

The number of inodes on each brick is 3139091. So here goes the estimation.

Dividing arbiter used space by files+folders we get:

(1872308 - 42000)/2036637 == 899 bytes per file or folder

Dividing arbiter used space by inodes we get:

(1872308 - 42000)/3139091 == 583 bytes per inode

Not sure which calculation is correct. I guess we should consider the one 
that counts inodes, because of the .glusterfs/ folder data.
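
A quick way to redo the arithmetic (a sketch; figures taken from the
measurements above, treating 1 kbyte as 1000 bytes):

   awk 'BEGIN {
           used = 1872308 - 42000       # arbiter usage in KB, minus the initial 42M
           printf "per file/folder: %.0f bytes\n", used / 2036637 * 1000   # ~899
           printf "per inode:       %.0f bytes\n", used / 3139091 * 1000   # ~583
   }'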

Nevertheless, the documentation [2] says it should be 4096 bytes per 
file. Am I wrong in my calculations?

Pranith?

[1] http://termbin.com/ka9x
[2] 
http://gluster.readthedocs.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/


Re: [Gluster-devel] Default quorum for 2 way replication

2016-03-05 Thread Pranith Kumar Karampuri



On 03/04/2016 09:10 PM, Jeff Darcy wrote:

I like the default to be 'none'. Reason: If we have 'auto' as quorum for
2-way replication and first brick dies, there is no HA. If users are
fine with it, it is better to use plain distribute volume

"Availability" is a tricky word.  Does it mean access to data now, or
later despite failure?  Taking a volume down due to loss of quorum might
be equivalent to having no replication in the first sense, but certainly
not in the second.  When the possibility (likelihood?) of split brain is
considered, enforcing quorum actually does a *better* job of preserving
availability in the second sense.  I believe this second sense is most
often what users care about, and therefore quorum enforcement should be
the default.

I think we all agree that quorum is a bit slippery when N=2.  That's
where there really is a tradeoff between (immediate) availability and
(highest levels of) data integrity.  That's why arbiters showed up first
in the NSR specs, and later in AFR.  We should definitely try to push
people toward N>=3 as much as we can.  However, the ability to "scale
down" is one of the things that differentiate us vs. both our Ceph
cousins and our true competitors.  Many of our users will stop at N=2 no
matter what we say.  However unwise that might be, we must still do what
we can to minimize harm when things go awry.
I always felt the 2-way vs. 3-way replication analogy is similar to 
2-wheelers (motorbikes) vs. 4-wheelers (cars). You have more fatal 
accidents with 2-wheelers than with 4-wheelers, but they have their place. 
Arbiter volumes are like a 3-wheeler (auto rickshaw) :-). I feel users should 
be given the power to choose what they want based on what they are looking 
for and how much hardware they want to buy (affordability). We should 
educate them about the risks, but the final decision should be theirs. So 
in that sense I don't like to *push* them to N>=3.


   "Many of our users will stop at N=2 no matter what we say". That 
right there is what I had to realize, some years back. I naively thought 
that people will rush to replica-3 with client quorum, but it didn't 
happen. That is the reason for investing time in arbiter volumes as a 
solution. Because we wanted to reduce the cost. People didn't want to 
spend so much money for consistency(based on what we are still seeing). 
Fact of the matter is, even after arbiter volumes I am sure some people 
will stick with replica-2 with unsplit-brain patch from facebook (For 
people who don't know: it resolves split-brain based on policies 
automatically without human intervention, it will be available soon in 
gluster). You do have a very good point though. I think it makes sense 
to make more people aware of what they are getting into with 2-way 
replication. So may be an interactive question at the time of 2-way 
replica volume creation about the possibility of split-brains and 
availability of other options(like arbiter/unsplit-brain in 2-way 
replication) could be helpful, keeping the default still as 'none'. I 
think it would be better if we educate users about value of arbiter 
volumes, so that users naturally progress towards that and embrace it. 
We are seeing more and more questions on the IRC and mailing list about 
arbiter volumes, so there is a +ve trend.


Pranith
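
For admins following this thread who want to act on it today, a sketch of the
relevant commands (the cluster.quorum-type option and the arbiter create syntax
are covered in the Administrator Guide page linked earlier in the thread):

   # opt an existing 2-way replicated volume in to client-quorum enforcement
   gluster volume set <volname> cluster.quorum-type auto

   # or use an arbiter volume instead of plain replica 2
   gluster volume create <volname> replica 3 arbiter 1 \
       server1:/bricks/b1 server2:/bricks/b2 server3:/bricks/arbiter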
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel