Re: [Gluster-devel] Throttling xlator on the bricks

2016-01-25 Thread Joe Julian



On 01/25/16 18:24, Ravishankar N wrote:


On 01/26/2016 01:22 AM, Shreyas Siravara wrote:
Just out of curiosity, what benefits do we think this throttling 
xlator would provide over the "enable-least-priority" option (where 
we put all the fops from SHD, etc into a least pri queue)?




For one, it could provide more granularity on the amount of throttling 
you want to do, for specific fops, from specific clients. If the only 
I/O going through the bricks was from the SHD, they would all be 
least-priority yet still consume an unfair share of the CPU. We could tweak 
`performance.least-rate-limit` to throttle, but it would be a global 
option.


Right, because as it is now, when shd is the only client, it queues up 
so many iops that higher priority ops still get delayed.





On Jan 25, 2016, at 12:29 AM, Venky Shankar  
wrote:


On Mon, Jan 25, 2016 at 01:08:38PM +0530, Ravishankar N wrote:

On 01/25/2016 12:56 PM, Venky Shankar wrote:
Also, it would be beneficial to have the core TBF implementation as part of
libglusterfs so as to be consumable by the server side xlator component to
throttle dispatched FOPs and for daemons to throttle anything that's outside
the "brick" boundary (such as cpu, etc.).

That makes sense. We were initially thinking to overload posix_rchecksum()
to do the SHA256 sums for the signer.

That does have advantages by avoiding network roundtrips by computing SHA*
locally. TBF could still implement ->rchecksum and throttle that (on behalf
of clients residing on the server - internal daemons). Placing the core
implementation as part of libglusterfs would still provide the flexibility.
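
A rough sketch of what "computing SHA* locally" on the brick could look like,
using plain OpenSSL - purely illustrative, not the actual posix_rchecksum() or
signer code; the helper name is made up:

===
/* sha256_region.c - illustrative only. Build with: gcc -c sha256_region.c */
#include <openssl/sha.h>
#include <sys/types.h>
#include <unistd.h>

/* Hash 'len' bytes starting at 'offset' of fd into md
 * (md must be SHA256_DIGEST_LENGTH bytes).  Returns 0 on success. */
int
sha256_region (int fd, off_t offset, size_t len, unsigned char *md)
{
        unsigned char buf[64 * 1024];
        SHA256_CTX    ctx;

        if (SHA256_Init (&ctx) != 1)
                return -1;

        while (len > 0) {
                size_t  want = len < sizeof (buf) ? len : sizeof (buf);
                ssize_t got  = pread (fd, buf, want, offset);

                if (got <= 0)
                        return -1;
                SHA256_Update (&ctx, buf, (size_t)got);
                offset += got;
                len    -= (size_t)got;
        }
        return SHA256_Final (md, &ctx) == 1 ? 0 : -1;
}
===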






___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Throttling xlator on the bricks

2016-01-25 Thread Pranith Kumar Karampuri



On 01/26/2016 08:14 AM, Vijay Bellur wrote:

On 01/25/2016 12:36 AM, Ravishankar N wrote:

Hi,

We are planning to introduce a throttling xlator on the server (brick)
process to regulate FOPS. The main motivation is to solve complaints about
AFR selfheal taking too much of CPU resources (due to too many fops for
entry self-heal, rchecksums for data self-heal, etc.).



I am wondering if we can re-use the same xlator for throttling 
bandwidth, iops etc. in addition to fops. Based on admin configured 
policies we could provide different upper thresholds to different 
clients/tenants and this could prove to be a useful feature in 
multitenant deployments to avoid starvation/noisy neighbor class of 
problems. Has any thought gone in this direction?


Nope. It was mainly about internal processes at the moment.





The throttling is achieved using the Token Bucket Filter algorithm (TBF).
TBF is already used by bitrot's bitd signer (which is a client process) in
gluster to regulate the CPU intensive check-sum calculation. By putting the
logic on the brick side, multiple clients - selfheal, bitrot, rebalance or
even the mounts themselves - can avail the benefits of throttling.

The TBF algorithm in a nutshell is as follows: There is a bucket which is
filled at a steady (configurable) rate with tokens. Each FOP will need a
fixed amount of tokens to be processed. If the bucket has that many tokens,
the FOP is allowed and that many tokens are removed from the bucket. If not,
the FOP is queued until the bucket is filled.
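
To make the nutshell above concrete, a minimal token bucket in plain C might
look like this (illustrative only - not the bitd signer code nor the proposed
xlator; structure and names are made up):

===
#include <pthread.h>
#include <time.h>

/* One bucket: tokens trickle in at 'rate' tokens/sec, capped at 'capacity'.
 * A fop costing 'cost' tokens proceeds only if enough tokens are available;
 * otherwise the caller queues it, as described above. */
typedef struct tbf_bucket {
        pthread_mutex_t lock;
        double          tokens;     /* currently available tokens   */
        double          rate;       /* fill rate, tokens per second */
        double          capacity;   /* bucket size (burst limit)    */
        struct timespec last_fill;  /* last time tokens were added  */
} tbf_bucket_t;

static void
tbf_fill (tbf_bucket_t *b)
{
        struct timespec now;
        double          elapsed;

        clock_gettime (CLOCK_MONOTONIC, &now);
        elapsed = (now.tv_sec - b->last_fill.tv_sec) +
                  (now.tv_nsec - b->last_fill.tv_nsec) / 1e9;
        b->tokens += elapsed * b->rate;
        if (b->tokens > b->capacity)
                b->tokens = b->capacity;
        b->last_fill = now;
}

/* Returns 1 if the fop may proceed now, 0 if it should be queued. */
int
tbf_try_consume (tbf_bucket_t *b, double cost)
{
        int ok = 0;

        pthread_mutex_lock (&b->lock);
        tbf_fill (b);
        if (b->tokens >= cost) {
                b->tokens -= cost;
                ok = 1;
        }
        pthread_mutex_unlock (&b->lock);
        return ok;
}
===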

The xlator will need to reside above io-threads and can have different
buckets, one per client. There has to be a communication mechanism between
the client and the brick (IPC?) to tell what FOPS need to be regulated from
it, and the no. of tokens needed, etc. These need to be reconfigurable via
appropriate mechanisms. Each bucket will have a token filler thread which
will fill the tokens in it.


If there is one bucket per client and one thread per bucket, it would 
be difficult to scale as the number of clients increase. How can we do 
this better?


It is the same thread for all the buckets, because the number of internal 
clients at the moment is in single digits. The problem statement we have 
right now doesn't consider what you are looking for.




The main thread will enqueue heals in a list in the bucket if there aren't
enough tokens. Once the token filler detects some FOPS can be serviced, it
will send a cond-broadcast to a dequeue thread which will process (stack
wind) all the FOPS that have the required no. of tokens from all buckets.
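
A rough sketch of that queueing mechanism in plain C - a single token-filler
thread credits all buckets and cond-broadcasts a single dequeue thread, which
winds whatever now has enough tokens (hypothetical names, not the actual
design or code):

===
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define NUM_BUCKETS 4                   /* e.g. shd, rebalance, quota, bitd */

typedef struct queued_fop {
        struct queued_fop *next;
        double             cost;             /* tokens this fop needs   */
        void             (*resume) (void *); /* e.g. stack-wind the fop */
        void              *data;
} queued_fop_t;

typedef struct bucket {
        double        tokens;
        double        rate;                  /* tokens credited per tick */
        queued_fop_t *queue;                 /* fops waiting for tokens  */
} bucket_t;

static pthread_mutex_t t_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  t_cond = PTHREAD_COND_INITIALIZER;
static bucket_t        buckets[NUM_BUCKETS];

/* One filler thread for all buckets: credit tokens, then wake the dequeuer. */
void *
token_filler (void *arg)
{
        (void)arg;
        for (;;) {
                sleep (1);
                pthread_mutex_lock (&t_lock);
                for (int i = 0; i < NUM_BUCKETS; i++)
                        buckets[i].tokens += buckets[i].rate;
                pthread_cond_broadcast (&t_cond);
                pthread_mutex_unlock (&t_lock);
        }
        return NULL;
}

/* One dequeue thread: resume every queued fop that now has enough tokens.
 * (A real xlator would drop the lock before winding the fop.) */
void *
dequeuer (void *arg)
{
        (void)arg;
        pthread_mutex_lock (&t_lock);
        for (;;) {
                pthread_cond_wait (&t_cond, &t_lock);
                for (int i = 0; i < NUM_BUCKETS; i++) {
                        while (buckets[i].queue &&
                               buckets[i].tokens >= buckets[i].queue->cost) {
                                queued_fop_t *fop = buckets[i].queue;

                                buckets[i].queue   = fop->next;
                                buckets[i].tokens -= fop->cost;
                                fop->resume (fop->data);
                                free (fop);
                        }
                }
        }
        return NULL;
}
===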

This is just a high level abstraction: requesting feedback on any aspect of
this feature. What kind of mechanism is best between the client/bricks for
tuning various parameters? What other requirements do you foresee?



I am in favor of having administrator defined policies or templates 
(collection of policies) being used to provide the tuning parameter 
per client or a set of clients. We could even have a default template 
per use case etc. Is there a specific need to have this negotiation 
between clients and servers?


Thanks,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] distributed files/directories and [cm]time updates

2016-01-25 Thread Xavier Hernandez

Hi Pranith,

On 26/01/16 03:47, Pranith Kumar Karampuri wrote:

hi,
    Traditionally gluster has been using the ctime/mtime of the
files/dirs on the bricks as stat output. The problem we are seeing with this
approach is that software which depends on it gets confused when there
are differences in these times. Tar especially gives "file changed as we
read it" whenever it detects ctime differences when stat is served from
different bricks. The way we have been trying to solve it is to serve
the stat structures from the same brick in afr, and max-time in dht. But it
doesn't avoid the problem completely. Because there is no way to change
ctime at the moment (lutimes() only allows mtime, atime), there is little
we can do to make sure ctimes match after self-heals/xattr
updates/rebalance. I am wondering if any of you have solved these problems
before; if yes, how did you go about doing it? It seems like applications
which depend on this for backups get confused the same way. The only way
out I see is to bring ctime to an xattr, but that will need more iops
and gluster has to keep updating it on quite a few fops.


I did think about this when I was writing ec at the beginning. The idea 
was that the point in time at which each fop is executed would be controlled 
by the client by adding a special xattr to each regular fop. Of course 
this would require support inside the storage/posix xlator. At that 
time, adding the needed support to other xlators seemed too complex for 
me, so I decided to do something similar to afr.


Anyway, the idea was like this: for example, when a write fop needs to 
be sent, dht/afr/ec sets the current time in a special xattr, for 
example 'glusterfs.time'. It can be done in a way that if the time is 
already set by a higher xlator, it's not modified. This way DHT could 
set the time in fops involving multiple afr subvolumes. For other fops, 
it would be afr that sets the time. It could also be set directly by the 
top most xlator (fuse), but that time could be incorrect because lower 
xlators could delay the fop execution and reorder it. This would need 
more thinking.


That xattr will be received by storage/posix. This xlator will determine 
what times need to be modified and will change them. In the case of a 
write, it can decide to modify mtime and, maybe, atime. For a mkdir or 
create, it will set the times of the new file/directory and also the 
mtime of the parent directory. It depends on the specific fop being 
processed.


mtime, atime and ctime (or even others) could be saved in a special 
posix xattr instead of relying on the file system attributes that cannot 
be modified (at least for ctime).
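
A toy sketch of that storage-side idea - keeping the times in a posix xattr
instead of the filesystem's own (unchangeable) ctime. The record layout and
the key name are invented for illustration; a real implementation would
presumably live in storage/posix and use a trusted.* xattr:

===
#include <sys/xattr.h>
#include <stdint.h>

/* Hypothetical on-disk record, stored as one xattr per file/directory. */
typedef struct gf_times {
        int64_t atime_sec, atime_nsec;
        int64_t mtime_sec, mtime_nsec;
        int64_t ctime_sec, ctime_nsec;
} gf_times_t;

#define GF_TIME_XATTR "user.glusterfs.time"   /* illustrative key name */

/* Apply the timestamp carried by the client (the 'glusterfs.time' style
 * xattr in the fop) to the file's stored record.  Which fields move
 * depends on the fop; here a data write bumps mtime and ctime. */
int
posix_store_time (const char *path, int64_t sec, int64_t nsec,
                  int is_data_write)
{
        gf_times_t t = {0};

        /* Best-effort read of the existing record; a missing xattr is fine. */
        (void) getxattr (path, GF_TIME_XATTR, &t, sizeof (t));

        t.ctime_sec  = sec;
        t.ctime_nsec = nsec;
        if (is_data_write) {
                t.mtime_sec  = sec;
                t.mtime_nsec = nsec;
        }
        return setxattr (path, GF_TIME_XATTR, &t, sizeof (t), 0);
}
===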


This solution doesn't require extra fops, so it seems quite clean to me. 
The additional I/O needed in posix could be minimized by implementing a 
metadata cache in storage/posix that would read all metadata on lookup 
and update it on disk only at regular intervals and/or on invalidation. 
All fops would read/write into the cache. This would even reduce the 
number of I/Os we are currently doing for each fop.


Xavi
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Throttling xlator on the bricks

2016-01-25 Thread Venky Shankar
On Tue, Jan 26, 2016 at 03:11:50AM +, Richard Wareing wrote:
> > If there is one bucket per client and one thread per bucket, it would be
> > difficult to scale as the number of clients increase. How can we do this
> > better?
> 
> On this note... consider that 10's of thousands of clients are not 
> unrealistic in production :).  Using a thread per bucket would also 
> be unwise...
> 
> On the idea in general, I'm just wondering if there's specific (real-world) 
> cases where this has even been an issue where least-prio queuing hasn't been 
> able to handle?  Or is this more of a theoretical concern?  I ask as I've not 
> really encountered situations where I wished I could give more FOPs to SHD vs 
> rebalance and such.
> 
> In any event, it might be worth having Shreyas detail his throttling feature 
> (that can throttle any directory hierarchy no less) to illustrate how a 
> simpler design can achieve similar results to these more complicated (and, it 
> follows, bug-prone) approaches.

TBF isn't complicated at all - it's widely used for traffic shaping, cgroups, and 
UML to rate-limit disk I/O.

But I won't hurry things along and will wait to hear from Shreyas regarding his 
throttling design.

> 
> Richard
> 
> 
> From: gluster-devel-boun...@gluster.org [gluster-devel-boun...@gluster.org] 
> on behalf of Vijay Bellur [vbel...@redhat.com]
> Sent: Monday, January 25, 2016 6:44 PM
> To: Ravishankar N; Gluster Devel
> Subject: Re: [Gluster-devel] Throttling xlator on the bricks
> 
> On 01/25/2016 12:36 AM, Ravishankar N wrote:
> > Hi,
> >
> > We are planning to introduce a throttling xlator on the server (brick)
> > process to regulate FOPS. The main motivation is to solve complaints about
> > AFR selfheal taking too much of CPU resources (due to too many fops for
> > entry
> > self-heal, rchecksums for data self-heal etc.)
> 
> 
> I am wondering if we can re-use the same xlator for throttling
> bandwidth, iops etc. in addition to fops. Based on admin configured
> policies we could provide different upper thresholds to different
> clients/tenants and this could prove to be a useful feature in
> multitenant deployments to avoid starvation/noisy neighbor class of
> problems. Has any thought gone in this direction?
> 
> >
> > The throttling is achieved using the Token Bucket Filter algorithm
> > (TBF). TBF
> > is already used by bitrot's bitd signer (which is a client process) in
> > gluster to regulate the CPU intensive check-sum calculation. By putting the
> > logic on the brick side, multiple clients- selfheal, bitrot, rebalance or
> > even the mounts themselves can avail the benefits of throttling.
> >
> > The TBF algorithm in a nutshell is as follows: There is a bucket which
> > is filled
> > at a steady (configurable) rate with tokens. Each FOP will need a fixed
> > amount
> > of tokens to be processed. If the bucket has that many tokens, the FOP is
> > allowed and that many tokens are removed from the bucket. If not, the FOP is
> > queued until the bucket is filled.
> >
> > The xlator will need to reside above io-threads and can have different
> > buckets,
> > one per client. There has to be a communication mechanism between the
> > client and
> > the brick (IPC?) to tell what FOPS need to be regulated from it, and the
> > no. of
> > tokens needed etc. These need to be reconfigurable via appropriate
> > mechanisms.
> > Each bucket will have a token filler thread which will fill the tokens
> > in it.
> 
> If there is one bucket per client and one thread per bucket, it would be
> difficult to scale as the number of clients increase. How can we do this
> better?
> 
> > The main thread will enqueue heals in a list in the bucket if there aren't
> > enough tokens. Once the token filler detects some FOPS can be serviced,
> > it will
> > send a cond-broadcast to a dequeue thread which will process (stack
> > wind) all
> > the FOPS that have the required no. of tokens from all buckets.
> >
> > This is just a high level abstraction: requesting feedback on any aspect of
> > this feature. what kind of mechanism is best between the client/bricks for
> > tuning various parameters? What other requirements do you foresee?
> >
> 
> I am in favor of having administrator defined policies or templates
> (collection of policies) being used to provide the tuning parameter per
> client or a set of clients. We could even have a default template per
> use case etc. Is there a specific need to have this negotiation between
> clients and servers?
> 
> Thanks,
> Vijay
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> ___
> Gluster-devel 

Re: [Gluster-devel] [Gluster-infra] Smoke tests run on the builder in RH DC (at least)

2016-01-25 Thread Michael Scherer
Le lundi 25 janvier 2016 à 22:24 +0100, Niels de Vos a écrit :
> On Mon, Jan 25, 2016 at 06:59:33PM +0100, Michael Scherer wrote:
> > Hi,
> > 
> > so today, after fixing one last config item, the smoke test jobs run
> > fine on the Centos 6 builder in the RH DC, which build things as non
> > root, then start the tests, then reboot the server.
> 
> Nice, sounds like great progress!
> 
> Did you need to change anything in the build or test scripts under
> /opt/qa? If so, please make sure that the changes land in the
> repository:
> 
>   https://github.com/gluster/glusterfs-patch-acceptance-tests/

So far, I mostly removed code that was running in the jenkins script
(i.e., the cleanup part that kills processes), and added a reboot at the end.
Not sure I want to have that in the script :)

> > Now, I am looking at the fedora one, but once this one is good, I will
> > likely reinstall a few builders as a test, and go on Centos 7 builder.
> 
> I'm not sure yet if I made an error, or what is going on. But for some
> reason smoke tests for my patch series fails... This is the smoke result
> of the 1st patch in the series, it only updates the fuse-header to a
> newer version. Of course local testing works just fine... The output and
> (not available) logs of the smoke test do not really help me :-/
> 
>   https://build.gluster.org/job/smoke/24395/console
> 
> Could this be related to the changes that were made? If not, I'd
> appreciate a pointer to my mistake.

No, I tested on a separate job to not interfere.

> > I was also planning to look at jenkins job builder for the jenkins, but
> > no time yet. Will be after jenkins migration to a new host (which is
> > still not planned, unlike gerrit where we should be attempting to find a
> > time for that)
> 
> We also might want to use Jenkins Job Builder for the tests we're adding
> to the CentOS CI. Maybe we could experiment with it there first, and
> then use our knowledge to the Gluster Jenkins?

Why not. I think it would work fine on ours too, as I am not sure it needs
to completely take over the server configuration.
-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Throttling xlator on the bricks

2016-01-25 Thread Joe Julian



On 01/25/16 20:36, Pranith Kumar Karampuri wrote:



On 01/26/2016 08:41 AM, Richard Wareing wrote:
If there is one bucket per client and one thread per bucket, it would be
difficult to scale as the number of clients increase. How can we do this
better?

On this note... consider that 10's of thousands of clients are not
unrealistic in production :).  Using a thread per bucket would also
be unwise...


There is only one thread and this solution is for internal 
processes (shd, rebalance, quota, etc.), not coming in the way of clients 
which do I/O.




On the idea in general, I'm just wondering if there's specific 
(real-world) cases where this has even been an issue where least-prio 
queuing hasn't been able to handle?  Or is this more of a theoretical 
concern?  I ask as I've not really encountered situations where I 
wished I could give more FOPs to SHD vs rebalance and such.


I have seen users resort to offline healing of the bricks whenever a 
brick is replaced, or a new brick is added to replication to increase the 
replica count. When entry self-heal happens, or big VM image data 
self-heals which do rchecksums, CPU spikes are seen and I/O becomes 
useless.
Here is a recent thread where a user ran into a similar problem (just 
yesterday); it is a combination of client-side healing and healing load:

http://www.gluster.org/pipermail/gluster-users/2016-January/025051.html

We can find more such threads if we put some time into digging through the 
mailing list.
I personally have seen people even resort to things like "we let 
gluster heal over the weekend or in the nights when none of us are 
working on the volumes", etc.


I get at least weekly complaints like this on the IRC channel. A lot of 
them are in virtual environments (AWS).




There are people who complain healing is too slow too. We get both 
kinds of complaints :-). Your multi-threaded shd patch is going to 
help here. I somehow feel you guys are in this set of people :-).


+1




In any event, it might be worth having Shreyas detail his throttling 
feature (that can throttle any directory hierarchy no less) to 
illustrate how a simpler design can achieve similar results to these 
more complicated (and, it follows, bug-prone) approaches.


The solution we came up with is about throttling internal I/O, and 
there are only 4-5 such processes (shd, rebalance, quota, bitd, etc.). 
What you are saying above about throttling any directory hierarchy 
seems a bit different from what we are trying to solve, from the looks 
of it (at least from the small description you gave above :-) ). 
Shreyas' mail detailing the feature would definitely help us 
understand what each of us is trying to solve. We want to GA both 
multi-threaded shd and this feature for 3.8.


Pranith


Richard


From: gluster-devel-boun...@gluster.org 
[gluster-devel-boun...@gluster.org] on behalf of Vijay Bellur 
[vbel...@redhat.com]

Sent: Monday, January 25, 2016 6:44 PM
To: Ravishankar N; Gluster Devel
Subject: Re: [Gluster-devel] Throttling xlator on the bricks

On 01/25/2016 12:36 AM, Ravishankar N wrote:

Hi,

We are planning to introduce a throttling xlator on the server (brick)
process to regulate FOPS. The main motivation is to solve complaints about
AFR selfheal taking too much of CPU resources (due to too many fops for
entry self-heal, rchecksums for data self-heal, etc.).


I am wondering if we can re-use the same xlator for throttling
bandwidth, iops etc. in addition to fops. Based on admin configured
policies we could provide different upper thresholds to different
clients/tenants and this could prove to be a useful feature in
multitenant deployments to avoid starvation/noisy neighbor class of
problems. Has any thought gone in this direction?


The throttling is achieved using the Token Bucket Filter algorithm (TBF).
TBF is already used by bitrot's bitd signer (which is a client process) in
gluster to regulate the CPU intensive check-sum calculation. By putting the
logic on the brick side, multiple clients - selfheal, bitrot, rebalance or
even the mounts themselves - can avail the benefits of throttling.

The TBF algorithm in a nutshell is as follows: There is a bucket which is
filled at a steady (configurable) rate with tokens. Each FOP will need a
fixed amount of tokens to be processed. If the bucket has that many tokens,
the FOP is allowed and that many tokens are removed from the bucket. If not,
the FOP is queued until the bucket is filled.

The xlator will need to reside above io-threads and can have different
buckets, one per client. There has to be a communication mechanism between
the client and the brick (IPC?) to tell what FOPS need to be regulated from
it, and the no. of tokens needed, etc. These need to be reconfigurable via
appropriate mechanisms. Each bucket will have a token filler thread which
will fill the tokens in it.

If there is one bucket per client and one thread per bucket, it would be
difficult to scale as the number of clients increase. How can we do this better?

Re: [Gluster-devel] Tips and Tricks for Gluster Developer

2016-01-25 Thread Niels de Vos
On Mon, Jan 25, 2016 at 06:41:50AM -0500, Rajesh Joseph wrote:
> 
> 
> - Original Message -
> > From: "Richard Wareing" 
> > To: "Raghavendra Talur" 
> > Cc: "Gluster Devel" 
> > Sent: Monday, January 25, 2016 8:12:53 AM
> > Subject: Re: [Gluster-devel] Tips and Tricks for Gluster Developer
> > 
> > Here's my tips:
> > 
> > 1. General C tricks
> > - learn to use vim or emacs & read their manuals; customize to suit your
> > style
> > - use vim w/ pathogen plugins for auto formatting (don't use tabs!) & syntax
> > - use ctags to jump around functions
> > - Use ASAN & valgrind to check for memory leaks and heap corruption
> > - learn to use "git bisect" to quickly find where regressions were 
> > introduced
> > & revert them
> > - Use a window manager like tmux or screen
> > 
> > 2. Gluster specific tricks
> > - Alias "ggrep" to grep through all Gluster source files for some string and
> > show you the line numbers
> > - Alias "gvim" or "gemacs" to open any source file without full path, eg.
> > "gvim afr.c"
> > - GFS specific gdb macros to dump out pretty formatting of various structs
> > (Jeff Darcy has some of these IIRC)
> 
> I also use a few macros for printing dictionaries and walking through the list 
> structures.
> I think it would be good to collect these macros, scripts and tools in a 
> common place
> so that people can use them. Can we include them in an "extras/dev" directory
> under the Gluster source tree?

Yes, but please call it "extras/devel-tools" or something descriptive
like that. "extras/dev" sounds like some device under /dev :)

Thanks,
Niels


> 
> > - Write prove tests...for everything you write, and any bug you fix.  Make
> > them deterministic (timing/races shouldn't matter).
> > - Bugs/races and/or crashes which are hard or impossible to repro often
> > require the creation of a developer specific feature to simulate the failure
> > and efficiently code/test a fix.  Example: "monkey-unlocking" in the lock
> > revocation patch I just posted.
> > - That edge case you are ignoring because you think it's 
> > impossible/unlikely?
> > We will find/hit it in 48hrs at large scale (seriously we will) handle
> > it correctly or at a minimum write a (kernel style) "OOPS" log type message.
> > 
> > That's all I have off the top of my head.  I'll give example aliases in
> > another reply.
> > 
> > Richard
> > 
> > Sent from my iPhone
> > 
> > > On Jan 22, 2016, at 6:14 AM, Raghavendra Talur  wrote:
> > > 
> > > HI All,
> > > 
> > > I am sure there are many tricks hidden under sleeves of many Gluster
> > > developers.
> > > I realized this when speaking to new developers. It would be good have a
> > > searchable thread of such tricks.
> > > 
> > > Just reply back on this thread with the tricks that you have and I promise
> > > I will collate them and add them to developer guide.
> > > 
> > > 
> > > Looking forward to be amazed!
> > > 
> > > Thanks,
> > > Raghavendra Talur
> > > 
> > > ___
> > > Gluster-devel mailing list
> > > Gluster-devel@gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> > 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Tips and Tricks for Gluster Developer

2016-01-25 Thread Joseph Fernandes
- Original Message -
From: "Jeff Darcy" 
To: "Richard Wareing" 
Cc: "Gluster Devel" 
Sent: Monday, January 25, 2016 7:27:20 PM
Subject: Re: [Gluster-devel] Tips and Tricks for Gluster Developer

Oh boy, here we go.  ;)

I second Richard's suggestion to use cscope or some equivalent.  It's a good 
idea in general, but especially with a codebase as large and complex as 
Gluster's.  I literally wouldn't be able to do my job without it.  I also have 
a set of bash/zsh aliases that will regenerate the cscope database after any 
git action, so I rarely have to do it myself.

JOE: Well, cscope and vim are good enough, but a good IDE (with its own search 
and cscope integrated) will also help.
I have been using codelite (http://codelite.org/) for over 2 years now and it 
rocks!


Another secondary tip is that in many cases anything you see in the code as 
"xyz_t" is actually "struct _xyz" so you can save a bit of time (in vim) with 
":ta _xyz" instead of going through the meaningless typedef.  Unfortunately 
we're not as consistent as we should be about this convention, but it mostly 
works.  Some day I'll figure out the vim macro syntax enough to create a proper 
macro and binding for this shortcut.

I should probably write a whole new blog post about gdb stuff.  Here's one I 
wrote a while ago:

http://pl.atyp.us/hekafs.org/index.php/2013/02/gdb-macros-for-glusterfs/

There's a lot more that could be done in this area.  For example, adding loc_t 
or inode_t or fd_t would all be good exercises.

On a more controversial note, I am opposed to the practice of doing "make 
install" on anything other than a transient VM/container.  I've seen too many 
patches that were broken because they relied on "leftovers" in someone's source 
directory or elsewhere on the system from previous installs.  On my test 
systems, I always build and install actual RPMs, to make sure new files are 
properly incorporated into the configure/rpm system. One of these days I'll 
set it up so the test system even does a "git clone" (instead of rsync) from my 
real source tree to catch un-checked-in files as well.

I'll probably think of more later, and will update here as I do.


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Tips and Tricks for Gluster Developer

2016-01-25 Thread Rajesh Joseph


- Original Message -
> From: "Niels de Vos" 
> To: "Rajesh Joseph" 
> Cc: "Richard Wareing" , "Gluster Devel" 
> 
> Sent: Monday, January 25, 2016 6:30:53 PM
> Subject: Re: [Gluster-devel] Tips and Tricks for Gluster Developer
> 
> On Mon, Jan 25, 2016 at 06:41:50AM -0500, Rajesh Joseph wrote:
> > 
> > 
> > - Original Message -
> > > From: "Richard Wareing" 
> > > To: "Raghavendra Talur" 
> > > Cc: "Gluster Devel" 
> > > Sent: Monday, January 25, 2016 8:12:53 AM
> > > Subject: Re: [Gluster-devel] Tips and Tricks for Gluster Developer
> > > 
> > > Here's my tips:
> > > 
> > > 1. General C tricks
> > > - learn to use vim or emacs & read their manuals; customize to suit your
> > > style
> > > - use vim w/ pathogen plugins for auto formatting (don't use tabs!) &
> > > syntax
> > > - use ctags to jump around functions
> > > - Use ASAN & valgrind to check for memory leaks and heap corruption
> > > - learn to use "git bisect" to quickly find where regressions were
> > > introduced
> > > & revert them
> > > - Use a window manager like tmux or screen
> > > 
> > > 2. Gluster specific tricks
> > > - Alias "ggrep" to grep through all Gluster source files for some string
> > > and
> > > show you the line numbers
> > > - Alias "gvim" or "gemacs" to open any source file without full path, eg.
> > > "gvim afr.c"
> > > - GFS specific gdb macros to dump out pretty formatting of various
> > > structs
> > > (Jeff Darcy has some of these IIRC)
> > 
> > I also use a few macros for printing dictionaries and walking through the list
> > structures.
> > I think it would be good to collect these macros, scripts and tools in a
> > common place
> > so that people can use them. Can we include them in an "extras/dev" directory
> > under the Gluster source tree?
> 
> Yes, but please call it "extras/devel-tools" or something descriptive
> like that. "extras/dev" sounds like some device under /dev :)

Yes, sure :-)

> 
> Thanks,
> Niels
> 
> 
> > 
> > > - Write prove tests...for everything you write, and any bug you fix.
> > > Make
> > > them deterministic (timing/races shouldn't matter).
> > > - Bugs/races and/or crashes which are hard or impossible to repro often
> > > require the creation of a developer specific feature to simulate the
> > > failure
> > > and efficiently code/test a fix.  Example: "monkey-unlocking" in the lock
> > > revocation patch I just posted.
> > > - That edge case you are ignoring because you think it's
> > > impossible/unlikely?
> > > We will find/hit it in 48hrs at large scale (seriously we will)
> > > handle
> > > it correctly or at a minimum write a (kernel style) "OOPS" log type
> > > message.
> > > 
> > > That's all I have off the top of my head.  I'll give example aliases in
> > > another reply.
> > > 
> > > Richard
> > > 
> > > Sent from my iPhone
> > > 
> > > > On Jan 22, 2016, at 6:14 AM, Raghavendra Talur 
> > > > wrote:
> > > > 
> > > > HI All,
> > > > 
> > > > I am sure there are many tricks hidden under sleeves of many Gluster
> > > > developers.
> > > > I realized this when speaking to new developers. It would be good have
> > > > a
> > > > searchable thread of such tricks.
> > > > 
> > > > Just reply back on this thread with the tricks that you have and I
> > > > promise
> > > > I will collate them and add them to developer guide.
> > > > 
> > > > 
> > > > Looking forward to be amazed!
> > > > 
> > > > Thanks,
> > > > Raghavendra Talur
> > > > 
> > > > ___
> > > > Gluster-devel mailing list
> > > > Gluster-devel@gluster.org
> > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > ___
> > > Gluster-devel mailing list
> > > Gluster-devel@gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > 
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Gerrit down for 10 to 15 minutes for reindexing

2016-01-25 Thread Michael Scherer
Hi,

in order to fix some issues (I hope), I am gonna start a reindex of the
Lucene DB of gerrit. This requires the server to be put offline for a
while. I did a test on another VM, and it would take ~10 minutes (it was
240 seconds on the VM, but that was likely faster since the VM is faster).

I will do that around 18h UTC, in ~3h (so 1pm Boston time, 23h Pune
time, and 19h Amsterdam time, so it shouldn't impact too many people, who
would either be sleeping and/or eating).

If people really want to work, we always have bugs to triage :)
-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Tips and Tricks for Gluster Developer

2016-01-25 Thread Jeff Darcy
Oh boy, here we go.  ;)

I second Richard's suggestion to use cscope or some equivalent.  It's a good 
idea in general, but especially with a codebase as large and complex as 
Gluster's.  I literally wouldn't be able to do my job without it.  I also have 
a set of bash/zsh aliases that will regenerate the cscope database after any 
git action, so I rarely have to do it myself.

Another secondary tip is that in many cases anything you see in the code as 
"xyz_t" is actually "struct _xyz" so you can save a bit of time (in vim) with 
":ta _xyz" instead of going through the meaningless typedef.  Unfortunately 
we're not as consistent as we should be about this convention, but it mostly 
works.  Some day I'll figure out the vim macro syntax enough to create a proper 
macro and binding for this shortcut.

I should probably write a whole new blog post about gdb stuff.  Here's one I 
wrote a while ago:

http://pl.atyp.us/hekafs.org/index.php/2013/02/gdb-macros-for-glusterfs/

There's a lot more that could be done in this area.  For example, adding loc_t 
or inode_t or fd_t would all be good exercises.

On a more controversial note, I am opposed to the practice of doing "make 
install" on anything other than a transient VM/container.  I've seen too many 
patches that were broken because they relied on "leftovers" in someone's source 
directory or elsewhere on the system from previous installs.  On my test 
systems, I always build and install actual RPMs, to make sure new files are 
properly incorporated into the configure/rpm system. One of these days I'll 
set it up so the test system even does a "git clone" (instead of rsync) from my 
real source tree to catch un-checked-in files as well.

I'll probably think of more later, and will update here as I do.


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Smoke tests run on the builder in RH DC (at least)

2016-01-25 Thread Raghavendra Talur
On Mon, Jan 25, 2016 at 11:29 PM, Michael Scherer 
wrote:

> Hi,
>
> so today, after fixing one last config item, the smoke test jobs run
> fine on the Centos 6 builder in the RH DC, which build things as non
> root, then start the tests, then reboot the server.
>

Awesome!


>
> Now, I am looking at the fedora one, but once this one is good, I will
> likely reinstall a few builders as a test, and go on Centos 7 builder.
>

This is what I had to do to get Fedora working. Ansible lines are shown
where applicable.

1. Change ownership for the python site-packages directory (the path differs:
python 2.7 on Fedora versus 2.6 on CentOS):
file: path=/usr/lib/python2.7/site-packages/gluster/ state=directory
owner=jenkins group=root

2. Had to give jenkins write permission on /usr/lib/systemd/system/
for installing glusterd service file.




> I was also planning to look at jenkins job builder for the jenkins, but
> no time yet. Will be after jenkins migration to a new host (which is
> still not planned, unlike gerrit where we should be attempting to find a
> time for that)
>
>
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
>
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Throttling xlator on the bricks

2016-01-25 Thread Shreyas Siravara
Just out of curiosity, what benefits do we think this throttling xlator would 
provide over the "enable-least-priority" option (where we put all the fops from 
SHD, etc into a least pri queue)?

 
> On Jan 25, 2016, at 12:29 AM, Venky Shankar  wrote:
> 
> On Mon, Jan 25, 2016 at 01:08:38PM +0530, Ravishankar N wrote:
>> On 01/25/2016 12:56 PM, Venky Shankar wrote:
>>> Also, it would be beneficial to have the core TBF implementation as part of
>>> libglusterfs so as to be consumable by the server side xlator component to
>>> throttle dispatched FOPs and for daemons to throttle anything that's outside
>>> "brick" boundary (such as cpu, etc..).
>> That makes sense. We were initially thinking to overload posix_rchecksum()
>> to do the SHA256 sums for the signer.
> 
> That does have advantages by avoiding network roundtrips by computing SHA* 
> locally.
> TBF could still implement ->rchecksum and throttle that (on behalf of clients,
> residing on the server - internal daemons). Placing the core implementation as
> part of libglusterfs would still provide the flexibility.
> 
>> 
>> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Gluster Monthly Newsletter, January 2015 Edition

2016-01-25 Thread Amye Scavarda
On Fri, Jan 22, 2016 at 9:29 AM, Niels de Vos  wrote:
> On Mon, Jan 18, 2016 at 07:46:16PM -0800, Amye Scavarda wrote:
>> We're kicking off an updated Monthly Newsletter, coming out mid-month.
>> We'll highlight special posts, news and noteworthy threads from the
>> mailing lists, events, and other things that are important for the
>> Gluster community.
>
> ... snip!
>
>> FOSDEM:
>> * Gluster roadmap, recent improvements and upcoming features - Niels De Vos
>
> More details about the talk and related interview here:
>
>   https://fosdem.org/2016/schedule/event/gluster_roadmap/
>   https://fosdem.org/2016/interviews/2016-niels-de-vos/
>
>> * Go & Plugins - Kaushal Madappa
>> * Gluster Stand
>> DevConf
>> * small Gluster Developer Gathering
>> * Heketi GlusterFS volume management - Luis Pabon
>> * Gluster roadmap, recent improvements and upcoming features - Niels De Vos
>
> Sorry, this is not correct. That talk was proposed, but not accepted.
> I'll be giving a workshop though:
>
>   Build your own Scale-Out Storage with Gluster
>   http://sched.co/5m1X
>
>> FAST
>>
>> ==
>> Questions? Comments? Want to be involved?
>
> Can the newsletter get posted in a blog as well? I like reading posts
> like this through the RSS feed from http://planet.gluster.org/ .
>
> Thanks!
> Niels


Thanks for the update! This time around, I just put out the Community
Survey followup, but I'll be adding this to our main blog (with
updates to reflect changes).

-- 
Amye Scavarda | a...@redhat.com | Gluster Community Lead
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Memory leak in GlusterFS FUSE client

2016-01-25 Thread Oleksandr Natalenko
Here are the results of the "rsync" test. I've got 2 volumes — source and target — 
and I am rsyncing multiple files from one volume to another.

Source volume:

===
root 22259  3.5  1.5 1204200 771004 ?  Ssl  Jan23 109:42 /usr/sbin/
glusterfs --volfile-server=glusterfs.example.com --volfile-id=source /mnt/net/
glusterfs/source
===

One may see that the memory consumption of the source volume is not as high as 
with the "find" test. Here is the source volume client statedump: 
https://gist.github.com/ef5b798859219e739aeb

Here is source volume info: https://gist.github.com/3d2f32e7346df9333004

Target volume:

===
root 22200 23.8  6.9 3983676 3456252 ? Ssl  Jan23 734:57 /usr/sbin/
glusterfs --volfile-server=glusterfs.example.com --volfile-id=target /mnt/net/
glusterfs/target
===

Here is target volume info: https://gist.github.com/c9de01168071575b109e

Target volume RAM consumption is very high (more than 3 GiBs). Here is client 
statedump too: https://gist.github.com/31e43110eaa4da663435

I see huge DHT-related memory usage, e.g.:

===
[cluster/distribute.asterisk_records-dht - usage-type gf_common_mt_mem_pool 
memusage]
size=725575592
num_allocs=7552486
max_size=725575836
max_num_allocs=7552489
total_allocs=90843958

[cluster/distribute.asterisk_records-dht - usage-type gf_common_mt_char 
memusage]
size=586404954
num_allocs=7572836
max_size=586405157
max_num_allocs=7572839
total_allocs=80463096
===

Ideas?

On понеділок, 25 січня 2016 р. 02:46:32 EET Oleksandr Natalenko wrote:
> Also, I've repeated the same "find" test again, but with glusterfs process
> launched under valgrind. And here is valgrind output:
> 
> https://gist.github.com/097afb01ebb2c5e9e78d
> 
> On неділя, 24 січня 2016 р. 09:33:00 EET Mathieu Chateau wrote:
> > Thanks for all your tests and times, it looks promising :)
> > 
> > 
> > Cordialement,
> > Mathieu CHATEAU
> > http://www.lotp.fr
> > 
> > 2016-01-23 22:30 GMT+01:00 Oleksandr Natalenko :
> > > OK, now I'm re-performing tests with rsync + GlusterFS v3.7.6 + the
> > > following
> > > patches:
> > > 
> > > ===
> > > 
> > > Kaleb S KEITHLEY (1):
> > >   fuse: use-after-free fix in fuse-bridge, revisited
> > > 
> > > Pranith Kumar K (1):
> > >   mount/fuse: Fix use-after-free crash
> > > 
> > > Soumya Koduri (3):
> > >   gfapi: Fix inode nlookup counts
> > >   inode: Retire the inodes from the lru list in inode_table_destroy
> > >   upcall: free the xdr* allocations
> > > 
> > > ===
> > > 
> > > I run rsync from one GlusterFS volume to another. While memory started
> > > from
> > > under 100 MiBs, it stalled at around 600 MiBs for source volume and does
> > > not
> > > grow further. As for target volume it is ~730 MiBs, and that is why I'm
> > > going
> > > to do several rsync rounds to see if it grows more (with no patches bare
> > > 3.7.6
> > > could consume more than 20 GiBs).
> > > 
> > > No "kernel notifier loop terminated" message so far for both volumes.
> > > 
> > > Will report more in several days. I hope current patches will be
> > > incorporated
> > > into 3.7.7.
> > > 
> > > On пʼятниця, 22 січня 2016 р. 12:53:36 EET Kaleb S. KEITHLEY wrote:
> > > > On 01/22/2016 12:43 PM, Oleksandr Natalenko wrote:
> > > > > On пʼятниця, 22 січня 2016 р. 12:32:01 EET Kaleb S. KEITHLEY wrote:
> > > > >> I presume by this you mean you're not seeing the "kernel notifier
> > > > >> loop
> > > > >> terminated" error in your logs.
> > > > > 
> > > > > Correct, but only with simple traversing. Have to test under rsync.
> > > > 
> > > > Without the patch I'd get "kernel notifier loop terminated" within a
> > > > few
> > > > minutes of starting I/O.  With the patch I haven't seen it in 24 hours
> > > > of beating on it.
> > > > 
> > > > >> Hmmm.  My system is not leaking. Last 24 hours the RSZ and VSZ are
> > > > >> stable:
> > > > >> http://download.gluster.org/pub/gluster/glusterfs/dynamic-analysis/longevity/client.out
> > > > > 
> > > > > What ops do you perform on mounted volume? Read, write, stat? Is
> > > > > that
> > > > > 3.7.6 + patches?
> > > > 
> > > > I'm running an internally developed I/O load generator written by a
> > > > guy
> > > > on our perf team.
> > > > 
> > > > it does, create, write, read, rename, stat, delete, and more.
> 
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Gerrit down for 10 to 15 minutes for reindexing

2016-01-25 Thread Michael Scherer
Le lundi 25 janvier 2016 à 15:41 +0100, Michael Scherer a écrit :
> Hi,
> 
> in order to fix some issues (I hope), I am gonna start a reindex of the
> Lucene DB of gerrit. This requires the server to be put offline for a
> while. I did a test on another VM, and it would take ~10 minutes (it was
> 240 seconds on the VM, but that was likely faster since the VM is faster).
> 
> I will do that around 18h UTC, in ~3h (so 1pm Boston time, 23h Pune
> time, and 19h Amsterdam time, so it shouldn't impact too many people, who
> would either be sleeping and/or eating). 
> 
> If people really want to work, we always have bugs to triage :)

So it took 3 to 4 minutes, much faster than what I thought.

And it did fix the issue it was meant to fix, i.e. Prashanth being
unable to find his own reviews.

This happened because we did some modifications directly in SQL, but
gerrit needs to reindex everything to see some changes, and this requires
taking the db offline (something that is fixed in a newer version of
gerrit).

Please ping me if anything weird appears.
-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Smoke tests run on the builder in RH DC (at least)

2016-01-25 Thread Michael Scherer
Hi,

so today, after fixing one last config item, the smoke test jobs run
fine on the Centos 6 builder in the RH DC, which build things as non
root, then start the tests, then reboot the server.

Now, I am looking at the fedora one, but once this one is good, I will
likely reinstall a few builders as a test, and go on Centos 7 builder.

I was also planning to look at jenkins job builder for the jenkins, but
no time yet. Will be after jenkins migration to a new host (which is
still not planned, unlike gerrit where we should be attempting to find a
time for that)


-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-infra] Smoke tests run on the builder in RH DC (at least)

2016-01-25 Thread Niels de Vos
On Mon, Jan 25, 2016 at 06:59:33PM +0100, Michael Scherer wrote:
> Hi,
> 
> so today, after fixing one last config item, the smoke test jobs run
> fine on the Centos 6 builder in the RH DC, which build things as non
> root, then start the tests, then reboot the server.

Nice, sounds like great progress!

Did you need to change anything in the build or test scripts under
/opt/qa? If so, please make sure that the changes land in the
repository:

  https://github.com/gluster/glusterfs-patch-acceptance-tests/

> Now, I am looking at the fedora one, but once this one is good, I will
> likely reinstall a few builders as a test, and go on Centos 7 builder.

I'm not sure yet if I made an error, or what is going on. But for some
reason smoke tests for my patch series fails... This is the smoke result
of the 1st patch in the series, it only updates the fuse-header to a
newer version. Of course local testing works just fine... The output and
(not available) logs of the smoke test do not really help me :-/

  https://build.gluster.org/job/smoke/24395/console

Could this be related to the changes that were made? If not, I'd
appreciate a pointer to my mistake.

> I was also planning to look at jenkins job builder for the jenkins, but
> no time yet. Will be after jenkins migration to a new host (which is
> still not planned, unlike gerrit where we should be attempting to find a
> time for that)

We also might want to use Jenkins Job Builder for the tests we're adding
to the CentOS CI. Maybe we could experiment with it there first, and
then use our knowledge to the Gluster Jenkins?

Thanks,
Niels


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Feature: Automagic lock-revocation for features/locks xlator (v3.7.x)

2016-01-25 Thread Richard Wareing
Hey Hey Panith,

>Maybe give clients a second (or more) chance to "refresh" their locks - in the 
>sense, when a lock is about to be revoked, notify the client which can then 
>call for a refresh to confirm its locks' validity. This would require 
>some maintenance work on the client to keep track of locked regions.

So we've thought about this as well; however, the approach I'd prefer is that we 
(long term) eliminate any need for multi-hour locking.  This would put the 
responsibility on the SHD/rebalance/bitrot daemons to take out another lock 
request once in a while to signal to the POSIX locks translator that they are 
still there and alive.

The world we want to be in is that a lock held > N minutes is most _definitely_ a 
bug or a broken client and should be revoked.  With this patch it's simply a 
heuristic to make a judgement call. In our world, however, we've seen that once 
you have 1000's of lock requests piled up it's only a matter of time before 
your entire cluster is going to collapse; so the "correctness" of the locking 
behavior, or however much you might upset SHD/bitrot/rebalance, is a completely 
secondary concern compared to the availability and stability of the cluster itself.

For folks that want to use this feature conservatively, they shouldn't revoke 
based on time, but rather based on (lock request) queue depth; if you are in a 
situation like I've described above it's almost certainly a bug or a situation 
not fully understood by developers.
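
For illustration only, the judgement call described above might boil down to
something like this (hypothetical structure and names; not the actual
features/locks patch):

===
#include <stdint.h>
#include <time.h>

/* Hypothetical view of a granted lock and its waiters. */
typedef struct held_lock {
        time_t   granted_at;      /* when the lock was handed out    */
        uint32_t blocked_count;   /* lock requests waiting behind it */
} held_lock_t;

/* Revoke only *contended* locks: if nobody is waiting, leave it alone.
 * Passing 0 for revocation_secs or max_blocked disables that criterion. */
int
should_revoke (const held_lock_t *l, time_t now,
               uint32_t revocation_secs, uint32_t max_blocked)
{
        if (l->blocked_count == 0)
                return 0;

        if (revocation_secs && (now - l->granted_at) >= revocation_secs)
                return 1;

        if (max_blocked && l->blocked_count >= max_blocked)
                return 1;

        return 0;
}
===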

Richard



From: Venky Shankar [yknev.shan...@gmail.com]
Sent: Sunday, January 24, 2016 9:36 PM
To: Pranith Kumar Karampuri
Cc: Richard Wareing; Gluster Devel
Subject: Re: [Gluster-devel] Feature: Automagic lock-revocation for 
features/locks xlator (v3.7.x)


On Jan 25, 2016 08:12, "Pranith Kumar Karampuri" 
> wrote:
>
>
>
> On 01/25/2016 02:17 AM, Richard Wareing wrote:
>>
>> Hello all,
>>
>> Just gave a talk at SCaLE 14x today and I mentioned our new locks revocation 
>> feature which has had a significant impact on our GFS cluster reliability.  
>> As such I wanted to share the patch with the community, so here's the 
>> bugzilla report:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1301401
>>
>> =
>> Summary:
>> Mis-behaving brick clients (gNFSd, FUSE, gfAPI) can cause cluster 
>> instability and eventual complete unavailability due to failures in 
>> releasing entry/inode locks in a timely manner.
>>
>> Classic symptoms of this are increased brick (and/or gNFSd) memory usage due to 
>> the high number of (lock request) frames piling up in the processes.  The 
>> failure-mode results in bricks eventually slowing down to a crawl due to 
>> swapping, or OOMing due to complete memory exhaustion; during this period 
>> the entire cluster can begin to fail.  End-users will experience this as 
>> hangs on the filesystem, first in a specific region of the file-system and 
>> ultimately the entire filesystem as the offending brick begins to turn into 
>> a zombie (i.e. not quite dead, but not quite alive either).
>>
>> Currently, these situations must be handled by an administrator detecting & 
>> intervening via the "clear-locks" CLI command.  Unfortunately this doesn't 
>> scale for large numbers of clusters, and it depends on the correct 
>> (external) detection of the locks piling up (for which there is little 
>> signal other than state dumps).
>>
>> This patch introduces two features to remedy this situation:
>>
>> 1. Monkey-unlocking - This is a feature targeted at developers (only!) to 
>> help track down crashes due to stale locks, and prove the utility of the lock 
>> revocation feature.  It does this by silently dropping 1% of unlock 
>> requests; simulating bugs or mis-behaving clients.
>>
>> The feature is activated via:
>> features.locks-monkey-unlocking 
>>
>> You'll see the message
>> "[] W [inodelk.c:653:pl_inode_setlk] 0-groot-locks: MONKEY 
>> LOCKING (forcing stuck lock)!" ... in the logs indicating a request has been 
>> dropped.
>>
>> 2. Lock revocation - Once enabled, this feature will revoke a 
>> *contended* lock (i.e. if nobody else asks for the lock, we will not revoke 
>> it) either by the amount of time the lock has been held, how many other lock 
>> requests are waiting on the lock to be freed, or some combination of both.  
>> Clients which are losing their locks will be notified by receiving EAGAIN 
>> (send back to their callback function).
>>
>> The feature is activated via these options:
>> features.locks-revocation-secs 
>> features.locks-revocation-clear-all [on/off]
>> features.locks-revocation-max-blocked 
>>
>> Recommended settings are: 1800 seconds for a time-based timeout (give 
>> clients the benefit of the doubt); choosing a max-blocked value requires some 
>> experimentation depending on your workload, but generally values of hundreds 
>> to low thousands (it's normal for many ten's of locks to be taken 

Re: [Gluster-devel] [Gluster-infra] Smoke tests run on the builder in RH DC (at least)

2016-01-25 Thread Niels de Vos
On Mon, Jan 25, 2016 at 10:24:33PM +0100, Niels de Vos wrote:
> On Mon, Jan 25, 2016 at 06:59:33PM +0100, Michael Scherer wrote:
> > Hi,
> > 
> > so today, after fixing one last config item, the smoke test jobs run
> > fine on the Centos 6 builder in the RH DC, which build things as non
> > root, then start the tests, then reboot the server.
> 
> Nice, sounds like great progress!
> 
> Did you need to change anything in the build or test scripts under
> /opt/qa? If so, please make sure that the changes land in the
> repository:
> 
>   https://github.com/gluster/glusterfs-patch-acceptance-tests/
> 
> > Now, I am looking at the fedora one, but once this one is good, I will
> > likely reinstall a few builders as a test, and go on Centos 7 builder.
> 
> I'm not sure yet if I made an error, or what is going on. But for some
> reason smoke tests for my patch series fails... This is the smoke result
> of the 1st patch in the serie, it only updates the fuse-header to a
> newer version. Of course local testing works just fine... The output and
> (not available) logs of the smoke test do not really help me :-/
> 
>   https://build.gluster.org/job/smoke/24395/console
> 
> Could this be related to the changes that were made? If not, I'd
> appreciate a pointer to my mistake.

Well, I guess that this is a limitation in the FUSE kernel module that
is part of EL6 and EL7. One of the structures sent is probably too big
and the kernel refuses to accept it. I guess I'll need to go back to the
drawing board and add a real check for the FUSE version, or something
like that.

Niels


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Tips and Tricks for Gluster Developer

2016-01-25 Thread Vijay Bellur

On 01/22/2016 09:13 AM, Raghavendra Talur wrote:

HI All,

I am sure there are many tricks hidden under sleeves of many Gluster
developers.
I realized this when speaking to new developers. It would be good have a
searchable thread of such tricks.

Just reply back on this thread with the tricks that you have and I
promise I will collate them and add them to developer guide.


Looking forward to be amazed!




Things that I normally do:


1. Visualizing flow through the stack is one of the first steps that I 
use in debugging. Tracing a call from the origin (application) to the 
underlying filesystem is usually helpful in isolating most problems. 
Looking at logs emanating from the endpoints of each stack (fuse/nfs 
etc. + client protocols, server + posix) helps in identifying the stack 
that might be the source of a problem. For understanding the nature of 
fops happening, you can use the wireshark plugin or the trace translator 
at appropriate locations in the graph.


2. Use statedump/meta for understanding internal state. For servers 
statedump is the only recourse, on fuse clients you can get a meta view 
of the filesystem by


cd /mnt/point/.meta

and you get to view the statedump information in a hierarchical fashion 
(including information from individual xlators).


3. Reproduce a problem by minimizing the number of nodes in a graph. 
This can be done by disabling translators that can be disabled through 
the volume set interface and by using custom volume files.
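
A hypothetical example of trimming the client graph while chasing a caching
bug (the volume name is a placeholder; remember to re-enable the options
afterwards):

gluster volume set myvol performance.quick-read off
gluster volume set myvol performance.io-cache off
gluster volume set myvol performance.read-ahead off
gluster volume set myvol performance.write-behind off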


4. Use error-gen while developing new code to simulate fault injection 
for fops.
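
As a sketch, a fragment along these lines can be spliced into a custom
volfile just above storage/posix (option names are from memory of
debug/error-gen; verify them against the xlator source before use):

volume myvol-error-gen
    type debug/error-gen
    option failure 10             # fail roughly 10% of the enabled fops
    option error-no ENOENT        # errno to return on injected failures
    option enable lookup,open     # fops to inject errors into
    subvolumes myvol-posix
end-volume

The xlator directly above it then needs its "subvolumes" line pointed at
myvol-error-gen instead of myvol-posix.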


5. If a problem happens only at scale, try reproducing the problem by 
reducing default limits in code (timeouts, inode table limits etc.). 
Some of them do require re-compilation of code.


6. Use the wealth of tools available on *nix systems for understanding a 
performance problem better. This infographic [1] and page [2] by Brendan 
Gregg is quite handy for using the right tool at the right layer.
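
A couple of hypothetical starting points when a brick looks slow (BRICK_PID
stands for the pid of the glusterfsd process serving that brick):

iostat -x 1                  # per-device utilization and latency
top -H -p "$BRICK_PID"       # which brick threads are burning CPU
perf top -p "$BRICK_PID"     # hottest functions inside the brick process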


7. For isolating regression test failures:

 - use tests/utils/testn.sh to quickly identify the failing test
 - grep for the last "Test Summary Report" in the jenkins report 
for a failed regression run. That usually provides a pointer to the 
failing test.
 - In case of a failure due to a core, the gdb command provided in 
the jenkins report is quite handy to get a backtrace after downloading 
the core and its runtime to your laptop.
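
A rough local workflow, with test names and paths as placeholders (check
tests/utils/testn.sh for its exact arguments):

./tests/utils/testn.sh 37                      # assumption: maps failing test number 37 to its .t file
prove -vf tests/basic/some-test.t              # re-run a single regression test locally
gdb $(which glusterfsd) -c /path/to/core.1234  # inspect a downloaded core; adjust binary and paths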


8. Get necessary information to debug a problem as soon as a new bug is 
logged (a day or two is ideal). If we miss that opportunity, users could 
have potentially moved on to other things and obtaining information can 
prove to be difficult.


9. Be paranoid about any code that you write ;-). Anything that is not 
tested by us will come back to haunt us sometime else in the future.


10. Use terminator [3] for concurrently executing the same command on 
multiple nodes.
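
When terminator is not at hand, a plain ssh loop gets you most of the way
(hostnames are placeholders):

for h in node1 node2 node3; do
    ssh "$h" 'gluster --version' &
done
wait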


Will fill in more when I recollect something useful.

-Vijay

[1] http://www.brendangregg.com/Perf/linux_observability_tools.png

[2] http://www.brendangregg.com/perf.html

[3] https://launchpad.net/terminator


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Vault CFP closes January 29th

2016-01-25 Thread Amye Scavarda
The Linux Foundation's Vault
(http://events.linuxfoundation.org/events/vault) event focusing on
Linux storage and filesystems currently has its call for papers open
- but it closes this Friday, January 29th. I'm highlighting this
because GlusterFS is mentioned as a suggested topic!

This year, it's in Raleigh, North Carolina.
From http://events.linuxfoundation.org/events/vault/program/cfp

CFP Close: January 29, 2016
CFP Notifications: February 9, 2016
Schedule Announced: February 11, 2016

Suggested Topics
We seek proposals on a diverse range of topics related to storage,
Linux, and open source, including:

Object, Block, and File System Storage Architectures (Ceph, Swift,
Cinder, Manila, OpenZFS)
Distributed, Clustered, and Parallel Storage Systems (**GlusterFS**,
Ceph, Lustre, OrangeFS, XtreemFS, MooseFS, OCFS2, HDFS)
Persistent Memory and Other New Hardware Technologies
File System Scaling Issues
IT Automation and Storage Management (OpenLMI, Ovirt, Ansible)
Client/server file systems (NFS, Samba, pNFS)
Big Data Storage
Long Term, Offline Data Archiving
Data Compression and Storage Optimization
Software Defined Storage

-- 
Anyone want to put in some great 3.8 talks?

 -- amye

-- 
Amye Scavarda | a...@redhat.com | Gluster Community Lead
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Throttling xlator on the bricks

2016-01-25 Thread Venky Shankar
On Mon, Jan 25, 2016 at 01:08:38PM +0530, Ravishankar N wrote:
> On 01/25/2016 12:56 PM, Venky Shankar wrote:
> >Also, it would be beneficial to have the core TBF implementation as part of
> >libglusterfs so as to be consumable by the server side xlator component to
> >throttle dispatched FOPs and for daemons to throttle anything that's outside
> >"brick" boundary (such as cpu, etc..).
> That makes sense. We were initially thinking to overload posix_rchecksum()
> to do the SHA256 sums for the signer.

That does have advantages by avoiding network roundtrips by computing SHA* 
locally.
TBF could still implement ->rchecksum and throttle that (on behalf of clients,
residing on the server - internal daemons). Placing the core implementation as
part of libglusterfs would still provide the flexibility.

> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] GlusterFS FUSE client hangs on rsyncing lots of file

2016-01-25 Thread baul jianguo
the client statedump is at http://pastebin.centos.org/38671/

On Mon, Jan 25, 2016 at 3:33 PM, baul jianguo  wrote:
> 3.5.7 also hangs; only the flush op hung. Yes, with
> performance.client-io-threads off, there is no hang.
>
> The hang does not relate to the client kernel version.
>
> One client statedump of the flush op; anything abnormal?
>
> [global.callpool.stack.12]
>
> uid=0
>
> gid=0
>
> pid=14432
>
> unique=16336007098
>
> lk-owner=77cb199aa36f3641
>
> op=FLUSH
>
> type=1
>
> cnt=6
>
>
>
> [global.callpool.stack.12.frame.1]
>
> ref_count=1
>
> translator=fuse
>
> complete=0
>
>
>
> [global.callpool.stack.12.frame.2]
>
> ref_count=0
>
> translator=datavolume-write-behind
>
> complete=0
>
> parent=datavolume-read-ahead
>
> wind_from=ra_flush
>
> wind_to=FIRST_CHILD (this)->fops->flush
>
> unwind_to=ra_flush_cbk
>
>
>
> [global.callpool.stack.12.frame.3]
>
> ref_count=1
>
> translator=datavolume-read-ahead
>
> complete=0
>
> parent=datavolume-open-behind
>
> wind_from=default_flush_resume
>
> wind_to=FIRST_CHILD(this)->fops->flush
>
> unwind_to=default_flush_cbk
>
>
>
> [global.callpool.stack.12.frame.4]
>
> ref_count=1
>
> translator=datavolume-open-behind
>
> complete=0
>
> parent=datavolume-io-threads
>
> wind_from=iot_flush_wrapper
>
> wind_to=FIRST_CHILD(this)->fops->flush
>
> unwind_to=iot_flush_cbk
>
>
>
> [global.callpool.stack.12.frame.5]
>
> ref_count=1
>
> translator=datavolume-io-threads
>
> complete=0
>
> parent=datavolume
>
> wind_from=io_stats_flush
>
> wind_to=FIRST_CHILD(this)->fops->flush
>
> unwind_to=io_stats_flush_cbk
>
>
>
> [global.callpool.stack.12.frame.6]
>
> ref_count=1
>
> translator=datavolume
>
> complete=0
>
> parent=fuse
>
> wind_from=fuse_flush_resume
>
> wind_to=xl->fops->flush
>
> unwind_to=fuse_err_cbk
>
>
>
> On Sun, Jan 24, 2016 at 5:35 AM, Oleksandr Natalenko
>  wrote:
>> With "performance.client-io-threads" set to "off" no hangs occurred in 3
>> rsync/rm rounds. Could that be some fuse-bridge lock race? Will bring that
>> option back to "on" again and try to get a full statedump.
>>
>> On четвер, 21 січня 2016 р. 14:54:47 EET Raghavendra G wrote:
>>> On Thu, Jan 21, 2016 at 10:49 AM, Pranith Kumar Karampuri <
>>>
>>> pkara...@redhat.com> wrote:
>>> > On 01/18/2016 02:28 PM, Oleksandr Natalenko wrote:
>>> >> XFS. Server side works OK, I'm able to mount volume again. Brick is 30%
>>> >> full.
>>> >
>>> > Oleksandr,
>>> >
>>> >   Will it be possible to get the statedump of the client, bricks
>>> >
>>> > output next time it happens?
>>> >
>>> > https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.m
>>> > d#how-to-generate-statedump
>>> We also need to dump inode information. To do that you have to add "all=yes"
>>> to /var/run/gluster/glusterdump.options before you issue commands to get
>>> statedump.
>>>
>>> > Pranith
>>> >
>>> >> On понеділок, 18 січня 2016 р. 15:07:18 EET baul jianguo wrote:
>>> >>> What is your brick file system? and the glusterfsd process and all
>>> >>> thread status?
>>> >>> I met same issue when client app such as rsync stay in D status,and
>>> >>> the brick process and relate thread also be in the D status.
>>> >>> And the brick dev disk util is 100% .
>>> >>>
>>> >>> On Sun, Jan 17, 2016 at 6:13 AM, Oleksandr Natalenko
>>> >>>
>>> >>>  wrote:
>>>  Wrong assumption, rsync hung again.
>>> 
>>>  On субота, 16 січня 2016 р. 22:53:04 EET Oleksandr Natalenko wrote:
>>> > One possible reason:
>>> >
>>> > cluster.lookup-optimize: on
>>> > cluster.readdir-optimize: on
>>> >
>>> > I've disabled both optimizations, and at least as of now rsync still
>>> > does
>>> > its job with no issues. I would like to find out what option causes
>>> > such
>>> > a
>>> > behavior and why. Will test more.
>>> >
>>> > On пʼятниця, 15 січня 2016 р. 16:09:51 EET Oleksandr Natalenko wrote:
>>> >> Another observation: if rsyncing is resumed after hang, rsync itself
>>> >> hangs a lot faster because it does stat of already copied files. So,
>>> >> the
>>> >> reason may be not writing itself, but massive stat on GlusterFS
>>> >> volume
>>> >> as well.
>>> >>
>>> >> 15.01.2016 09:40, Oleksandr Natalenko написав:
>>> >>> While doing rsync over millions of files from ordinary partition to
>>> >>> GlusterFS volume, just after approx. first 2 million rsync hang
>>> >>> happens, and the following info appears in dmesg:
>>> >>>
>>> >>> ===
>>> >>> [17075038.924481] INFO: task rsync:10310 blocked for more than 120
>>> >>> seconds.
>>> >>> [17075038.931948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> >>> disables this message.
>>> >>> [17075038.940748] rsync   D 88207fc13680 0 10310
>>> >>> 10309 0x0080
>>> >>> [17075038.940752]  8809c578be18 0086
>>> >>> 8809c578bfd8
>>> >>> 00013680
>>> >>> 

Re: [Gluster-devel] Tips and Tricks for Gluster Developer

2016-01-25 Thread Raghavendra Talur
I don't like installing the bits under /usr/local so I configure and
compile them to install in the same place as a Fedora rpm would.
Here is my compile command


./autogen.sh

CFLAGS="-g -O0 -Werror -Wall -Wno-error=cpp -Wno-error=maybe-uninitialized" \
./configure \
--prefix=/usr \
--exec-prefix=/usr \
--bindir=/usr/bin \
--sbindir=/usr/sbin \
--sysconfdir=/etc \
--datadir=/usr/share \
--includedir=/usr/include \
--libdir=/usr/lib64 \
--libexecdir=/usr/libexec \
--localstatedir=/var \
--sharedstatedir=/var/lib \
--mandir=/usr/share/man \
--infodir=/usr/share/info \
--enable-debug

make install
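
A possible follow-up, assuming the flag behaves as on recent builds (verify
with glusterd --help): run the freshly installed daemon in the foreground
with debug logging while iterating,

glusterd --debug    # stays in the foreground and logs at debug level to the console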




On Fri, Jan 22, 2016 at 7:43 PM, Raghavendra Talur 
wrote:

> HI All,
>
> I am sure there are many tricks hidden under sleeves of many Gluster
> developers.
> I realized this when speaking to new developers. It would be good to have a
> searchable thread of such tricks.
>
> Just reply back on this thread with the tricks that you have and I promise
> I will collate them and add them to developer guide.
>
>
> Looking forward to being amazed!
>
> Thanks,
> Raghavendra Talur
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Tips and Tricks for Gluster Developer

2016-01-25 Thread Rajesh Joseph


- Original Message -
> From: "Richard Wareing" 
> To: "Raghavendra Talur" 
> Cc: "Gluster Devel" 
> Sent: Monday, January 25, 2016 8:12:53 AM
> Subject: Re: [Gluster-devel] Tips and Tricks for Gluster Developer
> 
> Here's my tips:
> 
> 1. General C tricks
> - learn to use vim or emacs & read their manuals; customize to suit your
> style
> - use vim w/ pathogen plugins for auto formatting (don't use tabs!) & syntax
> - use ctags to jump around functions
> - Use ASAN & valgrind to check for memory leaks and heap corruption
> - learn to use "git bisect" to quickly find where regressions were introduced
> & revert them
> - Use a window manager like tmux or screen
> 
> 2. Gluster specific tricks
> - Alias "ggrep" to grep through all Gluster source files for some string and
> show you the line numbers
> - Alias "gvim" or "gemacs" to open any source file without full path, eg.
> "gvim afr.c"
> - GFS specific gdb macros to dump out pretty formatting of various structs
> (Jeff Darcy has some of these IIRC)

I also use a few macros for printing dictionaries and walking through list 
structures.
I think it would be good to collect these macros, scripts and tools in a 
common place so that people can use them. Can we include them in an 
"extras/dev" directory under the Gluster source tree?
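
A minimal sketch of the "ggrep"/"gvim" helpers mentioned above, as shell
functions (GLUSTER_SRC is a placeholder for your checkout path):

export GLUSTER_SRC="$HOME/glusterfs"
ggrep() { grep -rn --include='*.[ch]' -e "$1" "$GLUSTER_SRC"; }
gvim()  { vim "$(find "$GLUSTER_SRC" -name "$1" -print -quit)"; }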

> - Write prove tests...for everything you write, and any bug you fix.  Make
> them deterministic (timing/races shouldn't matter).
> - Bugs/races and/or crashes which are hard or impossible to repro often
> require the creation of a developer specific feature to simulate the failure
> and efficiently code/test a fix.  Example: "monkey-unlocking" in the lock
> revocation patch I just posted.
> - That edge case you are ignoring because you think it's impossible/unlikely?
> We will find/hit it in 48hrs at large scale (seriously, we will). Handle
> it correctly or, at a minimum, write a (kernel-style) "OOPS" log type message.
> 
> That's all I have off the top of my head.  I'll give example aliases in
> another reply.
> 
> Richard
> 
> Sent from my iPhone
> 
> > On Jan 22, 2016, at 6:14 AM, Raghavendra Talur  wrote:
> > 
> > HI All,
> > 
> > I am sure there are many tricks hidden under sleeves of many Gluster
> > developers.
> > I realized this when speaking to new developers. It would be good to have a
> > searchable thread of such tricks.
> > 
> > Just reply back on this thread with the tricks that you have and I promise
> > I will collate them and add them to developer guide.
> > 
> > 
> > Looking forward to being amazed!
> > 
> > Thanks,
> > Raghavendra Talur
> > 
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gluster.org_mailman_listinfo_gluster-2Ddevel=CwICAg=5VD0RTtNlTh3ycd41b3MUw=qJ8Lp7ySfpQklq3QZr44Iw=wVrGhYdkvCanDEZF0xOyVbFg0am_GxaoXR26Cvp7H2U=JOrY0up51BoZOq2sKaNJQHPzqKiUS3Bwgn7fr5VPXjw=
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Throttling xlator on the bricks

2016-01-25 Thread Ravishankar N


On 01/26/2016 01:22 AM, Shreyas Siravara wrote:

Just out of curiosity, what benefits do we think this throttling xlator would provide 
over the "enable-least-priority" option (where we put all the fops from SHD, 
etc into a least pri queue)?

  


For one, it could provide more granularity on the amount of throttling 
you want to do, for specific fops, from specific clients. If the only 
I/O going through the bricks was from the SHD, they would all be 
least-priority but yet consume an unfair % of the CPU. We could tweak 
`performance.least-rate-limit` to throttle but it would be a global option.
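
For reference, the global knob mentioned above is an io-threads option and
would be set along these lines (the volume name is a placeholder; the value
is the assumed cap on least-priority fops per second):

gluster volume set myvol performance.least-rate-limit 100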




On Jan 25, 2016, at 12:29 AM, Venky Shankar  wrote:

On Mon, Jan 25, 2016 at 01:08:38PM +0530, Ravishankar N wrote:

On 01/25/2016 12:56 PM, Venky Shankar wrote:

Also, it would be beneficial to have the core TBF implementation as part of
libglusterfs so as to be consumable by the server side xlator component to
throttle dispatched FOPs and for daemons to throttle anything that's outside
"brick" boundary (such as cpu, etc..).

That makes sense. We were initially thinking to overload posix_rchecksum()
to do the SHA256 sums for the signer.

That does have advantages by avoiding network roundtrips by computing SHA* 
locally.
TBF could still implement ->rchecksum and throttle that (on behalf of clients,
residing on the server - internal daemons). Placing the core implementation as
part of libglusterfs would still provide the flexibility.




___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gluster.org_mailman_listinfo_gluster-2Ddevel=CwICAg=5VD0RTtNlTh3ycd41b3MUw=N7LE2BKIHDDBvkYkakYthA=9W9xtRg0TIEUvFL-8HpUCux8psoWKkUbEFiwqykRwH4=OVF0dZRXt8GFcIxsHlkbNjH-bjD9097q5hjVVHgOFkQ=



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Throttling xlator on the bricks

2016-01-25 Thread Vijay Bellur

On 01/25/2016 12:36 AM, Ravishankar N wrote:

Hi,

We are planning to introduce a throttling xlator on the server (brick)
process to regulate FOPS. The main motivation is to solve complaints about
AFR selfheal taking too much of CPU resources. (due to too many fops for
entry
self-heal, rchecksums for data self-heal etc.)



I am wondering if we can re-use the same xlator for throttling 
bandwidth, iops etc. in addition to fops. Based on admin configured 
policies we could provide different upper thresholds to different 
clients/tenants and this could prove to be a useful feature in 
multitenant deployments to avoid starvation/noisy neighbor class of 
problems. Has any thought gone in this direction?




The throttling is achieved using the Token Bucket Filter algorithm
(TBF). TBF
is already used by bitrot's bitd signer (which is a client process) in
gluster to regulate the CPU intensive check-sum calculation. By putting the
logic on the brick side, multiple clients- selfheal, bitrot, rebalance or
even the mounts themselves can avail the benefits of throttling.

The TBF algorithm in a nutshell is as follows: There is a bucket which
is filled
at a steady (configurable) rate with tokens. Each FOP will need a fixed
amount
of tokens to be processed. If the bucket has that many tokens, the FOP is
allowed and that many tokens are removed from the bucket. If not, the FOP is
queued until the bucket is filled.

The xlator will need to reside above io-threads and can have different
buckets,
one per client. There has to be a communication mechanism between the
client and
the brick (IPC?) to tell what FOPS need to be regulated from it, and the
no. of
tokens needed etc. These need to be reconfigurable via appropriate
mechanisms.
Each bucket will have a token filler thread which will fill the tokens
in it.


If there is one bucket per client and one thread per bucket, it would be 
difficult to scale as the number of clients increases. How can we do this 
better?



The main thread will enqueue heals in a list in the bucket if there aren't
enough tokens. Once the token filler detects some FOPS can be serviced,
it will
send a cond-broadcast to a dequeue thread which will process (stack
wind) all
the FOPS that have the required no. of tokens from all buckets.

This is just a high level abstraction: requesting feedback on any aspect of
this feature. what kind of mechanism is best between the client/bricks for
tuning various parameters? What other requirements do you foresee?



I am in favor of having administrator defined policies or templates 
(collection of policies) being used to provide the tuning parameter per 
client or a set of clients. We could even have a default template per 
use case etc. Is there a specific need to have this negotiation between 
clients and servers?


Thanks,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] distributed files/directories and [cm]time updates

2016-01-25 Thread Pranith Kumar Karampuri

hi,
  Traditionally gluster has been using ctime/mtime of the 
files/dirs on the bricks as stat output. The problem we are seeing with 
this approach is that software which depends on it gets confused when 
there are differences in these times. Tar especially gives "file changed 
as we read it" whenever it detects ctime differences when stat is served 
from different bricks. The way we have been trying to solve it is to 
serve the stat structures from the same brick in afr, and max-time in 
dht. But that doesn't avoid the problem completely. Because there is no 
way to change ctime at the moment (lutimes() only allows mtime, atime), 
there is little we can do to make sure ctimes match after self-heals/xattr 
updates/rebalance. I am wondering if any of you have solved these problems 
before; if yes, how did you go about it? It seems like applications 
which depend on this for backups get confused the same way. The only way 
out I see is to bring ctime to an xattr, but that will need more iops 
and gluster has to keep updating it on quite a few fops.
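
A quick way to observe the mismatch, assuming a two-brick replica with
hypothetical brick paths (%Z is ctime, %Y is mtime, in epoch seconds):

stat -c '%n ctime=%Z mtime=%Y' /bricks/b1/dir/file /bricks/b2/dir/file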


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Throttling xlator on the bricks

2016-01-25 Thread Richard Wareing
> If there is one bucket per client and one thread per bucket, it would be
> difficult to scale as the number of clients increases. How can we do this
> better?

On this note... consider that tens of thousands of clients are not unrealistic 
in production :). Using a thread per bucket would also be unwise.

On the idea in general, I'm just wondering if there are specific (real-world) 
cases where this has even been an issue that least-prio queuing hasn't been 
able to handle. Or is this more of a theoretical concern? I ask as I've not 
really encountered situations where I wished I could give more FOPs to SHD vs 
rebalance and such.

In any event, it might be worth having Shreyas detail his throttling feature 
(that can throttle any directory hierarchy no less) to illustrate how a simpler 
design can achieve similar results to these more complicated (and, it 
follows, bug-prone) approaches.

Richard


From: gluster-devel-boun...@gluster.org [gluster-devel-boun...@gluster.org] on 
behalf of Vijay Bellur [vbel...@redhat.com]
Sent: Monday, January 25, 2016 6:44 PM
To: Ravishankar N; Gluster Devel
Subject: Re: [Gluster-devel] Throttling xlator on the bricks

On 01/25/2016 12:36 AM, Ravishankar N wrote:
> Hi,
>
> We are planning to introduce a throttling xlator on the server (brick)
> process to regulate FOPS. The main motivation is to solve complaints about
> AFR selfheal taking too much of CPU resources. (due to too many fops for
> entry
> self-heal, rchecksums for data self-heal etc.)


I am wondering if we can re-use the same xlator for throttling
bandwidth, iops etc. in addition to fops. Based on admin configured
policies we could provide different upper thresholds to different
clients/tenants and this could prove to be a useful feature in
multitenant deployments to avoid starvation/noisy neighbor class of
problems. Has any thought gone in this direction?

>
> The throttling is achieved using the Token Bucket Filter algorithm
> (TBF). TBF
> is already used by bitrot's bitd signer (which is a client process) in
> gluster to regulate the CPU intensive check-sum calculation. By putting the
> logic on the brick side, multiple clients- selfheal, bitrot, rebalance or
> even the mounts themselves can avail the benefits of throttling.
>
> The TBF algorithm in a nutshell is as follows: There is a bucket which
> is filled
> at a steady (configurable) rate with tokens. Each FOP will need a fixed
> amount
> of tokens to be processed. If the bucket has that many tokens, the FOP is
> allowed and that many tokens are removed from the bucket. If not, the FOP is
> queued until the bucket is filled.
>
> The xlator will need to reside above io-threads and can have different
> buckets,
> one per client. There has to be a communication mechanism between the
> client and
> the brick (IPC?) to tell what FOPS need to be regulated from it, and the
> no. of
> tokens needed etc. These need to be reconfigurable via appropriate
> mechanisms.
> Each bucket will have a token filler thread which will fill the tokens
> in it.

If there is one bucket per client and one thread per bucket, it would be
difficult to scale as the number of clients increases. How can we do this
better?

> The main thread will enqueue heals in a list in the bucket if there aren't
> enough tokens. Once the token filler detects some FOPS can be serviced,
> it will
> send a cond-broadcast to a dequeue thread which will process (stack
> wind) all
> the FOPS that have the required no. of tokens from all buckets.
>
> This is just a high level abstraction: requesting feedback on any aspect of
> this feature. what kind of mechanism is best between the client/bricks for
> tuning various parameters? What other requirements do you foresee?
>

I am in favor of having administrator defined policies or templates
(collection of policies) being used to provide the tuning parameter per
client or a set of clients. We could even have a default template per
use case etc. Is there a specific need to have this negotiation between
clients and servers?

Thanks,
Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.gluster.org_mailman_listinfo_gluster-2Ddevel=CwICAg=5VD0RTtNlTh3ycd41b3MUw=qJ8Lp7ySfpQklq3QZr44Iw=aQHnnoxK50Ebw77QHtp3ykjC976mJIt2qrIUzpqEViQ=Jitbldlbjwye6QI8V33ZoKtVt6-B64p2_-5piVlfXMQ=
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel