Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-28 Thread Pranith Kumar Karampuri
>
>> Yes it will need some changes but I don't think they are big changes. I
>> think the functions to decode/encode already exist. We just need to
>> move encoding/decoding into tasks and run them as synctasks.
>>
>
> I was also thinking about sleeping fops. Currently, when they are resumed,
> they are processed in the same thread that was processing another fop. This
> could add latency to fops or cause unnecessary delays in lock management. If
> they could be scheduled to run on another thread, these delays would be
> drastically reduced.
>
> On the other hand, splitting the computation of EC encoding for a single
> request across multiple threads is bad, because the current implementation
> takes advantage of the CPU's internal cache, which is really fast. We should
> compute all fragments of a single request in the same thread. Multiple
> independent computations could be executed by different threads.
>
>
>> Xavi,
>>   A long time back we chatted a bit about the synctask code, and you wanted
>> the scheduling to be done by the kernel or something. Apart from that, do
>> you see any other issues? At least if the tasks are synchronous, i.e.
>> nothing goes out on the wire, task scheduling = thread scheduling by the
>> kernel, and it works exactly like the thread pool you were referring to. It
>> does multi-tasking only if the tasks are asynchronous in nature.
>>
>
> How would this work? Would we have to create a new synctask for each
> background function we want to execute? I think this has significant
> overhead, since each synctask requires its own stack, creates a frame that
> we don't really need in most cases, and causes context switches.
>

Yes, we will have to create a synctask. And yes, it does have the overhead of
its own stack, because the framework assumes the task will pause at some
point. I think when the synctask framework was written, the smallest thing it
was expected to execute was a fop over the network. It was mainly written to
do replace-brick using the 'pump' xlator, which is now deprecated. If we know
upfront that the task will never pause, there is absolutely no need to create
a new stack. In that case the thread just executes the function and moves on
to the next task.


>
> We could have hundreds or thousands of requests per second. They could
> even require more than one background task per request in some cases.
> I'm not sure synctasks are the right choice in this case.
>

For each request we need to create a new synctask. It will be placed in the
queue of tasks that are ready to execute. There will be 16 threads (in the
stressful scenario) waiting for new tasks; one of them will pick it up and
execute it.



>
> I think that a thread pool is more lightweight.
>

I think a small write-up of your thoughts on how it should be would be a
good start for us.

In my head, a thread pool is a set of threads waiting for incoming tasks.
Each thread picks up a new task and executes it; upon completion it moves on
to the next task that needs to be executed.

The synctask framework is also a thread pool waiting for incoming tasks. Each
thread picks up a task from the readyq and executes it. If the task has to
pause in the middle, the thread puts it in the wait-q and moves on to the
next one. If the task never pauses, the thread completes the task and moves
on to the next one.

So synctask is more complex than a thread pool because it assumes that tasks
will pause. I am wondering if we can 1) split the complexity out into a
thread pool, which is more lightweight, and add the synctask framework on top
of it, or alternatively 2) optimize the synctask framework to run synchronous
tasks without creating any stack, executing them on the thread's own stack.
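
To make the second option a bit more concrete, here is a minimal Python
sketch of the idea. It is not GlusterFS code: TaskPool, submit, encode and
fop are hypothetical names, a generator stands in for a task that may pause,
and a paused task is simply put back on the ready queue (there is no real
wait-q here). A plain function is run to completion on the worker's own
stack, with no per-task state created.

```python
import queue
import threading
import types


class TaskPool:
    """Toy worker pool (hypothetical, for illustration only)."""

    def __init__(self, workers=4):
        self.readyq = queue.Queue()
        self.threads = [threading.Thread(target=self._worker, daemon=True)
                        for _ in range(workers)]
        for t in self.threads:
            t.start()

    def submit(self, fn, *args):
        done = threading.Event()
        result = {}
        # gen is None until we learn that the task can pause
        self.readyq.put((fn, args, None, done, result))
        return done, result

    def _worker(self):
        while True:
            item = self.readyq.get()
            if item is None:            # shutdown sentinel
                break
            fn, args, gen, done, result = item
            if gen is None:
                out = fn(*args)
                if not isinstance(out, types.GeneratorType):
                    # Synchronous task: it already ran on this thread's stack.
                    result["value"] = out
                    done.set()
                    continue
                gen = out               # task that may pause
            try:
                next(gen)               # run until the task pauses
                self.readyq.put((fn, args, gen, done, result))
            except StopIteration as stop:
                result["value"] = stop.value
                done.set()

    def shutdown(self):
        for _ in self.threads:
            self.readyq.put(None)
        for t in self.threads:
            t.join()


def encode(data):                       # never pauses
    return data.upper()


def fop(data):                          # pauses once, e.g. waiting on the wire
    yield
    return data[::-1]


pool = TaskPool(workers=2)
d1, r1 = pool.submit(encode, "abc")
d2, r2 = pool.submit(fop, "abc")
d1.wait(timeout=5)
d2.wait(timeout=5)
pool.shutdown()
```

The point of the sketch is only that both kinds of task can share one set of
worker threads, and that the cheap path (encode) never pays for the machinery
needed by the pausing path (fop).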


>
> Xavi
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-28 Thread Xavier Hernandez

Hi Pranith,

On 28/06/16 08:08, Pranith Kumar Karampuri wrote:




On Tue, Jun 28, 2016 at 10:21 AM, Poornima Gurusiddaiah
<pguru...@redhat.com> wrote:

Regards,
Poornima



*From: *"Pranith Kumar Karampuri" <pkara...@redhat.com>
*To: *"Xavier Hernandez" <xhernan...@datalab.es>
*Cc: *"Gluster Devel" <gluster-devel@gluster.org>
*Sent: *Monday, June 27, 2016 5:48:24 PM
    *Subject: *Re: [Gluster-devel] performance issues Manoj found in
EC testing



On Mon, Jun 27, 2016 at 12:42 PM, Pranith Kumar Karampuri
<pkara...@redhat.com> wrote:



On Mon, Jun 27, 2016 at 11:52 AM, Xavier Hernandez
<xhernan...@datalab.es> wrote:

Hi Manoj,

I always enable client-io-threads option for disperse
volumes. It improves performance sensibly, most probably
because of the problem you have detected.

I don't see any other way to solve that problem.


I agree. Updated the bug with same info.


I think it would be a lot better to have a true thread
pool (and maybe an I/O thread pool shared by fuse,
client and server xlators) in libglusterfs instead of
the io-threads xlator. This would allow each xlator to
decide when and what should be parallelized in a more
intelligent way, since basing the decision solely on the
fop type seems too simplistic to me.

In the specific case of EC, there are a lot of
operations to perform for a single high level fop, and
not all of them require the same priority. Also some of
them could be executed in parallel instead of sequentially.


I think it is high time we actually schedule(for which
release) to get this in gluster. May be you should send out
a doc where we can work out details? I will be happy to
explore options to integrate io-threads, syncop/barrier with
this infra based on the design may be.


I was just thinking why we can't reuse synctask framework. It
already scales up/down based on the tasks. At max it uses 16
threads. Whatever we want to be executed in parallel we can
create a synctask around it and run it. Would that be good enough?

Yes, synctask framework can be preferred over io-threads, else it
would mean 16 synctask threads + 16(?) io-threads for one instance
of mount, this will blow out the gfapi clients if they have many
mounts from the same process. Also using synctask would mean code
changes in EC?


Yes, it will need some changes, but I don't think they are big changes. I
think the functions to decode/encode already exist. We just need to
move encoding/decoding into tasks and run them as synctasks.


I was also thinking about sleeping fops. Currently, when they are resumed,
they are processed in the same thread that was processing another fop.
This could add latency to fops or cause unnecessary delays in lock
management. If they could be scheduled to run on another thread,
these delays would be drastically reduced.


On the other hand, splitting the computation of EC encoding for a single
request across multiple threads is bad, because the current implementation
takes advantage of the CPU's internal cache, which is really fast. We should
compute all fragments of a single request in the same thread. Multiple
independent computations could be executed by different threads.
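
As a rough illustration of that granularity, here is a small Python sketch.
It is not the EC xlator's code: the XOR parity is only a toy stand-in for the
real encoding, and K and encode_request are hypothetical names. Each request
is one unit of work, so all of its fragments are computed by a single call on
a single thread, while independent requests are handed to different threads.
(Python threads will not actually run this in parallel on multiple cores; the
sketch only shows how the work is divided.)

```python
from concurrent.futures import ThreadPoolExecutor

K = 4  # data fragments per request (hypothetical value)


def encode_request(data: bytes):
    """Toy stand-in for EC encoding: K data fragments plus one XOR parity.

    The whole request is encoded inside this one call, so every fragment
    of the request is produced by the same thread.
    """
    size = -(-len(data) // K)                  # ceiling division
    padded = data.ljust(size * K, b"\0")
    frags = [padded[i * size:(i + 1) * size] for i in range(K)]
    parity = bytearray(size)
    for frag in frags:
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return frags + [bytes(parity)]


requests = [b"hello world!", b"gluster disperse", b"abc"]

# One task per request: independent requests may run on different threads,
# but a single request is never split across threads.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(encode_request, requests))
```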




Xavi,
  Long time back we chatted a bit about synctask code and you wanted
the scheduling to happen by kernel or something. Apart from that do you
see any other issues? At least if the tasks are synchronous i.e. nothing
goes out the wire, task scheduling = thread scheduling by kernel and it
works exactly like thread-pool you were referring to. It does
multi-tasking only if the tasks are asynchronous in nature.


How would this work? Would we have to create a new synctask for each
background function we want to execute? I think this has significant
overhead, since each synctask requires its own stack, creates a frame
that we don't really need in most cases, and causes context switches.


We could have hundreds or thousands of requests per second. They could
even require more than one background task per request in some
cases. I'm not sure synctasks are the right choice in this case.


I think that a thread pool is more lightweight.

Xavi






Xavi


On 25/06/16 19:42, Manoj Pillai wrote:


- Original Message -

From: "Pranith Kumar Karampuri"

Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-27 Thread Pranith Kumar Karampuri
On Tue, Jun 28, 2016 at 10:21 AM, Poornima Gurusiddaiah  wrote:

> Regards,
> Poornima
>
> --
>
> *From: *"Pranith Kumar Karampuri" 
> *To: *"Xavier Hernandez" 
> *Cc: *"Gluster Devel" 
> *Sent: *Monday, June 27, 2016 5:48:24 PM
> *Subject: *Re: [Gluster-devel] performance issues Manoj found in EC
> testing
>
>
>
> On Mon, Jun 27, 2016 at 12:42 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>>
>>
>> On Mon, Jun 27, 2016 at 11:52 AM, Xavier Hernandez wrote:
>>
>>> Hi Manoj,
>>>
>>> I always enable client-io-threads option for disperse volumes. It
>>> improves performance sensibly, most probably because of the problem you
>>> have detected.
>>>
>>> I don't see any other way to solve that problem.
>>>
>>
>> I agree. Updated the bug with same info.
>>
>>>
>>> I think it would be a lot better to have a true thread pool (and maybe
>>> an I/O thread pool shared by fuse, client and server xlators) in
>>> libglusterfs instead of the io-threads xlator. This would allow each xlator
>>> to decide when and what should be parallelized in a more intelligent way,
>>> since basing the decision solely on the fop type seems too simplistic to me.
>>>
>>> In the specific case of EC, there are a lot of operations to perform for
>>> a single high level fop, and not all of them require the same priority.
>>> Also some of them could be executed in parallel instead of sequentially.
>>>
>>
>> I think it is high time we actually schedule(for which release) to get
>> this in gluster. May be you should send out a doc where we can work out
>> details? I will be happy to explore options to integrate io-threads,
>> syncop/barrier with this infra based on the design may be.
>>
>
> I was just thinking why we can't reuse synctask framework. It already
> scales up/down based on the tasks. At max it uses 16 threads. Whatever we
> want to be executed in parallel we can create a synctask around it and run
> it. Would that be good enough?
>
> Yes, synctask framework can be preferred over io-threads, else it would
> mean 16 synctask threads + 16(?) io-threads for one instance of mount, this
> will blow out the gfapi clients if they have many mounts from the same
> process. Also using synctask would mean code changes in EC?
>

Yes, it will need some changes, but I don't think they are big changes. I
think the functions to decode/encode already exist. We just need to move
encoding/decoding into tasks and run them as synctasks.

Xavi,
  A long time back we chatted a bit about the synctask code, and you wanted
the scheduling to be done by the kernel or something. Apart from that, do you
see any other issues? At least if the tasks are synchronous, i.e. nothing goes
out on the wire, task scheduling = thread scheduling by the kernel, and it
works exactly like the thread pool you were referring to. It does
multi-tasking only if the tasks are asynchronous in nature.


>
>
>>> Xavi
>>>
>>>
>>> On 25/06/16 19:42, Manoj Pillai wrote:
>>>
>>>>
>>>> - Original Message -
>>>>
>>>>> From: "Pranith Kumar Karampuri" 
>>>>> To: "Xavier Hernandez" 
>>>>> Cc: "Manoj Pillai" , "Gluster Devel" <
>>>>> gluster-devel@gluster.org>
>>>>> Sent: Thursday, June 23, 2016 8:50:44 PM
>>>>> Subject: performance issues Manoj found in EC testing
>>>>>
>>>>> hi Xavi,
>>>>>   Meet Manoj from performance team Redhat. He has been testing
>>>>> EC
>>>>> performance in his stretch clusters. He found some interesting things
>>>>> we
>>>>> would like to share with you.
>>>>>
>>>>> 1) When we perform multiple streams of big file writes(12 parallel dds
>>>>> I
>>>>> think) he found one thread to be always hot (99%CPU always). He was
>>>>> asking
>>>>> me if fuse_reader thread does any extra processing in EC compared to
>>>>> replicate. Initially I thought it would just lock and epoll threads
>>>>> will
>>>>> perform the encoding but later realized that once we have the lock and
>>>>> version details, next writes on the file would be encoded in the same
>>>>> thread that comes to EC. write-behind could play a role and make the
>>>>> writes
>>>>> co

Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-27 Thread Poornima Gurusiddaiah
Regards, 
Poornima 

- Original Message -

> From: "Pranith Kumar Karampuri" 
> To: "Xavier Hernandez" 
> Cc: "Gluster Devel" 
> Sent: Monday, June 27, 2016 5:48:24 PM
> Subject: Re: [Gluster-devel] performance issues Manoj found in EC testing

> On Mon, Jun 27, 2016 at 12:42 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com > wrote:

> > On Mon, Jun 27, 2016 at 11:52 AM, Xavier Hernandez < xhernan...@datalab.es >
> > wrote:

> > > Hi Manoj,

> > > I always enable client-io-threads option for disperse volumes. It improves
> > > performance sensibly, most probably because of the problem you have
> > > detected.

> > > I don't see any other way to solve that problem.

> > I agree. Updated the bug with same info.

> > > I think it would be a lot better to have a true thread pool (and maybe an
> > > I/O thread pool shared by fuse, client and server xlators) in libglusterfs
> > > instead of the io-threads xlator. This would allow each xlator to decide
> > > when and what should be parallelized in a more intelligent way, since
> > > basing the decision solely on the fop type seems too simplistic to me.

> > > In the specific case of EC, there are a lot of operations to perform for a
> > > single high level fop, and not all of them require the same priority. Also
> > > some of them could be executed in parallel instead of sequentially.

> > I think it is high time we actually schedule(for which release) to get this
> > in gluster. May be you should send out a doc where we can work out details?
> > I will be happy to explore options to integrate io-threads, syncop/barrier
> > with this infra based on the design may be.

> I was just thinking why we can't reuse synctask framework. It already scales
> up/down based on the tasks. At max it uses 16 threads. Whatever we want to
> be executed in parallel we can create a synctask around it and run it. Would
> that be good enough?

Yes, the synctask framework would be preferable to io-threads; otherwise we
would have 16 synctask threads + 16(?) io-threads for each mount instance,
which would blow up the thread count of gfapi clients that have many mounts
in the same process. Also, would using synctask mean code changes in EC?

> > > Xavi

> > > On 25/06/16 19:42, Manoj Pillai wrote:

> > > > - Original Message -

> > > > > From: "Pranith Kumar Karampuri" < pkara...@redhat.com >
> > > > > To: "Xavier Hernandez" < xhernan...@datalab.es >
> > > > > Cc: "Manoj Pillai" < mpil...@redhat.com >, "Gluster Devel" <
> > > > > gluster-devel@gluster.org >
> > > > > Sent: Thursday, June 23, 2016 8:50:44 PM
> > > > > Subject: performance issues Manoj found in EC testing

> > > > > hi Xavi,
> > > > > Meet Manoj from performance team Redhat. He has been testing EC
> > > > > performance in his stretch clusters. He found some interesting things we
> > > > > would like to share with you.

> > > > > 1) When we perform multiple streams of big file writes(12 parallel dds I
> > > > > think) he found one thread to be always hot (99%CPU always). He was asking
> > > > > me if fuse_reader thread does any extra processing in EC compared to
> > > > > replicate. Initially I thought it would just lock and epoll threads will
> > > > > perform the encoding but later realized that once we have the lock and

Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-27 Thread Manoj Pillai

- Original Message -
> From: "Sankarshan Mukhopadhyay" 
> To: "Gluster Devel" 
> Sent: Monday, June 27, 2016 5:54:19 PM
> Subject: Re: [Gluster-devel] performance issues Manoj found in EC testing
> 
> On Mon, Jun 27, 2016 at 2:38 PM, Manoj Pillai  wrote:
> > Thanks, folks! As a quick update, throughput on a single client test jumped
> > from ~180 MB/s to 700+MB/s after enabling client-io-threads. Throughput is
> > now more in line with what is expected for this workload based on
> > back-of-the-envelope calculations.
> 
> Is it possible to provide additional detail about this exercise in
> terms of setup; tests executed; data sets generated?

Yes, in the bz. The "before" number is from here:
https://bugzilla.redhat.com/show_bug.cgi?id=1349953#c1

-- Manoj


> 
> --
> sankarshan mukhopadhyay
> <https://about.me/sankarshan.mukhopadhyay>


Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-27 Thread Sankarshan Mukhopadhyay
On Mon, Jun 27, 2016 at 2:38 PM, Manoj Pillai  wrote:
> Thanks, folks! As a quick update, throughput on a single client test jumped
> from ~180 MB/s to 700+MB/s after enabling client-io-threads. Throughput is
> now more in line with what is expected for this workload based on
> back-of-the-envelope calculations.

Is it possible to provide additional detail about this exercise in
terms of setup; tests executed; data sets generated?

-- 
sankarshan mukhopadhyay



Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-27 Thread Pranith Kumar Karampuri
On Mon, Jun 27, 2016 at 12:42 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> On Mon, Jun 27, 2016 at 11:52 AM, Xavier Hernandez 
> wrote:
>
>> Hi Manoj,
>>
>> I always enable client-io-threads option for disperse volumes. It
>> improves performance sensibly, most probably because of the problem you
>> have detected.
>>
>> I don't see any other way to solve that problem.
>>
>
> I agree. Updated the bug with same info.
>
>
>>
>> I think it would be a lot better to have a true thread pool (and maybe an
>> I/O thread pool shared by fuse, client and server xlators) in libglusterfs
>> instead of the io-threads xlator. This would allow each xlator to decide
>> when and what should be parallelized in a more intelligent way, since
>> basing the decision solely on the fop type seems too simplistic to me.
>>
>> In the specific case of EC, there are a lot of operations to perform for
>> a single high level fop, and not all of them require the same priority.
>> Also some of them could be executed in parallel instead of sequentially.
>>
>
> I think it is high time we actually schedule(for which release) to get
> this in gluster. May be you should send out a doc where we can work out
> details? I will be happy to explore options to integrate io-threads,
> syncop/barrier with this infra based on the design may be.
>

I was just wondering why we can't reuse the synctask framework. It already
scales up and down based on the number of tasks, and it uses at most 16
threads. Whatever we want executed in parallel, we can wrap in a synctask and
run. Would that be good enough?


>
>
>>
>> Xavi
>>
>>
>> On 25/06/16 19:42, Manoj Pillai wrote:
>>
>>>
>>> - Original Message -
>>>
>>>> From: "Pranith Kumar Karampuri"
>>>> To: "Xavier Hernandez"
>>>> Cc: "Manoj Pillai", "Gluster Devel" <gluster-devel@gluster.org>
>>>> Sent: Thursday, June 23, 2016 8:50:44 PM
>>>> Subject: performance issues Manoj found in EC testing
>>>>
>>>> hi Xavi,
>>>>   Meet Manoj from performance team Redhat. He has been testing EC
>>>> performance in his stretch clusters. He found some interesting things we
>>>> would like to share with you.
>>>>
>>>> 1) When we perform multiple streams of big file writes(12 parallel dds I
>>>> think) he found one thread to be always hot (99%CPU always). He was asking
>>>> me if fuse_reader thread does any extra processing in EC compared to
>>>> replicate. Initially I thought it would just lock and epoll threads will
>>>> perform the encoding but later realized that once we have the lock and
>>>> version details, next writes on the file would be encoded in the same
>>>> thread that comes to EC. write-behind could play a role and make the writes
>>>> come to EC in an epoll thread but we saw consistently there was just one
>>>> thread that is hot. Not multiple threads. We will be able to confirm this
>>>> in tomorrow's testing.
>>>>
>>>> 2) This is one more thing Raghavendra G found, that our current
>>>> implementation of epoll doesn't let other epoll threads pick messages from
>>>> a socket while one thread is processing one message from that socket. In
>>>> EC's case that can be encoding of the write/decoding read. This will not
>>>> let replies of operations on different files to be processed in parallel.
>>>> He thinks this can be fixed for 3.9.
>>>>
>>>> Manoj will be raising a bug to gather all his findings. I just wanted to
>>>> introduce him and let you know the interesting things he is finding before
>>>> you see the bug :-).
>>>> --
>>>> Pranith
>>>>
>>>
>>> Thanks, Pranith :).
>>>
>>> Here's the bug: https://bugzilla.redhat.com/show_bug.cgi?id=1349953
>>>
>>> Comparing EC and replica-2 runs, the hot thread is seen in both cases, so
>>> I have not opened this as an EC bug. But initial impression is that
>>> performance impact for EC is particularly bad (details in the bug).
>>>
>>> -- Manoj
>>>
>>>
>
>
> --
> Pranith
>



-- 
Pranith

Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-27 Thread Pranith Kumar Karampuri
On Mon, Jun 27, 2016 at 2:38 PM, Manoj Pillai  wrote:

>
>
> - Original Message -
> > From: "Raghavendra Gowdappa" 
> > To: "Pranith Kumar Karampuri" 
> > Cc: "Gluster Devel" 
> > Sent: Monday, June 27, 2016 12:48:49 PM
> > Subject: Re: [Gluster-devel] performance issues Manoj found in EC testing
> >
> >
> >
> > - Original Message -
> > > From: "Pranith Kumar Karampuri" 
> > > To: "Xavier Hernandez" 
> > > Cc: "Gluster Devel" 
> > > Sent: Monday, June 27, 2016 12:42:35 PM
> > > Subject: Re: [Gluster-devel] performance issues Manoj found in EC
> testing
> > >
> > >
> > >
> > > On Mon, Jun 27, 2016 at 11:52 AM, Xavier Hernandez <
> xhernan...@datalab.es
> > > >
> > > wrote:
> > >
> > >
> > > Hi Manoj,
> > >
> > > I always enable client-io-threads option for disperse volumes. It
> improves
> > > performance sensibly, most probably because of the problem you have
> > > detected.
> > >
> > > I don't see any other way to solve that problem.
> > >
> > > I agree. Updated the bug with same info.
> > >
> > >
> > >
> > > I think it would be a lot better to have a true thread pool (and maybe
> an
> > > I/O
> > > thread pool shared by fuse, client and server xlators) in libglusterfs
> > > instead of the io-threads xlator. This would allow each xlator to
> decide
> > > when and what should be parallelized in a more intelligent way, since
> > > basing
> > > the decision solely on the fop type seems too simplistic to me.
> > >
> > > In the specific case of EC, there are a lot of operations to perform
> for a
> > > single high level fop, and not all of them require the same priority.
> Also
> > > some of them could be executed in parallel instead of sequentially.
> > >
> > > I think it is high time we actually schedule(for which release) to get
> this
> > > in gluster. May be you should send out a doc where we can work out
> details?
> > > I will be happy to explore options to integrate io-threads,
> syncop/barrier
> > > with this infra based on the design may be.
> >
> > +1. I can volunteer too.
>
> Thanks, folks! As a quick update, throughput on a single client test jumped
> from ~180 MB/s to 700+MB/s after enabling client-io-threads. Throughput is
> now more in line with what is expected for this workload based on
> back-of-the-envelope calculations.
>
> Are there any reservations about recommending client-io-threads=on as
> "default" tuning, until the enhancement discussed above becomes reality?
>

The only thing I can think of is possible races that we may have to address
after enabling this option. So maybe we should let it bake on master for a
while with this as the default?


> -- Manoj
>
> >
> > >
> > >
> > >
> > > Xavi
> > >
> > >
> > > On 25/06/16 19:42, Manoj Pillai wrote:
> > >
> > >
> > >
> > > - Original Message -
> > >
> > >
> > > From: "Pranith Kumar Karampuri" < pkara...@redhat.com >
> > > To: "Xavier Hernandez" < xhernan...@datalab.es >
> > > Cc: "Manoj Pillai" < mpil...@redhat.com >, "Gluster Devel" <
> > > gluster-devel@gluster.org >
> > > Sent: Thursday, June 23, 2016 8:50:44 PM
> > > Subject: performance issues Manoj found in EC testing
> > >
> > > hi Xavi,
> > > Meet Manoj from performance team Redhat. He has been testing EC
> > > performance in his stretch clusters. He found some interesting things
> we
> > > would like to share with you.
> > >
> > > 1) When we perform multiple streams of big file writes(12 parallel dds
> I
> > > think) he found one thread to be always hot (99%CPU always). He was
> asking
> > > me if fuse_reader thread does any extra processing in EC compared to
> > > replicate. Initially I thought it would just lock and epoll threads
> will
> > > perform the encoding but later realized that once we have the lock and
> > > version details, next writes on the file would be encoded in the same
> > > thread that comes to EC. write-behind could play a role and make the
> writes
> > > come to EC in an epoll thread but we saw consistently there was just
> one
> > > thread that is hot. Not multiple threads. W

Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-27 Thread Manoj Pillai


- Original Message -
> From: "Raghavendra Gowdappa" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Gluster Devel" 
> Sent: Monday, June 27, 2016 12:48:49 PM
> Subject: Re: [Gluster-devel] performance issues Manoj found in EC testing
> 
> 
> 
> - Original Message -
> > From: "Pranith Kumar Karampuri" 
> > To: "Xavier Hernandez" 
> > Cc: "Gluster Devel" 
> > Sent: Monday, June 27, 2016 12:42:35 PM
> > Subject: Re: [Gluster-devel] performance issues Manoj found in EC testing
> > 
> > 
> > 
> > On Mon, Jun 27, 2016 at 11:52 AM, Xavier Hernandez < xhernan...@datalab.es
> > >
> > wrote:
> > 
> > 
> > Hi Manoj,
> > 
> > I always enable client-io-threads option for disperse volumes. It improves
> > performance sensibly, most probably because of the problem you have
> > detected.
> > 
> > I don't see any other way to solve that problem.
> > 
> > I agree. Updated the bug with same info.
> > 
> > 
> > 
> > I think it would be a lot better to have a true thread pool (and maybe an
> > I/O
> > thread pool shared by fuse, client and server xlators) in libglusterfs
> > instead of the io-threads xlator. This would allow each xlator to decide
> > when and what should be parallelized in a more intelligent way, since
> > basing
> > the decision solely on the fop type seems too simplistic to me.
> > 
> > In the specific case of EC, there are a lot of operations to perform for a
> > single high level fop, and not all of them require the same priority. Also
> > some of them could be executed in parallel instead of sequentially.
> > 
> > I think it is high time we actually schedule(for which release) to get this
> > in gluster. May be you should send out a doc where we can work out details?
> > I will be happy to explore options to integrate io-threads, syncop/barrier
> > with this infra based on the design may be.
> 
> +1. I can volunteer too.

Thanks, folks! As a quick update, throughput on a single-client test jumped
from ~180 MB/s to 700+ MB/s after enabling client-io-threads. Throughput is
now more in line with what is expected for this workload based on
back-of-the-envelope calculations.

Are there any reservations about recommending client-io-threads=on as
"default" tuning, until the enhancement discussed above becomes reality?

-- Manoj

> 
> > 
> > 
> > 
> > Xavi
> > 
> > 
> > On 25/06/16 19:42, Manoj Pillai wrote:
> > 
> > 
> > 
> > - Original Message -
> > 
> > 
> > From: "Pranith Kumar Karampuri" < pkara...@redhat.com >
> > To: "Xavier Hernandez" < xhernan...@datalab.es >
> > Cc: "Manoj Pillai" < mpil...@redhat.com >, "Gluster Devel" <
> > gluster-devel@gluster.org >
> > Sent: Thursday, June 23, 2016 8:50:44 PM
> > Subject: performance issues Manoj found in EC testing
> > 
> > hi Xavi,
> > Meet Manoj from performance team Redhat. He has been testing EC
> > performance in his stretch clusters. He found some interesting things we
> > would like to share with you.
> > 
> > 1) When we perform multiple streams of big file writes(12 parallel dds I
> > think) he found one thread to be always hot (99%CPU always). He was asking
> > me if fuse_reader thread does any extra processing in EC compared to
> > replicate. Initially I thought it would just lock and epoll threads will
> > perform the encoding but later realized that once we have the lock and
> > version details, next writes on the file would be encoded in the same
> > thread that comes to EC. write-behind could play a role and make the writes
> > come to EC in an epoll thread but we saw consistently there was just one
> > thread that is hot. Not multiple threads. We will be able to confirm this
> > in tomorrow's testing.
> > 
> > 2) This is one more thing Raghavendra G found, that our current
> > implementation of epoll doesn't let other epoll threads pick messages from
> > a socket while one thread is processing one message from that socket. In
> > EC's case that can be encoding of the write/decoding read. This will not
> > let replies of operations on different files to be processed in parallel.
> > He thinks this can be fixed for 3.9.
> > 
> > Manoj will be raising a bug to gather all his findings. I just wanted to
> > introduce him and let you know the interesting things he is finding before
> > you see the bug :-).
> > --
>

Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-27 Thread Raghavendra Gowdappa


- Original Message -
> From: "Pranith Kumar Karampuri" 
> To: "Xavier Hernandez" 
> Cc: "Gluster Devel" 
> Sent: Monday, June 27, 2016 12:42:35 PM
> Subject: Re: [Gluster-devel] performance issues Manoj found in EC testing
> 
> 
> 
> On Mon, Jun 27, 2016 at 11:52 AM, Xavier Hernandez < xhernan...@datalab.es >
> wrote:
> 
> 
> Hi Manoj,
> 
> I always enable client-io-threads option for disperse volumes. It improves
> performance sensibly, most probably because of the problem you have
> detected.
> 
> I don't see any other way to solve that problem.
> 
> I agree. Updated the bug with same info.
> 
> 
> 
> I think it would be a lot better to have a true thread pool (and maybe an I/O
> thread pool shared by fuse, client and server xlators) in libglusterfs
> instead of the io-threads xlator. This would allow each xlator to decide
> when and what should be parallelized in a more intelligent way, since basing
> the decision solely on the fop type seems too simplistic to me.
> 
> In the specific case of EC, there are a lot of operations to perform for a
> single high level fop, and not all of them require the same priority. Also
> some of them could be executed in parallel instead of sequentially.
> 
> I think it is high time we actually schedule this (i.e. decide on a release)
> and get it into gluster. Maybe you should send out a doc where we can work
> out the details? I will be happy to explore options to integrate io-threads
> and syncop/barrier with this infra, based on the design.

+1. I can volunteer too.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-27 Thread Pranith Kumar Karampuri
On Mon, Jun 27, 2016 at 11:52 AM, Xavier Hernandez 
wrote:

> Hi Manoj,
>
> I always enable the client-io-threads option for disperse volumes. It
> noticeably improves performance, most probably because of the problem you
> have detected.
>
> I don't see any other way to solve that problem.
>

I agree. I have updated the bug with the same info.


>
> I think it would be a lot better to have a true thread pool (and maybe an
> I/O thread pool shared by fuse, client and server xlators) in libglusterfs
> instead of the io-threads xlator. This would allow each xlator to decide
> when and what should be parallelized in a more intelligent way, since
> basing the decision solely on the fop type seems too simplistic to me.
>
> In the specific case of EC, there are a lot of operations to perform for a
> single high level fop, and not all of them require the same priority. Also
> some of them could be executed in parallel instead of sequentially.
>

I think it is high time we actually schedule this (i.e. decide on a release)
and get it into gluster. Maybe you should send out a doc where we can work
out the details? I will be happy to explore options to integrate io-threads
and syncop/barrier with this infra, based on the design.




-- 
Pranith

Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-26 Thread Xavier Hernandez

Hi Manoj,

I always enable the client-io-threads option for disperse volumes. It
noticeably improves performance, most probably because of the problem you
have detected.


I don't see any other way to solve that problem.

I think it would be a lot better to have a true thread pool (and maybe 
an I/O thread pool shared by fuse, client and server xlators) in 
libglusterfs instead of the io-threads xlator. This would allow each 
xlator to decide when and what should be parallelized in a more 
intelligent way, since basing the decision solely on the fop type seems 
too simplistic to me.


In the specific case of EC, there are a lot of operations to perform for 
a single high level fop, and not all of them require the same priority. 
Also some of them could be executed in parallel instead of sequentially.
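
The thread-pool idea above can be sketched roughly as follows. This is only
an illustration under stated assumptions: every name in it (struct pool,
pool_submit, encode_request, run_demo, ...) is hypothetical and none of it is
existing libglusterfs API. The caller, not a generic xlator keyed on fop
type, decides which piece of work to hand to the pool, and each task runs
start to finish on a single worker thread. XOR parity stands in for the real
encoding work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* All names are hypothetical; none of this is libglusterfs API. */
enum { NTHREADS = 4 };

struct task {
    void (*fn)(void *);
    void *arg;
    struct task *next;
};

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    struct task *head, *tail;     /* FIFO of caller-owned tasks */
    int stopping;
    pthread_t threads[NTHREADS];
};

static void *pool_worker(void *data)
{
    struct pool *p = data;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->head == NULL && !p->stopping)
            pthread_cond_wait(&p->wake, &p->lock);
        struct task *t = p->head;
        if (t != NULL && (p->head = t->next) == NULL)
            p->tail = NULL;       /* queue became empty */
        pthread_mutex_unlock(&p->lock);
        if (t == NULL)
            return NULL;          /* stopping and nothing left to run */
        t->fn(t->arg);            /* whole task runs on this one thread */
    }
}

static void pool_start(struct pool *p)
{
    for (int i = 0; i < NTHREADS; i++)
        if (pthread_create(&p->threads[i], NULL, pool_worker, p) != 0)
            abort();
}

/* The caller decides what is worth offloading and queues it here. */
static void pool_submit(struct pool *p, struct task *t)
{
    assert(t != NULL && t->fn != NULL);
    t->next = NULL;
    pthread_mutex_lock(&p->lock);
    if (p->tail != NULL)
        p->tail->next = t;
    else
        p->head = t;
    p->tail = t;
    pthread_cond_signal(&p->wake);
    pthread_mutex_unlock(&p->lock);
}

/* Let the workers drain the queue, then join them. */
static void pool_shutdown(struct pool *p)
{
    pthread_mutex_lock(&p->lock);
    p->stopping = 1;
    pthread_cond_broadcast(&p->wake);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(p->threads[i], NULL);
}

/* Stand-in for CPU-heavy work: XOR parity, not real EC encoding. */
static void encode_request(void *arg)
{
    unsigned char *buf = arg, x = 0;
    for (int i = 1; i < 64; i++)
        x ^= buf[i];
    buf[0] = x;                   /* result is stored in byte 0 */
}

/* Offload 12 independent requests; returns how many came back correct. */
int run_demo(void)
{
    struct pool p = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };
    unsigned char bufs[12][64] = { { 0 } };
    struct task tasks[12];
    int ok = 0;
    pool_start(&p);
    for (int i = 0; i < 12; i++) {
        bufs[i][1] = (unsigned char)(i + 1);
        tasks[i] = (struct task){ encode_request, bufs[i], NULL };
        pool_submit(&p, &tasks[i]);
    }
    pool_shutdown(&p);
    for (int i = 0; i < 12; i++)
        ok += (bufs[i][0] == i + 1);
    return ok;
}
```

The sketch uses a single FIFO to stay short. Since not all operations of a
fop need the same priority, one queue per priority level, with workers always
draining the higher one first, would be a natural extension of the same
structure.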


Xavi





Re: [Gluster-devel] performance issues Manoj found in EC testing

2016-06-25 Thread Manoj Pillai

- Original Message -
> From: "Pranith Kumar Karampuri" 
> To: "Xavier Hernandez" 
> Cc: "Manoj Pillai" , "Gluster Devel" 
> 
> Sent: Thursday, June 23, 2016 8:50:44 PM
> Subject: performance issues Manoj found in EC testing
> 
> hi Xavi,
>   Meet Manoj from the performance team at Red Hat. He has been testing EC
> performance in his stretch clusters. He found some interesting things we
> would like to share with you.
> 
> 1) When we perform multiple streams of big file writes (12 parallel dds, I
> think), he found one thread to be always hot (99% CPU, always). He was asking
> me if the fuse_reader thread does any extra processing in EC compared to
> replicate. Initially I thought it would just take the lock and the epoll
> threads would perform the encoding, but later realized that once we have the
> lock and version details, subsequent writes on the file are encoded in the
> same thread that comes to EC. write-behind could play a role and make the
> writes come to EC in an epoll thread, but we consistently saw just one hot
> thread, not multiple threads. We will be able to confirm this in tomorrow's
> testing.
> 
> 2) This is one more thing Raghavendra G found: our current implementation of
> epoll doesn't let other epoll threads pick up messages from a socket while
> one thread is processing a message from that socket. In EC's case that
> processing can be the encoding of a write or the decoding of a read. This
> prevents replies to operations on different files from being processed in
> parallel. He thinks this can be fixed for 3.9.
> 
> Manoj will be raising a bug to gather all his findings. I just wanted to
> introduce him and let you know the interesting things he is finding before
> you see the bug :-).
> --
> Pranith

Thanks, Pranith :).

Here's the bug: https://bugzilla.redhat.com/show_bug.cgi?id=1349953

Comparing EC and replica-2 runs, the hot thread is seen in both cases, so
I have not opened this as an EC bug. But my initial impression is that the
performance impact for EC is particularly bad (details in the bug).

-- Manoj


[Gluster-devel] performance issues Manoj found in EC testing

2016-06-23 Thread Pranith Kumar Karampuri
hi Xavi,
  Meet Manoj from the performance team at Red Hat. He has been testing EC
performance in his stretch clusters. He found some interesting things we
would like to share with you.

1) When we perform multiple streams of big file writes (12 parallel dds, I
think), he found one thread to be always hot (99% CPU, always). He was asking
me if the fuse_reader thread does any extra processing in EC compared to
replicate. Initially I thought it would just take the lock and the epoll
threads would perform the encoding, but later realized that once we have the
lock and version details, subsequent writes on the file are encoded in the
same thread that comes to EC. write-behind could play a role and make the
writes come to EC in an epoll thread, but we consistently saw just one hot
thread, not multiple threads. We will be able to confirm this in tomorrow's
testing.

2) This is one more thing Raghavendra G found: our current implementation of
epoll doesn't let other epoll threads pick up messages from a socket while
one thread is processing a message from that socket. In EC's case that
processing can be the encoding of a write or the decoding of a read. This
prevents replies to operations on different files from being processed in
parallel. He thinks this can be fixed for 3.9.
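
To make point 2 concrete, here is a small Linux-only sketch of one way such
serialization can arise. It is an assumption for illustration, not code from
Gluster's event layer: it supposes the descriptor is registered with
EPOLLONESHOT and shows that where the re-arm happens decides whether another
poller can see the next queued message while the first one is still being
processed. A pipe stands in for the socket, and visible_while_processing is
a hypothetical helper name.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Returns the number of ready events a second poller would see on the same
 * descriptor while the first poller is still processing its message.
 * Hypothetical helper; this is not code from Gluster's event layer.
 */
int visible_while_processing(int rearm_before_processing)
{
    int pfd[2];
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT };
    struct epoll_event out;
    char msg[4];

    int rc = pipe(pfd);
    assert(rc == 0);
    int ep = epoll_create1(0);
    assert(ep >= 0);
    ev.data.fd = pfd[0];
    rc = epoll_ctl(ep, EPOLL_CTL_ADD, pfd[0], &ev);
    assert(rc == 0);

    /* Two 4-byte "messages" are already queued on the descriptor. */
    ssize_t n = write(pfd[1], "msg1msg2", 8);
    assert(n == 8);

    /* The first poller is woken; ONESHOT now disables the descriptor. */
    rc = epoll_wait(ep, &out, 1, 1000);
    assert(rc == 1);
    n = read(pfd[0], msg, sizeof(msg));
    assert(n == 4);

    /* Re-arming here lets other pollers pick up the second message. */
    if (rearm_before_processing) {
        rc = epoll_ctl(ep, EPOLL_CTL_MOD, pfd[0], &ev);
        assert(rc == 0);
    }

    /* ... the expensive encode/decode of the first message runs here ... */
    int seen = epoll_wait(ep, &out, 1, 0);   /* what another thread would get */

    close(ep);
    close(pfd[0]);
    close(pfd[1]);
    (void)rc;
    (void)n;
    return seen;
}
```

With rearm_before_processing set to 0 the second epoll_wait reports nothing,
so the queued message waits behind the processing of the first one. With it
set to 1 the descriptor is reported again immediately, which is the behaviour
a fix would need so that replies for different files can be handled in
parallel.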

Manoj will be raising a bug to gather all his findings. I just wanted to
introduce him and let you know the interesting things he is finding before
you see the bug :-).
-- 
Pranith