Re: [Gluster-devel] Implementing multiplexing for self heal client.

2019-01-07 Thread RAFI KC
I have completed the patches and pushed them for review. Please feel free to
raise your review concerns/suggestions.



https://review.gluster.org/#/c/glusterfs/+/21868 



https://review.gluster.org/#/c/glusterfs/+/21907 



https://review.gluster.org/#/c/glusterfs/+/21960 



https://review.gluster.org/#/c/glusterfs/+/21989/ 




Regards

Rafi KC


On 12/24/18 3:58 PM, RAFI KC wrote:


On 12/21/18 6:56 PM, Sankarshan Mukhopadhyay wrote:

On Fri, Dec 21, 2018 at 6:30 PM RAFI KC  wrote:

Hi All,

What is the problem?
As of now the self-heal client runs as one daemon per node, which means
that even if there are multiple volumes, there will only be one self-heal
daemon. So for each configuration change in the cluster to take effect,
the self-heal daemon has to be reconfigured, but it does not have the
ability to reconfigure dynamically. This means that when you have a lot
of volumes in the cluster, every management operation that involves
configuration changes, like volume start/stop, add/remove brick etc.,
will result in a self-heal daemon restart. If such operations are
executed often, this not only slows down self-heal for a volume, it also
grows the self-heal logs substantially.

What is the number of volumes you mean when you write "lot of
volumes"? 1000 volumes, more?


Yes, more than 1000 volumes. It also depends on how often you execute
glusterd management operations (mentioned above). Each time the self-heal
daemon is restarted, it prints the entire graph, and these graph traces
contribute the majority of the log's size.







How to fix it?

We are planning to follow a procedure similar to dynamically
attaching/detaching graphs, as is done for brick multiplexing. The
detailed steps are as below:




1) The first step is to make shd a per-volume daemon, so that volfiles
are generated/reconfigured on a per-volume basis.

    1.1) This will make it easy to attach the volfiles to an existing
shd daemon.

    1.2) This will help in sending notifications to the shd daemon, as
each volinfo keeps the daemon object.

    1.3) Reconfiguring a particular subvolume is easier, as we can check
the topology better.

    1.4) With this change the volfiles will be moved to the
workdir/vols/ directory.

2) Writing new rpc requests like attach/detach_client_graph to support
client attach/detach (a minimal sketch follows the list below).

    2.1) Functions like graph reconfigure and mgmt_getspec_cbk also
have to be modified.

3) Safely detaching a subvolume when there are pending frames to unwind
(also sketched below).

    3.1) We can mark the client disconnected and make all the pending
frames unwind with ENOTCONN.

    3.2) Alternatively, we can wait for all the i/o to unwind until the
new updated subvol attaches.

4) Handle scenarios like glusterd restart, node reboot, etc.
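
To make the attach/detach idea in step 2 concrete, here is a minimal
standalone C sketch. The structure and function names (shd_graph_t,
shd_graph_attach, shd_graph_detach) and the volfile paths are
illustrative assumptions only, not the actual glusterd/shd code:

/*
 * Hypothetical sketch of per-volume graph attach/detach inside a single
 * self-heal daemon process. All names here are illustrative, not the
 * real GlusterFS APIs.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct shd_graph {
    char volname[64];            /* volume this client graph heals     */
    char volfile[256];           /* e.g. <workdir>/vols/<vol>-shd.vol   */
    struct shd_graph *next;
} shd_graph_t;

static shd_graph_t *graphs;      /* graphs multiplexed in this process  */

/* Attach a volume's client graph; idempotent if already attached. */
static int shd_graph_attach(const char *volname, const char *volfile)
{
    shd_graph_t *g;

    for (g = graphs; g; g = g->next)
        if (strcmp(g->volname, volname) == 0)
            return 0;                        /* already attached */

    g = calloc(1, sizeof(*g));
    if (!g)
        return -1;
    snprintf(g->volname, sizeof(g->volname), "%s", volname);
    snprintf(g->volfile, sizeof(g->volfile), "%s", volfile);
    g->next = graphs;
    graphs = g;
    printf("attached graph for %s (%s)\n", volname, volfile);
    return 0;
}

/* Detach a volume's graph, e.g. on volume stop, without a restart. */
static int shd_graph_detach(const char *volname)
{
    shd_graph_t **pp = &graphs, *g;

    while ((g = *pp)) {
        if (strcmp(g->volname, volname) == 0) {
            *pp = g->next;
            printf("detached graph for %s\n", volname);
            free(g);
            return 0;
        }
        pp = &g->next;
    }
    return -1;                               /* not attached */
}

int main(void)
{
    /* volume start/stop becomes attach/detach, not a daemon restart */
    shd_graph_attach("vol1", "/var/lib/glusterd/vols/vol1/vol1-shd.vol");
    shd_graph_attach("vol2", "/var/lib/glusterd/vols/vol2/vol2-shd.vol");
    shd_graph_detach("vol1");
    return 0;
}

The point is only that volume start/stop maps to attach/detach calls on
a running process instead of a full daemon restart.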

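For step 3, option 3.1, the standalone sketch below shows the idea of
marking the client disconnected and unwinding every pending frame with
ENOTCONN before the graph goes away. The types and function names
(pending_frame_t, shd_client_detach) are hypothetical, not the real
GlusterFS call-frame machinery:

/*
 * Hypothetical sketch of option 3.1: on detach, mark the client
 * disconnected and unwind all pending frames with ENOTCONN.
 */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct pending_frame {
    int id;                            /* stands in for a call frame */
    struct pending_frame *next;
} pending_frame_t;

typedef struct shd_client {
    int connected;                     /* cleared before detaching   */
    pending_frame_t *pending;          /* frames not yet unwound     */
} shd_client_t;

/* Unwind one frame back to the caller with the given error. */
static void frame_unwind(pending_frame_t *f, int op_errno)
{
    printf("frame %d unwound with errno %d%s\n", f->id, op_errno,
           op_errno == ENOTCONN ? " (ENOTCONN)" : "");
}

/* Quiesce a client before its subvolume graph is detached. */
static void shd_client_detach(shd_client_t *c)
{
    c->connected = 0;                  /* 3.1: mark disconnected     */
    while (c->pending) {               /* fail pending i/o instead   */
        pending_frame_t *f = c->pending;
        c->pending = f->next;
        frame_unwind(f, ENOTCONN);
        free(f);
    }
}

int main(void)
{
    shd_client_t c = { .connected = 1, .pending = NULL };

    for (int i = 1; i <= 3; i++) {     /* queue a few fake frames    */
        pending_frame_t *f = calloc(1, sizeof(*f));
        if (!f)
            return 1;
        f->id = i;
        f->next = c.pending;
        c.pending = f;
    }
    shd_client_detach(&c);
    return 0;
}

Option 3.2 would instead hold these frames and let them complete once
the updated subvolume graph is attached.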


At the moment we are not planning to limit the number of heal subvolumes
per process, because with the current approach heal for every volume is
already done from a single process, and we have not heard any major
complaints about this.

Is the plan to not ever limit, or to have a throttle set to a default
high(er) value? How would system resources be impacted if the proposed
design is implemented?


The plan is to implement it in a way that can support more than one
multiplexed self-heal daemon. The throttling function as of now returns
the same process to multiplex into, but it can easily be modified to
create a new process.
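
To illustrate what that throttling hook could look like, here is a
standalone, hypothetical sketch (shd_pick_process and max_per_proc are
made-up names, not the actual glusterd code). Today it always returns
the single existing process; the commented branch is where a
per-process limit could trigger creating a new one:

/*
 * Hypothetical throttling hook: decide which shd process a new
 * volume's graph should be multiplexed into.
 */
#include <stdio.h>

typedef struct shd_proc {
    int pid;            /* stand-in for the daemon process      */
    int graph_count;    /* graphs already multiplexed into it   */
} shd_proc_t;

static shd_proc_t the_only_proc = { .pid = 1234, .graph_count = 0 };

/* Return the process to multiplex into; max_per_proc <= 0 = no limit. */
static shd_proc_t *shd_pick_process(int max_per_proc)
{
    if (max_per_proc > 0 && the_only_proc.graph_count >= max_per_proc) {
        /* A future version could spawn a fresh shd process here. */
        printf("limit %d reached, would create a new process\n",
               max_per_proc);
    }
    the_only_proc.graph_count++;
    return &the_only_proc;
}

int main(void)
{
    for (int vol = 1; vol <= 3; vol++)
        printf("volume %d -> shd pid %d\n", vol,
               shd_pick_process(0)->pid);   /* 0 = no limit today */
    return 0;
}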


This multiplexing logic won't utilize any more resources than it
currently does.



Rafi KC



___
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel