[ https://issues.apache.org/jira/browse/GOBBLIN-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuai Yu updated GOBBLIN-480:
----------------------------
    Description: 
Today GobblinClusterManager uses a single Helix cluster for both job 
distribution and cluster manager HA. This all-in-one mode cannot work with the 
Helix super controller, because GobblinClusterManager creates its own 
dedicated controller for HA handling, which is internal to the Gobblin 
framework. This architecture works, but we have gradually found it hard to 
monitor Helix behavior and debug Helix-related issues because Helix task 
framework metrics are missing. Those metrics come for free, but only when 
using a dedicated controller under the Helix super controller's supervision.

To allow the migration, we separate the existing cluster into two clusters:

1. Our existing cluster remains the same, but is called the "job distribution 
cluster" in separation mode. In unit test or local deployment mode, we create 
a dedicated controller for this cluster. In production mode, we can assume 
Helix will provide a dedicated controller for us.

2. A new cluster is created, called the 'manager cluster', which is 
responsible for cluster manager leadership changes. It provides the 
leadership change callback just as the all-in-one mode did.

The new 'two cluster mode' can be turned on or off through user configuration. 
Similarly, the user can configure whether a controller for the job 
distribution cluster should be created.
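
A minimal sketch of how the two switches described above might combine. 
It is not the actual patch. The property keys, the "-manager" name suffix, 
and the ClusterModeSketch and Plan names are all hypothetical and chosen 
only for illustration, and no Helix API is called.

```java
import java.util.Properties;

public class ClusterModeSketch {
    // Hypothetical keys, not the real Gobblin configuration names.
    static final String TWO_CLUSTER_MODE_KEY = "example.twoClusterMode.enabled";
    static final String JOB_CONTROLLER_KEY = "example.jobClusterController.enabled";

    /** Which clusters to join and whether to start a job-cluster controller. */
    static final class Plan {
        final String jobCluster;
        final String managerCluster;
        final boolean createJobClusterController;

        Plan(String jobCluster, String managerCluster, boolean createJobClusterController) {
            this.jobCluster = jobCluster;
            this.managerCluster = managerCluster;
            this.createJobClusterController = createJobClusterController;
        }
    }

    static Plan plan(Properties props, String clusterName) {
        boolean twoClusters =
            Boolean.parseBoolean(props.getProperty(TWO_CLUSTER_MODE_KEY, "false"));
        if (!twoClusters) {
            // All-in-one mode: one cluster handles job distribution and HA.
            // The manager's own internal controller serves it, so no extra
            // job-cluster controller is planned here (an assumption of this sketch).
            return new Plan(clusterName, clusterName, false);
        }
        // Two cluster mode: the existing cluster keeps its name and becomes the
        // job distribution cluster; a separate manager cluster handles leadership.
        // The "-manager" suffix is a made-up naming scheme.
        boolean createController =
            Boolean.parseBoolean(props.getProperty(JOB_CONTROLLER_KEY, "false"));
        return new Plan(clusterName, clusterName + "-manager", createController);
    }
}
```

In this sketch a unit test or local deployment would set both keys to true, 
while a production deployment would enable only the first and rely on Helix 
to supply the controller for the job distribution cluster.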

  was:
Today GobblinClusterManager leverages single Helix cluster responsible for both 
job distribution and cluster manager HA. This all-in-one mode cannot works with 
Helix super controller, because GobblinClusterManager will create its own 
dedicated controller for HA handling, which is internal to Gobblin framework. 
This architect works fine but gradually we find it's hard to monitor Helix 
behavior and debug Helix related issues due to the lack of Helix task framework 
metrics, which is enabled for free, but only available when using a dedicated 
controllers under Helix super controller's supervision.

To allow the migration, we separated existing cluster into two clusters:

1. Our existing cluster will remain the same, called "job distribution 
cluster". In unit test or local deployment mode, we will create a dedicated 
controller for this cluster. In production mode, we assume Helix will provide 
this dedicated controller for us.

2. A new cluster will be created, called 'manager cluster', which is 
responsible for cluster manager leadership change. This will leadership change 
callback just like we did earlier in all-in-one mode.

Two cluster mode can be turned on/off by user configuration. Similarly to 
whether a controller for job distribution should be created.


> Allow job distribution cluster to be separated from cluster manager cluster
> ---------------------------------------------------------------------------
>
>                 Key: GOBBLIN-480
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-480
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Kuai Yu
>            Assignee: Kuai Yu
>            Priority: Major


