[ https://issues.apache.org/jira/browse/S4-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549777#comment-13549777 ]
Matthieu Morel commented on S4-110:
-----------------------------------
There is a patch in the S4-110 branch, currently at commit [9661f27].
In my opinion, it would be better to have both Helix and legacy cluster
management available for 0.6.
Here is a list of notes about the integration:
* InstanceConfig: we could use a generic interface compatible with both Helix
and the legacy implementation
* HelixBasedCommModule: use a finer-grained comm module; we should decompose the
coarse-grained DefaultCommModule
* TaskSetup: isHelixEnabled should be obtained through DI (see the sketch after
this list)
* Cluster: OK, more general
* ClusterFromZk: correctly implement the new methods from the Cluster interface
* Main: some defaults point to Helix
* Server: could use Helix optionally; that code could belong to another class
(Main?), or the Helix code could be injected
* remote senders / remote managers: there seem to be some ideas about refactoring
the current mechanism into a state machine, but it is not implemented yet. We
absolutely need this part, since it is used for injecting events (from an app
adapter).
* UDP: not a priority, since we should first refactor it to use Netty there as well
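
A minimal Guice sketch of that idea: one shared Cluster abstraction, a binding to
either the Helix-based or the legacy ZK-based implementation, and the helix toggle
exposed through DI rather than a hard-coded check in TaskSetup. Class names are
borrowed from the notes above, but the code is a standalone illustration, not the
actual S4 types or wiring.

{code:java}
import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.name.Names;

public class ClusterManagementSketch {

    // stand-in for a generic cluster interface shared by Helix and legacy
    interface Cluster {
        String describe();
    }

    // legacy, ZooKeeper-based implementation (placeholder)
    static class ClusterFromZk implements Cluster {
        public String describe() { return "legacy ZK-managed cluster"; }
    }

    // Helix-managed implementation (hypothetical name)
    static class HelixBasedCluster implements Cluster {
        public String describe() { return "Helix-managed cluster"; }
    }

    static class ClusterModule extends AbstractModule {
        private final boolean helixEnabled;

        ClusterModule(boolean helixEnabled) {
            this.helixEnabled = helixEnabled;
        }

        @Override
        protected void configure() {
            // expose the flag so e.g. TaskSetup can @Inject @Named("s4.helix.enabled") boolean
            bindConstant().annotatedWith(Names.named("s4.helix.enabled")).to(helixEnabled);
            if (helixEnabled) {
                bind(Cluster.class).to(HelixBasedCluster.class);
            } else {
                bind(Cluster.class).to(ClusterFromZk.class);
            }
        }
    }

    public static void main(String[] args) {
        Injector injector = Guice.createInjector(new ClusterModule(true));
        System.out.println(injector.getInstance(Cluster.class).describe());
    }
}
{code}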
> Enhance cluster management features in S4
> -----------------------------------------
>
> Key: S4-110
> URL: https://issues.apache.org/jira/browse/S4-110
> Project: Apache S4
> Issue Type: New Feature
> Reporter: kishore gopalakrishna
> Assignee: kishore gopalakrishna
> Fix For: 0.6
>
>
> In S4 the number of partitions is fixed for all streams and depends on
> the number of nodes in the cluster. Adding new nodes to an S4 cluster changes
> the number of partitions, which results in a lot of data movement. For
> example, if there are 4 nodes and you add another node, nearly all keys
> will be remapped, which causes huge data movement, whereas ideally only 20%
> of the data should move.
> By using Helix, every stream can be partitioned differently and independently
> of the number of nodes. Helix distributes the partitions evenly among the
> nodes. When new nodes are added, partitions can be migrated to the new nodes
> without changing the number of partitions, which minimizes the data movement.
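>
> A standalone sketch (not S4 or Helix code) of that arithmetic, assuming a uniform
> key distribution: with keys assigned directly via hash modulo the node count,
> growing from 4 to 5 nodes remaps about 80% of the keys, whereas with a fixed set
> of partitions and a minimal reassignment of partitions to nodes, only about 20%
> of the keys move.
>
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> public class RemappingSketch {
>     public static void main(String[] args) {
>         int keys = 100000;
>
>         // (a) key -> node via hash % nodeCount (the key index k stands in for
>         // its hash): compare the owner for 4 vs 5 nodes
>         long movedDirect = 0;
>         for (int k = 0; k < keys; k++) {
>             if (k % 4 != k % 5) movedDirect++;
>         }
>
>         // (b) key -> one of 20 fixed partitions; only the partition-to-node
>         // table changes. A minimal rebalance hands one partition per old node
>         // to the new node, so 4 of the 20 partitions (20% of keys) move.
>         int partitions = 20;
>         Map<Integer, Integer> before = new HashMap<Integer, Integer>();
>         Map<Integer, Integer> after = new HashMap<Integer, Integer>();
>         for (int p = 0; p < partitions; p++) {
>             before.put(p, p % 4);
>             after.put(p, (p % 5 == 4) ? 4 : p % 4);
>         }
>         long movedFixed = 0;
>         for (int k = 0; k < keys; k++) {
>             int p = k % partitions;
>             if (!before.get(p).equals(after.get(p))) movedFixed++;
>         }
>
>         System.out.println("moved with modulo on node count: " + (100 * movedDirect / keys) + "%");
>         System.out.println("moved with fixed partitions:     " + (100 * movedFixed / keys) + "%");
>     }
> }
> {code}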
>
> S4 handles failures by having standby nodes that are idle most of the
> time and become active when a node fails. Even though this works, it is not
> ideal in terms of efficient hardware usage, since the standby nodes are idle
> most of the time. It also increases the failover time, since the PE state
> has to be transferred to a single node.
> Helix allows S4 to have Active and Standby roles at the partition level, so that
> all nodes can be active, with some partitions Active and some in Standby
> mode on each node. When a node fails, the partitions that were Active on that
> node are evenly redistributed among the remaining nodes. This provides automatic
> load balancing and also improves failover time, since the PE state can be
> transferred to multiple nodes in parallel.
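>
> A generic sketch (no Helix or S4 dependency) of that failover behavior: each node
> is Active for some partitions and Standby for others, with Standby replicas
> staggered so that when one node fails, its Active partitions are promoted on
> several different surviving nodes at once. The placement scheme here is purely
> illustrative.
>
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Map;
> import java.util.TreeMap;
>
> public class PartitionFailoverSketch {
>     public static void main(String[] args) {
>         String[] nodes = { "node1", "node2", "node3", "node4" };
>         int partitions = 12;
>
>         // staggered placement: Active = p mod N, Standby on a different node
>         // for each of a node's Active partitions, so a failure fans out
>         String[] active = new String[partitions];
>         String[] standby = new String[partitions];
>         for (int p = 0; p < partitions; p++) {
>             int a = p % nodes.length;
>             int s = (a + 1 + p / nodes.length) % nodes.length;
>             active[p] = nodes[a];
>             standby[p] = nodes[s];
>         }
>
>         // node2 fails: each partition it was Active for is promoted on its Standby
>         String failed = "node2";
>         Map<String, List<Integer>> takeovers = new TreeMap<String, List<Integer>>();
>         for (int p = 0; p < partitions; p++) {
>             if (active[p].equals(failed)) {
>                 if (!takeovers.containsKey(standby[p])) {
>                     takeovers.put(standby[p], new ArrayList<Integer>());
>                 }
>                 takeovers.get(standby[p]).add(p);
>             }
>         }
>
>         // the failed node's load lands on several survivors, each of which
>         // can rebuild the PE state of its promoted partitions in parallel
>         for (Map.Entry<String, List<Integer>> e : takeovers.entrySet()) {
>             System.out.println(e.getKey() + " becomes Active for partitions " + e.getValue());
>         }
>     }
> }
> {code}
>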
> I have a prototype implementation here:
> https://github.com/kishoreg/incubator-s4
> Instructions to build it and try it out are in the README.