Hi Kishore,
Thanks for creating the JIRA. I will try to respond to this mail here,
but please let me know if you would like to continue further discussion
on the issue in the JIRA going forward.
> The reasoning behind having a consistent naming scheme is to provide a
> consistent mechanism of assigning partitions to nodes even after
> restarts. This is important for stateful systems where we don't want to
> move the data
I see the need for stability in naming instances to avoid a complete
reshuffle on cluster restart. However, IMO this is a consequence of
Helix's design of having ZooKeeper be the single source of truth when
the cluster is not running.
Let's say Helix had an alternate approach: while the cluster is
running, ZooKeeper is used as the source of truth regarding the
locations of resource partitions. When the cluster starts up, however,
ZK starts with a clean slate that is incrementally populated as
instances join the cluster, based on the partitions each instance
reports during the "join" process. Beyond that point, Helix continues
doing what it does today.
With this approach, instance names matter only while the cluster is
running and have no stability requirements across restarts. However,
this is a huge change for Helix, and I am sure you have probably
thought about this as a possible direction; I would like to hear your
thoughts on this topic.
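To make the alternate approach concrete, here is a minimal Java sketch of the join-time reporting I have in mind. ClusterRegistry, onInstanceJoin, and ownerOf are all hypothetical names standing in for the ZK-backed state; none of this is a real Helix API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: ZK starts empty on cluster startup and is
// populated from the partitions each instance reports when it joins.
class ClusterRegistry {
    // partition -> owning instance, rebuilt from scratch on each cluster start
    private final Map<String, String> partitionOwners = new HashMap<>();

    // Called during the "join" process; the instance reports the
    // partitions it holds locally and the registry records them.
    void onInstanceJoin(String instanceId, List<String> localPartitions) {
        for (String p : localPartitions) {
            // first reporter wins; later claims on the same partition are ignored
            partitionOwners.putIfAbsent(p, instanceId);
        }
    }

    String ownerOf(String partition) {
        return partitionOwners.get(partition);
    }
}
```

Since data stays on the instance that reports it, a full-cluster restart rebuilds the same mapping without moving any partitions, whatever names the instances happen to come back with.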
> on restarts. Another (not really technical but more practical) reason
> is to avoid rogue instances connecting to the cluster with random ids
> due to code bugs or misconfiguration.
I completely agree with the need to handle the rogue/misconfigured
instances case.
> This requirement has come up multiple times at LinkedIn and on other
> threads. Would a feature like auto-create instance on join and delete
> on leave be helpful? We can have this flag set at the cluster level
> when the cluster is created, so we can throw an exception if the flag
> is set to false and the node is not already created.
While the above feature would be great for adding new instances with
little configuration (and for zero-configuration while testing), there
still needs to be a way to handle a loaded cluster restart without leading to
a massive reshuffle.
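To illustrate the flag you describe, here is a rough Java sketch of a cluster-level auto-join policy. JoinPolicy, register, and join are hypothetical names, not actual Helix configuration or APIs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a cluster-level auto-join flag: when the flag
// is off, only pre-registered instances may connect, which keeps
// rogue/misconfigured instances out of the cluster.
class JoinPolicy {
    private final boolean autoCreateOnJoin;            // set when the cluster is created
    private final Set<String> registeredInstances = new HashSet<>();

    JoinPolicy(boolean autoCreateOnJoin) {
        this.autoCreateOnJoin = autoCreateOnJoin;
    }

    // Pre-register an instance explicitly (the current, configured path).
    void register(String instanceId) {
        registeredInstances.add(instanceId);
    }

    // Throws if the flag is false and the instance was never registered.
    void join(String instanceId) {
        if (!autoCreateOnJoin && !registeredInstances.contains(instanceId)) {
            throw new IllegalStateException(
                "Unknown instance " + instanceId + " and auto-join is disabled");
        }
        registeredInstances.add(instanceId);
    }
}
```

The flag gives zero-configuration joins for testing while still letting a production cluster lock the membership down.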
Thanks,
Vinayak