Hi Kishore,

Thank you for the detailed response; it certainly helps me understand better.

In my case I have a home-grown 'container' deployed as a war file. The container allows install/uninstall of components (I use the term component to indicate a unit of functionality) at runtime. The war file runs on a vanilla Tomcat server (node) hosted on EC2. The current deployment model is a cluster of nodes, each running the container and housing 'all' the components, fronted by a load balancer, so it's a homogeneous install.

The container uses registries to track the artifacts used by the components, and I am looking to cluster these registries (this is where I hope to use Helix) so that I can start moving away from a homogeneous install. I plan for components to be installed on nodes based on their resource usage rather than installing everything on each node, i.e. to move to a heterogeneous install.

Given this, my approach is to build intelligence into the container (rather than the load balancers) to route requests to the nodes that have the required components. I plan to achieve this by having the registries carry a shared view of the cluster, from which routing can be built. I already have a component deployment model in place which tracks the install/uninstall of components through events on each node, and I think the Helix concepts map nicely to what I already have. My events would simply propagate to the cluster, and the individual registries would listen to these events to build the shared view.

I also have some other use cases where I manage asynchronous work on individual nodes, which I would like to route to other nodes based on capacity. In phase II I plan to move towards installing components on demand on nodes which don't carry them but have the capacity to handle them; this would be triggered when requests exceed capacity on the existing servers.

Let me know what you think. So far it appears that Helix is a good fit for what I am planning to use it for.
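The shared-view routing described above could look roughly like the following Python sketch. It is only an illustration of the idea, not Helix code and not the container's actual registry: the class name `SharedView`, the event kinds `"install"`/`"uninstall"`, and the node and component names are all hypothetical, and how events actually propagate between nodes is left out.

```python
from collections import defaultdict


class SharedView:
    """Hypothetical cluster-wide view of which nodes host which components."""

    def __init__(self):
        # component name -> set of node names that currently host it
        self._nodes_by_component = defaultdict(set)

    def on_event(self, kind, node, component):
        """Apply one install/uninstall event received from a node."""
        if kind == "install":
            self._nodes_by_component[component].add(node)
        elif kind == "uninstall":
            self._nodes_by_component[component].discard(node)
        else:
            raise ValueError(f"unknown event kind: {kind}")

    def route(self, component):
        """Return the nodes that can serve this component, sorted by name."""
        return sorted(self._nodes_by_component.get(component, ()))


# Replay a few events as each registry would see them.
view = SharedView()
view.on_event("install", "node-a", "billing")
view.on_event("install", "node-b", "billing")
view.on_event("install", "node-b", "search")
view.on_event("uninstall", "node-a", "billing")

print(view.route("billing"))
print(view.route("search"))
```

Every registry that replays the same events ends up with the same view, so any node can decide where to forward a request without asking the load balancer.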
Thanks,
Sandeep

From: kishore g <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, February 12, 2014 1:44 AM
To: "[email protected]" <[email protected]>
Subject: Re: Newbie Questions

Hi Sandeep,

1) Difference between Apache Helix and Norbert.

While both projects originated at LinkedIn, there are some fundamental differences between the two. I will try to explain the differences via a simple use case: a partitioned search index. In general you have 3 scenarios:

* Start-up --> Partition assignment: One needs to distribute the partitions among the nodes in the cluster.
    * In Norbert this process is manual. One needs to generate the mapping of partitions to nodes and push this configuration to each server in the system (this is generally done via a config file). At start-up, Norbert simply writes this configuration to ZooKeeper so that the clients can discover the partition-to-node mapping.
    * In Helix, the nodes simply join the cluster. Helix informs the nodes which partitions to host based on the objectives and constraints (a simple objective would be to distribute the partitions evenly among the nodes).
* Failure: When a node fails, you have multiple options. #1 Do nothing. #2 Re-assign the partitions hosted on the failed node to the remaining nodes. #3 Start a new node and assign the partitions to that new node.
    * Norbert simply informs you that a node left the group. You need to program your requirement yourself.
    * Helix is capable of doing #1, #2, or even #3. The #3 feature is a work in progress and is possible if the deployment system is flexible and allows processes to be started dynamically. If you are deploying on EC2, this should be possible.
* Scaling: If you add more nodes to handle the workload, you would want to redistribute some of the work to the new nodes.
    * Norbert: This is again manual; you would have to change the configuration and restart all the nodes.
    * Helix: Helix detects the new nodes and fires the appropriate transitions.
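To make the start-up and failure scenarios above concrete, here is a minimal Python sketch of the bookkeeping involved: an even round-robin assignment of partitions to nodes, and option #2 (moving a failed node's partitions to the survivors). This is not Helix's algorithm and uses no Helix API; the function names `assign_evenly` and `reassign_after_failure` and the node names are hypothetical.

```python
def assign_evenly(partitions, nodes):
    """Spread partitions round-robin over the nodes (sorted by name)."""
    ordered = sorted(nodes)
    if not ordered:
        raise ValueError("at least one node is required")
    assignment = {node: [] for node in ordered}
    for i, partition in enumerate(partitions):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


def reassign_after_failure(assignment, failed):
    """Option #2: hand the failed node's partitions to the least-loaded survivors."""
    survivors = {n: list(ps) for n, ps in assignment.items() if n != failed}
    if not survivors:
        raise ValueError("no surviving nodes")
    for partition in assignment.get(failed, []):
        # Pick the survivor with the fewest partitions; ties go to the first name.
        target = min(sorted(survivors), key=lambda n: len(survivors[n]))
        survivors[target].append(partition)
    return survivors


initial = assign_evenly(range(6), ["n1", "n2", "n3"])
after = reassign_after_failure(initial, "n2")

print(initial)
print(after)
```

With Norbert, logic of this kind would have to be written and triggered by the application; with Helix, the equivalent decisions are made for you and delivered to the nodes as transitions.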
As you already mentioned, Helix treats partitions, replicas, states, and transitions as first-class citizens. What this means is that you can specify not only how many partitions you have but also the number of replicas for each partition. For example, for redundancy you can say you need 3 replicas for each partition, and Helix will ensure that 3 replicas exist for each partition.

2) No, we haven't used Helix on Amazon, and I am not sure whether anyone has done this. There was another thread about this, and the poster didn't seem to think it would be a big problem. It would help if you gave us more information about your application and setup.

3) "Incubating" indicates the state of the project in Apache; it does not reflect the quality of the code or its production readiness. Apart from LinkedIn, others such as Box, Instagram, and JBoss (jBPM clustering) have used Helix in production.

Hope this helps.

Cheers,
Kishore G

On Wed, Feb 12, 2014 at 12:55 AM, Sandeep Nayak <[email protected]> wrote:

Hi,

I am a newbie to Apache Helix and am evaluating technologies to build clustered services. I have read through the documentation on the http://helix.incubator.apache.org/ site but did not find answers to the questions below, so I decided to ask them here.

(1) What is the difference between Apache Helix and LinkedIn Norbert? I believe Norbert does not support state-machine transitions like Helix does, but is there a document or summary of the differences that someone like me can use in an evaluation?

(2) Has Apache Helix been used on Amazon? Is there any documentation on how to get this working?

(3) The website has 0.6.2-incubating-stable and 0.7.0-incubating-alpha. Does "incubating" indicate the state of the project in Apache, or is it indicative of the production readiness of the library? I imagine the former, because prior to coming to Apache the library was used at LinkedIn. Am I correct?

Thanks in advance,
Sandeep
