Re: Newbie Questions

kishore g Wed, 12 Feb 2014 09:47:39 -0800

Hi Sandeep,

Deployment: It makes sense at a high level. If I understand correctly, you
will generate the component to node mapping based on the resource usage and
I am assuming you would want to change this mapping dynamically as new
containers come up or existing containers fail or load changes. This fits
well in the Helix architecture. How do you plan to come up with the
algorithm to do the mapping between components and containers? We have
multiple options here, but I would like to get more details before I give
all the possible solutions. We are available on the Freenode IRC channel
#apachehelix and can discuss the details there any time.


Regarding asynchronous work, this is definitely a great fit for Helix. We
are working on simplifying/enhancing the existing task rebalancer to
support task management as a first class citizen in Helix. It would be
great to get your feedback on that.

thanks,
Kishore G



On Wed, Feb 12, 2014 at 2:29 AM, Sandeep Nayak <[email protected]> wrote:

> Hi Kishore,
>
> Thank you for the detailed response, it certainly helps me understand
> better.
>
> For my case I have a home-grown 'container' deployed as a war file. The
> container allows install/uninstall of components (I use the term component
> to indicate a unit of functionality) at runtime. The war file runs on a
> vanilla Tomcat server (node) hosted on EC2. The current deployment model
> involves having a cluster of nodes each running the container and housing
> 'all' the components, front-ended by a load balancer, so it's a
> homogeneous install.
>
> The container leverages registries to track artifacts used by the
> components and I am looking to cluster these registries (this is where I
> hope to use Helix) so that I can start moving away from a homogeneous
> install. I plan for components to be installed on nodes based on their
> resource usage rather than installing everything on each node i.e. move to
> a heterogeneous install.
>
> So considering this, my approach is to build intelligence in the container
> (rather than the load-balancers) to route requests to nodes which have
> those components. The way I plan to achieve this is by having the
> registries carry a shared view of the cluster. It allows the ability to
> build routing from this shared view. I already have a component deployment
> model in place which tracks the install/uninstall of components through
> events on each node and I think the Helix concepts map nicely to what I
> already have. My events would simply propagate to the cluster and I can
> get individual registries to listen to these events to build a shared
> view.
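
As a rough illustration of the shared-view routing described above, here is a minimal Python sketch. It does not use any Helix API; the `SharedView` class, the event names "install"/"uninstall", and the round-robin choice of node are all assumptions made for the example.

```python
from collections import defaultdict


class SharedView:
    """Hypothetical cluster-wide view of which nodes host which components."""

    def __init__(self):
        self._hosts = defaultdict(set)    # component -> set of node names
        self._cursor = defaultdict(int)   # component -> round-robin counter

    def on_event(self, kind, component, node):
        """Apply an install/uninstall event propagated from some node."""
        if kind == "install":
            self._hosts[component].add(node)
        elif kind == "uninstall":
            self._hosts[component].discard(node)
        else:
            raise ValueError(f"unknown event kind: {kind}")

    def route(self, component):
        """Pick a node hosting the component, or None if nobody hosts it."""
        nodes = sorted(self._hosts.get(component, ()))
        if not nodes:
            return None
        index = self._cursor[component] % len(nodes)
        self._cursor[component] += 1
        return nodes[index]


view = SharedView()
view.on_event("install", "billing", "node-a")
view.on_event("install", "billing", "node-b")
first, second = view.route("billing"), view.route("billing")
```

Each registry would keep its own `SharedView` instance and feed it the events it hears from the cluster; routing then needs no load-balancer knowledge of component placement.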
>
> I also have some other use-cases where I manage asynchronous work on
> individual nodes which I would like to route to other nodes based on
> capacity. In phase-II I plan to move towards installing components
> on-demand on nodes which don't carry them but have capacity to handle
> them, this would trigger when requests exceed capacity on existing servers.
>
> Let me know what you think. So far, Helix appears to be a good fit for
> what I am planning to use it for.
>
> Thanks,
>
> Sandeep
>
>
> From:  kishore g <[email protected]>
> Reply-To:  "[email protected]" <[email protected]>
> Date:  Wednesday, February 12, 2014 1:44 AM
> To:  "[email protected]" <[email protected]>
> Subject:  Re: Newbie Questions
>
>
> Hi Sandeep,
>
>
>
> 1) Difference between Apache Helix and Norbert.
>
> While both projects originated at LinkedIn, there are some fundamental
> differences between the two. I will try to explain the difference via a
> simple use case of a partitioned search index. In general you have 3
> scenarios:
>
> * Start up --> Partition assignment: One needs to distribute the
> partitions among the nodes in the cluster.
>
> * In Norbert this process is manual. One needs to generate the mapping of
> partition to node and push this configuration to each server in the
> system (this is generally done via a config file). At start up,
> Norbert will simply write this configuration to ZooKeeper so that the
> clients can discover the partition to node mapping.
>
> * In Helix, the nodes simply join the cluster. Helix will inform the nodes
> which partitions to host based on the objectives and constraints (a simple
> objective would be to distribute the partitions evenly among the nodes).
>
>
>
> * Failure: When a node fails, you have multiple options: #1 Do nothing.
> #2 Re-assign the partitions hosted on the failed node to the remaining
> nodes. #3 Start a new node and assign the partitions to that new node.
>
> * Norbert simply informs you that a node left the group. You need to
> program your requirement.
> * Helix is capable of doing #1, #2 or even #3. The #3 feature is a work in
> progress and is possible if the deployment system is flexible and allows
> starting up processes dynamically. If you are deploying in EC2, this
> should be possible.
>
>
>
> * Scaling: If you add more nodes to handle the work load, you would want
> to redistribute some of the work to the new nodes.
>
> * Norbert: This is again manual; you would have to change the
> configuration and restart all the nodes.
>
> * Helix: Helix would detect new nodes and fire appropriate transitions.
>
>
>
>
> As you already mentioned, Helix treats partitions, replicas, state, and
> transitions as first class citizens. What this means is you can not only
> say how many partitions you have but also mention the number of replicas
> for each partition. For example, for redundancy you can say you need 3
> replicas for each partition and Helix will ensure that 3 replicas exist
> for each partition.
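
The replica guarantee can be pictured with a small Python sketch that places each partition's replicas on distinct nodes. This is an assumption-laden illustration and not Helix code; `place_replicas` is a hypothetical helper and the offset-based placement is just one simple way to keep replicas apart.

```python
def place_replicas(partitions, nodes, replicas=3):
    """Place `replicas` copies of each partition on distinct nodes."""
    ordered = sorted(nodes)
    if replicas > len(ordered):
        raise ValueError("not enough nodes to keep replicas on distinct hosts")
    placement = {}
    for index, partition in enumerate(partitions):
        # Consecutive offsets around the node ring never repeat a node
        # as long as replicas <= len(ordered).
        placement[partition] = [
            ordered[(index + offset) % len(ordered)] for offset in range(replicas)
        ]
    return placement


placement = place_replicas(["p0", "p1"], ["n1", "n2", "n3", "n4"])
```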
>
> 2) No, we haven't used Helix on Amazon, and I am not sure if anyone has
> done this. There was another thread about this and he didn't seem to think
> it would be a big problem. It will help if you give us more information
> about your application and setup.
>
> 3) "Incubating" indicates the state of the project in Apache; it does not
> reflect the quality of the code or production readiness. Apart from
> LinkedIn, companies and projects such as Box, Instagram, and JBoss (jBPM
> clustering) have used Helix in production.
>
> Hope this helps.
>
> Cheers
>
> Kishore G
>
>
> On Wed, Feb 12, 2014 at 12:55 AM, Sandeep Nayak
> <[email protected]> wrote:
>
> Hi,
>
> I am a newbie to Apache Helix and am evaluating technologies to build
> clustered services. I have read through the documentation on the
> http://helix.incubator.apache.org/ site but did not find answers to the
> questions below so decided to ask them here.
>
> (1) What is the difference between Apache Helix and LinkedIn Norbert? I
> believe Norbert does not support state-machine transitions like Helix but
> is there a document/summary on what are the differences so someone like me
> can use that in my evaluation?
>
> (2) Has Apache Helix been used on Amazon? Is there any documentation on
> how to get this working?
>
> (3) The website has 0.6.2-incubating-stable and 0.7.0-incubating-alpha.
> Does incubating indicate the state of the project in Apache or is it
> indicative of the production-readiness of the library? I imagine the
> former because prior to getting to Apache the library was used at
> LinkedIn, am I correct?
>
>
> Thanks in advance,
>
> Sandeep
>
