Thank you all for your valuable input... time to move on ;)

@Kishore, I was already doing almost exactly what you described... sorry to bother you with this topic, but it was important for me to be sure that I was following the right path.
Regards,
---
Alexandre Porcelli
[email protected]

On May 21, 2013, at 4:08 PM, kishore g <[email protected]> wrote:

> Thanks Alexandre, MasterSlave would have been ideal but I see that you have
> already considered that. Here is the outline of what I have understood so
> far.
>
> On each node:
>
> on startup
> -- sync with the current leader
>
> on write request
> -- wait to become leader
> -- once you become leader, write the data to the local git repo, send a
>    message to everyone to pull, and wait for responses
> -- after all standbys acknowledge the request, release the lock
>
> How to solve this using Helix (your idea of using two separate resources
> and state models actually makes sense):
> =====================
> • Have two resources: a) data_availability, b) global_lock. For
> data_availability use a simple OnlineOffline model. For global_lock use the
> LeaderStandby model.
> • data_availability: during the OFFLINE->ONLINE transition, sync with the
> current leader.
> • global_lock: set the ideal state mode to AUTO (not AUTO_REBALANCE) with
> one partition for all repos (in the future you can have one partition per
> repo to support concurrent updates to multiple repos). Start with an empty
> preference list.
> • On every write request, the node simply adds itself to the end of the
> preference list. Helix will make the first node in the preference list the
> leader. In the become-leader callback, you send a message to all nodes in
> the data_availability resource (you can set the self-excluded flag to true
> to avoid sending the message to yourself). Once you get the
> acknowledgements, you update the ideal state and remove yourself from the
> preference list. The old leader will then get the transition to STANDBY.
>
> A couple of things:
>
> • While updating the ideal state, make sure you use the version flag to
> avoid a race condition between concurrent updates.
> • After every write you should probably record the git commit number using
> the HelixPropertyStore; there is an API that allows you to limit the number
> of elements you can store.
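The version-guarded preference-list scheme in Kishore's outline can be sketched in plain Java. This is only a simulation of the idea (in a real deployment the list lives in the Helix IdealState in ZooKeeper, and the version check is ZooKeeper's compare-and-set on the znode version); the class and method names below are illustrative, not Helix APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the global_lock preference list: a node asks for the lock by
// appending itself to the list, and the head of the list is the leader.
// The "version" stands in for the ZooKeeper znode version that Kishore's
// "version flag" advice refers to.
class PreferenceListLock {
    private final List<String> preferenceList = new ArrayList<>();
    private int version = 0; // stand-in for the znode version

    // Append a node to the preference list. Returns the new version on
    // success, or -1 if the caller read a stale version (another node
    // updated the list first) and must re-read and retry.
    synchronized int enqueue(String node, int expectedVersion) {
        if (expectedVersion != version) return -1;
        preferenceList.add(node);
        return ++version;
    }

    // The current leader (head of the list) releases the lock by removing
    // itself, guarded by the same version check.
    synchronized int release(String node, int expectedVersion) {
        if (expectedVersion != version) return -1;
        if (preferenceList.isEmpty() || !preferenceList.get(0).equals(node))
            throw new IllegalStateException(node + " is not the leader");
        preferenceList.remove(0);
        return ++version;
    }

    synchronized String currentLeader() {
        return preferenceList.isEmpty() ? null : preferenceList.get(0);
    }
}
```

A -1 result is exactly the race the version flag guards against: the node re-reads the ideal state and retries its update.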
> Every time a node processes a write, it should verify that the latest
> commit version matches its local commit version.
>
> Note that this solution, even though it works, will be limited by the
> throughput of ZooKeeper, since every write results in a ZooKeeper access.
> And latency won't be great. This is definitely a good case where you could
> use an in-memory data grid like Infinispan or Hazelcast to achieve that.
>
> Let me know if it makes sense.
>
> Thanks,
> Kishore G
>
> On Tue, May 21, 2013 at 6:19 AM, Alexandre Porcelli <[email protected]>
> wrote:
> Hi Swaroop,
>
> Thanks for your input... In fact your description was my initial impression
> too... my first PoC used MasterSlave, but afterwards I got some other
> strong requirements, like all nodes being available to execute writes
> (redirecting writes to the master isn't very efficient compared to a git
> pull).
> Another important thing to consider is that my Git cluster is usually (not
> always, once you have external `clones`) consumed `locally` (by a web app
> that the cluster is deployed alongside)... and the web app load balancer
> isn't something I can control (my typical scenario is mod_cluster with a
> JBoss application server cluster, and the Git cluster runs inside that
> JBoss cluster).
>
> Regards,
> ---
> Alexandre Porcelli
> [email protected]
>
> On May 21, 2013, at 4:15 AM, Swaroop Jagadish <[email protected]> wrote:
>
> > Hello Alexandre,
> > Based on your description, it looks like the MasterSlave state model is
> > best suited for your use case. You distribute the different git
> > repositories evenly across the cluster using "auto rebalance" mode in
> > the state model. A git repo is mapped to a Helix resource, and for a
> > given repo there is only one node which is the master. Thus, there is
> > only one node which can write to a given repo.
> > The client uses Helix's external view to determine which node is the
> > master for a given repo (this can be accomplished using the
> > RoutingTableProvider class). In order to keep the repositories in sync,
> > whenever a write happens at the master, the master can send a message to
> > all the slaves to sync their repos. A slave can either reject any direct
> > writes it receives from the client or forward them to the master node.
> >
> > Let me know if that makes sense.
> >
> > Regards,
> > Swaroop
> >
> > On 5/20/13 10:36 AM, "Alexandre Porcelli" <[email protected]> wrote:
> >
> >> Hi Kishore,
> >>
> >> Let me try to explain my needs with my real-world usage scenario, so it
> >> will be easier for you to understand.
> >>
> >> In a simple form, what I'm building is a Git cluster (using jgit and
> >> Apache Helix). External clients can push data to any node of the
> >> cluster, but in order to keep the cluster synced properly (to avoid
> >> conflicts) I need to be sure that just one node is writing at a time
> >> (the single global lock role). Just before the current node unlocks, I
> >> notify all other members of the cluster (using the Messaging API) that
> >> they must sync (the message indicates which repo was updated). The
> >> unlock operation releases the lock (so others that need to update data
> >> can do so).
> >> My current setup uses the "LeaderStandby" model with one resource
> >> (which I name the git-lock resource, with only one partition,
> >> git-lock_0). The Leader is the node that holds the lock, and the
> >> standby queue is formed by nodes that are willing to update data...
> >> nodes that are not trying to update data aren't in standby (they're
> >> offline due to the partition being disabled).
> >> Aside from the global lock, when a new node joins the cluster it needs
> >> to sync all the git repositories - I don't have a fixed list of those
> >> repos, which is why I need to query the cluster for a list of existing
> >> repos.
> >> This query can be answered by any member of the existing cluster
> >> (since all of them are synced via the global lock).
> >>
> >> Is it clear now?
> >>
> >> What I'm wondering is... whether I'm mixing two different things into
> >> just one (the single global lock and the cluster's git repository
> >> list).
> >>
> >> Maybe it's worth mentioning that in the near future I plan to get rid
> >> of the single global lock and have a per-git-repo lock...
> >>
> >> Again... thanks in advance!
> >>
> >> Regards,
> >> ---
> >> Alexandre Porcelli
> >> [email protected]
> >>
> >> On May 20, 2013, at 2:05 PM, kishore g <[email protected]> wrote:
> >>
> >>> Hi Alex,
> >>>
> >>> Let me try to formulate your requirements:
> >>>
> >>> 1. Have a global lock; of all nodes, only one node needs to be LEADER.
> >>> 2. When new nodes are added, they automatically become STANDBY and
> >>>    sync data with the existing LEADER.
> >>>
> >>> Both of the above requirements can be satisfied with AUTO_REBALANCE
> >>> mode. In your original email, you mentioned releasing the lock; can
> >>> you explain when you want to release the lock? Sorry, I should have
> >>> asked this earlier. I think this is the requirement that is causing
> >>> some confusion. Also, in 0.6.1 we have added a feature where you can
> >>> plug in custom rebalancer logic when the pipeline is run, so you can
> >>> actually come up with your own rebalancing logic. But it's not
> >>> documented :(
> >>>
> >>> You might be right about using two state models or configuring Helix
> >>> with a custom state model. But I want to make sure I understand your
> >>> use case before suggesting that.
> >>>
> >>> thanks,
> >>> Kishore G
> >>>
> >>> On Mon, May 20, 2013 at 9:17 AM, Alexandre Porcelli
> >>> <[email protected]> wrote:
> >>> Hi Kishore,
> >>>
> >>> Once again, thanks for your support... it has been really valuable.
> >>>
> >>> I've been thinking and I'd like to share my thoughts and ask your
> >>> opinion (any comments are welcome). My general need (I think I've
> >>> already written about it, but here is a small recap) is a single
> >>> global lock to control data changes and, at the same time, a way to
> >>> check the current state of a live node in order to sync when a new
> >>> node joins the cluster.
> >>>
> >>> My latest questions about being able to manipulate transitions from
> >>> the API were to avoid having a node in offline mode - as moving away
> >>> from offline is the transition that triggers the sync, and if I
> >>> disable a resource/node it is moved to offline automatically (using
> >>> AUTO_REBALANCE). Kishore pointed out how to change my cluster from
> >>> AUTO_REBALANCE to AUTO so I can have control of those transitions...
> >>>
> >>> Now here is what I've been thinking: it seems that I'm mixing two
> >>> different things in just one cluster/resource - one is the lock and
> >>> the other is cluster availability - maybe I just need two different
> >>> resources for that, one for the lock and the other for the real data
> >>> availability - wdyt? Another thing that comes to mind is that maybe my
> >>> need doesn't fit the existing state models, and I'd need to create a
> >>> new one with my own config.
> >>>
> >>> I'd like to hear what you think about it... recommendations, thoughts,
> >>> opinions, considerations... anything is welcome.
> >>>
> >>> Regards,
> >>> ---
> >>> Alexandre Porcelli
> >>> [email protected]
> >>>
> >>> On May 17, 2013, at 4:40 AM, kishore g <[email protected]> wrote:
> >>>
> >>>> Hi Alexandre,
> >>>>
> >>>> You can get more control in AUTO mode; you are currently using
> >>>> AUTO_REBALANCE, where Helix decides who should be leader and where it
> >>>> should be. If you look at the IdealState, it basically looks like
> >>>> this:
> >>>>
> >>>> p1:[]
> >>>>
> >>>> In AUTO mode you set the preference list for each partition, so you
> >>>> can set something like p1:[n1,n2,n3].
> >>>>
> >>>> In this case, if n1 is alive, Helix will make n1 the leader; n2 and
> >>>> n3 will be standby. If you want to make someone else the leader, say
> >>>> n2, simply change this to p1:[n2,n3,n1].
> >>>>
> >>>> Change these lines in your code:
> >>>>
> >>>>   admin.addResource(clusterName, lockGroupName, 1, "LeaderStandby",
> >>>>       IdealStateModeProperty.AUTO_REBALANCE.toString());
> >>>>   admin.rebalance(clusterName, lockGroupName, numInstances);
> >>>>
> >>>> to
> >>>>
> >>>>   admin.addResource(clusterName, lockGroupName, 1, "LeaderStandby",
> >>>>       IdealStateModeProperty.AUTO.toString());
> >>>>   admin.rebalance(clusterName, lockGroupName, numInstances);
> >>>>
> >>>> If you want to change the current leader, you can do the following:
> >>>>
> >>>>   IdealState idealState =
> >>>>       admin.getResourceIdealState(clusterName, resourceName);
> >>>>   // set the new leader as the first entry of the preference list
> >>>>   List<String> preferenceList;
> >>>>   idealState.getRecord().setListField(partitionName, preferenceList);
> >>>>   admin.setResourceIdealState(clusterName, resourceName, idealState);
> >>>>
> >>>> Read more about the different execution modes:
> >>>>
> >>>> http://helix.incubator.apache.org/Concepts.html and
> >>>> http://helix.incubator.apache.org/Features.html
> >>>>
> >>>> Thanks,
> >>>> Kishore G
> >>>>
> >>>> On Thu, May 16, 2013 at 11:09 PM, Alexandre Porcelli
> >>>> <[email protected]> wrote:
> >>>> Hello all,
> >>>>
> >>>> Sorry to revive this thread, but I think I'll have to ask again... is
> >>>> it possible to force, via an API call, a transition from Leader to
> >>>> "Wait" without disabling an instance or partition?
> >>>> The transition from Leader to Offline triggered by the disabled
> >>>> partition is causing me some trouble...
> >>>> The main problem is that my transition from "Offline" to "Standby"
> >>>> syncs data with the rest of the cluster (an expensive task that
> >>>> should be executed only if the node was really offline; in other
> >>>> words, there was a partition, the node crashed, or whatever).
> >>>>
> >>>> I suspect I may need to build my own transition model... not sure
> >>>> (not even sure how to do it and be able to control/expose that
> >>>> transition from Leader to "Wait")...
> >>>>
> >>>> Well... any help/suggestion is really welcome!
> >>>>
> >>>> Cheers,
> >>>> ---
> >>>> Alexandre Porcelli
> >>>> [email protected]
> >>>>
> >>>> On May 2, 2013, at 2:26 PM, Alexandre Porcelli <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Hi Vinayak,
> >>>>>
> >>>>> You were right, it was all my mistake! Disabling the partition works
> >>>>> like a charm! Thank you very much.
> >>>>>
> >>>>> Regards,
> >>>>> ---
> >>>>> Alexandre Porcelli
> >>>>> [email protected]
> >>>>>
> >>>>> On May 2, 2013, at 1:22 PM, Vinayak Borkar <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> Looking at the signature of HelixAdmin.enablePartition, I see this:
> >>>>>>
> >>>>>>   void enablePartition(boolean enabled,
> >>>>>>                        String clusterName,
> >>>>>>                        String instanceName,
> >>>>>>                        String resourceName,
> >>>>>>                        List<String> partitionNames);
> >>>>>>
> >>>>>> So when you disable the partition, you are doing so only on a
> >>>>>> particular instance. My understanding is that the same partition on
> >>>>>> other instances will participate in an election to come out of
> >>>>>> standby.
> >>>>>>
> >>>>>> Vinayak
> >>>>>>
> >>>>>> On 5/2/13 9:14 AM, Alexandre Porcelli wrote:
> >>>>>>> Hi Vinayak,
> >>>>>>>
> >>>>>>> Thanks for your quick answer, but I don't think this would be the
> >>>>>>> case...
> >>>>>>> Since the partition `represents` the locked resource, if I
> >>>>>>> disable it no other instance in the cluster will be able to be
> >>>>>>> promoted to Leader (at this point other nodes should be in
> >>>>>>> standby, just waiting to be able to acquire the lock - in other
> >>>>>>> words, to become Leader).
> >>>>>>> Anyway, thanks for your support.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> ---
> >>>>>>> Alexandre Porcelli
> >>>>>>> [email protected]
> >>>>>>>
> >>>>>>> On May 2, 2013, at 1:06 PM, Vinayak Borkar <[email protected]>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>>> 1. I'm using LeaderStandby in order to build a single global
> >>>>>>>>> lock on my cluster. It works as expected... but in order to
> >>>>>>>>> release the lock I have to put the current leader in standby...
> >>>>>>>>> I could achieve this by disabling the current instance. It
> >>>>>>>>> works, but doing this I lose (at least it seems so) the ability
> >>>>>>>>> to send/receive user-defined messages. I'd like to know if it's
> >>>>>>>>> possible, via an API call, to force a transition from Leader to
> >>>>>>>>> Standby without disabling an instance.
> >>>>>>>>
> >>>>>>>> I am a newbie to Helix too, and I had a similar question a few
> >>>>>>>> days ago. Have you looked into disabling the resource by using
> >>>>>>>> the disablePartition() call in HelixAdmin with a partition
> >>>>>>>> number of 0? This should disable just the resource without
> >>>>>>>> impacting the instance.
> >>>>>>>>
> >>>>>>>> Vinayak
> >>>>>>>>
> >>>>>>>>> 2. I've taken a quick look at the Helix codebase, more
> >>>>>>>>> specifically at the ZooKeeper usage. It seems that you're using
> >>>>>>>>> ZooKeeper as a default implementation, but the Helix
> >>>>>>>>> architecture is not tied to it, right? I'm asking because I'm
> >>>>>>>>> interested in implementing (in the near future) a different
> >>>>>>>>> backend (Infinispan).
> >>>>>>>>>
> >>>>>>>>> That's it for now... thanks in advance.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> ---
> >>>>>>>>> Alexandre Porcelli
> >>>>>>>>> [email protected]
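As a footnote to Vinayak's point that HelixAdmin.enablePartition() acts per instance, the election behaviour it implies can be modelled in a few lines of plain Java. This is a toy illustration, not Helix code; the class and method names are made up.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Models a single partition in AUTO mode: the leader is the first instance
// in the preference list whose copy of the partition is enabled. Disabling
// the partition on one instance (as enablePartition(false, ...) does) only
// removes that instance from the election; the others still elect a leader.
class PartitionElection {
    private final List<String> preferenceList;
    private final Set<String> disabledOn = new HashSet<>();

    PartitionElection(List<String> preferenceList) {
        this.preferenceList = preferenceList;
    }

    // Mirrors enablePartition(false, cluster, instanceName, resource, ...)
    void disableOn(String instance) { disabledOn.add(instance); }

    void enableOn(String instance) { disabledOn.remove(instance); }

    // Leader = first enabled instance in the preference list, or null if
    // the partition is disabled everywhere (fully offline).
    String leader() {
        for (String instance : preferenceList)
            if (!disabledOn.contains(instance)) return instance;
        return null;
    }
}
```

This is why disabling the partition worked for Alexandre's lock release: the call demotes only the current leader's replica, and the next standby in the preference list is promoted.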
