Thank you all for your very valuable input... time to move on ;)

@Kishore, I was already doing almost exactly what you described... sorry to 
bother you on this topic, but it was important to me to be sure that I was 
following the right path.


Regards,
---
Alexandre Porcelli
[email protected]

On May 21, 2013, at 4:08 PM, kishore g <[email protected]> wrote:

> Thanks Alexandre, MasterSlave would have been ideal, but I see that you have 
> already considered that. Here is the outline of what I have understood so 
> far.
> 
> on each node
> 
> on startup
> -- sync with the current leader
> 
> on write request
> --- wait to become leader
> --- once you become leader, write the data to the local git repo, send a 
> message to everyone to pull, and wait for the responses.
> --- after all standbys acknowledge the request, release the lock.
> 
> How to solve this using Helix ( your idea of using two separate resources and 
> state models actually makes sense)
> =====================
>       • Have two resources: a) data_availability and b) global_lock. For 
> data_availability use a simple OnlineOffline model. For global_lock use the 
> LeaderStandby model.
>       • data_availability: during the offline->online transition, sync with the 
> current leader.
>       • global_lock: set the ideal state mode to AUTO (not AUTO_REBALANCE) with 
> one partition for all repos (in the future you can have one per repo to support 
> concurrent updates to multiple repos). Start with an empty preference list.
>       • On every write request, the node simply adds itself to the end of the 
> preference list; Helix will make the first entry in the preference list the 
> leader. In the become-leader callback, you send a message to all nodes in the 
> data_availability resource (you can set the self-excluded flag to true to avoid 
> sending the message to yourself). Once you get the acknowledgements, you update 
> the ideal state and remove yourself from the preference list. The old leader 
> then gets a transition out of LEADER, and the next node in the preference list 
> (if any) becomes the new leader. A rough sketch of this write path follows 
> below.
> 
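> Roughly, a sketch of that write path could look like the following (the
> partition name global_lock_0, the class/method names, and the use of
> setResourceIdealState are just illustrative assumptions; the version check
> mentioned below is omitted):
> 
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.helix.HelixAdmin;
> import org.apache.helix.model.IdealState;
> 
> public class GlobalLockSketch {
>   // On a write request: append this node to the preference list of the single
>   // global_lock partition; Helix promotes the first entry to LEADER.
>   static void requestLock(HelixAdmin admin, String clusterName, String instanceName) {
>     IdealState idealState = admin.getResourceIdealState(clusterName, "global_lock");
>     List<String> preferenceList = idealState.getRecord().getListField("global_lock_0");
>     if (preferenceList == null) {
>       preferenceList = new ArrayList<String>();
>     }
>     if (!preferenceList.contains(instanceName)) {
>       preferenceList.add(instanceName); // join the end of the queue
>     }
>     idealState.getRecord().setListField("global_lock_0", preferenceList);
>     admin.setResourceIdealState(clusterName, "global_lock", idealState);
>   }
> 
>   // Once all standbys have acknowledged the pull: remove this node from the
>   // preference list so the next waiter (if any) becomes the leader.
>   static void releaseLock(HelixAdmin admin, String clusterName, String instanceName) {
>     IdealState idealState = admin.getResourceIdealState(clusterName, "global_lock");
>     List<String> preferenceList = idealState.getRecord().getListField("global_lock_0");
>     if (preferenceList != null && preferenceList.remove(instanceName)) {
>       idealState.getRecord().setListField("global_lock_0", preferenceList);
>       admin.setResourceIdealState(clusterName, "global_lock", idealState);
>     }
>   }
> }
> 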
> A couple of things:
> 
>       • While updating the ideal state, make sure you use the version flag to 
> avoid race conditions between concurrent updates.
>       • After every write you should probably record the git commit id using the 
> HelixPropertyStore (there is an API that allows you to limit the number of 
> elements you can store). Every time a node processes a write, it should verify 
> that the latest commit version matches its local commit version. A small sketch 
> of this follows below.
> 
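> A hedged sketch of that commit bookkeeping (the property-store path and field
> name here are made up, the HelixManager handle is assumed, and the
> element-count limit is not shown):
> 
> import org.apache.helix.AccessOption;
> import org.apache.helix.HelixManager;
> import org.apache.helix.ZNRecord;
> import org.apache.helix.store.zk.ZkHelixPropertyStore;
> 
> public class CommitTrackingSketch {
>   // After a successful write, publish the repo's latest commit id.
>   static void publishCommit(HelixManager manager, String repoName, String commitId) {
>     ZkHelixPropertyStore<ZNRecord> store = manager.getHelixPropertyStore();
>     ZNRecord record = new ZNRecord(repoName);
>     record.setSimpleField("lastCommit", commitId);
>     store.set("/GIT_COMMITS/" + repoName, record, AccessOption.PERSISTENT);
>   }
> 
>   // Before processing a write, check that the local repo is at the latest commit.
>   static boolean isUpToDate(HelixManager manager, String repoName, String localCommitId) {
>     ZkHelixPropertyStore<ZNRecord> store = manager.getHelixPropertyStore();
>     ZNRecord latest = store.get("/GIT_COMMITS/" + repoName, null, AccessOption.PERSISTENT);
>     return latest == null || localCommitId.equals(latest.getSimpleField("lastCommit"));
>   }
> }
> 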
> Note that this solution, even though it works, will be limited by the 
> throughput of ZooKeeper, since every write results in ZooKeeper access, and 
> latency won't be great. This is definitely a good case where an in-memory 
> data grid like Infinispan or Hazelcast could help.
> 
> Let me know if it makes sense.
> 
> Thanks,
> Kishore G
> 
> 
> 
> 
> On Tue, May 21, 2013 at 6:19 AM, Alexandre Porcelli <[email protected]> 
> wrote:
> Hi Swaroop,
> 
>  Thanks for your input... In fact, your description matches my initial 
> impression too; my first PoC used MasterSlave, but afterwards I got some 
> other strong requirements, such as every node being able to accept writes 
> (redirecting writes to the master isn't as efficient as a git pull).
>  Another important thing to consider is that my Git cluster is usually 
> (though not always, once you have external `clones`) consumed `locally`, by 
> a web app deployed alongside the cluster... and the web app load balancer 
> isn't something that I can control (my typical scenario is mod_cluster in 
> front of a JBoss application server cluster, with the Git cluster running 
> inside that JBoss cluster).
> 
> Regards,
> ---
> Alexandre Porcelli
> [email protected]
> 
> 
> 
> On May 21, 2013, at 4:15 AM, Swaroop Jagadish <[email protected]> wrote:
> 
> > Hello Alexandre,
> > Based on your description, it looks like the MasterSlave state model is best
> > suited for your use case. You distribute the different git repositories
> > evenly across the cluster using the "auto rebalance" mode. A git repo maps
> > to a Helix resource, and for a given repo there is only one node that is the
> > master; thus, only one node can write to a given repo. The client uses
> > Helix's external view to determine which node is the master for a given repo
> > (this can be accomplished using the RoutingTableProvider class; see the
> > sketch below). To keep the repositories in sync, whenever a write happens at
> > the master, the master can send a message to all the slaves to sync their
> > repos. A slave can either reject any direct writes it receives from the
> > client or forward them to the master node.
> >
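> > As a rough illustration, a spectator-side lookup could look like this
> > (the resource/partition naming is just an assumption, and in practice the
> > routing table would be registered once at startup, not per lookup):
> >
> > import java.util.List;
> > import org.apache.helix.HelixManager;
> > import org.apache.helix.model.InstanceConfig;
> > import org.apache.helix.spectator.RoutingTableProvider;
> >
> > public class MasterLookupSketch {
> >   // Track the external view with a routing table, then ask it which instance
> >   // currently holds the MASTER replica of a given repo partition.
> >   static InstanceConfig findMaster(HelixManager spectator, String repoResource)
> >       throws Exception {
> >     RoutingTableProvider routingTable = new RoutingTableProvider();
> >     spectator.addExternalViewChangeListener(routingTable);
> >     List<InstanceConfig> masters =
> >         routingTable.getInstances(repoResource, repoResource + "_0", "MASTER");
> >     return masters.isEmpty() ? null : masters.get(0);
> >   }
> > }
> >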
> > Let me know if that makes sense
> >
> > Regards,
> > Swaroop
> >
> > On 5/20/13 10:36 AM, "Alexandre Porcelli" <[email protected]> wrote:
> >
> >> Hi Kishore,
> >>
> >> Let me try to explain my needs with my real-world usage scenario, so it
> >> will be easier for you to understand.
> >>
> >> In simple terms, what I'm building is a Git cluster (using jgit and
> >> Apache Helix for that). External clients can push data to any node of the
> >> cluster, but in order to keep the cluster synced properly (and to avoid
> >> conflicts) I need to be sure that only one node is writing at a time
> >> (the single global lock role). Just before the current node `unlocks`, I
> >> notify all other members of the cluster (using the Messaging API) that they
> >> must sync (the message indicates which repo was updated). The unlock
> >> operation then releases the lock (so others that need to update data can do
> >> it).
> >> My current setup for that uses the "LeaderStandby" model with one
> >> resource (which I named the git-lock resource, with only one partition,
> >> git-lock_0): the Leader is the node that holds the lock, and the standby
> >> queue is formed by the nodes that want to update data... nodes that
> >> are not trying to update data aren't in standby (they're offline because
> >> the partition is disabled for them).
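> >> Roughly, the LeaderStandby callbacks for that git-lock partition look
> >> something like the sketch below (the class name and method bodies are just
> >> illustrative placeholders; the factory creating it would be registered with
> >> the participant's StateMachineEngine for "LeaderStandby"):
> >>
> >> import org.apache.helix.NotificationContext;
> >> import org.apache.helix.model.Message;
> >> import org.apache.helix.participant.statemachine.StateModel;
> >> import org.apache.helix.participant.statemachine.StateModelInfo;
> >> import org.apache.helix.participant.statemachine.Transition;
> >>
> >> @StateModelInfo(initialState = "OFFLINE", states = { "LEADER", "STANDBY", "OFFLINE" })
> >> public class GitLockStateModel extends StateModel {
> >>
> >>   @Transition(from = "OFFLINE", to = "STANDBY")
> >>   public void onBecomeStandbyFromOffline(Message message, NotificationContext context) {
> >>     // joined the queue of nodes waiting for the lock; sync repos here if the
> >>     // node was really offline before
> >>   }
> >>
> >>   @Transition(from = "STANDBY", to = "LEADER")
> >>   public void onBecomeLeaderFromStandby(Message message, NotificationContext context) {
> >>     // this node now holds the global lock: do the write, notify others to pull
> >>   }
> >>
> >>   @Transition(from = "LEADER", to = "STANDBY")
> >>   public void onBecomeStandbyFromLeader(Message message, NotificationContext context) {
> >>     // lock released; the next node in line can be promoted
> >>   }
> >> }
> >>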
> >> Aside from the global lock, when a new node joins the cluster it needs to
> >> sync all the git repositories - I don't have a fixed list of those repos,
> >> which is why I need to query the cluster for the list of existing repos.
> >> This query can be answered by any member of the existing cluster
> >> (since all of them are kept in sync via the global lock).
> >>
> >> Is it clear now?
> >>
> >> What I'm wondering is whether I'm mixing two different things into just
> >> one (the single global lock and the cluster's git repository list).
> >>
> >> Maybe it's worth mentioning that in the near future I plan to get rid of
> >> the single global lock and have a per-repo lock...
> >>
> >>
> >> Again.. thanks in advance!
> >>
> >> Regards,
> >> ---
> >> Alexandre Porcelli
> >> [email protected]
> >>
> >>
> >>
> >>
> >> On May 20, 2013, at 2:05 PM, kishore g <[email protected]> wrote:
> >>
> >>> Hi Alex,
> >>>
> >>> Let me try to formulate your requirements
> >>>
> >>> 1. Have a global lock: of all the nodes, only one needs to be LEADER
> >>> 2. When new nodes are added, they automatically become STANDBY and sync
> >>> data with the existing LEADER
> >>>
> >>> Both of the above requirements can be satisfied with AUTO_REBALANCE mode.
> >>> In your original email you mentioned releasing the lock; can you explain
> >>> when you want to release the lock? Sorry, I should have asked this
> >>> earlier. I think this is the requirement that is causing some confusion.
> >>> Also, in 0.6.1 we have added a feature where you can plug in custom
> >>> rebalancer logic when the pipeline runs, so you can actually come up with
> >>> your own rebalancing logic. But it's not documented :(
> >>>
> >>> You might be right about using two state models or configuring Helix with
> >>> a custom state model, but I want to make sure I understand your use case
> >>> before suggesting that.
> >>>
> >>> thanks,
> >>> Kishore G
> >>>
> >>>
> >>>
> >>> On Mon, May 20, 2013 at 9:17 AM, Alexandre Porcelli
> >>> <[email protected]> wrote:
> >>> Hi Kishore,
> >>>
> >>> Once again, thanks for your support... it has been really valuable.
> >>>
> >>> I've been thinking, and I'd like to share my thoughts and ask for your
> >>> opinion (any comments are welcome). My general need (I think I've already
> >>> written about it, but here is a small recap) is a single global lock to
> >>> control data changes and, at the same time, a way to check the current
> >>> state of a live node in order to be able to sync when a new node joins
> >>> the cluster.
> >>>
> >>> My latest questions about being able to manipulate transitions from the
> >>> API were aimed at avoiding having a node in offline mode - since moving
> >>> away from offline is the transition that triggers the sync, and if I
> >>> disable a resource/node it is moved to offline automatically (using
> >>> AUTO_REBALANCE). Kishore showed me how to change my cluster from
> >>> AUTO_REBALANCE to AUTO so I can have control over those transitions....
> >>>
> >>> Now here is what I've been thinking about all of this: it seems that I'm
> >>> mixing two different things in just one cluster/resource - one is the
> >>> lock and the other is the cluster availability - so maybe I just need to
> >>> have two different resources for that, one for the lock and the other for
> >>> the real data availability - wdyt? Another thing that comes to mind is
> >>> that maybe my need doesn't fit the existing state models, and I'd need
> >>> to create a new one with my own config.
> >>>
> >>> I'd like to hear what you think about it... recommendations, thoughts,
> >>> opinions, considerations... anything is welcome.
> >>>
> >>> Regards,
> >>> ---
> >>> Alexandre Porcelli
> >>> [email protected]
> >>>
> >>>
> >>> On May 17, 2013, at 4:40 AM, kishore g <[email protected]> wrote:
> >>>
> >>>> Hi Alexandre,
> >>>>
> >>>> You can get more control in AUTO mode; you are currently using
> >>>> AUTO_REBALANCE, where Helix decides who should be the leader and where it
> >>>> should be. If you look at the IdealState, it basically looks like this:
> >>>> p1:[]
> >>>>
> >>>> In AUTO mode you set the preference list for each partition,
> >>>> so you can set something like p1:[n1,n2,n3]
> >>>>
> >>>> In this case, if n1 is alive, Helix will make n1 the leader and n2, n3
> >>>> will be standby. If you want to make someone else the leader, say n2,
> >>>> simply change this to
> >>>> p1:[n2,n3,n1].
> >>>>
> >>>> Change this line in your code
> >>>>
> >>>> admin.addResource( clusterName, lockGroupName, 1, "LeaderStandby",
> >>>>     IdealStateModeProperty.AUTO_REBALANCE.toString() );
> >>>> admin.rebalance( clusterName, lockGroupName, numInstances );
> >>>>
> >>>> to
> >>>>
> >>>> admin.addResource( clusterName, lockGroupName, 1, "LeaderStandby",
> >>>>     IdealStateModeProperty.AUTO.toString() );
> >>>> admin.rebalance( clusterName, lockGroupName, numInstances );
> >>>>
> >>>> // if you want to change the current leader, you can do the following:
> >>>>
> >>>> IdealState idealState = admin.getResourceIdealState(clusterName, resourceName);
> >>>> List<String> preferenceList; // set the new leader you want as the first entry
> >>>> idealState.getRecord().setListField(partitionName, preferenceList);
> >>>> admin.setResourceIdealState(clusterName, resourceName, idealState);
> >>>>
> >>>>
> >>>>
> >>>> Read more about the different execution modes
> >>>>
> >>>> http://helix.incubator.apache.org/Concepts.html and
> >>>>
> >>>> http://helix.incubator.apache.org/Features.html
> >>>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Kishore G
> >>>>
> >>>>
> >>>>
> >>>> On Thu, May 16, 2013 at 11:09 PM, Alexandre Porcelli
> >>> <[email protected]> wrote:
> >>>> Hello all,
> >>>>
> >>>> Sorry to revive this thread, but I think I'll have to ask again...
> >>>> is it possible to force, via an API call, a transition from Leader to
> >>>> "Wait" without disabling an instance or partition? The transition from
> >>>> Leader to Offline triggered by the disabled partition is causing me
> >>>> some trouble...
> >>>> The main problem is that my transition from "Offline" to "Standby"
> >>>> syncs data with the rest of the cluster (an expensive task that should
> >>>> be executed only if that node was really offline - in other words, there
> >>>> was a network partition, the node crashed, or whatever).
> >>>>
> >>>> I predict that I may need to build my own transition model... not sure
> >>>> (not even sure how to do it and be able to control/expose that
> >>>> transition from Leader to "Wait")...
> >>>>
> >>>> Well... any help/suggestion is really welcomed!
> >>>>
> >>>> Cheers,
> >>>> ---
> >>>> Alexandre Porcelli
> >>>> [email protected]
> >>>>
> >>>> On May 2, 2013, at 2:26 PM, Alexandre Porcelli <[email protected]>
> >>> wrote:
> >>>>
> >>>>> Hi Vinayak,
> >>>>>
> >>>>> You were right, all my mistake! Disabling the partition works like
> >>> a charm! Thank you very much.
> >>>>>
> >>>>> Regards,
> >>>>> ---
> >>>>> Alexandre Porcelli
> >>>>> [email protected]
> >>>>>
> >>>>> On May 2, 2013, at 1:22 PM, Vinayak Borkar <[email protected]>
> >>> wrote:
> >>>>>
> >>>>>> Looking at the signature of HelixAdmin.enablePartition, I see this:
> >>>>>>
> >>>>>> void enablePartition(boolean enabled,
> >>>>>>                      String clusterName,
> >>>>>>                      String instanceName,
> >>>>>>                      String resourceName,
> >>>>>>                      List<String> partitionNames);
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> So when you disable the partition, you are doing so only on a
> >>>>>> particular instance. So my understanding is that the same partition on
> >>>>>> other instances will participate in an election to come out of standby.
> >>>>>>
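> >>>>>> For example (the cluster and instance names here are made up), disabling
> >>>>>> only the lock partition on the current leader would look roughly like:
> >>>>>>
> >>>>>> import java.util.Arrays;
> >>>>>> import org.apache.helix.HelixAdmin;
> >>>>>>
> >>>>>> public class ReleaseLockSketch {
> >>>>>>   // Disable git-lock_0 only on "node1"; other instances keep the partition
> >>>>>>   // enabled, so one of them can be elected the new leader.
> >>>>>>   static void releaseLeadership(HelixAdmin admin) {
> >>>>>>     admin.enablePartition(false, "GIT_CLUSTER", "node1", "git-lock",
> >>>>>>         Arrays.asList("git-lock_0"));
> >>>>>>   }
> >>>>>> }
> >>>>>>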
> >>>>>> Vinayak
> >>>>>>
> >>>>>>
> >>>>>> On 5/2/13 9:14 AM, Alexandre Porcelli wrote:
> >>>>>>> Hi Vinayak,
> >>>>>>>
> >>>>>>> Thanks for your quick answer, but I don't think this would be
> >>>>>>> the case... since the partition `represents` the locked resource, if I
> >>>>>>> disable it, no other instance in the cluster will be able to be promoted
> >>>>>>> to Leader (at this point the other nodes should be in standby, just
> >>>>>>> waiting to be able to acquire the lock - in other words, to become Leader).
> >>>>>>> Anyway thanks for your support.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> ---
> >>>>>>> Alexandre Porcelli
> >>>>>>> [email protected]
> >>>>>>>
> >>>>>>>
> >>>>>>> On May 2, 2013, at 1:06 PM, Vinayak Borkar <[email protected]>
> >>> wrote:
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>> 1. I'm using LeaderStandby in order to build a single global
> >>>>>>>>> lock on my cluster, and it works as expected... but in order to release
> >>>>>>>>> the lock I have to put the current leader in standby... I could achieve
> >>>>>>>>> this by disabling the current instance. It works, but in doing so I lose
> >>>>>>>>> (or at least it seems so) the ability to send/receive user-defined
> >>>>>>>>> messages. I'd like to know if it's possible to, via an API call, force a
> >>>>>>>>> transition from Leader to Standby without disabling an instance.
> >>>>>>>>
> >>>>>>>> I am a newbie to Helix too and I had a similar question a few
> >>> days ago. Have you looked into disabling the resource by using the
> >>> disablePartition() call in HelixAdmin using a partition number of 0?
> >>> This should disable just the resource without impacting the instance.
> >>>>>>>>
> >>>>>>>> Vinayak
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> 2. I've been taking a quick look at the Helix codebase, more
> >>>>>>>>> specifically at the ZooKeeper usage. It seems that you're using ZooKeeper
> >>>>>>>>> as the default implementation, but the Helix architecture is not tied to
> >>>>>>>>> it, right? I'm asking because I'm interested in implementing (in the near
> >>>>>>>>> future) a different backend (Infinispan).
> >>>>>>>>>
> >>>>>>>>> That's it for now...  thanks in advance.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> ---
> >>>>>>>>> Alexandre Porcelli
> >>>>>>>>> [email protected]
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >
> 
> 
