Sure! I'll join the channel!
On Wed, Apr 9, 2014 at 12:41 PM, kishore g <[email protected]> wrote:
> Hi Vlad,
>
> I have some questions. Can you join the IRC channel #apachehelix?
>
> thanks,
> Kishore G
>
>
> On Wed, Apr 9, 2014 at 11:35 AM, [email protected] <[email protected]> wrote:
>
>> Upon some further testing, it seems that the controller does not execute
>> the events in the right sequence.
>>
>> Here are the results of some of my testing. Assume that we have a
>> partition NEWPROFILE_5 with the ideal state:
>>
>> "NEWPROFILE_5" : {
>>   "pf1.apps-pf.dev.docker_12000" : "SLAVE",
>>   "pf2.apps-pf.dev.docker_12000" : "MASTER"
>> }
>>
>> I boot the host pf1 and a few minutes later the host pf2. In the
>> controller logs I see, when doing a grep for NEWPROFILE_5:
>>
>> 2014-04-08 17:04:35,309 (Thread-2) TaskAssignmentStage INFO: Sending Message 69b4eddf-ac5f-4726-9d6b-bac742ad082e to pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>>
>> 2014-04-08 17:27:08,187 (Thread-2) TaskAssignmentStage INFO: Sending Message a221b1ac-0807-425e-9062-6507e45b0bfb to pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE to:BOOTSTRAP
>>
>> 2014-04-08 17:27:10,164 (Thread-2) TaskAssignmentStage INFO: Sending Message 73ed85fd-49c9-46a5-b262-687d612c7d06 to pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE
>>
>> 2014-04-08 17:27:11,868 (Thread-2) TaskAssignmentStage INFO: Sending Message fb21aecc-68cf-4b9f-9718-aa6ed535c29d to pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>>
>> 2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending Message ea441d18-b1f3-4ceb-96a2-3262cab1dfbe to pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE to:BOOTSTRAP
>>
>> 2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending Message f36b4d64-c790-413b-b9fa-915b9539d28c to pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER to:SLAVE
>>
>> 2014-04-08 17:28:26,065 (Thread-2) TaskAssignmentStage INFO: Sending Message 201429e1-e810-4017-b3ef-fb5930ac2192 to pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE
>>
>> 2014-04-08 17:28:28,238 (Thread-2) TaskAssignmentStage INFO: Sending Message 4a1fb64c-1063-4e49-a995-946d2dd25733 to pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>>
>> That is, the controller issues an offline->bootstrap command to pf2, but
>> then issues a master->slave command to pf1 before bringing pf2 up as a
>> slave as well (the last step before promotion to master). Since the
>> bootstrap->slave that follows takes time, the system spends time without a
>> master for the partition.
>>
>> The state model definition was:
>>
>> public static StateModelDefinition defineStateModel() {
>>   StateModelDefinition.Builder builder =
>>       new StateModelDefinition.Builder(KVHelixDefinitions.STATE_MODEL_NAME);
>>   // Add states and their rank to indicate priority. Lower the rank, higher
>>   // the priority
>>   builder.addState(MASTER, 1);
>>   builder.addState(SLAVE, 2);
>>   builder.addState(BOOTSTRAP, 3);
>>   builder.addState(OFFLINE);
>>   builder.addState(DROPPED);
>>   // Set the initial state when the node starts
>>   builder.initialState(OFFLINE);
>>
>>   // Add transitions between the states.
>>   builder.addTransition(OFFLINE, BOOTSTRAP, 3);
>>   builder.addTransition(BOOTSTRAP, SLAVE, 2);
>>   builder.addTransition(SLAVE, MASTER, 1);
>>   builder.addTransition(MASTER, SLAVE, 4);
>>   builder.addTransition(SLAVE, OFFLINE, 5);
>>   builder.addTransition(OFFLINE, DROPPED, 6);
>>
>>   // set constraints on states.
>>   // static constraint
>>   builder.upperBound(MASTER, 1);
>>   // dynamic constraint, R means it should be derived based on the
>>   // replication factor.
>>   builder.dynamicUpperBound(SLAVE, "R");
>>
>>   StateModelDefinition statemodelDefinition = builder.build();
>>
>>   assert(statemodelDefinition.isValid());
>>
>>   return statemodelDefinition;
>> }
>>
>> I have tried reversing the values of the transition priorities. In this
>> case, the controller log file looked as follows:
>>
>> 2014-04-09 11:17:52,831 (Thread-2) TaskAssignmentStage INFO: Sending Message 2b29a319-c1c6-4042-b1ad-3e3c1b5092a7 to pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE to:BOOTSTRAP
>>
>> 2014-04-09 11:17:55,672 (Thread-2) MessageGenerationStage INFO: Message hasn't been removed for pf1.apps-pf.dev.docker_12000 to transitNEWPROFILE_5 to BOOTSTRAP, desiredState: MASTER
>>
>> 2014-04-09 11:17:57,047 (Thread-2) TaskAssignmentStage INFO: Sending Message b1ca701d-65f1-46b9-9ae4-286400d6d266 to pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE
>>
>> 2014-04-09 11:17:58,888 (Thread-2) TaskAssignmentStage INFO: Sending Message fe10228f-8f5b-4133-964a-5f6c7e60b0e6 to pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>>
>> 2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending Message 6252a4e6-0ab8-490a-a51d-c47195c434b5 to pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER to:SLAVE
>>
>> 2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending Message 18bbf028-cb51-4162-8226-a6564a121986 to pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE to:BOOTSTRAP
>>
>> 2014-04-09 11:23:33,462 (Thread-2) MessageGenerationStage INFO: Message hasn't been removed for pf2.apps-pf.dev.docker_12000 to transitNEWPROFILE_5 to BOOTSTRAP, desiredState: MASTER
>>
>> 2014-04-09 11:23:33,892 (Thread-2) TaskAssignmentStage INFO: Sending Message c7fc4983-9d71-4dc4-bfee-2ad69e4de411 to pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE
>>
>> 2014-04-09 11:23:35,933 (Thread-2) TaskAssignmentStage INFO: Sending Message 75e715ed-3d53-4e39-b1e7-44695e4bfa03 to pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>>
>> That is, the transition for master->slave for pf1 was executed before
>> taking any action on pf2, clearly the opposite of the right order.
>>
>>
>> On Tue, Apr 8, 2014 at 2:19 PM, Kanak Biscuitwala <[email protected]> wrote:
>>
>>>
>>> Looks good, thanks for sharing!
>>>
>>> Kanak
>>> ________________________________
>>> > Date: Tue, 8 Apr 2014 14:08:28 -0700
>>> > Subject: Re: keeping the master node up during bootstrap
>>> > From: [email protected]
>>> > To: [email protected]
>>> >
>>> > My modified code looks like:
>>> >
>>> > /* Setup a Helix cluster for the KVStore */
>>> > public static void setupCluster() {
>>> >   assert(cluster != null);
>>> >   clusterSetup.addCluster(cluster, true);
>>> >
>>> >   ConstraintItemBuilder constraintItemBuilder = new ConstraintItemBuilder();
>>> >
>>> >   constraintItemBuilder
>>> >       .addConstraintAttribute(ConstraintAttribute.MESSAGE_TYPE.toString(), "STATE_TRANSITION")
>>> >       .addConstraintAttribute(ConstraintAttribute.PARTITION.toString(), ".*")
>>> >       .addConstraintAttribute(ConstraintAttribute.CONSTRAINT_VALUE.toString(), "1");
>>> >
>>> >   clusterSetup.getClusterManagementTool().setConstraint(cluster,
>>> >       ClusterConstraints.ConstraintType.MESSAGE_CONSTRAINT,
>>> >       "constraint1", constraintItemBuilder.build());
>>> > }
>>> >
>>> > I will try to see whether it works in every situation.
>>> >
>>> > Regards,
>>> > Vlad
>>> >
>>> >
>>> > On Tue, Apr 8, 2014 at 8:59 AM, Vlad Balan <[email protected]> wrote:
>>> > Hi Kishore,
>>> >
>>> > I managed to implement the bootstrapping using the constraint and it
>>> > appears to be running as expected. I will post my code shortly.
>>> >
>>> > Regards,
>>> > Vlad
>>> >
>>> > On Apr 8, 2014, at 8:27 AM, kishore g <[email protected]> wrote:
>>> >
>>> > Hi Vlad,
>>> >
>>> > Did you get a chance to play with the constraint? I can write some
>>> > sample code today to try this.
>>> >
>>> > Thanks,
>>> > Kishore G
>>> >
>>> >
>>> > On Thu, Apr 3, 2014 at 5:45 PM, [email protected] <[email protected]> wrote:
>>> >
>>> > Thank you Kanak and Kishore! I will try enforcing the per-partition
>>> > constraint and let you know if somehow it does not work. I was looking
>>> > at the throttling documentation, but somehow missed that a
>>> > per-partition constraint was an option!
>>> >
>>> > Regards,
>>> > Vlad
>>> >
>>> >
>>> > On Thu, Apr 3, 2014 at 5:42 PM, kishore g <[email protected]> wrote:
>>> > Hi Vlad,
>>> >
>>> > You can try setting the transition priority order and a constraint that
>>> > there should be only one transition per partition across the cluster.
>>> >
>>> > So the transition priority could be something like
>>> >
>>> > Slave->Master
>>> > Offline->Bootstrap
>>> > Bootstrap->Slave
>>> > Slave->Master
>>> >
>>> > For the rest, I am not sure if order matters.
>>> >
>>> > Also set the max transitions constraint to 1 per partition.
>>> >
>>> > The reason I put Slave->Master before Offline->Bootstrap is to ensure
>>> > that availability is given more importance. For example, suppose you
>>> > have 3 nodes, N1, N2, N3: N1 is Master, N2 is Slave, and N3 is down. If
>>> > N1 goes down and N3 comes up at the same time, we probably don't want
>>> > to wait for N3 to bootstrap before promoting N2 to Master.
>>> >
>>> > I haven't tested this but assuming the constraints enforcement works,
>>> > this should do the trick.
>>> >
>>> > Does this make sense? Let me know if this does not work, we can add a
>>> > test case.
>>> >
>>> > thanks,
>>> > Kishore G
>>> >
>>> >
>>> > On Thu, Apr 3, 2014 at 4:57 PM, [email protected] <[email protected]> wrote:
>>> >
>>> > Dear all,
>>> >
>>> > I am trying to construct a state model with the following transition
>>> > diagram:
>>> >
>>> > OFFLINE -> BOOTSTRAPPING <---> SLAVE <-----> MASTER
>>> >          <-----------------------------------
>>> >
>>> > That is, an offline node can go into a bootstrapping state, from the
>>> > bootstrap state it can go into a slave state, from slave it can go to
>>> > master, from master to slave, and from slave it can go offline.
>>> >
>>> > Assume that I have two nodes, pf1 and pf2, and a partition partition_0
>>> > with the following ideal state:
>>> >
>>> > partition_0: pf2: MASTER pf1: SLAVE,
>>> >
>>> > and that currently pf1 is serving as a master. When pf2 boots, Helix
>>> > will issue, almost simultaneously, two commands:
>>> > for pf1: transition from MASTER to SLAVE
>>> > for pf2: transition from BOOTSTRAPPING to SLAVE
>>> >
>>> > My understanding is that this happens since Helix is trying to execute
>>> > as many commands in parallel as possible and since the last state has
>>> > pf2 as master. However, the transition from BOOTSTRAPPING to SLAVE for
>>> > pf2 involves a long data copy step, so I would like to keep pf1 as a
>>> > master in the meanwhile. I tried prioritizing the transition from
>>> > BOOTSTRAPPING to SLAVE over the transition from MASTER to SLAVE,
>>> > however Helix still issues them in parallel (as it should).
>>> >
>>> > I was wondering what my options would be in order to keep the master up
>>> > while the future master is bootstrapping. Could a throttling in the
>>> > number of transitions be enforced at partition level? Could I somehow
>>> > specify that a state with a slave and a bootstrapping node is
>>> > undesirable?
>>> >
>>> > As a note, I have also looked at the rsync-replicated filesystem
>>> > example. The reason for not using the OfflineOnline or the
>>> > MasterSlave model in my application is that I would like the
>>> > bootstrapping node to receive updates from clients, i.e. be visible
>>> > during the bootstrap. For this reason, I am introducing the new
>>> > BOOTSTRAPPING phase in-between OFFLINE and SLAVE.
>>> >
>>> > Regards,
>>> > Vlad
>>> >
>>> >
>>> > PS: The state model definition is as follows:
>>> >
>>> > builder.addState(MASTER, 1);
>>> > builder.addState(SLAVE, 2);
>>> > builder.addState(BOOTSTRAP, 3);
>>> > builder.addState(OFFLINE);
>>> > builder.addState(DROPPED);
>>> > // Set the initial state when the node starts
>>> > builder.initialState(OFFLINE);
>>> >
>>> > // Add transitions between the states.
>>> > builder.addTransition(OFFLINE, BOOTSTRAP, 4);
>>> > builder.addTransition(BOOTSTRAP, SLAVE, 5);
>>> > builder.addTransition(SLAVE, MASTER, 6);
>>> > builder.addTransition(MASTER, SLAVE, 3);
>>> > builder.addTransition(SLAVE, OFFLINE, 2);
>>> > builder.addTransition(OFFLINE, DROPPED, 1);
>>> >
>>> > // set constraints on states.
>>> > // static constraint
>>> > builder.upperBound(MASTER, 1);
>>> > // dynamic constraint, R means it should be derived based on the
>>> > // replication factor.
>>> > builder.dynamicUpperBound(SLAVE, "R");
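
The ordering the thread is trying to obtain can be illustrated with a small standalone model. This is only a sketch and not Helix code: it calls no Helix API and says nothing about how the real controller behaves. It assumes a hypothetical controller that sends at most one transition per partition at a time, recomputes after each transition completes, picks the pending transition with the lowest priority number, and never allows a second MASTER. The class and method names (TransitionOrderSketch, simulate, nextHop, priority) are made up for illustration; the priority numbers are the ones from the April 9 state model (SLAVE->MASTER 1, BOOTSTRAP->SLAVE 2, OFFLINE->BOOTSTRAP 3, MASTER->SLAVE 4).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: NOT Helix code, and it does not call any Helix API. */
final class TransitionOrderSketch {

    /** Transition priorities from the April 9 state model (lower number wins). */
    static int priority(String from, String to) {
        switch (from + "->" + to) {
            case "SLAVE->MASTER":      return 1;
            case "BOOTSTRAP->SLAVE":   return 2;
            case "OFFLINE->BOOTSTRAP": return 3;
            case "MASTER->SLAVE":      return 4;
            default: throw new IllegalArgumentException(from + "->" + to);
        }
    }

    /** Next state on the way to target, or null if the node is already there. */
    static String nextHop(String current, String target) {
        if (!"MASTER".equals(target) && !"SLAVE".equals(target)) {
            throw new IllegalArgumentException("sketch only handles MASTER/SLAVE targets");
        }
        if (current.equals(target)) return null;
        switch (current) {
            case "OFFLINE":   return "BOOTSTRAP";
            case "BOOTSTRAP": return "SLAVE";
            case "SLAVE":     return "MASTER"; // only reached when target is MASTER
            case "MASTER":    return "SLAVE";
            default: throw new IllegalArgumentException(current);
        }
    }

    /** Sends one transition per round for a single partition; returns the order sent. */
    static List<String> simulate(Map<String, String> current, Map<String, String> ideal) {
        Map<String, String> state = new LinkedHashMap<>(current);
        List<String> sent = new ArrayList<>();
        while (true) {
            String bestNode = null;
            String bestTo = null;
            int bestPriority = Integer.MAX_VALUE;
            for (Map.Entry<String, String> e : state.entrySet()) {
                String from = e.getValue();
                String to = nextHop(from, ideal.get(e.getKey()));
                if (to == null) continue;
                // Upper bound of one MASTER: no promotion while a master exists.
                if (to.equals("MASTER") && state.containsValue("MASTER")) continue;
                int p = priority(from, to);
                if (p < bestPriority) {
                    bestPriority = p;
                    bestNode = e.getKey();
                    bestTo = to;
                }
            }
            if (bestNode == null) return sent; // nothing left that is allowed
            sent.add(bestNode + ":" + state.get(bestNode) + "->" + bestTo);
            state.put(bestNode, bestTo);
        }
    }

    public static void main(String[] args) {
        Map<String, String> current = new LinkedHashMap<>();
        current.put("pf1", "MASTER");
        current.put("pf2", "OFFLINE");
        Map<String, String> ideal = new LinkedHashMap<>();
        ideal.put("pf1", "SLAVE");
        ideal.put("pf2", "MASTER");
        for (String step : simulate(current, ideal)) {
            System.out.println(step);
        }
    }
}
```

With pf1 as MASTER, pf2 OFFLINE, and an ideal state of pf1 SLAVE / pf2 MASTER, this model sends pf2 OFFLINE->BOOTSTRAP, then pf2 BOOTSTRAP->SLAVE, then pf1 MASTER->SLAVE, then pf2 SLAVE->MASTER. The partition is without a master only between the last two steps, which is the behavior being requested from the real controller in the messages above.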
