Hi, I finally have a working example to demonstrate Helix and YARN integration.
This is what the user provides:
1. An application spec that defines a number of services. For each service, the spec provides the configuration, the number of containers, and the deployable package.
2. A service implementation for each service that handles service init/start/stop.

What Helix-YARN provides:
1. Automatically requests containers from YARN and launches the services.
2. Monitors the state of the services and makes them discoverable.
3. Detects container failures and re-launches the service by requesting new containers from YARN.
4. Allows one to increase/decrease the number of containers dynamically.

The HelloWorld recipe code is here:
https://git-wip-us.apache.org/repos/asf?p=helix.git;a=tree;f=recipes/helloworld-provisioning-yarn/src/main/java/org/apache/helix/provisioning/yarn/example;hb=helix-provisioning

Here are the setup instructions to try the HelloWorld service example. It would be great to get feedback/suggestions/questions.

//Install YARN, local single node cluster
Set up YARN using the instructions here:
http://codesfusion.blogspot.com/2013/10/setup-hadoop-2x-220-on-ubuntu.html?m=1
Also take a look at this single node setup script. I haven't tried it, but it looks like it should do the trick:
https://github.com/ericduq/hadoop-scripts

export YARN_HOME=<path to where YARN/HADOOP is installed>

//Add the HADOOP/YARN xml files to the classpath. Don't miss this step.
export CLASSPATH_PREFIX=$YARN_HOME/etc/hadoop

git clone https://git-wip-us.apache.org/repos/asf/helix.git helix
cd helix
git checkout helix-provisioning
export HELIX_CORE_SCRIPTS_HOME=`pwd`/helix-core/target/helix-core-pkg/bin
export HELIX_PROVISIONING_SCRIPTS_HOME=`pwd`/helix-provisioning/target/helix-provisioning-pkg/bin
export HELLOWORLD_APP_HOME=`pwd`/recipes/helloworld-provisioning-yarn
export HELLOWORLD_APP_SCRIPTS_HOME=`pwd`/recipes/helloworld-provisioning-yarn/target/helloworld-provisioning-yarn/bin
mvn clean package -DskipTests
chmod +x $HELIX_PROVISIONING_SCRIPTS_HOME/*.sh
chmod +x $HELIX_CORE_SCRIPTS_HOME/*.sh
chmod +x $HELLOWORLD_APP_SCRIPTS_HOME/*.sh

//Launch the application
$HELLOWORLD_APP_SCRIPTS_HOME/app-launcher.sh --app_spec_provider org.apache.helix.provisioning.yarn.example.HelloWordAppSpecFactory --app_config_spec $HELLOWORLD_APP_HOME/src/main/resources/hello_world_app_spec.yaml

hello_world_app_spec.yaml
================================
!!org.apache.helix.provisioning.yarn.example.HelloworldAppSpec
appConfig:
  config: { k1: v1 }
appMasterPackageUri: 'file:///Users/kgopalak/Documents/projects/incubator-helix/recipes/helloworld-provisioning-yarn/target/helloworld-provisioning-yarn-0.7.1-incubating-SNAPSHOT-pkg.tar'
appName: testApp
serviceConfigMap:
  HelloWorld: { num_containers: 3, memory: 1024 }
serviceMainClassMap: { HelloWorld: org.apache.helix.provisioning.yarn.example.HelloWorldService }
servicePackageURIMap: { HelloWorld: 'file:///Users/kgopalak/Documents/projects/incubator-helix/recipes/helloworld-provisioning-yarn/target/helloworld-provisioning-yarn-0.7.1-incubating-SNAPSHOT-pkg.tar' }
services: [ HelloWorld ]
taskConfigs: null
======================================

14/02/25 16:44:19 INFO yarn.AppLauncher: Submitted application with applicationId:application_1393375114439_0003
14/02/25 16:44:19 INFO yarn.AppLauncher: Got application report from ASM for, appId=3, clientToAMToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=0,
appStartTime=1393375458993, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=kgopalak-mn2:8088/proxy/application_1393375114439_0003/, appUser=kgopalak
14/02/25 16:44:49 INFO yarn.AppLauncher: Got application report from ASM for, appId=3, clientToAMToken=null, appDiagnostics=, appMasterHost=kgopalak-mn2/172.21.157.207, appQueue=default, appMasterRpcPort=-1, appStartTime=1393375458993, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=kgopalak-mn2:8088/proxy/application_1393375114439_0003/A, appUser=kgopalak

SERVICE HelloWorld
CONTAINER_NAME          CONTAINER_STATE  SERVICE_STATE  CONTAINER_ID
HelloWorld_container_2  CONNECTED        ONLINE         container_1393375114439_0003_01_000004
HelloWorld_container_1  CONNECTED        ONLINE         container_1393375114439_0003_01_000003
HelloWorld_container_0  CONNECTED        ONLINE         container_1393375114439_0003_01_000002

//FAILURE
Stop the container HelloWorld_container_0. This stops container_1393375114439_0003_01_000002; the Application Master detects that and restarts the service in a new container, container_1393375114439_0003_01_000005. Any configuration/metadata set by the old container container_1393375114439_0003_01_000002 is made available to container_1393375114439_0003_01_000005.

$HELIX_PROVISIONING_SCRIPTS_HOME/container-admin.sh --zookeeperAddress localhost:2181 --stopContainer testApp HelloWorld_container_0

SERVICE HelloWorld
CONTAINER_NAME          CONTAINER_STATE  SERVICE_STATE  CONTAINER_ID
HelloWorld_container_2  CONNECTED        ONLINE         container_1393375114439_0003_01_000004
HelloWorld_container_1  CONNECTED        ONLINE         container_1393375114439_0003_01_000003
HelloWorld_container_0  CONNECTED        ONLINE         container_1393375114439_0003_01_000005

//SCALE DOWN
Decrease the number of containers for the HelloWorld service from 3 to 2. It always stops the container that is ranked the lowest when the containers are sorted by name (not by container id).
$HELIX_PROVISIONING_SCRIPTS_HOME/update-provisioner-config.sh --zookeeperAddress localhost:2181 --updateContainerCount testApp HelloWorld 2

SERVICE HelloWorld
CONTAINER_NAME          CONTAINER_STATE  SERVICE_STATE  CONTAINER_ID
HelloWorld_container_1  CONNECTED        ONLINE         container_1393375114439_0003_01_000003
HelloWorld_container_0  CONNECTED        ONLINE         container_1393375114439_0003_01_000005

//SCALE UP
Increase the number of containers from 2 to 4 by running the same update-provisioner-config.sh command with a count of 4.

SERVICE HelloWorld
CONTAINER_NAME          CONTAINER_STATE  SERVICE_STATE  CONTAINER_ID
HelloWorld_container_2  CONNECTED        ONLINE         container_1393375114439_0003_01_000009
HelloWorld_container_1  CONNECTED        ONLINE         container_1393375114439_0003_01_000003
HelloWorld_container_3  CONNECTED        ONLINE         container_1393375114439_0003_01_000010
HelloWorld_container_0  CONNECTED        ONLINE         container_1393375114439_0003_01_000005

At any point, additional information about the cluster can be obtained via the Helix admin APIs:
http://helix.apache.org/0.6.2-incubating-docs/tutorial_admin.html

Thanks,
Kishore G
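
P.S. To make the scale-down rule concrete, here is a minimal Java sketch of how the containers to stop could be picked. It is only an illustration, not the actual Helix provisioner code: the class ScaleDownSketch and the method containersToStop are hypothetical names. It assumes that names are compared as plain strings and that "ranked the lowest" means last in the sorted order, which is consistent with the scale-down output above, where HelloWorld_container_2 was the one stopped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is not a class from the Helix code base.
public class ScaleDownSketch {

    /**
     * Returns the container names that would be stopped to reach targetCount.
     * Assumption: names are sorted as plain strings (lexicographically) and the
     * names at the end of the sorted order are the ones stopped.
     */
    public static List<String> containersToStop(List<String> names, int targetCount) {
        if (targetCount < 0) {
            throw new IllegalArgumentException("targetCount must be >= 0");
        }
        // Sort by container name, not by YARN container id.
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        if (targetCount >= sorted.size()) {
            // Nothing to stop; this would be a scale up or a no-op.
            return new ArrayList<>();
        }
        // Keep the first targetCount names, stop everything after them.
        return new ArrayList<>(sorted.subList(targetCount, sorted.size()));
    }

    public static void main(String[] args) {
        List<String> running = Arrays.asList(
            "HelloWorld_container_2",
            "HelloWorld_container_1",
            "HelloWorld_container_0");
        // Scale down from 3 to 2 containers.
        System.out.println(containersToStop(running, 2));
        // prints [HelloWorld_container_2]
    }
}
```

Note that a plain string sort would place HelloWorld_container_10 before HelloWorld_container_2, so with ten or more containers the real ordering may differ from this sketch.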
