Hi,

it is a multi-node cluster with two master nodes (rm1 and rm2); my 
yarn-site.xml is below.

At the moment, the ResourceManager HA works if:

1) at rm1, run ./sbin/start-yarn.sh

yarn rmadmin -getServiceState rm1
active

yarn rmadmin -getServiceState rm2
14/08/12 07:47:59 INFO ipc.Client: Retrying connect to server: 
rm1/192.168.1.1:23142. Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From rm2/192.168.1.2 to rm2:23142 failed on connection 
exception: java.net.ConnectException: Connection refused; For more details see: 
 http://wiki.apache.org/hadoop/ConnectionRefused


2) at rm2, run ./sbin/start-yarn.sh

yarn rmadmin -getServiceState rm1
standby


Some questions:
Q1) I need to start YARN on EACH master separately — is this normal? Is 
there a way to run ./sbin/start-yarn.sh on rm1 alone and have the STANDBY 
ResourceManager on rm2 started as well?
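For now I am considering a small wrapper like the sketch below (hypothetical; it assumes passwordless ssh from rm1 to rm2 and the same HADOOP_HOME on both masters — start-yarn.sh only brings up the ResourceManager on the local node, so the standby on rm2 would need yarn-daemon.sh run separately):

```shell
# Dry-run sketch: prints the commands it would run (pipe to sh to execute).
# HADOOP_HOME and the standby hostname are placeholders for my setup.
start_ha_yarn() {
  # $1 = standby master host
  hh=${HADOOP_HOME:-/opt/hadoop}
  echo "$hh/sbin/start-yarn.sh"
  echo "ssh $1 $hh/sbin/yarn-daemon.sh start resourcemanager"
}

start_ha_yarn rm2
```

If I understand the docs correctly, starting each ResourceManager separately (start-yarn.sh on one master, `yarn-daemon.sh start resourcemanager` on the other) is the expected procedure in this release, but please correct me if there is a better way.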

Q2) How can I get alerts (e.g. by email) when the ACTIVE ResourceManager 
goes down in an auto-failover setup? How do you monitor the ACTIVE/STANDBY 
status of the ResourceManagers?
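In case it helps the discussion, this is the kind of cron check I had in mind (a hypothetical sketch only; it assumes `yarn` and `mail` are on the PATH and that rm1/rm2 are the rm-ids from my yarn-site.xml):

```shell
# Poll both RMs with `yarn rmadmin -getServiceState` and mail an alert
# when neither reports "active".
get_state() {
  # prints "active" or "standby"; empty if the RM is unreachable
  yarn rmadmin -getServiceState "$1" 2>/dev/null
}

alert_if_no_active() {
  # $1/$2 = reported states of rm1/rm2
  if [ "$1" != "active" ] && [ "$2" != "active" ]; then
    echo "ALERT: no active ResourceManager (rm1=$1 rm2=$2)"
  else
    echo "OK"
  fi
}

# Intended cron usage (commented out so the sketch stands alone;
# ops@example.com is a placeholder):
# msg=$(alert_if_no_active "$(get_state rm1)" "$(get_state rm2)")
# case "$msg" in ALERT*) echo "$msg" | mail -s "RM HA alert" ops@example.com ;; esac
```

Or is there a more standard way, e.g. polling the RM web UI/JMX? I would be glad to hear what others use.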


Regards
Arthur


<?xml version="1.0"?>
<configuration>

<!-- Site specific YARN configuration properties -->

   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>

   <property>
      <name>yarn.resourcemanager.address</name>
      <value>192.168.1.1:8032</value>
   </property>

   <property>
       <name>yarn.resourcemanager.resource-tracker.address</name>
       <value>192.168.1.1:8031</value>
   </property>

   <property>
       <name>yarn.resourcemanager.admin.address</name>
       <value>192.168.1.1:8033</value>
   </property>

   <property>
       <name>yarn.resourcemanager.scheduler.address</name>
       <value>192.168.1.1:8030</value>
   </property>

   <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/edh/hadoop_data/mapred/nodemanager</value>
      <final>true</final>
   </property>

   <property>
       <name>yarn.web-proxy.address</name>
       <value>192.168.1.1:8888</value>
   </property>

   <property>
      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>




   <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>18432</value>
   </property>

   <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>9216</value>
   </property>

   <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>18432</value>
   </property>



  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster_rm</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>192.168.1.1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>192.168.1.2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
      <name>yarn.resourcemanager.zk-address</name>
      <value>rm1:2181,m135:2181,m137:2181</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
  </property>

  <!-- RM1 configs -->
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>192.168.1.1:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>192.168.1.1:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.rm1</name>
    <value>192.168.1.1:23189</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>192.168.1.1:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>192.168.1.1:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>192.168.1.1:23142</value>
  </property>


  <!-- RM2 configs -->
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>192.168.1.2:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>192.168.1.2:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.rm2</name>
    <value>192.168.1.2:23189</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>192.168.1.2:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>192.168.1.2:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>192.168.1.2:23142</value>
  </property>

  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/edh/hadoop_logs/hadoop/</value>
  </property>

</configuration>



On 12 Aug, 2014, at 1:49 am, Xuan Gong <xg...@hortonworks.com> wrote:

> Hey, Arthur:
> 
>     Did you use single node cluster or multiple nodes cluster? Could you 
> share your configuration file (yarn-site.xml) ? This looks like a 
> configuration issue. 
> 
> Thanks
> 
> Xuan Gong
> 
> 
> On Mon, Aug 11, 2014 at 9:45 AM, arthur.hk.c...@gmail.com 
> <arthur.hk.c...@gmail.com> wrote:
> Hi,
> 
> If I have TWO nodes for ResourceManager HA, what should be the correct steps 
> and commands to start and stop ResourceManager in a ResourceManager HA 
> cluster ?
> Unlike ./sbin/start-dfs.sh (which can start all NNs from one NN), it seems 
> that ./sbin/start-yarn.sh can only start YARN on one node at a time.
> 
> Regards
> Arthur
> 
> 
