Hi,

Thank you very much!

At the moment, if I run ./sbin/start-yarn.sh on rm1, the STANDBY ResourceManager on rm2 is not started as well. Please advise what could be wrong?

Thanks
Regards
Arthur

On 12 Aug, 2014, at 1:13 pm, Xuan Gong <[email protected]> wrote:

> Some questions:
> Q1) Do I need to start YARN on EACH master separately? Is this normal? Is there
> a way that I can just run ./sbin/start-yarn.sh on rm1 and get the STANDBY
> ResourceManager on rm2 started as well?
>
> No, you need to start the RMs separately.
>
> Q2) How do I get alerts (e.g. by email) if the ACTIVE ResourceManager is down
> in an auto-failover environment? Or how do you monitor the status of the
> ACTIVE/STANDBY ResourceManagers?
>
> Interesting question. But one of the design goals of auto-failover is that RM
> downtime is invisible to end users. End users can submit applications normally
> even if a failover happens.
>
> We can monitor the status of the RMs from the command line (as you did
> previously) or from the web UI / web service
> (rm_address:portnumber/cluster/cluster). We can get the current status from
> there.
>
> Thanks
>
> Xuan Gong
>
>
> On Mon, Aug 11, 2014 at 5:12 PM, [email protected]
> <[email protected]> wrote:
> Hi,
>
> It is a multiple-node cluster with two master nodes (rm1 and rm2); my
> yarn-site.xml is below.
>
> At the moment, ResourceManager HA only works if I do both of the following:
>
> 1) On rm1, run ./sbin/start-yarn.sh
>
> yarn rmadmin -getServiceState rm1
> active
>
> yarn rmadmin -getServiceState rm2
> 14/08/12 07:47:59 INFO ipc.Client: Retrying connect to server:
> rm1/192.168.1.1:23142. Already tried 0 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
> Operation failed: Call From rm2/192.168.1.2 to rm2:23142 failed on connection
> exception: java.net.ConnectException: Connection refused; For more details
> see: http://wiki.apache.org/hadoop/ConnectionRefused
>
>
> 2) On rm2, run ./sbin/start-yarn.sh
>
> yarn rmadmin -getServiceState rm1
> standby
>
>
> Some questions:
> Q1) Do I need to start YARN on EACH master separately? Is this normal? Is there
> a way that I can just run ./sbin/start-yarn.sh on rm1 and get the STANDBY
> ResourceManager on rm2 started as well?
>
> Q2) How do I get alerts (e.g. by email) if the ACTIVE ResourceManager is down
> in an auto-failover environment? Or how do you monitor the status of the
> ACTIVE/STANDBY ResourceManagers?
>
> Regards
> Arthur
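For reference, the per-node start sequence that Q1 asks about would look roughly like the sketch below. This is a minimal sketch assuming the stock Hadoop 2.x sbin scripts (yarn-daemon.sh is the per-daemon companion of start-yarn.sh in the same sbin directory); exact paths may differ in your installation.

# On rm1: start-yarn.sh starts the ResourceManager on the local node and the
# NodeManagers listed in the slaves file, but not the ResourceManager on rm2.
./sbin/start-yarn.sh

# On rm2: start only the local ResourceManager daemon.
./sbin/yarn-daemon.sh start resourcemanager

# From either node: check which ResourceManager is ACTIVE and which is STANDBY.
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

The matching stop commands would be ./sbin/stop-yarn.sh on rm1 and ./sbin/yarn-daemon.sh stop resourcemanager on rm2.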
>
> <?xml version="1.0"?>
> <configuration>
>
> <!-- Site specific YARN configuration properties -->
>
> <property>
>   <name>yarn.nodemanager.aux-services</name>
>   <value>mapreduce_shuffle</value>
> </property>
>
> <property>
>   <name>yarn.resourcemanager.address</name>
>   <value>192.168.1.1:8032</value>
> </property>
>
> <property>
>   <name>yarn.resourcemanager.resource-tracker.address</name>
>   <value>192.168.1.1:8031</value>
> </property>
>
> <property>
>   <name>yarn.resourcemanager.admin.address</name>
>   <value>192.168.1.1:8033</value>
> </property>
>
> <property>
>   <name>yarn.resourcemanager.scheduler.address</name>
>   <value>192.168.1.1:8030</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.local-dirs</name>
>   <value>/edh/hadoop_data/mapred/nodemanager</value>
>   <final>true</final>
> </property>
>
> <property>
>   <name>yarn.web-proxy.address</name>
>   <value>192.168.1.1:8888</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
>   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.resource.memory-mb</name>
>   <value>18432</value>
> </property>
>
> <property>
>   <name>yarn.scheduler.minimum-allocation-mb</name>
>   <value>9216</value>
> </property>
>
> <property>
>   <name>yarn.scheduler.maximum-allocation-mb</name>
>   <value>18432</value>
> </property>
>
> <property>
>   <name>yarn.resourcemanager.connect.retry-interval.ms</name>
>   <value>2000</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.ha.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
>   <value>true</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.cluster-id</name>
>   <value>cluster_rm</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.ha.rm-ids</name>
>   <value>rm1,rm2</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.hostname.rm1</name>
>   <value>192.168.1.1</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.hostname.rm2</name>
>   <value>192.168.1.2</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.scheduler.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.recovery.enabled</name>
>   <value>true</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.store.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.zk-address</name>
>   <value>rm1:2181,m135:2181,m137:2181</value>
> </property>
> <property>
>   <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
>   <value>5000</value>
> </property>
>
> <!-- RM1 configs -->
> <property>
>   <name>yarn.resourcemanager.address.rm1</name>
>   <value>192.168.1.1:23140</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.scheduler.address.rm1</name>
>   <value>192.168.1.1:23130</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.webapp.https.address.rm1</name>
>   <value>192.168.1.1:23189</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.webapp.address.rm1</name>
>   <value>192.168.1.1:23188</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
>   <value>192.168.1.1:23125</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.admin.address.rm1</name>
>   <value>192.168.1.1:23142</value>
> </property>
>
> <!-- RM2 configs -->
> <property>
>   <name>yarn.resourcemanager.address.rm2</name>
>   <value>192.168.1.2:23140</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.scheduler.address.rm2</name>
>   <value>192.168.1.2:23130</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.webapp.https.address.rm2</name>
>   <value>192.168.1.2:23189</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.webapp.address.rm2</name>
>   <value>192.168.1.2:23188</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
>   <value>192.168.1.2:23125</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.admin.address.rm2</name>
>   <value>192.168.1.2:23142</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.remote-app-log-dir</name>
>   <value>/edh/hadoop_logs/hadoop/</value>
> </property>
>
> </configuration>
>
>
> On 12 Aug, 2014, at 1:49 am, Xuan Gong <[email protected]> wrote:
>
>> Hey, Arthur:
>>
>> Did you use a single-node cluster or a multi-node cluster? Could you
>> share your configuration file (yarn-site.xml)? This looks like a
>> configuration issue.
>>
>> Thanks
>>
>> Xuan Gong
>>
>>
>> On Mon, Aug 11, 2014 at 9:45 AM, [email protected]
>> <[email protected]> wrote:
>> Hi,
>>
>> If I have TWO nodes for ResourceManager HA, what are the correct steps
>> and commands to start and stop the ResourceManagers in a ResourceManager HA
>> cluster?
>> Unlike ./sbin/start-dfs.sh (which can start all NNs from one NN), it seems
>> that ./sbin/start-yarn.sh can only start YARN on one node at a time.
>>
>> Regards
>> Arthur
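On Q2 (getting an alert when the ACTIVE ResourceManager goes down): the thread's answer is to poll the state from the command line or from the web UI / web service (the /cluster/cluster page of whichever webapp address is configured, e.g. 192.168.1.1:23188 above). A minimal cron-able sketch along those lines, built only on the yarn rmadmin -getServiceState command shown earlier; the mail command and the recipient address are placeholders for whatever alerting mechanism your environment actually uses:

#!/bin/sh
# Hypothetical watchdog: send a mail if neither rm1 nor rm2 reports "active".
# Assumes the yarn CLI is on PATH; "mail" and the recipient are placeholders.
active=""
for id in rm1 rm2; do
  if yarn rmadmin -getServiceState "$id" 2>/dev/null | grep -qx active; then
    active="$id"
  fi
done

if [ -z "$active" ]; then
  echo "No ACTIVE ResourceManager found (checked rm1, rm2) at $(date)" \
    | mail -s "YARN ResourceManager HA alert" [email protected]
fi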
