Re: Yarn AM is abending job when submitting a remote job to cluster

Alexander Alten-Lorenz Thu, 19 Feb 2015 06:25:56 -0800

Daemeon,

Yes, deleting the older staging directories should help, but you may also 
have to restart the history server.


BR,
 Alex
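
A minimal sketch of how the leftover staging directory could be located from 
the AM log before deleting it. It assumes the log contains the "Previous 
history file is at" line quoted further down in this thread. The helper names 
(staging_dirs, cleanup_commands) are hypothetical and not part of Hadoop. The 
script only prints `hdfs dfs -rm -r` commands so the paths can be reviewed 
first; it does not run them.

```python
import posixpath
import re
from urllib.parse import urlparse

# Matches the MRAppMaster log line that reports a leftover history file.
HISTORY_RE = re.compile(r"Previous history file is at\s+(hdfs://\S+\.jhist)")


def staging_dirs(log_text):
    """Return the distinct HDFS staging directories named in the AM log."""
    dirs = []
    for match in HISTORY_RE.finditer(log_text):
        # Drop the scheme and host:port, keep only the HDFS path.
        path = urlparse(match.group(1)).path
        parent = posixpath.dirname(path)
        if parent not in dirs:
            dirs.append(parent)
    return dirs


def cleanup_commands(log_text):
    """Build (but do not run) one removal command per staging directory."""
    return ["hdfs dfs -rm -r " + d for d in staging_dirs(log_text)]


# Sample line taken from the job log in this thread, joined onto one line.
sample = (
    "2015-02-15 07:51:08,642 INFO [main] "
    "org.apache.hadoop.mapreduce.v2.app.MRAppMaster: "
    "Previous history file is at "
    "hdfs://mycluster.com:8020/user/cloudera/.staging/"
    "job_1424003606313_0001/job_1424003606313_0001_1.jhist"
)

for cmd in cleanup_commands(sample):
    print(cmd)
```

Check that no job using a listed directory is still running before executing 
any of the printed commands on the cluster.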


> On 19 Feb 2015, at 15:12, roland.depratti <[email protected]> wrote:
> 
> Alex,
> 
> That sounds like a very likely situation.
> 
> I read in the first JIRA that tokens are now used in non-secure setups, which 
> explains my earlier SSL question.
> 
> Is the solution simply to delete those staging files from the cluster?
> 
> - rd 
> 
> 
> 
> -------- Original message --------
> From: Alexander Alten-Lorenz <[email protected]> 
> Date:02/19/2015 7:43 AM (GMT-05:00) 
> To: [email protected] 
> Subject: Re: Yarn AM is abending job when submitting a remote job to cluster 
> 
> Hi,
> 
> https://issues.apache.org/jira/browse/YARN-1116 
> <https://issues.apache.org/jira/browse/YARN-1058>
> 
> It looks like the history server had an unclean shutdown, or a previous 
> job did not finish or was not cleaned up after it finished 
> (2015-02-15 07:51:07,241 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, 
> Service: , Ident: 
> (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@33be1aa0) …. 
> Previous history file is at 
> hdfs://mycluster.com:8020/user/cloudera/.staging/job_1424003606313_0001/job_1424003606313_0001_1.jhist).
> 
> BR,
> Alex
> 
> 
> > On 19 Feb 2015, at 13:27, Roland DePratti <[email protected]> wrote:
> > 
> > Daemeon,
> >  
> > Thanks for the reply. I have about 6 months' exposure to Hadoop and am new 
> > to SSL, so I did some digging after reading your message.
> >  
> > In the HDFS config, hadoop.ssl.enabled is using the default, which is 
> > ‘false’ (which I understand sets it for all Hadoop daemons).
> >  
> > I assumed this meant that it is not in use and not a factor in job 
> > submission (ssl certs not needed).
> >  
> > Am I misunderstanding, and are you saying that it needs to be set to ‘true’, 
> > with valid certs and stores set up, for me to submit a remote job (this is a 
> > POC setup with no exposure outside my environment)?
> >  
> > -  rd
> >  
> > From: daemeon reiydelle [mailto:[email protected]] 
> > Sent: Wednesday, February 18, 2015 10:22 PM
> > To: [email protected]
> > Subject: Re: Yarn AM is abending job when submitting a remote job to cluster
> >  
> > I would guess you do not have your ssl certs set up, client or server, 
> > based on the error. 
> > 
> > 
> > Daemeon C.M. Reiydelle
> > USA (+1) 415.501.0198
> > London (+44) (0) 20 8144 9872
> >  
> > On Wed, Feb 18, 2015 at 5:19 PM, Roland DePratti <[email protected]> wrote:
> > I have been searching for a handle on a problem with very few clues. 
> > Any help pointing me in the right direction will be huge.
> > I have not received any input from the Cloudera Google Groups. Perhaps this 
> > is more YARN-based, and I am hoping to have more luck here.
> > Any help is greatly appreciated.
> >  
> > I am running a Hadoop cluster using CDH5.3. I also have a client machine 
> > with a standalone one node setup (VM).
> >  
> > All environments are running CentOS 6.6.
> >  
> > I have submitted some Java mapreduce jobs locally on both the cluster and 
> > the standalone environment, and they completed successfully.
> >  
> > I can submit a remote HDFS job from client to cluster using -conf 
> > hadoop-cluster.xml (see below) and get data back from the cluster with no 
> > problem.
> > 
> > When I submit the mapreduce jobs remotely, I get an AM error:
> >  
> > AM fails the job with the error: 
> > 
> >            SecretManager$InvalidToken: appattempt_1424003606313_0001_000002 
> > not found in AMRMTokenSecretManager
> > 
> > I searched /var/log/secure on the client and cluster and found no unusual 
> > messages.
> > 
> > Here are the contents of hadoop-cluster.xml:
> > 
> > <?xml version="1.0" encoding="UTF-8"?>
> > 
> > <!--generated by Roland-->
> > <configuration>
> >   <property>
> >     <name>fs.defaultFS</name>
> >     <value>hdfs://mycluser:8020</value>
> >   </property>
> >   <property>
> >     <name>mapreduce.jobtracker.address</name>
> >     <value>hdfs://mycluster:8032</value>
> >   </property>
> >   <property>
> >     <name>yarn.resourcemanager.address</name>
> >     <value>hdfs://mycluster:8032</value>
> >   </property>
> > </configuration>
> > 
> > Here is the output from the job log on the cluster:  
> > 
> > 2015-02-15 07:51:06,544 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> > application appattempt_1424003606313_0001_000002
> > 2015-02-15 07:51:06,949 WARN [main] org.apache.hadoop.conf.Configuration: 
> > job.xml:an attempt to override final parameter: 
> > hadoop.ssl.require.client.cert;  Ignoring.
> > 2015-02-15 07:51:06,952 WARN [main] org.apache.hadoop.conf.Configuration: 
> > job.xml:an attempt to override final parameter: 
> > mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> > 2015-02-15 07:51:06,952 WARN [main] org.apache.hadoop.conf.Configuration: 
> > job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;  
> > Ignoring.
> > 2015-02-15 07:51:06,954 WARN [main] org.apache.hadoop.conf.Configuration: 
> > job.xml:an attempt to override final parameter: 
> > hadoop.ssl.keystores.factory.class;  Ignoring.
> > 2015-02-15 07:51:06,957 WARN [main] org.apache.hadoop.conf.Configuration: 
> > job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  
> > Ignoring.
> > 2015-02-15 07:51:06,973 WARN [main] org.apache.hadoop.conf.Configuration: 
> > job.xml:an attempt to override final parameter: 
> > mapreduce.job.end-notification.max.attempts;  Ignoring.
> > 2015-02-15 07:51:07,241 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
> > 2015-02-15 07:51:07,241 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, 
> > Service: , Ident: 
> > (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@33be1aa0)
> > 2015-02-15 07:51:07,332 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred 
> > newApiCommitter.
> > 2015-02-15 07:51:07,627 WARN [main] org.apache.hadoop.conf.Configuration: 
> > job.xml:an attempt to override final parameter: 
> > hadoop.ssl.require.client.cert;  Ignoring.
> > 2015-02-15 07:51:07,632 WARN [main] org.apache.hadoop.conf.Configuration: 
> > job.xml:an attempt to override final parameter: 
> > mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> > 2015-02-15 07:51:07,632 WARN [main] org.apache.hadoop.conf.Configuration: 
> > job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;  
> > Ignoring.
> > 2015-02-15 07:51:07,639 WARN [main] org.apache.hadoop.conf.Configuration: 
> > job.xml:an attempt to override final parameter: 
> > hadoop.ssl.keystores.factory.class;  Ignoring.
> > 2015-02-15 07:51:07,645 WARN [main] org.apache.hadoop.conf.Configuration: 
> > job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  
> > Ignoring.
> > 2015-02-15 07:51:07,663 WARN [main] org.apache.hadoop.conf.Configuration: 
> > job.xml:an attempt to override final parameter: 
> > mapreduce.job.end-notification.max.attempts;  Ignoring.
> > 2015-02-15 07:51:08,237 WARN [main] 
> > org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop 
> > library for your platform... using builtin-java classes where applicable
> > 2015-02-15 07:51:08,429 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in 
> > config null
> > 2015-02-15 07:51:08,499 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is 
> > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
> > 2015-02-15 07:51:08,526 INFO [main] 
> > org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> > org.apache.hadoop.mapreduce.jobhistory.EventType for class 
> > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
> > 2015-02-15 07:51:08,527 INFO [main] 
> > org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> > org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
> > 2015-02-15 07:51:08,561 INFO [main] 
> > org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> > org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
> > 2015-02-15 07:51:08,562 INFO [main] 
> > org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> > org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
> > 2015-02-15 07:51:08,566 INFO [main] 
> > org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class 
> > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
> > 2015-02-15 07:51:08,568 INFO [main] 
> > org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> > org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
> > 2015-02-15 07:51:08,568 INFO [main] 
> > org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> > org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for 
> > class 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
> > 2015-02-15 07:51:08,570 INFO [main] 
> > org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for 
> > class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
> > 2015-02-15 07:51:08,599 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Recovery is enabled. Will 
> > try to recover from previous life on best effort basis.
> > 2015-02-15 07:51:08,642 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at 
> > hdfs://mycluster.com:8020/user/cloudera/.staging/job_1424003606313_0001/job_1424003606313_0001_1.jhist
> > 2015-02-15 07:51:09,147 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Read completed tasks from 
> > history 0
> > 2015-02-15 07:51:09,193 INFO [main] 
> > org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> > org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
> > 2015-02-15 07:51:09,222 INFO [main] 
> > org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from 
> > hadoop-metrics2.properties
> > 2015-02-15 07:51:09,277 INFO [main] 
> > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot 
> > period at 10 second(s).
> > 2015-02-15 07:51:09,277 INFO [main] 
> > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics 
> > system started
> > 2015-02-15 07:51:09,286 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for 
> > job_1424003606313_0001 to jobTokenSecretManager
> > 2015-02-15 07:51:09,306 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing 
> > job_1424003606313_0001 because: not enabled; too much RAM;
> > 2015-02-15 07:51:09,324 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job 
> > job_1424003606313_0001 = 5343207. Number of splits = 5
> > 2015-02-15 07:51:09,325 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for 
> > job job_1424003606313_0001 = 1
> > 2015-02-15 07:51:09,325 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> > job_1424003606313_0001Job Transitioned from NEW to INITED
> > 2015-02-15 07:51:09,327 INFO [main] 
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching 
> > normal, non-uberized, multi-container job job_1424003606313_0001.
> > 2015-02-15 07:51:09,387 INFO [main]
