Thanks for the additional info. I'm still not sure what could be going on. Do you notice any other suspicious messages in the resourcemanager log? Can you share the output of <resourcemanagerwebaddress>/ws/v1/cluster/scheduler? On the resourcemanager web UI, how much memory does it show as used?
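In case it helps to collect that output, here is a minimal sketch in Python that queries the ResourceManager REST API. The host name rm.example.com and port 8088 are assumptions (8088 is the usual default for the ResourceManager web address), and the helper names are made up for illustration. The /ws/v1/cluster/metrics path and its clusterMetrics fields are taken from the ResourceManager REST API; fields missing from a response simply come back as None.

```python
import json
import urllib.request


def rm_url(base, path):
    """Join the ResourceManager web address and a REST path."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def fetch_json(url, timeout=10):
    """GET a ResourceManager REST resource and decode it as JSON."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def memory_summary(metrics_payload):
    """Pick the memory and app counters out of a /ws/v1/cluster/metrics reply."""
    metrics = metrics_payload.get("clusterMetrics", {})
    keys = ("allocatedMB", "availableMB", "totalMB", "appsPending", "appsRunning")
    return {key: metrics.get(key) for key in keys}


# Usage (hypothetical address, replace with your own):
#   base = "http://rm.example.com:8088"
#   print(json.dumps(fetch_json(rm_url(base, "/ws/v1/cluster/scheduler")), indent=2))
#   print(memory_summary(fetch_json(rm_url(base, "/ws/v1/cluster/metrics"))))
```

The scheduler dump is printed as-is rather than picked apart, since its layout depends on which scheduler is configured.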
On Wed, Nov 27, 2013 at 1:28 AM, 麦树荣 <[email protected]> wrote:
> Hi,
>
> Sorry, here is some additional information.
>
> Hadoop 2.2.0 had been running normally for some days since I started the
> Hadoop server, and I could run jobs without any problems.
>
> Today the jobs suddenly stopped running: every job stays in the
> “submitted” state after submission.
>
> There are 3 slaves, and each slave has 32 GB of memory and 24 CPUs.
>
> The contents of my fair-scheduler.xml are as follows:
>
> <?xml version="1.0"?>
> <allocations>
>   <queue name="root">
>     <minResources>10000mb,10vcores</minResources>
>     <maxResources>90000mb,100vcores</maxResources>
>     <maxRunningApps>50</maxRunningApps>
>     <weight>2.0</weight>
>     <schedulingMode>fair</schedulingMode>
>     <aclSubmitApps> </aclSubmitApps>
>     <aclAdministerApps> </aclAdministerApps>
>     <queue name="queue1">
>       <minResources>10000mb,10vcores</minResources>
>       <maxResources>30000mb,30vcores</maxResources>
>       <maxRunningApps>10</maxRunningApps>
>       <weight>2.0</weight>
>       <schedulingMode>fair</schedulingMode>
>       <aclAdministerApps>xxx1,xxx2 admins</aclAdministerApps>
>       <aclSubmitApps>xxx1,xxx2,xxx3 datadev</aclSubmitApps>
>     </queue>
>     <queue name="queue2">
>       <minResources>10000mb,10vcores</minResources>
>       <maxResources>30000mb,30vcores</maxResources>
>       <maxRunningApps>10</maxRunningApps>
>       <weight>2.0</weight>
>       <schedulingMode>fair</schedulingMode>
>       <aclAdministerApps>datadev admins</aclAdministerApps>
>       <aclSubmitApps>xxx1 datadev</aclSubmitApps>
>     </queue>
>     <queue name="queue3">
>       <minResources>5000mb,5vcores</minResources>
>       <maxResources>10000mb,10vcores</maxResources>
>       <maxRunningApps>10</maxRunningApps>
>       <weight>2.0</weight>
>       <schedulingMode>fair</schedulingMode>
>       <aclAdministerApps>datadev admins</aclAdministerApps>
>       <aclSubmitApps>xxx1,xxx2 datadev</aclSubmitApps>
>     </queue>
>     <queue name="default">
>       <minResources>10000mb,10vcores</minResources>
>       <maxResources>30000mb,30vcores</maxResources>
>       <maxRunningApps>10</maxRunningApps>
>       <weight>2.0</weight>
>       <schedulingMode>fair</schedulingMode>
>       <aclAdministerApps>xxx1 admins</aclAdministerApps>
>       <aclSubmitApps>xxx1,xxx2,xxx3,root datadev</aclSubmitApps>
>     </queue>
>   </queue>
>   <user name="xxx">
>     <maxRunningApps>10</maxRunningApps>
>   </user>
>   <userMaxAppsDefault>10</userMaxAppsDefault>
> </allocations>
>
> *From:* Sandy Ryza [mailto:[email protected]]
> *Sent:* November 27, 2013 16:33
> *To:* [email protected]
> *Subject:* Re: problems of FairScheduler in hadoop2.2.0
>
> Hi,
>
> Can you share the contents of your fair-scheduler.xml? If you submit just
> a single job, does it run? What do you see if you go to
> <resourcemanagerwebui>/ws/v1/cluster/scheduler?
>
> -Sandy
>
> On Wed, Nov 27, 2013 at 12:09 AM, 麦树荣 <[email protected]> wrote:
>
> > Hi, all
> >
> > When I run jobs in Hadoop 2.2.0, I encounter a problem. Suddenly the
> > Hadoop resourcemanager stops working normally: when I submit jobs,
> > their status all stays “submitted” and they never run.
> >
> > I cannot find any answers on the Internet. Can anyone give me some help?
> > Thanks.
> >
> > The resourcemanager log is as follows:
> >
> > 2013-11-27 14:39:10,749 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1129_000001
> > 2013-11-27 14:39:11,050 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
> > 2013-11-27 14:39:11,050 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
> > 2013-11-27 14:39:11,051 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
> > 2013-11-27 14:39:11,051 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
> > 2013-11-27 14:39:11,753 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1129_000001
> > 2013-11-27 14:39:11,754 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1129_000001
> > 2013-11-27 14:39:12,055 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
> > 2013-11-27 14:39:12,055 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
> > 2013-11-27 14:39:12,056 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
> > 2013-11-27 14:39:12,056 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
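To see how many distinct application attempts are behind the repeated errors, a rough sketch like the one below could tally them. It assumes each log entry sits on a single line with the mail wrapping removed, and that the attempt ID follows the words "unknown attempt" with or without a space, as in the excerpt. The function name and the sample lines (shortened to "..." where the logger name would be) are only for illustration.

```python
import re
from collections import Counter

# The attempt ID is glued directly to "unknown attempt" in the excerpt,
# so whitespace between the two is treated as optional.
ATTEMPT_RE = re.compile(r"unknown attempt\s*(appattempt_\d+_\d+_\d+)")


def count_unknown_attempts(lines):
    """Count 'unknown attempt' errors per application attempt ID."""
    counts = Counter()
    for line in lines:
        match = ATTEMPT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Shortened sample lines; in practice, pass an open log file instead.
SAMPLE = [
    "2013-11-27 14:39:10,749 ERROR ...FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1129_000001",
    "2013-11-27 14:39:11,050 ERROR ...FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001",
    "2013-11-27 14:39:11,050 ERROR ...FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001",
    "2013-11-27 14:39:11,051 ERROR ...FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001",
]

for attempt, n in sorted(count_unknown_attempts(SAMPLE).items()):
    print(attempt, n)
```

If only a handful of attempt IDs show up, it would be worth searching for those IDs elsewhere in the log to see what happened to them before the errors started.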

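As a sanity check on the allocations file, the sketch below lists the minResources and maxResources of the leaf queues so they can be compared with what the cluster actually has. It assumes the "<N>mb,<M>vcores" value format used in the quoted fair-scheduler.xml; the function names are hypothetical, and the embedded sample is a trimmed copy with only two queues, not the full file.

```python
import re
import xml.etree.ElementTree as ET

RESOURCE_RE = re.compile(r"(\d+)\s*mb\s*,\s*(\d+)\s*vcores", re.IGNORECASE)


def parse_resources(text):
    """Turn '30000mb,30vcores' into (30000, 30)."""
    match = RESOURCE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError("unrecognised resource string: %r" % text)
    return int(match.group(1)), int(match.group(2))


def leaf_queue_limits(xml_text, tag="maxResources"):
    """Map each leaf queue name to its (memory MB, vcores) for the given tag."""
    root = ET.fromstring(xml_text)
    limits = {}
    for queue in root.iter("queue"):
        if queue.find("queue") is None:  # no child queues, so this is a leaf
            node = queue.find(tag)
            if node is not None and node.text:
                limits[queue.get("name")] = parse_resources(node.text)
    return limits


# Trimmed sample in the same shape as the quoted allocations file.
SAMPLE = """<allocations>
  <queue name="root">
    <maxResources>90000mb,100vcores</maxResources>
    <queue name="queue1">
      <minResources>10000mb,10vcores</minResources>
      <maxResources>30000mb,30vcores</maxResources>
    </queue>
    <queue name="queue3">
      <minResources>5000mb,5vcores</minResources>
      <maxResources>10000mb,10vcores</maxResources>
    </queue>
  </queue>
</allocations>"""

maxima = leaf_queue_limits(SAMPLE)
print(maxima, "total max MB:", sum(mb for mb, _ in maxima.values()))
```

Calling leaf_queue_limits(text, tag="minResources") on the real file gives the guaranteed minimums in the same form.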