I didn't configure yarn.scheduler.capacity.resource-calculator, so I am on YARN-CS + DefaultResourceCalculator.

[email protected]
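For reference, switching the CapacityScheduler from the memory-only DefaultResourceCalculator to dominant-resource scheduling is a capacity-scheduler.xml change. A minimal sketch, using the stock Hadoop property name and calculator class (a ResourceManager restart is assumed):

<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <!-- the default, DefaultResourceCalculator, schedules on memory only -->
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

With the default calculator, the vcore figure in a container request is essentially ignored by the scheduler, which is consistent with the vcores > 1 hang described below.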
[email protected] From: Gopal V Date: 2015-02-04 01:25 To: [email protected]; user Subject: Re: tez map task and reduce task stay pending forerver On 1/27/15, 9:24 PM, [email protected] wrote: > I test again, I found if I set mapreduce.map.cpu.vcores >1 ,the job will hang > . Very similar > to https://issues.apache.org/jira/browse/TEZ-704 I suspect this might be a YARN scheduler bug. Are you using the FairScheduler or the CapacityScheduler? I cannot reproduce this issue using my YARN-CS cluster, but I suspect capacity scheduler is automatically set up to do the dominant resource scheduling. Are you on FS + Fair instead of FS + DRF or CS? Cheers, Gopal > From: [email protected] > Date: 2015-01-28 10:29 > To: user > Subject: Re: Re: tez map task and reduce task stay pending forerver > o yeah . I fix the problem. > I add the config to my hive-site.xml > <property> > <name>yarn.app.mapreduce.am.resource.mb</name> > <value>1024</value> > </property> > <property> > <name>yarn.app.mapreduce.am.resource.cpu-vcores</name> > <value>1</value> > </property> > <property> > <name>yarn.app.mapreduce.am.command-opts</name> > <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value> > </property> > <property> > <name>mapreduce.map.java.opts</name> > <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value> > </property> > <property> > <name>mapreduce.reduce.java.opts</name> > <value>-Djava.net.preferIPv4Stack=true -Xmx825955249</value> > </property> > <property> > <name>mapreduce.map.memory.mb</name> > <value>1024</value> > </property> > <property> > <name>mapreduce.map.cpu.vcores</name> > <value>1</value> > </property> > <property> > <name>mapreduce.reduce.memory.mb</name> > <value>1024</value> > </property> > <property> > <name>mapreduce.reduce.cpu.vcores</name> > <value>1</value> > </property> > And config my tez-site.xml just > <property> > <name>tez.lib.uris</name> > <value>${fs.defaultFS}/apps/tez-0.5.3/tez-0.5.3-minimal.tar.gz</value> > </property> > <property> > <name>tez.use.cluster.hadoop-libs</name> > <value>true</value> > </property> > > Every thing is ok. > I think some config in my cluster is too larger . > > > > [email protected] > > From: [email protected] > Date: 2015-01-28 10:24 > To: user > Subject: Re: Re: tez map task and reduce task stay pending forerver > No . set hive.execution.engine=mr , still hang... > > > > [email protected] > > 发件人: Jianfeng (Jeff) Zhang > 发送时间: 2015-01-28 10:11 > 收件人: user > 主题: Re: 回复: tez map task and reduce task stay pending forerver > Can you run this query successfully using hive on mr ? > > > > Best Regards, > Jeff Zhang > > > On Wed, Jan 28, 2015 at 10:01 AM, [email protected] <[email protected]> > wrote: > > I check the tez document from HDP page > http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_installing_manually_book/content/rpm-chap-tez_configure_tez.html. > > tez.am.resource.memory.mb default value is 1536 > My hadoop yarn.app.mapreduce.am.resource.mb value is 5734 MiB > > The configuration mismatch will cause the problem ? > > > [email protected] > > 发件人: [email protected] > 发送时间: 2015-01-27 17:59 > 收件人: user > 主题: 回复: 回复: tez map task and reduce task stay pending forerver > Sorry Gopal V, I made a mistake, My config mapreduce.map.memory.mb is 2867 . > > > > [email protected] > > 发件人: [email protected] > 发送时间: 2015-01-27 17:58 > 收件人: user > 主题: 回复: 回复: tez map task and reduce task stay pending forerver > Hello Gopal V, > I check my cdh config ,I found mapreduce.map.memory.mb is 2876. 
> [email protected] > > 发件人: [email protected] > 发送时间: 2015-01-27 17:31 > 收件人: user > 主题: 回复: Re: tez map task and reduce task stay pending forerver > > I check the hivetez.log . No kill request trigger by hive. > > > [email protected] > > 发件人: Gopal V > 发送时间: 2015-01-27 17:17 > 收件人: user > 抄送: [email protected] > 主题: Re: tez map task and reduce task stay pending forerver > On 1/27/15, 12:50 AM, [email protected] wrote: >> hive 0.14.0 tez 0.53 hadoop 2.3.0-cdh 5.0.2 >> hive> select * from p_city order by id; >> Query ID = zhoushugang_20150127163434_da70d957-6ac4-4b8b-a484-42b593838076 > ... >> -------------------------------------------------------------------------------- >> >> VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED >> -------------------------------------------------------------------------------- >> >> Map 1 INITED 1 0 0 1 0 0 >> Reducer 2 INITED 1 0 0 1 0 0 > > Looks all container requests are pending/unresponsive. > > I see a container request in the log with > > 2015-01-27 15:43:15,434 INFO [TaskSchedulerEventHandlerThread] > rm.YarnTaskSchedulerService: Allocation request for task: > attempt_1419300485749_371785_1_00_000000_0 with request: > Capability[<memory:2867, vCores:3>]Priority[2] host: > yhd-jqhadoop11.int.yihaodian.com rack: null > ... > 2015-01-27 15:43:17,635 INFO [DelayedContainerManager] > rm.YarnTaskSchedulerService: Releasing held container as either there > are pending but unmatched requests or this is not a session, > containerId=container_1419300485749_371785_01_000002, pendingTasks=1, > isSession=true. isNew=true > > That seems to indicate that a container allocation request was made, but > YARN resource manager never responded with a container (or gave the > wrong container?). > > Does the container size 2867 suggest any idea on what that might be? > > Cheers, > Gopal > > > > CONFIDENTIALITY NOTICE > NOTICE: This message is intended for the use of the individual or entity to > which it is addressed > and may contain information that is confidential, privileged and exempt from > disclosure under > applicable law. If the reader of this message is not the intended recipient, > you are hereby > notified that any printing, copying, dissemination, distribution, disclosure > or forwarding > of this communication is strictly prohibited. If you have received this > communication in error, > please contact the sender immediately and delete it from your system. Thank > You. >
