As requested - I believe I've captured all of the required steps to get this working:
https://gist.github.com/epiphani/dd37e87acfb2f8c4cbb0

-Aaron

On Thu, Oct 15, 2015 at 5:56 PM, Edward Capriolo <[email protected]> wrote:
> Dude if you can write up your steps to do this it would be awesome!
>
> On Thu, Oct 15, 2015 at 5:49 PM, Aaron Wiebe <[email protected]> wrote:
>>
>> Ok, so it was not TEZ-2563 after all.
>>
>> While I was trying to fix the cloudera dependency issue, I'd added the
>> classpath manually. I took the depending jar and threw it into
>> /apps/tez-0.7.0 along with everything else, removed that classpath
>> reference, and things are working.
>>
>> Thanks Gopal, I wouldn't have found it without the nodemanager delay
>> change.
>> -Aaron
>>
>> On Thu, Oct 15, 2015 at 5:37 PM, Aaron Wiebe <[email protected]> wrote:
>> > Ok, I may be hitting TEZ-2563.
>> >
>> > Since I've added the cloudera jar via
>> > tez.cluster.additional.classpath.prefix, the launcher has -classpath
>> > to that jar - and nothing else. If I remove that section of the
>> > commandline, the container executes. Relevant sections of
>> > launch_container.sh:
>> >
>> > ...
>> > export CLASSPATH="/u/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.4.1.jar:$CLASSPATH:$PWD:$PWD/*:$HADOOP_CLIENT_CONF_DIR:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:"
>> > ...
>> >
>> > exec /bin/bash -c "$JAVA_HOME/bin/java -server
>> > -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
>> > -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
>> > -XX:+UseParallelGC -Xmx3866m -Xms3866m -XX:NewRatio=8 -XX:+UseNUMA
>> > -XX:+UseParallelGC -classpath
>> > /u/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.4.1.jar
>> > -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
>> > -Dlog4j.configuration=tez-container-log4j.properties
>> > -Dyarn.app.container.log.dir=/u/log/hadoop-yarn/container/application_1444837422599_6372/container_e08_1444837422599_6372_01_000259
>> > -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=$PWD/tmp
>> > org.apache.tez.runtime.task.TezChild 172.16.125.48 40257
>> > container_e08_1444837422599_6372_01_000259
>> > application_1444837422599_6372 1
>> > 1>/u/log/hadoop-yarn/container/application_1444837422599_6372/container_e08_1444837422599_6372_01_000259/stdout
>> > 2>/u/log/hadoop-yarn/container/application_1444837422599_6372/container_e08_1444837422599_6372_01_000259/stderr
>> > "
>> > ...
>> >
>> > If I remove "-classpath
>> > /u/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.4.1.jar",
>> > I get a successful class load.
>> >
>> > Doing things like adding :$CLASSPATH to the aux.jars.prefix doesn't
>> > seem to work. Ideas?
>> >
>> > On Thu, Oct 15, 2015 at 4:24 PM, Gopal Vijayaraghavan
>> > <[email protected]> wrote:
>> >>
>> >>>I'm convinced this is a hive issue, but I'm sending it here because
>> >>>you folks might have a good idea on what the issue is. It appears
>> >>>that the tez package from hdfs is not being localized when children
>> >>>are spun up. The AM does work.
>> >>
>> >> I think the AM working + tasks not working needs you to get the yarn
>> >> executor and check it.
>> >>
>> >> You need to set yarn.nodemanager.delete.debug-delay-sec=600 & restart
>> >> node managers.
>> >>
>> >> Then you've got 10 minutes to ssh into the node where the task failed
>> >> to read the container launcher shell script.
>> >>
>> >> In general, it's the missing classpath entry for the tez.tar.gz (which
>> >> untars into a directory).
>> >>
>> >> The debug delay will give you some way to look into the error beyond the
>> >> single error message.
>> >>
>> >>>Yet... this works for every other execution of tez. Is there
>> >>>something I could look into here? I could in theory populate all
>> >>>nodes with the tez libraries, but I feel like that would just lead me
>> >>>down a bad path. Suggestions?
>> >>
>> >> As a temporary workaround, you can give up on rolling upgrades & untar
>> >> the tarball onto the HDFS tez lib uris.
>> >>
>> >> <property>
>> >>   <name>tez.lib.uris</name>
>> >>   <value>${fs.default.name}/apps/tez-0.7/,${fs.default.name}/apps/tez-0.7/lib</value>
>> >> </property>
>> >>
>> >> Cheers,
>> >>
>> >> Gopal
>> >>
>> >
>
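For anyone hitting the same symptom: when the java launcher is given an explicit -classpath (or -cp) option, it ignores the CLASSPATH environment variable, which is why the exported CLASSPATH in launch_container.sh had no effect above. Below is a rough sketch of a check you could run against the java command line pulled from a preserved launch_container.sh. The function name explicit_classpath and the shortened sample command are my own illustration, not part of Tez or YARN, and the parsing assumes a simple command line (it does not handle -jar or the newer --class-path spelling).

```python
import shlex


def explicit_classpath(java_command):
    """Return the value given to -classpath/-cp in a java command line, or None.

    Hypothetical helper for illustration only. Scanning stops at the first
    token that is not a JVM option, which is taken to be the main class.
    """
    tokens = shlex.split(java_command)
    i = 1  # tokens[0] is the java binary itself
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-classpath", "-cp"):
            # The next token is the classpath value, if present.
            return tokens[i + 1] if i + 1 < len(tokens) else None
        if not tok.startswith("-"):
            break  # reached the main class; the rest are program arguments
        i += 1
    return None


# Shortened version of the command quoted in the thread above.
sample = (
    "$JAVA_HOME/bin/java -server -Xmx3866m -classpath "
    "/u/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.4.1.jar "
    "-Dtez.root.logger=INFO,CLA org.apache.tez.runtime.task.TezChild 172.16.125.48 40257"
)

cp = explicit_classpath(sample)
if cp is not None:
    entries = [e for e in cp.split(":") if e]
    print("explicit -classpath overrides $CLASSPATH; entries:", entries)
else:
    print("no explicit -classpath; the JVM will use $CLASSPATH")
```

If this reports a single jar, as it does for the command in this thread, the container JVM sees only that jar and none of the localized Tez libraries.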
