Ok, I may be hitting TEZ-2563.

Since I added the Cloudera jar via
tez.cluster.additional.classpath.prefix, the launch command passes
-classpath with that jar, and nothing else. If I remove that part of
the command line, the container executes. Relevant sections of
launch_container.sh:

...
export CLASSPATH="/u/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.4.1.jar:$CLASSPATH:$PWD:$PWD/*:$HADOOP_CLIENT_CONF_DIR:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:"
...

exec /bin/bash -c "$JAVA_HOME/bin/java -server
-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
-XX:+UseParallelGC -Xmx3866m -Xms3866m -XX:NewRatio=8 -XX:+UseNUMA
-XX:+UseParallelGC -classpath
/u/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.4.1.jar
-Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
-Dlog4j.configuration=tez-container-log4j.properties
-Dyarn.app.container.log.dir=/u/log/hadoop-yarn/container/application_1444837422599_6372/container_e08_1444837422599_6372_01_000259
-Dtez.root.logger=INFO,CLA  -Djava.io.tmpdir=$PWD/tmp
org.apache.tez.runtime.task.TezChild 172.16.125.48 40257
container_e08_1444837422599_6372_01_000259
application_1444837422599_6372 1
1>/u/log/hadoop-yarn/container/application_1444837422599_6372/container_e08_1444837422599_6372_01_000259/stdout
2>/u/log/hadoop-yarn/container/application_1444837422599_6372/container_e08_1444837422599_6372_01_000259/stderr
"
...

If I remove "-classpath
/u/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.4.1.jar",
I get a successful class load.

Things like adding :$CLASSPATH to the aux.jars.prefix don't seem to
work either. Any ideas?
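For what it's worth, the JVM ignores the CLASSPATH environment variable whenever -classpath (or -cp) is given on the java command line, so the exported CLASSPATH above never takes effect once the prefix jar is passed that way. Below is a rough bash sketch for spotting this in other containers' launch scripts. check_classpath_override is a hypothetical helper (nothing in Tez or YARN provides it), and it is grep-based only: it joins wrapped lines and looks for the first standalone -classpath/-cp token, without parsing shell quoting.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Tez/YARN: report whether a
# launch_container.sh passes an explicit -classpath/-cp to java.
# When it does, the JVM ignores the exported CLASSPATH variable.
check_classpath_override() {
  local script="$1" cmd cp
  # Join all lines so a flag and its value split across lines are still seen.
  cmd=$(tr '\n' ' ' < "$script")
  # Grab the first token following a standalone -classpath or -cp flag.
  cp=$(grep -oE '(^|[[:space:]])-(classpath|cp)[[:space:]]+[^[:space:]]+' <<< "$cmd" \
        | head -n 1 | awk '{print $2}' || true)
  if [[ -n "$cp" ]]; then
    echo "OVERRIDE: java -classpath $cp (exported CLASSPATH is ignored)"
    return 1
  fi
  echo "OK: no explicit -classpath, java uses the exported CLASSPATH"
  return 0
}

# Example: check_classpath_override ./launch_container.sh
```

Run against the launch_container.sh excerpt above, it should flag the CDH hadoop-mapreduce-client-common jar.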

On Thu, Oct 15, 2015 at 4:24 PM, Gopal Vijayaraghavan <[email protected]> wrote:
>
>
>>I'm convinced this is a Hive issue, but I'm sending it here because
>>you folks might have a good idea of what the problem is. It appears
>>that the Tez package from HDFS is not being localized when children
>>are spun up. The AM does work.
>
> Since the AM works but the tasks don't, I think you need to get the YARN
> executor's launch script and check it.
>
> You need to set yarn.nodemanager.delete.debug-delay-sec=600 and restart the
> NodeManagers.
>
> Then you have 10 minutes to ssh into the node where the task failed and
> read the container launcher shell script.
>
> Usually the cause is a missing classpath entry for tez.tar.gz (which
> untars into a directory).
>
> The debug delay gives you a way to look into the error beyond the
> single error message.
>
>>Yet... this works for every other execution of Tez. Is there
>>something I could look into here? I could in theory populate all
>>nodes with the Tez libraries, but I feel like that would just lead me
>>down a bad path. Suggestions?
>
> As a temporary workaround, you can give up on rolling upgrades and untar the
> tarball into the HDFS directories listed in tez.lib.uris.
>
>
> <property>
> <name>tez.lib.uris</name>
> <value>${fs.default.name}/apps/tez-0.7/,${fs.default.name}/apps/tez-0.7/lib</value>
> </property>
>
> Cheers,
>
> Gopal
>
>
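Following Gopal's debug-delay suggestion, the launch script for a failed container normally sits under one of the NodeManager local directories (yarn.nodemanager.local-dirs), in a usercache/<user>/appcache/<application>/<container>/ subdirectory. Here is a minimal bash sketch for finding it within the 10-minute window, assuming that layout. find_launch_script is a hypothetical helper, and the local-dir paths in the commented example are placeholders for whatever your yarn-site.xml actually sets.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the launch_container.sh path(s) for one container.
# Assumes the standard NodeManager layout:
#   <local-dir>/usercache/<user>/appcache/<application_id>/<container_id>/
# Pass the container ID first, then every yarn.nodemanager.local-dirs entry.
find_launch_script() {
  local container_id="$1" dir
  shift
  for dir in "$@"; do
    # -path matches the whole path, so user and application IDs can be wildcards.
    # Unreadable or missing directories are skipped silently.
    find "$dir" -path "*/usercache/*/appcache/*/${container_id}/launch_container.sh" \
      2>/dev/null || true
  done
}

# Example (placeholder local dirs):
# find_launch_script container_e08_1444837422599_6372_01_000259 /data1/yarn/nm /data2/yarn/nm
```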

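For the temporary workaround, the upload could look roughly like the bash sketch below. It assumes the hdfs CLI is on the PATH, that you can write to /apps/tez-0.7 on the default filesystem, and that the tarball unpacks into the Tez jars at the top level plus a lib/ subdirectory, which is what the two tez.lib.uris entries expect. upload_tez_libs is a hypothetical name, not a Tez tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the workaround: unpack the Tez tarball locally and
# copy its contents to the HDFS directory referenced by tez.lib.uris.
# Assumes the `hdfs` CLI is on PATH and the target directory is writable.
upload_tez_libs() {
  local tarball="$1" target="${2:-/apps/tez-0.7}" staging rc=0
  staging=$(mktemp -d)
  # Expected result: Tez jars at the top level plus a lib/ directory, matching
  # the .../tez-0.7/ and .../tez-0.7/lib entries in tez.lib.uris (assumption).
  if ! tar -xzf "$tarball" -C "$staging"; then
    rm -rf "$staging"
    return 1
  fi
  # -p creates parent directories; -f overwrites files from a previous upload.
  hdfs dfs -mkdir -p "$target" || rc=$?
  if [[ $rc -eq 0 ]]; then
    hdfs dfs -put -f "$staging"/* "$target"/ || rc=$?
  fi
  rm -rf "$staging"
  return "$rc"
}

# Example (tarball path is a placeholder):
# upload_tez_libs ./tez.tar.gz /apps/tez-0.7
```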