As requested - I believe I've captured all of the required steps to
get this working:

https://gist.github.com/epiphani/dd37e87acfb2f8c4cbb0

-Aaron

On Thu, Oct 15, 2015 at 5:56 PM, Edward Capriolo <[email protected]> wrote:
> Dude if you can write up your steps to do this it would be awesome!
>
> On Thu, Oct 15, 2015 at 5:49 PM, Aaron Wiebe <[email protected]> wrote:
>>
>> Ok, so it was not TEZ-2563 after all.
>>
>> While I was trying to fix the cloudera dependency issue, I'd added the
>> classpath manually.  I took the jar it depends on and threw it into
>> /apps/tez-0.7.0 along with everything else, removed that classpath
>> reference, and things are working.
>>
>> Thanks Gopal, I wouldn't have found it without the nodemanager delay
>> change.
>> -Aaron
>>
>> On Thu, Oct 15, 2015 at 5:37 PM, Aaron Wiebe <[email protected]> wrote:
>> > Ok, I may be hitting TEZ-2563.
>> >
>> > Since I've added the cloudera jar via
>> > tez.cluster.additional.classpath.prefix, the launcher passes -classpath
>> > with that jar - and nothing else.  If I remove that section of the
>> > command line, the container executes.  Relevant sections of
>> > launch_container.sh:
>> >
>> > ...
>> > export CLASSPATH="/u/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.4.1.jar:$CLASSPATH:$PWD:$PWD/*:$HADOOP_CLIENT_CONF_DIR:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:"
>> > ...
>> >
>> > exec /bin/bash -c "$JAVA_HOME/bin/java -server
>> > -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
>> > -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
>> > -XX:+UseParallelGC -Xmx3866m -Xms3866m -XX:NewRatio=8 -XX:+UseNUMA
>> > -XX:+UseParallelGC -classpath
>> > /u/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.4.1.jar
>> > -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
>> > -Dlog4j.configuration=tez-container-log4j.properties
>> > -Dyarn.app.container.log.dir=/u/log/hadoop-yarn/container/application_1444837422599_6372/container_e08_1444837422599_6372_01_000259
>> > -Dtez.root.logger=INFO,CLA  -Djava.io.tmpdir=$PWD/tmp
>> > org.apache.tez.runtime.task.TezChild 172.16.125.48 40257
>> > container_e08_1444837422599_6372_01_000259
>> > application_1444837422599_6372 1
>> > 1>/u/log/hadoop-yarn/container/application_1444837422599_6372/container_e08_1444837422599_6372_01_000259/stdout
>> > 2>/u/log/hadoop-yarn/container/application_1444837422599_6372/container_e08_1444837422599_6372_01_000259/stderr
>> > "
>> > ...
>> >
>> > If I remove "-classpath
>> > /u/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.4.1.jar",
>> > I get a successful class load.
>> >
>> > Doing things like adding :$CLASSPATH to the aux.jars.prefix doesn't
>> > seem to work.  Ideas?
>> >
>> > On Thu, Oct 15, 2015 at 4:24 PM, Gopal Vijayaraghavan
>> > <[email protected]> wrote:
>> >>
>> >>
>> >>>I'm convinced this is a hive issue, but I'm sending it here because
>> >>>you folks might have a good idea on what the issue is.  It appears
>> >>>that the tez package from hdfs is not being localized when children
>> >>>are spun up.  The AM does work.
>> >>
>> >> I think the AM working + tasks not working means you need to get the
>> >> yarn executor and check it.
>> >>
>> >> You need to set yarn.nodemanager.delete.debug-delay-sec=600 & restart
>> >> node managers.
>> >>
>> >> Then you've got 10 minutes to ssh into the node where the task failed
>> >> to read the container launcher shell script.
>> >>
>> >> In general, it's the missing classpath entry for the tez.tar.gz (which
>> >> untars into a directory).
>> >>
>> >> The debug delay will give you some way to look into the error beyond
>> >> the single error message.
>> >>
>> >>>Yet... this works for every other execution of tez.  Is there
>> >>>something I could look into here?  I could in theory populate all
>> >>>nodes with the tez libraries, but I feel like that would just lead me
>> >>>down a bad path.  Suggestions?
>> >>
>> >> As a temporary workaround, you can give up on rolling upgrades & untar
>> >> the tarball onto the HDFS tez lib uris.
>> >>
>> >>
>> >> <property>
>> >>   <name>tez.lib.uris</name>
>> >>   <value>${fs.default.name}/apps/tez-0.7/,${fs.default.name}/apps/tez-0.7/lib</value>
>> >> </property>
>> >>
>> >> Cheers,
>> >>
>> >> Gopal
>> >>
>> >>
>
>
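
For anyone finding this thread later, here is a rough sketch of the check
that turned up the problem above: look in a saved launch_container.sh for
an explicit -classpath (or -cp) option on the java command line. When that
option is present, the JVM uses it instead of the exported CLASSPATH
variable, which would explain why removing it let the classes load. The
function names and the trimmed sample line are illustrative assumptions,
not part of Tez or YARN.

```python
import re

# Matches "-classpath <value>" or "-cp <value>" as a standalone option.
# \s+ also tolerates a line break between the flag and its value.
_CP_FLAG = re.compile(r"(?<!\S)-(?:classpath|cp)\s+(\S+)")

# Trimmed-down exec line based on the one quoted in this thread.
SAMPLE_EXEC_LINE = (
    'exec /bin/bash -c "$JAVA_HOME/bin/java -server -Xmx3866m -classpath '
    "/u/cloudera/parcels/CDH/jars/"
    "hadoop-mapreduce-client-common-2.6.0-cdh5.4.1.jar "
    '-Dtez.root.logger=INFO,CLA org.apache.tez.runtime.task.TezChild"'
)


def find_explicit_classpath(script_text):
    """Return the value given to -classpath/-cp, or None if absent."""
    match = _CP_FLAG.search(script_text)
    return match.group(1) if match else None


def classpath_entries(value):
    """Split a classpath string on ':' and drop empty entries."""
    return [entry for entry in value.split(":") if entry]


if __name__ == "__main__":
    explicit = find_explicit_classpath(SAMPLE_EXEC_LINE)
    if explicit is None:
        print("no -classpath option; the exported CLASSPATH is used")
    else:
        print("-classpath is set; the exported CLASSPATH is ignored")
        for entry in classpath_entries(explicit):
            print("  " + entry)
```

To run it against a real container, read the script kept on the node during
the debug delay window into a string and pass that to
find_explicit_classpath() in place of SAMPLE_EXEC_LINE.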
