>I'm convinced this is a hive issue, but I'm sending it here because
>you folks might have a good idea on what the issue is. It appears
>that the tez package from hdfs is not being localized when children
>are spun up. The AM does work.
Since the AM works but the tasks don't, I think you need to get hold of
what YARN actually executes to launch the task containers and check it.
You need to set yarn.nodemanager.delete.debug-delay-sec=600 and restart the
NodeManagers.
Then you've got 10 minutes to ssh into the node where the task failed and
read the container launch shell script (launch_container.sh).
In general, the culprit is a missing classpath entry for the tez.tar.gz
(which untars into a directory).
The debug delay gives you a way to look into the error beyond the
single error message.
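For reference, the debug delay setting would look roughly like this in
yarn-site.xml on each NodeManager (a sketch; 600 is just the 10-minute
window mentioned above, so adjust it and remove it again once you're done
debugging):
<property>
<!-- keep finished containers' local dirs and launch scripts for 600 seconds -->
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>600</value>
</property>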
>Yet... this works for every other execution of tez. Is there
>something I could look into here? I could in theory populate all
>nodes with the tez libraries, but I feel like that would just lead me
>down a bad path. Suggestions?
As a temporary workaround, you can give up on rolling upgrades, untar the
tarball into HDFS, and point tez.lib.uris at the untarred directories.
<property>
<name>tez.lib.uris</name>
<value>${fs.default.name}/apps/tez-0.7/,${fs.default.name}/apps/tez-0.7/lib</value>
</property>
Cheers,
Gopal