Tez jobs on YARN failing sporadically..

Gautam Tue, 28 Jun 2016 17:59:34 -0700

Hello,

We use Tez for one of our main ETL workflows and have been using it for a
couple of months now. We recently started seeing the following error for a
query that runs regularly and hasn't been changed in any way. It's a job
that counts an hour's worth of data in an M-R-R flow. The error happens in
the Map phase. I could send more details about the job, but I don't think
this is specific to this query.


I believe this error shows up in java.util.concurrent.ThreadPoolExecutor
when the executor is overwhelmed with tasks or when execute() is called
while it is shutting down. I'm confounded as to why this would suddenly be
an issue. I also believe this isn't Tez's fault in particular; it could be
YARN hitting some limits, which means this is probably happening to MR jobs
as well.
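
For reference, here is a minimal sketch of the second case (a task handed
to an executor that has already shut down). It only reproduces the shape of
the message with plain JDK classes and is not taken from YARN or Tez; the
class name RejectionDemo and the helper submitAfterShutdown() are made up
for illustration, and the pool size of 2 is arbitrary.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

class RejectionDemo {

    // Hypothetical helper: returns the message of the rejection raised when a
    // task is submitted to a pool that has already terminated.
    static String submitAfterShutdown() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        ExecutorCompletionService<Integer> ecs = new ExecutorCompletionService<>(pool);
        try {
            // One task completes normally while the pool is still alive.
            ecs.submit(() -> 1);
            ecs.take().get();

            // Shut the pool down and wait until it has fully terminated.
            pool.shutdown();
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate");
            }

            // The default handler (AbortPolicy) rejects this submission.
            ecs.submit(() -> 2);
            return "not rejected";
        } catch (RejectedExecutionException e) {
            return e.getMessage();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAfterShutdown());
    }
}
```

The returned message names an ExecutorCompletionService$QueueingFuture task
and a ThreadPoolExecutor in the Terminated state with pool size 0, which is
the same pattern as in the diagnostics below. That would point to a submit
after shutdown rather than a saturated queue, although the sketch says
nothing about which component owns the executor in our case.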

Have others faced this issue? If not, what should I be looking at to get
more data on it?

*The Error:*

Task failed, taskId=task_1466828114374_53316_1_00_000029, diagnostics=
 TaskAttempt 0 failed, info=
 Container container_e23_1466828114374_53316_01_000009 finished with
diagnostics set to
 Container failed, exitCode=-1000. Task
java.util.concurrent.ExecutorCompletionService$QueueingFuture@732af2f3
rejected from java.util.concurrent.ThreadPoolExecutor@9bf8295
 Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed
tasks = 111
...
...
TaskAttempt 3 failed, info=
 Container container_e23_1466828114374_53316_01_000018 finished with
diagnostics set to
 Container failed, exitCode=-1000. Task
java.util.concurrent.ExecutorCompletionService$QueueingFuture@6c5f576
rejected from java.util.concurrent.ThreadPoolExecutor@9bf8295
 Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed
tasks = 111



Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1
killedTasks:115
Vertex vertex_1466828114374_53316_1_00
 Map 1
killed/failed due to:OWN_TASK_FAILURE



Thanks,
-Gautam.
