Hello,

We use Tez for one of our main ETL workflows and have been running it for a couple of months now. We recently started seeing the following error for a query that runs regularly and hasn't been changed in any way. It's a job that counts an hour's worth of data in an M-R-R flow. The error happens in the Map phase. I could send more details about the job, but I don't think this is specific to this query.

I believe this error comes from java.util.concurrent.ThreadPoolExecutor, which rejects a task when the executor is overwhelmed with tasks or when execute() is called while it is shutting down. I'm confounded as to why this would suddenly be an issue. I also don't think this is Tez's fault in particular; it could be YARN hitting some limit, which would mean this is probably happening to MR jobs as well.

Have others faced this issue? If not, what should I be looking at to get more data about it?

*The Error:*

Task failed, taskId=task_1466828114374_53316_1_00_000029, diagnostics=
TaskAttempt 0 failed, info= Container container_e23_1466828114374_53316_01_000009 finished with diagnostics set to Container failed, exitCode=-1000. Task java.util.concurrent.ExecutorCompletionService$QueueingFuture@732af2f3 rejected from java.util.concurrent.ThreadPoolExecutor@9bf8295 Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 111
...
...
TaskAttempt 3 failed, info= Container container_e23_1466828114374_53316_01_000018 finished with diagnostics set to Container failed, exitCode=-1000. Task java.util.concurrent.ExecutorCompletionService$QueueingFuture@6c5f576 rejected from java.util.concurrent.ThreadPoolExecutor@9bf8295 Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 111
Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:115
Vertex vertex_1466828114374_53316_1_00 Map 1 killed/failed due to:OWN_TASK_FAILURE

Thanks,
-Gautam.
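
For reference, the rejection message in the error above can be reproduced outside Tez and YARN. The sketch below is only an illustration and not code from either project; the class name `RejectedAfterShutdown` and the method `reproduce` are hypothetical. It submits a task through an ExecutorCompletionService to a ThreadPoolExecutor that has already terminated. The resulting message has the same shape as the one in the log, with the pool state reported as Terminated, which seems to point at the "called after shutdown" case rather than an overloaded queue.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical, standalone reproduction; not taken from Tez or YARN.
class RejectedAfterShutdown {

    // Runs one task, terminates the pool, then submits again.
    // Returns the RejectedExecutionException message, or null if the
    // second submit was unexpectedly accepted.
    static String reproduce() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        ExecutorCompletionService<Integer> ecs = new ExecutorCompletionService<>(pool);
        try {
            ecs.submit(() -> 1);
            ecs.take().get();            // wait for the first task to complete

            pool.shutdown();             // no new tasks are accepted from here on
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not terminate in time");
            }

            ecs.submit(() -> 2);         // rejected by the default AbortPolicy
            return null;
        } catch (RejectedExecutionException e) {
            return e.getMessage();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce());
    }
}
```

If the state printed in a real failure were Running with a full queue instead of Terminated, that would suggest saturation rather than a late submit, so the state field in the message is worth checking first.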