Currently, a failed task causes its JVM to exit; there is no workaround for that. Before we can change it, we would need to be able to verify that task execution is isolated, such that a task failure does not end up "corrupting" the host.

Bikas
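One application-level mitigation, consistent with the constraint above, is to keep recoverable failures from surfacing as task failures at all: catch them inside the processor so the task reports success and the container remains eligible for reuse (the deallocateTask logic quoted in the message below only releases a container when taskSucceeded is false or reuse is disabled). A minimal sketch follows, assuming the Tez 0.4-era LogicalIOProcessor interface; MyTolerantProcessor is an illustrative name, FooBarBazException is the placeholder from the message below, and whether swallowing the exception is acceptable depends entirely on the application's semantics.

    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    import org.apache.tez.runtime.api.Event;
    import org.apache.tez.runtime.api.LogicalIOProcessor;
    import org.apache.tez.runtime.api.LogicalInput;
    import org.apache.tez.runtime.api.LogicalOutput;
    import org.apache.tez.runtime.api.TezProcessorContext;

    public class MyTolerantProcessor implements LogicalIOProcessor {

        public void initialize(TezProcessorContext context) throws Exception {
            // Non-application setup, as in the original example.
        }

        public void run(Map<String, LogicalInput> inputs,
                Map<String, LogicalOutput> outputs) throws Exception {
            try {
                if (new Random().nextBoolean()) {
                    throw new FooBarBazException();
                }
            } catch (FooBarBazException e) {
                // Record the failure through an application channel
                // (counter, side output, log) instead of letting it
                // propagate. The task then completes successfully and
                // the container stays eligible for reuse.
            }
        }

        public void handleEvents(List<Event> processorEvents) {
            // No event handling needed for this sketch.
        }

        public void close() throws Exception {
            // No resources to release in this sketch.
        }

        // Application-defined placeholder from the original message.
        static class FooBarBazException extends Exception {}
    }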
From: Thaddeus Diamond [mailto:[email protected]]
Sent: Wednesday, July 30, 2014 3:15 PM
To: [email protected]
Subject: Reusing Containers Of Failed Tasks

Hi,

I turned on container reuse and upped the time that containers linger after task/vertex completion (tez.am.container.session.delay-allocation-millis), but I'm still having an issue. Sometimes the Processor I created will fail due to application logic in one DAG but not the next. A trivial example:

    class MyProcessor implements LogicalIOProcessor {
        // Other non-application logic code
        public void run(...) {
            if (new Random().nextBoolean()) {
                throw new FooBarBazException();
            }
        }
    }

In this case I don't want the task JVM to be deallocated, because it was application logic that caused the failure, and the next time I start a DAG I will pay the long JVM task startup delay. I see the following code in the source (TaskScheduler#deallocateTask(...)) that I think is the cause:

    if (!taskSucceeded || !shouldReuseContainers) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Releasing container, containerId=" + container.getId()
                + ", taskSucceeded=" + taskSucceeded
                + ", reuseContainersFlag=" + shouldReuseContainers);
        }
        releaseContainer(container.getId());
    }

Is this something that can be fixed in master? Or is there a workaround/conf I can set to get this working?

Thanks,
Thad
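For reference, a minimal sketch of the configuration described above, built on a standard Hadoop Configuration object. The delay-allocation key is quoted from the message; the reuse key (tez.am.container.reuse.enabled) and the 30-second value are illustrative and should be checked against the Tez version in use.

    import org.apache.hadoop.conf.Configuration;

    public class ContainerReuseConf {
        public static Configuration create() {
            Configuration conf = new Configuration();
            // Enable container reuse across tasks within a session
            // (property name assumed; verify for your Tez version).
            conf.setBoolean("tez.am.container.reuse.enabled", true);
            // Quoted in the message above: how long an idle container
            // lingers after its work completes, in milliseconds.
            conf.setLong("tez.am.container.session.delay-allocation-millis",
                30000L);
            return conf;
        }
    }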
