Never mind, I was not on master. I'll investigate that. Thanks!

On Thu, Jul 31, 2014 at 12:14 AM, Thaddeus Diamond <[email protected]> wrote:

> I don't see that setting in TezConfiguration.java. Do you happen to know
> it offhand?
>
> On Thu, Jul 31, 2014 at 12:10 AM, Bikas Saha <[email protected]> wrote:
>
>> There is no workaround without code change in Tez.
>>
>> The simplest code change would be to make this behavior configurable and
>> have the current behavior as default.
>>
>> Btw, you can also try the session min held containers configuration that
>> was recently added. This ensures that your session will retain some minimum
>> resources. You can use the session min/max timeouts to decay excess
>> containers.
>>
>> Bikas
>>
>> *From:* Thaddeus Diamond [mailto:[email protected]]
>> *Sent:* Wednesday, July 30, 2014 8:51 PM
>> *To:* [email protected]
>> *Subject:* Re: Reusing Containers Of Failed Tasks
>>
>> I see. Is there a manual workaround you suggest for this?
>>
>> The motivation is this: I have an application with low latency and max
>> concurrency SLAs. The way we are trying to solve this with Tez is to keep
>> an application-level pool of Tez sessions and configure each to have
>> long-lived containers. When users submit DAGs the application grabs an
>> idle Tez session from the pool and submits to that one. After the DAG
>> completes (successful or not) it is returned to the pool in an idle state.
>>
>> If a session gets returned to the pool but no containers are spun up in
>> it because the DAG failed, I will fail to meet my SLAs on the next DAG
>> submission.
>>
>> On Wed, Jul 30, 2014 at 8:05 PM, Bikas Saha <[email protected]> wrote:
>>
>> Currently, failed tasks make the JVM exit. There is no work around for
>> that. Before we can change that we would need to be able to check the task
>> execution is isolated such that a task failure does not end up “corrupting”
>> the host.
>>
>> Bikas
>>
>> *From:* Thaddeus Diamond [mailto:[email protected]]
>> *Sent:* Wednesday, July 30, 2014 3:15 PM
>> *To:* [email protected]
>> *Subject:* Reusing Containers Of Failed Tasks
>>
>> Hi,
>>
>> I turned on container reuse and upped the time that containers linger
>> after task vertex completion
>> (tez.am.container.session.delay-allocation-millis), but I'm still having an
>> issue. Sometimes, the Processor I created will fail due to application
>> logic in one DAG but not the next. The trivial example is:
>>
>> class MyProcessor implements LogicalIOProcessor {
>>   // Other non-application logic code
>>   public void run(...) {
>>     if (new Random().nextBoolean()) {
>>       throw new FooBarBazException();
>>     }
>>   }
>> }
>>
>> In this case I don't want the task JVM to be deallocated because it was
>> application logic that caused the failure and next time I start a DAG I
>> will have the long JVM task startup delay.
>>
>> I see the following code in the source
>> (TaskScheduler#deallocateTask(...)) that I think is the cause of this:
>>
>> if (!taskSucceeded || !shouldReuseContainers) {
>>   if (LOG.isDebugEnabled()) {
>>     LOG.debug("Releasing container, containerId=" + container.getId()
>>         + ", taskSucceeded=" + taskSucceeded
>>         + ", reuseContainersFlag=" + shouldReuseContainers);
>>   }
>>   releaseContainer(container.getId());
>> }
>>
>> Is this something that can be fixed in master? Or is there a
>> workaround/conf I can set to get this working?
>>
>> Thanks,
>> Thad
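
For reference, the "make this behavior configurable and have the current behavior as default" suggestion in the thread could look roughly like the sketch below. This is an illustration only, not Tez code: the class `ContainerReleasePolicy` and the flag `reuseAfterTaskFailure` are hypothetical names, and no such setting is claimed to exist in TezConfiguration. It also covers only the scheduler-side release decision quoted from TaskScheduler#deallocateTask(...); as noted in the thread, failed tasks currently make the JVM exit, so this alone would not keep a warm JVM.

```java
/**
 * Hypothetical sketch, not part of Tez: decides whether a container should be
 * released after a task finishes, with the on-failure behavior configurable.
 */
final class ContainerReleasePolicy {
    // Mirrors the shouldReuseContainers flag from the quoted snippet.
    private final boolean shouldReuseContainers;
    // Hypothetical flag; false reproduces the behavior quoted in the thread.
    private final boolean reuseAfterTaskFailure;

    ContainerReleasePolicy(boolean shouldReuseContainers, boolean reuseAfterTaskFailure) {
        this.shouldReuseContainers = shouldReuseContainers;
        this.reuseAfterTaskFailure = reuseAfterTaskFailure;
    }

    /** Returns true if the container should be released once the task is done. */
    boolean shouldRelease(boolean taskSucceeded) {
        if (!shouldReuseContainers) {
            // Reuse is off entirely: always release, as in the quoted code.
            return true;
        }
        // Reuse is on: release after a failure unless the flag says to keep it.
        return !taskSucceeded && !reuseAfterTaskFailure;
    }
}
```

With `reuseAfterTaskFailure` set to false, `shouldRelease(taskSucceeded)` is equivalent to the quoted condition `!taskSucceeded || !shouldReuseContainers`, so the default would be unchanged.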
