Never mind, I was not on master. I'll investigate that.

Thanks!


On Thu, Jul 31, 2014 at 12:14 AM, Thaddeus Diamond <
[email protected]> wrote:

> I don't see that setting in TezConfiguration.java.  Do you happen to know
> it offhand?
>
>
> On Thu, Jul 31, 2014 at 12:10 AM, Bikas Saha <[email protected]>
> wrote:
>
>> There is no workaround without code change in Tez.
>>
>>
>>
>> The simplest code change would be to make this behavior configurable and
>> have the current behavior as default.
>>
>>
>>
>> Btw, you can also try the session min held containers configuration that
>> was recently added. This ensures that your session will retain some minimum
>> resources. You can use the session min/max timeouts to decay excess
>> containers.
>>
>>
>>
>> Bikas
>>
>>
>>
>> *From:* Thaddeus Diamond [mailto:[email protected]]
>> *Sent:* Wednesday, July 30, 2014 8:51 PM
>> *To:* [email protected]
>> *Subject:* Re: Reusing Containers Of Failed Tasks
>>
>>
>>
>> I see.  Is there a manual workaround you suggest for this?
>>
>>
>>
>> The motivation is this: I have an application with low latency and max
>> concurrency SLAs.  The way we are trying to solve this with Tez is to keep
>> an application-level pool of Tez sessions and configure each to have
>> long-lived containers.  When users submit DAGs the application grabs an
>> idle Tez session from the pool and submits to that one. After the DAG
>> completes (successful or not) it is returned to the pool in an idle state.
>>
>>
>>
>> If a session gets returned to the pool but no containers are spun up in
>> it because the DAG failed, I will fail to meet my SLAs on the next DAG
>> submission.
>>
>>
>>
>> On Wed, Jul 30, 2014 at 8:05 PM, Bikas Saha <[email protected]>
>> wrote:
>>
>> Currently, failed tasks make the JVM exit. There is no workaround for
>> that. Before we can change that, we would need to be able to check that
>> task execution is isolated, so that a task failure does not end up
>> “corrupting” the host.
>>
>>
>>
>> Bikas
>>
>>
>>
>> *From:* Thaddeus Diamond [mailto:[email protected]]
>> *Sent:* Wednesday, July 30, 2014 3:15 PM
>> *To:* [email protected]
>> *Subject:* Reusing Containers Of Failed Tasks
>>
>>
>>
>> Hi,
>>
>>
>>
>> I turned on container reuse and upped the time that containers linger
>> after task vertex completion
>> (tez.am.container.session.delay-allocation-millis), but I'm still having an
>> issue.  Sometimes, the Processor I created will fail due to application
>> logic in one DAG but not the next. The trivial example is:
>>
>>
>>
>> class MyProcessor implements LogicalIOProcessor {
>>   // Other non-application logic code
>>
>>   public void run(...) {
>>     if (new Random().nextBoolean()) {
>>       throw new FooBarBazException();
>>     }
>>   }
>> }
>>
>>
>>
>> In this case I don't want the task JVM to be deallocated because it was
>> application logic that caused the failure and next time I start a DAG I
>> will have the long JVM task startup delay.
>>
>>
>>
>> I see the following code in the source
>> (TaskScheduler#deallocateTask(...)) that I think is the cause of this:
>>
>>
>>
>>     if (!taskSucceeded || !shouldReuseContainers) {
>>       if (LOG.isDebugEnabled()) {
>>         LOG.debug("Releasing container, containerId=" + container.getId()
>>             + ", taskSucceeded=" + taskSucceeded
>>             + ", reuseContainersFlag=" + shouldReuseContainers);
>>       }
>>       releaseContainer(container.getId());
>>     }
>>
>>
>>
>> Is this something that can be fixed in master? Or is there a
>> workaround/conf I can set to get this working?
>>
>>
>>
>> Thanks,
>>
>> Thad
>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>>
>
>
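
Bikas's suggestion in the thread above (make the release-on-failure behavior configurable, with the current behavior as the default) could look roughly like the sketch below. This is not Tez code: `ContainerReleasePolicy`, `reuseOnTaskFailure`, and `currentBehavior()` are hypothetical names made up for illustration. Only the boolean condition comes from the quoted `TaskScheduler#deallocateTask(...)` snippet.

```java
// Hypothetical sketch; none of these names exist in Tez.
final class ContainerReleasePolicy {
    // Assumed new setting: keep the container even when its task failed.
    private final boolean reuseOnTaskFailure;

    ContainerReleasePolicy(boolean reuseOnTaskFailure) {
        this.reuseOnTaskFailure = reuseOnTaskFailure;
    }

    // Default that matches the condition quoted in the thread:
    // release if (!taskSucceeded || !shouldReuseContainers).
    static ContainerReleasePolicy currentBehavior() {
        return new ContainerReleasePolicy(false);
    }

    boolean shouldRelease(boolean taskSucceeded, boolean shouldReuseContainers) {
        if (!shouldReuseContainers) {
            return true; // reuse is off, so always release
        }
        // Reuse is on: release after a failure only if the new option is off.
        return !taskSucceeded && !reuseOnTaskFailure;
    }

    public static void main(String[] args) {
        ContainerReleasePolicy current = ContainerReleasePolicy.currentBehavior();
        ContainerReleasePolicy lenient = new ContainerReleasePolicy(true);
        System.out.println("current, failed task: release=" + current.shouldRelease(false, true));
        System.out.println("lenient, failed task: release=" + lenient.shouldRelease(false, true));
    }
}
```

Bikas's caveat still applies to any such option: holding on to a container after a failure is only safe if the failed task cannot have left the JVM in a bad state.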

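The application-level pool of sessions that Thaddeus describes (take an idle session, submit the DAG, return the session whether the DAG succeeded or not) can be sketched with a blocking queue. This is a minimal sketch under assumptions, not the poster's implementation: `SessionPool`, `withSession`, and `idleCount` are hypothetical names, and the type parameter `S` stands in for whatever session handle the application uses, so the example carries no Tez dependency.

```java
import java.util.Collection;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch of an application-level session pool.
final class SessionPool<S> {
    private final BlockingQueue<S> idle;

    SessionPool(Collection<S> sessions) {
        this.idle = new LinkedBlockingQueue<>(sessions);
    }

    // Borrows an idle session, runs the work, and always returns the session,
    // even when the work throws (mirrors "successful or not" in the thread).
    <R> R withSession(long timeout, TimeUnit unit, Function<S, R> work) {
        S session;
        try {
            session = idle.poll(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for a session", e);
        }
        if (session == null) {
            throw new IllegalStateException("no idle session within timeout");
        }
        try {
            return work.apply(session);
        } finally {
            idle.add(session);
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

Note that the pool only guarantees that the session itself comes back. Whether that session still holds warm containers after a failed DAG is exactly the open question in this thread.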