Thanks.  Created https://issues.apache.org/jira/browse/TEZ-1369 and
uploaded a patch.


On Sat, Aug 2, 2014 at 3:33 PM, Bikas Saha <[email protected]> wrote:

> The session min held containers setting was orthogonal to your main issue of
> failed tasks causing containers to be lost.
>
>
>
> It was more of a suggestion to your use case of maintaining an allocated
> session pool for low latency. Min held containers will maintain that
> minimum pool of containers (best effort) that is distributed evenly across
> your cluster (best effort) such that subsequent DAGs are assured of some
> min capacity.
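The thread does not show how Tez actually places the minimum held pool. The sketch below is only an illustrative model of what "distributed evenly across your cluster" means: N containers are spread over the known nodes so that no two nodes differ by more than one container. The class and method names are hypothetical, and nothing here calls a Tez API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative model only (not Tez's scheduler code): spread a minimum number
 * of held containers across cluster nodes as evenly as possible.
 */
public class MinHeldPoolSketch {

  /** Returns node -> number of containers to keep held on that node. */
  public static Map<String, Integer> distributeEvenly(List<String> nodes, int minHeld) {
    Map<String, Integer> perNode = new LinkedHashMap<>();
    if (nodes.isEmpty() || minHeld <= 0) {
      return perNode;
    }
    int base = minHeld / nodes.size();      // every node gets at least this many
    int remainder = minHeld % nodes.size(); // the first 'remainder' nodes get one extra
    for (int i = 0; i < nodes.size(); i++) {
      perNode.put(nodes.get(i), base + (i < remainder ? 1 : 0));
    }
    return perNode;
  }

  public static void main(String[] args) {
    List<String> nodes = List.of("node1", "node2", "node3");
    // 7 containers over 3 nodes -> {node1=3, node2=2, node3=2}
    System.out.println(distributeEvenly(nodes, 7));
  }
}
```

In a real cluster the split is best effort, as Bikas notes: YARN may not grant containers on every requested node.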
>
>
>
> Keeping the container of a failed task alive would still need a minor code
> change in Tez, adding a config to change that behavior. Please feel free to
> create a JIRA and, if possible, provide a patch.
>
>
>
> Bikas
>
>
>
> *From:* Thaddeus Diamond [mailto:[email protected]]
> *Sent:* Friday, August 01, 2014 8:54 PM
>
> *To:* [email protected]
> *Subject:* Re: Reusing Containers Of Failed Tasks
>
>
>
> Okay, so I built the source and used the target JARs to compile my
> project, but I'm not seeing any improvement in the behavior.  What is the
> expected behavior if I set the session min held containers property?  It
> still doesn't start up containers on session start, and the containers of
> failed tasks still get shut down.  Thoughts?
>
>
>
> On Fri, Aug 1, 2014 at 3:43 PM, Thaddeus Diamond <
> [email protected]> wrote:
>
> Okay.  Is there a place I can get the latest JARs to compile my code
> against?  I need this and other configurations for development, but the
> latest Maven Central artifacts are 0.4.1-incubating.  Don't worry about
> stability; I'm still in development with this project.
>
>
>
> On Fri, Aug 1, 2014 at 1:41 PM, Bikas Saha <[email protected]> wrote:
>
> Warning: master is tracking the 0.5 API stability release, so moving to
> master would mean some porting work. But your code would be a lot cleaner.
> Master is expected to be unstable until next week or so.
>
>
>
> Bikas
>
>
>
> *From:* Thaddeus Diamond [mailto:[email protected]]
> *Sent:* Wednesday, July 30, 2014 9:27 PM
>
>
> *To:* [email protected]
> *Subject:* Re: Reusing Containers Of Failed Tasks
>
>
>
> Never mind, I was not on master.  I'll investigate that.
>
>
>
> Thanks!
>
>
>
> On Thu, Jul 31, 2014 at 12:14 AM, Thaddeus Diamond <
> [email protected]> wrote:
>
> I don't see that setting in TezConfiguration.java.  Do you happen to know
> it offhand?
>
>
>
> On Thu, Jul 31, 2014 at 12:10 AM, Bikas Saha <[email protected]>
> wrote:
>
> There is no workaround without code change in Tez.
>
>
>
> The simplest code change would be to make this behavior configurable and
> have the current behavior as default.
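A minimal sketch of the configurable behavior Bikas describes, applied to the release condition `if (!taskSucceeded || !shouldReuseContainers)` from `TaskScheduler#deallocateTask(...)` quoted further down. The opt-in flag `reuseContainersOfFailedTasks` is hypothetical (not an existing Tez property), and this covers only the scheduler-side check; per this thread, the task JVM currently also exits on failure, which would need separate handling.

```java
/**
 * Sketch only: the release decision with a hypothetical opt-in flag.
 * With the flag false (the default), the result equals the current
 * condition (!taskSucceeded || !shouldReuseContainers).
 */
public class ReleaseDecisionSketch {

  public static boolean shouldReleaseContainer(boolean taskSucceeded,
                                               boolean shouldReuseContainers,
                                               boolean reuseContainersOfFailedTasks) {
    if (!shouldReuseContainers) {
      return true; // reuse disabled: always release
    }
    // Reuse enabled: keep containers of successful tasks; keep containers of
    // failed tasks only when the hypothetical opt-in flag is set.
    return !taskSucceeded && !reuseContainersOfFailedTasks;
  }

  public static void main(String[] args) {
    // Current default behavior: a failed task's container is released.
    System.out.println(shouldReleaseContainer(false, true, false)); // true
    // Opt-in: the failed task's container is kept for reuse.
    System.out.println(shouldReleaseContainer(false, true, true));  // false
  }
}
```

Defaulting the flag to false keeps existing deployments unchanged, which matches the suggestion to keep the current behavior as the default.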
>
>
>
> Btw, you can also try the session min held containers configuration that
> was recently added. This ensures that your session will retain some minimum
> resources. You can use the session min/max timeouts to decay excess
> containers.
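A sketch of the session settings mentioned above, plus a simplified model of how min/max idle timeouts could decay excess containers. The property names are assumed from `TezConfiguration` on master around the 0.5 line and should be verified against the version you build against. In a real client they would be set on the `TezConfiguration` (a Hadoop `Configuration`) used to create the session; `java.util.Properties` is used here only to keep the example self-contained. The decay rule (a timeout drawn uniformly from [min, max), never shrinking below the minimum pool) is an illustrative assumption, not Tez's code.

```java
import java.util.Properties;
import java.util.Random;

public class SessionPoolConfigSketch {

  // Assumed property names (TezConfiguration, 0.5 line); verify for your version.
  static final String REUSE_ENABLED = "tez.am.container.reuse.enabled";
  static final String MIN_HELD = "tez.am.session.min.held-containers";
  static final String IDLE_MIN_MS = "tez.am.container.idle.release-timeout-min.millis";
  static final String IDLE_MAX_MS = "tez.am.container.idle.release-timeout-max.millis";

  public static Properties sessionProps() {
    Properties p = new Properties();
    p.setProperty(REUSE_ENABLED, "true");
    p.setProperty(MIN_HELD, "10");      // keep at least 10 containers warm
    p.setProperty(IDLE_MIN_MS, "10000");
    p.setProperty(IDLE_MAX_MS, "20000");
    return p;
  }

  /**
   * Simplified model: an idle container above the minimum pool is released
   * once its idle time reaches a timeout drawn uniformly from [min, max);
   * containers within the minimum pool are always kept.
   */
  public static boolean shouldRelease(long idleMillis, int heldContainers,
                                      Properties p, Random rnd) {
    int minHeld = Integer.parseInt(p.getProperty(MIN_HELD, "0"));
    if (heldContainers <= minHeld) {
      return false;
    }
    long min = Long.parseLong(p.getProperty(IDLE_MIN_MS));
    long max = Long.parseLong(p.getProperty(IDLE_MAX_MS));
    long timeout = min + (max > min ? (long) (rnd.nextDouble() * (max - min)) : 0);
    return idleMillis >= timeout;
  }

  public static void main(String[] args) {
    Properties p = sessionProps();
    Random rnd = new Random(42);
    System.out.println(shouldRelease(30_000, 15, p, rnd)); // true: past max, above min pool
    System.out.println(shouldRelease(30_000, 10, p, rnd)); // false: at min pool
  }
}
```

Randomizing the timeout within the window spreads releases out, so excess containers drain gradually instead of all at once.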
>
>
>
> Bikas
>
>
>
> *From:* Thaddeus Diamond [mailto:[email protected]]
> *Sent:* Wednesday, July 30, 2014 8:51 PM
> *To:* [email protected]
> *Subject:* Re: Reusing Containers Of Failed Tasks
>
>
>
> I see.  Is there a manual workaround you suggest for this?
>
>
>
> The motivation is this: I have an application with low latency and max
> concurrency SLAs.  The way we are trying to solve this with Tez is to keep
> an application-level pool of Tez sessions and configure each to have
> long-lived containers.  When users submit DAGs the application grabs an
> idle Tez session from the pool and submits to that one. After the DAG
> completes (successful or not) it is returned to the pool in an idle state.
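The application-level pool described above can be sketched with a `BlockingQueue`. This is a generic sketch under the assumption that `S` stands in for a Tez session client; no Tez API is used, and the class name `SessionPool` is hypothetical. The pool size caps concurrency, and the session is returned in a `finally` block whether the DAG succeeded or failed.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

/** Generic pool of long-lived sessions; S stands in for a Tez session client. */
public class SessionPool<S> {
  private final BlockingQueue<S> idle;

  public SessionPool(List<S> sessions) {
    // Capacity equals the number of sessions, which is the max-concurrency cap.
    this.idle = new ArrayBlockingQueue<>(sessions.size(), false, sessions);
  }

  /** Borrows an idle session, runs the work, and always returns the session. */
  public <R> R withSession(Function<S, R> work) throws InterruptedException {
    S session = idle.take(); // blocks until some session is idle
    try {
      return work.apply(session);
    } finally {
      idle.put(session); // returned to the pool whether the DAG succeeded or not
    }
  }

  public int idleCount() {
    return idle.size();
  }

  public static void main(String[] args) throws InterruptedException {
    SessionPool<String> pool = new SessionPool<>(List.of("session-1", "session-2"));
    String result = pool.withSession(s -> "submitted DAG to " + s);
    System.out.println(result + ", idle=" + pool.idleCount());
  }
}
```

Returning the session only restores it to the pool; as the rest of the thread explains, it does not guarantee that the session still holds warm containers after a failed DAG.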
>
>
>
> If a session gets returned to the pool but no containers are spun up in it
> because the DAG failed, I will fail to meet my SLAs on the next DAG
> submission.
>
>
>
> On Wed, Jul 30, 2014 at 8:05 PM, Bikas Saha <[email protected]> wrote:
>
> Currently, failed tasks make the JVM exit. There is no workaround for
> that. Before we can change that, we would need to verify that task
> execution is isolated, so that a task failure does not end up “corrupting”
> the host.
>
>
>
> Bikas
>
>
>
> *From:* Thaddeus Diamond [mailto:[email protected]]
> *Sent:* Wednesday, July 30, 2014 3:15 PM
> *To:* [email protected]
> *Subject:* Reusing Containers Of Failed Tasks
>
>
>
> Hi,
>
>
>
> I turned on container reuse and upped the time that containers linger
> after task vertex completion
> (tez.am.container.session.delay-allocation-millis), but I'm still having an
> issue.  Sometimes, the Processor I created will fail due to application
> logic in one DAG but not the next. The trivial example is:
>
>
>
> class MyProcessor implements LogicalIOProcessor {
>
>   // Other non-application logic code
>
>   public void run(...) {
>     if (new Random().nextBoolean()) {
>       throw new FooBarBazException();
>     }
>   }
> }
>
>
>
> In this case I don't want the task JVM to be deallocated, because it was
> application logic that caused the failure, and the next time I start a DAG
> I will pay the long JVM startup delay.
>
>
>
> I see the following code in the source (TaskScheduler#deallocateTask(...))
> that I think is the cause of this:
>
>
>
>         if (!taskSucceeded || !shouldReuseContainers) {
>           if (LOG.isDebugEnabled()) {
>             LOG.debug("Releasing container, containerId=" + container.getId()
>                 + ", taskSucceeded=" + taskSucceeded
>                 + ", reuseContainersFlag=" + shouldReuseContainers);
>           }
>           releaseContainer(container.getId());
>         }
>
>
>
> Is this something that can be fixed in master? Or is there a
> workaround/conf I can set to get this working?
>
>
>
> Thanks,
>
> Thad
>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity
> to which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>