Hi,

I turned on container reuse and upped the time that containers linger after
task vertex completion (tez.am.container.session.delay-allocation-millis),
but I'm still having an issue. Sometimes the Processor I created fails
because of application logic in one DAG but not in the next. A trivial
example is:

class MyProcessor implements LogicalIOProcessor {
  // Other non-application logic code
  public void run(...) {
    if (new Random().nextBoolean()) {
      throw new FooBarBazException();
    }
  }
}

In this case I don't want the task JVM to be deallocated: the failure was
caused by application logic, and once the container is released, the next
DAG I start pays the long task JVM startup delay again.

I see the following code in the source (TaskScheduler#deallocateTask(...))
that I think is the cause of this:

        if (!taskSucceeded || !shouldReuseContainers) {
          if (LOG.isDebugEnabled()) {
            LOG.debug("Releasing container, containerId=" + container.getId()
                + ", taskSucceeded=" + taskSucceeded
                + ", reuseContainersFlag=" + shouldReuseContainers);
          }
          releaseContainer(container.getId());
        }

Is this something that can be fixed in master? Or is there a
workaround/conf I can set to get this working?
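
To make the request concrete, here is a rough standalone sketch of the
behaviour I'm hoping for. None of this is Tez API: TaskOutcome,
releasesToday and releasesProposed are names I made up for illustration,
and I'm assuming the scheduler could somehow be told whether a failure came
from application logic or from the container itself.

```java
public class ReleasePolicySketch {
    /** Hypothetical classification of how a task attempt ended; not a Tez type. */
    enum TaskOutcome { SUCCEEDED, APPLICATION_ERROR, CONTAINER_ERROR }

    /** Condition as I read it in the snippet above: any failure releases the container. */
    static boolean releasesToday(boolean taskSucceeded, boolean shouldReuseContainers) {
        return !taskSucceeded || !shouldReuseContainers;
    }

    /** Proposed: release only if reuse is off or the container itself is suspect. */
    static boolean releasesProposed(TaskOutcome outcome, boolean shouldReuseContainers) {
        return !shouldReuseContainers || outcome == TaskOutcome.CONTAINER_ERROR;
    }

    public static void main(String[] args) {
        // Compare both policies for each outcome, with container reuse enabled.
        for (TaskOutcome outcome : TaskOutcome.values()) {
            boolean succeeded = outcome == TaskOutcome.SUCCEEDED;
            System.out.println(outcome
                + ": today=" + releasesToday(succeeded, true)
                + ", proposed=" + releasesProposed(outcome, true));
        }
    }
}
```

The only case where the two differ is APPLICATION_ERROR with reuse enabled,
which is exactly the case I'm hitting.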

Thanks,
Thad
