Jason Lowe created TEZ-3444:
-------------------------------

             Summary: Handling of fetch-failures should consider time spent producing output
                 Key: TEZ-3444
                 URL: https://issues.apache.org/jira/browse/TEZ-3444
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Jason Lowe

When handling fetch failures and deciding whether to re-run the upstream task, we should consider how long the upstream task that generated the data being fetched took to run. If the upstream task ran for a long time, we may want to retry the fetch a bit harder before deciding to re-run it. If the upstream task ran for only a few seconds, we should probably re-run it more aggressively, since that may be cheaper than multiple retries that time out.
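
To make the proposal concrete, here is a rough sketch of one way the decision could scale with the producer's runtime. Everything in it is an assumption for illustration: the class FetchFailurePolicy, its method names, and the thresholds (one extra tolerated failure per minute of producer runtime, capped at eight) are hypothetical and do not correspond to existing Tez code or configuration.

```java
import java.time.Duration;

/** Hypothetical policy sketch; not part of the Tez codebase. */
public final class FetchFailurePolicy {
    // Assumed tunables, picked only for illustration.
    private static final int MIN_FAILURES = 1;
    private static final int MAX_FAILURES = 8;
    private static final Duration RUNTIME_PER_EXTRA_FAILURE = Duration.ofMinutes(1);

    private FetchFailurePolicy() {}

    /**
     * Number of fetch failures to tolerate before re-running the producer.
     * Grows by one for each full minute the producer ran, up to MAX_FAILURES.
     */
    public static int allowedFailures(Duration producerRuntime) {
        // Treat a negative (bogus) runtime as zero.
        long runtimeMs = Math.max(0L, producerRuntime.toMillis());
        long extra = runtimeMs / RUNTIME_PER_EXTRA_FAILURE.toMillis();
        return (int) Math.min((long) MAX_FAILURES, MIN_FAILURES + extra);
    }

    /** True once the observed failures reach the runtime-scaled limit. */
    public static boolean shouldRerunProducer(Duration producerRuntime, int observedFailures) {
        return observedFailures >= allowedFailures(producerRuntime);
    }
}
```

With these assumed numbers, a producer that ran for five seconds would be re-run after the first fetch failure, while one that ran for ten minutes would be re-run only after eight. A real implementation would probably also want to weigh the fetch timeout against the producer's runtime, since the cost being compared is time spent retrying versus time spent regenerating the output.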