Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/4865
Change subject: IMPALA-4348 / IMPALA-4333: Improve coordinator fragment cancellation ...................................................................... IMPALA-4348 / IMPALA-4333: Improve coordinator fragment cancellation This patch fixes two problems: 1. If a fragment instance is cancelled (by the coordinator) concurrently to the coordinator calling GetNext() on its PlanRootSink, GetNext() may access a destroyed PlanRootSink object and crash (IMPALA-4333). 2. If ExecPlanFragment() times out, but is successful, the coordinator will never call CloseConsumer() to release its side of the PlanRootSink, and the fragment instance will never finish (IMPALA-4348). The following changes address this: * Make PlanFragmentExecutor::Cancel() call CloseConsumer() on its root sink. This ensures that, no matter what the reason, the fragment instance will eventually terminate. * Ensure that it is safe to call CloseConsumer() concurrently to PlanRootSink::GetNext(). * Ensure that the PlanRootSink has a lifetime at least as long as the coordinator object by taking a shared_ptr reference to its parent FragmentExecState object. Even if the fragment instance finishes, the object will still be valid until the coordinator releases this reference. In the future, we expect this to be replaced by an explicit reference take / relinquish API. * Ensure the coordinator tries to cancel fragment instances if any attempt was made to start them, not just if the RPC was a success. This deals with the timeout case where the RPC was slow, but successful. This patch relies on a fix for IMPALA-4383, which ensures that fragment instances will always try to send reports, and so will always detect an error and cancel themselves. Testing: Ran TestRPCTimeout.test_execplanfragment_timeout in a loop for 90 minutes. Patch addresses hangs and failure modes that were observed previously. Change-Id: I5b4e25c1d658b3929182ba5e56b5c5e881dd394a --- M be/src/exec/plan-root-sink.cc M be/src/exec/plan-root-sink.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/plan-fragment-executor.cc 5 files changed, 37 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/4865/1 -- To view, visit http://gerrit.cloudera.org:8080/4865 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I5b4e25c1d658b3929182ba5e56b5c5e881dd394a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson <[email protected]>
