Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9718 )
Change subject: IMPALA-6338: Fix flaky test_profile_fragment_instances
......................................................................

Patch Set 1:

(5 comments)

Thanks for the replies. I left some questions around potential simplifications, but the current logic looks correct to me.

http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.h@119
PS1, Line 119: boost::mutex* lock() { return &lock_; }
> Is there a reasonable way to make that work with the way that we have a lis

I don't think it will help with the iteration part, but it will prevent mistakes where we forget to release the lock. During the locking phase we can just populate a vector of these scoped lock handles.

http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@237
PS1, Line 237: return num_remaining_instances_ == 0 || (!status_.ok() && !status_.IsCancelled());
> Sure, that would be equivalent. I don't have a strong feeling either way ab

That makes sense, thanks for the explanation.

http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@249
PS1, Line 249: for (const TFragmentInstanceExecStatus& instance_exec_status:
> I didn't want to add any more locking to the profile update path, which may

My main concern was making the locking protocol less complicated, but I wasn't aware that the lock I'm looking for is shared across backends. Looking at the calls to ComputeQuerySummary(), two of them already follow a call to WaitForBackendCompletion(). Can we call WaitForBackendCompletion() in Coordinator::Cancel(), too, before calling ComputeQuerySummary()?
http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@251
PS1, Line 251: instance_status
> Not quite, since (at least for successfully completed queries where
> fragments are cancelled because returned_all_results_ becomes true)
> BackendState::status_ is only updated as a result of these status
> reports, so the first time an instance reports CANCELLED, status_
> won't already be CANCELLED.

Can the coordinator cancel the backend_states in this case before sending out the cancellations?

> What you would need is access to Coordinator::query_status_, which
> we could pass into ApplyExecStatusReport(), except that accessing
> it requires taking Coordinator::lock_, which of course we don't
> want to hold for the duration of ApplyExecStatusReport().
>
> So, I think that what we could do is: in UpdateBackendExecStatus,
> take Coordinator::lock_ and make a copy of Coordinator::query_status_.
> Release lock_ and pass the copy into ApplyExecStatusReport(), along
> with a copy of the value of returned_all_results_. We can then say
> something like:
> DCHECK(!instance_status.IsCancelled() || (!query_status_.ok() ||
>     returned_all_results_))

http://gerrit.cloudera.org:8080/#/c/9718/1/be/src/runtime/coordinator-backend-state.cc@314
PS1, Line 314: return IsDone();
> Right, that's the intended behavior.

Thx for the explanation.

--
To view, visit http://gerrit.cloudera.org:8080/9718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f8295e4ec580b476f7668c77aec1348fb2e868c
Gerrit-Change-Number: 9718
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Comment-Date: Wed, 21 Mar 2018 18:07:56 +0000
Gerrit-HasComments: Yes
