Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14975 )
Change subject: IMPALA-9154: Make runtime filter propagation asynchronous ...................................................................... Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-backend-state.h@21 PS2, Line 21: #include <condition_variable> I don't think you need this here http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-backend-state.cc@593 PS2, Line 593: if (state->num_inflight_rpcs() == 0) { Its a little subtle why this is correct - since we hold FilterState::lock_ while we issue all of the PublishFilter rpcs, for us to be able to get the lock again here on line 591 means that all PublishFilters that will be issued for this FilterState must have already been issued, this filter has been disabled, and if we reach 0 we know that there won't be any more PublishFilters issued. Can you add a DCHECK(state->disabled()) and a brief comment mentioning this? http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-filter-state.h File be/src/runtime/coordinator-filter-state.h: http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-filter-state.h@58 PS2, Line 58: /// Once a filter is disabled, subsequent updates for that filter are ignored. We should document the thread safety of this class now, eg. if you take my advice from some of my other comments it will be something like "This class is not thread safe. Callers must always take 'lock()' themselves when calling any FilterState functions if thread safety is needed" http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-filter-state.h@92 PS2, Line 92: get_lock() just 'lock()' http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-filter-state.h@161 PS2, Line 161: update_filter_done_cv_ publish_filter_done_cv_ http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator-filter-state.h@178 PS2, Line 178: SpinLock update_lock; This isn't used anywhere anymore, right? http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.h@100 PS2, Line 100: /// Lock ordering: (lower-numbered acquired before higher-numbered) Please update this comment to reflect the new protocol - what's listed here as filter_lock_ was replaced by FilterRoutingTable::lock in a previous patch so you can go ahead and update that too, and filter_update_lock_ has been replaced with FilterState::lock_ (and of course check that we are in fact following the ordering shown here for those locks) http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.cc@1050 PS2, Line 1050: for (auto& filter : filter_routing_table_->id_to_filter) { : filter.second.WaitForPublishFilter(); : } : : for (auto& filter : filter_routing_table_->id_to_filter) { : FilterState* state = &filter.second; : state->DisableAndRelease(filter_mem_tracker_); : } > I have noticed that this patch failed the core tests in the ASAN build due Yes - you definitely have to do WaitForPublishFilter() and DisableAndRelease() atomically with respect to FilterState::lock_ somehow, since otherwise after caling WaitForPublishFilter() you could get another call to UpdateFilter() which will attempt to make PublishFilter rpcs even though we think all the PublishFilter rpcs for this FilterState are done. Possibly the most straight forward way of doing that (if you follow my other comments) is to not have WaitForPublishFilter() take FilterState::lock_, but instead take FilterState::get_lock() ourselves here in the 'for' loop and hold it while calling both WaitForPublishFilter() and DisableAndRelease() http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.cc@1315 PS2, Line 1315: unique_lock<SpinLock> l(lock_); I think that this is the only place where you take FilterState::lock_ from within a FilterState function instead of using FilterState::get_lock(). Of course, usually that sort of thing is better encapsulation, but there isn't really a good way to avoid having a FilterState::get_lock() function, eg. because its needed in Coordinator::UpdateFilter, so I think it might be better to just be consistent and always require that callers take FilterState::get_lock() themselves before calling any FilterState functions. -- To view, visit http://gerrit.cloudera.org:8080/14975 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0 Gerrit-Change-Number: 14975 Gerrit-PatchSet: 2 Gerrit-Owner: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Fri, 10 Jan 2020 20:09:56 +0000 Gerrit-HasComments: Yes
