Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14975 )

Change subject: IMPALA-9154: Make runtime filter propagation asynchronous
......................................................................


Patch Set 2:

(1 comment)

> Patch Set 1:
>
> (9 comments)
>
> You've got several formatting issues. Instead of me pointing all of them out, 
> please run clang-format

Hi all, I have some thoughts on the locking protocol when 
Coordinator::ReleaseExecResources() is called. Please also let me know if you 
have other ideas. Thanks!

http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/14975/2/be/src/runtime/coordinator.cc@1050
PS2, Line 1050:   for (auto& filter : filter_routing_table_->id_to_filter) {
              :     filter.second.WaitForPublishFilter();
              :   }
              :
              :   for (auto& filter : filter_routing_table_->id_to_filter) {
              :     FilterState* state = &filter.second;
              :     state->DisableAndRelease(filter_mem_tracker_);
              :   }
I have noticed that this patch failed the core tests in the ASAN build due to 
some heap-use-after-free error.

After some initial thoughts, I think this two for-loops should not be 
separated. Instead, inside WaitForPublishFilter(), after acquiring the lock for 
a FilterState 'state', we should immediately call state->DisableAndRelease() to 
prevent the related memory from being accessed later in UpdateFilter() or 
ApplyUpdate().

This scenario above seems possible when there are more than one instances of 
FilterState in 'filter_routing_table_' because it is possible that 
'num_inflight_rpcs_' for some FilterState was 0 when its WaitForPublishFilter() 
was called but later on when we are checking the next FilterState, there could 
be another thread that updates 'num_inflight_rpcs_' of the previous FilterState 
and tries to access some memory associated with the previous FilterState.

I do not have a concrete proof about my theory above since there is not enough 
information in the log. But I will first try to combine these two for-loops in 
a proper way and run the core tests in the ASAN build again.



--
To view, visit http://gerrit.cloudera.org:8080/14975
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Gerrit-Change-Number: 14975
Gerrit-PatchSet: 2
Gerrit-Owner: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 10 Jan 2020 18:06:15 +0000
Gerrit-HasComments: Yes

Reply via email to