Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13882 )

Change subject: IMPALA-7984: Port runtime filter from Thrift RPC to KRPC
......................................................................


Patch Set 18:

(31 comments)

Thanks so much for all of your work on this! Its getting pretty close, most of 
these comments are cosmetic

One thing: instead of me going through and marking any formatting issues, 
please run clang-format and take its suggestions. Let me know if you need help 
with this. Keep in mind that clang-format is just a suggestion and you should:
- keep formatting consistent with the rest of the surrounding file or with 
things that you know are impala's guidelines, even if clang-format disagrees
- avoid  changing lines that otherwise wouldn't be affected by your patch just 
to fix the formatting, eg. you don't need to re-arrange includes that were 
already present even if clang-format says to

http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator-backend-state.h@42
PS18, Line 42: namespace kudu {
             : namespace rpc {
             : class RpcController;
             : } // namespace rpc
             : } // namespace kudu
Is this needed? rpc_controller.h is being imported above already


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator-backend-state.cc@551
PS18, Line 551: DCHECK
DCHECK_EQ?


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator-backend-state.cc@570
PS18, Line 570: kudu::Status rpc_status;
              :   rpc_status = proxy->PublishFilter(rpc_params, &res, 
&controller);
This can all be one line


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator-backend-state.cc@572
PS18, Line 572:   if (!rpc_status.ok()) {
In addition to rpc_status, which just reflects whether the rpc actually got to 
the other machine and was responded to, there's also 
PublishFilterResultPB::status, which reflects whether the operation itself was 
successful, so we should add another 'if' that checks that and logs if its an 
error.


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc@994
PS18, Line 994: FilterState* state;
I don't think there's any reason to move this here


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc@1013
PS18, Line 1013:     // Check if the filter has already been sent, which could 
happen in four cases:
I think we need another case in this comment for 'failed to get bloom filter 
sidecar'


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc@1070
PS18, Line 1070:       bs->PublishFilter(rpc_params, controller);
Could you add a TODO: make this asynchronous


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc@1083
PS18, Line 1083: void 
Coordinator::SetupSidecarForBloomFilter(PublishFilterParamsPB* rpc_params,
This function is unnecessary, you can just call AddSidecar directly


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc@1089
PS18, Line 1089: (
extra parentheses


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/coordinator.cc@1115
PS18, Line 1115: // If the incoming Bloom filter is neither an always true 
filter nor an
               :       // always false filter, then it must be the case that a 
non-empty sidecar slice
               :       // has been received. Refer to BloomFilter::ToProtobuf() 
for further details.
Comment is outdated.

Thinking through the cases here, I think that this all works correctly, but its 
a little weird.

In particular, in the case where params->bloom_filter() is always_false:
- if bloom_filter_ is also always_false, then we'll TryConsume(0), do a swap() 
that doesn't actually change the value of bloom_filter_ since its still 
always_false, and then re-instantiate bloom_filter_directory_ to an empty string
- if bloom_filter_ is not always_false, then we'll call Or(), which will just 
return immediately without doing anything

This is how it worked before, so it would be fine to leave it, but I think that 
we can make this clearer by changing the 'else' on line 1109 to be 'else 
(!params->bloom_filter().always_false)' and skip all the extra work in that 
case. Then we can make the 'if' here a 
DCHECK(params->bloom_filter().has_directory_sidecar_idx()), and people won't 
have to reason through what the code below would do in the case that 
sidecar_slice doesn't get filled in.

You'll of course also want to add appropriate comments (and you should think it 
through and make sure my reasoning is correct)


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.h
File be/src/runtime/runtime-filter-bank.h:

http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.h@147
PS18, Line 147:   /// Lock protecting those BloomFilter's in 'bloom_filters_' 
from being
Lets move this somewhere further down, so that runtime_filter_lock_ is next to 
the things it protects. Also rename 'num_krpcs_counter_lock_' to 
'num_inflight_rpcs_lock_', change the comment above it to just say that it 
protects 'num_inflight_rpcs_' and also that it shouldn't be taken at the same 
time as 'runtime_filter_lock_'.

Then put a comment above num_inflight_rpcs_ explaining what its used for (as 
mentioned elsewhere, you can just use your existing comment from the cc file)


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.h@194
PS18, Line 194:   void UpdateFilterCompleteCb(const kudu::rpc::RpcController* 
rpc_controller);
brief comment


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc@41
PS18, Line 41: #include "service/control-service.h"
I don't think you need this (or probably a couple of the other imports you 
added here)


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc@112
PS18, Line 112: void RuntimeFilterBank::UpdateFilterCompleteCb(
I think we'll also want to pass in the UpdateFilterResultPB* here by inclusing 
it in the bind() below, so that we can check it's 'status'. I think in the 
current code in data-stream-service that it can only be OK, so its fine to 
DCHECK on it.


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc@190
PS18, Line 190: krpc_address, krpc_address.hostname
This is wrong. See https://gerrit.cloudera.org/#/c/13818/


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc@199
PS18, Line 199:  // Use 'num_inflight_rpcs_' to keep track of the number of 
current in-flight
              :     // KRPC calls to prevent the memory pointed to by 
'bloom_filter' being
              :     // deallocated in RuntimeFilterBank::Close() before all 
KRPC calls have
              :     // been completed.
Lets move this to above the definition of num_inflight_rpcs_ in the header


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc@237
PS18, Line 237:       if (params->bloom_filter().has_directory_sidecar_idx()) {
Maybe add en else{} that just DCHECKs that params->bloom_filter().always_false 
is true, since that's the only reason this shouldn't be set.


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/runtime/runtime-filter-bank.cc@242
PS18, Line 242: LOG(ERROR)
Probably WARNING, since we're not considering this a reason to fail the query


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter-test.cc
File be/src/util/bloom-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter-test.cc@368
PS18, Line 368: (NULL
nullptr


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@30
PS18, Line 30: #include "gen-cpp/control_service.pb.h"
I don't think that you need this


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@38
PS18, Line 38: class Slice;
This isn't used


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@95
PS18, Line 95: Status Init(const BloomFilterPB& protobuf, const std::string& 
directory);
We can eliminate this version, since its basically exactly the same as the one 
below. Just have the code that calls this instead call .data() and .size() on 
the string when passing it in.


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@106
PS18, Line 106: /// Also uses a sidecar buffer to instantiate an outbound 
sidecar, which in turn
              :   /// is passed to 'controller' to get a sidecar index which 
will be used on the client
              :   /// side to retrieve the contents stored in the sidecar 
buffer.
I think this can be simplified to something more like "Also sets a sidecar on 
'controller' containing the bloom filter's directory"


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@125
PS18, Line 125: /// 'directory_in' is the string associated with the 
BloomFilterPB 'in' and
              :   /// 'directory_out' is the string associated with the 
BloomFilterPB 'out'.
This can be eliminated, its fairly obvious from the names


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@223
PS18, Line 223: /// If the first argument is NULL, it is interpreted as a 
complete filter which
              :   /// contains all elements.
              :   /// It also uses a sidecar buffer to instantiate an outbound 
sidecar, which in turn
              :   /// is passed to 'controller' to get a sidecar index which 
will be used on the
              :   /// receiving side to retrieve the contents stored in the 
sidecar buffer.
This can be eliminated (its covered by the comment for the public ToProtobuf 
above)


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@229
PS18, Line 229: void AddSidecar(BloomFilterPB* protobuf,
              :       kudu::rpc::RpcController* controller) const;
I don't think this is actually used anywhere


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@231
PS18, Line 231: AddSidecar
It might be more clear to name this something like AddDirectorySidecar(), and 
also add a brief comment saying what it does, including noting any requirements 
this function has (eg. it looks like you require that the filter is neither 
always_true nor always_false)

And of course, when you remove the 'friend' below it'll need to be made public


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.h@247
PS18, Line 247: friend class Coordinator;
Remove this.

As a general rule, we avoid using 'friend' because it messes up encapsulation. 
The main exception is for tests, where encapsulation doesn't matter as much.


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.cc@92
PS18, Line 92:     unsigned long directory_size) {
Look like this function requires that the filter is not always_false. You 
should add a DCHECK for that, and also for any other requirements, eg. that 
protobuf != nullptr


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.cc@249
PS18, Line 249: (
I think you got an unnecessary pair of parentheses here and around 
directory_out on this line, and also in several places on the lines below. 
Makes it a better hard to read what's going on


http://gerrit.cloudera.org:8080/#/c/13882/18/be/src/util/bloom-filter.cc@257
PS18, Line 257:     __m128i* simd_out = 
reinterpret_cast<__m128i*>(&((*(directory_out))[0]));
Might as well leave this where it was, to keep the comment next to the loop its 
referring to



--
To view, visit http://gerrit.cloudera.org:8080/13882
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b394796d250286510e157ae326882bfc01d387a
Gerrit-Change-Number: 13882
Gerrit-PatchSet: 18
Gerrit-Owner: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Comment-Date: Mon, 09 Sep 2019 19:15:22 +0000
Gerrit-HasComments: Yes

Reply via email to