Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 9:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt
File be/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt@234
PS9, Line 234:   "-DKUDU_HEADERS_NO_STUBS"
nit formatting: trailing whitespace and you can wrap this into the rest of the 
string instead of putting it on its own line


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@91
PS9, Line 91:     //local_bloom_filter->Insert(val, expr_eval->root().type());
I guess this was left by accident?


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@106
PS9, Line 106: // An example of the generated code for TPCH-Q2: RF002 -> 
n_regionkey
This needs to be updated (though of course it will only change slightly). If 
you're not sure how to get this, see: 
https://cwiki.apache.org/confluence/display/IMPALA/Codegen

(the easiest way is probably just to add a line at the end of this function 
that logs fn->dump() or whatever)


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@232
PS9, Line 232: }
             :       else if
nit: formatting ('else if' should go on the same line as prior '}')

Not sure if I've bugged you about clang-format before, but its a great way to 
help ensure that you're code reviews don't have these sorts of simple errors. 
See: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536

Note that clang-format is just a guide and if it's suggestion differs from 
surrounding code then you can feel free to ignore it


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@234
PS9, Line 234:         continue;
Brief comment, eg. 'This filter won't actually remove any rows so we don't need 
to push it down to Kudu'


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@236
PS9, Line 236: else
nit: it doesn't actually change how the code works, but it would be better to 
leave off this 'else' since its not actually mutually exclusive with the prior 
'if's


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@238
PS9, Line 238: if (filter != nullptr) {
I think we can save ourselves some duplication and an indention level by 
changing this to 'ctx.filter->HasFilter()' and moving it up to the 
'if(AlwaysTrue())'


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@239
PS9, Line 239: auto it = ctx.filter->filter_desc().planid_to_target_ndx.find(
             :               scan_node_->id());
             :           const TRuntimeFilterTargetDesc &target_desc =
             :               ctx.filter->filter_desc().targets[it->second];
             :           const string &col_name = target_desc.kudu_col_name;
             :           DCHECK(col_name != "");
This is duplicated below, I think we can eliminate the duplication by moving 
this up to before the 'if(is_bloom)'


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@53
PS9, Line 53: class BloomFilterBufferAllocator;
I don't think this is necessary


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@73
PS9, Line 73: BloomFilterBufferAllocator
nit: maybe name this ImpalaBloomFilterBufferAllocator to ensure there's no 
confusion


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@91
PS9, Line 91:   std::shared_ptr<kudu::BlockBloomFilterBufferAllocatorIf> 
Clone() const override {
If this is something that's really only used in Kudu's internal testing and 
won't ever get hit in Impala, it might be better to just do a LOG(FATAL) here 
or something, since then you would be able to simplify the logic of 
AllocateBuffer()/Close()/etc. and not support a code path that's never getting 
hit.

Probably have to check with Bankim to ensure that's really the case, though


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@111
PS9, Line 111: /// A BloomFilter stores sets of items and offers a query 
operation indicating whether or
This class comment could probably use some updating to reflect the fact that 
this class is now basically a thin wrapper around Kudu's bloom filter


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@171
PS9, Line 171:   void Insert(void* val, const ColumnType& col_type) noexcept;
Is this version of Insert() used anywhere? Same with the equivalent Find() below


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@241
PS9, Line 241: HashAlgorithm hash_algorithm_;
As far as I can tell, this isn't actually getting used anywhere, and I don't 
think that it would be valid to pass any value other than FAST_HASH into Init() 
anyways (since we've hard-coded the use of fasthash in various places such as 
FilterContext::Insert()), so I think its probably better to remove this, the 
related parameter to Init(), etc.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc@205
PS9, Line 205:   if (is_allocated_) Close();
I suspect its probably supposed to be guaranteed that FreeBuffer() is always 
called before the allocator is destructed, in which case it would be good to do 
a LOG(DFATAL) here like we do in the BloomFilter destructor above (in general, 
its against Impala's style to do any real work in destructors, not sure about 
Kudu but I assume they have the same rule)


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc@226
PS9, Line 226:   Close(); // Ensure that any previously allocated memory is 
released.
I wonder if this is really necessary - I would hope that Kudu provides a 
guarantee that every call to AllocateBuffer will have a corresponding call to 
FreeBuffer. Might need to ask Bankim to be sure.


http://gerrit.cloudera.org:8080/#/c/15683/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/15683/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@723
PS9, Line 723: if (slotRef == null || slotRef.getDesc().getColumn() == null
             :               || (targetExpr instanceof CastExpr)
             :               || filter.getExprCompOp() == 
Operator.NOT_DISTINCT) {
             :             continue;
             :           }
This is almost identical to the 'if' below. I think we can reduce code 
duplication by moving this outside the "if(BLOOM) {} else(MIN_MAX) {}" section



--
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ban...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 May 2020 21:57:27 +0000
Gerrit-HasComments: Yes

Reply via email to