Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16373 )

Change subject: IMPALA-4065 Inline comparator calls into TopN::InsertBatch()
......................................................................


Patch Set 9:

(24 comments)

Had a bunch of style comments, mostly driven by our style guide and existing 
best practices.

I expected to see some changes to the codegen logic or perf numbers to actually 
replace the indirect comparator calls in the LLVM IR with a direct call to the 
codegen'd function. Is there going to be a part 2 that does that?

http://gerrit.cloudera.org:8080/#/c/16373/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16373/9//COMMIT_MSG@19
PS9, Line 19:
Did you do any performance comparison? Would be nice to have a targeted query 
that stresses the operator.


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node-ir.cc
File be/src/exec/topn-node-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node-ir.cc@58
PS9, Line 58:       priority_queue_.Heapify();
I'm a little confused by this - isn't heapify generally an O(n) operation?


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node.h
File be/src/exec/topn-node.h:

http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node.h@188
PS9, Line 188: private :
nit: bad whitespace change


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node.cc
File be/src/exec/topn-node.cc:

http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node.cc@128
PS9, Line 128:       if (codegen_status.ok()) {
I would have expected to see some changes here to replace the comparator call 
with a codegen'd version.


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node.cc@273
PS9, Line 273:     priority_queue_.Pop(tuple);
This interface is an improvement!


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/exec/topn-node.cc@312
PS9, Line 312:   : capacity_(capacity), priority_queue_(c, capacity) {}
Not the biggest deal, but this preallocates the full array whereas previously 
it relied on the vector to expand. Might be a small difference in memory 
management for larger values of N.


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/CMakeLists.txt
File be/src/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/CMakeLists.txt@151
PS9, Line 151:   priority-queue-test.cc
nit: can you preserve the alphabetic ordering of the files here?


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/CMakeLists.txt@221
PS9, Line 221: ADD_UNIFIED_BE_LSAN_TEST(priority-queue-test 
"PriorityQueueTest.*")
Could preserve order here too


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/comparator-wrapper.h
File be/src/util/comparator-wrapper.h:

PS9:
I think we can maybe just delete this wrapper - it was only required to adapt 
TupleRowComparator to the C++ STL interface, and I think we can now just call 
it directly.


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/comparator-wrapper.h@19
PS9, Line 19: #ifndef IMPALA_UTIL_COMPARATOR_WRAPPER_H_
Not a big deal, but we prefer #pragma once in new code instead of traditional 
include guards.

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue-test.cc
File be/src/util/priority-queue-test.cc:

http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue-test.cc@33
PS9, Line 33: testInt
nit: TestInt


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h
File be/src/util/priority-queue.h:

http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@19
PS9, Line 19: #define IMPALA_UTIL_PRIORITY_QUEUE_H
We prefer #pragma once


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@81
PS9, Line 81:   PriorityQueue(ComparatorWrapper<C> c, std::size_t n = 100)
Can't we pass in TupleRowComparator directly? ComparatorWrapper was just used 
to adapt TupleRowComparator to the c++ stl interface but I don't think we need 
that here.


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@96
PS9, Line 96:   inline T& Top() { return elements_[0]; }
nit: not a big deal, but I generally recommend omitting inline from functions 
defined in the class body cause it's already implied.

https://en.cppreference.com/w/cpp/language/inline "A function defined entirely 
inside a class/struct/union definition, whether it's a member function or a 
non-member friend function, is implicitly an inline function. "


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@105
PS9, Line 105:       capacity_ *= 2;
I'd prefer to use a vector<> so that we don't need to reimplement the doubling 
behaviour.


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@122
PS9, Line 122:   void Pop(T& v) {
Why not just return the value? It's just a pointer so copy vs move etc doesn't 
matter.

You could also change it to the following if you wanted to avoid a hypothetical 
copy if T was a different type.

  size_--;
  return move(elements_[last]);


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@128
PS9, Line 128:       Heapify(0, last);
Isn't heapify generally an O(n) operation? Popping off a heap should be O(log 
n) and it was before. Am I missing something?


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@139
PS9, Line 139:   // high_bound-1.
I would have expected a function called heapify to take an arbitrary array and 
turn it into a valid heap, e.g. 
http://www.cplusplus.com/reference/algorithm/make_heap/

It looks like this is doing something else - sifting down the element at i?


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@146
PS9, Line 146: DCHECK
DCHECK_LE since it will print values on failure


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@166
PS9, Line 166:         break;
else should have braces


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@168
PS9, Line 168:   }
nit: here and elsewhere, missing blank line between function definitions


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@204
PS9, Line 204:   std::size_t capacity_;
We use int64_t for sizes.

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536

https://google.github.io/styleguide/cppguide.html#Integer_Types


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@207
PS9, Line 207:   // The array holding the elements.
We prefer a unique_ptr<T[]> or vector<T> to avoid need for explicit memory 
management.

https://cwiki.apache.org/confluence/display/IMPALA/Resource+Management+Best+Practices+in+Impala
 has some context about the preferred ways to manage memory.


http://gerrit.cloudera.org:8080/#/c/16373/9/be/src/util/priority-queue.h@209
PS9, Line 209: compartor
comparator



--
To view, visit http://gerrit.cloudera.org:8080/16373
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I676b4c05cf10a6946c05e317b0002c1e29e78aa8
Gerrit-Change-Number: 16373
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Wed, 02 Sep 2020 19:11:47 +0000
Gerrit-HasComments: Yes

Reply via email to