Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16942 )
Change subject: IMPALA-10296: Fix analytic limit pushdown when predicates are present ...................................................................... Patch Set 11: (10 comments) I've only looked at the BE part and tests so far, but I can try to wrap my head around the FE changes too if its helpful. I think the BE logic looks good, but since its complex it took me awhile to convince myself it was correct, so most of my suggestions are around cleaning up comments http://gerrit.cloudera.org:8080/#/c/16942/11/be/src/exec/topn-node-ir.cc File be/src/exec/topn-node-ir.cc: http://gerrit.cloudera.org:8080/#/c/16942/11/be/src/exec/topn-node-ir.cc@44 PS11, Line 44: priority_queue_.Push(insert_tuple); Might be useful to explicitly mention here that we don't have to worry about ties going into the heap at this point, as we'll handle that with the complicated logic in InsertTupleWithTieHandling later if necessary. http://gerrit.cloudera.org:8080/#/c/16942/11/be/src/exec/topn-node-ir.cc@91 PS11, Line 91: // 'materialized_tuple' needs to be added. Figure out which other tuples, if any, : // need to be removed from the heap. I might move this above the DCHECK, to make it clearer its referring to the entire 'else' branch. http://gerrit.cloudera.org:8080/#/c/16942/11/be/src/exec/topn-node-ir.cc@93 PS11, Line 93: It will compare equal with any overflowed ties at the back of : // 'priority_queue_' Not sure what this is supposed to mean, since my understanding would be that 'overflowed' ties aren't in 'priority_queue_'. http://gerrit.cloudera.org:8080/#/c/16942/11/be/src/exec/topn-node-ir.cc@97 PS11, Line 97: 'top_tuple' Might be worth explicitly saying that 'top_tuple' is what we just popped off the heap. http://gerrit.cloudera.org:8080/#/c/16942/11/be/src/exec/topn-node-ir.cc@100 PS11, Line 100: // Removing all the ties would bring the heap below its capacity. We need to I find this comment a little confusing, since we've already done the one Pop() we need to do, its unclear why we would think about doing another Pop() and bring the heap below capacity just because the new top is tied with the old one. I think it would be better to say something along the lines of "The new top is still tied with the tuples in 'overflowed_ties_' so don't clear it, just add the previous top as another overflowed tuple" http://gerrit.cloudera.org:8080/#/c/16942/11/be/src/exec/topn-node.h File be/src/exec/topn-node.h: http://gerrit.cloudera.org:8080/#/c/16942/11/be/src/exec/topn-node.h@86 PS11, Line 86: /// Unpartitioned TopN Implementation Details nit: its a little confusing to call out 'unpartitioned mode' specifically when its the only mode currently supported. I guess it doesn't matter if the partitioned mode patch is going in soon, but you could add a "TODO: implement partitioned mode" or something. http://gerrit.cloudera.org:8080/#/c/16942/11/be/src/exec/topn-node.h@180 PS11, Line 180: RuntimeProfile::Counter* num_partitions_counter_ = nullptr; The counters here and below aren't used in this patch http://gerrit.cloudera.org:8080/#/c/16942/11/be/src/exec/topn-node.h@200 PS11, Line 200: GetNextPartitioned() nit: not in this patch http://gerrit.cloudera.org:8080/#/c/16942/11/be/src/exec/topn-node.h@256 PS11, Line 256: /// Helper to insert tuple row into the priority queue with tie handling. If I understand correctly, this function only gets called once the heap has reached capacity. I suppose that's sort of implicit, since that's when tie handling starts to matter, but might be useful to call this out explicitly. http://gerrit.cloudera.org:8080/#/c/16942/11/be/src/exec/topn-node.cc File be/src/exec/topn-node.cc: http://gerrit.cloudera.org:8080/#/c/16942/11/be/src/exec/topn-node.cc@306 PS11, Line 306: addition to I think this is a typo? i.e. these two words should be removed -- To view, visit http://gerrit.cloudera.org:8080/16942 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I801d7799b0d649c73d2dd1703729a9b58a662509 Gerrit-Change-Number: 16942 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Sat, 16 Jan 2021 00:12:42 +0000 Gerrit-HasComments: Yes
