Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8123 )
Change subject: IMPALA-5870: Improve explain/profile output for partial sort ...................................................................... Patch Set 1: (3 comments) The profile changes look great to me and make it way less confusing - just had one comment about variable names then I'm happy with that part of the patch. I'm unsure about the explain plan changes - I think that needs more thought and discussion. We could potentially decouple the two parts and get the profile changes in soon. http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG@14 PS1, Line 14: For EXPLAIN, this patch removes the 'spill-buffer' mem-estimate for I'm skeptical about this part of the change. It definitely a wart but there are downsides to changing it. I think reporting the buffer size is still useful. It seems like this is mainly an issue with the name diplayed so if we want to change it I'd prefer directly changing the output bypass in a parameter to the resource profile to controls the display name and otherwise leaving the logic unchanged. Preaggregations also have the exact same naming inconsistency so I think whatever is done here should also be applied there. Another problem is that changing the name makes inconsistent with the query options that control the buffer sizes - min_spillable_buffer_bytes and default_spillable_buffer_bytes. I think the naming of the query options is imperfect but a reasonable compromise - it disambiguates it from the scanner's I/O buffers and most of time it's accurate. In the cases where it's slightly inaccurate the non-spilling operators are at least variants of spilling operators. We could maybe make an argument for decoupling the non-spillable and spillable buffer size query options since the non-spilling sizes don't affect I/O performance. Unclear if it's worth adding more query options though. I guess my preference is probably to leave the explain output unchanged and accept that there's some potential for confusion. I think a lot of the explain_level=2 output is low-level enough that you need to understand the mechanisms in order to interpret it. http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG@40 PS1, Line 40: - RunsCreated: 1 (1) This is much better! http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc@1520 PS1, Line 1520: runs_counter_ "runs_counter_" is a bit inaccurate for spilling sorts since it doesn't include merged runs. Maybe we should keep the old variable name and just change the string? "initial_runs_counter_" seems to make sense for both cases, since they are still initial runs for non-spilling sorts. -- To view, visit http://gerrit.cloudera.org:8080/8123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b Gerrit-Change-Number: 8123 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Fri, 22 Sep 2017 18:08:47 +0000 Gerrit-HasComments: Yes
