Tim Armstrong has posted comments on this change. (http://gerrit.cloudera.org:8080/8123)

Change subject: IMPALA-5870: Improve explain/profile output for partial sort
......................................................................


Patch Set 1:

(3 comments)

The profile changes look great to me and make it way less confusing. I just had
one comment about variable names, and then I'm happy with that part of the patch.
I'm unsure about the explain plan changes; I think they need more thought and
discussion. We could potentially decouple the two parts and get the profile
changes in soon.

http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG@14
PS1, Line 14: For EXPLAIN, this patch removes the 'spill-buffer' mem-estimate for
I'm skeptical about this part of the change. It's definitely a wart, but there are
downsides to changing it.

I think reporting the buffer size is still useful. This seems to be mainly an
issue with the displayed name, so if we want to change it, I'd prefer directly
changing the output by passing in a parameter to the resource profile that
controls the display name, and otherwise leaving the logic unchanged.
Preaggregations also have exactly the same naming inconsistency, so I think
whatever is done here should also be applied there.

Another problem is that changing the name makes it inconsistent with the query
options that control the buffer sizes, min_spillable_buffer_bytes and
default_spillable_buffer_bytes. I think the naming of the query options is
imperfect but a reasonable compromise: it disambiguates them from the scanner's
I/O buffers, and most of the time it's accurate. In the cases where it's slightly
inaccurate, the non-spilling operators are at least variants of spilling
operators. We could perhaps argue for decoupling the non-spillable and
spillable buffer size query options, since the non-spilling sizes don't affect
I/O performance. It's unclear whether it's worth adding more query options, though.

I guess my preference is probably to leave the explain output unchanged and 
accept that there's some potential for confusion. I think a lot of the 
explain_level=2 output is low-level enough that you need to understand the 
mechanisms in order to interpret it.


http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG@40
PS1, Line 40:     - RunsCreated: 1 (1)
This is much better!


http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc@1520
PS1, Line 1520: runs_counter_
"runs_counter_" is a bit inaccurate for spilling sorts since it doesn't include 
merged runs. Maybe we should keep the old variable name and just change the 
string?

"initial_runs_counter_" seems to make sense for both cases, since they are 
still initial runs for non-spilling sorts.



--
To view, visit http://gerrit.cloudera.org:8080/8123

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 22 Sep 2017 18:08:47 +0000
Gerrit-HasComments: Yes
