[Impala-ASF-CR] IMPALA-5870: Improve explain/profile output for partial sort

2017-09-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8123 )

Change subject: IMPALA-5870: Improve explain/profile output for partial sort
..


Patch Set 1:

(3 comments)

The profile changes look great to me and make it way less confusing - just had 
one comment about variable names then I'm happy with that part of the patch. 
I'm unsure about the explain plan changes - I think that needs more thought and 
discussion. We could potentially decouple the two parts and get the profile 
changes in soon.

http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG@14
PS1, Line 14: For EXPLAIN, this patch removes the 'spill-buffer' mem-estimate 
for
I'm skeptical about this part of the change. It definitely a wart but there are 
downsides to changing it.

I think reporting the buffer size is still useful. It seems like this is mainly 
an issue with the name diplayed so if we want to change it I'd prefer directly 
changing the output bypass in a parameter to the resource profile to controls 
the display name and otherwise leaving the logic unchanged. Preaggregations 
also have the exact same naming inconsistency so I think whatever is done here 
should also be applied there.

Another problem is that changing the name makes inconsistent with the query 
options that control the buffer sizes - min_spillable_buffer_bytes and 
default_spillable_buffer_bytes. I think the naming of the query options is 
imperfect but a reasonable compromise - it disambiguates it from the scanner's 
I/O buffers and most of time it's accurate. In the cases where it's slightly 
inaccurate the non-spilling operators are at least variants of spilling 
operators. We could maybe make an argument for decoupling the non-spillable and 
spillable buffer size query options since the non-spilling sizes don't affect 
I/O performance. Unclear if it's worth adding more query options though.

I guess my preference is probably to leave the explain output unchanged and 
accept that there's some potential for confusion. I think a lot of the 
explain_level=2 output is low-level enough that you need to understand the 
mechanisms in order to interpret it.


http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG@40
PS1, Line 40: - RunsCreated: 1 (1)
This is much better!


http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc@1520
PS1, Line 1520: runs_counter_
"runs_counter_" is a bit inaccurate for spilling sorts since it doesn't include 
merged runs. Maybe we should keep the old variable name and just change the 
string?

"initial_runs_counter_" seems to make sense for both cases, since they are 
still initial runs for non-spilling sorts.



--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 22 Sep 2017 18:08:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5870: Improve explain/profile output for partial sort

2017-09-21 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8123


Change subject: IMPALA-5870: Improve explain/profile output for partial sort
..

IMPALA-5870: Improve explain/profile output for partial sort

A recent change (IMPALA-5498) added the ability to do partial sorts,
which divide their input up into runs each of which is sorted
individually, avoiding the need to spill. Some of the debug output
wasn't updated vs. regular sorts, leading to confusion.

For EXPLAIN, this patch removes the 'spill-buffer' mem-estimate for
partial sorts, since they can't spill. It does this by setting the
spillable buffer size in the resource profile to -1. Since the BE
relied on that number to determine the page size for sorts, it
now calculates the page size from the min reservation, which gives
an equivalent value.

For the runtime profile, it removes the counters 'SpilledRuns' and
'MergesPerformed' since they will always be 0, and it renames the
'IntialRunsCreated' counter to 'RunsCreated' since the 'Initial'
refers to the fact that in a regular sort those runs may be spilled
or merged.

It also adds a profile info string 'SortType' that can take the values
'Total', 'TopN', or 'Partial' to reflect the type of exec node being
used.

Example profile snippet for a partial sort:
SORT_NODE (id=2):(Total: 403.261us, non-child: 382.029us, % non-child: 94.73%)
 SortType: Partial
 ExecOption: Codegen Enabled
- NumRowsPerRun: (Avg: 44 (44) ; Min: 44 (44) ; Max: 44 (44) ; Number of 
samples: 1)
- InMemorySortTime: 34.201us
- PeakMemoryUsage: 2.02 MB (2117632)
- RowsReturned: 44 (44)
- RowsReturnedRate: 109.11 K/sec
- RunsCreated: 1 (1)
- SortDataSize: 572.00 B (572)

Testing:
- Manually ran several sorting queries and inspected their explain
  plans and profile

Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
---
M be/src/exec/partial-sort-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M fe/src/main/java/org/apache/impala/planner/SortNode.java
6 files changed, 22 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8123/1
--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall