[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches are added to current unsorted run or memory limit is hit,
sorter will immediately start the run. If the latter case happens,
sorter will spill the sorted run to disk after sort complete, create
new unsorted run object, and continue to add the next row batches, and
so on.

This algorithm tries to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speeds up the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_run_bytes_limit'.
If the total used reservation of quicksort has exceeded
sort_run_bytes_limit, current unsorted_run_ will be wrapped up,
sorted, and then spilled. Thus, overlapping the next sort run with
spill from previous sort run.

To reduce regression for cases where total input size of sort node
might be fully fit into available memory, sort_run_bytes_limit will
not be enforced for the first sort run. However, it will stay limited
by sort_run_bytes_limit if planner estimates hint that spill is
inevitably will happen.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) vs
  512 MB. The performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |   no limit|  512 MB 
limit   |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Reviewed-on: http://gerrit.cloudera.org:8080/15963
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M tests/query_test/test_sort.py
15 files changed, 224 insertions(+), 10 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 22
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 21: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 21
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 17:13:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 21: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 21
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 12:16:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-23 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 20: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 20
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 12:16:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6172/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 21
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 12:16:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 20:

Patch set 19 fail the same test, test_multiple_sort_run_bytes_limits.
Looks like admission controller does not respect buffer_pool_limit as much as 
mem_limit.

Patch set 20 change the test cases to use mem_limit instead of 
buffer_pool_limit, just as Tim initially suggest. Some of the 
sort_run_bytes_limit parameter also adjusted to keep the assertions true.
Fang-Yu help me verify that this Patch set 20 can pass 
ubuntu-16.04-dockerised-tests by rerunning it in this jenkins job:
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/2814/


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 20
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 02:47:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6692/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 20
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 23:01:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-22 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#20).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches are added to current unsorted run or memory limit is hit,
sorter will immediately start the run. If the latter case happens,
sorter will spill the sorted run to disk after sort complete, create
new unsorted run object, and continue to add the next row batches, and
so on.

This algorithm tries to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speeds up the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_run_bytes_limit'.
If the total used reservation of quicksort has exceeded
sort_run_bytes_limit, current unsorted_run_ will be wrapped up,
sorted, and then spilled. Thus, overlapping the next sort run with
spill from previous sort run.

To reduce regression for cases where total input size of sort node
might be fully fit into available memory, sort_run_bytes_limit will
not be enforced for the first sort run. However, it will stay limited
by sort_run_bytes_limit if planner estimates hint that spill is
inevitably will happen.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) vs
  512 MB. The performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |   no limit|  512 MB 
limit   |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M tests/query_test/test_sort.py
15 files changed, 224 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/20
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 20
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 19: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6163/


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 19
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 19:34:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6163/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 19
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 14:29:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 19:

The dockerised tests have memory-based admission control enabled so it's more 
likely that the query was given a memory limit by the admission controller and 
that it behaved differently because of that (rather than concurrency being a 
factor). You could explicitly set a higher mem_limit for that test if you want 
more control.

I think you're probably achieving the same thing by setting buffer_pool_limit.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 19
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 21 Jul 2020 22:47:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 19:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6683/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 19
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 21 Jul 2020 22:08:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-21 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 19:

Patch set 18 failed the first test case of test_multiple_sort_run_bytes_limits 
on ubuntu-16.04-dockerised-tests setup.
I assume it is due to memory pressure from other running queries that force 
sorter to spill, causing assertion error (there should be 0 spilled runs).

Patch set 19 tries to fix it by marking the test to execute serially and raise 
the buffer_pool_limit for the first test case from 2GB to 3GB.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 19
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 21 Jul 2020 21:48:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-21 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#19).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches are added to current unsorted run or memory limit is hit,
sorter will immediately start the run. If the latter case happens,
sorter will spill the sorted run to disk after sort complete, create
new unsorted run object, and continue to add the next row batches, and
so on.

This algorithm tries to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speeds up the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_run_bytes_limit'.
If the total used reservation of quicksort has exceeded
sort_run_bytes_limit, current unsorted_run_ will be wrapped up,
sorted, and then spilled. Thus, overlapping the next sort run with
spill from previous sort run.

To reduce regression for cases where total input size of sort node
might be fully fit into available memory, sort_run_bytes_limit will
not be enforced for the first sort run. However, it will stay limited
by sort_run_bytes_limit if planner estimates hint that spill is
inevitably will happen.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) vs
  512 MB. The performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |   no limit|  512 MB 
limit   |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M tests/query_test/test_sort.py
15 files changed, 228 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/19
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 19
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 18: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6154/


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 18
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 21 Jul 2020 20:51:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 18: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 18
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 21 Jul 2020 15:42:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 18:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6154/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 18
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 21 Jul 2020 15:42:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-21 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 17: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 17
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 21 Jul 2020 15:39:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 17:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6658/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 17
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 20 Jul 2020 19:02:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-20 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#17).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches are added to current unsorted run or memory limit is hit,
sorter will immediately start the run. If the latter case happens,
sorter will spill the sorted run to disk after sort complete, create
new unsorted run object, and continue to add the next row batches, and
so on.

This algorithm tries to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speeds up the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_run_bytes_limit'.
If the total used reservation of quicksort has exceeded
sort_run_bytes_limit, current unsorted_run_ will be wrapped up,
sorted, and then spilled. Thus, overlapping the next sort run with
spill from previous sort run.

To reduce regression for cases where total input size of sort node
might be fully fit into available memory, sort_run_bytes_limit will
not be enforced for the first sort run. However, it will stay limited
by sort_run_bytes_limit if planner estimates hint that spill is
inevitably will happen.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) vs
  512 MB. The performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |   no limit|  512 MB 
limit   |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M tests/query_test/test_sort.py
15 files changed, 224 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/17
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 17
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6483/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 16
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 02 Jul 2020 04:17:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-01 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#16).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches are added to current unsorted run or memory limit is hit,
sorter will immediately start the run. If the latter case happens,
sorter will spill the sorted run to disk after sort complete, create
new unsorted run object, and continue to add the next row batches, and
so on.

This algorithm tries to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speeds up the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_run_bytes_limit'.
If the total used reservation of quicksort has exceeded
sort_run_bytes_limit, current unsorted_run_ will be wrapped up,
sorted, and then spilled. Thus, overlapping the next sort run with
spill from previous sort run.

To reduce regression for cases where total input size of sort node
might be fully fit into available memory, sort_run_bytes_limit will
not be enforced for the first sort run. However, it will stay limited
by sort_run_bytes_limit if planner estimates hint that spill is
inevitably will happen.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) vs
  512 MB. The performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |   no limit|  512 MB 
limit   |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M tests/query_test/test_sort.py
15 files changed, 224 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/16
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 16
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-01 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 15:

> Patch Set 15: Code-Review+1
>
> Thanks for the changes! I can update to +2 if no one else has comments.
>
> One more thing to think about is potentially finding a good default for 
> sort_run_bytes_limit. I agree with doing it in another commit, but it could 
> be good to know where we will track this, e.g. in IMPALA-6692 or a new jira.

Thank you, Csaba.
Yes, once this get merged, I can file a follow up JIRA to track what is the 
best default value for sort_run_bytes_limit.

I'm also interested to refine how we buffer OutboundRowBatch in 
krpc-data-stream-sender.
Even with this IMPALA-6692 solution, with around 512mb sort_run_bytes_limit, we 
will still have 2-3s back pressure anytime we do intermediate sort.
We can utilize that time in data stream sender to do more buffering and 
serialization of next RowBatch while waiting for the pipeline to unblock.

Since IMPALA-5444 and IMPALA-9692 (part 3) get in first, it looks like this 
patch will hit merge conflict. I can prepare a rebase and submit a new patchset 
later today.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 15
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Jul 2020 19:35:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-01 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 15: Code-Review+1

Thanks for the changes! I can update to +2 if no one else has comments.

One more thing to think about is potentially finding a good default for 
sort_run_bytes_limit. I agree with doing it in another commit, but it could be 
good to know where we will track this, e.g. in IMPALA-6692 or a new jira.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 15
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Jul 2020 11:26:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6465/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 15
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 30 Jun 2020 14:45:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-30 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 15:

(8 comments)

Thank you, Csaba!

http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@10
PS14, Line 10: batches are added to current unsorted run or memory limit is hit,
> missing 'are'?
Done


http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@11
PS14, Line 11: sorter will immediately start the run. If the latter case 
happens,
> nit: happens
Done


http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@13
PS14, Line 13: new unsorted run object, and continue to add the next row 
batches, and
> missing 'to'
Done


http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@15
PS14, Line 15:
> nit: tries
Done


http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@23
PS14, Line 23:
> nit: speeds up
Done


http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@26
PS14, Line 26: 'sort_
> nit: exceeded
Done


http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@32
PS14, Line 32: ssion
> nit: fit
Done


http://gerrit.cloudera.org:8080/#/c/15963/14/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/14/be/src/runtime/sorter.cc@952
PS14, Line 952: 3
> I didn't catch this one - I think that it should be also VLOG(3)
Done



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 15
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 30 Jun 2020 14:18:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-30 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#15).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches are added to current unsorted run or memory limit is hit,
sorter will immediately start the run. If the latter case happens,
sorter will spill the sorted run to disk after sort complete, create
new unsorted run object, and continue to add the next row batches, and
so on.

This algorithm tries to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speeds up the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_run_bytes_limit'.
If the total used reservation of quicksort has exceeded
sort_run_bytes_limit, current unsorted_run_ will be wrapped up,
sorted, and then spilled. Thus, overlapping the next sort run with
spill from previous sort run.

To reduce regression for cases where total input size of sort node
might be fully fit into available memory, sort_run_bytes_limit will
not be enforced for the first sort run. However, it will stay limited
by sort_run_bytes_limit if planner estimates hint that spill is
inevitably will happen.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) vs
  512 MB. The performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |   no limit|  512 MB 
limit   |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M tests/query_test/test_sort.py
15 files changed, 225 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/15
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 15
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-30 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 14: Code-Review+1

(8 comments)

http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@10
PS14, Line 10: batches added to current unsorted run or memory limit is hit, 
sorter
missing 'are'?


http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@11
PS14, Line 11: will immediately start the run. If the latter case happen, 
sorter will
nit: happens


http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@13
PS14, Line 13: run object, and continue add the next row batches, and so on.
missing 'to'


http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@15
PS14, Line 15: try
nit: tries


http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@23
PS14, Line 23: speedup
nit: speeds up


http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@26
PS14, Line 26: exceed
nit: exceeded


http://gerrit.cloudera.org:8080/#/c/15963/14//COMMIT_MSG@32
PS14, Line 32: fitted
nit: fit


http://gerrit.cloudera.org:8080/#/c/15963/14/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/14/be/src/runtime/sorter.cc@952
PS14, Line 952: 2
I didn't catch this one - I think that it should be also VLOG(3)



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 30 Jun 2020 08:19:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6440/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 27 Jun 2020 02:11:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-26 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6439/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 27 Jun 2020 02:08:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-26 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 14:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h
File be/src/exec/sort-node.h:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h@77
PS12, Line 77: will
> nit: "will go"?
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@89
PS12, Line 89:
> Ok, in that case, we should just abandon estimate in case of cardinality is
This is now changed to obtain full input estimate from planner.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90:
> I will look at possibility to access that average size data in the backend.
I figure out how to get more precise estimate from planner. It is close enough 
to the actual input size that I observe in my test.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@92
PS12, Line 92: &
> I think that VLOG(3) is enough.
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101
PS12, Line 101: t
> "the" would be better
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101
PS12, Line 101:   /// 'estimated_input_size' is the total rows in bytes that 
are estimated to get added
> nit: missing "are"
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102
PS12, Line 102:   /// into this sorter. This is used to decide if sorter needs 
to proactively spill for
> nit: needs
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102
PS12, Line 102: ively sp
> nit: spill
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@223
PS12, Line 223: do
> nit: "do an"?
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@816
PS12, Line 816:   PrettyPrinter::PrintBytes(estimated_input_size),
> I think that VLOG(3) is enough here - this should happen if the cardinality
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@907
PS12, Line 907: PrettyPrinter::PrintBytes(GetSortRunBytesLimit()));
> Same as line 816.
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift@645
PS12, Line 645:  a join
> typo: backends
Done


http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py@74
PS12, Line 74: """The first sort run is given a privilege to ignore 
sort_run_bytes_limit, except
 :when estimate hints that spill is inevitable. The lower 
sort_run_bytes_limit of
 :a query is, the more sort runs are likely to be produced 
and spilled.
 :Case 1 : 0 SpilledRuns, because all rows fit within the 
maximum reservation.
 : sort_run_bytes_limit is not enforced.
 :Case 2 : 3 SpilledRuns, because the first run hit 
reservation limit, and the
 : next 2 runs are capped to 150m.
 :Case 3 : 4 SpilledRuns, because sort node estimate that 
spill is inevitable.
 : So all runs are capped to 130m, including the 
first one."""
> I will look at that 'query_result.runtime_profile'.
Done



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 27 Jun 2020 01:50:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-26 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#14).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches added to current unsorted run or memory limit is hit, sorter
will immediately start the run. If the latter case happen, sorter will
spill the sorted run to disk after sort complete, create new unsorted
run object, and continue add the next row batches, and so on.

This algorithm try to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speedup the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_run_bytes_limit'.
If the total used reservation of quicksort has exceed
sort_run_bytes_limit, current unsorted_run_ will be wrapped up,
sorted, and then spilled. Thus, overlapping the next sort run with
spill from previous sort run.

To reduce regression for cases where total input size of sort node
might be fully fitted into available memory, sort_run_bytes_limit will
not be enforced for the first sort run. However, it will stay limited
by sort_run_bytes_limit if planner estimates hint that spill is
inevitably will happen.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) vs
  512 MB. The performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |   no limit|  512 MB 
limit   |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M tests/query_test/test_sort.py
15 files changed, 225 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/14
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 14
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-26 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#13).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches added to current unsorted run or memory limit is hit, sorter
will immediately start the run. If the latter case happen, sorter will
spill the sorted run to disk after sort complete, create new unsorted
run object, and continue add the next row batches, and so on.

This algorithm try to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speedup the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_run_bytes_limit'.
If the total used reservation of quicksort has exceed
sort_run_bytes_limit, current unsorted_run_ will be wrapped up,
sorted, and then spilled. Thus, overlapping the next sort run with
spill from previous sort run.

To reduce regression for cases where total input size of sort node
might be fully fitted into available memory, sort_run_bytes_limit will
not be enforced for the first sort run. However, it will stay limited
by sort_run_bytes_limit if planner estimates hint that spill is
inevitably will happen.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) vs
  512 MB. The performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |   no limit|  512 MB 
limit   |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M tests/query_test/test_sort.py
15 files changed, 225 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/13
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-25 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90: GetRowSize
> I didn't dig too deep, but row_descriptor_->GetRowSize() seems to contain t
I will look at possibility to access that average size data in the backend.

But just to make sure I get it right.
For row that contain varlen data, the GetRowSize() will most likely 
underestimate the size, since it only takes account for the pointer, but not 
the string length itself?
So that, in turn, will cause return value of this ComputeInputSizeEstimate() to 
be underestimate as well.

But isn't this input size underestimation better than overestimation? In case 
of underestimation, the worse situation is that we don't enforce 
sort_run_bytes_limit  for the first run (hoping that all will fit in memory), 
turns out wrong and spill, but we then enforce sort_run_bytes_limit for the 
next runs. Overestimation is worse, because we unnecessarily spill from 
beginning when the input can possibly fit in the memory.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 20:09:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-25 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90: GetRowSize
> So what is the nature of varlen column? Is each row possibly will have diff
I didn't dig too deep, but row_descriptor_->GetRowSize() seems to contain the 
size of the tuple that holds a row - but in case of string and varchar it 
contains a pointer (+length), so there is additional data in some buffer.

The column stats contain AvgSize and MaxSize - these are constants for fixed 
sized types, but we calculate them for strings during COMPUTE STATS, so we can 
get a more or less accurate estimation for the total amount of memory consumed.

I don't know from the top of my head how to access this data in the backend.

Strings are very common, so many queries contain varlen slots. I am not sure if 
it is a good idea to create an optimization specifically for queries without 
strings.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 18:11:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-25 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 12:

(3 comments)

Thank you Csaba for your feedback!
I have couple follow up questions.

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@89
PS12, Line 89: cardinality
> What is the default value of this? Can it be -1 (unknown)? The result seems
Ok, in that case, we should just abandon estimate in case of cardinality is -1.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90: GetRowSize
> I think that this doesn't contain varlen data, so it can greatly underestim
So what is the nature of varlen column? Is each row possibly will have 
different sizes with large variations?
And what is GetRowSize() return in that case? Thinking if we should abandon the 
estimate entirely for input rows having varlen data.


http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py@74
PS12, Line 74: """The first sort run is given a privilege to ignore 
sort_run_bytes_limit, except
 :when estimate hints that spill is inevitable. The lower 
sort_run_bytes_limit of
 :a query is, the more sort runs are likely to be produced.
 :Case 1 : 1 run produced, because all rows fit within the 
maximum reservation.
 : sort_run_bytes_limit is not enforced.
 :Case 2 : 3 run produced, because the first run hit 
reservation limit, and the
 : next 2 runs are capped to 150m.
 :Case 3 : 4 run produced, because sort node estimate that 
spill is inevitable.
 : So all runs are capped to 130m, including the 
first one."""
> Isn't there something in query_result.runtime_profile that could be used to
I will look at that 'query_result.runtime_profile'.
Otherwise, I will change this test to run_test_case and verify the profile via 
regex.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 16:15:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-25 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 12:

(13 comments)

Sorry for the many grammar comments, I was also the victim of this in the past 
:)

My only real concern is about the case when the cardinality is unknown. My 
preference would be to try to allow spilling in that case.

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h
File be/src/exec/sort-node.h:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h@77
PS12, Line 77: going
nit: "will go"?


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@89
PS12, Line 89: cardinality
What is the default value of this? Can it be -1 (unknown)? The result seems 
pretty wrong in that case.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90: GetRowSize
I think that this doesn't contain varlen data, so it can greatly underestimate 
the input size if there are strings.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@92
PS12, Line 92: 2)
I think that VLOG(3) is enough.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101
PS12, Line 101: a
"the" would be better


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101
PS12, Line 101:   /// 'estimated_input_size' is a total rows in bytes that 
estimated to get added into
nit: missing "are"


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102
PS12, Line 102:   /// this sorter. This is used to decide if sorter need to 
proactively spilling for
nit: needs


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102
PS12, Line 102: spilling
nit: spill


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@223
PS12, Line 223: run
nit: "do an"?


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@816
PS12, Line 816:   VLOG(2) << Substitute(
I think that VLOG(3) is enough here - this should happen if the cardinality 
estimation was wrong, which may make WARNING logical, but this seems 
unavoidable for many queries, so I wouldn't spam the warning log.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@907
PS12, Line 907: VLOG(2) << Substitute(
Same as line 816.


http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift@645
PS12, Line 645: backeds
typo: backends


http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py@74
PS12, Line 74: """The first sort run is given a privilege to ignore 
sort_run_bytes_limit, except
 :when estimate hints that spill is inevitable. The lower 
sort_run_bytes_limit of
 :a query is, the more sort runs are likely to be produced.
 :Case 1 : 1 run produced, because all rows fit within the 
maximum reservation.
 : sort_run_bytes_limit is not enforced.
 :Case 2 : 3 run produced, because the first run hit 
reservation limit, and the
 : next 2 runs are capped to 150m.
 :Case 3 : 4 run produced, because sort node estimate that 
spill is inevitable.
 : So all runs are capped to 130m, including the 
first one."""
Isn't there something in query_result.runtime_profile that could be used to 
check some of these statements? E.g. I think we can check that no spilling 
occurred for case 1, but it did occur for case 2 and 3



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 15:00:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6338/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 16 Jun 2020 15:50:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-16 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 12:

(5 comments)

Patch Set 12 add num_backends info in TPlanFragmentInstanceCtx.

Using this info, sort node can make a better estimation of how many input data 
that might be received by each backend, and, if necessary, enforce 
sort_run_bytes_limit right from the beginning.

http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/exec/sort-node.cc@199
PS10, Line 199:  << " nulls " << (tsort_info.
> We usually simply use "Status status = ... ".
Done


http://gerrit.cloudera.org:8080/#/c/15963/11/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/11/be/src/exec/sort-node.cc@76
PS11, Line 76:   SCOPED_TIMER(runtime_profile_->total_time_counter());
 :   RETURN_IF_ERROR(ExecNode::Prepare(state));
> I'm not sure my calculation is right here. My assumption is that estimated_
Done.

We now piggyback num_backends info through TPlanFragmentInstanceCtx.


http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h@160
PS10, Line 160:
> typo
Done


http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h@211
PS10, Line 211: kely go
> nit: specifies
Done


http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h@217
PS10, Line 217:   /// Get value of sort_run_bytes_limit query option. If user 
specifies value between
> nit: cases
Done



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 16 Jun 2020 15:14:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-16 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#12).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches added to current unsorted run or memory limit is hit, sorter
will immediately start the run. If the latter case happen, sorter will
spill the sorted run to disk after sort complete, create new unsorted
run object, and continue add the next row batches, and so on.

This algorithm try to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speedup the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_run_bytes_limit'.
If the total used reservation of quicksort has exceed
sort_run_bytes_limit, current unsorted_run_ will be wrapped up,
sorted, and then spilled. Thus, overlapping the next sort run with
spill from previous sort run.

To reduce regression for cases where total input size of sort node
might be fully fitted into available memory, sort_run_bytes_limit will
not be enforced for the first sort run. However, it will stay limited
by sort_run_bytes_limit if planner estimates hint that spill is
inevitably will happen.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) vs
  512 MB. The performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |   no limit|  512 MB 
limit   |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_sort.py
13 files changed, 211 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/12
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-15 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 11:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/exec/sort-node.cc@199
PS10, Line 199: const ::impala::Status& add_statu
We usually simply use "Status status = ... ".
Status only has a single pointer to an error message (which is nullptr if it is 
OK), so there's is no win in using a ptr/reference.


http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h@160
PS10, Line 160: sot_rub
typo


http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h@211
PS10, Line 211: specify
nit: specifies


http://gerrit.cloudera.org:8080/#/c/15963/10/be/src/runtime/sorter.h@217
PS10, Line 217:   /// There are two case where it is necessary to run 
intermediate run.
nit: cases



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 15 Jun 2020 09:54:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 11:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@7
PS9, Line 7: IMPALA-6692
> This is one step towards solving the problem.  The back pressure problem st
Resolving this, and continue discussion in latest patch.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
> Patch set 10 add flag to either enforce sort_run_bytes_limit or not. It wil
Resolving this, and continue discussion in latest patch.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@40
PS9, Line 40: intermediary sort.
> I later did 256 MB limit and it achieved a little faster query time. Loweri
Done


http://gerrit.cloudera.org:8080/#/c/15963/11/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/11/be/src/exec/sort-node.cc@76
PS11, Line 76:   int64_t estimated_input_size = 
children_node->tnode_->estimated_stats.cardinality
 :   * children_node->row_descriptor_->GetRowSize();
I'm not sure my calculation is right here. My assumption is that 
estimated_stats.cardinality here is per backend, but it seems like it is a 
query wide.

In my experiment today, I did insert to tpcds_300_parquet.web_sales using 5 
executor backends. Each SORT_NODE in each backend is expected to process 5.67GB 
of row batches, which is more beneficial to fit everything in memory. Setting 
buffer_pool_limit between 8GB to 28GB still cap and spill the first sort run, 
while setting it as 29GB or above can successfully waive sort_run_bytes_limit 
for the first sort run (allowing it to use full memory). Maybe I should divide 
by the number of executor backend involved here.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 13 Jun 2020 00:58:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6231/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 08 Jun 2020 02:27:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-07 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#11).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches added to current unsorted run or memory limit is hit, sorter
will immediately start the run. If the latter case happen, sorter will
spill the sorted run to disk after sort complete, create new unsorted
run object, and continue add the next row batches, and so on.

This algorithm try to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speedup the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_run_bytes_limit'.
If the total used reservation of quicksort has exceed
sort_run_bytes_limit, current unsorted_run_ will be wrapped up,
sorted, and then spilled. Thus, overlapping the next sort run with
spill from previous sort run.

To reduce regression for cases where total input size of sort node
might be fully fitted into available memory, sort_run_bytes_limit will
not be enforced for the first sort run. However, it will stay limited
by sort_run_bytes_limit if planner estimates hint that spill is
inevitably will happen.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) vs
  512 MB. The performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |   no limit|  512 MB 
limit   |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/partial-sort-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_sort.py
11 files changed, 141 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/11
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-05 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6225/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jun 2020 21:35:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-05 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 10:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
> The following estimates vs actual from a partitioned parquet insert of cata
Patch set 10 add flag to either enforce sort_run_bytes_limit or not. It will 
not enforce sort_run_bytes_limit for the first run unless planner estimates 
hint that spill is inevitable.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@47
PS9, Line 47:
> This is the maximum time of a single AddBatchTime call.
I decide to keep the counter name as it is, because there is already another 
counter for quicksort total time.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@879
PS9, Line 879:   int num_processed = 0;
> Ahh, Sorter::enable_spilling_ is a bit misleading - that actually is whethe
Done, it is now summarized by enforce_sort_run_bytes_limit_ flag.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@880
PS9, Line 880:   int cur_batch_index = 0;
 :   while (cur_batch_index < batch->num_rows()) {
 : RETURN_IF_ERROR(AddBatchNoSpill(batch, cur_batch_index, &nu
> Ack
Done


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@894
PS9, Line 894: }
> Ack
Move the timer and counter to sort-node.cc


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc@904
PS9, Line 904: RETURN_IF_ERROR(
> Ack
I decide to silently cap it to minimal 32 MB as your later suggestion.
It feels more natural so that we can keep -1 and 0 to represent 'no limit' and 
not invalidate range between 1 byte to 32 MB.


http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@531
PS9, Line 531: intermediate
> Ack
Done


http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@533
PS9, Line 533:   SORT_RUN_BYTES_LIMIT = 103
> Ack
Done


http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py@79
PS9, Line 79: options = [('-1', '2g'), ('100m', '2g'), ('400m', '400m'), 
('120m', '400m')]
> Ack
Done



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jun 2020 21:00:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-05 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#10).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches added to current unsorted run or memory limit is hit, sorter
will immediately start the run. If the latter case happen, sorter will
spill the sorted run to disk after sort complete, create new unsorted
run object, and continue add the next row batches, and so on.

This algorithm try to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speedup the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_run_bytes_limit'.
If the total used reservation of quicksort has exceed
sort_run_bytes_limit, current unsorted_run_ will be wrapped up,
sorted, and then spilled. Thus, overlapping the next sort run with
spill from previous sort run.

To reduce regression for cases where total input size of sort node
might be fully fitted into available memory, sort_run_bytes_limit will
not be enforced for the first sort run. However, it will stay limited
by sort_run_bytes_limit if planner estimates hint that spill is
inevitably will happen.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) vs
  512 MB. The performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |   no limit|  512 MB 
limit   |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/partial-sort-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_sort.py
11 files changed, 141 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/10
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-05 Thread David Rorke (Code Review)
David Rorke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
> > Since the patch can have immediate benefit in some well known cases like
The following estimates vs actual from a partitioned parquet insert of catalog 
sales.  Multiple runs with differing sort limits.  All with 10 TB scale factor 
on 20 nodes with an explicit 32 GB memory limit set:

Sort Limit.Rows.   Est Rows.  Peak Mem.   Est Peak Mem
None 14.40B.15.59B. 25.60 GB   2.43 GB
8GB   14.40B 15.59B   8.00 GB2.43 GB
2GB   14.40B 15.59B   2.00 GB2.43 GB
512MB.  14.40B 15.59B  678.65 MB2.43 GB
256MB   14.40B 15.59B   1.51 GB2.43 GB

So cardinality estimate is pretty good in this case, estimated peak mem is very 
low for the reason you gave (sort estimate doesn't assume the full input will 
be kept in memory).



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jun 2020 19:45:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
> Since the patch can have immediate benefit in some well known cases like 
> partitioned inserts, we might consider putting it in with the query option 
> but without additional safeguards against regressions (but documenting the 
> cases where it's safe to use) and then explore some of these other approaches 
> in a follow up phase.

I'm fine with this, I just want to make sure we're set up with some freedom to 
change the behaviour in future.

As far as estimates go, there's a lot of room for improvement. The approach to 
estimates for SortNode is an outlier, since it doesn't actually estimate the 
ideal memory to keep it in-memory, unlike everything else. I'd be a little 
concerned that the estimate of the memory required to keep the full sort input 
in memory could be too high, since it depends on getting the cardinality and 
row size right for the full output of the plan tree.

Agree that the estimates would likely be very accurate for plain ETL cases 
where the source table has stats computed.

We could probably finesse the estimate logic a bit, e.g. use the full data size 
estimate up to a few GB, then switch to the two-phase estimate.

David, do you have some examples of numbers for the estimates vs actual that 
you mentioned?



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jun 2020 16:46:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-05 Thread David Rorke (Code Review)
David Rorke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
> Estimates could work well on very simple queries, but even a query like a s
I agree that we shouldn't use estimates in cases where they're likely to be 
unreliable.
Maybe we should separate the discussion into what we think is needed if we want 
to make this the default behavior vs what's good enough if it's enabled only 
with a non-default query option.   The potential for regression is obviously 
more of a concern if we're going to enable by default.
A few other thoughts:
* If we're looking for heuristics at planning or runtime to indicate whether 
this is likely to help we should also consider whether the sort has partitioned 
inputs.   It's not clear whether this approach helps if the input to the sort 
isn't partitioned.
* Regarding Tim's suggestion for making the sort more incremental but without 
spilling more aggressively - I agree this would be interesting to explore.  
It's not obvious how much of the benefit we're seeing in benchmarks of the 
current patch is caused by the earlier spilling vs just reducing burstiness 
through more incremental runs.
* Since the patch can have immediate benefit in some well known cases like 
partitioned inserts, we might consider putting it in with the query option but 
without additional safeguards against regressions (but documenting the cases 
where it's safe to use) and then explore some of these other approaches in a 
follow up phase.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jun 2020 16:05:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-05 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
> From a quick look at the planner estimate code it seems we're estimating on
Estimates could work well on very simple queries, but even a query like a 
scanning a huge table + a some predicate can be tricky -  should we pessimize 
by always spilling, even if predicate might turn out to be very selective? The 
sort node is generally at the "end" of the query, so in complex queries the 
estimates can be easily orders of magnitude off at that point. + stats can be 
missing/not up to date

For these reasons I think that a simple adaptive behavior or a more 
sophisticated solution mentioned by Tim (doing quicksort in smaller runs from 
the start but postpone spilling if possible) would be much more reliable. Using 
estimates in specific cases (e.g. full table scan without predicates) could be 
a further optimization.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jun 2020 09:21:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@879
PS9, Line 879: if (cur_batch_index < batch->num_rows()
> I think it is implied by line 872?
Ahh, Sorter::enable_spilling_ is a bit misleading - that actually is whether 
it's in the partial sort mode or not. You have Sorter::enable_spilling_ = true 
for full sorts, even if spilling is disabled at the query level.

Just as an example of what I mean, if you run a sort query that requires 500mb 
with, say, scratch_limit=0, then currently that will succeed so long as it can 
get the memory.

But I think if you set sort_bytes_limit=100mb, then it would try to spill and 
fail, even thought there's memory available.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 05 Jun 2020 00:38:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread David Rorke (Code Review)
David Rorke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
> I'm not necessarily opposed to this approach (enforcing limit only after sp
>From a quick look at the planner estimate code it seems we're estimating only 
>what needs to be kept in memory assuming an external 2-phase sort (so assuming 
>we'll spill).  So maybe we should just look at the full input size and enforce 
>the sort_bytes_limit from the outset if the full input size is > the memory 
>limit.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 22:50:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@40
PS9, Line 40:   is varied between unspecified (not limited) vs 512 MB. The
> Did you do any experiments with lower values? Wondering if that helps furth
I later did 256 MB limit and it achieved a little faster query time. Lowering 
to 128 MB make the query time worse.
In David's large scale experiment though, there is not much difference between 
256 MB vs 512 MB.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@879
PS9, Line 879: if (cur_batch_index < batch->num_rows()
> I have some concern with how this will interact with spilling being disable
I think it is implied by line 872?

  DCHECK(enable_spilling_);


http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@531
PS9, Line 531: intermediate
> intermediate sort isn't that clear - I guess "intermediate sort runs"
Ack


http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@533
PS9, Line 533:   SORT_BYTES_LIMIT = 103
> I think we might actually want to make this SORT_RUN_BYTES_LIMIT or somethi
Ack



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 23:00:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(4 comments)

I'm thinking through the high level of this. I think the biggest obstacle to 
using this (or turning on by default) is that it can make queries spill that 
didn't spill before, or spill more than before, and that can cause issues.

I guess if the sort is already spilling the different is smaller.

It might be more of a clear win if it achieved the more incremental sorting 
behaviour without the extra spilling, e.g. by having sorted in-memory runs (as 
opposed to sorted spilled runs and unsorted in-memory runs). I don't know that 
I necessarily want to expand the scope of this commit, but would be interested 
on your thoughts on that.

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@40
PS9, Line 40:   is varied between unspecified (not limited) vs 512 MB. The
Did you do any experiments with lower values? Wondering if that helps further.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@879
PS9, Line 879: if (cur_batch_index < batch->num_rows()
I have some concern with how this will interact with spilling being disabled 
(scratch_limit=0 or disable_unsafe_spills=true, with some missing stats). In 
those cases StartSpilling() returns an error.

Other operators only spill when the alternative is failing the query with OOM, 
so it's fine to just fail the query. But here we could hit that even if there's 
plenty of memory. I think it would make sense to also check if spilling is 
enabled.


http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@531
PS9, Line 531: intermediate
intermediate sort isn't that clear - I guess "intermediate sort runs"


http://gerrit.cloudera.org:8080/#/c/15963/9/common/thrift/ImpalaService.thrift@533
PS9, Line 533:   SORT_BYTES_LIMIT = 103
I think we might actually want to make this SORT_RUN_BYTES_LIMIT or something 
along those lines. This name kinda boxes us into it being an upper bound on the 
total memory used by the sort node, whereas we might want to use more memory to 
optimise this further.

One way I can see this evolving is that, if we had some free memory to play 
with when hitting this limit, we could defer spilling and keep some of the runs 
in memory. Maybe even incrementally do the quicksort work on previous runs in 
AddBatch() to reduce the blocking time further.

If we did that, SORT_BYTES_LIMIT would be pretty misleading.

Another case I thought about was the interaction with spilling being disabled - 
i left a comment about that in sorter.cc.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 22:37:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread David Rorke (Code Review)
David Rorke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
> Great idea! I will try to implement it that way.
I'm not necessarily opposed to this approach (enforcing limit only after 
spilling starts) but if we had confidence in the memory estimate it seems like 
we could enforce the limit from the start if the estimate is > the memory limit 
(we're very likely to spill).  Unfortunately in some of the queries I'm looking 
at our estimates are lower than the actual peak consumed (with no limit) by an 
order of magnitude even though the queries end up spilling heavily.
So we'll have to look into why those estimates are so bad, but for now maybe we 
should file a follow up JIRA to go back and make the application of the limit 
consider the estimate once we've improved the estimates (and consider a TODO in 
the code).



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 22:25:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@7
PS9, Line 7: IMPALA-6692
> Should this solve the issue in general, or this is just one step in that di
This is one step towards solving the problem.  The back pressure problem still 
exist in some degree, but should be decreased using this method.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
> I was thinking about ways to minimize the regression this can cause in quer
Great idea! I will try to implement it that way.
I think there are two points that I need to pay attention in regards to this 
idea.

First, the size of the first sort run, where sort_bytes_limit is not enforced 
yet, can be couple times bigger than the rest of other runs. I'm not sure how 
the merge phase will behave in that case. I suppose it may be work just fine, 
since in many case the very last run will have different size (smaller) as well.

Second, when sort_bytes_limit is going to be enforced, I need to make sure to 
lower memory reservation for that SORT NODE to at least sort_bytes_limit. The 
reservation reduction might need to be done gradually as the sorted run pages 
are unpinned.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@47
PS9, Line 47: AddBatchTime
> Does this mean the maximum time of a single AddBatchTime call, or the total
This is the maximum time of a single AddBatchTime call.
AddBatchTime is implemented as SummaryStatsCounter showing min, max, average, 
and sample count.

In regards to your proposed idea earlier, I probably need to rethink about this 
counter as well.
The purpose of this counter was to measure the effect of selected 
sort_bytes_limit value towards the time a SORT NODE blocked doing quicksort.
If we not enforcing sort_bytes_limit in the first run, the Max AddBatchTime 
will then always associated with the first quicksort.

Maybe I need to refocus this counter to just measure the quicksort time. Rename 
'AddBatchTime' to 'QuickSortTime'.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@880
PS9, Line 880: || (state_->query_options().sort_bytes_limit > 0
 :&& buffer_pool_client_->GetUsedReservation()
 :>= state_->query_options().sort_bytes_limit)
> nit: indentation looks a bit messy, but I am not sure what is the rule here
Ack


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@894
PS9, Line 894:   timer.Stop();
> I don't know if it matters, but this means that the counter won't be increa
Ack

Will rethink a better way to wrap the timer.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc@904
PS9, Line 904: RETURN_IF_ERROR(ParseMemValue(value, "sort bytes limit", 
&sort_bytes_limit));
> I wonder if it would make sense add a minimum limit, e.g. 32MB. The problem
Ack


http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py@79
PS9, Line 79: query, exec_option, table_format=table_format)
> nit: +2 indentation
Ack



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 16:13:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-04 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

(7 comments)

The change looks good to me in general, I only have some nits and improvement 
suggestions.

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@7
PS9, Line 7: IMPALA-6692
Should this solve the issue in general, or this is just one step in that 
direction? The 3-6 second AddBatchTime still seems enough to fill the buffers 
and cause back pressure.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@23
PS9, Line 23: This patch speedup the decision to start the sort without waiting 
it
: to hit memory limit first by capping the intermediary quicksort 
run to
: lower memory limit,
I was thinking about ways to minimize the regression this can cause in queries 
that spill more because of the sort_bytes_limit, e.g. where the memory need of 
the sort is between sort_bytes_limit and mem limit.

A compromise could be to start using sort_bytes_limit only after the sorter had 
to spill at least once. This would mean no spilling if it is not needed at all, 
and the "big sort + spilling" could only occur once for an instance, so the 
penalty would not scale with the size of the data set.

This may potentially allow us to set a safe default for sort_bytes_limit that 
rarely leads to regression but still helps in huge queries.


http://gerrit.cloudera.org:8080/#/c/15963/9//COMMIT_MSG@47
PS9, Line 47: AddBatchTime
Does this mean the maximum time of a single AddBatchTime call, or the total 
time of fragment instance that spent to most time in the function?


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@880
PS9, Line 880: || (state_->query_options().sort_bytes_limit > 0
 :&& buffer_pool_client_->GetUsedReservation()
 :>= state_->query_options().sort_bytes_limit)
nit: indentation looks a bit messy, but I am not sure what is the rule here

Moving the check to a function like bool ReachedSortBytesLimit() could help.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/runtime/sorter.cc@894
PS9, Line 894:   timer.Stop();
I don't know if it matters, but this means that the counter won't be increased 
if the function returns with an error, while it is possible that a huge amount 
of time was spent here.


http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/15963/9/be/src/service/query-options.cc@904
PS9, Line 904: RETURN_IF_ERROR(ParseMemValue(value, "sort bytes limit", 
&sort_bytes_limit));
I wonder if it would make sense add a minimum limit, e.g. 32MB. The problem 
with small values is that it would greatly increase the number of runs, which 
can lead to doing the merge phase of the sort in multiple levels, as all runs 
during a merge needs a buffer, see Sorter::MaxRunsInNextMerge(). It seems also 
ok to enforce this in Sorter without notifying the user.


http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/9/tests/query_test/test_sort.py@79
PS9, Line 79: query, exec_option, table_format=table_format)
nit: +2 indentation



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 04 Jun 2020 14:31:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6209/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 03 Jun 2020 23:43:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6206/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 03 Jun 2020 23:11:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-03 Thread Riza Suminto (Code Review)
Hello David Rorke, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#9).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

Sorter node works by adding row batches to a sort run. After all
batches added to current unsorted run or memory limit is hit, sorter
will immediately start the run. If the latter case happen, sorter will
spill the sorted run to disk after sort complete, create new unsorted
run object, and continue add the next row batches, and so on.

This algorithm try to fit as much rows into memory before start
sorting. However, in the case of partitioned sort with large number of
row batches, fitting too much rows into memory will cause the sort to
be slow and block the sorter node for a long time before it can
release some memory and continue accepting the next row batch from
exchange node. One slow sorter node can block exchange node from
sending row batches to other sorter node that is free.

This patch speedup the decision to start the sort without waiting it
to hit memory limit first by capping the intermediary quicksort run to
lower memory limit, determined by query option 'sort_bytes_limit'. If
the total used reservation of quicksort has exceed sort_bytes_limit,
current unsorted_run_ will be wrapped up, sorted, and then spilled.
Thus, overlapping the next sort run with spill from previous sort run.

We also add new summary counter 'AddBatchTime' to get summary of how
much time spent in Sorter::AddBatch. Max of 'AddBatchTime' indicate
the longest time spent in Sorter::AddBatch, presumably busy doing
intermediary sort.

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_bytes_limits
- Run core tests
- Run data loading of 3 largest TPC-DS facts table of 300GB scale into
  real cluster using 5 backends, and 4GB mem_limit. sort_bytes_limit
  is varied between unspecified (not limited) vs 512 MB. The
  performance result is summarized in the following table.

+---+-+--+---+-+
|  Insert table |  #Rows  |  Avg |  no sort_bytes_limit  | 512 MB 
sort_bytes_limit |
|   | | SortDataSize 
++--+-+---+
|   | |   per Node   |  Query |  Max |  Query  |
  Max  |
|   | |  |  Time  | AddBatchTime |   Time  |  
AddBatchTime |
+---+-+--++--+-+---+
| store_sales   | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m |
   5s634ms |
+---+-+--++--+-+---+
| catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms |  15m27s |
   3s603ms |
+---+-+--++--+-+---+
| web_sales | 216.01M |  5.67 GB |  8m16s | 29s250ms |   6m41s |
   3s856ms |
+---+-+--++--+-+---+

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M tests/query_test/test_sort.py
8 files changed, 49 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/9
--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto