[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
read/write pages in GroupingAggregator
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6198/


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 31 Jul 2020 05:14:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
read/write pages in GroupingAggregator
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6198/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 30 Jul 2020 23:39:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator

2020-07-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
read/write pages in GroupingAggregator
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6727/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 29 Jul 2020 03:36:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator

2020-07-28 Thread Quanlong Huang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16240

to look at the new patch set (#4).

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
read/write pages in GroupingAggregator
..

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages 
in GroupingAggregator

The minimum requirement for a spillable operator is ((min_buffers -2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully. So it
won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.

To be specific, we save extra reservation for writing a large page. It's
only used when we run out of unused reservation and fail to increase the
reservation to fit the large page. Currently there are two cases in
non-streaming GroupingAggregator. One case is when we start to spill a
partition and a serialize stream is needed to write some large pages.
The other case is when we have spilled all partitions in a repartition
process and need to write a large page to a spilled partition. Note that
each spilled partition in the repartition process still keeps the
default_page_size worth of reservation for writing a default page. We
can only restore the extra reservation when a partition is actually
writing a large page, and then reclaim it after the writing.

The same for extra reservation for reading a large page. In the
repartition process, we may read large pages from the input stream (from
a previous spilled partition). When it needs to pin the current large
page, we restore the extra reservation, and then reclaim it when the
attached row batch is reset.

This patch also fixes the wrong assumption that non-streaming
GroupingAggregator only requires one buffer reservation for the hash
tables. The minimal spillable buffer size is 64KB, while the minimal
requirement of a non-streaming GroupingAggregator's hash tables is
num_buckets(1024) * bucket_size(16) * partition_fanout(16) = 256KB.
We should reserve more buffers when the spillable buffer size is small.
Fix some planner test failures due to this change.

Tests:
 - Add spilling tests with memory pressure to verify
   GroupingAggregator works in min reservation.
 - Run core tests

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
---
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
A 
testdata/workloads/functional-query/queries/QueryTest/spilling-aggs-large-rows.test
M tests/query_test/test_spilling.py
16 files changed, 402 insertions(+), 139 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/4
--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator

2020-07-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
read/write pages in GroupingAggregator
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6188/


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 28 Jul 2020 14:17:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator

2020-07-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
read/write pages in GroupingAggregator
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6188/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 28 Jul 2020 09:01:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator

2020-07-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
read/write pages in GroupingAggregator
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6722/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 28 Jul 2020 09:01:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator

2020-07-28 Thread Quanlong Huang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16240

to look at the new patch set (#3).

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
read/write pages in GroupingAggregator
..

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages 
in GroupingAggregator

The minimum requirement for a spillable operator is ((min_buffers -2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully. So it
won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.

To be specific, we save extra reservation for writing a large page. It's
only used when we run out of unused reservation and fail to increase the
reservation to fit the large page. Currently there are two cases in
non-streaming GroupingAggregator. One case is when we start to spill a
partition and a serialize stream is needed to write some large pages.
The other case is when we have spilled all partitions in a repartition
process and need to write a large page to a spilled partition. Note that
each spilled partition in the repartition process still keeps the
default_page_size worth of reservation for writing a default page. We
can only restore the extra reservation when a partition is actually
writing a large page, and then reclaim it after the writing.

The same for extra reservation for reading a large page. In the
repartition process, we may read large pages from the input stream (from
a previous spilled partition). When it needs to pin the current large
page, we restore the extra reservation, and then reclaim it when the
attached row batch is reset.

This patch also fixes the wrong assumption that non-streaming
GroupingAggregator only requires one buffer reservation for the hash
tables. The minimal spillable buffer size is 64KB, while the minimal
requirement of a non-streaming GroupingAggregator's hash tables is
num_buckets(1024) * bucket_size(16) * partition_fanout(16) = 256KB.
We should reserve more buffers when the spillable buffer size is small.
Fix some planner test failures due to this change.

Tests:
 - Add spilling tests with memory pressure to verify
   GroupingAggregator works in min reservation.
 - Run core tests

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
---
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
A 
testdata/workloads/functional-query/queries/QueryTest/spilling-aggs-large-rows.test
M tests/query_test/test_spilling.py
16 files changed, 355 insertions(+), 136 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/3
--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator

2020-07-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
read/write pages in GroupingAggregator
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6720/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 28 Jul 2020 07:57:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator

2020-07-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
read/write pages in GroupingAggregator
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16240/2/be/src/exec/grouping-aggregator-partition.cc
File be/src/exec/grouping-aggregator-partition.cc:

http://gerrit.cloudera.org:8080/#/c/16240/2/be/src/exec/grouping-aggregator-partition.cc@144
PS2, Line 144:   // make clean up easier (someone has to finalize this 
stream and we don't want to
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 28 Jul 2020 07:35:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator

2020-07-28 Thread Quanlong Huang (Code Review)
Quanlong Huang has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
read/write pages in GroupingAggregator
..

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages 
in GroupingAggregator

The minimum requirement for a spillable operator is ((min_buffers -2) *
default_buffer_size) + 2 * max_row_size. In the min reservation, we only
reserve space for two large pages, one for reading, the other for
writing. However, to make the non-streaming GroupingAggregator work
correctly, we have to manage these extra reservations carefully. So it
won't run out of the min reservation when it actually needs to spill a
large page, or when it actually needs to read a large page.

To be specific, we save extra reservation for writing a large page. It's
only used when we run out of unused reservation and fail to increase the
reservation to fit the large page. Currently there are two cases in
non-streaming GroupingAggregator. One case is when we start to spill a
partition and a serialize stream is needed to write some large pages.
The other case is when we have spilled all partitions in a repartition
process and need to write a large page to a spilled partition. Note that
each spilled partition in the repartition process still keeps the
default_page_size worth of reservation for writing a default page. We
can only restore the extra reservation when a partition is actually
writing a large page, and then reclaim it after the writing.

The same for extra reservation for reading a large page. In the
repartition process, we may read large pages from the input stream (from
a previous spilled partition). When it needs to pin the current large
page, we restore the extra reservation, and then reclaim it when the
attached row batch is reset.

This patch also fixes the wrong assumption that non-streaming
GroupingAggregator only requires one buffer reservation for the hash
tables. The minimal spillable buffer size is 64KB, while the minimal
requirement of a non-streaming GroupingAggregator's hash tables is
num_buckets(1024) * bucket_size(16) * partition_fanout(16) = 256KB.
We should reserve more buffers when the spillable buffer size is small.
Fix some planner test failures due to this change.

Tests:
 - Add spilling tests with memory pressure to verify
   GroupingAggregator works in min reservation.
 - Run core tests

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
---
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/grouping-aggregator.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/bufferpool/reservation-tracker.cc
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
A 
testdata/workloads/functional-query/queries/QueryTest/spilling-aggs-large-rows.test
M tests/query_test/test_spilling.py
16 files changed, 340 insertions(+), 136 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/16240/2
--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang