[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8359/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sun, 14 Mar 2021 03:49:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Aman Sinha (Code Review)
Aman Sinha has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Reviewed-on: http://gerrit.cloudera.org:8080/17166
Reviewed-by: Aman Sinha 
Tested-by: Aman Sinha 
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 143 insertions(+), 6 deletions(-)

Approvals:
  Aman Sinha: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 7: Verified+1 Code-Review+2

Carry prior +2 for code review and +1 for gerrit-verify-dryrun.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sun, 14 Mar 2021 03:35:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 7:

Patch set 7 is a rebase of patch set 6.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sun, 14 Mar 2021 03:32:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#7).

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 143 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/7
--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 23:42:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6966/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 18:02:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

> Patch Set 6:
>
> > Yes, let's do that. I presume you will explicitly enable result_spooling 
> > only for the following test?
> > def test_result_spooling_and_varying_scratch_limit(self, vector):
>
> Yes. My plan is to disable result spooling as a separate patch for 
> IMPALA-10559.
> This patch then only have result spooling enabled for 
> test_result_spooling_and_varying_scratch_limit.

Will restart a run.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 18:02:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

> Yes, let's do that. I presume you will explicitly enable result_spooling only 
> for the following test?
> def test_result_spooling_and_varying_scratch_limit(self, vector):

Yes. My plan is to disable result spooling as a separate patch for IMPALA-10559.
This patch then only have result spooling enabled for 
test_result_spooling_and_varying_scratch_limit.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 17:36:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

> Patch Set 6:
>
> > Patch Set 6:
> >
> > > Patch Set 6:
> > >
> > > > Patch Set 6: Verified-1
> > > >
> > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/
> > >
> > > Looks like it hit flaky 
> > > TestScratchLimit::test_with_unlimited_scratch_limit (IMPALA-10559).
> >
> > Yeah, that is one failure..but I also see for the dockerized tests bunch of 
> > other failures: 
> > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3935/ although 
> > it seems the failures are all related to lost connection.
> > E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> > EINNER EXCEPTION:  > 'thrift.transport.TTransport.TTransportException'>
> > EMESSAGE: TSocket read 0 bytes
>
> TestScratchLimit::test_with_unlimited_scratch_limit crash one of the impalad 
> due to DCHECK violation in reservation-tracker.cc:436. It is a violation when 
> BufferedTupleStream is spilling and subsequently allocating new page for read.
>
> It is unrelated with this patch. But the combination of sort query + result 
> spooling in this test seems to reveal another bug in BufferedTupleStream.
> At this point, I think we should disable result spooling for all test in 
> TestScratchLimit (per discussion in IMPALA-10559 jira) and investigate it 
> separately. What do you think?

Yes, let's do that. I presume you will explicitly enable result_spooling only 
for the following test?
def test_result_spooling_and_varying_scratch_limit(self, vector):


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 17:30:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

> Patch Set 6:
>
> > Patch Set 6:
> >
> > > Patch Set 6: Verified-1
> > >
> > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/
> >
> > Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit 
> > (IMPALA-10559).
>
> Yeah, that is one failure..but I also see for the dockerized tests bunch of 
> other failures: 
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3935/ although it 
> seems the failures are all related to lost connection.
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION:  'thrift.transport.TTransport.TTransportException'>
> EMESSAGE: TSocket read 0 bytes

TestScratchLimit::test_with_unlimited_scratch_limit crash one of the impalad 
due to DCHECK violation in reservation-tracker.cc:436. It is a violation when 
BufferedTupleStream is spilling and subsequently allocating new page for read.

It is unrelated with this patch. But the combination of sort query + result 
spooling in this test seems to reveal another bug in BufferedTupleStream.
At this point, I think we should disable result spooling for all test in 
TestScratchLimit (per discussion in IMPALA-10559 jira) and investigate it 
separately. What do you think?


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 17:10:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

> Patch Set 6:
>
> > Patch Set 6: Verified-1
> >
> > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/
>
> Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit 
> (IMPALA-10559).

Yeah, that is one failure..but I also see for the dockerized tests bunch of 
other failures: 
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3935/ although it 
seems the failures are all related to lost connection.
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EINNER EXCEPTION: 
EMESSAGE: TSocket read 0 bytes


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 16:58:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

> Patch Set 6: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/

Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit 
(IMPALA-10559).


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 15:49:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 12:46:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8357/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 07:04:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 06:55:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 06:53:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@123
PS5, Line 123:  // If maxAllowedScratchLimit < minMemReservat
> nit: Shouldn't this be maxAllowedScratchLimit < minMemReservationBytes ?
Thanks! Sorry for missing this.


http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@125
PS5, Line 125: If maxAllowedScratchLimit < maxMemReserva
> nit: Similar update needed here ?
Done



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 06:44:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#6).

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 143 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/6
--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 5: Code-Review+1

(2 comments)

Couple of nits. Rest of it LGTM.

http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@123
PS5, Line 123:  // If scratch_limit < maxAllowedScratchLimit,
nit: Shouldn't this be maxAllowedScratchLimit < minMemReservationBytes ?


http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@125
PS5, Line 125: If scratch_limit < maxAllowedScratchLimit
nit: Similar update needed here ?



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 06:39:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8355/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 04:17:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@120
PS4, Line 120:
> ACK.
Done


http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS4, Line 142:   queryOptions.setSpool_query_results(false);
> Yeah, that will make the code cleaner.  Agree about the first point about u
Done



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 03:59:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#5).

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 143 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/5
--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS4, Line 142:   maxMemReservationBytes = scratchLimit - 
maxRowBufferSize;
> If scratch_limit is unbounded, the maxMemReservationBytes calculation in li
Yeah, that will make the code cleaner.  Agree about the first point about 
unbounded scratch_limit.



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 02:05:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@120
PS4, Line 120: scratch_limit < minMemReservationBytes
> Update this and the one below to account for the extra maxRowBufferSize
ACK.


http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS4, Line 142:   maxMemReservationBytes = scratchLimit - 
maxRowBufferSize;
> For this adjustment for maxRowBufferSize, can we not just do it up front (o
If scratch_limit is unbounded, the maxMemReservationBytes calculation in line 
114 is OK. Little overspill will not fail the query.
In contrary, if scratch_limit is bounded, just a little overspill will 
terminate the query because scratch_limit is strictly enforced.

What if I tidy up the comparison a bit so that it looks simpler? We define

  long maxAllowedScratchLimit = scratchLimit - maxRowBufferSize;

Instead of comparing against scratchLimit, these should compare against 
maxAllowedScratchLimit;



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 01:47:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@120
PS4, Line 120: scratch_limit < minMemReservationBytes
Update this and the one below to account for the extra maxRowBufferSize


http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS4, Line 142:   maxMemReservationBytes = scratchLimit - 
maxRowBufferSize;
For this adjustment for maxRowBufferSize, can we not just do it up front (on 
line 114) since we know that maxMemReservationBytes should always be 
conservative such that it leaves a cushion for maxRowBufferSize. It should 
simplify the logic and presumably not cause other side effects (unless I am 
missing something).



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 13 Mar 2021 01:14:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8351/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 12 Mar 2021 20:12:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 4:

(8 comments)

My exhaustive test run last night reveal that scratch_limit might still get 
violated if maxMemReservationBytes is equal to scratch_limit. This is because 
the content of SpillableRowBatchQueue can be slightly higher than 
maxMemReservationBytes when it decide to spill.

To anticipate that, I lower the spooling mem config a little further here in 
Patch Set 4.

http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc@1104
PS3, Line 1104:   // max_spilled_result_spooling_mem (a value of 0 means memory 
is unbounded).
> I just figured out in ParseUtil::ParseMemSpec() that -1 for memory query op
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@77
PS3, Line 77:* If SPOOL_QUERY_RESULTS is true, then the ResourceProfile 
sets a min/max resevation,
> Some of the method level comment should be updated to reflect the behavior
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@92
PS3, Line 92:
> nit: typo ?
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@110
PS3, Line 110:   long bufferSize = 
queryOptions.getDefault_spillable_buffer_size();
 :   long maxRowBufferSize = 
PlanNode.computeMaxSpillableBufferSize(
> It sounds like an existing bug.  If you can create a test case for it can y
I filed IMPALA-10583. Will work on that next.


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126
PS3, Line 126:
> Suggest rewording:  'to >=' minMemReservationBytes
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126
PS3, Line 126:
> nit: 'increasing'
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS3, Line 142:   maxMemReservationBytes = scratchLimit - 
maxRowBufferSize;
> Would be useful to add a trace level log message here as well.
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
File testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test:

http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test@2
PS3, Line 2:  QUERY
> Could you add 1 tests with empty scratch dirs ?
Since scratch_dirs is a backend flag, I piggy back the test under 
TestScratchDir::test_no_dirs



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 12 Mar 2021 20:00:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#4).

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 140 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/4
--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-12 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@122
PS3, Line 122:   if (scratchLimit > -1) {
> Should this check be scratchLimit > 0 since -1 or 0 mean unbounded right ?
Ignore this comment.  I  was probably thinking about the memory setting (side 
effects of late night review).



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 12 Mar 2021 15:37:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-11 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@77
PS3, Line 77:* SPOOL_QUERY_RESULTS is true, then the ResourceProfile sets a 
min/max resevation,
Some of the method level comment should be updated to reflect the behavior now 
that an empty resource profile is created in some situations when 
spool_query_results is true.


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@92
PS3, Line 92: wither
nit: typo ?


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@110
PS3, Line 110:   long maxMemReservationBytes = Math.max(
 :   queryOptions.getMax_result_spooling_mem(), 
minMemReservationBytes);
> This maxMemReservationBytes does not seems to take account of max_result_sp
It sounds like an existing bug.  If you can create a test case for it can you 
file a separate JIRA ?


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@122
PS3, Line 122:   if (scratchLimit > -1) {
Should this check be scratchLimit > 0 since -1 or 0 mean unbounded right ?


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126
PS3, Line 126: increase
nit: 'increasing'


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126
PS3, Line 126: to
Suggest rewording:  'to >=' minMemReservationBytes
(since one can bump it up to a much higher value)


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS3, Line 142:   
queryOptions.setMax_spilled_result_spooling_mem(scratchLimit);
Would be useful to add a trace level log message here as well.


http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
File testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test:

http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test@2
PS3, Line 2:  QUERY
Could you add 1 tests with empty scratch dirs ?



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 12 Mar 2021 07:58:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-11 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc@1104
PS3, Line 1104:   // (a value of 0 or -1 means memory is unbounded).
I just figured out in ParseUtil::ParseMemSpec() that -1 for memory query option 
is accepted as indicator for infinite memory and translated into 0 return 
value. So testing against -1 here is not necessary. I will revert this change.


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@110
PS3, Line 110:   long maxMemReservationBytes = Math.max(
 :   queryOptions.getMax_result_spooling_mem(), 
minMemReservationBytes);
This maxMemReservationBytes does not seems to take account of 
max_result_spooling_mem = 0 (unbounded), which can peg maxMemReservationBytes = 
minMemReservationBytes. Im not sure what to set maxMemReservationBytes in this 
case.

Similarly, SpillableRowBatchQueue does not seem to recognize 
max_spilled_result_spooling_mem = 0 as unbounded.



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 12 Mar 2021 05:00:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8344/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 12 Mar 2021 00:13:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-11 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..


Patch Set 3:

Following up on private discussion, we decide that it is better to override 
result spooling configuration if scratch_limit is too low.

Patch set 3 implement this strategy. I will rerun an exhaustive tests over this 
patch set later tonight.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 12 Mar 2021 00:00:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

2021-03-11 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#3).

Change subject: IMPALA-10565: Adjust result spooling memory based on 
scratch_limit
..

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

This patch also fix validation between max_result_spooling_mem and
max_spilled_result_spooling_mem that should treat both value 0 and -1 as
unbounded.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Remove spool_query_results query option in
  custom_cluster/test_scratch_disk.py
- Change be test QueryOptions.ResultSpooling to also test -1 as
  unbounded.
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
9 files changed, 140 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto