[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2021-09-30 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16412/14/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/16412/14/be/src/scheduling/admission-controller.cc@1155
PS14, Line 1155: if (!queued) queue_nodes_.erase(request.query_id);
> We can get a memory leak here.
You are absolutely right! I have created a ticket for this IMPALA-10942 and 
will get a fix for it in soon.
Thanks a lot for catching it.



--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 14
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: abeltian 
Gerrit-Comment-Date: Thu, 30 Sep 2021 20:14:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2021-09-29 Thread abeltian (Code Review)
abeltian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16412/14/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/16412/14/be/src/scheduling/admission-controller.cc@1155
PS14, Line 1155: if (!queued) queue_nodes_.erase(request.query_id);
We can get a memory leak here.

I think you can use if (!*queued) at L1155.



-- 
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 14
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: abeltian 
Gerrit-Comment-Date: Thu, 30 Sep 2021 01:46:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-12-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..

IMPALA-9930 (part 2): Introduce new admission control rpc service

This patch introduces a new krpc service, AdmissionControlService,
which coordinators can use to submit queries for admission.

This patch adds some simple configuration flags that make it possible
to have coordinators use this service to submit their queries for
admission to other coordinators. These flags are only to make this
patch testable and will be replaced when the separate admission
control daemon is introduced in IMPALA-9975.

The interface consists of the following RPCs:
- AdmitQuery: takes a TQueryExecRequest and a TQueryOptions
  (serialized into sidecars), places the request on a queue to be
  processed by a thread pool and then immediately returns.
- GetQueryStatus: takes a query id and returns the current admission
  status, including the QuerySchedulePB if admission has completed
  successfully but the query has not been released yet.
- ReleaseQueryBackends: called when individual backends complete but
  the overall query is still running to release resources
  incrementally. This RPC will be called at most O(log(# backends))
  per query due to BackendResourceState, which batches backends to
  release together.
- ReleaseQuery: called when the query has completely finished.
  Releases all remaining resources.
- CancelAdmission: called if a query is cancelled before an admission
  decision has been made to indicate that it should no longer be
  considered for admission.

The majority of the patch consists of two classes:
- AdmissionControlClient: used to abstract whether admission is being
  performed locally or remotely. In the local case, it is basically
  just a wrapper around AdmissionController. In the remote case, it
  handles serializing/deserializing of RPC params, polling
  GetQueryStatus() until a decision has been made, etc.
- AdmissionControlService: exports the RPC interface and acts as a
  wrapper around AdmissionController.

Some notable changes involved:
- AdmissionController::SubmitForAdmission() no longer blocks while a
  query is queued. Instead, a new function WaitOnQueued() can be used
  to monitor the admission status of a queued query.
- Adding events to the query timeline is moved out of
  AdmissionController and into the AdmissionControlClient classes, so
  that it always happens on the coordinator.
- When a cluster is run in the new admission control service mode,
  only the impalad that is performing admission control exposes the
  /admission http endpoint. Observability will be cleaned up in a
  subsequent patch.

Testing:
- Modified existing admission control tests to run both with and
  without the admission control service enabled, including both the
  functional and stress tests. The 'num_queries' param in the stress
  test is modified to only use a single value to reduce the number of
  tests that are run and keep the running time reasonable.
- Ran tpch10 on a local minicluster and observed no significant
  regressions.

Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Reviewed-on: http://gerrit.cloudera.org:8080/16412
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/admission-control-client.cc
M be/src/scheduling/admission-control-client.h
A be/src/scheduling/admission-control-service.cc
A be/src/scheduling/admission-control-service.h
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/local-admission-control-client.cc
M be/src/scheduling/local-admission-control-client.h
A be/src/scheduling/remote-admission-control-client.cc
A be/src/scheduling/remote-admission-control-client.h
M be/src/scheduling/schedule-state.cc
M be/src/scheduling/schedule-state.h
M be/src/service/client-request-state.cc
M be/src/service/impala-http-handler.cc
M be/src/util/sharded-query-map-util.cc
M common/protobuf/admission_control_service.proto
M tests/common/resource_pool_config.py
M tests/custom_cluster/test_admission_controller.py
M tests/hs2/hs2_test_suite.py
M tests/util/web_pages_util.py
24 files changed, 1,270 insertions(+), 171 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 14
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala 

[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-12-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 13: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 13
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 03 Dec 2020 23:46:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-12-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6727/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 13
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 03 Dec 2020 18:17:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-12-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 13: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 13
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 03 Dec 2020 18:17:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-12-03 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7770/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 12
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 03 Dec 2020 18:16:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-12-03 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 12: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 12
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 03 Dec 2020 17:54:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-12-03 Thread Thomas Tauber-Marshall (Code Review)
Hello Sahil Takiar, Joe McDonnell, Tim Armstrong, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16412

to look at the new patch set (#12).

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..

IMPALA-9930 (part 2): Introduce new admission control rpc service

This patch introduces a new krpc service, AdmissionControlService,
which coordinators can use to submit queries for admission.

This patch adds some simple configuration flags that make it possible
to have coordinators use this service to submit their queries for
admission to other coordinators. These flags are only to make this
patch testable and will be replaced when the separate admission
control daemon is introduced in IMPALA-9975.

The interface consists of the following RPCs:
- AdmitQuery: takes a TQueryExecRequest and a TQueryOptions
  (serialized into sidecars), places the request on a queue to be
  processed by a thread pool and then immediately returns.
- GetQueryStatus: takes a query id and returns the current admission
  status, including the QuerySchedulePB if admission has completed
  successfully but the query has not been released yet.
- ReleaseQueryBackends: called when individual backends complete but
  the overall query is still running to release resources
  incrementally. This RPC will be called at most O(log(# backends))
  per query due to BackendResourceState, which batches backends to
  release together.
- ReleaseQuery: called when the query has completely finished.
  Releases all remaining resources.
- CancelAdmission: called if a query is cancelled before an admission
  decision has been made to indicate that it should no longer be
  considered for admission.

The majority of the patch consists of two classes:
- AdmissionControlClient: used to abstract whether admission is being
  performed locally or remotely. In the local case, it is basically
  just a wrapper around AdmissionController. In the remote case, it
  handles serializing/deserializing of RPC params, polling
  GetQueryStatus() until a decision has been made, etc.
- AdmissionControlService: exports the RPC interface and acts as a
  wrapper around AdmissionController.

Some notable changes involved:
- AdmissionController::SubmitForAdmission() no longer blocks while a
  query is queued. Instead, a new function WaitOnQueued() can be used
  to monitor the admission status of a queued query.
- Adding events to the query timeline is moved out of
  AdmissionController and into the AdmissionControlClient classes, so
  that it always happens on the coordinator.
- When a cluster is run in the new admission control service mode,
  only the impalad that is performing admission control exposes the
  /admission http endpoint. Observability will be cleaned up in a
  subsequent patch.

Testing:
- Modified existing admission control tests to run both with and
  without the admission control service enabled, including both the
  functional and stress tests. The 'num_queries' param in the stress
  test is modified to only use a single value to reduce the number of
  tests that are run and keep the running time reasonable.
- Ran tpch10 on a local minicluster and observed no significant
  regressions.

Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/admission-control-client.cc
M be/src/scheduling/admission-control-client.h
A be/src/scheduling/admission-control-service.cc
A be/src/scheduling/admission-control-service.h
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/local-admission-control-client.cc
M be/src/scheduling/local-admission-control-client.h
A be/src/scheduling/remote-admission-control-client.cc
A be/src/scheduling/remote-admission-control-client.h
M be/src/scheduling/schedule-state.cc
M be/src/scheduling/schedule-state.h
M be/src/service/client-request-state.cc
M be/src/service/impala-http-handler.cc
M be/src/util/sharded-query-map-util.cc
M common/protobuf/admission_control_service.proto
M tests/common/resource_pool_config.py
M tests/custom_cluster/test_admission_controller.py
M tests/hs2/hs2_test_suite.py
M tests/util/web_pages_util.py
24 files changed, 1,270 insertions(+), 171 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16412/12
--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 12
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala 

[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-12-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7761/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 11
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 02 Dec 2020 00:20:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-12-01 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 11: Code-Review+2

Fixed a minor testing related issue, carrying forward


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 11
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 02 Dec 2020 00:00:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-12-01 Thread Thomas Tauber-Marshall (Code Review)
Hello Sahil Takiar, Joe McDonnell, Tim Armstrong, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16412

to look at the new patch set (#11).

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..

IMPALA-9930 (part 2): Introduce new admission control rpc service

This patch introduces a new krpc service, AdmissionControlService,
which coordinators can use to submit queries for admission.

This patch adds some simple configuration flags that make it possible
to have coordinators use this service to submit their queries for
admission to other coordinators. These flags are only to make this
patch testable and will be replaced when the separate admission
control daemon is introduced in IMPALA-9975.

The interface consists of the following RPCs:
- AdmitQuery: takes a TQueryExecRequest and a TQueryOptions
  (serialized into sidecars), places the request on a queue to be
  processed by a thread pool and then immediately returns.
- GetQueryStatus: takes a query id and returns the current admission
  status, including the QuerySchedulePB if admission has completed
  successfully but the query has not been released yet.
- ReleaseQueryBackends: called when individual backends complete but
  the overall query is still running to release resources
  incrementally. This RPC will be called at most O(log(# backends))
  per query due to BackendResourceState, which batches backends to
  release together.
- ReleaseQuery: called when the query has completely finished.
  Releases all remaining resources.
- CancelAdmission: called if a query is cancelled before an admission
  decision has been made to indicate that it should no longer be
  considered for admission.

The majority of the patch consists of two classes:
- AdmissionControlClient: used to abstract whether admission is being
  performed locally or remotely. In the local case, it is basically
  just a wrapper around AdmissionController. In the remote case, it
  handles serializing/deserializing of RPC params, polling
  GetQueryStatus() until a decision has been made, etc.
- AdmissionControlService: exports the RPC interface and acts as a
  wrapper around AdmissionController.

Some notable changes involved:
- AdmissionController::SubmitForAdmission() no longer blocks while a
  query is queued. Instead, a new function WaitOnQueued() can be used
  to monitor the admission status of a queued query.
- Adding events to the query timeline is moved out of
  AdmissionController and into the AdmissionControlClient classes, so
  that it always happens on the coordinator.
- When a cluster is run in the new admission control service mode,
  only the impalad that is performing admission control exposes the
  /admission http endpoint. Observability will be cleaned up in a
  subsequent patch.

Testing:
- Modified existing admission control tests to run both with and
  without the admission control service enabled, including both the
  functional and stress tests. The 'num_queries' param in the stress
  test is modified to only use a single value to reduce the number of
  tests that are run and keep the running time reasonable.
- Ran tpch10 on a local minicluster and observed no significant
  regressions.

Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/admission-control-client.cc
M be/src/scheduling/admission-control-client.h
A be/src/scheduling/admission-control-service.cc
A be/src/scheduling/admission-control-service.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/local-admission-control-client.cc
M be/src/scheduling/local-admission-control-client.h
A be/src/scheduling/remote-admission-control-client.cc
A be/src/scheduling/remote-admission-control-client.h
M be/src/scheduling/schedule-state.cc
M be/src/scheduling/schedule-state.h
M be/src/service/client-request-state.cc
M be/src/service/impala-http-handler.cc
M be/src/util/sharded-query-map-util.cc
M common/protobuf/admission_control_service.proto
M tests/common/resource_pool_config.py
M tests/custom_cluster/test_admission_controller.py
M tests/hs2/hs2_test_suite.py
M tests/util/web_pages_util.py
23 files changed, 1,268 insertions(+), 169 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16412/11
--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 11
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 

[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-11-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 10
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 01 Dec 2020 01:33:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-11-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7752/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 10
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 30 Nov 2020 19:27:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-11-30 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16412/9/be/src/scheduling/admission-control-service.cc
File be/src/scheduling/admission-control-service.cc:

http://gerrit.cloudera.org:8080/#/c/16412/9/be/src/scheduling/admission-control-service.cc@159
PS9, Line 159:   admission_state->admission_done = true;
> Can we file a JIRA to rework this so that this doesn't block the RPC thread
IMPALA-10359


http://gerrit.cloudera.org:8080/#/c/16412/9/be/src/scheduling/admission-control-service.cc@236
PS9, Line 236:   }
> When would this error happen? Is this the appropriate log level?
Currently, this shouldn't ever happen. I think logging as an ERROR instead of 
eg. at FATAL or DCHECK-ing or similar seems like the most defensive choice.


http://gerrit.cloudera.org:8080/#/c/16412/9/be/src/scheduling/remote-admission-control-client.cc
File be/src/scheduling/remote-admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/16412/9/be/src/scheduling/remote-admission-control-client.cc@189
PS9, Line 189:   RPC_TIMEOUT_MS, RPC_BACKOFF_TIME_MS, 
"REMOTE_AC_RELEASE_BACKENDS");
> Should we move these constants into a shared place?
Done



--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 10
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 30 Nov 2020 19:06:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-11-30 Thread Thomas Tauber-Marshall (Code Review)
Hello Sahil Takiar, Joe McDonnell, Tim Armstrong, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16412

to look at the new patch set (#10).

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..

IMPALA-9930 (part 2): Introduce new admission control rpc service

This patch introduces a new krpc service, AdmissionControlService,
which coordinators can use to submit queries for admission.

This patch adds some simple configuration flags that make it possible
to have coordinators use this service to submit their queries for
admission to other coordinators. These flags are only to make this
patch testable and will be replaced when the separate admission
control daemon is introduced in IMPALA-9975.

The interface consists of the following RPCs:
- AdmitQuery: takes a TQueryExecRequest and a TQueryOptions
  (serialized into sidecars), places the request on a queue to be
  processed by a thread pool and then immediately returns.
- GetQueryStatus: takes a query id and returns the current admission
  status, including the QuerySchedulePB if admission has completed
  successfully but the query has not been released yet.
- ReleaseQueryBackends: called when individual backends complete but
  the overall query is still running to release resources
  incrementally. This RPC will be called at most O(log(# backends))
  per query due to BackendResourceState, which batches backends to
  release together.
- ReleaseQuery: called when the query has completely finished.
  Releases all remaining resources.
- CancelAdmission: called if a query is cancelled before an admission
  decision has been made to indicate that it should no longer be
  considered for admission.

The majority of the patch consists of two classes:
- AdmissionControlClient: used to abstract whether admission is being
  performed locally or remotely. In the local case, it is basically
  just a wrapper around AdmissionController. In the remote case, it
  handles serializing/deserializing of RPC params, polling
  GetQueryStatus() until a decision has been made, etc.
- AdmissionControlService: exports the RPC interface and acts as a
  wrapper around AdmissionController.

Some notable changes involved:
- AdmissionController::SubmitForAdmission() no longer blocks while a
  query is queued. Instead, a new function WaitOnQueued() can be used
  to monitor the admission status of a queued query.
- Adding events to the query timeline is moved out of
  AdmissionController and into the AdmissionControlClient classes, so
  that it always happens on the coordinator.
- When a cluster is run in the new admission control service mode,
  only the impalad that is performing admission control exposes the
  /admission http endpoint. Observability will be cleaned up in a
  subsequent patch.

Testing:
- Modified existing admission control tests to run both with and
  without the admission control service enabled, including both the
  functional and stress tests. The 'num_queries' param in the stress
  test is modified to only use a single value to reduce the number of
  tests that are run and keep the running time reasonable.
- Ran tpch10 on a local minicluster and observed no significant
  regressions.

Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/admission-control-client.cc
M be/src/scheduling/admission-control-client.h
A be/src/scheduling/admission-control-service.cc
A be/src/scheduling/admission-control-service.h
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/local-admission-control-client.cc
M be/src/scheduling/local-admission-control-client.h
A be/src/scheduling/remote-admission-control-client.cc
A be/src/scheduling/remote-admission-control-client.h
M be/src/scheduling/schedule-state.cc
M be/src/scheduling/schedule-state.h
M be/src/service/client-request-state.cc
M be/src/service/impala-http-handler.cc
M be/src/util/sharded-query-map-util.cc
M common/protobuf/admission_control_service.proto
M tests/common/resource_pool_config.py
M tests/custom_cluster/test_admission_controller.py
M tests/hs2/hs2_test_suite.py
M tests/util/web_pages_util.py
24 files changed, 1,268 insertions(+), 171 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16412/10
--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 10
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala 

[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-11-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 9: Code-Review+2

(4 comments)

Looks good to merge modulo a few minor things

http://gerrit.cloudera.org:8080/#/c/16412/9/be/src/scheduling/admission-control-service.cc
File be/src/scheduling/admission-control-service.cc:

http://gerrit.cloudera.org:8080/#/c/16412/9/be/src/scheduling/admission-control-service.cc@159
PS9, Line 159:   admission_state->admission_done = true;
Can we file a JIRA to rework this so that this doesn't block the RPC thread (or 
if there's already one, link it here). At larger scale I think this could eat 
up all the RPC threads over time.


http://gerrit.cloudera.org:8080/#/c/16412/9/be/src/scheduling/admission-control-service.cc@236
PS9, Line 236: LOG(ERROR) << s;
When would this error happen? Is this the appropriate log level?


http://gerrit.cloudera.org:8080/#/c/16412/9/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/16412/9/be/src/scheduling/admission-controller.h@887
PS9, Line 887:   QueueNode* queue_node, bool& coordinator_resource_limited);
nit: should use pointer for output argument here and below.


http://gerrit.cloudera.org:8080/#/c/16412/9/be/src/scheduling/remote-admission-control-client.cc
File be/src/scheduling/remote-admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/16412/9/be/src/scheduling/remote-admission-control-client.cc@189
PS9, Line 189:   const int num_retries = 3;
Should we move these constants into a shared place?



--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 9
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 24 Nov 2020 22:13:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-11-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7681/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 9
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 19 Nov 2020 00:43:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-11-18 Thread Thomas Tauber-Marshall (Code Review)
Hello Sahil Takiar, Joe McDonnell, Tim Armstrong, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16412

to look at the new patch set (#9).

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..

IMPALA-9930 (part 2): Introduce new admission control rpc service

This patch introduces a new krpc service, AdmissionControlService,
which coordinators can use to submit queries for admission.

This patch adds some simple configuration flags that make it possible
to have coordinators use this service to submit their queries for
admission to other coordinators. These flags are only to make this
patch testable and will be replaced when the separate admission
control daemon is introduced in IMPALA-9975.

The interface consists of the following RPCs:
- AdmitQuery: takes a TQueryExecRequest and a TQueryOptions
  (serialized into sidecars), places the request on a queue to be
  processed by a thread pool and then immediately returns.
- GetQueryStatus: takes a query id and returns the current admission
  status, including the QuerySchedulePB if admission has completed
  successfully but the query has not been released yet.
- ReleaseQueryBackends: called when individual backends complete but
  the overall query is still running to release resources
  incrementally. This RPC will be called at most O(log(# backends))
  per query due to BackendResourceState, which batches backends to
  release together.
- ReleaseQuery: called when the query has completely finished.
  Releases all remaining resources.
- CancelAdmission: called if a query is cancelled before an admission
  decision has been made to indicate that it should no longer be
  considered for admission.

The majority of the patch consists of two classes:
- AdmissionControlClient: used to abstract whether admission is being
  performed locally or remotely. In the local case, it is basically
  just a wrapper around AdmissionController. In the remote case, it
  handles serializing/deserializing of RPC params, polling
  GetQueryStatus() until a decision has been made, etc.
- AdmissionControlService: exports the RPC interface and acts as a
  wrapper around AdmissionController.

Some notable changes involved:
- AdmissionController::SubmitForAdmission() no longer blocks while a
  query is queued. Instead, a new function WaitOnQueued() can be used
  to monitor the admission status of a queued query.
- Adding events to the query timeline is moved out of
  AdmissionController and into the AdmissionControlClient classes, so
  that it always happens on the coordinator.
- When a cluster is run in the new admission control service mode,
  only the impalad that is performing admission control exposes the
  /admission http endpoint. Observability will be cleaned up in a
  subsequent patch.

Testing:
- Modified existing admission control tests to run both with and
  without the admission control service enabled, including both the
  functional and stress tests. The 'num_queries' param in the stress
  test is modified to only use a single value to reduce the number of
  tests that are run and keep the running time reasonable.
- Ran tpch10 on a local minicluster and observed no significant
  regressions.

Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/admission-control-client.cc
M be/src/scheduling/admission-control-client.h
A be/src/scheduling/admission-control-service.cc
A be/src/scheduling/admission-control-service.h
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/local-admission-control-client.cc
M be/src/scheduling/local-admission-control-client.h
A be/src/scheduling/remote-admission-control-client.cc
A be/src/scheduling/remote-admission-control-client.h
M be/src/scheduling/schedule-state.cc
M be/src/scheduling/schedule-state.h
M be/src/service/client-request-state.cc
M be/src/service/impala-http-handler.cc
M be/src/util/sharded-query-map-util.cc
M common/protobuf/admission_control_service.proto
M tests/common/resource_pool_config.py
M tests/custom_cluster/test_admission_controller.py
M tests/hs2/hs2_test_suite.py
M tests/util/web_pages_util.py
24 files changed, 1,241 insertions(+), 190 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16412/9
--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 9
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala 

[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-11-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7680/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Nov 2020 23:08:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-11-18 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16412/7/be/src/scheduling/remote-admission-control-client.cc
File be/src/scheduling/remote-admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/16412/7/be/src/scheduling/remote-admission-control-client.cc@121
PS7, Line 121: admit_status = Status(get_status_resp.status());
> Is there a JIRA for this? Might be an obstacle for production-readiness for
As discussed, switched to long polling which should mean this only ever comes 
into play for queries that are queued for a non-trivial amount of time, in 
which case its somewhat less important. Also made it configurable just for good 
measure.


http://gerrit.cloudera.org:8080/#/c/16412/7/be/src/scheduling/remote-admission-control-client.cc@184
PS7, Line 184:   ReleaseQueryBackendsResponsePB resp;
> Just thinking, but I think something like retrying the RPC then giving up a
Added retries here and for the other similar rpcs here.

As mentioned elsewhere, I think the cleanup mechanism will be a KeepAlive rpc 
that will be added in followup work.


http://gerrit.cloudera.org:8080/#/c/16412/7/common/protobuf/admission_control_service.proto
File common/protobuf/admission_control_service.proto:

http://gerrit.cloudera.org:8080/#/c/16412/7/common/protobuf/admission_control_service.proto@233
PS7, Line 233:   /// admission and the query getting released.
> Does this return immediately and require the client to throttle polling, or
As discussed offline, decided to go with long-polling.

For fault tolerance, I think the plan is to add a specific KeepAlive rpc which 
coordinators will periodically send to the admission controller with a list of 
active query ids. That should be pretty light weight and cover any issues from 
the coordinator failing or from small rpc errors that get the coordinator and 
admission controller out of sync.

For failures of the admission controller itself, I think the plan is to store 
the info needed to reconstruct the admission controller's state in the 
statestored.


http://gerrit.cloudera.org:8080/#/c/16412/7/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/16412/7/tests/custom_cluster/test_admission_controller.py@1303
PS7, Line 1303: class 
TestAdmissionControllerWithACService(TestAdmissionController):
> How much time does this add to exhaustive tests?
Because of the reduction in dimensions for the stress test, this patch actually 
reduces the time to run this test file in exhaustive - from ~78 minutes to ~62 
minutes (on my machine).



--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 18 Nov 2020 22:46:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-11-18 Thread Thomas Tauber-Marshall (Code Review)
Hello Sahil Takiar, Joe McDonnell, Tim Armstrong, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16412

to look at the new patch set (#8).

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..

IMPALA-9930 (part 2): Introduce new admission control rpc service

This patch introduces a new krpc service, AdmissionControlService,
which coordinators can use to submit queries for admission.

This patch adds some simple configuration flags that make it possible
to have coordinators use this service to submit their queries for
admission to other coordinators. These flags are only to make this
patch testable will be replaced when the separate admission control
daemon is introduced in IMPALA-9975.

The interface consists of the following RPCs:
- AdmitQuery: takes a TQueryExecRequest and a TQueryOptions
  (serialized into sidecars), places the request on a queue to be
  processed by a thread pool and then immediately returns.
- GetQueryStatus: takes a query id and returns the current admission
  status, including the QuerySchedulePB if admission has completed
  successfully but the query has not been released yet.
- ReleaseQueryBackends: called when individual backends complete but
  the overall query is still running to release resources
  incrementally. This RPC will be called at most O(log(# backends))
  per query due to BackendResourceState, which batches backends to
  release together.
- ReleaseQuery: called when the query has completely finished.
  Releases all remaining resources.
- CancelAdmission: called if a query is cancelled before an admission
  decision has been made to indicate that it should no longer be
  considered for admission.

The majority of the patch consists of two classes:
- AdmissionControlClient: used to abstract whether admission is being
  performed locally or remotely. In the local case, it is basically
  just a wrapper around AdmissionController. In the remote case, it
  handles serializing/deserializing of RPC params, polling
  GetQueryStatus() until a decision has been made, etc.
- AdmissionControlService: exports the RPC interface and acts as a
  wrapper around AdmissionController.

Some notable changes involved:
- AdmissionController::SubmitForAdmission() no longer blocks while a
  query is queued. Instead, a new function CheckQueued() can be used
  to monitor the admission status of a queued query.
- Adding events to the query timeline is moved out of
  AdmissionController and into the AdmissionControlClient classes, so
  that it always happens on the coordinator.
- When a cluster is run in the new admission control service mode,
  only the impalad that is performing admission control exposes the
  /admission http endpoint. Observability will be cleaned up in a
  subsequent patch.

Testing:
- Modified existing admission control tests to run both with and
  without the admission control service enabled, including both the
  functional and stress tests. The 'num_queries' param in the stress
  test is modified to only use a single value to reduce the number of
  tests that are run and keep the running time reasonable.
- Ran tpch10 on a local minicluster and observed no significant
  regressions.

Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/admission-control-client.cc
M be/src/scheduling/admission-control-client.h
A be/src/scheduling/admission-control-service.cc
A be/src/scheduling/admission-control-service.h
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/local-admission-control-client.cc
M be/src/scheduling/local-admission-control-client.h
A be/src/scheduling/remote-admission-control-client.cc
A be/src/scheduling/remote-admission-control-client.h
M be/src/scheduling/schedule-state.cc
M be/src/scheduling/schedule-state.h
M be/src/service/client-request-state.cc
M be/src/service/impala-http-handler.cc
M be/src/util/sharded-query-map-util.cc
M common/protobuf/admission_control_service.proto
M tests/common/resource_pool_config.py
M tests/custom_cluster/test_admission_controller.py
M tests/hs2/hs2_test_suite.py
M tests/util/web_pages_util.py
24 files changed, 1,240 insertions(+), 190 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16412/8
--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public 

[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-10-28 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 7:

(2 comments)

I was pleasantly surprised by how easy this was to understand. I think we're on 
the same page that we need to do some more things for production readiness, but 
I have few concerns about the current state of things for what it is.

http://gerrit.cloudera.org:8080/#/c/16412/7/be/src/scheduling/remote-admission-control-client.cc
File be/src/scheduling/remote-admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/16412/7/be/src/scheduling/remote-admission-control-client.cc@121
PS7, Line 121: // TODO: make this more sophisticated, eg. base it off 
benchmarks, make it
Is there a JIRA for this? Might be an obstacle for production-readiness for 
shorter running queries. I think some kind of long-polling might work well with 
KRPC (we already do something like this for row batches - delaying the RPC 
response until the batch is queued).


http://gerrit.cloudera.org:8080/#/c/16412/7/be/src/scheduling/remote-admission-control-client.cc@184
PS7, Line 184:   // Failure of this rpc is not considered a query failure, so 
we just log it.
Just thinking, but I think something like retrying the RPC then giving up and 
having some cleanup mechanism on the AC service would work.



-- 
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 29 Oct 2020 00:11:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-10-28 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 7:

(2 comments)

Read through the headers, the tests and the protobuf, so pushing out a batch of 
comments. Need to read through implementation next.

http://gerrit.cloudera.org:8080/#/c/16412/7/common/protobuf/admission_control_service.proto
File common/protobuf/admission_control_service.proto:

http://gerrit.cloudera.org:8080/#/c/16412/7/common/protobuf/admission_control_service.proto@233
PS7, Line 233:   rpc GetQueryStatus(GetQueryStatusRequestPB) returns 
(GetQueryStatusResponsePB);
Does this return immediately and require the client to throttle polling, or is 
it more of a long-polling thing where it can block for a bit? Would be useful 
to document the expected usage pattern briefly.

If a coordinator fails, how do we clean up admission requests? Is 
GetQueryStatus() effectively a keepalive?


http://gerrit.cloudera.org:8080/#/c/16412/7/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/16412/7/tests/custom_cluster/test_admission_controller.py@1303
PS7, Line 1303: class 
TestAdmissionControllerWithACService(TestAdmissionController):
How much time does this add to exhaustive tests?



--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 28 Oct 2020 16:49:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-10-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7460/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 15 Oct 2020 00:11:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-10-14 Thread Thomas Tauber-Marshall (Code Review)
Hello Sahil Takiar, Joe McDonnell, Tim Armstrong, Bikramjeet Vig, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16412

to look at the new patch set (#7).

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..

IMPALA-9930 (part 2): Introduce new admission control rpc service

This patch introduces a new krpc service, AdmissionControlService,
which coordinators can use to submit queries for admission.

This patch adds some simple configuration flags that make it possible
to have coordinators use this service to submit their queries for
admission to other coordinators. These flags are only to make this
patch testable will be replaced when the separate admission control
daemon is introduced in IMPALA-9975.

The interface consists of the following RPCs:
- AdmitQuery: takes a TQueryExecRequest and a TQueryOptions
  (serialized into sidecars), places the request on a queue to be
  processed by a thread pool and then immediately returns.
- GetQueryStatus: takes a query id and returns the current admission
  status, including the QuerySchedulePB if admission has completed
  successfully but the query has not been released yet.
- ReleaseQueryBackends: called when individual backends complete but
  the overall query is still running to release resources
  incrementally. This RPC will be called at most O(log(# backends))
  per query due to BackendResourceState, which batches backends to
  release together.
- ReleaseQuery: called when the query has completely finished.
  Releases all remaining resources.
- CancelAdmission: called if a query is cancelled before an admission
  decision has been made to indicate that it should no longer be
  considered for admission.

The majority of the patch consists of two classes:
- AdmissionControlClient: used to abstract whether admission is being
  performed locally or remotely. In the local case, it is basically
  just a wrapper around AdmissionController. In the remote case, it
  handles serializing/deserializing of RPC params, polling
  GetQueryStatus() until a decision has been made, etc.
- AdmissionControlService: exports the RPC interface and acts as a
  wrapper around AdmissionController.

Some notable changes involved:
- AdmissionController::SubmitForAdmission() no longer blocks while a
  query is queued. Instead, a new function CheckQueued() can be used
  to monitor the admission status of a queued query.
- Adding events to the query timeline is moved out of
  AdmissionController and into the AdmissionControlClient classes, so
  that it always happens on the coordinator.
- When a cluster is run in the new admission control service mode,
  only the impalad that is performing admission control exposes the
  /admission http endpoint. Observability will be cleaned up in a
  subsequent patch.

Testing:
- Modified existing admission control tests to run both with and
  without the admission control service enabled, including both the
  functional and stress tests. The 'num_queries' param in the stress
  test is modified to only use a single value to reduce the number of
  tests that are run and keep the running time reasonable.
- Ran tpch10 on a local minicluster and observed no significant
  regressions.

Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/admission-control-client.cc
M be/src/scheduling/admission-control-client.h
A be/src/scheduling/admission-control-service.cc
A be/src/scheduling/admission-control-service.h
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/local-admission-control-client.cc
M be/src/scheduling/local-admission-control-client.h
A be/src/scheduling/remote-admission-control-client.cc
A be/src/scheduling/remote-admission-control-client.h
M be/src/scheduling/schedule-state.cc
M be/src/scheduling/schedule-state.h
M be/src/service/client-request-state.cc
M be/src/service/impala-http-handler.cc
M be/src/util/sharded-query-map-util.cc
M common/protobuf/admission_control_service.proto
M tests/common/resource_pool_config.py
M tests/custom_cluster/test_admission_controller.py
M tests/hs2/hs2_test_suite.py
M tests/util/web_pages_util.py
24 files changed, 1,212 insertions(+), 189 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16412/7
--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public 

[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-10-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 6:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/7350/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 02 Oct 2020 23:17:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-10-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16412/6/be/src/scheduling/admission-control-service.cc
File be/src/scheduling/admission-control-service.cc:

http://gerrit.cloudera.org:8080/#/c/16412/6/be/src/scheduling/admission-control-service.cc@236
PS6, Line 236: 
admission_state->query_exec_request.query_ctx.client_request.query_options, 
admission_state->summary_profile,
line too long (117 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 02 Oct 2020 22:56:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-10-02 Thread Thomas Tauber-Marshall (Code Review)
Hello Sahil Takiar, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16412

to look at the new patch set (#6).

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..

IMPALA-9930 (part 2): Introduce new admission control rpc service

This patch introduces a new krpc service, AdmissionControlService,
which coordinators can use to submit queries for admission.

This patch adds some simple configuration flags that make it possible
to have coordinators use this service to submit their queries for
admission to other coordinators. These flags are only to make this
patch testable will be replaced when the separate admission control
daemon is introduced in IMPALA-9975.

The interface consists of the following RPCs:
- AdmitQuery: takes a TQueryExecRequest and a TQueryOptions
  (serialized into sidecars), places the request on a queue to be
  processed by a thread pool and then immediately returns.
- GetQueryStatus: takes a query id and returns the current admission
  status, including the QuerySchedulePB if admission has completed
  successfully but the query has not been released yet.
- ReleaseQueryBackends: called when individual backends complete but
  the overall query is still running to release resources
  incrementally. This RPC will be called at most O(log(# backends))
  per query due to BackendResourceState, which batches backends to
  release together.
- ReleaseQuery: called when the query has completely finished.
  Releases all remaining resources.
- CancelAdmission: called if a query is cancelled before an admission
  decision has been made to indicate that it should no longer be
  considered for admission.

The majority of the patch consists of two classes:
- AdmissionControlClient: used to abstract whether admission is being
  performed locally or remotely. In the local case, it is basically
  just a wrapper around AdmissionController. In the remote case, it
  handles serializing/deserializing of RPC params, polling
  GetQueryStatus() until a decision has been made, etc.
- AdmissionControlService: exports the RPC interface and acts as a
  wrapper around AdmissionController.

Some notable changes involved:
- AdmissionController::SubmitForAdmission() no longer blocks while a
  query is queued. Instead, a new function CheckQueued() can be used
  to monitor the admission status of a queued query.
- Adding events to the query timeline is moved out of
  AdmissionController and into the AdmissionControlClient classes, so
  that it always happens on the coordinator.
- When a cluster is run in the new admission control service mode,
  only the impalad that is performing admission control exposes the
  /admission http endpoint. Observability will be cleaned up in a
  subsequent patch.

Testing:
- Modified existing admission control tests to run both with and
  without the admission control service enabled, including both the
  functional and stress tests. The 'num_queries' param in the stress
  test is modified to only use a single value to reduce the number of
  tests that are run and keep the running time reasonable.
- Ran tpch10 on a local minicluster and observed no significant
  regressions.

Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/admission-control-client.cc
M be/src/scheduling/admission-control-client.h
A be/src/scheduling/admission-control-service.cc
A be/src/scheduling/admission-control-service.h
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/local-admission-control-client.cc
M be/src/scheduling/local-admission-control-client.h
A be/src/scheduling/remote-admission-control-client.cc
A be/src/scheduling/remote-admission-control-client.h
M be/src/scheduling/schedule-state.cc
M be/src/scheduling/schedule-state.h
M be/src/service/client-request-state.cc
M be/src/service/impala-http-handler.cc
M be/src/util/sharded-query-map-util.cc
M common/protobuf/admission_control_service.proto
M tests/common/resource_pool_config.py
M tests/custom_cluster/test_admission_controller.py
M tests/hs2/hs2_test_suite.py
M tests/util/web_pages_util.py
24 files changed, 1,210 insertions(+), 189 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16412/6
--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil 

[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-10-02 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 6:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/16412/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16412/5//COMMIT_MSG@12
PS5, Line 12: This patch adds some simple configuration flags that make it 
possible
> not sure I understand the plan here. this patch adds flags is --is_admissio
My intention with IMPALA-9975 is that there will be a whole new binary, called 
'admissiond', which will be what actually exposes the AdmissionControlService, 
rather than having an impalad do it.

At that point, we won't need the --is_admission_controller flag because there 
will be two possible setups: traditional distributed admission control mode 
(which impalads detect by checking if admission_control_service_addr is empty), 
and new admission control service mode (where the admissiond will know to 
always export the AdmissionControlService).

I think this will be cleaner than having the admission control service run as 
an impalad with --is_admission_controller=true because:
- There are a bunch of things that impalads need that the admissiond won't 
(DataStreamService, ControlService, the catalog client, the hs2 and beeswax 
interfaces, etc.) and explicitly separating out an admissiond makes it more 
natural to exclude these things from it
- I think it makes integration with external deployment systems more natural, 
eg. in a kubernetes setup you might have an istio policy for exposing various 
ports for impalads and you'll need a separate policy for the admissiond anyways 
since the ports that it needs exposed are different. It seems nicer to just 
have admissiond be its own separate thing, rather than always special-casing 
the handling of impalads depending on whether they expose 
AdmissionControlService.

I'll be utilizing the same functionality we already use for the statestored and 
catalogd where its really all the same binary behind the scenes to limit 
linking time (see daemon-main.cc), so in a way the admissiond will just be an 
impalad with --is_admission_controller=true always set.

I can certainly see some downsides to this approach (eg. changes to legacy 
deployment systems will be larger, the need for generating another docker 
container image, etc.), so I'm open to the possibility this is not the right 
approach, depending on what others think.

I already have the admissiond stuff basically written and tested if you want me 
to submit it as a WIP (though its rough at the moment)


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h
File be/src/scheduling/admission-control-service.h:

http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h@76
PS5, Line 76: AdmitQueryRequestP
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h@105
PS5, Line 105: RuntimeProfile* summary_profile;
> might be worth mentioning that this is passed to AdmissionController::Submi
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h@118
PS5, Line 118: to
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc
File be/src/scheduling/admission-control-service.cc:

http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@85
PS5, Line 85: Admission
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@142
PS5, Line 142:   shared_ptr admission_state;
> might be too verbose to log.
I moved it to a higher VLOG level.


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@155
PS5, Line 155:
> why wait at all? won't waiting tie up one of the RPC threads? I think the c
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@159
PS5, Line 159: } else {
> is this expected? do want to DCHECK if the admit_status is not Status::OK()
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@197
PS5, Line 197: const ReleaseQueryBackendsRequestPB* req, 
ReleaseQueryBackendsResponsePB* resp,
> why does the lock need to be acquired here?
Done


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@209
PS5, Line 209:   req->query_id(), host_addrs);
> might be too verbose to log
Moved to a higher vlog level


http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.cc@219
PS5, Line 219:   
admission_state->admit_outcome.Set(AdmissionOutcome::CANCELLED);
> why does the lock need to be acquired here?
Done



[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-09-30 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/remote-admission-control-client.cc
File be/src/scheduling/remote-admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/remote-admission-control-client.cc@86
PS5, Line 86: KUDU_RETURN_IF_ERROR(
: proxy->AdmitQuery(req, , _controller), 
"AdmitQuery rpc failed");
: Status admit_status(resp.status());
: RETURN_IF_ERROR(admit_status);
> is it necessary to hold the lock while making the RPC to Admit the query? t
nvm, seems there is a separate client per query.



--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 30 Sep 2020 21:14:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-09-30 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h
File be/src/scheduling/admission-control-service.h:

http://gerrit.cloudera.org:8080/#/c/16412/5/be/src/scheduling/admission-control-service.h@78
PS5, Line 78: UniqueIdPB query_id;
: UniqueIdPB coord_id;
> can these both be const ref?
nvm, I guess they need to be copied since they come from the AdmitQueryRequestPB



--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 5
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 30 Sep 2020 19:34:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 4:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/7265/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 24 Sep 2020 01:43:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-09-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 3:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/7264/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 24 Sep 2020 01:41:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-09-23 Thread Thomas Tauber-Marshall (Code Review)
Hello Sahil Takiar, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16412

to look at the new patch set (#4).

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..

IMPALA-9930 (part 2): Introduce new admission control rpc service

This patch introduces a new krpc service, AdmissionControlService,
which coordinators can use to submit queries for admission.

This patch adds some simple configuration flags that make it possible
to have coordinators use this service to submit their queries for
admission to other coordinators. These flags are only to make this
patch testable will be replaced when the separate admission control
daemon is introduced in IMPALA-9975.

The interface consists of the following RPCs:
- AdmitQuery: takes a TQueryExecRequest and a TQueryOptions
  (serialized into sidecars), places the request on a queue to be
  processed by a thread pool and then immediately returns.
- GetQueryStatus: takes a query id and returns the current admission
  status, including the QuerySchedulePB if admission has completed
  successfully but the query has not been released yet.
- ReleaseQueryBackends: called when individual backends complete but
  the overall query is still running to release resources
  incrementally. This RPC will be called at most O(log(# backends))
  per query due to BackendResourceState, which batches backends to
  release together.
- ReleaseQuery: called when the query has completely finished.
  Releases all remaining resources.
- CancelAdmission: called if a query is cancelled before an admission
  decision has been made to indicate that it should no longer be
  considered for admission.

The majority of the patch consists of two classes:
- AdmissionControlClient: used to abstract whether admission is being
  performed locally or remotely. In the local case, it is basically
  just a wrapper around AdmissionController. In the remote case, it
  handles serializing/deserializing of RPC params, polling
  GetQueryStatus() until a decision has been made, etc.
- AdmissionControlService: exports the RPC interface and acts as a
  wrapper around AdmissionController.

Testing:
- Modified existing admission control tests to run both with and
  without the admission control service enabled, including both the
  functional and stress tests. The 'num_queries' param in the stress
  test is modified to only use a single value to reduce the number of
  tests that are run and keep the running time reasonable.
- Ran tpch10 on a local minicluster and observed no significant
  regressions.

Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/admission-control-client.cc
M be/src/scheduling/admission-control-client.h
A be/src/scheduling/admission-control-service.cc
A be/src/scheduling/admission-control-service.h
M be/src/scheduling/admission-controller-test.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/local-admission-control-client.cc
M be/src/scheduling/local-admission-control-client.h
A be/src/scheduling/remote-admission-control-client.cc
A be/src/scheduling/remote-admission-control-client.h
M be/src/scheduling/schedule-state.cc
M be/src/scheduling/schedule-state.h
M be/src/service/client-request-state.cc
M be/src/service/impala-http-handler.cc
M be/src/util/sharded-query-map-util.cc
M common/protobuf/admission_control_service.proto
M tests/common/resource_pool_config.py
M tests/custom_cluster/test_admission_controller.py
M tests/hs2/hs2_test_suite.py
M tests/util/web_pages_util.py
24 files changed, 1,217 insertions(+), 186 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16412/4
--
To view, visit http://gerrit.cloudera.org:8080/16412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I594fc593a27b24b6952e381a9bc1a9a5c6b757ae
Gerrit-Change-Number: 16412
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 


[Impala-ASF-CR] IMPALA-9930 (part 2): Introduce new admission control rpc service

2020-09-23 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16412 )

Change subject: IMPALA-9930 (part 2): Introduce new admission control rpc 
service
..


Patch Set 4:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/16412/1/be/src/scheduling/admission-control-client.cc
File be/src/scheduling/admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/16412/1/be/src/scheduling/admission-control-client.cc@210
PS1, Line 210:
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16412/3/be/src/scheduling/admission-control-client.cc
File be/src/scheduling/admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/16412/3/be/src/scheduling/admission-control-client.cc@31
PS3, Line 31: const string 
AdmissionControlClient::QUERY_EVENT_SUBMIT_FOR_ADMISSION =
> line too long (95 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16412/3/be/src/scheduling/admission-control-client.cc@33
PS3, Line 33: const string AdmissionControlClient::QUERY_EVENT_QUEUED = 
"Queued";
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/16412/1/be/src/scheduling/admission-controller.h@339
PS1, Line 339:   std::string* request_pool = nullptr);
> line too long (108 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/be/src/scheduling/admission-controller.h@343
PS1, Line 343:   /// 'timeout_ms' has passed. If the function returns due to a 
timeout 'wait_timed_out'
> line too long (99 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/16412/1/be/src/scheduling/admission-controller.cc@1116
PS1, Line 1116:   DebugActionNoFail(request.query_options, 
"AC_BEFORE_ADMISSION");
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/be/src/scheduling/admission-controller.cc@1292
PS1, Line 1292:   const ErrorMsg& rejected_msg = 
ErrorMsg(TErrorCode::ADMISSION_TIMED_OUT,
> line too long (97 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16412/3/be/src/scheduling/schedule-state.h
File be/src/scheduling/schedule-state.h:

http://gerrit.cloudera.org:8080/#/c/16412/3/be/src/scheduling/schedule-state.h@150
PS3, Line 150:   /// For testing only: specify 'init=false' to build a 
ScheduleState object but do not
> line too long (99 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/16412/1/tests/custom_cluster/test_admission_controller.py@602
PS1, Line 602:
> flake8: E501 line too long (94 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/tests/custom_cluster/test_admission_controller.py@803
PS1, Line 803:
> flake8: E501 line too long (91 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/tests/custom_cluster/test_admission_controller.py@816
PS1, Line 816:
> flake8: E501 line too long (95 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/tests/custom_cluster/test_admission_controller.py@1082
PS1, Line 1082:
> flake8: E501 line too long (113 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/tests/custom_cluster/test_admission_controller.py@1140
PS1, Line 1140:
> flake8: E501 line too long (92 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/tests/custom_cluster/test_admission_controller.py@1287
PS1, Line 1287:
> flake8: E501 line too long (94 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/tests/custom_cluster/test_admission_controller.py@1318
PS1, Line 1318:
> flake8: E501 line too long (97 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/tests/custom_cluster/test_admission_controller.py@1321
PS1, Line 1321:
> flake8: E501 line too long (92 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/tests/custom_cluster/test_admission_controller.py@1903
PS1, Line 1903: # Each query mem limit (set the query option to override 
the per-host memory
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/tests/custom_cluster/test_admission_controller.py@1919
PS1, Line 1919:
> flake8: E501 line too long (97 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/tests/custom_cluster/test_admission_controller.py@1922
PS1, Line 1922:
> flake8: E501 line too long (92 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/16412/1/tests/hs2/hs2_test_suite.py
File tests/hs2/hs2_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/16412/1/tests/hs2/hs2_test_suite.py@338
PS1, Line 338: 0
> flake8: E251 unexpected spaces around keyword / parameter equals
Done