[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-14 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..

IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

In the original change for consistent scheduling, if a cluster has
fewer nodes than the number of remote executor candidates, then
the scheduler falls back to using the old SelectRemoteExecutor().
SelectRemoteExecutor() considers all backends and picks the backend
with the least assigned bytes; to break ties, it uses randomness.
This means that clusters with fewer backends than
num_remote_executor_candidates do not have consistent placement.

For the file handle cache (the original user of consistent
placement), this is not a major problem. However, for data caching,
it can result in slower warm up of the data cache and greater
duplication of the same data across different nodes.

This changes the algorithm to use consistent placement even for
small clusters (num nodes <= num_remote_executor_candidates).
To make this more predictable, it increases the maximum number
of iterations.

This also changes GetRemoteExecutorCandidates() to return the
candidates in the order that they were selected. While still
using a set for detecting duplicate backends, the vector of
distinct backends is constructed directly rather than by
iterating over the set.

Testing:
 - Modify the scheduler-test backend test to verify that small
   clusters use consistent scheduling.

Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Reviewed-on: http://gerrit.cloudera.org:8080/14026
Reviewed-by: Lars Volker 
Tested-by: Impala Public Jenkins 
---
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
2 files changed, 85 insertions(+), 42 deletions(-)

Approvals:
  Lars Volker: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 9
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 14 Aug 2019 04:39:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4243/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 14 Aug 2019 00:57:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4785/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 14 Aug 2019 00:32:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 14 Aug 2019 00:30:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4242/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 14 Aug 2019 00:22:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Joe McDonnell (Code Review)
Hello Michael Ho, Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14026

to look at the new patch set (#8).

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..

IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

In the original change for consistent scheduling, if a cluster has
fewer nodes than the number of remote executor candidates, then
the scheduler falls back to using the old SelectRemoteExecutor().
SelectRemoteExecutor() considers all backends and picks the backend
with the least assigned bytes; to break ties, it uses randomness.
This means that clusters with fewer backends than
num_remote_executor_candidates do not have consistent placement.

For the file handle cache (the original user of consistent
placement), this is not a major problem. However, for data caching,
it can result in slower warm up of the data cache and greater
duplication of the same data across different nodes.

This changes the algorithm to use consistent placement even for
small clusters (num nodes <= num_remote_executor_candidates).
To make this more predictable, it increases the maximum number
of iterations.

This also changes GetRemoteExecutorCandidates() to return the
candidates in the order that they were selected. While still
using a set for detecting duplicate backends, the vector of
distinct backends is constructed directly rather than by
iterating over the set.

Testing:
 - Modify the scheduler-test backend test to verify that small
   clusters use consistent scheduling.

Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
---
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
2 files changed, 85 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/14026/8
--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14026/7/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/7/be/src/scheduling/scheduler.cc@823
PS7, Line 823: if (distinct_backends.size() != num_distinct_before) {
> std::unordered_set actually returns whether an insertion took place: https:
Good point, that's a much better way to do this. Switched this over to use the 
return value from insert.



--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 14 Aug 2019 00:16:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 7:

(1 comment)

Had one comment, otherwise LGTM

http://gerrit.cloudera.org:8080/#/c/14026/7/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/7/be/src/scheduling/scheduler.cc@823
PS7, Line 823: if (distinct_backends.size() != num_distinct_before) {
std::unordered_set actually returns whether an insertion took place: 
https://en.cppreference.com/w/cpp/container/unordered_set/insert



--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 13 Aug 2019 23:58:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 7: Code-Review+1

Carry +1


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 7
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 13 Aug 2019 23:48:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Joe McDonnell (Code Review)
Hello Michael Ho, Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14026

to look at the new patch set (#6).

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..

IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

In the original change for consistent scheduling, if a cluster has
fewer nodes than the number of remote executor candidates, then
the scheduler falls back to using the old SelectRemoteExecutor().
SelectRemoteExecutor() considers all backends and picks the backend
with the least assigned bytes; to break ties, it uses randomness.
This means that clusters with fewer backends than
num_remote_executor_candidates do not have consistent placement.

For the file handle cache (the original user of consistent
placement), this is not a major problem. However, for data caching,
it can result in slower warm up of the data cache and greater
duplication of the same data across different nodes.

This changes the algorithm to use consistent placement even for
small clusters (num nodes <= num_remote_executor_candidates).
To make this more predictable, it increases the maximum number
of iterations.

This also changes GetRemoteExecutorCandidates() to return the
candidates in the order that they were selected. While still
using a set for detecting duplicate backends, the vector of
distinct backends is constructed directly rather than by
iterating over the set.

Testing:
 - Modify the scheduler-test backend test to verify that small
   clusters use consistent scheduling.

Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
---
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
2 files changed, 84 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/14026/6
--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 6: Code-Review+1

Carry +1


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 13 Aug 2019 23:41:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14026/5/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/5/be/src/scheduling/scheduler.cc@781
PS5, Line 781: set
> Can this be an unordered_set ? It doesn't look like we rely on the order an
Yes, switched this over to an unordered_set and changed it to call reserve() 
with num_candidates.



--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 13 Aug 2019 23:41:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 5: Code-Review+1

(1 comment)

I will let Lars finish off the review.

http://gerrit.cloudera.org:8080/#/c/14026/5/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/5/be/src/scheduling/scheduler.cc@781
PS5, Line 781: set
Can this be an unordered_set ? It doesn't look like we rely on the order 
anymore, right ?



--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 13 Aug 2019 23:19:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 13 Aug 2019 10:22:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4230/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 13 Aug 2019 06:55:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4777/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 13 Aug 2019 06:16:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Joe McDonnell (Code Review)
Hello Michael Ho, Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14026

to look at the new patch set (#4).

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..

IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

In the original change for consistent scheduling, if a cluster has
fewer nodes than the number of remote executor candidates, then
the scheduler falls back to using the old SelectRemoteExecutor().
SelectRemoteExecutor() considers all backends and picks the backend
with the least assigned bytes; to break ties, it uses randomness.
This means that clusters with fewer backends than
num_remote_executor_candidates do not have consistent placement.

For the file handle cache (the original user of consistent
placement), this is not a major problem. However, for data caching,
it can result in slower warm up of the data cache and greater
duplication of the same data across different nodes.

This changes the algorithm to use consistent placement even for
small clusters (num nodes <= num_remote_executor_candidates).
To make this more predictable, it increases the maximum number
of iterations.

This also changes GetRemoteExecutorCandidates() to return the
candidates in the order that they were selected. While still
using a set for detecting duplicate backends, the vector of
distinct backends is constructed directly rather than by
iterating over the set.

Testing:
 - Modify the scheduler-test backend test to verify that small
   clusters use consistent scheduling.

Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
---
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
2 files changed, 82 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/14026/4
--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-13 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@545
PS3, Line 545:   int num_remote_executor_candidates = 
query_options.num_remote_executor_candidates;
 :   if (executor_group.NumExecutors() < 
num_remote_executor_candidates) {
 : num_remote_executor_candidates = 
executor_group.NumExecutors();
 :   }
> nit: use std::min
Done


http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@802
PS3, Line 802: 8
> It seems clearer if we define this constant as a constant variable.
Turned this into a static const


http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@811
PS3, Line 811: if (candidates_it == remote_executor_candidates->end()) {
> My thinking was that when n is small, maintaining one structure (even thoug
Changed this to use a set (like before), but to put the IpAddrs directly in the 
vector rather than getting them from the set.



--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 13 Aug 2019 06:15:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-12 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 3:

(3 comments)

Working on a new upload

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@796
PS3, Line 796: P((1/3)^(n-1))
> Is there a typo here ? Not sure what (P(1/3^(n-1)) means ?
Definitely a typo


http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@811
PS3, Line 811: if (candidates_it == remote_executor_candidates->end()) {
> This is now O(n^2), right? Is there a bound on num_executors and if so, sho
Yes, this is O(n^2).

We limit the num_remote_executor_candidates to be at most 16 via the query 
option setting code. We also limit it to be the number of nodes if that is 
smaller. The default is 3 and some systems are going to use 2. I doubt we are 
going to set it higher than 3, so we could cut the maximum allowed value to 8 
without any real problem.

I haven't benchmarked this.


http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@811
PS3, Line 811: if (candidates_it == remote_executor_candidates->end()) {
> Or we can consider using an unordered_set to track the candidates found so
My thinking was that when n is small, maintaining one structure (even though it 
is O(n^2)) might still be better than maintaining two.

It is easy to go back to using the set. I would just put the IpAddrs directly 
in the vector rather than iterating over the set at the end.



--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Mon, 12 Aug 2019 21:27:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-12 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@545
PS3, Line 545:   int num_remote_executor_candidates = 
query_options.num_remote_executor_candidates;
 :   if (executor_group.NumExecutors() < 
num_remote_executor_candidates) {
 : num_remote_executor_candidates = 
executor_group.NumExecutors();
 :   }
nit: use std::min


http://gerrit.cloudera.org:8080/#/c/14026/3/be/src/scheduling/scheduler.cc@811
PS3, Line 811: if (candidates_it == remote_executor_candidates->end()) {
This is now O(n^2), right? Is there a bound on num_executors and if so, should 
we add a DCHECK to make sure it's not large? Have you benchmarked this to see 
if it changes the runtime significantly?



--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Mon, 12 Aug 2019 16:15:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Sun, 11 Aug 2019 07:59:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4770/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Sun, 11 Aug 2019 03:51:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4766/


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Fri, 09 Aug 2019 20:41:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4209/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Fri, 09 Aug 2019 17:09:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4766/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Fri, 09 Aug 2019 16:28:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-09 Thread Joe McDonnell (Code Review)
Hello Michael Ho, Lars Volker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14026

to look at the new patch set (#3).

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..

IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

In the original change for consistent scheduling, if a cluster has
fewer nodes than the number of remote executor candidates, then
the scheduler falls back to using the old SelectRemoteExecutor().
SelectRemoteExecutor() considers all backends and picks the backend
with the least assigned bytes; to break ties, it uses randomness.
This means that clusters with fewer backends than
num_remote_executor_candidates do not have consistent placement.

For the file handle cache (the original user of consistent
placement), this is not a major problem. However, for data caching,
it can result in slower warm up of the data cache and greater
duplication of the same data across different nodes.

This changes the algorithm to use consistent placement even for
small clusters (num nodes <= num_remote_executor_candidates).
To make this more predictable, it increases the maximum number
of iterations.

This also changes GetRemoteExecutorCandidates() to return the
candidates in the order that they were selected. In code, this
means looping over a vector to detect distinct backends rather
than using a set.

Testing:
 - Modify the scheduler-test backend test to verify that small
   clusters use consistent scheduling.

Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
---
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
2 files changed, 75 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/14026/3
--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4747/


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 07 Aug 2019 05:27:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4166/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 07 Aug 2019 01:46:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/4165/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 07 Aug 2019 01:42:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-06 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 2:

I'm still thinking through the testing, but I thought I'd put this up.


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 07 Aug 2019 01:04:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4747/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Wed, 07 Aug 2019 01:04:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-06 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14026/1/be/src/scheduling/scheduler-test.cc
File be/src/scheduling/scheduler-test.cc:

http://gerrit.cloudera.org:8080/#/c/14026/1/be/src/scheduling/scheduler-test.cc@234
PS1, Line 234: for (int num_candidates = 1; num_candidates <= 
num_impala_nodes + 2; ++num_candidates) {
> line too long (92 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Wed, 07 Aug 2019 01:03:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-06 Thread Joe McDonnell (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14026

to look at the new patch set (#2).

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..

IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

In the original change for consistent scheduling, if a cluster has
fewer nodes than the number of remote executor candidates, then
the scheduler falls back to using the old SelectRemoteExecutor().
SelectRemoteExecutor() considers all backends and picks the backend
with the least assigned bytes; to break ties, it uses randomness.
This means that clusters with fewer backends than
num_remote_executor_candidates do not have consistent placement.

For the file handle cache (the original user of consistent
placement), this is not a major problem. However, for data caching,
it can result in slower warm up of the data cache and greater
duplication of the same data across different nodes.

This changes the algorithm to use consistent placement even for
small clusters (num nodes <= num_remote_executor_candidates).

This also changes GetRemoteExecutorCandidates() to return the
candidates in the order that they were selected. In code, this
means looping over a vector to detect distinct backends rather
than using a set.

Testing:
 - Modify the scheduler-test backend test to verify that small
   clusters use consistent scheduling.

Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
---
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
2 files changed, 55 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/14026/2
--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-06 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14026


Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..

IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

In the original change for consistent scheduling, if a cluster has
fewer nodes than the number of remote executor candidates, then
the scheduler falls back to using the old SelectRemoteExecutor().
SelectRemoteExecutor() considers all backends and picks the backend
with the least assigned bytes; to break ties, it uses randomness.
This means that clusters with fewer backends than
num_remote_executor_candidates do not have consistent placement.

For the file handle cache (the original user of consistent
placement), this is not a major problem. However, for data caching,
it can result in slower warm up of the data cache and greater
duplication of the same data across different nodes.

This changes the algorithm to use consistent placement even for
small clusters (num nodes <= num_remote_executor_candidates).

This also changes GetRemoteExecutorCandidates() to return the
candidates in the order that they were selected. In code, this
means looping over a vector to detect distinct backends rather
than using a set.

Testing:
 - Modify the scheduler-test backend test to verify that small
   clusters use consistent scheduling.

Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
---
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
2 files changed, 54 insertions(+), 43 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/14026/1
--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 


[Impala-ASF-CR] IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

2019-08-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14026 )

Change subject: IMPALA-8685,IMPALA-8677: Use consistent scheduling for small 
clusters
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14026/1/be/src/scheduling/scheduler-test.cc
File be/src/scheduling/scheduler-test.cc:

http://gerrit.cloudera.org:8080/#/c/14026/1/be/src/scheduling/scheduler-test.cc@234
PS1, Line 234: for (int num_candidates = 1; num_candidates <= 
num_impala_nodes + 2; ++num_candidates) {
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/14026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Gerrit-Change-Number: 14026
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 07 Aug 2019 01:01:41 +
Gerrit-HasComments: Yes