Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10415 )

Change subject: IMPALA-6034: Add Cpu and scanned bytes limits per query
......................................................................


Patch Set 1:

(29 comments)

I have one major concern about the coordinator locking that we need to fix then 
a bunch of cleanup asks.

http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@115
PS1, Line 115: Cpu
Maybe clarify:

  /// Return total user CPU consumption across all backends.


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@118
PS1, Line 118:   /// Return total sys Cpu.
/// Return total system CPU consumption across all backends.


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@121
PS1, Line 121:   /// Return total scanned bytes.
/// Return total scanned bytes (from HDFS or HDFS-compatible filesystem) across 
all backends.


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@124
PS1, Line 124:   void GetBackendResourceUsage(int64_t& user_cpu, int64_t& 
sys_cpu, int64_t& scanned_bytes,
It would be more readable if this returned a struct. 4 output arguments with 
the same type makes it easy to mess up argument order.


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@125
PS1, Line 125: int64_t& peak_mem
nit: use pointers for output parameters


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@206
PS1, Line 206:     /// total scan ranges complete across all scan nodes
maybe add "for this backend"


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@209
PS1, Line 209:     /// total user cpu consumed
maybe add "for this backend"


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@213
PS1, Line 213:     int64_t cpu_sys_ = 0;
maybe add "for this backend"


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@281
PS1, Line 281:   /// total scan ranges complete across all scan nodes
maybe add "across all backends. Updated by AggregateBackendStats()."


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@284
PS1, Line 284:   /// total user cpu consumed
maybe add "across all backends. Updated by AggregateBackendStats()."


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@287
PS1, Line 287:   /// total system cpu consumed
maybe add "across all backends. Updated by AggregateBackendStats()."


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.h@169
PS1, Line 169: int64_t&
nit: use pointer for output parameters


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.h@169
PS1, Line 169: void
Maybe return a struct instead of multiple output arguments?


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.cc@721
PS1, Line 721:
             :   int64_t total_cpu =0, total_scanned_bytes =0, backend_user_cpu 
= 0,
             :       backend_sys_cpu = 0,backend_bytes_read = 0, peak_memory = 
0;
nit: various whitespace inconsistencies


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.cc@726
PS1, Line 726:
nit: don't need some of this vertical whitespace


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.cc@751
PS1, Line 751:   query_profile_->AddInfoString("Total CPU time", 
cpu_total_info.str());
nit: capitalisation is a bit inconsistent - I think everything should be title 
case, i.e. "Read", "Time"


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.cc@755
PS1, Line 755:   query_profile_->AddInfoString("Per Node User time", 
cpu_user_info.str());
Any reason why we don't show the aggregate user and system CPU time across all 
backends? I guess people will look at the top-level number then only drill down 
in rare cases?


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.cc@802
PS1, Line 802:     total_user_cpu += backend_state->GetUserCpu();
Should we modify the BackendState functions so that we can get all three values 
with a single lock acquisition, e.g. just have it return a 
BackendResourceUtilization struct with all three elements?


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.h@1041
PS1, Line 1041:     int64_t max_cpu_time_ns;
I think we can avoid including these new fields in ExpirationEvent - the 
expiration thread can just look at the query options directly when it's a 
RESOURCE_LIMIT event.


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.cc@1015
PS1, Line 1015:           now + (1000L),  max_cpu_time_ns, 0,
nit: don't needs parents around 1000L


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.cc@1934
PS1, Line 1934:           int64_t cpu_time_ns =0 , scan_bytes =0;
nit: weird whitespace


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.cc@1935
PS1, Line 1935:           
query_state->coord()->AggregateBackendsResourceUsage(cpu_time_ns, scan_bytes);
Ahh... I think I see a problem. It looks like this is acquiring 
Coordinator::lock_, which can be held for a long time when the coordinator is 
doing things. We don't want the expiration thread getting stuck in those cases.

Dan's coordinator patch that got reverted will remove some of the complications 
here, but I'll have to do another pass over this to figure out how the locking 
should work so that this thread cannot get block. This does seem tractable 
though.


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.cc@1943
PS1, Line 1943:                 "Query $0 expired due to cpu limit of $1",
Should these messages include the amount of CPU consumed by the query?


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.cc@1957
PS1, Line 1957:                 "Query $0 expired due to scan bytes limit of 
$1",
Should these messages include the bytes scanned by the query?


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.cc@1967
PS1, Line 1967:
I think we can get rid of this blank line and following comment.


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/query-options-test.cc
File be/src/service/query-options-test.cc:

http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/query-options-test.cc@232
PS1, Line 232:       {MAKE_OPTIONDEF(exec_time_limit_s),              {0, 
I32_MAX}},
Add the CPU time option here?


http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/query-options.cc@678
PS1, Line 678:         LOG(INFO) << "Max Cpu time is : " << max_cpu_time_s << " 
seconds";
Leftover log message?


http://gerrit.cloudera.org:8080/#/c/10415/1/tests/custom_cluster/test_query_expiration.py
File tests/custom_cluster/test_query_expiration.py:

http://gerrit.cloudera.org:8080/#/c/10415/1/tests/custom_cluster/test_query_expiration.py@168
PS1, Line 168:   @CustomClusterTestSuite.with_args("--idle_query_timeout=0")
I don't think this actually needs to be a custom cluster test.

We can make it a parallel test. I did something similar here but the patch 
isn't merged yet: 
https://gerrit.cloudera.org/#/c/10365/4/tests/query_test/test_resource_limits.py


http://gerrit.cloudera.org:8080/#/c/10415/1/tests/custom_cluster/test_query_expiration.py@219
PS1, Line 219:     sleep(30)
We should avoid long sleeps in tests, they make the tests slow and can mask 
problems. The timeout options are inherently time-based, so we kind-of needed 
to do that above, but we don't need to do that here.

I think we can just poll each of the client handles until they change state, 
right? With some timeout...



--
To view, visit http://gerrit.cloudera.org:8080/10415
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c6015e21da684bb9f33e236d71309dd4c178a20
Gerrit-Change-Number: 10415
Gerrit-PatchSet: 1
Gerrit-Owner: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 May 2018 19:16:27 +0000
Gerrit-HasComments: Yes

Reply via email to