Mostafa Mokhtar has posted comments on this change. ( http://gerrit.cloudera.org:8080/10415 )
Change subject: IMPALA-6034: Add Cpu and scanned bytes limits per query ...................................................................... Patch Set 1: (31 comments) http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@115 PS1, Line 115: Cpu > Maybe clarify: These are methods returns total consumption per backend across fragment instances. http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@118 PS1, Line 118: /// Return total sys Cpu. > /// Return total system CPU consumption across all backends. Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@121 PS1, Line 121: /// Return total scanned bytes. > /// Return total scanned bytes (from HDFS or HDFS-compatible filesystem) ac Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@124 PS1, Line 124: void GetBackendResourceUsage(int64_t& user_cpu, int64_t& sys_cpu, int64_t& scanned_bytes, > It would be more readable if this returned a struct. 4 output arguments wit Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@125 PS1, Line 125: int64_t& peak_mem > nit: use pointers for output parameters Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@206 PS1, Line 206: /// total scan ranges complete across all scan nodes > maybe add "for this backend" This is a per fragment instance counter. http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@209 PS1, Line 209: /// total user cpu consumed > maybe add "for this backend" Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@213 PS1, Line 213: int64_t cpu_sys_ = 0; > maybe add "for this backend" Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@281 PS1, Line 281: /// total scan ranges complete across all scan nodes > maybe add "across all backends. Updated by AggregateBackendStats()." Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@284 PS1, Line 284: /// total user cpu consumed > maybe add "across all backends. Updated by AggregateBackendStats()." Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator-backend-state.h@287 PS1, Line 287: /// total system cpu consumed > maybe add "across all backends. Updated by AggregateBackendStats()." Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.h@169 PS1, Line 169: void > Maybe return a struct instead of multiple output arguments? Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.h@169 PS1, Line 169: int64_t& > nit: use pointer for output parameters Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.cc@721 PS1, Line 721: : int64_t total_cpu =0, total_scanned_bytes =0, backend_user_cpu = 0, : backend_sys_cpu = 0,backend_bytes_read = 0, peak_memory = 0; > nit: various whitespace inconsistencies Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.cc@726 PS1, Line 726: > nit: don't need some of this vertical whitespace Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.cc@751 PS1, Line 751: query_profile_->AddInfoString("Total CPU time", cpu_total_info.str()); > nit: capitalisation is a bit inconsistent - I think everything should be ti Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.cc@755 PS1, Line 755: query_profile_->AddInfoString("Per Node User time", cpu_user_info.str()); > Any reason why we don't show the aggregate user and system CPU time across Check "Total CPU time" which aggregates user and system CPU across all backends. http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/runtime/coordinator.cc@802 PS1, Line 802: total_user_cpu += backend_state->GetUserCpu(); > Should we modify the BackendState functions so that we can get all three va Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.h@1041 PS1, Line 1041: int64_t max_cpu_time_ns; > I think we can avoid including these new fields in ExpirationEvent - the ex Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.cc@1934 PS1, Line 1934: int64_t cpu_time_ns =0 , scan_bytes =0; > nit: weird whitespace Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.cc@1943 PS1, Line 1943: "Query $0 expired due to cpu limit of $1", > Should these messages include the amount of CPU consumed by the query? This is intentional, to avoid error messages like: Query cancelled due consuming 500 Cpu seconds which exceeds limit of 100s. There are cases where the runtime profiles come in slower than expected which results in a non-linear increase in reported Cpu time. Query can report 0 Cpu time for a couple of seconds then report a much higher number than the limit. Also Cpu consumed so far is not a good indicator for how much higher the Cpu limit needs to be. http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.cc@1957 PS1, Line 1957: "Query $0 expired due to scan bytes limit of $1", > Should these messages include the bytes scanned by the query? Scanned bytes are added to the query profile. Check previous comment. http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/impala-server.cc@1967 PS1, Line 1967: > I think we can get rid of this blank line and following comment. Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/query-options-test.cc File be/src/service/query-options-test.cc: http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/query-options-test.cc@232 PS1, Line 232: {MAKE_OPTIONDEF(exec_time_limit_s), {0, I32_MAX}}, > Add the CPU time option here? Done http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/10415/1/be/src/service/query-options.cc@678 PS1, Line 678: LOG(INFO) << "Max Cpu time is : " << max_cpu_time_s << " seconds"; > Leftover log message? Done http://gerrit.cloudera.org:8080/#/c/10415/1/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/10415/1/common/thrift/ImpalaInternalService.thrift@286 PS1, Line 286: Scan bytes limit, after which a query will be cancelled > nit: "See comment in ImpalaService.thrift." would work too Done http://gerrit.cloudera.org:8080/#/c/10415/1/common/thrift/ImpalaInternalService.thrift@289 PS1, Line 289: CPU time limit in seconds, after which a query will be cancelled > same here Done http://gerrit.cloudera.org:8080/#/c/10415/1/tests/custom_cluster/test_query_expiration.py File tests/custom_cluster/test_query_expiration.py: http://gerrit.cloudera.org:8080/#/c/10415/1/tests/custom_cluster/test_query_expiration.py@168 PS1, Line 168: @CustomClusterTestSuite.with_args("--idle_query_timeout=0") > I don't think this actually needs to be a custom cluster test. Done http://gerrit.cloudera.org:8080/#/c/10415/1/tests/custom_cluster/test_query_expiration.py@179 PS1, Line 179: client.execute("SET MAX_CPU_TIME_S=10000") : client.execute("SET MAX_SCAN_BYTES=10G") > can use this instead to set multiple options: Done http://gerrit.cloudera.org:8080/#/c/10415/1/tests/custom_cluster/test_query_expiration.py@182 PS1, Line 182: client.execute("SET MAX_CPU_TIME_S=0") : client.execute("SET MAX_SCAN_BYTES=0") > similarly can use this instead: Tried client.clear_configuration() and doesn't work as expected. http://gerrit.cloudera.org:8080/#/c/10415/1/tests/custom_cluster/test_query_expiration.py@219 PS1, Line 219: sleep(30) > We should avoid long sleeps in tests, they make the tests slow and can mask Done -- To view, visit http://gerrit.cloudera.org:8080/10415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c6015e21da684bb9f33e236d71309dd4c178a20 Gerrit-Change-Number: 10415 Gerrit-PatchSet: 1 Gerrit-Owner: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Mon, 21 May 2018 22:05:11 +0000 Gerrit-HasComments: Yes