Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14104 )
Change subject: IMPALA-8803: Coordinator should release admitted memory per-backend ...................................................................... Patch Set 2: (10 comments) I actually had another round of comments, nothing too major. http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state-test.cc File be/src/runtime/coordinator-backend-state-test.cc: http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state-test.cc@40 PS1, Line 40: TUniqueId()); nit: should be a ** according to style guide http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state-test.cc@52 PS1, Line 52: void MakeBackendStates(int num_states, QuerySchedule* query_schedule, nit: ++i according to google style guide http://gerrit.cloudera.org:8080/#/c/14104/2/be/src/runtime/coordinator-backend-state-test.cc File be/src/runtime/coordinator-backend-state-test.cc: http://gerrit.cloudera.org:8080/#/c/14104/2/be/src/runtime/coordinator-backend-state-test.cc@113 PS2, Line 113: backend_resource_state->Close(&unreleased_backend_states); Calling Close() multiple times feels a bit weird to me since it doesn't really mirror the real usage pattern of BackendResourceStat. I guess maybe if we rename the method it will be a bit more obvious. Maybe exposing backend_resource_states_ directly, or having a test-only method to get the count of backends in different stats would end up making the intent of the test clearer. http://gerrit.cloudera.org:8080/#/c/14104/2/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/14104/2/be/src/runtime/coordinator-backend-state.h@472 PS2, Line 472: /// Return a vector of UNRELEASED or PENDING BackendStates. Can you document that it's idempotent, and also if the contract is that the caller is expected to release resources for the returned backends. http://gerrit.cloudera.org:8080/#/c/14104/2/be/src/runtime/coordinator-backend-state.h@477 PS2, Line 477: static const int64_t RELEASE_BACKEND_STATES_DELAY_MS = 1000; I think we should consider making this and BATCHED_RELEASE_DECAY_FACTOR hidden flags, just as a safety valve in case the heuristic turns out to not work well in production. Hopefully this is the good kind of safety valve where it's easy to add and we never actually need to use it in practice because we picked good defaults. http://gerrit.cloudera.org:8080/#/c/14104/2/be/src/runtime/coordinator-backend-state.h@512 PS2, Line 512: int maybe int64_t? I don't think this is likely to overflow but would be nice to have more of a buffer http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/14104/1/be/src/runtime/coordinator-backend-state.h@477 PS1, Line 477: c con > nit: how about call this CloseAndGetUnreleasedBackends() then return the un +1 to returning the vector instead of requiring the caller pass it in, I think it simplifies the calling code. http://gerrit.cloudera.org:8080/#/c/14104/2/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/14104/2/be/src/runtime/coordinator-backend-state.cc@807 PS2, Line 807: for (auto backend_state : backend_states_) { I think we can merge this into the constructor, since there's nothing that can fail and Init() is called right after the constructor. Usually the Init() pattern is used when we need to return a Status. http://gerrit.cloudera.org:8080/#/c/14104/2/be/src/runtime/coordinator-backend-state.cc@824 PS2, Line 824: // If the coordinator backend has not been released, but all other have, then the only What would happen if we enabled this when query result spooling was disabled also? If we can release resources early in that case too it may be beneficial. I think for a lot of queries there's enough buffering in the coordinator fragment to allow other fragments to be released (e.g. if it's a non-grouping aggregation, or if there are a small number of rows returned). http://gerrit.cloudera.org:8080/#/c/14104/2/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/14104/2/be/src/scheduling/admission-controller.cc@992 PS2, Line 992: num_released_backends_[schedule.query_id()] -= host_addrs.size(); nit: We could use the released_backends iterator instead of looking it up again in the map. -- To view, visit http://gerrit.cloudera.org:8080/14104 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88bb11e0ede7574568020e0277dd8ac8d2586dc9 Gerrit-Change-Number: 14104 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Sahil Takiar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 22 Aug 2019 19:52:38 +0000 Gerrit-HasComments: Yes
