Tim Armstrong has posted comments on this change. Change subject: IMPALA-4856: Port data stream service to KRPC ......................................................................
Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.h File be/src/runtime/krpc-data-stream-mgr.h: Line 129: /// In exceptional circumstances, the data stream manager will garbage-collect the closed There's a pre-existing flaw in the reasoning here that we should call out. "Exceptional circumstances" is vague and I think hides a distinction between an unhealthy cluster with extreme delays and the expected behaviour of certain long-running queries. I think the problem is an invalid assumption that the the receiver sends batches on a regular cadence with a bounded delay before the first batch is sent and when each subsequent batch is sent. That assumption is incorrect. I think we should call it out in this comment so that readers understand the current flaw. Here's an example where it's wrong. Consider a plan with three fragments. F1 (long-running) | V F2 (limit = 1 on exchange) | V F3 (long-running selective scan) 1. The fragments all start up. 2. Instance 1 of F3 immediately finds and returns a matching row, which is sent to F2. 3. This causes F2 to hit its limit, close its exchange and tear itself down. 4. Let's assume F1 also has a lot of work to do and won't finish for 20 minutes 5. Instance 2 of F3 is still churning away on the scan. After 10 minutes it finally find a matching row. 6. F3 tries to send the row, can't find the receiver after a timeout and returns an error to the coordinator 7. The coordinator cancels the query and returns an error There are two problems here: 1. The query failed when it shouldn't have 2. F3 wasn't cancelled when it was no longer needed and used lots of resources unnecessarily. The JIRA is IMPALA-3990. I believe that the main reason we haven't seen this in practice is that it can only occur when there's a limit without order in a subquery. Most queries with that property are non-deterministic and it doesn't really make a lot of sense to have a long-running query that returns non-deterministic results. But this actually blocked me from implementing early-close for joins with empty build sides, which is a nice optimisations. There may also be a slightly different invalid assumption that the time between the receiver closing the exchange and the sender sending its last batch is bounded. That seems possible to solve with sender-side state if the receiver notifies the sender that the receiver was not present and the sender can infer it was closed cleanly. -- To view, visit http://gerrit.cloudera.org:8080/8023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
