Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4856: Port data stream service to KRPC
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.h
File be/src/runtime/krpc-data-stream-mgr.h:

Line 129: /// In exceptional circumstances, the data stream manager will 
garbage-collect the closed
There's a pre-existing flaw in the reasoning here that we should call out. 
"Exceptional circumstances" is vague and I think hides a distinction between an 
unhealthy cluster with extreme delays and the expected behaviour of certain 
long-running queries. I think the problem is an invalid assumption that the the 
receiver sends batches on a regular cadence with a bounded delay before the 
first batch is sent and when each subsequent batch is sent. That assumption is 
incorrect. I think we should call it out in this comment so that readers 
understand the current flaw. Here's an example where it's wrong.

Consider a plan with three fragments.

  F1 (long-running)
   |
   V
  F2 (limit = 1 on exchange)
   |
   V
  F3 (long-running selective scan)

1. The fragments all start up.
2. Instance 1 of F3 immediately finds and returns a matching row, which is sent 
to F2.
3. This causes F2 to hit its limit, close its exchange and tear itself down.
4. Let's assume F1 also has a lot of work to do and won't finish for 20 minutes
5. Instance 2 of F3 is still churning away on the scan. After 10 minutes it 
finally find a matching row.
6. F3 tries to send the row, can't find the receiver after a timeout and 
returns an error to the coordinator
7. The coordinator cancels the query and returns an error

There are two problems here:
1. The query failed when it shouldn't have
2. F3 wasn't cancelled when it was no longer needed and used lots of resources 
unnecessarily.

The JIRA is IMPALA-3990. I believe that the main reason we haven't seen this in 
practice is that it can only occur when there's a limit without order in a 
subquery. Most queries with that property are non-deterministic and it doesn't 
really make a lot of sense to have a long-running query that returns 
non-deterministic results.

But this actually blocked me from implementing early-close for joins with empty 
build sides, which is a nice optimisations.

There may also be a slightly different invalid assumption that the time between 
the receiver closing the exchange and the sender sending its last batch is 
bounded. That seems possible to solve with sender-side state if the receiver 
notifies the sender that the receiver was not present and the sender can infer 
it was closed cleanly.


-- 
To view, visit http://gerrit.cloudera.org:8080/8023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to