paul-rogers opened a new pull request #1981: DRILL-7583: Remove STOP status from operator outcome URL: https://github.com/apache/drill/pull/1981 Removes the `IterOutcome.STOP` status and revises the `kill` methods. ## Description Operators must handle both the "happy path" and failures. The "happy path" is defined by the `RecordBatch.IterOutcome` enum: each operator's `next()` method returns this outcome to describe what happened: batch, batch with new schema, or EOF. Operators originally handled the error path the same way: via an `IterOutcome` of `STOP`. Error handling was somewhat complex: * Set a failed flag in the operator, * Tell upstream operators they have failed, * Consuming data from the upstream until it returns `NONE`, * Originally, sometimes calling `close()`, * Returning a STOP status. Each operator implemented some variation or subset of the above. Early on, the error protocol was source of errors. I actually wrote a detailed analysis of the issues several years ago, and have been chipping away at fixing the problems ever since. Today, the error path mostly works; is simply a source of highly complex code. And, as it turns out, entirely unnecessary. Drill has long supported a "fail-fast" error path based on throwing an exception; relying on the fragment executor to clean up the operator stack. The "fail-fast" system is simple (throw an exception) and has a uniform way of cleaning up (call `close()` once on each operator.) Recent revisions have converted most operators to use the simpler fail-fast strategy based on throwing an exception instead of using the older STOP approach. The careful reader of those PRs will have noted that, as a result, no code returns the `STOP` status any longer. This PR goes to the next logical step and removes the old, complex, and now-unused `STOP` based path. There is a related mechanism in operators: the `kill()` method, and its implementation, `killIncoming()`. The original purpose appears to be that step above: when an operator fails, fail all its children. Things got messy when an operator received a `STOP` status: should it call `kill()` on its child? On itself? Lots of fun bugs to fix back in the day. With `STOP` retired, the main purpose of the `kill()` method disappears. There is, however, a second use that we retain: cancelling upstream operators in the "normal" case. Think of the LIMIT operator: once a `LIMIT 10` has 10 rows, it need not ask for more. A Parquet reader, say, may still be busily reading columns in worker threads. The `LIMIT` wants to tell its upstream operators, "hey, I have all the rows I need, thanks. You can go ahead and stop reading more data." That is, we need a "normal case" cancellation. To make clear that `kill()` now does cancellation, this PR renames the method to `cancel()`. ## Documentation This is not a user-visible change, it is purely internal to the execution engine. (The one user benefit is that error messages might be a bit more precise because of earlier work.) ## Testing Reran all unit tests which revealed a few complexities that remain, such as the way operators handle the `cancel()` call. All tests pass. The next step is to run the full test suite as part of pre-commit.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services