paul-rogers opened a new pull request #1981: DRILL-7583: Remove STOP status 
from operator outcome
URL: https://github.com/apache/drill/pull/1981
 
 
   Removes the `IterOutcome.STOP` status and revises the `kill` methods.
   
   ## Description
   
   Operators must handle both the "happy path" and failures. The "happy path" 
is defined by the `RecordBatch.IterOutcome` enum: each operator's `next()` 
method returns this outcome to describe what happened: batch, batch with new 
schema, or EOF.
   
   Operators originally handled the error path the same way: via an 
`IterOutcome` of `STOP`. Error handling was somewhat complex:
   
   * Set a failed flag in the operator,
   * Tell upstream operators they have failed,
   * Consuming data from the upstream until it returns `NONE`,
   * Originally, sometimes calling `close()`,
   * Returning a STOP status.
   
   Each operator implemented some variation or subset of the above. Early on, 
the error protocol was source of errors. I actually wrote a detailed analysis 
of the issues several years ago, and have been chipping away at fixing the 
problems ever since.
   
   Today, the error path mostly works; is simply a source of highly complex 
code. And, as it turns out, entirely unnecessary.
   
   Drill has long supported a "fail-fast" error path based on throwing an 
exception; relying on the fragment executor to clean up the operator stack. The 
"fail-fast" system is simple (throw an exception) and has a uniform way of 
cleaning up (call `close()` once on each operator.) Recent revisions have 
converted most operators to use the simpler fail-fast strategy based on 
throwing an exception instead of using the older STOP approach.
   
   The careful reader of those PRs will have noted that, as a result, no code 
returns the `STOP` status any longer.  This PR goes to the next logical step 
and removes the old, complex, and now-unused `STOP` based path.
   
   There is a related mechanism in operators: the `kill()` method, and its 
implementation, `killIncoming()`. The original purpose appears to be that step 
above: when an operator fails, fail all its children. Things got messy when an 
operator received a `STOP` status: should it call `kill()` on its child? On 
itself? Lots of fun bugs to fix back in the day.
   
   With `STOP` retired, the main purpose of the `kill()` method disappears. 
There is, however, a second use that we retain: cancelling upstream operators 
in the "normal" case. Think of the LIMIT operator: once a `LIMIT 10` has 10 
rows, it need not ask for more. A Parquet reader, say, may still be busily 
reading columns in worker threads. The `LIMIT` wants to tell its upstream 
operators, "hey, I have all the rows I need, thanks. You can go ahead and stop 
reading more data." That is, we need a "normal case" cancellation.
   
   To make clear that `kill()` now does cancellation, this PR renames the 
method to `cancel()`.
   
   ## Documentation
   
   This is not a user-visible change, it is purely internal to the execution 
engine. (The one user benefit is that error messages might be a bit more 
precise because of earlier work.)
   
   ## Testing
   
   Reran all unit tests which revealed a few complexities that remain, such as 
the way operators handle the `cancel()` call. All tests pass. The next step is 
to run the full test suite as part of pre-commit.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to