Alex created FLINK-13205:
----------------------------

             Summary: Checkpoints/savepoints injection has loose ordering 
properties when a stop-with-savepoint is triggered
                 Key: FLINK-13205
                 URL: https://issues.apache.org/jira/browse/FLINK-13205
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 1.9.0
            Reporter: Alex
            Assignee: Alex


When a stop-with-savepoint is triggered at a source task, the thread pool of the 
task's dispatcher ({{Task.asyncCallDispatcher}}) is extended from single-threaded 
to multi-threaded.

This leads to a race: subsequent checkpoints/savepoints from the dispatcher's 
queue may be applied at the same time, so they are no longer strictly ordered 
in the event stream.

As a result, checkpoints/savepoints that are injected later than they should be 
may be "silently subsumed": they could be ignored and never reported to the 
checkpoint coordinator.
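
The ordering difference can be illustrated outside of Flink with plain {{java.util.concurrent}} executors. The sketch below is only an illustration under assumptions: {{DispatcherOrderingSketch}}, {{injectAll}} and {{injectWithDelayedFirst}} are hypothetical names, and "injecting" a checkpoint is modeled as appending its id to a list. A single-threaded executor runs tasks in submission order, while a second thread lets a later checkpoint overtake an earlier one (forced here with a latch to keep the example deterministic).

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the task's dispatcher; this is not Flink code. */
public class DispatcherOrderingSketch {

    /** Submits checkpoint ids 1..count and returns the order in which they were injected. */
    public static List<Long> injectAll(ExecutorService dispatcher, int count) {
        List<Long> injected = new CopyOnWriteArrayList<>();
        for (long id = 1; id <= count; id++) {
            final long checkpointId = id;
            dispatcher.execute(() -> injected.add(checkpointId));
        }
        shutdownAndWait(dispatcher);
        return injected;
    }

    /** Two threads: checkpoint 1 is held back until checkpoint 2 has been injected. */
    public static List<Long> injectWithDelayedFirst() {
        ExecutorService dispatcher = Executors.newFixedThreadPool(2);
        List<Long> injected = new CopyOnWriteArrayList<>();
        CountDownLatch secondInjected = new CountDownLatch(1);
        dispatcher.execute(() -> {
            try {
                secondInjected.await(); // simulates a slow first injection
                injected.add(1L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        dispatcher.execute(() -> {
            injected.add(2L);
            secondInjected.countDown();
        });
        shutdownAndWait(dispatcher);
        return injected;
    }

    private static void shutdownAndWait(ExecutorService dispatcher) {
        dispatcher.shutdown();
        try {
            if (!dispatcher.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("dispatcher did not terminate in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(injectAll(Executors.newSingleThreadExecutor(), 5));
        System.out.println(injectWithDelayedFirst());
    }
}
```

With the single-threaded executor the ids come back as 1 through 5 in order; in the two-thread case the result is 2 followed by 1, i.e. the earlier checkpoint is injected after the later one.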

*Proposed solution:*

Revert {{Task.asyncCallDispatcher}} to being single-threaded.
For the stop-with-savepoint feature, the dispatcher thread that performs the 
synchronous savepoint doesn't need to block, and the 
{{StreamTask.finishTask()}} invocation can be delegated to 
{{StreamTask.notifyCheckpointComplete()}}.
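
One possible shape of this delegation, as a rough sketch only: the class {{DeferredFinishSketch}} and its fields are hypothetical and do not reflect the actual {{StreamTask}} code. It assumes all calls arrive on the single dispatcher thread, so no extra synchronization is shown.

```java
/** Hypothetical model of a task that finishes on savepoint completion; not Flink code. */
public class DeferredFinishSketch {

    private static final long NONE = -1L;

    // Id of the synchronous savepoint we are waiting on, or NONE.
    private long pendingSyncSavepointId = NONE;
    private boolean finished;

    /** Triggers the savepoint and returns immediately instead of blocking the dispatcher thread. */
    public void triggerSynchronousSavepoint(long checkpointId) {
        pendingSyncSavepointId = checkpointId;
    }

    /** Completion callback: finishes the task only for the pending synchronous savepoint. */
    public void notifyCheckpointComplete(long checkpointId) {
        if (checkpointId == pendingSyncSavepointId) {
            pendingSyncSavepointId = NONE;
            finishTask();
        }
    }

    private void finishTask() {
        finished = true;
    }

    public boolean isFinished() {
        return finished;
    }

    public static void main(String[] args) {
        DeferredFinishSketch task = new DeferredFinishSketch();
        task.triggerSynchronousSavepoint(7L);
        task.notifyCheckpointComplete(6L); // unrelated checkpoint, task keeps running
        task.notifyCheckpointComplete(7L); // pending savepoint completed, task finishes
        System.out.println(task.isFinished());
    }
}
```

Because the trigger call returns right away, the dispatcher thread stays free to process the next queued checkpoint/savepoint in order, which is why a single thread would be enough in this model.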

*Note:* imo, the issue described here is not critical, but the proposed change 
should simplify the implementation. This ticket can be considered an enhancement.




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
