gaoyunhaii opened a new pull request #17: URL: https://github.com/apache/flink-ml/pull/17
This PR implements the checkpoint mechanism for the iteration. The checkpoint mechanism aims to ensure that:

1. Record processing is consistent with the state. This is the same guarantee that normal checkpoints provide for jobs without feedback edges.
2. Notifications that an epoch has been incremented are delivered exactly once.

Checkpoints rely on the reference-count mechanism to include the feedback records in the snapshots. In addition, this PR handles the state of the controller operators, the all-round wrappers, and the per-round wrappers.

This version introduces a limitation: none of the operators inside the iteration may change parallelism when restarting from a checkpoint. In the long run, this condition could be relaxed as follows:

1. Unbounded iterations could rescale freely.
2. Bounded iterations could rescale the all-round / per-round operators if they are restored from a savepoint / external checkpoint taken after round 0 has fully finished.

However, the controller operators (Head & Tail) still could not support rescaling.
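To illustrate how a reference count can fold feedback records into snapshots, here is a minimal Java sketch. It is hypothetical and not the code in this PR. The class `FeedbackLog` and its methods are invented names. The sketch assumes a Chandy-Lamport-style scheme: the head starts logging feedback records when a checkpoint barrier passes it and stops when that barrier returns through the feedback edge. Because checkpoints can overlap, each logged record is stored once and shared by all pending checkpoints, with one reference per checkpoint. Once the count reaches zero, no snapshot needs the record any more.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Flink ML implementation: feedback records that
// arrive while checkpoints are in flight are logged once and shared by every
// pending checkpoint through a reference count.
final class FeedbackLog<T> {

    /** One logged feedback record and the number of pending checkpoints that still need it. */
    private static final class Entry<R> {
        final R record;
        int refCount;

        Entry(R record, int refCount) {
            this.record = record;
            this.refCount = refCount;
        }
    }

    /** Logged entries per pending checkpoint id, in arrival order. */
    private final Map<Long, List<Entry<T>>> pending = new LinkedHashMap<>();

    /** Number of entries still referenced by at least one pending checkpoint. */
    private int retained = 0;

    /** Assumed hook: the checkpoint barrier reached the head, so start logging for it. */
    void startCheckpoint(long checkpointId) {
        pending.put(checkpointId, new ArrayList<>());
    }

    /** Assumed hook: a record arrived on the feedback edge. */
    void onFeedbackRecord(T record) {
        if (pending.isEmpty()) {
            return; // no checkpoint in flight, nothing needs to be logged
        }
        // One shared entry, referenced once by every pending checkpoint.
        Entry<T> entry = new Entry<>(record, pending.size());
        retained++;
        for (List<Entry<T>> entries : pending.values()) {
            entries.add(entry); // the same entry is shared, not copied
        }
    }

    /** Assumed hook: the barrier came back via the feedback edge; return its in-flight records. */
    List<T> finishCheckpoint(long checkpointId) {
        List<T> snapshot = new ArrayList<>();
        List<Entry<T>> entries = pending.remove(checkpointId);
        if (entries == null) {
            return snapshot; // unknown or already finished checkpoint
        }
        for (Entry<T> entry : entries) {
            snapshot.add(entry.record);
            if (--entry.refCount == 0) {
                retained--; // last reference gone: the record could now be freed
            }
        }
        return snapshot;
    }

    int retainedRecords() {
        return retained;
    }
}
```

Consider overlapping checkpoints 1 and 2. A record that arrives while both are pending lands in both snapshots but is stored only once. A real operator would presumably spill logged records to disk and release that storage when the count drops to zero, rather than relying on garbage collection as this sketch does.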
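One way to read the exactly-once requirement for epoch notifications is "exactly once with respect to the checkpointed state." The hypothetical `EpochNotifier` below is not an API of this PR or of Flink ML. It keeps the highest notified epoch as part of its state. After a restore, it suppresses epochs already covered by the snapshot. An epoch that was notified only after the snapshot is delivered again, because its effects were rolled back along with the rest of the state.

```java
import java.util.function.IntConsumer;

// Hypothetical sketch: the highest epoch already announced is kept in the
// checkpointed state, so a replay after restore does not announce it again.
final class EpochNotifier {
    private final IntConsumer listener;

    /** Highest epoch already notified; written to and restored from the checkpoint. */
    private int lastNotifiedEpoch = -1;

    EpochNotifier(IntConsumer listener) {
        this.listener = listener;
    }

    /** Assumed hook: the controller decided that all records of {@code epoch} are processed. */
    void onEpochCompleted(int epoch) {
        if (epoch <= lastNotifiedEpoch) {
            return; // already notified, and that fact is part of the restored state
        }
        lastNotifiedEpoch = epoch;
        listener.accept(epoch);
    }

    /** Value that would be written into the checkpoint. */
    int snapshotState() {
        return lastNotifiedEpoch;
    }

    /** Value read back from the checkpoint on restore. */
    void restoreState(int restoredEpoch) {
        lastNotifiedEpoch = restoredEpoch;
    }
}
```

In the usage below, epoch 2 is notified after the checkpoint and then lost to a failure. After restore, the replayed epoch 1 is skipped and epoch 2 is delivered once more, so each notification takes effect exactly once on the surviving state.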
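The rescaling rules described for the long run can be encoded as a small predicate. This is a hypothetical sketch of the proposed conditions, not something this PR implements. The names are invented, and the sketch assumes that the Head & Tail restriction applies to both bounded and unbounded iterations. The behavior of the current version corresponds to `checkCurrentVersion`, which rejects any parallelism change.

```java
// Hypothetical sketch of the proposed long-run rescaling rules (not implemented by this PR).
final class RescalePolicy {

    enum OperatorKind { CONTROLLER, ALL_ROUND, PER_ROUND }

    /**
     * @param bounded                 whether the iteration is bounded
     * @param kind                    the kind of operator being restored
     * @param fromSavepointOrExternal whether the restore uses a savepoint / external checkpoint
     * @param round0Finished          whether round 0 had fully finished when that snapshot was taken
     */
    static boolean mayRescale(boolean bounded, OperatorKind kind,
                              boolean fromSavepointOrExternal, boolean round0Finished) {
        if (kind == OperatorKind.CONTROLLER) {
            return false; // Head & Tail never support rescaling
        }
        if (!bounded) {
            return true; // unbounded iterations could rescale freely
        }
        // Bounded: all-round / per-round operators only after round 0 is fully finished.
        return fromSavepointOrExternal && round0Finished;
    }

    /** Rule of the current version: no operator inside the iteration may change parallelism. */
    static void checkCurrentVersion(int oldParallelism, int newParallelism) {
        if (oldParallelism != newParallelism) {
            throw new IllegalStateException("Operators inside the iteration cannot change parallelism: "
                    + oldParallelism + " -> " + newParallelism);
        }
    }
}
```

Keeping the controller check first makes the one unconditional restriction explicit. The two iteration modes then differ only in whether the snapshot's provenance matters.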