gaoyunhaii opened a new pull request #17:
URL: https://github.com/apache/flink-ml/pull/17


   This PR implements the checkpoint mechanism for iterations. The goal of the 
checkpoint mechanism is to ensure that
   1. Record processing is consistent with the state, the same as for normal 
checkpoints without feedback edges.
   2. The notification that an epoch has been incremented is exactly-once.
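
   As an illustration of the second goal, here is a minimal sketch of one way 
epoch notifications could be kept exactly-once across a restore: the last 
notified epoch is stored in the snapshot and used to suppress repeats. The class 
`EpochNotificationTracker` and its methods are hypothetical names chosen for 
this example and are not taken from the PR.

```java
/**
 * Hypothetical sketch, not code from this PR: tracks the last epoch whose
 * "epoch incremented" notification was delivered, so that a notification is
 * neither repeated nor skipped after restoring from a snapshot.
 */
final class EpochNotificationTracker {
    // -1 means no epoch has been notified yet.
    private long lastNotifiedEpoch = -1L;

    /** Returns true if the notification for this epoch should be delivered now. */
    boolean tryNotify(long epoch) {
        if (epoch <= lastNotifiedEpoch) {
            // Already delivered before the snapshot was taken (or earlier in this run).
            return false;
        }
        lastNotifiedEpoch = epoch;
        return true;
    }

    /** Value to store in the snapshot together with the rest of the operator state. */
    long snapshot() {
        return lastNotifiedEpoch;
    }

    /** Rebuilds a tracker from a snapshotted value. */
    static EpochNotificationTracker restore(long snapshottedEpoch) {
        EpochNotificationTracker tracker = new EpochNotificationTracker();
        tracker.lastNotifiedEpoch = snapshottedEpoch;
        return tracker;
    }
}
```

   Because the tracked epoch is snapshotted together with the operator state, a 
restored job resumes from the same point the state reflects: epochs notified 
before the snapshot are not notified again, and later epochs are still delivered.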
   
   The checkpoint relies on a reference-count mechanism to include the 
feedback records in snapshots. It also takes care of the state for the 
controller operator, the all-round wrappers, and the per-round wrappers. In this 
version it introduces a limitation: none of the operators inside the iteration 
may change parallelism after restarting from a checkpoint. In the long run, this 
condition could be relaxed so that
   1. Unbounded iterations can rescale freely.
   2. Bounded iterations can rescale all-round / per-round operators if they 
are restored from a savepoint / external checkpoint taken after round 0 has 
fully finished. However, the controller operator (Head & Tail) would still not 
support rescaling.
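
   The current parallelism limitation can be pictured as a guard evaluated on 
restore. The sketch below is only an illustration under assumptions: 
`ParallelismGuard` and `checkOnRestore` are hypothetical names, and the sketch 
assumes the parallelism in effect at snapshot time is stored in the operator 
state so it can be compared with the parallelism after restart.

```java
/**
 * Hypothetical sketch, not code from this PR: fails the restore when an
 * operator inside the iteration comes back with a different parallelism
 * than the one recorded in the checkpoint.
 */
final class ParallelismGuard {
    private ParallelismGuard() {}

    /**
     * @param snapshottedParallelism parallelism recorded when the checkpoint was taken (assumed to be part of the state)
     * @param currentParallelism parallelism of the operator after restart
     */
    static void checkOnRestore(int snapshottedParallelism, int currentParallelism) {
        if (snapshottedParallelism != currentParallelism) {
            throw new IllegalStateException(
                    "Operators inside the iteration cannot change parallelism on restore: "
                            + "snapshot had " + snapshottedParallelism
                            + ", current is " + currentParallelism);
        }
    }
}
```

   Failing fast on a mismatch keeps the restriction explicit. Relaxing it as 
described above would mean replacing this equality check with rules that depend 
on whether the iteration is bounded and on which operator is being restored.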


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

