Github user brkyvz commented on the pull request:
https://github.com/apache/spark/pull/9421#issuecomment-155175426
@zsxwing It should be acceptable as well. Think about it like this:
We have 2 receivers, A and B:
t0 -> A receives batch with seq number x_0, B receives batch with seq
number y_0
t1 -> A receives batch with seq number x_1
t2 -> B receives batch with seq number y_1
In the parallel case, if we were to checkpoint at t1 + epsilon, we would
checkpoint x_1 for A and y_0 for B.
In the sequential case, each checkpoint takes some time, so assume we
checkpoint at t1 + epsilon for A and at t2 + epsilon for B. Then we would
checkpoint x_1 for A and y_1 for B.
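
To make the two cases concrete, here is a small simulation of that timeline. It is only a sketch: the numeric times, EPSILON, and the helper names (latest_seq, checkpoint_parallel, checkpoint_sequential) are made up for illustration and are not taken from the PR or from the Kinesis receiver code.

```python
# Hypothetical timeline: (time, receiver, sequence number).
# t0 = 0.0, t1 = 1.0, t2 = 2.0 are illustrative values only.
EVENTS = [
    (0.0, "A", "x_0"),
    (0.0, "B", "y_0"),
    (1.0, "A", "x_1"),
    (2.0, "B", "y_1"),
]
EPSILON = 0.1  # assumed small delay after a batch arrives


def latest_seq(receiver, at_time):
    """Return the last sequence number `receiver` has seen by `at_time`."""
    seen = [seq for t, r, seq in EVENTS if r == receiver and t <= at_time]
    return seen[-1] if seen else None


def checkpoint_parallel(at_time):
    """Checkpoint every receiver at the same instant."""
    return {r: latest_seq(r, at_time) for r in ("A", "B")}


def checkpoint_sequential(times):
    """Checkpoint each receiver at its own instant (one after another)."""
    return {r: latest_seq(r, t) for r, t in times.items()}


parallel = checkpoint_parallel(1.0 + EPSILON)
sequential = checkpoint_sequential({"A": 1.0 + EPSILON, "B": 2.0 + EPSILON})

print(parallel)    # {'A': 'x_1', 'B': 'y_0'}
print(sequential)  # {'A': 'x_1', 'B': 'y_1'}
```

Either way, each receiver's checkpoint is a sequence number that receiver has already seen; the sequential case just lets B's checkpoint land a little later.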
Performance-wise it shouldn't be a problem: checkpointing to DynamoDB is
only a "best effort" attempt.