GitHub user matuskik commented on the pull request:
https://github.com/apache/spark/pull/4875#issuecomment-77586406
@jerryshao:
- Yes, the purpose is to be able to recover in cases where recovery from a
checkpoint is not possible.
- I agree that re-streaming a day's worth of data into the window would also
fill the window. The problem is that the day's worth of data would have to be
re-streamed in one batch on app startup, and it would then persist in the window
for a whole day before going out of scope. This means the window would at one
point hold double the amount of data (two days' worth) until that one big batch
finally falls out of the window.
So what I'm doing in my application is creating `CassandraJavaRDD`s via
the `spark-cassandra-connector` library and passing these RDDs as the initial
window data (rough sketch below).
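
For illustration only (this is not part of the PR): a minimal sketch of the
Cassandra side of that, using the connector's Java API. The app name,
connection host, keyspace, and table names are placeholders, and the step
that actually hands the RDD to the windowed DStream depends on the API this
PR introduces, so it is only described in a comment.

```java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

import com.datastax.spark.connector.japi.CassandraRow;
import com.datastax.spark.connector.japi.rdd.CassandraJavaRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class WindowSeed {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("window-seed")                              // placeholder
        .set("spark.cassandra.connection.host", "127.0.0.1");   // placeholder
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Read back roughly the last day's worth of rows from Cassandra;
    // "analytics" and "events" are placeholder keyspace/table names.
    CassandraJavaRDD<CassandraRow> seed =
        javaFunctions(sc).cassandraTable("analytics", "events");

    // In the streaming app this RDD would then be supplied as the window's
    // initial contents (via the hook this PR proposes, not shown here), so the
    // window starts out full without re-streaming a whole day of data.
    System.out.println("seed rows: " + seed.count());

    sc.stop();
  }
}
```

The same RDD is what would be passed as the initial window data, which is
exactly the situation where a hook like the one in this PR is needed.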