GitHub user matuskik commented on the pull request:
https://github.com/apache/spark/pull/4875#issuecomment-77586406
@jerryshao:
- Yes, the purpose is to be able to recover in cases where recovery from a
checkpoint is not possible.
- I agree that re-streaming a day's worth of data into the window would also
fill the window. The problem is that the day's worth of data would have to be
re-streamed in one batch on app startup, and it would then persist in the window
for a whole day before going out of scope. This means the window would at one
point hold double the amount of data (two days' worth) until that one big batch
finally falls out of the window.
So what I'm doing in my application is creating `CassandraJavaRDD`s via
the `spark-cassandra-connector` library and passing these RDDs as the initial
window data (rough sketch below).
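
For illustration only (this is not part of the PR): a minimal sketch of the
Cassandra side of that, using the connector's Java API. The app name,
connection host, keyspace, and table names are placeholders, and the step
that actually hands the RDD to the windowed DStream depends on the API this
PR introduces, so it is only described in a comment.

```java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

import com.datastax.spark.connector.japi.CassandraRow;
import com.datastax.spark.connector.japi.rdd.CassandraJavaRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class WindowSeed {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("window-seed")                              // placeholder
        .set("spark.cassandra.connection.host", "127.0.0.1");   // placeholder
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Read back roughly the last day's worth of rows from Cassandra;
    // "analytics" and "events" are placeholder keyspace/table names.
    CassandraJavaRDD<CassandraRow> seed =
        javaFunctions(sc).cassandraTable("analytics", "events");

    // In the streaming app this RDD would then be supplied as the window's
    // initial contents (via the hook this PR proposes, not shown here), so the
    // window starts out full without re-streaming a whole day of data.
    System.out.println("seed rows: " + seed.count());

    sc.stop();
  }
}
```

The same RDD is what would be passed as the initial window data, which is
exactly the situation where a hook like the one in this PR is needed.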