Github user harishreedharan commented on the pull request:

    https://github.com/apache/spark/pull/3798#issuecomment-71121466
  
    OK.
    
    Just a thought: do you think there might be a way to avoid the spikes? Once 
the current RDD is checkpointed, create a "new" pending RDD that continuously 
receives data until its compute method is called. At that point, the last offset 
we have received becomes the upper bound, and the accumulated data becomes 
available for transformations. That way, we could spread the network transfers 
from Kafka over a longer period instead of pulling everything in one burst.
    
    Not sure if there are holes in that algorithm, but it looks almost 
equivalent to the current model, no?
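    
    For concreteness, a minimal sketch of what I mean (plain Scala, no Spark 
or Kafka dependencies; `PendingBatch`, `append`, and `freeze` are hypothetical 
names, not anything in the Spark API): records keep arriving in the background, 
and the moment compute fires, the last offset seen is frozen as the upper bound.
    
```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicLong
import scala.jdk.CollectionConverters._

// Illustrative model of the proposed "pending RDD": records stream in
// continuously; calling freeze() fixes the upper-bound offset, after
// which the batch is immutable and ready for transformations.
class PendingBatch[T] {
  private val buffer = new ConcurrentLinkedQueue[(Long, T)]()
  private val lastOffset = new AtomicLong(-1L)
  @volatile private var frozen = false

  // Called continuously by a background fetcher as data arrives from Kafka.
  def append(offset: Long, record: T): Unit = {
    if (!frozen) {
      buffer.add(offset -> record)
      lastOffset.set(offset)
    }
  }

  // Called when compute() fires: the last offset we received becomes the
  // upper bound, and the data accumulated so far is handed off.
  // Records that race in after the bound is read are filtered out,
  // since their offsets exceed the upper bound.
  def freeze(): (Long, Seq[T]) = {
    val upperBound = lastOffset.get()
    frozen = true
    (upperBound, buffer.asScala.collect { case (o, r) if o <= upperBound => r }.toSeq)
  }
}
```
    
    A background fetcher thread would call `append` as messages arrive, and 
`freeze` would run from inside compute, so the batch boundary is simply the 
moment compute is invoked.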

