Re: How to reduce the amount of data that is getting written to the checkpoint from Spark Streaming
Using a long period betweem checkpoints may cause a long linage of the graphs computations to be created, since Spark uses checkpointing to cut it, which can also cause a delay in the streaming job. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-reduce-the-amount-of-data-that-is-getting-written-to-the-checkpoint-from-Spark-Streaming-tp28798p28820.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: How to reduce the amount of data that is getting written to the checkpoint from Spark Streaming
You can't. Spark doesn't let you fiddle with the data being checkpoint, as it's an internal implementation detail. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-reduce-the-amount-of-data-that-is-getting-written-to-the-checkpoint-from-Spark-Streaming-tp28798p28815.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
How to reduce the amount of data that is getting written to the checkpoint from Spark Streaming
Hi, I have checkpoints enabled in Spark streaming and I use updateStateByKey and reduceByKeyAndWindow with inverse functions. How do I reduce the amount of data that I am writing to the checkpoint or clear out the data that I dont care? Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-reduce-the-amount-of-data-that-is-getting-written-to-the-checkpoint-from-Spark-Streaming-tp28798.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe e-mail: user-unsubscr...@spark.apache.org