We don’t provide any Kubernetes-specific mechanisms for streaming, such as 
checkpointing to persistent volumes. But as long as streaming doesn’t require 
persisting to the executor’s local disk, streaming ought to work out of the 
box. E.g. you can checkpoint to HDFS, but not to the pod’s local directories.
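
A minimal sketch of the point above, assuming PySpark: the checkpoint location is passed through the standard `checkpointLocation` option and is rejected if it would land on the pod's own filesystem. The helper names (`is_pod_local`, `start_query`), the parquet sink, and the example URIs are hypothetical and only illustrate the idea. Treating a scheme-less path as local is a conservative assumption of this sketch, since Spark actually resolves such paths against the cluster's default filesystem.

```python
from urllib.parse import urlparse

# Schemes this sketch treats as pod-local (an assumption, not a Spark rule).
# A scheme-less path is resolved against the default filesystem, which may or
# may not be durable, so it is conservatively rejected here as well.
LOCAL_SCHEMES = {"", "file"}


def is_pod_local(checkpoint_uri: str) -> bool:
    """Return True if the URI may resolve to the pod's own filesystem."""
    return urlparse(checkpoint_uri).scheme in LOCAL_SCHEMES


def start_query(df, checkpoint_uri: str, output_path: str):
    """Start a streaming parquet sink, refusing pod-local checkpoints.

    `df` is expected to be a streaming DataFrame. The location check runs
    before `df` is touched, so a bad location fails fast.
    """
    if is_pod_local(checkpoint_uri):
        raise ValueError(
            f"checkpoint location {checkpoint_uri!r} may be pod-local; "
            "use durable storage such as hdfs://..."
        )
    return (
        df.writeStream
        .format("parquet")
        .option("path", output_path)
        .option("checkpointLocation", checkpoint_uri)
        .start()
    )
```

For example, `start_query(df, "hdfs://namenode:8020/checkpoints/app", "hdfs://namenode:8020/out/app")` would start the query, while a location such as `/tmp/checkpoints` raises `ValueError` before any Spark call is made. The host name and paths are placeholders.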


However, I’m unaware of any specific use of streaming with the Spark on 
Kubernetes integration right now. I would be curious to get feedback on the 
failover behavior.


-Matt Cheah


From: Tathagata Das <t...@databricks.com>
Date: Friday, April 13, 2018 at 1:27 AM
To: Krishna Kalyan <krishnakaly...@gmail.com>
Cc: user <user@spark.apache.org>
Subject: Re: Structured Streaming on Kubernetes


Structured Streaming is stable in production! At Databricks, we and our 
customers collectively process hundreds of billions of records per day using 
SS. However, we are not using Kubernetes :) 


Though I don't think it will matter too much as long as kubes are correctly 
provisioned+configured and you are checkpointing to HDFS (for fault-tolerance 




On Fri, Apr 13, 2018, 12:28 AM Krishna Kalyan <krishnakaly...@gmail.com> wrote:

Hello All, 

We are evaluating Spark Structured Streaming on Kubernetes (running on GCP). 
It would be awesome if the Spark community could share their experience with 
this. I would like to know more about your production experience and the 
monitoring tools you are using.


Since Spark on Kubernetes is a relatively new addition to Spark, I was 
wondering if Structured Streaming is stable in production. We are also 
evaluating Apache Beam with Flink.





