We don’t provide any Kubernetes-specific mechanisms for streaming, such as checkpointing to persistent volumes. But as long as a streaming job doesn’t need to persist anything to the executor’s local disk, streaming ought to work out of the box. For example, you can checkpoint to HDFS, but not to the pod’s local directories.
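To make the "HDFS yes, pod-local directory no" rule concrete, here is a minimal sketch of a guard you could run before starting a query. The helper names and the set of "shared" URI schemes are hypothetical assumptions for illustration, not part of Spark. The only real Spark API referenced is the `checkpointLocation` option on `writeStream`, shown in a comment.

```python
from urllib.parse import urlparse

# Hypothetical, illustrative set of URI schemes backed by storage that
# outlives an executor pod. Not exhaustive; adjust for your cluster.
SHARED_SCHEMES = frozenset({"hdfs", "s3a", "gs", "abfss", "wasbs"})


def is_durable_checkpoint_location(uri: str) -> bool:
    """Return True if `uri` explicitly names a shared filesystem.

    Scheme-less paths resolve against Hadoop's fs.defaultFS, which may be
    the pod's local disk, so they are rejected rather than guessed at.
    file:// paths are always pod-local and are rejected too.
    """
    scheme = urlparse(uri).scheme.lower()
    return scheme in SHARED_SCHEMES


def require_durable_checkpoint(uri: str) -> str:
    """Fail fast instead of letting a query checkpoint to a pod's local disk."""
    if not is_durable_checkpoint_location(uri):
        raise ValueError(f"checkpoint location {uri!r} is not on shared storage")
    return uri


# Intended use (not executed here; requires PySpark):
# query = (df.writeStream
#            .option("checkpointLocation",
#                    require_durable_checkpoint("hdfs://namenode:8020/checkpoints/q1"))
#            .start())
```

Rejecting scheme-less paths is a deliberately conservative choice: they may well resolve to HDFS through `fs.defaultFS`, but spelling out the scheme removes the ambiguity inside a container.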
However, I’m not aware of anyone using streaming with the Spark on Kubernetes integration right now. I’d be curious to get feedback on the failover behavior.

-Matt Cheah

From: Tathagata Das <[email protected]>
Date: Friday, April 13, 2018 at 1:27 AM
To: Krishna Kalyan <[email protected]>
Cc: user <[email protected]>
Subject: Re: Structured Streaming on Kubernetes

Structured streaming is stable in production! At Databricks, we and our customers collectively process almost 100s of billions of records per day using SS. However, we are not using Kubernetes :) Though I don't think it will matter too much, as long as the kubes are correctly provisioned and configured and you are checkpointing to HDFS (for fault-tolerance guarantees).

TD

On Fri, Apr 13, 2018, 12:28 AM Krishna Kalyan <[email protected]> wrote:

Hello All,

We were evaluating Spark Structured Streaming on Kubernetes (running on GCP). It would be awesome if the Spark community could share their experience with this. I would like to know more about your production experience and the monitoring tools you are using. Since Spark on Kubernetes is a relatively new addition to Spark, I was wondering whether Structured Streaming is stable in production. We were also evaluating Apache Beam with Flink.

Regards,
Krishna
