Hi Girish,

You can plug in a custom state store by implementing the StateStoreProvider trait, together with the StateStore that it returns (both are defined in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala ), and pointing the following Spark configuration at your provider class:
spark.conf.set(
  "spark.sql.streaming.stateStore.providerClass",
  "com.example.path.to.CustomStateStore")

See also https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala for the default implementation that is used.

Hope this helps!

Stefan

On Tue, Jul 10, 2018 at 7:06 AM subramgr <subramanian.gir...@gmail.com> wrote:

> Hi,
>
> Currently we are using HDFS for our checkpointing but we are having issues
> maintaining a HDFS cluster.
>
> We tried glusterfs in the past for checkpointing but in our setup glusterfs
> does not work well.
>
> We are evaluating using Cassandra for storing the checkpoint data. Has any
> one implemented *StateStoreProvider* any blogs or articles which describe
> how to create our own *checkpointing* implementation
>
> Thanks

--
Stefan van Wouw
Databricks Inc.
stefan.vanw...@databricks.com
databricks.com
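
As a rough sketch of the configuration step in the reply above, the same setting can also be supplied when the session is built. The object name, application name, and local master below are made up for illustration; the provider class name is the placeholder from the reply, and it is assumed to be on the driver and executor classpath with a zero-argument constructor.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical entry point, for illustration only.
object CustomStateStoreConfig {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("custom-state-store-example") // assumed name
      .master("local[*]")                    // assumed: local run for testing
      // Placeholder class name taken from the reply; replace it with the
      // fully qualified name of your own StateStoreProvider implementation.
      .config(
        "spark.sql.streaming.stateStore.providerClass",
        "com.example.path.to.CustomStateStore")
      .getOrCreate()

    // Read the setting back to confirm the session picked it up.
    println(spark.conf.get("spark.sql.streaming.stateStore.providerClass"))

    spark.stop()
  }
}
```

The provider class is only loaded when a stateful streaming query starts, so a wrong class name should surface at that point rather than when the session is created. Either form of the setting should work as long as it is in place before the query is started.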