Hi Girish,

You can plug in a custom state store by implementing the StateStoreProvider
trait, together with the StateStore trait for the store instances it returns.
Both are defined in
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
Then point Spark at your provider class through the following configuration:

spark.conf.set(
  "spark.sql.streaming.stateStore.providerClass",
  "com.example.path.to.CustomStateStoreProvider")


See also the default implementation, HDFSBackedStateStoreProvider:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala
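
In case it is useful, here is a minimal sketch of setting the same
configuration when the session is built, so it is in place before any
streaming query starts. The class name and app name are hypothetical
placeholders, and I am assuming your provider class is on the classpath of
both the driver and the executors.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical class name: replace with your own StateStoreProvider implementation.
val providerClass = "com.example.path.to.CustomStateStoreProvider"

val spark = SparkSession.builder()
  .appName("custom-state-store-example") // hypothetical app name
  .config("spark.sql.streaming.stateStore.providerClass", providerClass)
  .getOrCreate()

// Read the setting back to confirm which provider class the session is configured with.
println(spark.conf.get("spark.sql.streaming.stateStore.providerClass"))
```

Setting it on the builder rather than later with spark.conf.set is only a
matter of preference; the point is that the value is set before the query
is started.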

Hope this helps!

Stefan


On Tue, Jul 10, 2018 at 7:06 AM subramgr <subramanian.gir...@gmail.com>
wrote:

> Hi,
> Currently we are using HDFS for our checkpointing but we are having issues
> maintaining a HDFS cluster.
>
> We tried glusterfs in the past for checkpointing but in our setup glusterfs
> does not work well.
>
> We are evaluating using Cassandra for storing the checkpoint data. Has anyone
> implemented a custom *StateStoreProvider*? Are there any blogs or articles
> that describe how to create our own *checkpointing* implementation?
>
> Thanks
>
> --
Stefan van Wouw
Databricks Inc.
stefan.vanw...@databricks.com