Re: [Structured Streaming] Custom StateStoreProvider

2018-07-12 Thread Jungtaek Lim
Girish,

I think reading through the implementation of HDFSBackedStateStoreProvider,
along with the relevant traits, should give you a good idea of how to
implement a custom one. HDFSBackedStateStoreProvider is not that complicated
to read and understand. You just need to deal with your underlying storage
engine.
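
To make the contract a storage engine has to satisfy more concrete, here is
a minimal, dependency-free sketch. It is not Spark's trait: the names
VersionedStoreSketch and Update are hypothetical, and keys and values are
plain strings instead of rows. It only models the versioning idea behind
getStore(version) and commit(): load the state as of a committed version,
apply updates, then publish them as the next version.

```scala
import scala.collection.mutable

// Hypothetical model of a versioned state store. Illustrative only:
// these names and signatures are NOT Spark's StateStore API.
final class VersionedStoreSketch {
  // Committed snapshots by version; version 0 is the empty initial state.
  private val committed =
    mutable.Map[Long, Map[String, String]](0L -> Map.empty[String, String])

  // One in-progress update on top of a committed base version.
  final class Update private[VersionedStoreSketch] (base: Long) {
    private var working: Map[String, String] = committed(base)

    def get(key: String): Option[String] = working.get(key)
    def put(key: String, value: String): Unit = working += (key -> value)
    def remove(key: String): Unit = working -= key

    // Publish the working copy as version base + 1 and return that version.
    // An Update that is dropped without commit() leaves `committed` untouched.
    def commit(): Long = {
      committed(base + 1) = working
      base + 1
    }
  }

  // Load the snapshot for `version`; fails if it was never committed.
  def getStore(version: Long): Update = {
    require(committed.contains(version), s"version $version was never committed")
    new Update(version)
  }
}

object VersionedStoreDemo {
  def main(args: Array[String]): Unit = {
    val provider = new VersionedStoreSketch
    val batch1 = provider.getStore(0L)
    batch1.put("user-1", "3")
    val v1 = batch1.commit()

    // A retried batch reloads version 0 and does not see batch 1's write.
    assert(provider.getStore(0L).get("user-1").isEmpty)
    assert(provider.getStore(v1).get("user-1").contains("3"))
  }
}
```

In a real provider the in-memory map would be replaced by reads and writes
against the storage engine, and that is where most of the work lies.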

Tathagata,

Is there a plan to turn StateStore and the relevant traits into a public
API? We have two annotations (InterfaceStability and Experimental) to mark
an evolving public API, and since the state store provider is already
pluggable, it sounds better to make it a public API marked as evolving.

On Wed, Jul 11, 2018 at 12:40 PM, Tathagata Das wrote:

> Note that this is not a public API yet, so it is not well documented.
> Use it at your own risk :)
>
> On Tue, Jul 10, 2018 at 11:04 AM, subramgr wrote:
>
>> Hi,
>>
>> This *trait* looks very daunting. Is there a blog post or article that
>> explains how to implement it?
>>
>> Thanks
>> Girish
>>
>>
>>
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>


Re: [Structured Streaming] Custom StateStoreProvider

2018-07-10 Thread Tathagata Das
Note that this is not a public API yet, so it is not well documented. Use it
at your own risk :)

On Tue, Jul 10, 2018 at 11:04 AM, subramgr wrote:

> Hi,
>
> This *trait* looks very daunting. Is there a blog post or article that
> explains how to implement it?
>
> Thanks
> Girish


Re: [Structured Streaming] Custom StateStoreProvider

2018-07-10 Thread subramgr
Hi, 

This *trait* looks very daunting. Is there a blog post or article that
explains how to implement it?

Thanks
Girish






Re: [Structured Streaming] Custom StateStoreProvider

2018-07-10 Thread Stefan Van Wouw
Hi Girish,

You can implement a custom state store provider by implementing the
StateStoreProvider trait, which is defined alongside the StateStore trait in
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
and then pointing the Spark configuration at your provider class:

spark.conf.set(
  "spark.sql.streaming.stateStore.providerClass",
  "com.example.path.to.CustomStateStore")
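
The same key can also be supplied when the session is built. Below is a
minimal sketch, assuming Spark SQL is on the classpath. The app name and the
local master are illustrative assumptions, and the class name is the same
placeholder as above. The provider class also has to be on the executors'
classpath, since that is where state stores are instantiated.

```scala
import org.apache.spark.sql.SparkSession

object CustomStateStoreApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("custom-state-store-example") // illustrative name
      .master("local[*]")                    // assumption: local run for testing
      .config(
        "spark.sql.streaming.stateStore.providerClass",
        "com.example.path.to.CustomStateStore") // placeholder class name
      .getOrCreate()

    // Read the value back to confirm it was applied to this session.
    println(spark.conf.get("spark.sql.streaming.stateStore.providerClass"))
    spark.stop()
  }
}
```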


See also the default implementation, HDFSBackedStateStoreProvider:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala

Hope this helps!

Stefan


On Tue, Jul 10, 2018 at 7:06 AM subramgr wrote:

> Hi,
> We currently use HDFS for checkpointing, but we are having trouble
> maintaining an HDFS cluster.
>
> We tried GlusterFS for checkpointing in the past, but it did not work well
> in our setup.
>
> We are evaluating Cassandra for storing the checkpoint data. Has anyone
> implemented a custom *StateStoreProvider*? Are there any blogs or articles
> that describe how to create our own *checkpointing* implementation?
>
> Thanks
--
Stefan van Wouw
Databricks Inc.
stefan.vanw...@databricks.com
