Re: persistent state for spark streaming

Yana Kadiyska Thu, 02 Oct 2014 06:40:15 -0700

Yes -- persist is more akin to caching -- it's telling Spark to materialize
that RDD for fast reuse, but it's not meant for the end user to query/use
across processes, etc. (at least that's my understanding).
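
To make the idea concrete, here is a rough sketch in plain Java with no
Spark dependency. It models the per-key update you would hand to
updateStateByKey (new values for this batch plus the previous state give
the new state) and writes each result to a store that lives outside the
job. Everything named here is an assumption for illustration:
QuotaStateSketch, externalStore (an in-memory map standing in for a real
database), INITIAL_QUOTA, updateQuota and processBatch. It also uses
java.util.Optional rather than whatever Optional type your Spark version
expects.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class QuotaStateSketch {
    // Hypothetical starting quota for a user seen for the first time.
    static final long INITIAL_QUOTA = 100L;

    // Hypothetical stand-in for a durable external store (database, key-value store, ...).
    static final Map<String, Long> externalStore = new HashMap<>();

    // Per-key update: (usage seen in this batch, previous state) -> new state.
    static Optional<Long> updateQuota(List<Long> usedInBatch, Optional<Long> previous) {
        long remaining = previous.orElse(INITIAL_QUOTA);
        for (long used : usedInBatch) {
            remaining -= used;
        }
        return Optional.of(remaining);
    }

    // Models one micro-batch: read the previous state from the store,
    // apply the update, and write the new state back.
    static void processBatch(Map<String, List<Long>> batch) {
        for (Map.Entry<String, List<Long>> entry : batch.entrySet()) {
            Optional<Long> previous = Optional.ofNullable(externalStore.get(entry.getKey()));
            Optional<Long> updated = updateQuota(entry.getValue(), previous);
            externalStore.put(entry.getKey(), updated.get());
        }
    }

    public static void main(String[] args) {
        processBatch(Map.of("alice", List.of(10L, 5L)));
        processBatch(Map.of("alice", List.of(20L), "bob", List.of(1L)));
        System.out.println(externalStore);
    }
}
```

In a real streaming job the write-back would presumably sit inside
foreachRDD, with one connection opened per partition, and the store would
be something that survives a restart of the application.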


On Thu, Oct 2, 2014 at 4:04 AM, Chia-Chun Shih <chiachun.s...@gmail.com>
wrote:

> Hi Yana,
>
> So, user quotas need another data store, one that can guarantee
> persistence and handle frequent data updates/access. Is that correct?
>
> Thanks,
> Chia-Chun
>
> 2014-10-01 21:48 GMT+08:00 Yana Kadiyska <yana.kadiy...@gmail.com>:
>
>> I don't think persist is meant for end-user usage. You might want to call
>> saveAsTextFiles, for example, if you're saving to the file system as
>> strings. You can also dump the DStream to a DB -- there are samples on this
>> list (you'd have to do a combo of foreachRDD and mapPartitions, likely)
>>
>> On Wed, Oct 1, 2014 at 3:49 AM, Chia-Chun Shih <chiachun.s...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> My application digests user logs and deducts user quotas. I need to
>>> maintain the latest state of the user quotas persistently, so that the
>>> latest quotas are not lost.
>>>
>>> I have tried *updateStateByKey* to generate a DStream for user
>>> quotas and called *persist(StorageLevel.MEMORY_AND_DISK())*, but it
>>> didn't work.
>>>
>>> Are there better approaches to persist states for spark streaming?
>>>
>>> Thanks.
