Yes -- persist is closer to caching. It tells Spark to materialize that RDD for fast reuse, but it is not meant as a store the end user can query or share across processes (at least that's my understanding).
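
For what it's worth, here is a rough sketch of the "dump the DStream to a DB" idea from earlier in the thread. It is only an illustration under assumptions: it uses Python and SQLite purely as a stand-in for whatever external store you pick, and the names INITIAL_QUOTA, DB_PATH, update_quota and save_partition, as well as the table layout, are made up for the example. The two functions are plain Python, so they can be tried without a cluster. The Spark wiring is shown only in comments and uses foreachPartition rather than mapPartitions, since the write is a side effect.

```python
import sqlite3

# Hypothetical values, chosen only for this example.
INITIAL_QUOTA = 100
DB_PATH = "quotas.db"


def update_quota(deductions, remaining):
    """State update in the shape updateStateByKey expects: the new values
    seen for a key in this batch, plus the previous state (None the first
    time the key appears)."""
    if remaining is None:
        remaining = INITIAL_QUOTA
    return remaining - sum(deductions)


def save_partition(records, db_path=DB_PATH):
    """Write one partition of (user_id, remaining) pairs to the external
    store, opening a single connection per partition."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS quotas "
            "(user_id TEXT PRIMARY KEY, remaining INTEGER)"
        )
        # Upsert so the table always holds the latest quota per user.
        conn.executemany(
            "INSERT OR REPLACE INTO quotas (user_id, remaining) VALUES (?, ?)",
            records,
        )
        conn.commit()
    finally:
        conn.close()


# Assumed wiring in a streaming job, where `deductions` is a DStream of
# (user_id, amount) pairs and checkpointing has been enabled:
#
#   quotas = deductions.updateStateByKey(update_quota)
#   quotas.foreachRDD(lambda rdd: rdd.foreachPartition(save_partition))
```

A local SQLite file would not be reachable from every worker on a real cluster, so in practice save_partition would talk to a networked database instead. The point is only that the latest state gets written out on every batch, rather than relying on persist.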

On Thu, Oct 2, 2014 at 4:04 AM, Chia-Chun Shih <chiachun.s...@gmail.com> wrote:

> Hi Yana,
>
> So, user quotas need another data store, which can guarantee persistence
> and afford frequent data updates/access. Is it correct?
>
> Thanks,
> Chia-Chun
>
> 2014-10-01 21:48 GMT+08:00 Yana Kadiyska <yana.kadiy...@gmail.com>:
>
>> I don't think persist is meant for end-user usage. You might want to call
>> saveAsTextFiles, for example, if you're saving to the file system as
>> strings. You can also dump the DStream to a DB -- there are samples on this
>> list (you'd have to do a combo of foreachRDD and mapPartitions, likely)
>>
>> On Wed, Oct 1, 2014 at 3:49 AM, Chia-Chun Shih <chiachun.s...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> My application is to digest user logs and deduct user quotas. I need to
>>> maintain latest states of user quotas persistently, so that latest user
>>> quotas will not be lost.
>>>
>>> I have tried *updateStateByKey* to generate and a DStream for user
>>> quotas and called *persist(StorageLevel.MEMORY_AND_DISK())*, but it
>>> didn't work.
>>>
>>> Are there better approaches to persist states for spark streaming?
>>>
>>> Thanks.