Re: spark streaming with checkpoint

2015-01-25 Thread Balakrishnan Narendran
Yeah, use streaming to gather the incoming logs and write them to a log file, then run a Spark job every 5 minutes to process the counts. Got it. Thanks a lot. On 07:07, Mon, 26 Jan 2015 Tobias Pfeiffer wrote: > Hi, > > On Tue, Jan 20, 2015 at 8:16 PM, balu.naren wrote: > >> I am a beginner to spark
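The pattern agreed on above (stream the raw logs to files, then batch-compute the counts periodically) can be sketched without Spark at all. A minimal illustration of the batch step follows; the tab-separated log format and field names are assumptions for the example, not taken from the thread:

```python
from collections import defaultdict

def daily_unique_users(lines):
    """Count distinct user ids per calendar day from an iterable of log lines.

    Assumed (hypothetical) line format: "<ISO timestamp>\t<user_id>".
    """
    seen = defaultdict(set)
    for line in lines:
        ts, user = line.rstrip("\n").split("\t")
        day = ts[:10]  # "YYYY-MM-DD" prefix of an ISO timestamp
        seen[day].add(user)
    return {day: len(users) for day, users in seen.items()}

logs = [
    "2015-01-20T10:00:00\talice",
    "2015-01-20T11:30:00\tbob",
    "2015-01-20T12:00:00\talice",
    "2015-01-21T09:00:00\tcarol",
]
print(daily_unique_users(logs))  # → {'2015-01-20': 2, '2015-01-21': 1}
```

Because each 5-minute run reads the accumulated files, nothing has to be held in streaming state across the 24-hour day, which is exactly what sidesteps the memory growth discussed below.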

Re: spark streaming with checkpoint

2015-01-25 Thread Tobias Pfeiffer
Hi, On Tue, Jan 20, 2015 at 8:16 PM, balu.naren wrote: > I am a beginner to spark streaming, so I have a basic doubt regarding > checkpoints. My use case is to calculate the number of unique users by day. I > am using reduce by key and window for this, where my window duration is 24 > hours and slide

RE: spark streaming with checkpoint

2015-01-22 Thread Shao, Saisai
Thank you Jerry. Does the window operation create new RDDs for each slide duration? I am asking this because I see a constant increase in memory even when no logs are received. If not checkpointing, is there any alternative that you would suggest? On Tue, Jan 20

Re: spark streaming with checkpoint

2015-01-22 Thread Jörn Franke
Maybe you are using the wrong approach - try something like HyperLogLog or bitmap structures as you can find them, for instance, in Redis. They are much smaller. On 22 Jan 2015 at 17:19, "Balakrishnan Narendran" wrote: > Thank you Jerry, > Does the window operation create new RDDs for each slide
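Jörn's suggestion trades exactness for bounded memory: a HyperLogLog counts distinct items in a few KB regardless of cardinality. A minimal, self-contained sketch of the idea follows (this is not Redis's implementation; the register count and hash choice are illustrative):

```python
import hashlib
import math

class HyperLogLog:
    """Approximate distinct counter using m = 2**p small registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash of the item.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)         # bias correction for large m
        z = sum(2.0 ** -r for r in self.registers)
        estimate = alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:        # small-range linear counting
            estimate = self.m * math.log(self.m / zeros)
        return int(estimate)

hll = HyperLogLog()
for i in range(10000):
    hll.add(f"user-{i}")
print(hll.count())  # within a few percent of 10000
```

Redis exposes the same idea through PFADD and PFCOUNT, keeping each counter at roughly 12 KB with a standard error around 0.81%, which is why it is "much smaller" than caching a day's worth of raw events.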

Re: spark streaming with checkpoint

2015-01-22 Thread Balakrishnan Narendran
Thank you Jerry. Does the window operation create new RDDs for each slide duration? I am asking this because I see a constant increase in memory even when no logs are received. If not checkpointing, is there any alternative that you would suggest? On Tue, Jan 20, 2015 at 7:08 PM, Shao,

RE: spark streaming with checkpoint

2015-01-20 Thread Shao, Saisai
Hi, Seems you have such a large window (24 hours), so the phenomenon of increasing memory usage is expected, because the window operation will cache the RDDs within this window in memory. So for your requirement, memory should be large enough to hold 24 hours of data. I don't think checkpoint in Spark
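A quick back-of-the-envelope calculation makes Jerry's point concrete. The event rate and record size below are invented for illustration, not figures from the thread:

```python
# Hypothetical sizing: 500 events/sec at ~200 bytes per cached record,
# held for a full 24-hour window.
events_per_sec = 500
bytes_per_record = 200
window_sec = 24 * 3600

total_gb = events_per_sec * bytes_per_record * window_sec / 1e9
print(f"{total_gb:.1f} GB")  # ≈ 8.6 GB held in memory for the window
```

Spark Streaming's reduceByKeyAndWindow also has an overload that takes an inverse reduce function, letting it subtract the data sliding out of the window instead of recomputing from scratch, but even then the window's worth of state must be retained (and checkpointing becomes mandatory for that variant), so a large window still implies large memory.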