Yeah, use streaming to gather the incoming logs and write them to a log file, then
run a Spark job every 5 minutes to process the counts. Got it. Thanks a
lot.
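The "write to a log file, then batch every few minutes" approach agreed on above can be sketched in plain Python (not Spark). The log format and field positions here are assumptions for illustration, not something stated in the thread:

```python
def unique_users_by_day(log_lines):
    """Count distinct user ids per day from 'timestamp user_id ...' lines.

    Assumed format: ISO timestamp first, user id second (an illustration,
    not the thread's actual log layout).
    """
    users_per_day = {}
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip malformed lines
        day = parts[0][:10]  # 'YYYY-MM-DD' prefix of the timestamp
        users_per_day.setdefault(day, set()).add(parts[1])
    return {day: len(users) for day, users in users_per_day.items()}

logs = [
    "2015-01-20T10:00:00 alice login",
    "2015-01-20T10:05:00 bob login",
    "2015-01-20T11:00:00 alice click",
    "2015-01-21T09:00:00 alice login",
]
print(unique_users_by_day(logs))  # {'2015-01-20': 2, '2015-01-21': 1}
```

A periodic Spark job would do the same per-day grouping, just distributed; the point is that each 5-minute run only touches the new file contents rather than keeping 24 hours of records cached.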
On Mon, 26 Jan 2015 at 07:07, Tobias Pfeiffer wrote:
Hi,
On Tue, Jan 20, 2015 at 8:16 PM, balu.naren wrote:
> I am a beginner to Spark Streaming, so I have a basic question regarding
> checkpoints. My use case is to calculate the number of unique users per day.
> I am using reduceByKeyAndWindow for this, where my window duration is 24
> hours and slide
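The question above describes a reduceByKeyAndWindow over a 24-hour window. A minimal pure-Python simulation of a sliding-window distinct count (the timestamps, window, and slide values are invented for illustration) shows why every slide needs the full window of records available:

```python
from collections import deque

def sliding_unique(events, window, slide):
    """events: time-sorted (timestamp, user) pairs. At every slide step t,
    report the number of distinct users seen in (t - window, t]. The deque
    must hold the entire window; this mirrors why a 24-hour Spark window
    keeps 24 hours of data cached."""
    buf = deque()
    results = []
    if not events:
        return results
    i, t, end = 0, events[0][0], events[-1][0]
    while t <= end:
        while i < len(events) and events[i][0] <= t:
            buf.append(events[i])  # admit events up to the window end
            i += 1
        while buf and buf[0][0] < t - window:
            buf.popleft()          # evict events older than the window
        results.append((t, len({u for _, u in buf})))
        t += slide
    return results

events = [(0, "a"), (1, "b"), (5, "a"), (10, "c")]
print(sliding_unique(events, window=4, slide=5))  # [(0, 1), (5, 2), (10, 1)]
```

With a 24-hour window the buffer never shrinks below a day's worth of events, which is the memory behaviour being asked about below.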
Thank you Jerry,
Does the window operation create new RDDs for each slide duration? I am
asking this because I see a constant increase in memory even when no logs
are received.
If not checkpointing, is there any alternative that you would suggest?
On Tue, Jan 20
Maybe you are using the wrong approach. Try something like HyperLogLog or
bitmap structures, as you can find them, for instance, in Redis. They are
much smaller.
On 22 Jan 2015 at 17:19, "Balakrishnan Narendran" wrote:
> Thank you Jerry,
> Does the window operation create new RDDs for each slide
On Tue, Jan 20, 2015 at 7:08 PM, Shao,
Hi,
It seems you have a very large window (24 hours), so the increase in memory
is expected, because the window operation caches the RDDs within the window
in memory. So, for your requirement, memory should be large enough to hold
the data of 24 hours.
I don't think checkpoint in Spark
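Jerry's point that memory must hold the full window can be made concrete with a back-of-envelope calculation. The ingest rate and record size below are invented for illustration, not numbers from the thread:

```python
# Rough sizing for a 24-hour window; both inputs are assumptions.
events_per_sec = 500            # assumed ingest rate
bytes_per_record = 200          # assumed average record size
window_seconds = 24 * 60 * 60   # the 24-hour window from the thread

records = events_per_sec * window_seconds
gib = records * bytes_per_record / 2**30
print(f"{records:,} records, ~{gib:.1f} GiB held by the window")
# 43,200,000 records, ~8.0 GiB held by the window
```

Even at this modest assumed rate the window pins gigabytes in memory, which is why the periodic-batch or sketch-based approaches suggested earlier in the thread are attractive alternatives.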