Hi,

For QueueRDD, have a look here.
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/QueueStream.scala
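The core of that example is a mutable queue of RDDs handed to queueStream. A minimal sketch of the same pattern (the batch-splitting helper is plain Scala; the Spark wiring is shown as comments and assumes a configured StreamingContext — names like `makeBatches` are illustrative, not from the linked file):

```scala
import scala.collection.mutable

// Pre-split the input into per-batch chunks; in a real job each chunk
// becomes an RDD pushed into the queue that backs ssc.queueStream.
def makeBatches(data: Seq[Int], batchSize: Int): mutable.Queue[Seq[Int]] = {
  val queue = mutable.Queue[Seq[Int]]()
  data.grouped(batchSize).foreach(chunk => queue.enqueue(chunk))
  queue
}

// Spark-specific wiring (sketch only, needs a SparkConf and cluster):
//   val ssc = new StreamingContext(conf, Seconds(1))
//   val rddQueue = mutable.Queue[RDD[Int]]()
//   val stream = ssc.queueStream(rddQueue)
//   stream.map(x => (x % 10, 1)).reduceByKey(_ + _).print()
//   ssc.start()
//   makeBatches((1 to 300).toSeq, 100).foreach { b =>
//     rddQueue.synchronized { rddQueue += ssc.sparkContext.makeRDD(b) }
//   }
```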

Regards,
Laeeq,
PhD candidate,
KTH, Stockholm.

 


On Sunday, July 6, 2014 10:20 AM, alessandro finamore 
<alessandro.finam...@polito.it> wrote:
 


On 5 July 2014 23:08, Mayur Rustagi [via Apache Spark User List] 
<[hidden email]> wrote: 
> The key idea is to simulate your app time as you enter data. So you can connect
> Spark Streaming to a queue and insert data into it spaced by time. Easier said
> than done :).

I see.
I'll also try to implement this solution so that I can compare it with
my current Spark implementation.
I'm interested in seeing if this is faster... as I assume it should be :)
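A hedged sketch of that "spaced by time" idea: a feeder pushes one pre-built batch per interval, so the streaming batch clock stands in for the original capture time. The pacing helper below is plain Scala and the name `feedPaced` is illustrative; in a real job each batch would be wrapped with sc.makeRDD before being enqueued into the queue passed to ssc.queueStream:

```scala
import scala.collection.mutable

// Enqueue one batch per tick, simulating the data's original arrival time.
def feedPaced[T](batches: Seq[Seq[T]],
                 queue: mutable.Queue[Seq[T]],
                 intervalMs: Long): Unit = {
  batches.foreach { batch =>
    // In Spark this would be: rddQueue += ssc.sparkContext.makeRDD(batch)
    queue.synchronized { queue.enqueue(batch) }
    Thread.sleep(intervalMs)
  }
}
```

With the streaming batch interval matched to `intervalMs`, each enqueued batch lands in its own micro-batch, which is what makes the windowing behave as if the data were arriving live.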

> What are the parallelism issues you are hitting with your
> static approach?

In my current Spark implementation, whenever I need to get the
aggregated stats over the window, I re-map all the current bins
to the same key so that they can be reduced together.
This means that the data need to be shipped to a single reducer.
As a result, adding nodes/cores to the application does not really
affect the total time :(
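For concreteness, a plain-Scala stand-in for that re-keying step (the RDD version would be something like `bins.map { case (_, v) => ("ALL", v) }.reduceByKey(_ + _)`; the names and sample values here are illustrative):

```scala
// Per-bin partial aggregates, e.g. (bin id -> count).
val bins = Seq(("bin-1", 10L), ("bin-2", 20L), ("bin-3", 12L))

// Re-key everything to one constant key, as described above. In the RDD
// version this forces every record onto the single reducer partition that
// owns that key, so extra nodes/cores cannot speed up this step.
val rekeyed = bins.map { case (_, count) => ("ALL", count) }

val total = rekeyed
  .groupBy(_._1)
  .map { case (key, vs) => (key, vs.map(_._2).sum) }
```

The single shared key is exactly why the shuffle serializes: all partial results converge on one task regardless of cluster size.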


> 
> 
> On Friday, July 4, 2014, alessandro finamore <[hidden email]> wrote: 
>> 
>> Thanks for the replies 
>> 
>> What is not completely clear to me is how time is managed.
>> I can create a DStream from a file.
>> But if I set the window property, that will be bound to the application
>> time, right?
>>
>> If I got it right, with a receiver I can control the way DStreams are
>> created.
>> But how can I then apply the windowing already shipped with the framework
>> if it is bound to the "application time"?
>> I would like to define a window of N files, but the window() function
>> requires a duration as input...
>> 
>> 
>> 
>> 
>> -- 
>> View this message in context: 
>> http://apache-spark-user-list.1001560.n3.nabble.com/window-analysis-with-Spark-and-Spark-streaming-tp8806p8824.html
>> 
>> Sent from the Apache Spark User List mailing list archive at Nabble.com. 
> 
> 
> 
> -- 
> Sent from Gmail Mobile 
> 
> 


-- 
-------------------------------------------------- 
Alessandro Finamore, PhD 
Politecnico di Torino 
-- 
Office:    +39 0115644127 
Mobile:   +39 3280251485 
SkypeId: alessandro.finamore 
--------------------------------------------------- 
