Re: Difference among batchDuration, windowDuration, slideDuration

2015-03-18 Thread jaredtims
I think hsy541 is confused by the same thing that still confuses me. Namely,
what value is the sentence "Each RDD in a DStream contains data from a
certain interval" referring to? This is from the Discretized Streams section:
http://spark.apache.org/docs/latest/streaming-programming-guide.html#discretized-streams-dstreams
The example makes it seem like the batchDuration is 4 seconds and
this mystery interval is 1 second. Where is this mystery interval
defined? Or am I missing something altogether?

thanks
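As far as the guide's figure goes, the "certain interval" is the batch interval: every batchDuration, an input DStream produces one RDD holding the data received during that interval. The following is a simplified model in plain Scala, not Spark code. The names BatchModel, batchTimeFor and toBatches are hypothetical, and the convention that the RDD at batch time t holds data that arrived in (t - batchDuration, t] is an assumption of the model.

```scala
// Simplified model, NOT Spark's implementation. Assumption: with a batch
// interval of batchMs, the RDD at batch time t holds data that arrived in
// (t - batchMs, t]. Times are in milliseconds and start after 0.
object BatchModel {
  /** Batch time (end of the batch interval) for a record arriving at arrivalMs. */
  def batchTimeFor(arrivalMs: Long, batchMs: Long): Long = {
    require(arrivalMs > 0 && batchMs > 0, "times must be positive")
    // Round up to the next multiple of batchMs; an exact multiple stays put.
    ((arrivalMs + batchMs - 1) / batchMs) * batchMs
  }

  /** Group arrival times by batch time: each group plays the role of one RDD. */
  def toBatches(arrivalsMs: Seq[Long], batchMs: Long): Map[Long, Seq[Long]] =
    arrivalsMs.groupBy(a => batchTimeFor(a, batchMs))
}
```

For example, with a 1000 ms batch, arrivals at 1, 999, 1000, 1001 and 2500 ms land in the batches at 1000, 1000, 1000, 2000 and 3000 ms, so the model produces three "RDDs".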



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Difference-among-batchDuration-windowDuration-slideDuration-tp9966p22119.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Difference among batchDuration, windowDuration, slideDuration

2014-07-17 Thread hsy...@gmail.com
Thanks Tathagata. So can I say that the RDD size (from the stream) is the window
size, and the overlap between 2 adjacent RDDs is the sliding size?

But I still don't understand what the batch size is. Why do we need it, since
data processing is RDD by RDD, right?

And does Spark chop the data into RDDs at the very beginning? Do you allow
event-by-event processing, for example filtering?
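One way to see why the batch size matters: batchDuration decides how often an RDD is cut from the stream, and a window is not a separate chop of the raw data but the union of the consecutive batch RDDs it covers, recomputed every slideDuration. So when slideDuration is smaller than windowDuration, adjacent windows share windowDuration - slideDuration worth of batches. A minimal sketch in plain Scala, not Spark code; WindowModel and its methods are hypothetical, and the stream is assumed to start at time 0:

```scala
// Simplified model, NOT Spark code: which parent batches a windowed DStream
// unions together. Assumptions: the stream starts at time 0, times are in ms,
// and the window ending at t covers parent batches in (t - windowMs, t].
object WindowModel {
  /** Times at which a windowed result is produced: one every slideMs, up to untilMs. */
  def windowEnds(untilMs: Long, slideMs: Long): List[Long] =
    (slideMs to untilMs by slideMs).toList

  /** Batch times of the parent RDDs unioned into the window ending at windowEndMs.
    * batchMs is the parent's sliding interval (the batch interval for an input stream). */
  def batchesInWindow(windowEndMs: Long, windowMs: Long, batchMs: Long): List[Long] = {
    require(windowMs % batchMs == 0, "window must be a multiple of the batch interval")
    // Batches at or before time 0 do not exist yet, so they are dropped.
    ((windowEndMs - windowMs + batchMs) to windowEndMs by batchMs).toList.filter(_ > 0)
  }
}
```

With a 1000 ms batch, a 3000 ms window and a 2000 ms slide, the window ending at 4000 ms covers the batches at 2000, 3000 and 4000 ms, the one ending at 6000 ms covers 4000, 5000 and 6000 ms, and the two share only the batch at 4000 ms (3000 - 2000 = 1000 ms of overlap).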




On Wed, Jul 16, 2014 at 6:47 PM, Tathagata Das tathagata.das1...@gmail.com
wrote:

 I guess this is better explained in the window operations subsection of the
 streaming programming guide:
 http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations

 For completeness' sake, it's worth mentioning the following. Window
 operations can be applied on other windowed DStreams as well. So the
 correct thing to say is that both the window duration and the slide duration
 of a window operation must be multiples of the sliding interval of the parent
 DStream. For a simple, non-windowed DStream, this sliding interval is the same
 as the batch interval.

 // say the batch interval is 2 seconds
 inputstream                                  // moves every batch interval (2 seconds)
 inputstream.window(Seconds(3))               // not allowed, must be a multiple of 2 seconds
 inputstream.window(Seconds(4))               // allowed, moves every 2 seconds (sliding interval is 2 seconds)
 inputstream.window(Seconds(10), Seconds(4))  // allowed, moves every 4 seconds (sliding interval is 4 seconds)

 // not allowed, as the window interval must be a multiple of the parent's sliding interval (4 seconds)
 inputstream.window(Seconds(10), Seconds(4)).window(Seconds(6))
 // allowed
 inputstream.window(Seconds(10), Seconds(4)).window(Seconds(8))

 Hopefully that made sense :)

 TD
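The rule above can be checked mechanically. The following is a hypothetical plain-Scala model, not Spark's own validation code: a stream is reduced to its sliding interval, and a window operation is accepted only when both its window and its slide are multiples of the parent's sliding interval, with the slide defaulting to the parent's, as in a single-argument window(...). WindowRules, StreamShape, input and window here are illustrative names only.

```scala
// Hypothetical model, NOT Spark code, of the multiple-of rule for window operations.
// Times are in milliseconds.
object WindowRules {
  /** A stream is characterised here only by its sliding interval. */
  final case class StreamShape(slideMs: Long)

  /** A non-windowed input stream slides once per batch interval. */
  def input(batchMs: Long): StreamShape = StreamShape(batchMs)

  /** Mirrors stream.window(window[, slide]); the slide defaults to the parent's. */
  def window(parent: StreamShape, windowMs: Long,
             slideMs: Option[Long] = None): Either[String, StreamShape] = {
    val slide = slideMs.getOrElse(parent.slideMs)
    if (windowMs % parent.slideMs != 0)
      Left(s"window $windowMs ms is not a multiple of parent slide ${parent.slideMs} ms")
    else if (slide % parent.slideMs != 0)
      Left(s"slide $slide ms is not a multiple of parent slide ${parent.slideMs} ms")
    else
      Right(StreamShape(slide)) // the new stream slides every `slide` ms
  }
}
```

Replaying the examples with a 2000 ms batch interval gives the same verdicts: a 3000 ms window is rejected, 4000 ms and (10000 ms, 4000 ms) are accepted, and chaining a 6000 ms window onto the (10000, 4000) stream is rejected while 8000 ms is accepted.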




 On Wed, Jul 16, 2014 at 12:41 PM, Walrus theCat walrusthe...@gmail.com
 wrote:

 I did not!


 On Wed, Jul 16, 2014 at 12:31 PM, aaronjosephs aa...@placeiq.com wrote:

 The only other thing to keep in mind is that the window duration and slide
 duration have to be multiples of the batch duration. IDK if you made that
 fully clear.




Difference among batchDuration, windowDuration, slideDuration

2014-07-16 Thread hsy...@gmail.com
When reading the Spark Streaming API, I'm confused by the 3
different durations:

StreamingContext(conf: SparkConf, batchDuration: Duration)

DStream.window(windowDuration: Duration, slideDuration: Duration): DStream[T]

SparkConf: http://spark.apache.org/docs/latest/api/scala/org/apache/spark/SparkConf.html
Duration: http://spark.apache.org/docs/latest/api/scala/org/apache/spark/streaming/Duration.html
DStream: http://spark.apache.org/docs/latest/api/scala/org/apache/spark/streaming/dstream/DStream.html


Can anyone please explain these 3 different durations?


Best,
Siyuan
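Putting the three durations together, here is a toy simulation in plain Scala, not Spark code, that counts records per window, roughly what a windowed count would report. ThreeDurations and windowedCounts are hypothetical names; times are in milliseconds, the stream is assumed to start at 0, and the boundary conventions (a batch at t holds arrivals in (t - batch, t], a window ending at t covers batches in (t - window, t]) are assumptions of the model.

```scala
// Toy simulation, NOT Spark code, tying batchDuration, windowDuration and
// slideDuration together. Assumptions: times in ms, stream starts at 0.
object ThreeDurations {
  /** One (windowEnd, recordCount) pair every slideMs, up to untilMs. */
  def windowedCounts(arrivalsMs: Seq[Long], batchMs: Long, windowMs: Long,
                     slideMs: Long, untilMs: Long): List[(Long, Int)] = {
    require(windowMs % batchMs == 0 && slideMs % batchMs == 0,
      "window and slide must be multiples of the batch duration")
    // 1. batchDuration: cut the stream into one bucket ("RDD") per batch time.
    val batchSizes: Map[Long, Int] =
      arrivalsMs
        .groupBy(a => ((a + batchMs - 1) / batchMs) * batchMs)
        .map { case (batchTime, records) => batchTime -> records.size }
    // 2. slideDuration: how often a windowed result is produced.
    (slideMs to untilMs by slideMs).toList.map { end =>
      // 3. windowDuration: how many consecutive batches each result spans.
      val covered = (end - windowMs + batchMs) to end by batchMs
      end -> covered.map(t => batchSizes.getOrElse(t, 0)).sum
    }
  }
}
```

With arrivals at 500, 1500, 1700, 2500, 3500 and 3900 ms, a 1000 ms batch, a 3000 ms window and a 2000 ms slide, this yields List((2000,3), (4000,5)): the batch at 2000 ms (two records) is counted in both windows because they overlap by 1000 ms.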


Re: Difference among batchDuration, windowDuration, slideDuration

2014-07-16 Thread aaronjosephs
The only other thing to keep in mind is that the window duration and slide
duration have to be multiples of the batch duration. IDK if you made that fully
clear.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Difference-among-batchDuration-windowDuration-slideDuration-tp9966p9973.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.