I find the time window operators a bit confusing. Can someone clarify whether the following are equivalent?

Case 1
======
dstreamB = dstreamA.reduceByKeyAndWindow(func, 10, 2)

Is it the same as?

dsTmp = dstreamA.window(10, 2)
dstreamB = dsTmp.reduceByKey(func)

Case 2
======
ssc = new StreamingContext(..., 10)
dstreamA = ssc.socketStream(...)

Is it the same as?

ssc = new StreamingContext(..., 1)
dsTmp = ssc.socketStream(...)
dstreamA = dsTmp.window(10, 10)

Also, what is the difference between map() and mapPartitions()? Do both process the data within each partition without any shuffling?

Rgds,
Ricky
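
P.S. To pin down what "equivalent" means in Case 1, here is a minimal plain-Python model of the two pipelines. It is not Spark code. It assumes a stream can be pictured as a list of one-time-unit batches of (key, value) pairs, that a window of length 10 sliding by 2 emits the union of the last 10 batches every 2 batches, and that func is associative and commutative. The helper names (reduce_by_key, window, and so on) are made up for this sketch, and the "reduce each batch first" step is only one plausible way the combined operator could be evaluated.

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key(pairs, func):
    """Combine all values that share a key, like reduceByKey on one batch."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


def window(batches, length, slide):
    """Every `slide` batches, emit the union of the last `length` batches."""
    windows = []
    for end in range(slide, len(batches) + 1, slide):
        start = max(0, end - length)
        windows.append([pair for batch in batches[start:end] for pair in batch])
    return windows


def window_then_reduce(batches, func, length, slide):
    """Models dstreamA.window(10, 2) followed by reduceByKey(func)."""
    return [reduce_by_key(w, func) for w in window(batches, length, slide)]


def reduce_by_key_and_window(batches, func, length, slide):
    """Models dstreamA.reduceByKeyAndWindow(func, 10, 2): reduce each batch
    first, then reduce the partial results across each window."""
    partial = [list(reduce_by_key(batch, func).items()) for batch in batches]
    return [reduce_by_key(w, func) for w in window(partial, length, slide)]


def add(x, y):
    return x + y


# Twelve one-unit batches; batch i holds ("a", i), ("a", 1) and ("b", 1).
batches = [[("a", i), ("a", 1), ("b", 1)] for i in range(12)]

direct = reduce_by_key_and_window(batches, add, 10, 2)
two_step = window_then_reduce(batches, add, 10, 2)
print(direct == two_step)  # do the two pipelines agree on every window?
print(direct[-1])          # per-key totals for the most recent window
```

In this model the two pipelines give the same result for every window, but only because add is associative and commutative. Whether real Spark Streaming behaves the same way (and at what cost) is exactly what I would like confirmed.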

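
For the map() versus mapPartitions() question, a second plain-Python sketch of how I currently picture the two. Again, this is not Spark code: a dataset is modelled as a list of partitions, and map_elements, map_partitions and double_all are names invented for illustration. The assumption is that map() calls the function once per element while mapPartitions() calls it once per partition with an iterator over that partition, and that neither moves data between partitions.

```python
def map_elements(partitions, f):
    """Model of map(): f is called once per element; layout is unchanged."""
    return [[f(x) for x in part] for part in partitions]


def map_partitions(partitions, f):
    """Model of mapPartitions(): f is called once per partition and
    receives an iterator over that partition's elements."""
    return [list(f(iter(part))) for part in partitions]


setup_calls = []


def double_all(elements):
    # Anything done here (e.g. opening a connection) happens once per
    # partition rather than once per element.
    setup_calls.append("setup")
    return (2 * x for x in elements)


partitions = [[1, 2, 3], [4, 5], [6]]

by_element = map_elements(partitions, lambda x: 2 * x)
by_partition = map_partitions(partitions, double_all)

print(by_element == by_partition)  # same values, same partition layout?
print(len(setup_calls))            # how many times double_all was invoked
```

If this picture is right, the practical difference is mainly where per-call setup cost lands, not whether a shuffle happens.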