Ok. Spark 1.4.1 on yarn Here is my application I have 4 different Kafka topics(different object streams)
type Edge = (String,String) val a = KafkaUtils.createDirectStream[...](sc,"A",params).filter( nonEmpty ).map( toEdge ) val b = KafkaUtils.createDirectStream[...](sc,"B",params).filter( nonEmpty ).map( toEdge ) val c = KafkaUtils.createDirectStream[...](sc,"C",params).filter( nonEmpty ).map( toEdge ) val u = a union b union c val source = u.window(Seconds(600), Seconds(10)) val z = KafkaUtils.createDirectStream[...](sc,"Z",params).filter( nonEmpty ).map( toEdge ) val joinResult = source.rightOuterJoin( z ) joinResult.foreachRDD { rdd=> rdd.foreachPartition { partition => .... // save to result topic in kafka } } The 'window' function in the code above is constantly growing, no matter how many events appeared in corresponding kafka topics but if I change one line from val source = u.window(Seconds(600), Seconds(10)) to val partitioner = ssc.sparkContext.broadcast(new HashPartitioner(8)) val source = u.transform(_.partitionBy(partitioner.value) ).window(Seconds(600), Seconds(10)) Everything works perfect. Perhaps the problem was in WindowedDStream I forced to use PartitionerAwareUnionRDD( partitionBy the same partitioner ) instead of UnionRDD. Nonetheless I did not see any hints about such a bahaviour in doc. Is it a bug or absolutely normal behaviour? 08.09.2015, 17:03, "Cody Koeninger" <c...@koeninger.org>: > Can you provide more info (what version of spark, code example)? > > On Tue, Sep 8, 2015 at 8:18 AM, Alexey Ponkin <alexey.pon...@ya.ru> wrote: >> Hi, >> >> I have an application with 2 streams, which are joined together. >> Stream1 - is simple DStream(relativly small size batch chunks) >> Stream2 - is a windowed DStream(with duration for example 60 seconds) >> >> Stream1 and Stream2 are Kafka direct stream. >> The problem is that according to logs window operation is constantly >> increasing(<a >> href="http://en.zimagez.com/zimage/screenshot2015-09-0816-06-55.php">screen</a>). >> And also I see gap in pocessing window(<a >> href="http://en.zimagez.com/zimage/screenshot2015-09-0816-10-22.php">screen</a>) >> in logs there are no events in that period. >> So what is happen in that gap and why window is constantly insreasing? >> >> Thank you in advance >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >> For additional commands, e-mail: user-h...@spark.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org