Hi,

I'm working with:
- Kafka 0.8.2
- Spark Streaming 2.0 (direct input stream)
- Cassandra 3.0
My batch interval is 1 s. When I use only map, filter, or even saveToCassandra, the processing time is around 50 ms on empty batches, which is fine.

As soon as I add reduceByKey, the processing time climbs quickly to between 3 and 4 s for 3 calls to reduceByKey, still on empty batches. That is not good.

I've found a workaround: calling foreachRDD on the DStream and checking whether the RDD is empty before executing reduceByKey. I find this quite ugly, though.

Do I need to check whether the RDD is empty before every shuffle operation?

Thanks for any insight.
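
For reference, the workaround mentioned above looks roughly like the sketch below. It is a simplified illustration, not my actual job: the object name `EmptyBatchGuard`, the helper `reduceNonEmpty`, the `(String, Long)` pair type, and the `handle` callback are all hypothetical placeholders. Only `foreachRDD`, `isEmpty`, and `reduceByKey` are the real Spark calls involved.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

object EmptyBatchGuard {
  // Hypothetical helper: runs the shuffle (reduceByKey) only when the
  // batch actually contains data, and hands the result to `handle`.
  def reduceNonEmpty(pairs: DStream[(String, Long)])
                    (handle: RDD[(String, Long)] => Unit): Unit = {
    pairs.foreachRDD { rdd =>
      // Skip the shuffle entirely on empty batches.
      if (!rdd.isEmpty()) {
        val reduced = rdd.reduceByKey(_ + _)
        handle(reduced) // e.g. write the reduced RDD to Cassandra
      }
    }
  }
}
```

The downside is that everything after the check has to be written against RDDs inside foreachRDD instead of as chained DStream operations, which is why it feels ugly to me.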