It seems that the slow "reduce" tasks are caused by slow shuffling. Here is the logs regarding one slow "reduce" task:
14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_88_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_89_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_90_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_91_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_92_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_93_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_94_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_95_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_96_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_97_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_188_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_189_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_190_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_191_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_192_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_193_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_194_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_195_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_196_18 after 5029 ms 14/06/11 23:42:45 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Got remote block shuffle_69_197_18 after 5029 ms 14/06/11 23:42:45 INFO Executor: Serialized size of result for 23643 is 1143 14/06/11 23:42:45 INFO Executor: Sending result for 23643 directly to driver 14/06/11 23:42:45 INFO Executor: Finished task ID 23643 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-achieve-reasonable-performance-on-Spark-Streaming-tp7262p7454.html Sent from the Apache Spark User List mailing list archive at Nabble.com.