Hi xiefeng,

Even if your RDDs are tiny and reduced to a single partition, there is always orchestration overhead: serializing and sending tasks to the executor(s), scheduling, collecting results back to the driver, and so on. These steps are not free.
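One way to see this fixed cost is to time a trivial action on a cached, single-partition RDD. A minimal sketch (assuming an existing SparkContext `sc`, e.g. in spark-shell):

```scala
// Sketch: measure per-job orchestration overhead on a tiny RDD.
// Assumes a running SparkContext `sc` (e.g. from spark-shell).
val rdd = sc.parallelize(Seq(1), numSlices = 1).cache()
rdd.count() // warm-up: materializes the cache, loads classes

val iterations = 100
val start = System.nanoTime()
for (_ <- 1 to iterations) {
  rdd.count() // no real computation; time goes to scheduling and result collection
}
val avgMs = (System.nanoTime() - start) / 1e6 / iterations
println(s"average per-job overhead: $avgMs ms")
```

Even with the data cached in one partition, each count() still pays for task serialization, scheduling, and fetching the result back to the driver; that is the overhead dominating your measurements, not the computation itself.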
If you need fast, [near] real-time processing, look towards Spark Streaming.

Regards,

--
Bedrytski Aliaksandr
sp...@bedryt.ski

On Mon, Sep 5, 2016, at 04:36, xiefeng wrote:
> The spark context will be reused, so the spark context initialization
> won't affect the throughput test.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Why-does-spark-take-so-much-time-for-simple-task-without-calculation-tp27628p27657.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> ---------------------------------------------------------------------