Hi Spark experts,

I have been experimenting with Spark for several weeks. After some testing, a reduce operation on a DataFrame takes about 40 seconds on a cluster with 5 datanode executors, and the underlying data is only about 6,000 rows. Is this a normal case? The performance looks surprisingly poor, since in plain Java a loop over 6,000 rows takes just a few seconds at most.
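To make the question concrete, the operation is roughly equivalent to the simplified sketch below. This is a stand-in rather than my actual job: the column name and the sum are placeholders, and the input is generated with spark.range instead of being read from the cluster, just to isolate the reduce itself.

import org.apache.spark.sql.SparkSession

object ReduceTiming {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("reduce-timing")
      .getOrCreate()
    import spark.implicits._

    // Placeholder input: ~6,000 rows with a single numeric column "value".
    val df = spark.range(0L, 6000L).toDF("value")

    val start = System.nanoTime()
    // The kind of reduce in question: collapse the column to a single value.
    val total = df.select("value").as[Long].reduce(_ + _)
    val elapsedMs = (System.nanoTime() - start) / 1e6

    println(s"sum = $total, reduce took $elapsedMs ms")
    spark.stop()
  }
}

The sketch only times the sum over one in-memory column, so any slowness there would come from the reduce stage rather than from reading the data.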
Is there any documentation I should read to make the job run much faster?

Thanks in advance,
Proust