Hi Spark experts,

I have been experimenting with Spark for several weeks. After some testing, a reduce operation on a DataFrame takes about 40 seconds on a cluster with 5 datanode executors, and the underlying data is only about 6,000 rows. Is this a normal case? The performance looks surprisingly poor, since in plain Java a loop over 6,000 rows takes just a few seconds at most.
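To make the question concrete, the operation is roughly equivalent to the simplified sketch below. This is a stand-in rather than my actual job: the column name and the sum are placeholders, and the input is generated with spark.range instead of being read from the cluster, just to isolate the reduce itself.

import org.apache.spark.sql.SparkSession

object ReduceTiming {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("reduce-timing")
      .getOrCreate()
    import spark.implicits._

    // Placeholder input: ~6,000 rows with a single numeric column "value".
    val df = spark.range(0L, 6000L).toDF("value")

    val start = System.nanoTime()
    // The kind of reduce in question: collapse the column to a single value.
    val total = df.select("value").as[Long].reduce(_ + _)
    val elapsedMs = (System.nanoTime() - start) / 1e6

    println(s"sum = $total, reduce took $elapsedMs ms")
    spark.stop()
  }
}

The sketch only times the sum over one in-memory column, so any slowness there would come from the reduce stage rather than from reading the data.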
Is there any documentation I should read to make the job run much faster?

Thanks in advance,
Proust