Thanks a lot Akhil. After trying some of the suggestions in the tuning 
guide, there seems to be no improvement at all.

Below is the job detail when running locally (8 cores), which took 3 
minutes to complete. We can see that the map operation took most of the 
time; it looks like the mapPartitions step is what takes too long.
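
In case it is relevant, this is roughly how I am checking the 
partitioning (a minimal sketch; the DataFrame name, partition count, and 
helper are illustrative, not the real job):

    import org.apache.spark.sql.DataFrame

    // With only ~6,000 rows, too many partitions means the runtime is
    // mostly per-task scheduling overhead rather than real work.
    def checkPartitions(df: DataFrame): DataFrame = {
      println(s"partitions: ${df.rdd.partitions.length}")
      // Coalesce small data so each task does meaningful work, and
      // cache so the lineage is not recomputed on every action.
      df.coalesce(8).cache()
    }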

Are there any other ideas? Thanks a lot.

Proust




From:   Akhil Das <ak...@sigmoidanalytics.com>
To:     Proust GZ Feng/China/IBM@IBMCN
Cc:     "user@spark.apache.org" <user@spark.apache.org>
Date:   06/15/2015 03:02 PM
Subject:        Re: Spark DataFrame Reduce Job Took 40s for 6000 Rows



Have a look here https://spark.apache.org/docs/latest/tuning.html
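
For example, two of the settings that page covers, as a rough sketch 
(the app name and values are illustrative, not prescriptions):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("DataFrameReduce")  // hypothetical app name
      // Kryo is usually faster and more compact than Java serialization:
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // The default of 200 shuffle partitions is high for ~6,000 rows:
      .set("spark.sql.shuffle.partitions", "8")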

Thanks
Best Regards

On Mon, Jun 15, 2015 at 11:27 AM, Proust GZ Feng <pf...@cn.ibm.com> wrote:
Hi, Spark Experts 

I have been playing with Spark for several weeks. After some testing, a 
reduce operation on a DataFrame takes 40s on a cluster with 5 datanode 
executors, and the underlying data is only about 6,000 rows. Is this a 
normal case? Such performance looks very poor, because in Java a loop 
over 6,000 rows takes just a few seconds.
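
To make it concrete, the job is roughly shaped like this (a simplified 
sketch; the real input path, schema, and aggregation differ):

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.sum

    // sc is the SparkContext from the shell or application
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read.json("rows.json")  // hypothetical source, ~6,000 rows
    df.cache()                                   // keep the rows in memory across actions
    val total = df.agg(sum("value")).first()     // the slow "reduce"-style aggregation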

Is there any document I should read to make the job run much faster?




Thanks in advance 
Proust 
