Hi Proust,

Is it possible to see the query you are running? And can you run EXPLAIN
EXTENDED to show the physical plan for the query? To generate the plan you
can do something like this from $SPARK_HOME/bin/beeline:

0: jdbc:hive2://localhost:10001> explain extended select * from YourTableHere;
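
Alternatively, if you build the query with the DataFrame API, you can print
the same extended plan from spark-shell. A minimal sketch, assuming the
table name from above (sqlContext is the SQLContext that spark-shell
provides):

    // Print the parsed, analyzed, optimized, and physical plans
    val df = sqlContext.sql("SELECT * FROM YourTableHere")
    df.explain(true)  // extended = true, so the physical plan is included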

-Todd

On Mon, Jun 15, 2015 at 10:57 AM, Proust GZ Feng <pf...@cn.ibm.com> wrote:

> Thanks a lot Akhil. After trying some suggestions in the tuning guide,
> there seems to be no improvement at all.
>
> And below is the job detail when running locally (8 cores), which took
> 3min to complete the job. We can see it is the map operation that took
> most of the time; it looks like the mapPartitions step took too long.
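>
> For reference, a rough way to log per-partition time (a sketch only; df
> stands in for the DataFrame in this job):
>
>     // Force each partition and time how long it takes to materialize
>     // (this includes the upstream work computed for that partition)
>     df.rdd.mapPartitionsWithIndex { (idx, iter) =>
>       val start = System.nanoTime()
>       val rows = iter.toArray
>       println(s"partition $idx: ${rows.length} rows in ${(System.nanoTime() - start) / 1e6} ms")
>       rows.iterator
>     }.count()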
>
> Are there any additional ideas? Thanks a lot.
>
> Proust
>
>
>
>
> From:        Akhil Das <ak...@sigmoidanalytics.com>
> To:        Proust GZ Feng/China/IBM@IBMCN
> Cc:        "user@spark.apache.org" <user@spark.apache.org>
> Date:        06/15/2015 03:02 PM
> Subject:        Re: Spark DataFrame Reduce Job Took 40s for 6000 Rows
> ------------------------------
>
>
>
> Have a look here: https://spark.apache.org/docs/latest/tuning.html
>
> Thanks
> Best Regards
>
> On Mon, Jun 15, 2015 at 11:27 AM, Proust GZ Feng <pf...@cn.ibm.com>
> wrote:
> Hi, Spark Experts
>
> I have been playing with Spark for several weeks. After some testing, a
> reduce operation on a DataFrame costs 40s on a cluster with 5 datanode
> executors, and the underlying data is only about 6,000 rows. Is this a
> normal case? The performance looks too bad, because in Java a plain loop
> over 6,000 rows takes just several seconds.
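>
> For comparison, a minimal sketch that times the reduce itself (assuming a
> DataFrame df whose first column is a long; the column layout here is
> hypothetical):
>
>     val t0 = System.nanoTime()
>     // Sum the first column across the ~6,000 rows
>     val sum = df.rdd.map(_.getLong(0)).reduce(_ + _)
>     println(s"reduce took ${(System.nanoTime() - t0) / 1e9} s, sum = $sum")
>
> If most of the 40s is scheduling and serialization overhead rather than
> per-row work, this isolated reduce should make that visible.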
>
> I'm wondering what documents I should read to make the job run much
> faster.
>
>
>
>
> Thanks in advance
> Proust
>
>
