Hi,

I found that sorting in Spark 1.5 is much slower than in Spark 1.4. Below is my code snippet:

val sqlRDD = sql("select date, u, v, z from fino3_hr3 where zone == 2 and z >= 2 and z <= order by date, z")
println("sqlRDD " + sqlRDD.count())

The fino3_hr3 table (in the SQL command) is a Hive table in ORC format, partitioned by zone and z.

Spark 1.5 takes 4.5 minutes to execute this query, while Spark 1.4 takes 1.5 minutes. I noticed that, unlike in Spark 1.4, when Spark 1.5 sorts, the data is shuffled into only a few tasks rather than being divided among all tasks. Do I need to set any configuration explicitly? Any suggestions?

BR,
Patcharee
