Re: Spark-SQL 1.2.0 "sort by" results are not consistent with Hive

2015-02-26 Thread Cheng Lian
Could you check the Spark web UI for the number of tasks issued when the query is executed? I dug up |mapred.map.tasks| because I saw 2 tasks were issued. On 2/26/15 3:01 AM, Kannan Rajah wrote: Cheng, We tried this setting and it still did not help. This was on Spark 1.2.0. -- Kannan

RE: Spark-SQL 1.2.0 "sort by" results are not consistent with Hive

2015-02-25 Thread Cheng, Hao
eng Lian Cc: user@spark.apache.org Subject: RE: Spark-SQL 1.2.0 "sort by" results are not consistent with Hive How many reducers did you set for Hive? With a small data set, Hive will run in local mode, which always sets the reducer count to 1. From: Kannan Rajah [mailto:kra...@maprt
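The reducer count matters here because Hive's SORT BY only sorts rows within each reducer, not across them. A minimal sketch (plain Python, not Spark code, with made-up sample rows) of why one task looks globally sorted while two tasks do not:

```python
# SORT BY sorts rows within each reducer/partition independently.
# The final output is the concatenation of the sorted partitions,
# which is only globally ordered when there is a single partition.
rows = [5, 1, 4, 2, 6, 3]

# One task (Hive local mode, reducer count 1): a single partition,
# so SORT BY happens to behave like a global ORDER BY.
one_task = sorted(rows)

# Two tasks (as seen in the Spark web UI): rows are split across two
# partitions and each partition is sorted on its own.
part_a, part_b = rows[:3], rows[3:]
two_tasks = sorted(part_a) + sorted(part_b)

print(one_task)   # [1, 2, 3, 4, 5, 6]  -- globally sorted
print(two_tasks)  # [1, 4, 5, 2, 3, 6]  -- sorted per partition only
```

Both outputs are "correct" under SORT BY semantics; the inconsistency the thread describes comes from comparing runs with different task counts.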

RE: Spark-SQL 1.2.0 "sort by" results are not consistent with Hive

2015-02-25 Thread Cheng, Hao
"sort by" results are not consistent with Hive Cheng, We tried this setting and it still did not help. This was on Spark 1.2.0. -- Kannan On Mon, Feb 23, 2015 at 6:38 PM, Cheng Lian <lian.cs@gmail.com> wrote: (Move to user list.) Hi Kannan, You need to set mapred.m

Re: Spark-SQL 1.2.0 "sort by" results are not consistent with Hive

2015-02-25 Thread Kannan Rajah
Cheng, We tried this setting and it still did not help. This was on Spark 1.2.0. -- Kannan On Mon, Feb 23, 2015 at 6:38 PM, Cheng Lian wrote: > (Move to user list.) > > Hi Kannan, > > You need to set mapred.map.tasks to 1 in hive-site.xml. The reason is this > line of code >

Re: Spark-SQL 1.2.0 "sort by" results are not consistent with Hive

2015-02-23 Thread Cheng Lian
(Move to user list.) Hi Kannan, You need to set |mapred.map.tasks| to 1 in hive-site.xml. The reason is this line of code, which overrides |spark.default.parallelism|. Also,
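Following the advice above, the setting would go into hive-site.xml as a standard Hadoop-style property entry (a sketch of the fragment only, assuming the rest of the file is already in place):

```xml
<!-- Force a single map task so "sort by" output matches Hive's
     local-mode (single-reducer) ordering. -->
<property>
  <name>mapred.map.tasks</name>
  <value>1</value>
</property>
```

Note this trades away parallelism for reproducible ordering; a query that needs a true global order should use ORDER BY instead of SORT BY.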