number of partitions for hive schemaRDD

2015-02-26 Thread masaki rikitoku
Hi all,

Now I'm trying SparkSQL with HiveContext. When I execute an HQL query like the following:

---
val ctx = new org.apache.spark.sql.hive.HiveContext(sc)
import ctx._
val queries = ctx.hql("select keyword from queries where dt = '2015-02-01' limit 1000")
---

It seems that the number of

Re: number of partitions for hive schemaRDD

2015-02-26 Thread Cheng Lian
Hi Masaki,

I guess what you saw is the partition number of the last stage, which must be 1 to perform the global phase of LIMIT. To tune the partition number of normal shuffles like joins, you may resort to spark.sql.shuffle.partitions.

Cheng
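For reference, a minimal sketch of the tuning Cheng describes, assuming a Spark 1.x shell where a SparkContext `sc` is already defined (as in the original question); the table name and date are taken from the question and the code requires a running Hive-enabled Spark deployment to execute:

```scala
// Sketch, assuming the Spark 1.x-era HiveContext API shown in the question.
val ctx = new org.apache.spark.sql.hive.HiveContext(sc)
import ctx._

// spark.sql.shuffle.partitions controls the partition count of normal
// shuffles (joins, aggregations). It does not affect the final stage of
// a query with LIMIT, whose global phase always runs in one partition.
ctx.setConf("spark.sql.shuffle.partitions", "50")

val queries = ctx.hql("select keyword from queries where dt = '2015-02-01' limit 1000")

// In Spark 1.x a SchemaRDD is itself an RDD, so its partitioning can be
// inspected directly; for the LIMIT query above the last stage has 1 partition.
println(queries.partitions.length)
```

Note that `hql` was deprecated in favor of `sql` in later 1.x releases, so newer code would call `ctx.sql(...)` instead.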