Re: hiveContext sql number of tasks

2015-10-07 Thread Deng Ching-Mallete
Hi,

You can do coalesce(N), where N is the number of partitions you want it
reduced to, after loading the data into an RDD.
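As a rough sketch (assuming a Spark 1.x application with a HiveContext already in scope; the names sqlText, N = 300, and the output path are purely illustrative):

```scala
// Sketch only: assumes an existing HiveContext named hiveContext and a
// query string sqlText; the value 300 is an arbitrary example for N.
val df = hiveContext.sql(sqlText)   // scan may spawn ~10K tasks, one per ORC file

// coalesce(N) merges the existing partitions into N larger ones without a
// full shuffle, so stages after this point run with N tasks instead of 10K.
val coalesced = df.coalesce(300)

coalesced.write.parquet("/tmp/output")  // illustrative sink; now ~300 write tasks
```

Note that coalesce avoids a shuffle but can skew partition sizes; if you need evenly sized partitions at the cost of a full shuffle, repartition(N) is the alternative.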

HTH,
Deng

On Wed, Oct 7, 2015 at 6:34 PM, patcharee  wrote:

> Hi,
>
> I run a SQL query over about 10,000 partitioned ORC files. Because of the
> partition schema, the files cannot be merged any further (to reduce the total
> number).
>
> From this command, hiveContext.sql(sqlText), 10K tasks were created, one to
> handle each file. Is it possible to use fewer tasks? How can I force Spark
> SQL to use fewer tasks?
>
> BR,
> Patcharee
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


hiveContext sql number of tasks

2015-10-07 Thread patcharee

Hi,

I run a SQL query over about 10,000 partitioned ORC files. Because of the
partition schema, the files cannot be merged any further (to reduce the
total number).


From this command, hiveContext.sql(sqlText), 10K tasks were created, one to
handle each file. Is it possible to use fewer tasks? How can I force Spark
SQL to use fewer tasks?


BR,
Patcharee
