Hi,
You can call coalesce(N), where N is the number of partitions you want
it reduced to, on the DataFrame/RDD after loading the data.
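
As a rough sketch (assuming Spark 1.4+, where DataFrame has coalesce; the variable names and the value 100 are just illustrative):

```scala
// Load as before -- the read itself still scans ~10K files/tasks
val df = hiveContext.sql(sqlText)

// Narrow the result down to 100 partitions for downstream stages.
// coalesce avoids a full shuffle (unlike repartition), so it is cheap,
// but it only helps stages *after* this point, not the initial scan.
val coalesced = df.coalesce(100)

coalesced.write.orc("/path/to/output")  // hypothetical sink
```

Note that coalesce reduces the task count for subsequent stages; the initial file scan is still driven by the input splits.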
HTH,
Deng
On Wed, Oct 7, 2015 at 6:34 PM, patcharee wrote:
> Hi,
>
> I run a SQL query over about 10,000 partitioned ORC files. Because of the
> partitioning scheme, the files can no longer be merged to reduce their
> total number.
>
> From the command hiveContext.sql(sqlText), 10K tasks were created, one to
> handle each file. Is it possible to use fewer tasks? How can I force Spark
> SQL to use fewer tasks?
>
> BR,
> Patcharee
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>