Re: sqlCtx.read.parquet yields lots of small tasks

2016-05-10 Thread Johnny W.
Thanks, Ashish. I've created a JIRA: https://issues.apache.org/jira/browse/SPARK-15247

Best,
J.

On Sun, May 8, 2016 at 7:07 PM, Ashish Dubey wrote:
> I see the behavior - so it always goes with min total tasks possible on
> your settings ( num-executors * num-cores ) - however if you use a huge …

Re: sqlCtx.read.parquet yields lots of small tasks

2016-05-08 Thread Ashish Dubey
I see the behavior: it always starts with the minimum total tasks possible given your settings (num-executors * num-cores). However, if you read a huge amount of data, you will see more tasks, which suggests there is some kind of lower bound on the number of tasks. It may require some digging. Other formats did not …
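The lower bound described above can be sketched as a simple calculation. This is only an illustration of the observed behavior, not Spark's actual split logic; the function name, the flat per-byte split rule, and all the numbers are hypothetical:

```python
import math

def estimated_task_count(num_executors, cores_per_executor,
                         data_size_bytes, split_size_bytes):
    """Illustrative sketch: the scheduler appears to launch at least
    num_executors * cores_per_executor tasks, and more only when the
    input is large enough to need more splits than that floor."""
    floor_tasks = num_executors * cores_per_executor
    size_tasks = math.ceil(data_size_bytes / split_size_bytes)
    return max(floor_tasks, size_tasks)

# A tiny (< 1 MB) file still yields the floor of tasks:
print(estimated_task_count(10, 4, 1_000_000, 128 * 1024 * 1024))  # 40

# A huge input needs more splits than the floor, so the split count wins:
print(estimated_task_count(10, 4, 100 * 1024**3, 128 * 1024**2))  # 800
```

With a small file, the size-based split count is 1, so the executor/core floor of 40 dominates, matching the "lots of small tasks" symptom reported in the thread.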

Re: sqlCtx.read.parquet yields lots of small tasks

2016-05-08 Thread Johnny W.
The file size is very small (< 1 MB). The stage launches every time I call:

  sqlContext.read.parquet(path_to_file)

These are the Parquet-specific configurations I set:

  spark.sql.parquet.filterPushdown: true
  spark.sql.parquet.mergeSchema: true

Thanks,
J.

On Sat, May 7, 2016 at 4:20 PM, Ashish …
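For reference, the two options listed above would look like this in the standard `spark-defaults.conf` properties form (same keys and values as in the message; just a sketch of how they are typically set cluster-wide rather than per-session):

```
# spark-defaults.conf -- Parquet settings from the message above
spark.sql.parquet.filterPushdown   true
spark.sql.parquet.mergeSchema      true
```

Note that with `mergeSchema` enabled, reading a Parquet source involves inspecting file footers to reconcile schemas, which is one plausible reason a stage launches on every `read.parquet` call; the thread does not confirm this is the cause here.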

Re: sqlCtx.read.parquet yields lots of small tasks

2016-05-07 Thread Ashish Dubey
How big is your file, and can you also share the code snippet?

On Saturday, May 7, 2016, Johnny W. wrote:
> hi spark-user,
>
> I am using Spark 1.6.0. When I call sqlCtx.read.parquet to create a
> dataframe from a parquet data source with a single parquet file, it yields
> a stage with lots of small tasks …