Thanks, Ashish. I've created a JIRA:
https://issues.apache.org/jira/browse/SPARK-15247
Best,
J.
On Sun, May 8, 2016 at 7:07 PM, Ashish Dubey wrote:
> I see the behavior - so it always goes with the min total tasks possible
> for your settings (num-executors * num-cores) - however, if you use a huge
> amount of data then you will see more tasks - that means it has some kind
> of lower bound on num-tasks. It may require some digging. Other formats did
> not show this behavior?
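A minimal sketch of how to observe this from a pyspark 1.6 shell (the path
is hypothetical; a DataFrame's partition count equals the number of tasks
its scan stage launches):
--
# Read a tiny parquet file and inspect the partition count of the scan.
df = sqlContext.read.parquet("/tmp/tiny.parquet")  # hypothetical small file
# Per the observation above, this stays around num-executors * num-cores
# even though the file is tiny.
print(df.rdd.getNumPartitions())
--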
The file size is very small (< 1M). The stage launches every time I call:
--
sqlContext.read.parquet(path_to_file)
These are the parquet-specific configurations I set:
--
spark.sql.parquet.filterPushdown: true
spark.sql.parquet.mergeSchema: true
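For reference, a sketch of setting these at runtime from a pyspark shell
with the standard SQLContext.setConf API (values mirror the settings above):
--
# Enable parquet filter pushdown and schema merging for this SQLContext.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")
sqlContext.setConf("spark.sql.parquet.mergeSchema", "true")
--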
Thanks,
J.
On Sat, May 7, 2016 at 4:20 PM, Ashish Dubey wrote:
How big is your file, and can you also share the code snippet?
On Saturday, May 7, 2016, Johnny W. wrote:
> hi spark-user,
>
> I am using Spark 1.6.0. When I call sqlCtx.read.parquet to create a
> dataframe from a parquet data source with a single parquet file, it yields
> a stage with lots of small tasks.
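A self-contained sketch of a repro along these lines (pyspark 1.6; the
output path is hypothetical):
--
# Write a single small parquet part-file, read it back, and check how
# many partitions (and therefore tasks) the scan produces.
df = sqlContext.range(100).coalesce(1)   # one partition -> one part-file
df.write.parquet("/tmp/single_file.parquet")
read_back = sqlContext.read.parquet("/tmp/single_file.parquet")
print(read_back.rdd.getNumPartitions())
--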