+1
This is one of the most common problems we encounter in our flow. Mark, I
am happy to help if you would like to share some of the workload.
Best
Yash
On Wednesday 2 March 2016, Mark Hamstra wrote:
I don't entirely agree. You're best off picking the right size :). That's
almost impossible, though, since at the input end of the query processing
you often want a large number of partitions to get sufficient parallelism,
both for performance and to avoid spilling or OOM, while at the output end
[...]
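[A minimal sketch of the trade-off described above, using the Spark 1.x
Scala API; the paths, the "key" column, and the partition counts are
illustrative assumptions, not tuned values:]

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("shuffle-tradeoff"))
    val sqlContext = new SQLContext(sc)

    // Input/middle of the query: a high shuffle partition count gives the
    // heavy stages enough parallelism and lowers per-task memory pressure,
    // reducing the risk of spilling or OOM.
    sqlContext.setConf("spark.sql.shuffle.partitions", "400")

    val aggregated = sqlContext.read.parquet("/path/to/input")
      .groupBy("key")
      .count()

    // Output end: coalesce down so the job doesn't write 400 tiny files.
    aggregated.coalesce(8).write.parquet("/path/to/output")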
Hi,
I was wondering what the rationale is behind defaulting all repartitioning to
spark.sql.shuffle.partitions. I’m seeing a huge overhead when running a job
whose input has only 2 partitions: with the default value of
spark.sql.shuffle.partitions, that becomes 200 after the shuffle. Thanks.
-Teng Fei Liao
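[A hypothetical sketch of the behavior Teng describes, again with Spark 1.x
Scala; the path and the "key" column are made-up assumptions. Every shuffle
introduced by a DataFrame aggregation uses spark.sql.shuffle.partitions
regardless of the input's partition count, so a 2-partition input fans out
to 200 post-shuffle tasks unless the setting is lowered:]

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("small-input-repro"))
    val sqlContext = new SQLContext(sc)

    val df = sqlContext.read.json("/path/to/small-input")
    println(df.rdd.partitions.length)                         // e.g. 2

    // The aggregation shuffles into the configured partition count.
    println(df.groupBy("key").count().rdd.partitions.length)  // 200 (default)

    // Lowering the setting for small jobs avoids scheduling ~200
    // near-empty tasks.
    sqlContext.setConf("spark.sql.shuffle.partitions", "4")
    println(df.groupBy("key").count().rdd.partitions.length)  // 4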