spark.sql.shuffle.partitions=auto
No. Apache Spark itself is just the engine, not a managed platform. This
configuration option is specific to Databricks and its managed Spark
offering: it allows Databricks to automatically determine an optimal number
of shuffle partitions for your workload. Vanilla Spark expects an integer
value for this setting.
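
The closest open-source counterpart to "auto" is Adaptive Query Execution
(AQE) in Spark 3.x, which coalesces small shuffle partitions at runtime.
Here is a minimal PySpark sketch, assuming Spark 3.x; the app name and the
400-partition and 64MB values are just illustrative, not recommendations:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("shuffle-demo")  # hypothetical app name
    # Vanilla Spark requires an integer here; "auto" fails (default is 200).
    .config("spark.sql.shuffle.partitions", "400")
    # AQE re-optimises the plan at runtime from shuffle statistics
    # (enabled by default since Spark 3.2).
    .config("spark.sql.adaptive.enabled", "true")
    # Merge small post-shuffle partitions toward the advisory target size.
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "64MB")
    .getOrCreate()
)

# Toy aggregation: with AQE on, the 400 configured shuffle partitions
# are typically coalesced to far fewer, based on actual shuffled data size.
df = spark.range(1_000_000).withColumn("bucket", F.col("id") % 16)
counts = df.groupBy("bucket").count()
counts.show()
print(counts.rdd.getNumPartitions())  # usually much lower than 400

Databricks' "auto" also picks the initial partition count for you, but as
far as I know AQE covers much of the same ground on open source.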
HTH
Mich Talebzadeh
May I know if spark.sql.shuffle.partitions=auto is only available on
Databricks? What about vanilla Spark? When I set this, it gives an error
saying an integer is required. Is there any open-source library that
automatically finds the best partition count or block size for a DataFrame?