LantaoJin opened a new pull request #25840: URL: https://github.com/apache/spark/pull/25840
### What changes were proposed in this pull request?

Dynamic partitioning in Hive tables has some restrictions, such as a limit on the maximum number of partitions.

Configuration | Default | Note
-- | -- | --
hive.exec.max.dynamic.partitions.pernode | 100 | Maximum number of dynamic partitions allowed to be created in each mapper/reducer node
hive.exec.max.dynamic.partitions | 1000 | Maximum number of dynamic partitions allowed to be created in total
hive.exec.max.created.files | 100000 | Maximum number of HDFS files created by all mappers/reducers in a MapReduce job

Ref [DynamicPartitionInserts](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-DynamicPartitionInserts) and [Tutorial-Dynamic-PartitionInsert](https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Dynamic-PartitionInsert)

These limits are very useful for preventing partitions from being created by mistake, for example when a table is partitioned on a column like ID. They also protect the NameNode from a flood of file-creation RPC calls. Data source tables need similar limits.

### Why are the changes needed?

Add parameters to limit the number of dynamic partitions for data source tables. By default, the maximum number of partitions is Int.MaxValue, which is effectively no limit. When a configured limit is reached, a SparkException is thrown and the job is aborted.

Configuration | Default | Note
-- | -- | --
spark.sql.dynamic.partition.maxPartitionsPerTask | Int.MaxValue | Maximum number of dynamic partitions allowed to be created in each task
spark.sql.dynamic.partition.maxPartitions | Int.MaxValue | Maximum total number of dynamic partitions allowed to be created by one DML statement
spark.sql.dynamic.partition.maxCreatedFiles | Int.MaxValue | Maximum total number of files allowed to be created by the dynamic partition write of one DML statement

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added a unit test.
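
The intended behavior of the three limits can be illustrated with a small, Spark-free model. This is only a sketch under stated assumptions, not the code in this PR: the names `Limits`, `WriteJob`, `open_file`, and `DynamicPartitionLimitExceeded` are hypothetical, the exception stands in for the SparkException mentioned above, and checking all three limits at the moment a file is opened is an assumption about where the checks might happen.

```python
from dataclasses import dataclass, field

INT_MAX = 2**31 - 1  # value of Scala's Int.MaxValue, the proposed default


class DynamicPartitionLimitExceeded(Exception):
    """Hypothetical stand-in for the SparkException that aborts the job."""


@dataclass
class Limits:
    # Field names mirror the three proposed configurations (illustrative only).
    max_partitions_per_task: int = INT_MAX
    max_partitions: int = INT_MAX
    max_created_files: int = INT_MAX


@dataclass
class WriteJob:
    """Toy model of one DML statement writing dynamic partitions."""
    limits: Limits
    task_partitions: dict = field(default_factory=dict)  # task id -> set of partitions
    all_partitions: set = field(default_factory=set)
    created_files: int = 0

    def open_file(self, task_id, partition):
        """Record that a task opens a new file in a partition, checking limits first."""
        seen = self.task_partitions.setdefault(task_id, set())
        per_task = len(seen | {partition})
        total = len(self.all_partitions | {partition})
        files = self.created_files + 1

        if per_task > self.limits.max_partitions_per_task:
            raise DynamicPartitionLimitExceeded(
                f"task {task_id} would create {per_task} partitions")
        if total > self.limits.max_partitions:
            raise DynamicPartitionLimitExceeded(
                f"job would create {total} partitions")
        if files > self.limits.max_created_files:
            raise DynamicPartitionLimitExceeded(
                f"job would create {files} files")

        # Only commit the new state once every limit has been checked.
        seen.add(partition)
        self.all_partitions.add(partition)
        self.created_files = files
```

With `Limits(max_partitions_per_task=2)`, a task that has already written to two distinct partitions fails on its third, which is the accidental "partition by ID" case described above. With the default limits, no realistic job would ever hit a bound.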
