[GitHub] [spark] LantaoJin opened a new pull request #25840: [SPARK-29166][SQL] Add parameters to limit the number of dynamic partitions for data source table

GitBox Sat, 12 Sep 2020 22:29:10 -0700


LantaoJin opened a new pull request #25840:
URL: https://github.com/apache/spark/pull/25840



   ### What changes were proposed in this pull request?
   Dynamic partition inserts into Hive tables are subject to several limits, such as a maximum number of partitions.
   
   Configuration | Default | Note
   -- | -- | --
   hive.exec.max.dynamic.partitions.pernode | 100 | Maximum number of dynamic partitions allowed to be created in each mapper/reducer node
   hive.exec.max.dynamic.partitions | 1000 | Maximum number of dynamic partitions allowed to be created in total
   hive.exec.max.created.files | 100000 | Maximum number of HDFS files created by all mappers/reducers in a MapReduce job
   
   See [DynamicPartitionInserts](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-DynamicPartitionInserts) and [Tutorial-Dynamic-PartitionInsert](https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Dynamic-PartitionInsert).
   
   These limits are useful for catching mistakes such as accidentally partitioning by a high-cardinality column like ID. They also protect the NameNode from a flood of RPC calls when the partitions are created.
   
   Data source tables need similar limits.
   
   ### Why are the changes needed?
   Add parameters to limit the number of dynamic partitions for data source tables.
   By default, each limit is Int.MaxValue, which is effectively no limit.
   When a configured limit is reached, the write throws a SparkException and the job is aborted.
   
   Configuration | Default | Note
   -- | -- | --
   spark.sql.dynamic.partition.maxPartitionsPerTask | Int.MaxValue | Maximum number of dynamic partitions allowed to be created in each task
   spark.sql.dynamic.partition.maxPartitions | Int.MaxValue | Maximum total number of dynamic partitions allowed to be created by one DML statement
   spark.sql.dynamic.partition.maxCreatedFiles | Int.MaxValue | Maximum total number of files allowed to be created by the dynamic partition writes of one DML statement
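
The three limits above can be illustrated with a small, Spark-free model. This is only a sketch of the counting logic, not the code in this patch: the class `PartitionLimitTracker`, the exception `DynamicPartitionLimitExceeded` (a stand-in for the SparkException mentioned above), and the `record_file` method are all hypothetical names. It also assumes that a write fails once a count exceeds its limit; the patch may instead fail when the count equals the limit.

```python
from collections import defaultdict

# Mirrors Scala's Int.MaxValue, the proposed default for all three limits.
INT_MAX = 2**31 - 1


class DynamicPartitionLimitExceeded(Exception):
    """Hypothetical stand-in for the SparkException described in the PR."""


class PartitionLimitTracker:
    """Toy model of the three proposed limits (not Spark's implementation)."""

    def __init__(self, max_per_task=INT_MAX, max_total=INT_MAX, max_files=INT_MAX):
        self.max_per_task = max_per_task  # models maxPartitionsPerTask
        self.max_total = max_total        # models maxPartitions
        self.max_files = max_files        # models maxCreatedFiles
        self._by_task = defaultdict(set)  # task id -> distinct partitions written
        self._all = set()                 # distinct partitions across all tasks
        self._files = 0                   # files created across all tasks

    def record_file(self, task_id, partition):
        """Record one file written by task_id into partition, then check limits."""
        self._by_task[task_id].add(partition)
        self._all.add(partition)
        self._files += 1

        # Assumption: fail when a count exceeds (not merely equals) its limit.
        if len(self._by_task[task_id]) > self.max_per_task:
            raise DynamicPartitionLimitExceeded(
                f"task {task_id} created more than {self.max_per_task} partitions")
        if len(self._all) > self.max_total:
            raise DynamicPartitionLimitExceeded(
                f"job created more than {self.max_total} partitions")
        if self._files > self.max_files:
            raise DynamicPartitionLimitExceeded(
                f"job created more than {self.max_files} files")
```

For example, with `max_per_task=2`, a single task that writes to `id=1`, `id=2`, and then `id=3` fails on the third partition, which is the kind of accidental partition-by-ID write these settings are meant to stop early.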
   
   ### Does this PR introduce any user-facing change?
   No.
   
   
   ### How was this patch tested?
   Added a unit test.


