[GitHub] [spark] caican00 opened a new pull request #35764: [SPARK-38444][SQL]Automatically calculate the upper and lower bounds of partitions when no specified partition related params

GitBox Tue, 08 Mar 2022 00:47:33 -0800


caican00 opened a new pull request #35764:
URL: https://github.com/apache/spark/pull/35764



   ### What changes were proposed in this pull request?
   When reading from an RDBMS such as MySQL, this patch automatically 
calculates the upper and lower bounds from the primary key, which improves 
parallelism and speeds up queries.
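
The description does not say how the bounds are obtained. One plausible approach, sketched here under that assumption, is a single `MIN`/`MAX` query on the primary key. The sketch uses an in-memory SQLite table in place of MySQL; the function names and the `orders` table are hypothetical and are not taken from the patch.

```python
import sqlite3


def primary_key_bounds(conn, table, pk_column):
    """Return (min, max) of the primary key, or None if the table is empty."""
    # Identifiers cannot be bound as SQL parameters, so this sketch assumes
    # that `table` and `pk_column` come from trusted metadata.
    row = conn.execute(
        f"SELECT MIN({pk_column}), MAX({pk_column}) FROM {table}"
    ).fetchone()
    if row is None or row[0] is None:
        return None
    return row[0], row[1]


def demo_bounds():
    """Build a small hypothetical table and look up its key range."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (id, amount) VALUES (?, ?)",
        [(i, i * 1.5) for i in range(1, 1001)],
    )
    bounds = primary_key_bounds(conn, "orders", "id")
    conn.close()
    return bounds
```

An empty table yields `None`, in which case a caller would presumably fall back to a single-partition scan.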
   
   ### Why are the changes needed?
   When reading from an RDBMS such as MySQL, if `partitionColumn`, 
`lowerBound`, `upperBound`, and `numPartitions` are not specified, only a 
single partition scans the database by default. 
   
   This makes loading data from the database slow, and users must configure 
several parameters by hand to improve parallelism.
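
For context, this is roughly what the manual configuration looks like today in PySpark. The four partition options are the ones named above; the helper names, URL, table, and bounds are placeholders, and `spark` is assumed to be an existing `SparkSession`.

```python
def manual_partition_options(url, table, column, lower, upper, num_partitions):
    """Collect the JDBC reader options a user currently has to supply by hand."""
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


def read_partitioned(spark, options):
    """Load a table through the JDBC source using the given options."""
    # `spark` is assumed to be an existing SparkSession; nothing is imported
    # here, so defining this function does not require PySpark.
    return spark.read.format("jdbc").options(**options).load()


# Placeholder values: the user has to know the key range in advance.
example_options = manual_partition_options(
    "jdbc:mysql://db-host:3306/shop", "orders", "id", 1, 1000, 4
)
```

The bounds have to be looked up and kept current by the user, which is the burden the patch aims to remove.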
   
   This patch automatically calculates the upper and lower bounds from the 
primary key, which improves parallelism and speeds up queries.
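
To illustrate why bounds improve parallelism, here is a simplified sketch of how a lower bound, an upper bound, and a partition count can be turned into one `WHERE` predicate per partition. It is modeled loosely on the stride-based splitting of Spark's JDBC source, not copied from it or from this patch, and the function name is hypothetical.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Split [lower, upper) into stride-sized ranges, one predicate each.

    Returns [None] when a single unfiltered scan is the only sensible plan.
    """
    span = upper - lower
    # Never create more partitions than there are integer values in range.
    n = min(num_partitions, span)
    if n <= 1:
        return [None]
    stride = span // n
    predicates = []
    current = lower
    for i in range(n):
        lo = None if i == 0 else current
        current += stride
        hi = None if i == n - 1 else current
        if lo is None:
            # First partition is open below and also picks up NULL keys.
            predicates.append(f"{column} < {hi} OR {column} IS NULL")
        elif hi is None:
            # Last partition is open above, so no row is dropped.
            predicates.append(f"{column} >= {lo}")
        else:
            predicates.append(f"{column} >= {lo} AND {column} < {hi}")
    return predicates
```

Because the first and last ranges are open-ended, the bounds only shape the split; rows outside them still land in a partition.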
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. This PR adds a new option, `defaultNumPartitions`, to JDBCOptions. It 
sets the default parallelism.
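
The description does not spell out how `defaultNumPartitions` interacts with the existing options. The sketch below assumes that explicit partition settings take precedence and that the new option applies only when none of them is given; both the precedence rule and the fallback of 1 are assumptions, and the function name is hypothetical.

```python
PARTITION_KEYS = ("partitionColumn", "lowerBound", "upperBound", "numPartitions")


def effective_num_partitions(options):
    """Pick the partition count for a JDBC read from a dict of options."""
    if any(key in options for key in PARTITION_KEYS):
        # Assumption: user-supplied partition settings always win.
        return int(options.get("numPartitions", 1))
    # Assumption: with no partition settings, the new option decides,
    # and a missing option keeps today's single-partition behavior.
    return int(options.get("defaultNumPartitions", 1))
```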
   
   
   ### How was this patch tested?
   New tests were added.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


