caican00 opened a new pull request #35764:
URL: https://github.com/apache/spark/pull/35764


   ### What changes were proposed in this pull request?
   When reading from an RDBMS such as MySQL, this patch automatically 
calculates the upper and lower bounds from the table's primary key, so that the 
read is split across multiple partitions and the query runs faster.
   
   ### Why are the changes needed?
   When reading from an RDBMS such as MySQL, if `partitionColumn`, 
`lowerBound`, `upperBound`, and `numPartitions` are not specified, Spark scans 
the database with a single partition by default.
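
For reference, the manual configuration described above relies on Spark's existing JDBC read options. The following is a minimal sketch, not code from this PR: the helper name `manual_partition_options`, the JDBC URL, the table `orders`, and the column `id` are hypothetical placeholders.

```python
def manual_partition_options(url, table, column, lower, upper, num_partitions):
    """Build the option map for a partitioned JDBC read.

    The four partitioning keys are Spark's existing JDBC options;
    everything else in this helper is illustrative.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


# Placeholder values; a real job would use its own URL, table, and bounds.
opts = manual_partition_options(
    "jdbc:mysql://localhost:3306/shop", "orders", "id", 1, 1_000_000, 8
)

# With a SparkSession named `spark` (assumed to exist), the read would be:
#   df = spark.read.format("jdbc").options(**opts).load()
```

All four partitioning values have to be chosen by the user, which is the burden this PR aims to remove.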
   
   This makes loading data from the database slow, and users have to configure 
several parameters by hand to improve parallelism.
   
   This patch automatically calculates the upper and lower bounds from the 
primary key, which improves parallelism and speeds up the query.
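
The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: it uses an in-memory SQLite table as a stand-in for MySQL, assumes an integer primary key, and the function names `primary_key_bounds` and `partition_predicates` are hypothetical. The `num_partitions` argument plays the role that `defaultNumPartitions` would presumably play.

```python
import sqlite3


def primary_key_bounds(conn, table, column):
    """Return (min, max) of the key column; (None, None) for an empty table.

    Identifiers are interpolated directly, so this sketch is only safe
    for trusted table and column names.
    """
    query = f"SELECT MIN({column}), MAX({column}) FROM {table}"
    return conn.execute(query).fetchone()


def partition_predicates(column, lower, upper, num_partitions):
    """Split [lower, upper] into WHERE clauses, one per partition."""
    if num_partitions <= 1 or lower is None or lower >= upper:
        return ["1 = 1"]  # fall back to a single unfiltered partition
    # Never create more partitions than there are integer values in range.
    num_partitions = min(num_partitions, upper - lower + 1)
    stride = (upper - lower + 1) // num_partitions
    predicates = []
    for i in range(num_partitions):
        start = lower + i * stride
        end = start + stride
        if i == 0:
            # First partition is open below and also collects NULL keys.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is open above so no row is dropped.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


# Stand-in database: 100 rows with ids 1..100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(1, 101)])

lo, hi = primary_key_bounds(conn, "orders", "id")
predicates = partition_predicates("id", lo, hi, 4)
for p in predicates:
    print(p)
```

Each predicate would back one partition of the scan. The first and last ranges are left open so that rows outside the observed bounds, or inserted after the bounds were computed, are still read.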
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. This PR adds a new option, `defaultNumPartitions`, to `JDBCOptions`. It 
is used to set the default parallelism.
   
   
   ### How was this patch tested?
   New tests were added.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


