Hi: I am using Spark SQL to read Kudu data, and I cannot increase the number of parallel Spark tasks.
My Kudu table uses range partitioning, with 6 hash partitions inside each range partition. For example, my Kudu table A has the following partition schema:

    HASH (scan_record_id, stowage_no) PARTITIONS 6,
    RANGE (creater_time) (
        PARTITION "2019-12-01" <= VALUES < "2020-01-01",
        PARTITION "2020-01-01" <= VALUES < "2020-02-01",
        PARTITION "2020-02-01" <= VALUES < "2020-03-01",
        PARTITION "2020-03-01" <= VALUES < "2020-04-01",
        PARTITION "2020-04-01" <= VALUES < "2020-05-01",
        PARTITION "2020-05-01" <= VALUES < "2020-06-01",
        PARTITION "2020-06-01" <= VALUES < "2020-07-01",
        PARTITION "2020-07-01" <= VALUES < "2020-08-01",
        PARTITION "2020-08-01" <= VALUES < "2020-09-01",
        PARTITION "2020-09-01" <= VALUES < "2020-10-01",
        PARTITION "2020-10-01" <= VALUES < "2020-11-01",
        PARTITION "2020-11-01" <= VALUES < "2020-12-01",
        PARTITION "2020-12-01" <= VALUES < "2021-01-01"
    )

My SQL is:

    select * from A where creater_time > '2020-11-05' and creater_time < '2020-11-27'

When I run the Spark SQL job I allow up to 20 executors, but Spark still uses only 6. The spark-submit command is:

    spark-submit --master yarn --deploy-mode cluster \
      --name test --queue bigdata_pro \
      --conf spark.dynamicAllocation.maxExecutors=20 \
      --executor-cores 1 --executor-memory 8g --driver-memory 8g \
      --class uc.com.Test hdfs://ns1/user/hue/Test.jar

Spark and Kudu versions: Spark 2.4.0 and Kudu 1.10.0.

My understanding of why this happens: the query touches only the "2020-11-01" <= VALUES < "2020-12-01" range partition, which contains 6 hash buckets (6 tablets), so kudu-spark creates one scan task per tablet, i.e. 6 tasks, and dynamic allocation never requests more than 6 executors. Apart from increasing the number of hash partitions under each range partition, is there any way, through parameters, to increase the number of tasks Spark uses to read Kudu data?
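For reference, the job reads the table roughly like this (a sketch, not my exact code; the kudu.master addresses are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("test").getOrCreate()

    // Placeholder master addresses; the table name "A" is as in the schema above.
    val df = spark.read
      .option("kudu.master", "kudu-master-1:7051,kudu-master-2:7051,kudu-master-3:7051")
      .option("kudu.table", "A")
      .format("kudu")
      .load()

    df.createOrReplaceTempView("A")
    spark.sql("select * from A where creater_time > '2020-11-05' and creater_time < '2020-11-27'").show()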
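I did notice that newer kudu-spark releases (1.12.0+, if I read the release notes correctly; not my 1.10.0) add a kudu.splitSizeBytes read option that splits one tablet scan into multiple scan tokens, which I imagine would look something like this after an upgrade:

    // Sketch assuming kudu-spark 1.12.0+; kudu.splitSizeBytes does not
    // appear to exist in 1.10.0, so this is what I would try after upgrading.
    val dfSplit = spark.read
      .option("kudu.master", "kudu-master-1:7051,kudu-master-2:7051,kudu-master-3:7051")
      .option("kudu.table", "A")
      // ask for roughly one scan token (and hence one Spark task) per 128 MiB of data
      .option("kudu.splitSizeBytes", (128L * 1024 * 1024).toString)
      .format("kudu")
      .load()

On 1.10.0 the only workaround I have found is df.repartition(20), but that adds a shuffle and only increases parallelism downstream of the scan, not the scan itself. Is there anything better on my versions? Thanks!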