Deegue opened a new pull request #23506: [SPARK-26577][SQL] Add input optimizer when reading Hive table by SparkSQL
URL: https://github.com/apache/spark/pull/23506

## What changes were proposed in this pull request?

When using Spark SQL (for example, through the Thrift server), setting `spark.sql.hive.fileInputFormat.enabled=true` makes Spark automatically replace `TextInputFormat` with `CombineTextInputFormat` when reading a Hive table whose input format is `TextInputFormat`. The maximum and minimum input split sizes can also be changed, for example:

`spark.sql.hive.fileInputFormat.split.maxsize=268435456` (256 MiB)
`spark.sql.hive.fileInputFormat.split.minsize=134217728` (128 MiB)

Without this change, the same effect requires modifying the Hive configuration and the structure of the tables.

We tested the change with a Hive table in HDFS containing many small files that had not been combined:

Before the improvement: [screenshot]
After the improvement: [screenshot]

## How was this patch tested?

Added a test.
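To illustrate why combining helps with many small files, here is a minimal sketch of a greedy packing model, assuming the split-size values quoted above. The function `combine_files` is hypothetical and deliberately simplified: it closes a split once it reaches the minimum size, never lets a split exceed the maximum size, and cuts oversized files into maximum-size chunks. It is not Hadoop's actual `CombineFileInputFormat` algorithm, which also groups blocks by node and rack locality, and it is not code from this pull request.

```python
from typing import List

MAX_SPLIT = 268_435_456  # 256 MiB, the split.maxsize value quoted in the PR
MIN_SPLIT = 134_217_728  # 128 MiB, the split.minsize value quoted in the PR


def combine_files(file_sizes: List[int], max_size: int = MAX_SPLIT,
                  min_size: int = MIN_SPLIT) -> List[List[int]]:
    """Greedily pack file sizes (in bytes) into combined splits.

    Hypothetical, simplified model: locality is ignored, a split is closed
    as soon as it reaches min_size, and no split grows past max_size.
    """
    if not 0 < min_size <= max_size:
        raise ValueError("require 0 < min_size <= max_size")
    splits: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for size in file_sizes:
        # Cut files larger than max_size into chunks of at most max_size.
        chunks = []
        while size > max_size:
            chunks.append(max_size)
            size -= max_size
        if size > 0:
            chunks.append(size)
        for chunk in chunks:
            # Close the current split if this chunk would overflow it.
            if current and current_total + chunk > max_size:
                splits.append(current)
                current, current_total = [], 0
            current.append(chunk)
            current_total += chunk
            # Close the split once it is large enough.
            if current_total >= min_size:
                splits.append(current)
                current, current_total = [], 0
    if current:
        splits.append(current)  # leftover files form a final, smaller split
    return splits
```

In this model, 1,000 files of 1 MiB each become 8 splits (seven of 128 files and one of 104 files), whereas a per-file input format would produce one split, and therefore one task, per file.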
This is an automated message from the Apache Git Service.
