wangqia0309 opened a new pull request #32331: URL: https://github.com/apache/spark/pull/32331
In most cases, users write data to Hive tables or HDFS directories with Spark SQL. Since the Spark 3.0 release, the project has discouraged using the Hive module to read and write Hive tables, and prefers switching from the Hive strategy rules to the DataSource API so that I/O operations are centralized in one module.

A general ability to automatically merge output files in the DataSource API would therefore resolve the small-files problem that many users hit in production. It can bind tightly to the DataSource write framework, so the merge process is transparent to users, and it can handle all kinds of writes, such as writing to an HDFS directory, a non-partitioned Hive table, or a dynamically partitioned Hive table.

This is my own implementation of the functionality, and it has been stable in my company's production environment.
