yaooqinn opened a new pull request #31492:
URL: https://github.com/apache/spark/pull/31492


   
   Backport #31460 to 3.0
   
   
   ### Why are the changes needed?
   ### Why are the changes needed?
   In many real-world cases, when interacting with the Hive catalog through Spark SQL, users may simply reuse the `hive-site.xml` from their Hive jobs, copying it to `SPARK_HOME/conf` without modification. When Spark generates Hadoop configurations, it uses `spark.buffer.size` (default 65536) to override `io.file.buffer.size` (default 4096). But when we load `hive-site.xml`, this override is ignored and `io.file.buffer.size` is reset again according to `hive-site.xml`. Two problems follow:
   
   1. The configuration priority for setting Hadoop and Hive configs here is incorrect; literally, the order should be `spark > spark.hive > spark.hadoop > hive > hadoop`.
   
   2. This breaks the `spark.buffer.size` config's behavior for tuning IO performance with HDFS if `hive-site.xml` contains an existing `io.file.buffer.size`.
   
   This is a bugfix for the configuration priority behavior, and it fixes the performance regression caused by that behavior change.
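
   The intended precedence above can be illustrated as a layered merge, where higher-priority layers overwrite lower ones. This is a hypothetical sketch in Python (not Spark's actual implementation; the function and layer names are illustrative only):

   ```python
   # Sketch: merge config layers in the priority order
   #   spark > spark.hive > spark.hadoop > hive > hadoop
   # by applying the lowest-priority layer first so that each
   # higher-priority layer overwrites any keys it shares.
   def merge_configs(hadoop, hive, spark_hadoop, spark_hive, spark):
       merged = {}
       for layer in (hadoop, hive, spark_hadoop, spark_hive, spark):
           merged.update(layer)
       return merged

   hadoop = {"io.file.buffer.size": "4096"}    # Hadoop default
   hive = {"io.file.buffer.size": "8192"}      # value from hive-site.xml
   spark = {"io.file.buffer.size": "65536"}    # derived from spark.buffer.size

   # With the correct priority, the Spark-derived value wins even
   # though hive-site.xml also sets io.file.buffer.size.
   merged = merge_configs(hadoop, hive, {}, {}, spark)
   print(merged["io.file.buffer.size"])
   ```

   The bug described above is equivalent to applying the `hive` layer last, which would silently clobber the value derived from `spark.buffer.size`.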
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, this PR fixes a silent user-facing behavior change by restoring the expected configuration priority.
   ### How was this patch tested?
   
   New tests.
   
   
   
   

