Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/21329 In general, I'm very wary of cleanup changes like this: unless we have a _need_ to do this (i.e. it causes negative side effects, breaks workloads, prevents specific concrete improvements, etc.) then the risk of changing longstanding old code outweighs any benefits of "cleanliness". In this specific case, I'm most concerned about the removal of those `jobContext` `Configuration` changes: they might appear superfluous but I'd bet that they were originally added for a reason which made sense at the time the original code was authored. If we're going to remove such old code, I'd like to see a written explanation of why the change is safe which explains the original context for adding that code. I chased through the git blame for the removed lines and the oldest version I found was https://github.com/apache/spark/commit/0595b6de8f1da04baceda082553c2aa1aa2cb006#diff-5c7d283b2f533b6f491dd1845dd86797R299, which is #5526: "[SPARK-3928] [SPARK-5182] [SQL] Partitioning support for the data sources API" by @liancheng. It looks that PR simply copied even older code from either `hiveWriterContainers.scala` or `HadoopRDD`. It looks like these properties actually _did_ have some effect for something at one point in time because https://github.com/apache/spark/pull/101 was fixing some sort of corner-case or bug in that code in the `HadoopRDD` case. If you go even further back, similar lines are present in a commit which predated our merge script (so it's an intermediate commit from @mateiz as part of some larger changeset): https://github.com/apache/spark/commit/0ccfe20755665aa4c347b82e18297c5b3a2284ee#diff-8382b1a276e6cbfae1efb3ee49c7e06eR144 Given this long history, I'd like to flag that change as potentially high-risk: it's not obvious to me that this code is unneeded and if we don't have a strong reason to change it then I'd prefer to leave it as it was before simply to help us manage / reduce risk.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org