Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/21329
  
    In general, I'm very wary of cleanup changes like this: unless we have a 
_need_ to do this (i.e. it causes negative side effects, breaks workloads, 
prevents specific concrete improvements, etc.) then the risk of changing 
longstanding old code outweighs any benefits of "cleanliness".
    
    In this specific case, I'm most concerned about the removal of those 
`jobContext` `Configuration` changes: they might appear superfluous but I'd bet 
that they were originally added for a reason which made sense at the time the 
original code was authored. If we're going to remove such old code, I'd like to 
see a written explanation of why the change is safe which explains the original 
context for adding that code. I chased through the git blame for the removed 
lines and the oldest version I found was 
https://github.com/apache/spark/commit/0595b6de8f1da04baceda082553c2aa1aa2cb006#diff-5c7d283b2f533b6f491dd1845dd86797R299,
 which is #5526: "[SPARK-3928] [SPARK-5182] [SQL] Partitioning support for the 
data sources API" by @liancheng. It looks that PR simply copied even older code 
from either `hiveWriterContainers.scala` or `HadoopRDD`. It looks like these 
properties actually _did_ have some effect for something at one point in time 
because https://github.com/apache/spark/pull/101 was 
 fixing some sort of corner-case or bug in that code in the `HadoopRDD` case. 
If you go even further back, similar lines are present in a commit which 
predated our merge script (so it's an intermediate commit from @mateiz as part 
of some larger changeset): 
https://github.com/apache/spark/commit/0ccfe20755665aa4c347b82e18297c5b3a2284ee#diff-8382b1a276e6cbfae1efb3ee49c7e06eR144
    
    Given this long history, I'd like to flag that change as potentially 
high-risk: it's not obvious to me that this code is unneeded and if we don't 
have a strong reason to change it then I'd prefer to leave it as it was before 
simply to help us manage / reduce risk.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to