Github user goungoun commented on the issue:

    https://github.com/apache/spark/pull/22428
  
    @HyukjinKwon , thanks for your review. Actually, that is the reason I 
opened this pull request. I think it is better to give users a reusable 
option than to have them repeat the same code throughout their analysis. In 
a notebook environment, whenever a visualization was needed in the middle of 
an analysis, I had to convert the column names rather than use them as-is so 
that the report would read clearly for its readers. In the process, I had to 
repeat withColumnRenamed far too many times.
    
    So, I've researched how other users try to overcome this limitation. 
They tend to use foldLeft or a for loop with withColumnRenamed, which can 
create too many DataFrames inside the Spark engine and cause performance 
issues without the user ever noticing. Supporting discussions are listed 
below, and a short sketch of the foldLeft pattern follows the references.
    
    StackOverflow
    - 
https://stackoverflow.com/questions/38798567/pyspark-rename-more-than-one-column-using-withcolumnrenamed
    - 
https://stackoverflow.com/questions/35592917/renaming-column-names-of-a-dataframe-in-spark-scala?noredirect=1&lq=1
    
    Spark Issues
    [SPARK-12225] Support adding or replacing multiple columns at once in 
DataFrame API
    
    [SPARK-21582] DataFrame.withColumnRenamed cause huge performance overhead
    If foldLeft is used, renaming many columns can cause performance issues
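    
    For reference, here is a minimal sketch (in Scala) of the foldLeft 
workaround described above; the column names and the `renameAll` helper are 
hypothetical, just to illustrate the pattern:
    
    ```scala
    import org.apache.spark.sql.DataFrame
    
    // Hypothetical rename map: raw column names -> report-friendly labels.
    val renames = Map(
      "cust_id" -> "Customer ID",
      "ord_cnt" -> "Order Count"
    )
    
    // Each withColumnRenamed call returns a new DataFrame, so folding over
    // a wide rename map builds a long chain of plans inside the engine.
    def renameAll(df: DataFrame, renames: Map[String, String]): DataFrame =
      renames.foldLeft(df) { case (acc, (from, to)) =>
        acc.withColumnRenamed(from, to)
      }
    ```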


