GitHub user KyleLi1985 opened a pull request:

    https://github.com/apache/spark/pull/23271

    [SPARK-26318][SQL] Enhance function merge performance in Row

    ## What changes were proposed in this pull request?
    Improve the performance of the Row.merge function.
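
    The actual change is in the commit below; purely as an illustration of the
    kind of allocation-avoiding rewrite that typically speeds up merge-style
    operations (fastMerge and its body are a hypothetical sketch written here,
    not the code in this PR), one can pre-size a single array and copy field
    values into it instead of building intermediate collections, e.g. in a
    Spark shell:

        import org.apache.spark.sql.Row

        // Hypothetical sketch: merge rows by copying their fields into one
        // pre-sized array, avoiding intermediate Seq allocations.
        def fastMerge(rows: Row*): Row = {
          var total = 0
          rows.foreach(r => total += r.length)
          val values = new Array[Any](total)
          var pos = 0
          rows.foreach { r =>
            var i = 0
            while (i < r.length) {
              values(pos) = r.get(i)
              pos += 1
              i += 1
            }
          }
          Row.fromSeq(values)
        }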
    
    For example, calling Row.merge 100,000,000 times for the following input
          import org.apache.spark.sql.Row

          val row1 = Row("name", "work", 2314, "null", 1, "")
          val row2 = Row(1, true, "name", null, "2010-10-22", 34, "location",
                         "situation")
          val row3 = Row.fromSeq(Seq(row1, row2))
          val rows = Seq(row1, row2, row3)
          Row.merge(row1)
          Row.merge(rows: _*)
    takes 108458 milliseconds and 158356 milliseconds for the two merge calls,
    respectively.
    
    After this commit, the same calls need only 24967 milliseconds and 34035
    milliseconds, respectively.
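
    The timing harness is not shown in the PR description; a minimal sketch of
    one way to reproduce such a measurement in a Spark shell (the while-loop
    and System.nanoTime timing here are assumptions, not necessarily the exact
    harness used):

        import org.apache.spark.sql.Row

        val row1 = Row("name", "work", 2314, "null", 1, "")
        val row2 = Row(1, true, "name", null, "2010-10-22", 34, "location",
                       "situation")
        val row3 = Row.fromSeq(Seq(row1, row2))
        val rows = Seq(row1, row2, row3)

        // Time 100,000,000 merges of a single row.
        var start = System.nanoTime()
        var i = 0
        while (i < 100000000) { Row.merge(row1); i += 1 }
        println(s"single-row merge: ${(System.nanoTime() - start) / 1000000} ms")

        // Time 100,000,000 merges of the three-row sequence.
        start = System.nanoTime()
        i = 0
        while (i < 100000000) { Row.merge(rows: _*); i += 1 }
        println(s"multi-row merge: ${(System.nanoTime() - start) / 1000000} ms")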
    
    ## How was this patch tested?
    Unit tests
    Accuracy tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KyleLi1985/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23271.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23271
    
----
commit 93c4af42d556b3779f6d56ffdf606c1132f8ef47
Author: 李亮 <liang.li.work@...>
Date:   2018-12-10T08:05:40Z

    Enhance function merge performance in Row

----

