[GitHub] spark pull request #21073: [SPARK-23936][SQL][WIP] Implement map_concat

bersprockets Sat, 14 Apr 2018 20:43:08 -0700

GitHub user bersprockets opened a pull request:

    https://github.com/apache/spark/pull/21073


    [SPARK-23936][SQL][WIP] Implement map_concat

    ## What changes were proposed in this pull request?
    
    Implement the map_concat high-order function.
    
    This is a work in progress.
    
    There's no code generation yet.
    
    My current implementation of MapConcat.checkInputDataTypes does not allow 
valueContainsNull to vary between the input maps. Not sure what the 
requirements are here.
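
The strict check described above can be pictured with the sketch below. It is not the code from this pull request: `MapTypeSketch`, `StrictMapTypeCheck`, and `allSameType` are hypothetical names standing in for Spark's own type classes, and the assumption is that the inputs are accepted only when key type, value type, and valueContainsNull all match.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a map type descriptor; this is not Spark's MapType.
final class MapTypeSketch {
    final String keyType;
    final String valueType;
    final boolean valueContainsNull;

    MapTypeSketch(String keyType, String valueType, boolean valueContainsNull) {
        this.keyType = keyType;
        this.valueType = valueType;
        this.valueContainsNull = valueContainsNull;
    }

    // True when key type, value type, and the nullability flag all match.
    boolean sameAs(MapTypeSketch other) {
        return keyType.equals(other.keyType)
            && valueType.equals(other.valueType)
            && valueContainsNull == other.valueContainsNull;
    }
}

// Hypothetical helper illustrating the strict behavior described in the text.
class StrictMapTypeCheck {
    // Accepts the inputs only if every map type equals the first one,
    // so a difference in valueContainsNull alone is enough to reject them.
    static boolean allSameType(List<MapTypeSketch> inputs) {
        for (MapTypeSketch t : inputs) {
            if (!t.sameAs(inputs.get(0))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        MapTypeSketch a = new MapTypeSketch("string", "int", false);
        MapTypeSketch b = new MapTypeSketch("string", "int", true);
        System.out.println(allSameType(Arrays.asList(a, a)));
        System.out.println(allSameType(Arrays.asList(a, b)));
    }
}
```

A looser rule would instead accept inputs that differ only in valueContainsNull and mark the result as containing nulls if any input does. Which rule is required is the open question raised above.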
    
    I am using a java.util.Map implementation to merge all the maps together, 
since that is a very straightforward implementation. I chose 
java.util.LinkedHashMap because:
    
    - It gives a predictable tuple order (essentially, the original 
insertion order) for building the result MapData. Tests that rely on tuple 
order, such as pyspark's doctests, will work across Java versions.
    - java.util.LinkedHashMap seems to be about as fast as java.util.HashMap, 
at least when used to concatenate big (500+ key/value pairs) maps for 150k 
rows, and it's much faster than the Scala LinkedHashMap implementation.
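
As a rough illustration of the merge approach described above, here is a minimal sketch using only java.util. It is not the code from this pull request: `MapConcatSketch` and `concat` are hypothetical names, plain Java maps stand in for MapData, and letting a later value replace an earlier one on a duplicate key is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only; plain Java maps stand in for Spark's MapData.
class MapConcatSketch {

    // Merge the maps left to right into one insertion-ordered map.
    // Assumption: on a duplicate key the later value wins, while the key
    // keeps the position of its first insertion (LinkedHashMap semantics).
    static <K, V> Map<K, V> concat(List<Map<K, V>> maps) {
        Map<K, V> merged = new LinkedHashMap<>();
        for (Map<K, V> m : maps) {
            merged.putAll(m);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new LinkedHashMap<>();
        first.put("a", 1);
        first.put("b", 2);
        Map<String, Integer> second = new LinkedHashMap<>();
        second.put("c", 3);
        second.put("a", 4);
        // Prints {a=4, b=2, c=3}
        System.out.println(concat(Arrays.asList(first, second)));
    }
}
```

Because iteration follows insertion order, the printed result is the same on every run, which is the property that order-sensitive tests depend on.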
    
    ## How was this patch tested?
    
    New tests
    Manual tests
    Run all sbt SQL tests
    Run all pyspark sql tests
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bersprockets/spark SPARK-23936

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21073.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21073
    
----
commit 707330cd88b269cb1bbee83b9b6476d05c8d177c
Author: Bruce Robbins <bersprockets@...>
Date:   2018-04-14T23:52:37Z

    Initial commit

commit 84d696313972a237691eb46cad6a478167dbabee
Author: Bruce Robbins <bersprockets@...>
Date:   2018-04-15T02:04:45Z

    Remove unused variable in test

commit d04893bccbd2772eb937125895519228588e74b4
Author: Bruce Robbins <bersprockets@...>
Date:   2018-04-15T03:35:47Z

    Cleanup

----


