GitHub user bersprockets opened a pull request: https://github.com/apache/spark/pull/21073

[SPARK-23936][SQL][WIP] Implement map_concat

## What changes were proposed in this pull request?

Implement map_concat high order function.

This is a work in progress. There's no code generation yet.

My current implementation of MapConcat.checkInputDataTypes does not allow valueContainsNull to vary between the input maps. Not sure what the requirements are here.

I am using a java.util.Map implementation to merge all the maps together, since that is a very straightforward implementation. I chose java.util.LinkedHashMap because:

- It allows for predictable tuple order (essentially, the original insertion order) when building the result MapData. Tests that rely on tuple order, like pyspark's doctests, will work across Java versions.
- java.util.LinkedHashMap seems to be about as fast as java.util.HashMap, at least when used to concatenate big (500+ key/values) maps for 150k rows, and it's much faster than the Scala LinkedHashMap implementation.

## How was this patch tested?

- New tests
- Manual tests
- Run all sbt SQL tests
- Run all pyspark sql tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bersprockets/spark SPARK-23936

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21073.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21073

----
commit 707330cd88b269cb1bbee83b9b6476d05c8d177c
Author: Bruce Robbins <bersprockets@...>
Date:   2018-04-14T23:52:37Z

    Initial commit

commit 84d696313972a237691eb46cad6a478167dbabee
Author: Bruce Robbins <bersprockets@...>
Date:   2018-04-15T02:04:45Z

    Remove unused variable in test

commit d04893bccbd2772eb937125895519228588e74b4
Author: Bruce Robbins <bersprockets@...>
Date:   2018-04-15T03:35:47Z

    Cleanup

----
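
For readers unfamiliar with the ordering property mentioned in the pull request description, here is a minimal sketch of merging several maps through a java.util.LinkedHashMap. It is not the code from the patch: the class name `MapConcatSketch` and the helper `concat` are hypothetical, and the handling of duplicate keys shown here (the later value wins, the key keeps its first position) is simply what `putAll` does, not a statement about what map_concat does in Spark.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapConcatSketch {
    // Hypothetical helper, not Spark's MapConcat expression.
    // Merges the input maps in argument order into one LinkedHashMap,
    // so iteration order follows the order in which keys were first inserted.
    static <K, V> Map<K, V> concat(List<Map<K, V>> maps) {
        Map<K, V> merged = new LinkedHashMap<>();
        for (Map<K, V> m : maps) {
            // putAll overwrites the value of an existing key but does not
            // move that key to the end of the iteration order.
            merged.putAll(m);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new LinkedHashMap<>();
        first.put("a", 1);
        first.put("b", 2);

        Map<String, Integer> second = new LinkedHashMap<>();
        second.put("c", 3);
        second.put("a", 10); // duplicate key: assumption here is "last value wins"

        // prints {a=10, b=2, c=3}
        System.out.println(concat(Arrays.asList(first, second)));
    }
}
```

Because the iteration order depends only on insertion order and not on hash codes, the output stays the same across JVM versions, which is the property the description relies on for order-sensitive tests.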