GitHub user bersprockets opened a pull request:
https://github.com/apache/spark/pull/21073
[SPARK-23936][SQL][WIP] Implement map_concat
## What changes were proposed in this pull request?
Implement the map_concat higher-order function.
This is a work in progress.
There's no code generation yet.
My current implementation of MapConcat.checkInputDataTypes does not allow
valueContainsNull to vary between the input maps. Not sure what the
requirements are here.
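To make the restriction concrete, here is a minimal sketch of a check that rejects inputs whose valueContainsNull flags differ. `SimpleMapType` and `sameMapTypes` are hypothetical stand-ins written for illustration. They are not Spark's classes and not the code in this patch.

```java
import java.util.List;

public class MapTypeCheckSketch {
    // Hypothetical stand-in for a map type descriptor; NOT Spark's MapType.
    static final class SimpleMapType {
        final String keyType;
        final String valueType;
        final boolean valueContainsNull;

        SimpleMapType(String keyType, String valueType, boolean valueContainsNull) {
            this.keyType = keyType;
            this.valueType = valueType;
            this.valueContainsNull = valueContainsNull;
        }
    }

    // Returns true only when every input agrees on key type, value type,
    // and the valueContainsNull flag (the strict behavior described above).
    static boolean sameMapTypes(List<SimpleMapType> inputs) {
        if (inputs.isEmpty()) {
            return true;
        }
        SimpleMapType first = inputs.get(0);
        for (SimpleMapType t : inputs) {
            if (!t.keyType.equals(first.keyType)
                    || !t.valueType.equals(first.valueType)
                    || t.valueContainsNull != first.valueContainsNull) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleMapType a = new SimpleMapType("string", "int", true);
        SimpleMapType b = new SimpleMapType("string", "int", false);
        System.out.println(sameMapTypes(List.of(a, a)));
        System.out.println(sameMapTypes(List.of(a, b)));
    }
}
```

A looser alternative would accept differing flags and mark the result as containing nulls if any input does. Whether that is the desired behavior is the open question here.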
I am using a java.util.Map implementation to merge all the maps together,
since that is a very straightforward implementation. I chose
java.util.LinkedHashMap because:
- It allows for predictable tuple order (essentially, the original
insertion order) for building the result MapData. Tests, like pyspark's
doctests, which rely on tuple order, will work across Java versions.
- java.util.LinkedHashMap seems to be about as fast as java.util.HashMap,
at least when used to concatenate big (500+ key/value pairs) maps for 150k rows, and
it's much faster than the Scala LinkedHashMap implementation.
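The merge approach can be illustrated with a small standalone sketch. This is not the code in the patch: `MapConcatSketch` and `concat` are hypothetical names, and plain java.util maps stand in for Spark's MapData. It only shows how a LinkedHashMap keeps first-insertion order while merging.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapConcatSketch {
    // Hypothetical helper: merges the input maps in order into one LinkedHashMap.
    // Keys keep the position of their first insertion; for a duplicate key the
    // later value overwrites the earlier one (standard Map.put behavior).
    static <K, V> Map<K, V> concat(List<Map<K, V>> maps) {
        Map<K, V> merged = new LinkedHashMap<>();
        for (Map<K, V> m : maps) {
            merged.putAll(m);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = new LinkedHashMap<>();
        a.put("x", 1);
        a.put("y", 2);

        Map<String, Integer> b = new LinkedHashMap<>();
        b.put("y", 20);
        b.put("z", 3);

        // Prints {x=1, y=20, z=3}
        System.out.println(concat(List.of(a, b)));
    }
}
```

The last-value-wins handling of duplicate keys is simply what java.util.Map does in this sketch. The description above does not say how map_concat should treat duplicates.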
## How was this patch tested?
New tests
Manual tests
Run all sbt SQL tests
Run all pyspark sql tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bersprockets/spark SPARK-23936
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21073.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21073
----
commit 707330cd88b269cb1bbee83b9b6476d05c8d177c
Author: Bruce Robbins <bersprockets@...>
Date: 2018-04-14T23:52:37Z
Initial commit
commit 84d696313972a237691eb46cad6a478167dbabee
Author: Bruce Robbins <bersprockets@...>
Date: 2018-04-15T02:04:45Z
Remove unused variable in test
commit d04893bccbd2772eb937125895519228588e74b4
Author: Bruce Robbins <bersprockets@...>
Date: 2018-04-15T03:35:47Z
Cleanup
----