GitHub user mn-mikke opened a pull request:
https://github.com/apache/spark/pull/21236
[SPARK-23935][SQL] Adding map_entries function
## What changes were proposed in this pull request?
This PR adds the `map_entries` function, which returns an unordered array of all entries in the given map.
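For illustration, a hedged usage sketch (the rendered output is my expectation of how `show` displays an array of key/value structs, not output captured from this PR):
```
import org.apache.spark.sql.functions.map_entries
import spark.implicits._  // in spark-shell these are already in scope

val df = Seq(Map(1 -> "a", 2 -> "b")).toDF("m")
df.select(map_entries('m)).show(false)
// expected: an array of struct<key, value> elements, e.g. [[1, a], [2, b]]
```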
## How was this patch tested?
New tests were added to:
- `CollectionExpressionsSuite`
- `DataFrameFunctionsSuite` (a hedged sketch of this style of check follows)
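Roughly the shape of a DataFrame-level check, in the spirit of `QueryTest`-based suites (illustrative only, not the PR's actual test code):
```
// inside a QueryTest-based suite, where checkAnswer is available
import org.apache.spark.sql.Row

val df = Seq(Map(1 -> "a", 2 -> "b")).toDF("m")
checkAnswer(
  df.select(map_entries('m)),
  Seq(Row(Seq(Row(1, "a"), Row(2, "b")))) // each entry is a key/value struct
)
```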
## CodeGen examples
### Primitive types
```
val df = Seq(Map(1 -> 5, 2 -> 6)).toDF("m")
df.filter('m.isNotNull).select(map_entries('m)).debugCodegen
```
Result:
```
/* 042 */     boolean project_isNull_0 = false;
/* 043 */
/* 044 */     ArrayData project_value_0 = null;
/* 045 */
/* 046 */     final int project_numElements_0 = inputadapter_value_0.numElements();
/* 047 */     final ArrayData project_keys_0 = inputadapter_value_0.keyArray();
/* 048 */     final ArrayData project_values_0 = inputadapter_value_0.valueArray();
/* 049 */
/* 050 */     final int project_structSize_0 = 24;
/* 051 */     final long project_byteArraySize_0 = UnsafeArrayData.calculateSizeOfUnderlyingByteArray(project_numElements_0, 8 + project_structSize_0);
/* 052 */     final int project_structsOffset_0 = UnsafeArrayData.calculateHeaderPortionInBytes(project_numElements_0) + project_numElements_0 * 8;
/* 053 */     if (project_byteArraySize_0 > 2147483632) {
/* 054 */       final Object[] project_internalRowArray_0 = new Object[project_numElements_0];
/* 055 */       for (int z = 0; z < project_numElements_0; z++) {
/* 056 */         project_internalRowArray_0[z] = new org.apache.spark.sql.catalyst.expressions.GenericInternalRow(new Object[]{project_keys_0.getInt(z), project_values_0.getInt(z)});
/* 057 */       }
/* 058 */       project_value_0 = new org.apache.spark.sql.catalyst.util.GenericArrayData(project_internalRowArray_0);
/* 059 */
/* 060 */     } else {
/* 061 */       final byte[] project_byteArray_0 = new byte[(int)project_byteArraySize_0];
/* 062 */       UnsafeArrayData project_unsafeArrayData_0 = new UnsafeArrayData();
/* 063 */       Platform.putLong(project_byteArray_0, 16, project_numElements_0);
/* 064 */       project_unsafeArrayData_0.pointTo(project_byteArray_0, 16, (int)project_byteArraySize_0);
/* 065 */       UnsafeRow project_unsafeRow_0 = new UnsafeRow(2);
/* 066 */       for (int z = 0; z < project_numElements_0; z++) {
/* 067 */         long offset = project_structsOffset_0 + z * project_structSize_0;
/* 068 */         project_unsafeArrayData_0.setLong(z, (offset << 32) + project_structSize_0);
/* 069 */         project_unsafeRow_0.pointTo(project_byteArray_0, 16 + offset, project_structSize_0);
/* 070 */         project_unsafeRow_0.setInt(0, project_keys_0.getInt(z));
/* 071 */         project_unsafeRow_0.setInt(1, project_values_0.getInt(z));
/* 072 */       }
/* 073 */       project_value_0 = project_unsafeArrayData_0;
/* 074 */     }
```
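For primitive key/value types the generated code writes the entries straight into a single `UnsafeArrayData` byte buffer, falling back to the boxed `GenericArrayData` path only when the buffer would exceed `2147483632` bytes (just under `Integer.MAX_VALUE`). A minimal sketch of the arithmetic behind the generated constants (my own reading of the layout above, not code from the PR):
```
// Each entry is an UnsafeRow with two fields:
//   8-byte null-tracking word + 2 word-aligned 8-byte field slots = 24 bytes
val structSize = 8 + 2 * 8 // matches project_structSize_0 = 24

// Each array slot holds a long packing the entry's (offset, size),
// mirroring "(offset << 32) + project_structSize_0" in the codegen:
def pack(offset: Long, size: Long): Long = (offset << 32) + size
def offsetOf(packed: Long): Long = packed >>> 32
def sizeOf(packed: Long): Long = packed & 0xFFFFFFFFL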
### Non-primitive types
```
val df = Seq(Map("a" -> "foo", "b" -> null)).toDF("m")
df.filter('m.isNotNull).select(map_entries('m)).debug
```
Result:
```
/* 042 */     boolean project_isNull_0 = false;
/* 043 */
/* 044 */     ArrayData project_value_0 = null;
/* 045 */
/* 046 */     final int project_numElements_0 = inputadapter_value_0.numElements();
/* 047 */     final ArrayData project_keys_0 = inputadapter_value_0.keyArray();
/* 048 */     final ArrayData project_values_0 = inputadapter_value_0.valueArray();
/* 049 */
/* 050 */     final Object[] project_internalRowArray_0 = new Object[project_numElements_0];
/* 051 */     for (int z = 0; z < project_numElements_0; z++) {
/* 052 */       project_internalRowArray_0[z] = new org.apache.spark.sql.catalyst.expressions.GenericInternalRow(new Object[]{project_keys_0.getUTF8String(z), project_values_0.getUTF8String(z)});
/* 053 */     }
/* 054 */     project_value_0 = new org.apache.spark.sql.catalyst.util.GenericArrayData(project_internalRowArray_0);
```
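For non-primitive element types the generated code always takes the boxed `GenericInternalRow` path shown above. A hedged sketch of what the query then returns (the expected value is my reading of the example, not output captured from this PR):
```
val df = Seq(Map("a" -> "foo", "b" -> null)).toDF("m")
df.select(map_entries('m)).collect()
// expected: Array(Row(Seq(Row("a", "foo"), Row("b", null))))
// note that null map values are preserved in the value field of the struct
```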
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/AbsaOSS/spark feature/array-api-map_entries-to-master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21236.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21236
----
commit 086e223e89ce0cf56145f4aa9a7aef5421a98810
Author: Marek Novotny <mn.mikke@...>
Date: 2018-05-04T18:00:40Z
[SPARK-23935][SQL] Adding map_entries function
----