GitHub user mn-mikke opened a pull request:
https://github.com/apache/spark/pull/21236
[SPARK-23935][SQL] Adding map_entries function
## What changes were proposed in this pull request?
This PR adds the `map_entries` function, which returns an unordered array of all entries in the given map.
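For illustration, a hedged usage sketch (the rendered output is my expectation of how `show` displays an array of key/value structs, not output captured from this PR):
```
import org.apache.spark.sql.functions.map_entries
import spark.implicits._  // in spark-shell these are already in scope

val df = Seq(Map(1 -> "a", 2 -> "b")).toDF("m")
df.select(map_entries('m)).show(false)
// expected: an array of struct<key, value> elements, e.g. [[1, a], [2, b]]
```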
## How was this patch tested?
New tests were added to:
- `CollectionExpressionsSuite`
- `DataFrameFunctionsSuite` (a hedged sketch of this style of check follows)
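Roughly the shape of a DataFrame-level check, in the spirit of `QueryTest`-based suites (illustrative only, not the PR's actual test code):
```
// inside a QueryTest-based suite, where checkAnswer is available
import org.apache.spark.sql.Row

val df = Seq(Map(1 -> "a", 2 -> "b")).toDF("m")
checkAnswer(
  df.select(map_entries('m)),
  Seq(Row(Seq(Row(1, "a"), Row(2, "b")))) // each entry is a key/value struct
)
```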
## CodeGen examples
### Primitive types
```
val df = Seq(Map(1 -> 5, 2 -> 6)).toDF("m")
df.filter('m.isNotNull).select(map_entries('m)).debugCodegen
```
Result:
```
/* 042 */     boolean project_isNull_0 = false;
/* 043 */
/* 044 */     ArrayData project_value_0 = null;
/* 045 */
/* 046 */     final int project_numElements_0 = inputadapter_value_0.numElements();
/* 047 */     final ArrayData project_keys_0 = inputadapter_value_0.keyArray();
/* 048 */     final ArrayData project_values_0 = inputadapter_value_0.valueArray();
/* 049 */
/* 050 */     final int project_structSize_0 = 24;
/* 051 */     final long project_byteArraySize_0 = UnsafeArrayData.calculateSizeOfUnderlyingByteArray(project_numElements_0, 8 + project_structSize_0);
/* 052 */     final int project_structsOffset_0 = UnsafeArrayData.calculateHeaderPortionInBytes(project_numElements_0) + project_numElements_0 * 8;
/* 053 */     if (project_byteArraySize_0 > 2147483632) {
/* 054 */       final Object[] project_internalRowArray_0 = new Object[project_numElements_0];
/* 055 */       for (int z = 0; z < project_numElements_0; z++) {
/* 056 */         project_internalRowArray_0[z] = new org.apache.spark.sql.catalyst.expressions.GenericInternalRow(new Object[]{project_keys_0.getInt(z), project_values_0.getInt(z)});
/* 057 */       }
/* 058 */       project_value_0 = new org.apache.spark.sql.catalyst.util.GenericArrayData(project_internalRowArray_0);
/* 059 */
/* 060 */     } else {
/* 061 */       final byte[] project_byteArray_0 = new byte[(int)project_byteArraySize_0];
/* 062 */       UnsafeArrayData project_unsafeArrayData_0 = new UnsafeArrayData();
/* 063 */       Platform.putLong(project_byteArray_0, 16, project_numElements_0);
/* 064 */       project_unsafeArrayData_0.pointTo(project_byteArray_0, 16, (int)project_byteArraySize_0);
/* 065 */       UnsafeRow project_unsafeRow_0 = new UnsafeRow(2);
/* 066 */       for (int z = 0; z < project_numElements_0; z++) {
/* 067 */         long offset = project_structsOffset_0 + z * project_structSize_0;
/* 068 */         project_unsafeArrayData_0.setLong(z, (offset << 32) + project_structSize_0);
/* 069 */         project_unsafeRow_0.pointTo(project_byteArray_0, 16 + offset, project_structSize_0);
/* 070 */         project_unsafeRow_0.setInt(0, project_keys_0.getInt(z));
/* 071 */         project_unsafeRow_0.setInt(1, project_values_0.getInt(z));
/* 072 */       }
/* 073 */       project_value_0 = project_unsafeArrayData_0;
/* 074 */     }
```
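For primitive key/value types the generated code writes the entries straight into a single `UnsafeArrayData` byte buffer, falling back to the boxed `GenericArrayData` path only when the buffer would exceed `2147483632` bytes (just under `Integer.MAX_VALUE`). A minimal sketch of the arithmetic behind the generated constants (my own reading of the layout above, not code from the PR):
```
// Each entry is an UnsafeRow with two fields:
//   8-byte null-tracking word + 2 word-aligned 8-byte field slots = 24 bytes
val structSize = 8 + 2 * 8 // matches project_structSize_0 = 24

// Each array slot holds a long packing the entry's (offset, size),
// mirroring "(offset << 32) + project_structSize_0" in the codegen:
def pack(offset: Long, size: Long): Long = (offset << 32) + size
def offsetOf(packed: Long): Long = packed >>> 32
def sizeOf(packed: Long): Long = packed & 0xFFFFFFFFL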
### Non-primitive types
```
val df = Seq(Map("a" -> "foo", "b" -> null)).toDF("m")
df.filter('m.isNotNull).select(map_entries('m)).debug
```
Result:
```
/* 042 */     boolean project_isNull_0 = false;
/* 043 */
/* 044 */     ArrayData project_value_0 = null;
/* 045 */
/* 046 */     final int project_numElements_0 = inputadapter_value_0.numElements();
/* 047 */     final ArrayData project_keys_0 = inputadapter_value_0.keyArray();
/* 048 */     final ArrayData project_values_0 = inputadapter_value_0.valueArray();
/* 049 */
/* 050 */     final Object[] project_internalRowArray_0 = new Object[project_numElements_0];
/* 051 */     for (int z = 0; z < project_numElements_0; z++) {
/* 052 */       project_internalRowArray_0[z] = new org.apache.spark.sql.catalyst.expressions.GenericInternalRow(new Object[]{project_keys_0.getUTF8String(z), project_values_0.getUTF8String(z)});
/* 053 */     }
/* 054 */     project_value_0 = new org.apache.spark.sql.catalyst.util.GenericArrayData(project_internalRowArray_0);
```
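For non-primitive element types the generated code always takes the boxed `GenericInternalRow` path shown above. A hedged sketch of what the query then returns (the expected value is my reading of the example, not output captured from this PR):
```
val df = Seq(Map("a" -> "foo", "b" -> null)).toDF("m")
df.select(map_entries('m)).collect()
// expected: Array(Row(Seq(Row("a", "foo"), Row("b", null))))
// note that null map values are preserved in the value field of the struct
```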
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/AbsaOSS/spark feature/array-api-map_entries-to-master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21236.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21236
----
commit 086e223e89ce0cf56145f4aa9a7aef5421a98810
Author: Marek Novotny <mn.mikke@...>
Date: 2018-05-04T18:00:40Z
[SPARK-23935][SQL] Adding map_entries function
----