GitHub user rdblue opened a pull request:

    https://github.com/apache/spark/pull/21242

    [SPARK-23657][SQL] Document and expose the internal data API

    ## What changes were proposed in this pull request?
    
    This makes the `InternalRow`, `ArrayData`, and `MapData` classes public and adds package documentation for the internal representation used by Spark SQL.
    
    The motivation is to document the internal API, which has leaked as a public API and is used in the v2 DataSource classes.
    
    ## How was this patch tested?
    
    Existing tests. This change is a refactor that also adds documentation.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/spark SPARK-23657-expose-internal-data-apis

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21242.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21242
    
----
commit f738127bc287632ede185daa3345b1f0de8b0ccc
Author: Ryan Blue <blue@...>
Date:   2018-05-04T21:12:22Z

    Move InternalRow to org.apache.spark.sql.catalyst.data.

commit 747867c719029ddc86d44e7149e98e031a3468ea
Author: Ryan Blue <blue@...>
Date:   2018-05-04T21:20:31Z

    Move ArrayData to org.apache.spark.sql.catalyst.data.

commit 7d108e9f1e603977b21773c664ba506b65ed0682
Author: Ryan Blue <blue@...>
Date:   2018-05-04T21:34:25Z

    Move MapData to org.apache.spark.sql.catalyst.data.

commit 45a5c8d5e90df5e659bea107c73fc6e1a015ad7d
Author: Ryan Blue <blue@...>
Date:   2018-05-04T21:36:05Z

    Move SpecializedGetters to org.apache.spark.sql.catalyst.data.

commit 465852a7f32c568b89d4ee988262b53a26710f97
Author: Ryan Blue <blue@...>
Date:   2018-05-04T22:38:18Z

    Clean up the public API for InternalRow, ArrayData, and MapData.
    
    This adds SpecializedSetters, with helper functions that are common
    between InternalRow and ArrayData. This class also defines a default get
    method that was previously implemented by UnsafeArrayData and UnsafeRow.
    
    The package.scala for data has a high-level overview of the internal
    data API.

----