GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/20114

[SPARK-22530][PYTHON][SQL] Adding Arrow support for ArrayType

## What changes were proposed in this pull request?

This change adds `ArrayType` support for working with Arrow in pyspark when creating a DataFrame, calling `toPandas()`, and using vectorized `pandas_udf`.

## How was this patch tested?

Added new Python unit tests using Array data.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark arrow-ArrayType-support-SPARK-22530

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20114.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20114

----
commit 50fa54c5b04455729b019c660ab8e86c903bda44
Author: Bryan Cutler <cutlerb@...>
Date:   2017-11-15T23:44:23Z

    wip, toPandas works with pyarrow 0.7.1

commit a149352d0c60882bb6692cd43d2fb60c8dddb07b
Author: Bryan Cutler <cutlerb@...>
Date:   2017-12-01T20:02:16Z

    createDataFrame test now working

commit 36faab4d7a23421968e1885dc6f2f47ac20c0ce0
Author: Bryan Cutler <cutlerb@...>
Date:   2017-12-23T08:21:34Z

    using is_list to check type

commit b0c79f108acf3ca91dd931bb9be45e4bbcf840a6
Author: Bryan Cutler <cutlerb@...>
Date:   2017-12-24T07:06:06Z

    Using a workaround for ListVector validity buffer, ArrowTests passing

commit f1bc9a5d8ba09cf6d702269b2418697184ef5690
Author: Bryan Cutler <cutlerb@...>
Date:   2017-12-29T05:54:44Z

    ArrayType working in vectorized udfs

commit d2c5c2b4ea803ac8d1f08a5f79af1076f9e5bd2b
Author: Bryan Cutler <cutlerb@...>
Date:   2017-12-29T06:04:19Z

    fix import order

----
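
For reviewers who want to try the three code paths named in the description (`createDataFrame`, `toPandas()`, and a vectorized `pandas_udf`), here is a minimal sketch. It is not taken from the patch or its tests. The sample rows, the schema string, and the function names `arrow_array_roundtrip` and `reverse_values` are made up for illustration. It assumes a local PySpark install with pandas and pyarrow, and the config key shown is the Spark 2.3-era name, which may differ in other releases.

```python
# Hypothetical sample data: an id column plus an array-of-longs column.
SAMPLE_ROWS = [(1, [1, 2, 3]), (2, [4, 5])]
SCHEMA = "id long, values array<long>"


def arrow_array_roundtrip():
    """Exercise the three ArrayType code paths named in the PR description."""
    # Imported lazily so this module still loads where pyspark is absent.
    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    # Assumption: Spark 2.3-era config key; later releases renamed it.
    spark.conf.set("spark.sql.execution.arrow.enabled", "true")

    # 1. createDataFrame from a pandas DataFrame whose cells are lists.
    pdf = pd.DataFrame(SAMPLE_ROWS, columns=["id", "values"])
    df = spark.createDataFrame(pdf, schema=SCHEMA)

    # 2. A vectorized (scalar) pandas_udf that takes and returns an array column.
    @pandas_udf("array<long>")
    def reverse_values(values: pd.Series) -> pd.Series:
        # Each element of the Series is one row's array; reverse it.
        return values.apply(lambda v: v[::-1])

    # 3. toPandas() on a DataFrame that carries array columns.
    return df.withColumn("reversed", reverse_values("values")).toPandas()
```

Calling `arrow_array_roundtrip()` should return a pandas DataFrame with `id`, `values`, and `reversed` columns. Whether the array cells come back as Python lists or NumPy arrays depends on the Spark and pyarrow versions in use.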