GitHub user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/15821

Hi @wesm and @icexelloss, that sounds good on our end. @yinxusen has been working on validating some basic conversion so far, but everything is still very preliminary, so it would be great to work with you guys. I'll set up a new integration branch and ping you all when it's ready.

> Related to this we'll also want to be able to precisely instrument and benchmark the Dataset <-> Arrow conversion -- @icexelloss suggested might be able to push down the conversion into the executors instead of doing all the work in the driver, but I'm not sure how feasible that is

We were thinking about that too, as it would be the better approach. For simplicity we decided to first do the conversion on the driver side, which should hopefully still show a performance increase, then follow up with work to optimize it further.
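
To make the driver-side versus executor-side distinction concrete, here is a rough sketch in plain Python. It uses neither Spark nor Arrow: partitions are modeled as lists of row tuples, and the schema `COLUMNS` and the names `to_columnar`, `convert_on_driver`, and `convert_per_partition` are hypothetical, chosen only for illustration and not taken from the PR.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, str]
Batch = Dict[str, list]

COLUMNS = ("id", "name")  # hypothetical schema, for illustration only


def to_columnar(rows: Sequence[Row]) -> Batch:
    """Pivot row tuples into one list per column.

    This is only a stand-in for building a real columnar batch.
    """
    batch: Batch = {name: [] for name in COLUMNS}
    for row in rows:
        for name, value in zip(COLUMNS, row):
            batch[name].append(value)
    return batch


def convert_on_driver(partitions: List[List[Row]]) -> Batch:
    # Driver-side approach: gather every row first, then convert once.
    rows = [row for part in partitions for row in part]
    return to_columnar(rows)


def convert_per_partition(partitions: List[List[Row]]) -> List[Batch]:
    # Pushed-down approach: each partition is converted independently,
    # so the receiving side only has to handle finished batches.
    return [to_columnar(part) for part in partitions]


partitions = [[(1, "a"), (2, "b")], [(3, "c")]]
driver_batch = convert_on_driver(partitions)
executor_batches = convert_per_partition(partitions)
```

In the first function all conversion work happens in one place after the rows are gathered; in the second, the per-partition calls are independent of each other, which is what would let them run in parallel on executors.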