GitHub user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/15821
Hi @wesm and @icexelloss, that sounds good on our end. @yinxusen has been
working on validating some basic conversions so far, but everything is still
very preliminary, so it would be great to work with you guys. I'll set up a new
integration branch and ping you all when it's ready.
> Related to this, we'll also want to be able to precisely instrument and
> benchmark the Dataset <-> Arrow conversion. @icexelloss suggested we might be
> able to push down the conversion into the executors instead of doing all the
> work in the driver, but I'm not sure how feasible that is.
We were thinking about that too, since it would be the better approach. For
simplicity, we decided to do the conversion on the driver side first, which
should still show a performance improvement, and then follow up with work to
optimize it further.