xuechendi opened a new pull request #34396: URL: https://github.com/apache/spark/pull/34396
### What changes were proposed in this pull request?
This Jira aims to support the Arrow format in RowToColumnarExec.

### Why are the changes needed?
Currently, ArrowColumnVector is not fully equivalent to OnHeap/OffHeapColumnVector in Spark, so RowToColumnarExec does not yet support writing to the Arrow format. Since the Arrow API is becoming more stable, and pandas UDFs perform much better than Python UDFs, it makes sense to add this support now.

### What has been done in this pull request?
I propose supporting Arrow in RowToColumnarExec. This PR adds a load API to ArrowColumnVector that loads an arrowRecordBatch into an ArrowColumnVector; the API is then called from RowToColumnarExec's doExecute.

### How was this patch tested?
Unit tests are added for the new API and for RowToColumnarExec with the Arrow format.

### Does this PR introduce _any_ user-facing change?
No.

Signed-off-by: Chendi Xue <[email protected]>

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
