xuechendi opened a new pull request #34396: URL: https://github.com/apache/spark/pull/34396
### What changes were proposed in this pull request?
This PR aims to add an Arrow-format alternative to the existing ColumnVector implementations.

### Why are the changes needed?
The current ArrowColumnVector is not fully equivalent to OnHeap/OffHeapColumnVector in Spark. In addition, the Arrow API is now more stable, and pandas UDFs perform much better than Python UDFs.

### What has been done in this pull request?
This PR creates a new class in the same package as OnHeap/OffHeapColumnVector. The class extends WritableColumnVector so that it supports all put APIs.

### How was this patch tested?
Unit tests cover all data formats, testing both writing to and reading from the column vector. Three additional unit tests cover loading from ArrowRecordBatch and allocateColumns.

### Does this PR introduce _any_ user-facing change?
No.

Signed-off-by: Chendi Xue <[email protected]>

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
