[GitHub] [spark] xuechendi opened a new pull request #34396: [SPARK-37124]Add ArrowWritableColumnVector

GitBox Tue, 26 Oct 2021 18:00:28 -0700


xuechendi opened a new pull request #34396:
URL: https://github.com/apache/spark/pull/34396



   ### What changes were proposed in this pull request?
   This PR aims to add an Arrow-format vector as an alternative ColumnVector 
implementation.
   
   ### Why are the changes needed?
   The current ArrowColumnVector is not fully equivalent to 
OnHeap/OffHeapColumnVector in Spark: it does not extend WritableColumnVector, 
so it has no put APIs. In addition, the Arrow API is now more stable, and 
pandas UDFs perform much better than Python UDFs.
   
   ### What has been done in this pull request?
   This PR creates a new class, ArrowWritableColumnVector, in the same package 
as OnHeap/OffHeapColumnVector. It extends WritableColumnVector to support all 
put APIs.
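
   To illustrate the idea of a writable vector over an Arrow-style layout, here 
is a minimal sketch. It is not the code in this PR and uses neither Spark nor 
Arrow classes: `ToyArrowIntVector` is a hypothetical name, and only the method 
names (`putInt`, `putInts`, `putNull`, `isNullAt`, `getInt`) are borrowed from 
WritableColumnVector. It assumes a fixed-capacity int column stored the way 
Arrow stores fixed-width data: a validity bitmap (LSB-first, 1 = non-null) plus 
a little-endian data buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical toy class for illustration only; not part of Spark, Arrow, or this PR.
final class ToyArrowIntVector {
    private final int capacity;
    private final byte[] validity;  // one bit per row, LSB-first; 1 = non-null
    private final ByteBuffer data;  // four bytes per row, little-endian

    ToyArrowIntVector(int capacity) {
        this.capacity = capacity;
        this.validity = new byte[(capacity + 7) / 8];
        this.data = ByteBuffer.allocate(capacity * 4).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Write one value and mark the row as non-null.
    void putInt(int rowId, int value) {
        checkIndex(rowId);
        validity[rowId >> 3] |= (byte) (1 << (rowId & 7));
        data.putInt(rowId * 4, value);
    }

    // Write the same value into `count` consecutive rows starting at rowId.
    void putInts(int rowId, int count, int value) {
        for (int i = 0; i < count; i++) {
            putInt(rowId + i, value);
        }
    }

    // Clear the validity bit; the data slot is left untouched.
    void putNull(int rowId) {
        checkIndex(rowId);
        validity[rowId >> 3] &= (byte) ~(1 << (rowId & 7));
    }

    boolean isNullAt(int rowId) {
        checkIndex(rowId);
        return (validity[rowId >> 3] & (1 << (rowId & 7))) == 0;
    }

    int getInt(int rowId) {
        checkIndex(rowId);
        return data.getInt(rowId * 4);
    }

    private void checkIndex(int rowId) {
        if (rowId < 0 || rowId >= capacity) {
            throw new IndexOutOfBoundsException("rowId " + rowId + ", capacity " + capacity);
        }
    }

    public static void main(String[] args) {
        ToyArrowIntVector v = new ToyArrowIntVector(10);
        v.putInt(0, 42);
        v.putInts(1, 3, 7);
        v.putNull(2);
        for (int i = 0; i < 5; i++) {
            System.out.println(i + ": " + (v.isNullAt(i) ? "null" : v.getInt(i)));
        }
    }
}
```

   In this sketch, rows that were never written read as null because the 
validity bitmap starts zeroed. A real implementation would also need to handle 
resizing, variable-width types, and nested types, all of which are omitted here.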
   
   ### How was this patch tested?
   Unit tests cover all data formats, both writing to and reading from the 
column vector. I also added 3 unit tests for loading from ArrowRecordBatch and 
for allocateColumns.
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   Signed-off-by: Chendi Xue <[email protected]>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


