[GitHub] [spark] xuechendi opened a new pull request #34396: [SPARK-37124][SQL] Support RowToColumnarExec with Arrow format

GitBox Sun, 21 Nov 2021 19:46:44 -0800


xuechendi opened a new pull request #34396:
URL: https://github.com/apache/spark/pull/34396



   ### What changes were proposed in this pull request?
   This JIRA aims to support the Arrow format in RowToColumnarExec.
   
   ### Why are the changes needed?
   The current ArrowColumnVector is not fully equivalent to 
OnHeap/OffHeapColumnVector in Spark, so RowToColumnarExec does not yet support 
writing to the Arrow format.
   
   The Arrow API is now more stable, and pandas UDFs perform much better than 
plain Python UDFs, so Arrow output from RowToColumnarExec is worth supporting.
   
   ### What has been done in this pull request?
   I am proposing to support RowToColumnarExec with Arrow.
   
   This PR adds a load API to ArrowColumnVector that loads an 
ArrowRecordBatch into an ArrowColumnVector. The API is then called inside 
RowToColumnarExec's doExecute.
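
The general row-to-columnar idea behind this change can be sketched as below. This is a minimal illustration under stated assumptions, not the code in this PR: it uses only the Java standard library, and the class `RowToColumnarSketch`, the method `toColumnBatches`, and the `batchSize` parameter are hypothetical names that do not exist in Spark or Arrow. Plain `Object[][]` arrays stand in for the Arrow-backed column vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: none of these names are Spark or Arrow APIs.
class RowToColumnarSketch {

    // Transposes row-oriented data into column-oriented batches,
    // each holding at most batchSize rows.
    static List<Object[][]> toColumnBatches(List<Object[]> rows, int numCols, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Object[][]> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            int n = end - start;
            // columns[c][r] holds the value of column c for the r-th row of this batch.
            Object[][] columns = new Object[numCols][n];
            for (int r = 0; r < n; r++) {
                Object[] row = rows.get(start + r);
                for (int c = 0; c < numCols; c++) {
                    columns[c][r] = row[c];
                }
            }
            batches.add(columns);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[] {1, "a"},
            new Object[] {2, "b"},
            new Object[] {3, "c"});
        // Three rows with a batch size of two yield two column batches.
        for (Object[][] batch : toColumnBatches(rows, 2, 2)) {
            System.out.println(Arrays.deepToString(batch));
        }
    }
}
```

In the sketch each batch owns one array per column; in the actual change that role would be played by ArrowColumnVector instances populated through the new load API.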
   
   ### How was this patch tested?
   Unit tests were added for the new API and for RowToColumnarExec with the 
Arrow format.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   Signed-off-by: Chendi Xue <[email protected]>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


