[GitHub] [spark] sunchao commented on pull request #34396: [SPARK-37124][SQL] Add ArrowWritableColumnVector

GitBox Wed, 27 Oct 2021 19:08:16 -0700


sunchao commented on pull request #34396:
URL: https://github.com/apache/spark/pull/34396#issuecomment-953441743



   @xuechendi I'm not aware of a broader plan to enable Arrow in Spark. Like 
@viirya pointed out, instead of having a writable Arrow vector, currently other 
projects such as Iceberg and Spark RAPIDS choose to first construct the Arrow 
`ValueVector`s and then wrap them to Spark with `ArrowColumnVector`. This seems 
to work pretty well so far. I'm curious what new use cases you have in mind 
(and it'd be great if you have any example code to share).
   
   One thing we have discussed is if Spark eventually implements columnar 
execution, then operators may need to write out intermediate `ColumnVector`s, 
in which case we may need an Arrow writable vector so that inputs & outputs are 
sharing the same in-memory format. This is a big if though. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] sunchao commented on pull request #34396: [SPARK-37124][SQL] Add ArrowWritableColumnVector

Reply via email to