xuechendi commented on pull request #34396: URL: https://github.com/apache/spark/pull/34396#issuecomment-953449947
@sunchao, yes, that is what we planned and did. We implemented the most commonly used operators and expressions (e.g. project/filter, join, sort, aggregate, window) on top of an Arrow-based ColumnVector. For Spark, we plan to submit our Arrow data source as a new `spark.read.format` to load Parquet/ORC directly. We also plan to submit our Arrow-based InMemoryStore (RDD cache), RowToColumnarExec, and PandasUDF serializer (which copies the ArrowColumnVector's underlying buffers directly to the Python context) once the Spark community is aligned on using ArrowWritableColumnVector as an alternative to the current WritableColumnVector.

The short answer to why we add ArrowWritableColumnVector here, rather than simply loading the data into Arrow and reading it through ArrowColumnVector, is that we want to use the WritableColumnVector APIs in RowToColumnarExec and the Parquet reader to write data in Arrow format.
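To make the read-only vs. writable distinction concrete, here is a minimal self-contained sketch. The interface and class names below (`ReadableIntVector`, `WritableIntVector`, `ArrowBackedIntVector`, `rowToColumnar`) are illustrative stand-ins invented for this example, not Spark's actual classes; the real split is between Spark's read-only `ColumnVector`/`ArrowColumnVector` and the append-style `WritableColumnVector` that RowToColumnarExec and the vectorized Parquet reader write into:

```java
import java.util.Arrays;

// Read-side API, analogous to Spark's read-only ColumnVector / ArrowColumnVector.
interface ReadableIntVector {
    int getInt(int rowId);
    int numRows();
}

// Write-side API, analogous to WritableColumnVector's append/put methods.
// Producers like a row-to-columnar step or a Parquet decoder target this.
interface WritableIntVector extends ReadableIntVector {
    void appendInt(int value);
}

// Hypothetical writable vector. The real ArrowWritableColumnVector writes into
// Arrow ValueVector buffers so the result can be shared zero-copy; a plain
// int[] stands in here only to keep the sketch self-contained.
class ArrowBackedIntVector implements WritableIntVector {
    private int[] buffer = new int[4];
    private int count = 0;

    @Override
    public void appendInt(int value) {
        if (count == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        buffer[count++] = value;
    }

    @Override
    public int getInt(int rowId) { return buffer[rowId]; }

    @Override
    public int numRows() { return count; }
}

public class Sketch {
    // A row-to-columnar conversion only needs the write API; if the vector is
    // Arrow-backed, the output is already in Arrow format with no extra copy.
    static void rowToColumnar(int[] rows, WritableIntVector out) {
        for (int v : rows) {
            out.appendInt(v);
        }
    }

    public static void main(String[] args) {
        ArrowBackedIntVector vec = new ArrowBackedIntVector();
        rowToColumnar(new int[] {1, 2, 3, 4, 5}, vec);
        System.out.println(vec.numRows() + " " + vec.getInt(4));
    }
}
```

With only the read-side `ArrowColumnVector`, a producer would have to build the data elsewhere and then wrap it; exposing the write API on an Arrow-backed vector lets existing writers produce Arrow buffers directly.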
