viirya commented on pull request #34396: URL: https://github.com/apache/spark/pull/34396#issuecomment-953470870
I think this only makes sense if there are operators that want to consume Arrow vectors directly through the `ColumnVector` interface (i.e. they use the Arrow API), so the producer operators must write out Arrow-based `ColumnVector`s. Currently, Iceberg and spark-rapids have some scan operators that directly output Arrow. To make them compatible with Spark, they just wrap the output into `ArrowColumnVector`, as @sunchao mentioned. If the requirement is to make operators that output Arrow vectors fit into Spark, `ArrowColumnVector` is enough for that purpose. If these operators all work with the `ColumnVector` interface, they should be fine with any kind of underlying data format (on-heap, off-heap, or anything else).

> And the quick answer of adding ArrowWritableColumnVector here instead of using simply load to arrow and use ArrowColumnVector to get data is because we want to use WritableColumnVector APIs in RowToColumnarExec and Parquet Reader to write to arrow format.

If the Parquet reader already writes in Arrow format, it seems easy to wrap it into `ArrowColumnVector`? Why is there a need to change it to use the `WritableColumnVector` API? I think the Arrow API is more widely used outside Spark than the internal, private `WritableColumnVector` API, so I suppose the Parquet reader should output Arrow vectors? Except for the case where you write a new Parquet reader from scratch and write into `WritableColumnVector`. But if that is true, since it can write into `ColumnVector`, why does Arrow or not matter for the reader?
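To illustrate the wrapping path described above, here is a minimal sketch (not code from this PR) of how an Arrow-producing operator can expose its data to Spark through `ArrowColumnVector`; the vector name, row count, and batch construction are just assumptions for the example:

```scala
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.IntVector
import org.apache.spark.sql.vectorized.{ArrowColumnVector, ColumnVector, ColumnarBatch}

// Build an Arrow vector with the Arrow API, as an Arrow-native scan would.
val allocator = new RootAllocator(Long.MaxValue)
val idVector = new IntVector("id", allocator)
idVector.allocateNew(3)
(0 until 3).foreach(i => idVector.set(i, i * 10))
idVector.setValueCount(3)

// Wrap it so downstream Spark operators can consume it via ColumnVector.
val columns: Array[ColumnVector] = Array(new ArrowColumnVector(idVector))
val batch = new ColumnarBatch(columns, 3)

// Spark-side consumers only see the ColumnVector interface.
assert(batch.column(0).getInt(1) == 10)

batch.close()
allocator.close()
```

The consumer never touches the Arrow API here: whether the batch is backed by Arrow, on-heap, or off-heap storage is invisible behind `ColumnVector`, which is the point being made above.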
