viirya commented on pull request #34396:
URL: https://github.com/apache/spark/pull/34396#issuecomment-953470870


   I think this only makes sense if there are operators that want to consume Arrow vectors directly through the ColumnVector interface (i.e., they use the Arrow API), so the producer operators must write out Arrow-based ColumnVectors.
   
   Currently, Iceberg and spark-rapids have some scan operators that directly output Arrow. To make them compatible with Spark, they just wrap the Arrow vectors into `ArrowColumnVector`, as @sunchao mentioned.
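   
   A minimal sketch of that wrapping, assuming a scan has already produced an Arrow `IntVector` named `arrowVec` (the variable name is hypothetical):
   
   ```scala
   import org.apache.arrow.vector.IntVector
   import org.apache.spark.sql.vectorized.{ArrowColumnVector, ColumnVector, ColumnarBatch}
   
   // `arrowVec: IntVector` is assumed to come from an Arrow-producing scan.
   // ArrowColumnVector adapts an Arrow ValueVector to Spark's read-only
   // ColumnVector interface without copying the data.
   val sparkVec: ColumnVector = new ArrowColumnVector(arrowVec)
   
   // Downstream operators consume it through a ColumnarBatch.
   val batch = new ColumnarBatch(Array(sparkVec))
   batch.setNumRows(arrowVec.getValueCount)
   ```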
   
   If the requirement is to make some operators that output Arrow vectors fit into Spark, `ArrowColumnVector` is enough for that purpose.
   
   If these operators all work with the ColumnVector interface, they should be fine with any kind of underlying data format (on-heap, off-heap, or others if any).
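   
   To show that format-agnosticism from the consumer side, here is a hypothetical helper that only touches the public `ColumnVector` accessors:
   
   ```scala
   import org.apache.spark.sql.vectorized.ColumnVector
   
   // Behaves identically whether `col` is backed by OnHeapColumnVector,
   // OffHeapColumnVector, or ArrowColumnVector: the consumer sees only
   // the ColumnVector accessors, never the underlying storage.
   def sumInts(col: ColumnVector, numRows: Int): Long = {
     var sum = 0L
     var i = 0
     while (i < numRows) {
       if (!col.isNullAt(i)) sum += col.getInt(i)
       i += 1
     }
     sum
   }
   ```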
   
   
   > And the quick answer of adding ArrowWritableColumnVector here, instead of simply loading to Arrow and using ArrowColumnVector to get the data, is because we want to use the WritableColumnVector APIs in RowToColumnarExec and the Parquet reader to write to Arrow format.
   
   If the Parquet reader already writes Arrow format, it seems easy to wrap its output into `ArrowColumnVector`? Why is there a need to change it to use the `WritableColumnVector` API? I think the Arrow API is more widely used outside Spark than the internal and private `WritableColumnVector` API, so I suppose the Parquet reader should output Arrow vectors? The exception is the case where you write a new Parquet reader from scratch that writes into `WritableColumnVector`. But if that is true, since such a reader can write into `ColumnVector`, why does it matter to the reader whether the format is Arrow or not?
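   
   To make the contrast concrete, a rough sketch of the two write paths under discussion; both snippets are simplified assumptions, not code from this PR:
   
   ```scala
   import org.apache.arrow.memory.RootAllocator
   import org.apache.arrow.vector.IntVector
   import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector
   import org.apache.spark.sql.types.IntegerType
   
   // Path 1: write through the public Arrow API, then wrap the result
   // in ArrowColumnVector for Spark (see the earlier sketch).
   val allocator = new RootAllocator(Long.MaxValue)
   val arrowVec = new IntVector("c0", allocator)
   arrowVec.allocateNew(3)
   (0 until 3).foreach(i => arrowVec.setSafe(i, i))
   arrowVec.setValueCount(3)
   
   // Path 2: write through Spark's internal WritableColumnVector API,
   // which is private to Spark and tied to its own memory layouts.
   val sparkVec = new OnHeapColumnVector(3, IntegerType)
   (0 until 3).foreach(i => sparkVec.putInt(i, i))
   ```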
   
   

