rdblue commented on issue #24546: [SPARK-27650][SQL] separate the row iterator 
functionality from ColumnarBatch
URL: https://github.com/apache/spark/pull/24546#issuecomment-491102528
 
 
   @cloud-fan, I don't think that separating the iterator functionality from 
`ColumnarBatch` is the right approach.
   
   This iterator is really useful for implementations that use the columnar API 
in practice. For example, sources need to build tests that validate batches, and 
those tests need a way to read through a `ColumnarBatch`. Using 
`InternalRow` to access and validate each row makes sense, and it is better if 
implementations can use the same code that Spark would use to produce the rows. 
The iterator itself doesn't need to be removed because it uses only public 
(`Iterator`) or effectively public (`InternalRow`) classes.
   
   I think it would be better to either use a different `InternalRow` 
implementation (that is read-only to avoid depending on 
`WritableColumnVector`), or to move `MutableColumnarRow` but mark it private 
and continue to use it as the concrete implementation of `InternalRow`.
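
   To illustrate the first option, here is a minimal standalone sketch of a 
read-only row view that a batch can still iterate over without any writable 
vector class. `IntColumn`, `ReadOnlyRow`, `Batch`, and `ReadOnlyRowSketch` are 
hypothetical stand-ins made up for this example, not Spark classes, and the 
sketch assumes int-only columns to keep it short.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical read-only column: exposes getters only, no setters.
final class IntColumn {
    private final int[] values;

    IntColumn(int... values) {
        this.values = values.clone();
    }

    int getInt(int rowId) {
        return values[rowId];
    }
}

// Hypothetical read-only row: a cursor over the columns at one row index.
final class ReadOnlyRow {
    private final IntColumn[] columns;
    private int rowId;

    ReadOnlyRow(IntColumn[] columns) {
        this.columns = columns;
    }

    // Package-private so that only the batch's iterator moves the cursor.
    void pointTo(int rowId) {
        this.rowId = rowId;
    }

    int numFields() {
        return columns.length;
    }

    int getInt(int ordinal) {
        return columns[ordinal].getInt(rowId);
    }
}

// Hypothetical batch that keeps its row iterator but depends only on
// read-only types.
final class Batch {
    private final IntColumn[] columns;
    private final int numRows;

    Batch(int numRows, IntColumn... columns) {
        this.numRows = numRows;
        this.columns = columns.clone();
    }

    Iterator<ReadOnlyRow> rowIterator() {
        // One row object is reused for every position, so callers must not
        // hold on to a returned row across calls to next().
        final ReadOnlyRow row = new ReadOnlyRow(columns);
        return new Iterator<ReadOnlyRow>() {
            private int nextRowId = 0;

            @Override
            public boolean hasNext() {
                return nextRowId < numRows;
            }

            @Override
            public ReadOnlyRow next() {
                if (nextRowId >= numRows) {
                    throw new NoSuchElementException();
                }
                row.pointTo(nextRowId++);
                return row;
            }
        };
    }
}

// The kind of check a source's test might run: walk the rows and sum a column.
final class ReadOnlyRowSketch {
    static long sumColumn(Batch batch, int ordinal) {
        long sum = 0;
        Iterator<ReadOnlyRow> rows = batch.rowIterator();
        while (rows.hasNext()) {
            sum += rows.next().getInt(ordinal);
        }
        return sum;
    }
}
```

   A test in a source could then build a small batch and assert on values read 
through the iterator, without touching any writable vector type.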
   
   I don't see a good reason to remove useful functionality from 
`ColumnarBatch` just to keep an implementation class in a different module.
