rdblue commented on issue #24546: [SPARK-27650][SQL] separate the row iterator functionality from ColumnarBatch
URL: https://github.com/apache/spark/pull/24546#issuecomment-491102528

@cloud-fan, I don't think that separating the iterator functionality from `ColumnarBatch` is the right approach. This iterator is very useful for implementations that actually use the columnar API in practice. For example, sources need to build tests that validate batches, and those tests need a way to read through a `ColumnarBatch`. Using `InternalRow` to access and validate each row makes sense, and it is better if implementations can use the same code that Spark would use to produce the rows.

The iterator itself doesn't need to be removed, because it uses only public (`Iterator`) or effectively public (`InternalRow`) classes. I think it would be better either to use a different `InternalRow` implementation (one that is read-only, to avoid depending on `WritableColumnVector`), or to move `MutableColumnarRow` but mark it private and continue to use it as the concrete implementation of `InternalRow`. I don't see a good reason to remove useful functionality from `ColumnarBatch` just to keep an implementation class in a different module.
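To make the read-only alternative concrete, here is a minimal, language-neutral sketch in Python of a batch whose row iterator hands out a read-only row view, plus the kind of test helper a source might use to validate a batch through it. This is not Spark's API: `ColumnVector`, `ColumnarRowView`, `ColumnarBatch`, `row_iterator` and `rows_as_tuples` are hypothetical stand-ins for the Java classes discussed above, assumed only for illustration.

```python
from typing import Any, Iterator, List, Sequence


class ColumnVector:
    """Hypothetical read-only column (stand-in, not Spark's ColumnVector)."""

    def __init__(self, values: Sequence[Any]) -> None:
        self._values = list(values)

    def get(self, row_id: int) -> Any:
        return self._values[row_id]


class ColumnarRowView:
    """Hypothetical read-only row view over a set of columns.

    It stores no values itself: get(ordinal) reads the value at the current
    row_id from the underlying column. It has no methods for writing values,
    so it does not need a writable vector type.
    """

    def __init__(self, columns: List[ColumnVector]) -> None:
        self._columns = columns
        self.row_id = 0

    def num_fields(self) -> int:
        return len(self._columns)

    def get(self, ordinal: int) -> Any:
        return self._columns[ordinal].get(self.row_id)


class ColumnarBatch:
    """Hypothetical batch: a list of columns plus a row count."""

    def __init__(self, columns: List[ColumnVector], num_rows: int) -> None:
        self.columns = columns
        self.num_rows = num_rows

    def row_iterator(self) -> Iterator[ColumnarRowView]:
        # One view object is reused for every row; only its row_id advances.
        row = ColumnarRowView(self.columns)
        for row_id in range(self.num_rows):
            row.row_id = row_id
            yield row


def rows_as_tuples(batch: ColumnarBatch) -> List[tuple]:
    """Test helper: copy each row's values out before the view advances."""
    return [
        tuple(row.get(i) for i in range(row.num_fields()))
        for row in batch.row_iterator()
    ]


if __name__ == "__main__":
    batch = ColumnarBatch(
        [ColumnVector([1, 2, 3]), ColumnVector(["a", "b", "c"])], num_rows=3
    )
    assert rows_as_tuples(batch) == [(1, "a"), (2, "b"), (3, "c")]
```

Because the sketch reuses a single view object, a test that collects the row objects themselves (for example with `list(batch.row_iterator())`) would see every element pointing at the last row. That is why the helper copies values out as it goes.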
