cloud-fan commented on issue #25180: [SPARK-28423][SQL] Merge Scan and Batch/Stream
URL: https://github.com/apache/spark/pull/25180#issuecomment-515283197

I think the separation between `Scan` and `Batch` is still useless even if we move the operator pushdown to the optimizer.

No extra information is needed to convert a `Scan` to a `Batch`, which means that if I have a class that implements `Scan`, nothing prevents it from implementing `Batch` at the same time. As a result, almost all the existing DS v2 implementations either implement `Scan` and `Batch/Stream` together, or use an anonymous class to implement `Scan`. This makes me believe that we should remove this separation.

Conceptually, the physical scan is represented by `InputPartition` and `PartitionReaderFactory`, not by the interface that creates them. It makes more sense to use a single interface to represent a logical scan, which creates `InputPartition` and `PartitionReaderFactory`.
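To illustrate the point about implementing both interfaces in one class, here is a minimal sketch. It does not use the real Spark API: the nested interfaces are simplified stand-ins that only borrow the names `Scan`, `Batch`, `InputPartition` and `PartitionReaderFactory`, and `RangeScan`, `RangePartition` and `readAll` are hypothetical names made up for the example. It shows that `toBatch()` can return `this` because the conversion needs no extra information.

```java
import java.util.ArrayList;
import java.util.List;

public class MergedScanSketch {
    // Simplified stand-ins for the DS v2 interfaces (assumed shapes, not the real Spark signatures).
    interface InputPartition {}

    interface PartitionReaderFactory {
        // Hypothetical method: reads every value of one partition eagerly.
        List<Integer> readAll(InputPartition partition);
    }

    interface Batch {
        InputPartition[] planInputPartitions();
        PartitionReaderFactory createReaderFactory();
    }

    interface Scan {
        Batch toBatch();
    }

    // Hypothetical partition covering the half-open range [start, end).
    static final class RangePartition implements InputPartition {
        final int start;
        final int end;

        RangePartition(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // One class plays both roles: the logical scan and the batch that plans partitions.
    static final class RangeScan implements Scan, Batch {
        private final int start;
        private final int end;
        private final int numPartitions; // must be positive

        RangeScan(int start, int end, int numPartitions) {
            this.start = start;
            this.end = end;
            this.numPartitions = numPartitions;
        }

        @Override
        public Batch toBatch() {
            // No extra information is needed, so the scan simply returns itself.
            return this;
        }

        @Override
        public InputPartition[] planInputPartitions() {
            // Ceiling division so the partitions cover the whole range.
            int step = (end - start + numPartitions - 1) / numPartitions;
            List<InputPartition> parts = new ArrayList<>();
            for (int lo = start; lo < end; lo += step) {
                parts.add(new RangePartition(lo, Math.min(lo + step, end)));
            }
            return parts.toArray(new InputPartition[0]);
        }

        @Override
        public PartitionReaderFactory createReaderFactory() {
            return partition -> {
                RangePartition p = (RangePartition) partition;
                List<Integer> values = new ArrayList<>();
                for (int i = p.start; i < p.end; i++) {
                    values.add(i);
                }
                return values;
            };
        }
    }

    public static void main(String[] args) {
        Scan scan = new RangeScan(0, 10, 2);
        Batch batch = scan.toBatch();
        PartitionReaderFactory factory = batch.createReaderFactory();
        for (InputPartition partition : batch.planInputPartitions()) {
            System.out.println(factory.readAll(partition));
        }
    }
}
```

In this sketch the physical work is carried entirely by `RangePartition` and the reader factory; `Scan` and `Batch` only differ in which method the caller invokes first, which is the redundancy the comment describes.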
