rdblue commented on issue #25180: [SPARK-28423][SQL] Merge Scan and Batch/Stream
URL: https://github.com/apache/spark/pull/25180#issuecomment-514725617

> That said, Table is the actual logical data scan.

No, table is a source that can be scanned. A logical scan has filters and a projection.

> The operator pushdown happens at planning time, so Scan and Batch/Stream are always created together in the planner rules.

The scan is created in `DataSourceV2Strategy`, but batch is a lazy field in `BatchScanExec`. There's no need for the planner strategy to know about the batch and stream objects at all.

This separation would be useful if we decide to move push-down into a batch in the optimizer. We've been discussing options for doing push-down earlier and being able to use stats in the optimizer. If we did that, then the separation between scan and batch/stream would support that. We would introduce a logical node that has a scan that is produced in the optimizer.
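
To make the scan/batch separation concrete, here is a minimal sketch in Python rather than Spark's Scala. Every class and function in it is a simplified, hypothetical stand-in that only borrows names from the comment (`Scan`, `Batch`, `BatchScanExec`); it is not Spark's actual code. It assumes the planner builds only the scan, and that the exec node derives the batch lazily on first access, with `functools.cached_property` playing the role of a lazy field.

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass
class Batch:
    """Hypothetical physical batch: knows how to split the read into partitions."""
    columns: tuple
    filters: tuple

    def plan_input_partitions(self):
        # Illustrative only: a real source would derive splits from its data.
        return [f"partition-{i}" for i in range(2)]


@dataclass
class Scan:
    """Hypothetical logical scan: a source plus pushed filters and a projection."""
    table_name: str
    columns: tuple
    filters: tuple
    batches_created: int = 0  # counter added here only to observe laziness

    def to_batch(self):
        self.batches_created += 1
        return Batch(self.columns, self.filters)


class BatchScanExec:
    """Stand-in for the physical node: holds the scan, derives the batch lazily."""

    def __init__(self, scan):
        self.scan = scan

    @cached_property
    def batch(self):
        # Computed on first access and cached afterwards, like a lazy field.
        return self.scan.to_batch()


def plan(table_name, columns, filters):
    """Stand-in for the planner strategy: it only ever touches the scan."""
    scan = Scan(table_name, tuple(columns), tuple(filters))
    return BatchScanExec(scan)
```

In this sketch, calling `plan("events", ["id", "ts"], ["ts > 0"])` leaves `batches_created` at 0, and the batch is built exactly once when `node.batch` is first read. Because the planner never sees a batch or stream object, the scan could in principle be produced earlier (for example by an optimizer rule) without changing the exec node.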
