cloud-fan opened a new pull request #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream
URL: https://github.com/apache/spark/pull/25180

## What changes were proposed in this pull request?

By design, `Scan` represents a logical data scan and `Batch`/`Stream` represents a physical data scan. However, this doesn't match reality. The logical plan (`DataSourceV2Relation`) contains `Table`, and the physical plan (`BatchScanExec` and friends) contains `Batch`/`Stream`. Operator pushdown happens at planning time, so `Scan` and `Batch`/`Stream` are always created together in the planner rules. In other words, `Table` is the actual logical data scan.

Since there is not much that can be separated between `Scan` and `Batch`/`Stream`, almost all existing DS v2 implementations either implement `Scan` and `Batch`/`Stream` together or use an anonymous class to implement `Scan`.

In addition, the write-side API has no such separation either: it's just `WriterBuilder` -> `BatchWrite`/`StreamingWrite`.

This PR proposes to merge `Scan` and `Batch`/`Stream` to match the write-side API: `ScanBuilder` -> `BatchScan`/`MicroBatchScan`/`ContinuousScan`.

## How was this patch tested?

Existing tests.
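
To illustrate the proposed shape (`ScanBuilder` -> `BatchScan`), here is a minimal, self-contained sketch. It is not the actual Spark API: the interfaces below only reuse the names `ScanBuilder` and `BatchScan` from the description, and the method names (`pruneTo`, `buildForBatch`, `description`, `planInputPartitions`) as well as the `InMemoryScanBuilder` and `MergedScanDemo` classes are assumptions made up for this example. The point is only that pushdown is applied on the builder and a single object then both describes the scan and plans its partitions, with no separate intermediate `Scan`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the merged read-side type: one object that both
// describes the scan and plans its partitions (not Spark's real interface).
interface BatchScan {
    String description();
    List<String> planInputPartitions();
}

// Hypothetical builder: pushdown is applied here, then the scan is built once.
interface ScanBuilder {
    ScanBuilder pruneTo(int maxPartitions);
    BatchScan buildForBatch();
}

// Toy source whose "partitions" are just strings held in memory.
final class InMemoryScanBuilder implements ScanBuilder {
    private final List<String> partitions;
    private int maxPartitions;

    InMemoryScanBuilder(List<String> partitions) {
        this.partitions = new ArrayList<>(partitions);
        this.maxPartitions = partitions.size();
    }

    @Override
    public ScanBuilder pruneTo(int maxPartitions) {
        // Clamp the requested bound to the valid range [0, partitions.size()].
        this.maxPartitions = Math.max(0, Math.min(maxPartitions, partitions.size()));
        return this;
    }

    @Override
    public BatchScan buildForBatch() {
        // Copy the pruned prefix so later builder calls cannot affect the built scan.
        final List<String> planned = new ArrayList<>(partitions.subList(0, maxPartitions));
        return new BatchScan() {
            @Override
            public String description() {
                return "in-memory scan, partitions=" + planned.size();
            }

            @Override
            public List<String> planInputPartitions() {
                return planned;
            }
        };
    }
}

class MergedScanDemo {
    public static void main(String[] args) {
        BatchScan scan = new InMemoryScanBuilder(List.of("p0", "p1", "p2"))
                .pruneTo(2)
                .buildForBatch();
        System.out.println(scan.description());
        System.out.println(scan.planInputPartitions());
    }
}
```

In this sketch the builder is the only mutable stage, which mirrors the write-side pattern mentioned above, where a builder directly produces the batch or streaming write object.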
