[GitHub] [spark] rdblue commented on issue #25180: [SPARK-28423][SQL] Merge Scan and Batch/Stream

GitBox Wed, 24 Jul 2019 10:29:14 -0700

rdblue commented on issue #25180: [SPARK-28423][SQL] Merge Scan and Batch/Stream
URL: https://github.com/apache/spark/pull/25180#issuecomment-514725617
 
 
   > That said, Table is the actual logical data scan.
   
   No, a table is a source that can be scanned. A logical scan has filters and a 
projection.
   
   > The operator pushdown happens at planning time, so Scan and Batch/Stream 
are always created together in the planner rules.
   
   The scan is created in `DataSourceV2Strategy`, but the batch is a lazy field 
in `BatchScanExec`. The planner strategy does not need to know about the batch 
and stream objects at all.
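
   A minimal sketch of that split, assuming simplified stand-ins rather than 
the real Spark classes: `Scan` and `Batch` below are reduced to one method each, 
`BatchScanNode` and `CountingScan` are hypothetical names, and the memoized 
getter only approximates a Scala `lazy val` (it is not thread-safe).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types are the actual Spark classes.
final class LazyBatchSketch {

    // Simplified stand-in for the physical batch: it only plans partitions here.
    interface Batch {
        List<String> planInputPartitions();
    }

    // Simplified stand-in for a configured scan that can produce a Batch.
    interface Scan {
        Batch toBatch();
    }

    // Stand-in for a physical scan node: it is constructed from a Scan only.
    static final class BatchScanNode {
        private final Scan scan;
        private Batch batch; // created on first use, like a Scala `lazy val`

        BatchScanNode(Scan scan) {
            this.scan = scan;
        }

        Batch batch() {
            if (batch == null) {
                batch = scan.toBatch(); // not thread-safe, unlike a real lazy val
            }
            return batch;
        }
    }

    // Scan that counts toBatch() calls so the laziness is observable.
    static final class CountingScan implements Scan {
        int toBatchCalls = 0;

        @Override
        public Batch toBatch() {
            toBatchCalls++;
            return () -> Arrays.asList("partition-0", "partition-1");
        }
    }

    public static void main(String[] args) {
        CountingScan scan = new CountingScan();

        // Planning step: the strategy hands over the Scan and never sees a Batch.
        BatchScanNode node = new BatchScanNode(scan);
        System.out.println("toBatch calls after planning: " + scan.toBatchCalls);

        // Execution step: the Batch is created once, on first access.
        node.batch().planInputPartitions();
        node.batch();
        System.out.println("toBatch calls after execution: " + scan.toBatchCalls);
    }
}
```

   In this sketch the code that builds the node depends only on `Scan`, which 
is the property the comment relies on.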
   
   This separation would be useful if we decide to move push-down into a rule 
batch in the optimizer. We've been discussing options for doing push-down 
earlier and being able to use stats in the optimizer. In that case, we would 
introduce a logical node that holds a scan produced in the optimizer, and 
keeping scan separate from batch/stream would support that.
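
   To illustrate the idea only (nothing here exists in Spark): the sketch 
below assumes hypothetical `Table`, `Scan`, and `ScanRelation` types and a 
hypothetical `pushDown` rule, with filters and columns modeled as plain 
strings. It shows a table as a bare source, a scan as that table plus filters 
and a projection, and a logical node that carries the scan without any batch or 
stream object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; all names are illustrative, not Spark APIs.
final class LogicalScanSketch {

    // A source that can be scanned; it carries no query-specific state.
    static final class Table {
        final String name;
        final List<String> columns;

        Table(String name, List<String> columns) {
            this.name = name;
            this.columns = columns;
        }
    }

    // A configured scan: the table plus pushed filters and a projection.
    static final class Scan {
        final Table table;
        final List<String> pushedFilters;
        final List<String> projection;

        Scan(Table table, List<String> pushedFilters, List<String> projection) {
            this.table = table;
            this.pushedFilters = Collections.unmodifiableList(new ArrayList<>(pushedFilters));
            this.projection = Collections.unmodifiableList(new ArrayList<>(projection));
        }
    }

    // Hypothetical logical plan node that holds a Scan built in the optimizer.
    static final class ScanRelation {
        final Scan scan;

        ScanRelation(Scan scan) {
            this.scan = scan;
        }
    }

    // Hypothetical optimizer rule: validate the projection, then build the Scan.
    // No batch or stream object is involved at this stage.
    static ScanRelation pushDown(Table table, List<String> filters, List<String> requiredColumns) {
        for (String column : requiredColumns) {
            if (!table.columns.contains(column)) {
                throw new IllegalArgumentException("unknown column: " + column);
            }
        }
        return new ScanRelation(new Scan(table, filters, requiredColumns));
    }

    public static void main(String[] args) {
        Table events = new Table("events", Arrays.asList("id", "ts", "payload"));
        ScanRelation relation =
            pushDown(events, Arrays.asList("ts > 100"), Arrays.asList("id", "ts"));
        System.out.println(relation.scan.pushedFilters + " " + relation.scan.projection);
    }
}
```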


