[GitHub] [spark] cloud-fan opened a new pull request #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream

GitBox Wed, 17 Jul 2019 08:27:22 -0700

cloud-fan opened a new pull request #25180: [SPARK-28423][SQL] merge Scan and 
Batch/Stream
URL: https://github.com/apache/spark/pull/25180
 
 
   ## What changes were proposed in this pull request?
   
   By design, `Scan` represents a logical data scan, and `Batch`/`Stream` 
represent a physical data scan.
   
   However, this doesn't match reality. The logical plan 
(`DataSourceV2Relation`) contains `Table`, and the physical plan 
(`BatchScanExec` and friends) contains `Batch`/`Stream`. Operator pushdown 
happens at planning time, so `Scan` and `Batch`/`Stream` are always created 
together in the planner rules. In effect, `Table` is the actual logical data 
scan.
   
   Since there is not much that can be separated between `Scan` and 
`Batch`/`Stream`, almost all the existing DS v2 implementations either 
implement `Scan` and `Batch`/`Stream` together, or use an anonymous class to 
implement `Scan`.
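
To illustrate the pattern described above, here is a minimal sketch of the current read path. The interfaces are simplified stand-ins written for this example, not the real Spark DS v2 interfaces, and `DemoScanBuilder` and `plan` are hypothetical names. It shows `Scan` implemented as an anonymous class whose main job is to hand out a `Batch`.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the DS v2 read interfaces (assumption: NOT the real Spark API).
public class CurrentReadPath {
    // Physical scan: knows how to split the data into partitions.
    interface Batch {
        List<String> planInputPartitions();
    }

    // Logical scan: in practice little more than a holder for the Batch.
    interface Scan {
        String description();
        Batch toBatch();
    }

    interface ScanBuilder {
        Scan build();
    }

    // Hypothetical source that implements Scan with an anonymous class.
    static class DemoScanBuilder implements ScanBuilder {
        @Override
        public Scan build() {
            return new Scan() {
                @Override
                public String description() {
                    return "demo scan";
                }

                @Override
                public Batch toBatch() {
                    return () -> Arrays.asList("partition-0", "partition-1");
                }
            };
        }
    }

    // Mimics a planner rule: Scan and Batch are created back to back.
    static List<String> plan(ScanBuilder builder) {
        Scan scan = builder.build();
        Batch batch = scan.toBatch();
        return batch.planInputPartitions();
    }

    public static void main(String[] args) {
        System.out.println(plan(new DemoScanBuilder()));
    }
}
```

In this sketch the `Scan` object never exists without its `Batch`, which is the redundancy the description points at.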
   
   In addition, the write-side API has no such separation either: it's just 
`WriteBuilder` -> `BatchWrite`/`StreamingWrite`.
   
   This PR proposes to merge `Scan` and `Batch`/`Stream`, to match the write 
side API: `ScanBuilder` -> `BatchScan`/`MicroBatchScan`/`ContinuousScan`.
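
A rough sketch of what the merged shape could look like for the batch case is shown here. Only the names `ScanBuilder` and `BatchScan` come from this description. The method names (`buildForBatch`, `planInputPartitions`, `description`), the `Demo*` classes, and the `plan` helper are assumptions made for illustration, and the interfaces are simplified stand-ins rather than the real Spark API.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the proposed merged read path (assumption: NOT the real Spark API).
public class MergedReadPath {
    // Merged interface: carries what Scan and Batch previously exposed separately.
    interface BatchScan {
        String description();
        List<String> planInputPartitions();
    }

    interface ScanBuilder {
        // Hypothetical method name, chosen to mirror the write side.
        BatchScan buildForBatch();
    }

    // Hypothetical source: a single class now covers the whole batch scan.
    static class DemoBatchScan implements BatchScan {
        @Override
        public String description() {
            return "demo scan";
        }

        @Override
        public List<String> planInputPartitions() {
            return Arrays.asList("partition-0", "partition-1");
        }
    }

    static class DemoScanBuilder implements ScanBuilder {
        @Override
        public BatchScan buildForBatch() {
            return new DemoBatchScan();
        }
    }

    // Mimics a planner rule: one call on the builder yields the physical scan.
    static List<String> plan(ScanBuilder builder) {
        return builder.buildForBatch().planInputPartitions();
    }

    public static void main(String[] args) {
        System.out.println(plan(new DemoScanBuilder()));
    }
}
```

Under these assumptions the planner goes from builder to physical scan in one step, with no intermediate object to implement.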
   
   ## How was this patch tested?
   
   Existing tests.


