[GitHub] [spark] cloud-fan commented on issue #25180: [SPARK-28423][SQL] Merge Scan and Batch/Stream

GitBox Thu, 25 Jul 2019 19:22:47 -0700

cloud-fan commented on issue #25180: [SPARK-28423][SQL] Merge Scan and 
Batch/Stream
URL: https://github.com/apache/spark/pull/25180#issuecomment-515283197
 
 
   I think the separation between `Scan` and `Batch` is still unnecessary even 
if we move operator pushdown to the optimizer. No extra information is needed 
to convert a `Scan` to a `Batch`, which means that any class that implements 
`Scan` can just as easily implement `Batch` at the same time.
   
   As a result, almost all the existing DS v2 implementations either implement 
`Scan` and `Batch/Stream` together, or use an anonymous class to implement 
`Scan`. This makes me believe that we should remove this separation.
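
   The two patterns mentioned above can be sketched roughly as follows. This is 
only an illustration: the interfaces are simplified stand-ins declared inside a 
hypothetical `SplitScanSketch` class, not the real Spark definitions, and 
`RangeScan`, `RangePartition` and `anonymousScan` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the DS v2 reader interfaces. These are NOT the real
// Spark definitions; they keep only the methods needed for the illustration.
final class SplitScanSketch {
    interface InputPartition {}

    interface PartitionReaderFactory {
        // Hypothetical simplification: return all rows of a partition at once.
        List<String> read(InputPartition partition);
    }

    interface Batch {
        InputPartition[] planInputPartitions();
        PartitionReaderFactory createReaderFactory();
    }

    interface Scan {
        String description();
        Batch toBatch();
    }

    // Hypothetical partition covering the row range [start, end).
    static final class RangePartition implements InputPartition {
        final int start;
        final int end;

        RangePartition(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Pattern 1: one class implements both interfaces, so toBatch() is trivial.
    static final class RangeScan implements Scan, Batch {
        private final int numRows;
        private final int numPartitions;

        RangeScan(int numRows, int numPartitions) {
            if (numRows < 0 || numPartitions <= 0) {
                throw new IllegalArgumentException("numRows >= 0 and numPartitions > 0 required");
            }
            this.numRows = numRows;
            this.numPartitions = numPartitions;
        }

        @Override public String description() { return "RangeScan(" + numRows + ")"; }

        // Nothing beyond the scan's own state is needed to become a Batch.
        @Override public Batch toBatch() { return this; }

        @Override public InputPartition[] planInputPartitions() {
            int step = (numRows + numPartitions - 1) / numPartitions; // ceiling division
            InputPartition[] parts = new InputPartition[numPartitions];
            for (int i = 0; i < numPartitions; i++) {
                int start = Math.min(i * step, numRows);
                parts[i] = new RangePartition(start, Math.min(start + step, numRows));
            }
            return parts;
        }

        @Override public PartitionReaderFactory createReaderFactory() {
            return partition -> {
                RangePartition p = (RangePartition) partition;
                List<String> rows = new ArrayList<>();
                for (int i = p.start; i < p.end; i++) {
                    rows.add("row-" + i);
                }
                return rows;
            };
        }
    }

    // Pattern 2: the Scan is an anonymous class that only forwards to a Batch.
    static Scan anonymousScan(Batch batch, String description) {
        return new Scan() {
            @Override public String description() { return description; }
            @Override public Batch toBatch() { return batch; }
        };
    }
}
```

   In both patterns the `Scan` layer adds no state of its own: `toBatch()` 
either returns `this` or hands back an object that already existed.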
   
   Conceptually, the physical scan is represented by `InputPartition` and 
`PartitionReaderFactory`, not the interface that creates them. It makes more 
sense to use a single interface to represent a logical scan, which creates 
`InputPartition` and `PartitionReaderFactory`.
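
   One possible shape for such a single interface is sketched below. It is an 
assumption made for illustration, not the API proposed in the pull request: the 
wrapper class `MergedScanSketch`, the `Slice` and `ListScan` classes, the 
`execute` helper and the simplified `read` method are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a merged interface; NOT the real Spark API.
final class MergedScanSketch {
    // Physical side: what the execution engine actually consumes.
    interface InputPartition {}

    interface PartitionReaderFactory {
        // Simplification: return all rows of a partition at once.
        List<Integer> read(InputPartition partition);
    }

    // Logical side: a single interface that creates the physical pieces directly.
    interface Scan {
        String description();
        InputPartition[] planInputPartitions();
        PartitionReaderFactory createReaderFactory();
    }

    // Hypothetical partition covering list indices [from, to).
    static final class Slice implements InputPartition {
        final int from;
        final int to;

        Slice(int from, int to) {
            this.from = from;
            this.to = to;
        }
    }

    // Hypothetical scan over an in-memory list, split into fixed-size slices.
    static final class ListScan implements Scan {
        private final List<Integer> data;
        private final int sliceSize;

        ListScan(List<Integer> data, int sliceSize) {
            if (sliceSize <= 0) {
                throw new IllegalArgumentException("sliceSize must be positive");
            }
            this.data = new ArrayList<>(data);
            this.sliceSize = sliceSize;
        }

        @Override public String description() { return "ListScan(" + data.size() + " rows)"; }

        @Override public InputPartition[] planInputPartitions() {
            List<InputPartition> parts = new ArrayList<>();
            for (int from = 0; from < data.size(); from += sliceSize) {
                parts.add(new Slice(from, Math.min(from + sliceSize, data.size())));
            }
            return parts.toArray(new InputPartition[0]);
        }

        @Override public PartitionReaderFactory createReaderFactory() {
            return partition -> {
                Slice s = (Slice) partition;
                return new ArrayList<>(data.subList(s.from, s.to));
            };
        }
    }

    // Stand-in for the execution side: it only uses partitions and the factory.
    static List<Integer> execute(Scan scan) {
        PartitionReaderFactory factory = scan.createReaderFactory();
        List<Integer> out = new ArrayList<>();
        for (InputPartition partition : scan.planInputPartitions()) {
            out.addAll(factory.read(partition));
        }
        return out;
    }
}
```

   Here `execute` never needs an intermediate `Batch` object; it works entirely 
with the `InputPartition` array and the `PartitionReaderFactory` that the 
logical scan hands out.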


