Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20397#discussion_r164335994
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsScanColumnarBatch.java ---
@@ -30,21 +30,21 @@
@InterfaceStability.Evolving
public interface SupportsScanColumnarBatch extends DataSourceV2Reader {
@Override
- default List<ReadTask<Row>> createReadTasks() {
+ default List<DataReaderFactory<Row>> createDataReaderFactories() {
--- End diff ---
`DataReaderFactory` is responsible for serialization and for creating the
actual data readers, so data reader creation must happen at the executor side.
Before that, at the driver side, we need to determine how many RDD partitions
we want, which is what this method does.
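
To make the driver/executor split concrete, here is a minimal sketch. The
`RangeReaderFactory` name and the ranges are made up for illustration; the
`DataReaderFactory`/`DataReader` interfaces are the v2 reader API as named in
this PR:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.sources.v2.reader.DataReader;
import org.apache.spark.sql.sources.v2.reader.DataReaderFactory;

// Hypothetical factory that emits the integers [start, end) as one-column rows.
class RangeReaderFactory implements DataReaderFactory<Row> {
  private final int start;
  private final int end;

  RangeReaderFactory(int start, int end) {
    this.start = start;
    this.end = end;
  }

  // Driver side: one factory per RDD partition. The length of this list is
  // what fixes the parallelism, before any data reader exists.
  static List<DataReaderFactory<Row>> createDataReaderFactories() {
    return Arrays.<DataReaderFactory<Row>>asList(
        new RangeReaderFactory(0, 5),    // partition 0
        new RangeReaderFactory(5, 10));  // partition 1
  }

  // Executor side: called after this (Serializable) factory has been shipped
  // over the wire, so the reader itself never needs to be serialized.
  @Override
  public DataReader<Row> createDataReader() {
    return new DataReader<Row>() {
      private int current = start - 1;

      @Override
      public boolean next() {
        current++;
        return current < end;
      }

      @Override
      public Row get() {
        return RowFactory.create(current);
      }

      @Override
      public void close() throws IOException { }
    };
  }
}
```

Note that `DataReaderFactory` extends `Serializable` while `DataReader` does
not, so Spark only ever ships the small factory objects to executors; the
readers themselves are created where the data is actually read.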