Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20397#discussion_r164347780
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReaderFactory.java
---
@@ -22,19 +22,19 @@
import org.apache.spark.annotation.InterfaceStability;
/**
- * A read task returned by {@link DataSourceV2Reader#createReadTasks()} and is responsible for
- * creating the actual data reader. The relationship between {@link ReadTask} and {@link DataReader}
+ * A reader factory returned by {@link DataSourceV2Reader#createDataReaderFactories()} and is responsible for
+ * creating the actual data reader. The relationship between {@link DataReaderFactory} and {@link DataReader}
* is similar to the relationship between {@link Iterable} and {@link java.util.Iterator}.
*
- * Note that, the read task will be serialized and sent to executors, then the data reader will be
- * created on executors and do the actual reading. So {@link ReadTask} must be serializable and
+ * Note that, the reader factory will be serialized and sent to executors, then the data reader will be
+ * created on executors and do the actual reading. So {@link DataReaderFactory} must be serializable and
* {@link DataReader} doesn't need to be.
*/
@InterfaceStability.Evolving
-public interface ReadTask<T> extends Serializable {
+public interface DataReaderFactory<T> extends Serializable {
/**
- * The preferred locations where this read task can run faster, but Spark does not guarantee that
+ * The preferred locations where this data reader factory can run faster, but Spark does not guarantee that
--- End diff --
`... where the data reader returned by this reader factory can run faster ...`
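
To illustrate why the wording matters: the factory is only serialized and shipped, while the reader it returns is what actually runs near the data. Below is a minimal, self-contained sketch of that relationship. The `DataReader` and `DataReaderFactory` interfaces here are simplified stand-ins declared locally, not the real Spark interfaces, and `RangeReaderFactory`, `roundTrip`, `readAll`, and the host name `host-1` are hypothetical names made up for the example. Java serialization stands in for whatever mechanism Spark uses to send the factory to executors.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ReaderFactorySketch {

  // Simplified stand-in for DataReader: stateful and NOT Serializable.
  interface DataReader<T> {
    boolean next();
    T get();
  }

  // Simplified stand-in for DataReaderFactory: must be Serializable.
  interface DataReaderFactory<T> extends Serializable {
    default String[] preferredLocations() { return new String[0]; }
    DataReader<T> createDataReader();
  }

  // Hypothetical factory that describes the integer range [start, end).
  static class RangeReaderFactory implements DataReaderFactory<Integer> {
    private final int start;
    private final int end;
    private final String host;

    RangeReaderFactory(int start, int end, String host) {
      this.start = start;
      this.end = end;
      this.host = host;
    }

    // Hint about where the reader created by this factory would run faster.
    @Override
    public String[] preferredLocations() { return new String[] {host}; }

    // The reader is created after deserialization, so it is never serialized.
    @Override
    public DataReader<Integer> createDataReader() {
      return new DataReader<Integer>() {
        private int current = start - 1;
        @Override public boolean next() { current++; return current < end; }
        @Override public Integer get() { return current; }
      };
    }
  }

  // Simulates sending a factory to an executor via Java serialization.
  @SuppressWarnings("unchecked")
  static <F extends Serializable> F roundTrip(F factory)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(factory);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (F) in.readObject();
    }
  }

  // Drains the reader produced by a factory into a list.
  static List<Integer> readAll(DataReaderFactory<Integer> factory) {
    List<Integer> rows = new ArrayList<>();
    DataReader<Integer> reader = factory.createDataReader();
    while (reader.next()) {
      rows.add(reader.get());
    }
    return rows;
  }

  public static void main(String[] args) throws Exception {
    RangeReaderFactory onDriver = new RangeReaderFactory(0, 3, "host-1");
    RangeReaderFactory onExecutor = roundTrip(onDriver);
    System.out.println(readAll(onExecutor));                 // prints [0, 1, 2]
    System.out.println(onExecutor.preferredLocations()[0]);  // prints host-1
  }
}
```

In this sketch the factory carries only the description of the work (the range and a location hint), which is why the location is better described as a property of the reader it returns.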
---