Micah Kornfield created SPARK-42774:
---------------------------------------

             Summary: Expose VectorTypes API for DataSourceV2 Batch Scans
                 Key: SPARK-42774
                 URL: https://issues.apache.org/jira/browse/SPARK-42774
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.2
            Reporter: Micah Kornfield


SparkPlan's vectorTypes attribute can be used to [specialize codegen|https://github.com/apache/spark/blob/5556cfc59aa97a3ad4ea0baacebe19859ec0bcb7/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L151]; however, [BatchScanExecBase|https://github.com/apache/spark/blob/6b6bb6fa20f40aeedea2fb87008e9cce76c54e28/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala] does not override it, so DSv2 sources do not get any benefit from concrete class dispatch.

This proposes adding an override to BatchScanExecBase which delegates to a new default method on [PartitionReaderFactory|https://github.com/apache/spark/blob/f1d42bb68d6d69d9a32f91a390270f9ec33c3207/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReaderFactory.java] to expose vectorTypes:

{{
default Optional<Iterable<String>> getVectorTypes() {
  return Optional.empty();
}
}}
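
To make the proposed delegation concrete, here is a minimal sketch of how it might fit together. It does not use Spark's real classes: ReaderFactory, RowOnlyFactory, OnHeapFactory and scanVectorTypes are hypothetical stand-ins for PartitionReaderFactory, two example sources, and the proposed override on the scan node. Only the default method signature comes from the issue; the OnHeapColumnVector class name is used purely as a string, and one entry per output column is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.Optional;

// Hypothetical stand-ins for illustration only; these are NOT Spark's real classes.
final class VectorTypesSketch {

    /** Stand-in for PartitionReaderFactory, reduced to the proposed default method. */
    interface ReaderFactory {
        // Default from the issue: empty means the source reports no concrete vector classes.
        default Optional<Iterable<String>> getVectorTypes() {
            return Optional.empty();
        }
    }

    /** A source that does not opt in and simply inherits the empty default. */
    static final class RowOnlyFactory implements ReaderFactory {}

    /** A source that reports one concrete column vector class name per output column (assumed). */
    static final class OnHeapFactory implements ReaderFactory {
        private final int numColumns;

        OnHeapFactory(int numColumns) {
            this.numColumns = numColumns;
        }

        @Override
        public Optional<Iterable<String>> getVectorTypes() {
            Iterable<String> types = Collections.nCopies(
                numColumns, "org.apache.spark.sql.execution.vectorized.OnHeapColumnVector");
            return Optional.of(types);
        }
    }

    /** Stand-in for the proposed scan-node override: it only delegates to the factory. */
    static Optional<Iterable<String>> scanVectorTypes(ReaderFactory factory) {
        return factory.getVectorTypes();
    }

    public static void main(String[] args) {
        // Existing sources keep working unchanged because the default is empty.
        System.out.println(scanVectorTypes(new RowOnlyFactory()).isPresent());
        // A source that opts in exposes its concrete vector class names.
        scanVectorTypes(new OnHeapFactory(2))
            .ifPresent(types -> types.forEach(System.out::println));
    }
}
```

Because the method is a default on the interface, existing factory implementations would compile unchanged and keep the current behavior; only sources that override it would opt in to the specialized code path.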