ryan-johnson-databricks commented on code in PR #40677:
URL: https://github.com/apache/spark/pull/40677#discussion_r1161778302
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileIndex.scala:
##########
@@ -23,11 +23,30 @@ import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.StructType
+/**
+ * A file status augmented with optional metadata. File formats can use the extra metadata to expose
+ * custom file-constant metadata columns, but in general tasks and readers can use the per-file
+ * metadata however they see fit.
+ */
+case class FileStatusWithMetadata(fileStatus: FileStatus, metadata: Map[String, Any] = Map.empty) {
Review Comment:
Update: I now remember another reason I added `isSupportedDataType`.
`ConstantColumnVector` (needed by
[FileScanRDD...createMetadataColumnVector](https://github.com/apache/spark/pull/40677/files/6d35127b60f77475bf4b158b762468f30ec3dd9a#diff-b7b097f8cec6a7ae6640f9ecd6d4ac14ed304ad7c8db802a3ec2c0983535e157R200)
below) supports only a limited subset of types and relies on type-specific
getters and setters. Even if I wrote the (complex, recursive) code to handle
structs, maps, and arrays, we still wouldn't have complete coverage for all
types.
Do we know for certain that `ConstantColumnVector` supports every type that
can be encountered during vectorized execution? If not, we must keep the
`isSupportedDataType` method I introduced.
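To make the concern concrete, here is a minimal sketch of the kind of allow-list check being discussed. It is not the code from the PR: the object name `MetadataTypeCheck` and the exact set of accepted types are assumptions for illustration, and the real `isSupportedDataType` may accept a different set. The idea is only that atomic types with an obvious type-specific setter pass, while structs, maps, arrays, and anything unrecognized are rejected up front.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not the PR's actual implementation.
object MetadataTypeCheck {
  // Returns true only for types assumed to map onto a single
  // type-specific setter of a constant column vector.
  def isSupportedDataType(dt: DataType): Boolean = dt match {
    case BooleanType | ByteType | ShortType | IntegerType | LongType => true
    case FloatType | DoubleType => true
    case _: StringType | BinaryType => true
    case _: DecimalType => true
    case DateType | TimestampType => true
    // Structs, maps, arrays, and any type not listed above are rejected,
    // so an unsupported value fails early instead of at vector-fill time.
    case _ => false
  }
}
```

With a check like this, a metadata value whose type is `ArrayType(IntegerType)` or a `StructType` would be refused before any column vector is built, which is the safety net the question above is about.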
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]