ryan-johnson-databricks commented on code in PR #40677:
URL: https://github.com/apache/spark/pull/40677#discussion_r1162154097
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileIndex.scala:
##########
@@ -23,11 +23,30 @@ import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.StructType
+/**
+ * A file status augmented with optional metadata. File formats can use the extra metadata to expose
+ * custom file-constant metadata columns, but in general tasks and readers can use the per-file
+ * metadata however they see fit.
+ */
+case class FileStatusWithMetadata(fileStatus: FileStatus, metadata: Map[String, Any] = Map.empty) {
Review Comment:
Updated the doc comment here to explain that file-source metadata fields are
only one possible use of the extra file metadata (which conceptually lives at a
deeper layer than Catalyst and `Literal`).
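To make the "one possible use" point concrete, here is a minimal sketch of two consumers reading the same per-file map: one treats an entry as a file-constant metadata column value, the other treats a different entry as a private reader hint. `StubFileStatus` is a hypothetical stand-in for Hadoop's `FileStatus` so the sketch compiles without Spark or Hadoop, and the keys `row_index_base` and `skip_footer` are purely illustrative, not keys used by the PR.

```scala
// Hypothetical stand-in for org.apache.hadoop.fs.FileStatus, so the sketch is self-contained.
final case class StubFileStatus(path: String, length: Long)

// Same shape as FileStatusWithMetadata in the diff, but over the stub status.
final case class StubFileStatusWithMetadata(
    fileStatus: StubFileStatus,
    metadata: Map[String, Any] = Map.empty)

object MetadataUsage {
  // Use 1: a file format exposes an entry as a file-constant metadata column value.
  def metadataColumnValue(file: StubFileStatusWithMetadata, column: String): Option[Any] =
    file.metadata.get(column)

  // Use 2: a reader consults the same map for its own purposes; this hypothetical
  // "skip_footer" hint is never surfaced as a column. Missing or non-Boolean means false.
  def shouldSkipFooter(file: StubFileStatusWithMetadata): Boolean =
    file.metadata.get("skip_footer") match {
      case Some(b: Boolean) => b
      case _ => false
    }
}
```

Because the map is untyped at this layer, neither consumer needs to know about the other, which matches the intent that the metadata sits below Catalyst.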
Also updated the `isSupportedType` doc comment to explain why not all types are
supported.
Relevant implementation details:
1. Supporting all data types would take a lot of work, regardless of whether
we use `Literal` or `Any`.
2. We end up wrapping the provided value in a call to `Literal(_)` anyway,
because doing so simplifies null handling: a value that is null because it is
missing becomes equivalent to one that is null because it is explicitly null.
Given that, passing `Any` instead gets us the wrapping of primitive values
"for free".
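A minimal sketch of point 2, assuming a heavily simplified stand-in for Catalyst's `Literal(v: Any)` factory (the real one covers many more types and lives in `org.apache.spark.sql.catalyst.expressions`). `StubLiteral`, its type tags, and the helper `metadataLiteral` are hypothetical names chosen so the example runs without Spark. Looking the key up with a `null` default and then wrapping means a missing entry and an explicit `null` produce the same null literal, while primitives such as a `Long` are wrapped by the same call.

```scala
// Hypothetical type tags standing in for Catalyst DataTypes.
sealed trait StubType
case object StubNullType extends StubType
case object StubLongType extends StubType
case object StubStringType extends StubType
case object StubBooleanType extends StubType

// Hypothetical, simplified stand-in for Catalyst's Literal.
final case class StubLiteral(value: Any, dataType: StubType)

object StubLiteral {
  // Mirrors the idea of Literal(v: Any): infer the type from the runtime value.
  // Anything not listed is rejected, echoing why not every type is supported.
  def fromAny(v: Any): StubLiteral = v match {
    case null       => StubLiteral(null, StubNullType)
    case l: Long    => StubLiteral(l, StubLongType)
    case s: String  => StubLiteral(s, StubStringType)
    case b: Boolean => StubLiteral(b, StubBooleanType)
    case other      => throw new IllegalArgumentException(s"Unsupported metadata value: $other")
  }
}

object MetadataLiterals {
  // Missing key and explicit null both reach fromAny as null, so downstream code
  // sees a single null representation.
  def metadataLiteral(metadata: Map[String, Any], name: String): StubLiteral =
    StubLiteral.fromAny(metadata.getOrElse(name, null))
}
```

Note that in this sketch an `Int` value would be rejected rather than widened to `Long`; the real `Literal` has its own per-type rules.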