[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40677: [SPARK-43039][SQL] Support custom fields in the file source _metadata column.

via GitHub Mon, 10 Apr 2023 08:10:54 -0700


ryan-johnson-databricks commented on code in PR #40677:
URL: https://github.com/apache/spark/pull/40677#discussion_r1161805267



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileIndex.scala:
##########
@@ -23,11 +23,30 @@ import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.types.StructType
 
+/**
+ * A file status augmented with optional metadata. File formats can use the extra metadata to expose
+ * custom file-constant metadata columns, but in general tasks and readers can use the per-file
+ * metadata however they see fit.
+ */
+case class FileStatusWithMetadata(fileStatus: FileStatus, metadata: Map[String, Any] = Map.empty) {

Review Comment:
   Update: `ConstantColumnVector` looks like an incompletely implemented API. It "supports" array/map/struct types on the surface (e.g. `ConstantColumnVectorSuite` has superficial tests for them), but `ColumnVectorUtils.populate` doesn't actually handle them, and `ColumnVectorUtilsSuite.scala` has negative tests verifying that they cannot be used in practice.
   
   As far as I can tell, the class really only supports data types that can be used as partition columns.
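
   To make the consequence concrete, here is a rough sketch (not Spark code) of what that restriction could mean for the `metadata: Map[String, Any]` field in the diff above. `FileStatus` below is a hypothetical stand-in for the Hadoop class so the snippet runs on its own, and `MetadataCheck`, `isAtomic` and `unsupportedFields` are made-up names. The list of "atomic" types is an assumption for illustration only; the real set is whatever Spark accepts for partition columns.

```scala
// Hypothetical stand-in for org.apache.hadoop.fs.FileStatus,
// so that this sketch compiles without Hadoop on the classpath.
final case class FileStatus(path: String, length: Long, modificationTime: Long)

// Same shape as the case class introduced in the diff above.
final case class FileStatusWithMetadata(
    fileStatus: FileStatus,
    metadata: Map[String, Any] = Map.empty)

object MetadataCheck {
  // Assumption: only simple atomic values count as "partition-column-like".
  // This list is illustrative and is not taken from Spark.
  def isAtomic(value: Any): Boolean = value match {
    case _: Boolean | _: Byte | _: Short | _: Int | _: Long => true
    case _: Float | _: Double | _: String => true
    case _ => false // arrays, maps, structs, nulls, everything else
  }

  // Names of the metadata fields whose values are not atomic.
  def unsupportedFields(file: FileStatusWithMetadata): Set[String] =
    file.metadata.collect { case (name, value) if !isAtomic(value) => name }.toSet
}
```

   With this sketch, a file carrying `Map("rowCount" -> 42L, "tags" -> Seq("a", "b"))` would report only `tags` as unsupported, which is the kind of value a constant vector limited to partition-column types could not expose.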



