[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40677: [SPARK-43039][SQL] Support custom fields in the file source _metadata column.

via GitHub Tue, 11 Apr 2023 05:17:43 -0700


ryan-johnson-databricks commented on code in PR #40677:
URL: https://github.com/apache/spark/pull/40677#discussion_r1162732009



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala:
##########
@@ -554,6 +555,31 @@ object FileSourceMetadataAttribute {
     metadata.getBoolean(FILE_SOURCE_METADATA_COL_ATTR_KEY)
   }
 
+  /**
+   * True if the given data type is supported in file source metadata attributes.
+   *
+   * The set of supported types is limited by [[ColumnVectorUtils.populate]], which the constant
+   * file metadata implementation relies on. In general, types that can be partition columns are
+   * supported (including most primitive types). Notably unsupported types include [[ObjectType]],
+   * [[UserDefinedType]], and the complex types ([[StructType]], [[MapType]], [[ArrayType]]).
+   */
+  def isSupportedType(dataType: DataType): Boolean = dataType.physicalDataType match {
+    case PhysicalNullType => true
+    case PhysicalBooleanType => true
+    case PhysicalByteType | PhysicalShortType | PhysicalIntegerType | PhysicalLongType => true
+    case PhysicalFloatType | PhysicalDoubleType => true

Review Comment:
   Hmm... it's currently true that `ColumnVectorUtils.populate` supports all physical primitive types, but somebody neglected to make the common trait of those types a sealed trait. I'll fix that and simplify the match here.
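
   For illustration only, here is a minimal sketch of why sealing the trait helps. The `Sketch*` names are hypothetical stand-ins invented for this example and are not Spark's actual definitions; the real fix may well look different. With a sealed parent, a single type pattern can cover every primitive, and the compiler can check that the match is exhaustive.

```scala
// Hypothetical stand-ins for the physical type hierarchy (assumption: NOT Spark's real classes).
sealed trait SketchPhysicalType

// Because this trait is sealed, all of its subtypes must be declared in this file,
// so matching on `_: SketchPrimitiveType` is known to cover every primitive.
sealed trait SketchPrimitiveType extends SketchPhysicalType
case object SketchNullType extends SketchPrimitiveType
case object SketchBooleanType extends SketchPrimitiveType
case object SketchIntegerType extends SketchPrimitiveType
case object SketchDoubleType extends SketchPrimitiveType

// Non-primitive types in the same sealed hierarchy.
case object SketchStringType extends SketchPhysicalType
final case class SketchStructType(fieldCount: Int) extends SketchPhysicalType

object SketchSupportedTypes {
  // One type pattern replaces the per-type cases (Null, Boolean, Integer, Double, ...).
  def isSupportedType(t: SketchPhysicalType): Boolean = t match {
    case _: SketchPrimitiveType => true
    case SketchStringType       => true
    case _: SketchStructType    => false
  }
}
```

   Without `sealed`, a new primitive subtype could be added elsewhere, and a per-type match would fall out of date without any compiler warning.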



