cashmand commented on code in PR #49234:
URL: https://github.com/apache/spark/pull/49234#discussion_r1894357728
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:
##########
@@ -772,6 +780,13 @@ private[sql] object ParquetSchemaConverter {
val EMPTY_MESSAGE: MessageType =
Types.buildMessage().named(ParquetSchemaConverter.SPARK_PARQUET_SCHEMA_NAME)
+ // Used to annotate as metadata on the struct that replaces a VariantType when shredding.
+ val VARIANT_WRITE_SHREDDING_KEY: String = "__VARIANT_WRITE_SHREDDING_KEY"
+
+ def isVariantShreddingStruct(s: StructType): Boolean = {
+ s.fields.length > 0 && s.fields.forall(_.metadata.contains(VARIANT_WRITE_SHREDDING_KEY))
Review Comment:
I put the key on either all of the fields or none of them in
`updateSchemaForVariantShredding`, so a partially annotated struct shouldn't
really happen. We could check here and fail if there's an inconsistency.
Alternatively, I could put the key on only the first field in the struct, if
you prefer. I don't think there's any real need to put it on all of them.
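
For illustration, here is a rough sketch of the two options. It uses simplified stand-in types (`Field`, `Struct`, with metadata modeled as a plain `Map`) rather than Spark's real `StructField`/`StructType`, and the object name, the second method name, and the exception type are hypothetical, not what the PR does:

```scala
// Simplified stand-ins for Spark's StructField / StructType (assumption:
// only the parts needed for this sketch are modeled).
final case class Field(name: String, metadata: Map[String, Any] = Map.empty)
final case class Struct(fields: Seq[Field])

object ShreddingCheckSketch {
  val VARIANT_WRITE_SHREDDING_KEY: String = "__VARIANT_WRITE_SHREDDING_KEY"

  // Option 1: keep the key on every field, but fail loudly if only some
  // of the fields carry it.
  def isVariantShreddingStruct(s: Struct): Boolean = {
    val tagged = s.fields.count(_.metadata.contains(VARIANT_WRITE_SHREDDING_KEY))
    if (tagged != 0 && tagged != s.fields.length) {
      throw new IllegalStateException(
        s"Inconsistent shredding metadata: $tagged of ${s.fields.length} fields are tagged")
    }
    s.fields.nonEmpty && tagged == s.fields.length
  }

  // Option 2: put the key on the first field only and check just that one.
  def isVariantShreddingStructFirstFieldOnly(s: Struct): Boolean =
    s.fields.headOption.exists(_.metadata.contains(VARIANT_WRITE_SHREDDING_KEY))
}
```

With option 2 an inconsistent state can't be represented at all, so there is nothing to validate. Option 1 keeps the current annotation scheme and only adds the check.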