harshmotw-db commented on code in PR #53120:
URL: https://github.com/apache/spark/pull/53120#discussion_r2548267850
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -1593,6 +1593,14 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+ val PARQUET_IGNORE_VARIANT_ANNOTATION =
+ buildConf("spark.sql.parquet.ignoreVariantAnnotation")
+      .doc("When true, ignore the variant logical type annotation and treat the Parquet " +
+        "column in the same way as the underlying struct type")
+ .version("4.1.0")
+ .booleanConf
+ .createWithDefault(false)
Review Comment:
I have added a new test, `variant logical type annotation - ignore variant annotation`, to demonstrate this point.
When the `ignoreVariantAnnotation` config is enabled, you can read a Parquet file containing a variant column using a struct-of-binaries schema. For a variant column `v`, you could run:
`spark.read.format("parquet").schema("v struct<value: BINARY, metadata: BINARY>").load(...)`
and it would load the value and metadata columns into those fields, even though the data is logically a variant rather than a struct of two binaries. People could use this to debug the physical variant values.
When the config is disabled (the default), this read raises an error and variant columns must be read into a variant schema.
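For reference, a minimal sketch of the debugging workflow described above. The config name is taken from the diff; the file path, session setup, and column name `v` are illustrative assumptions, not part of the PR:

```scala
// Sketch only: assumes a Parquet file at the given path with a variant column `v`.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("variant-debug").getOrCreate()

// Enable reading variant columns through their physical struct representation
// (config introduced in this PR; defaults to false).
spark.conf.set("spark.sql.parquet.ignoreVariantAnnotation", "true")

// Read the variant column `v` as a struct of its two binary fields.
val df = spark.read
  .format("parquet")
  .schema("v struct<value: BINARY, metadata: BINARY>")
  .load("/path/to/variant_data.parquet")  // placeholder path

// Inspect the raw value/metadata bytes for debugging.
df.select("v.value", "v.metadata").show()
```

With the config left at its default of `false`, the same `schema(...)` call would fail at read time, since the variant annotation forces the column to be read as a variant.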
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]