cashmand commented on code in PR #53005:
URL: https://github.com/apache/spark/pull/53005#discussion_r2516030733
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:
##########
@@ -373,6 +373,10 @@ class ParquetToSparkSchemaConverter(
     Option(field.getLogicalTypeAnnotation).fold(
       convertInternal(groupColumn, sparkReadType.map(_.asInstanceOf[StructType]))) {
+      // Temporary workaround to read Shredded variant data
+      case v: VariantLogicalTypeAnnotation if v.getSpecVersion == 1 && sparkReadType.isEmpty =>
+        convertInternal(groupColumn, None)
Review Comment:
Is the entire read-side PR just this and one or two other similar lines, or
is there something else that I'm missing?
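
For readers following along, here is a self-contained sketch of the control flow that case clause adds. The types below (`Annotation`, `VariantAnnotation`, `GroupColumn`, `StructType`, `convertAsStruct`) are hypothetical stand-ins, not the real parquet-java or converter classes; only the spec-version-1 fallback mirrors the diff above:

```scala
// Hypothetical stand-ins for the parquet-java logical type annotations and
// the converter internals; only the control flow mirrors the quoted diff.
sealed trait Annotation
case class VariantAnnotation(specVersion: Int) extends Annotation
case class OtherAnnotation(name: String) extends Annotation // e.g. MAP, LIST, ...

case class GroupColumn(fieldNames: Seq[String])
case class StructType(fieldNames: Seq[String])

// When the group carries a spec-version-1 Variant annotation and the caller
// supplied no explicit Spark read type, fall back to plain struct conversion.
def convertGroup(
    annotation: Option[Annotation],
    groupColumn: GroupColumn,
    sparkReadType: Option[StructType]): StructType = {

  def convertAsStruct(readType: Option[StructType]): StructType =
    readType.getOrElse(StructType(groupColumn.fieldNames))

  annotation match {
    case Some(VariantAnnotation(1)) if sparkReadType.isEmpty =>
      // Temporary workaround: read shredded Variant data as its underlying struct.
      convertAsStruct(None)
    case _ =>
      // In the real converter, other annotations dispatch to their own
      // conversions; that logic is elided in this sketch.
      convertAsStruct(sparkReadType)
  }
}
```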
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -1585,6 +1585,21 @@ object SQLConf {
.booleanConf
.createWithDefault(true)
+  val PARQUET_ANNOTATE_VARIANT_LOGICAL_TYPE =
+    buildConf("spark.sql.parquet.variant.annotateLogicalType.enabled")
+      .doc("When enabled, Spark annotates the variant groups written to Parquet as the parquet " +
+        "variant logical type.")
+      .version("4.1.0")
+      .booleanConf
+      .createWithDefault(false)
+
+ val PARQUET_WRITE_VARIANT_SPEC_VERSION =
Review Comment:
It seems a bit strange for this to be a conf for now. We shouldn't allow
writing a spec version that Spark doesn't know how to produce, and currently
the only valid spec version is 1, so if we keep the conf it should only
accept "1" as a valid setting. Is there a use for this conf that I'm missing?
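
If the conf is kept, one way to follow this suggestion would be a `checkValue` guard so that only spec version 1 is accepted, written in the same SQLConf style as the hunk above. This is only a sketch: the conf key string, doc text, and default below are hypothetical placeholders (the real values are cut off in the quoted diff), and it assumes ConfigBuilder's existing `intConf` and `checkValue` helpers:

```scala
  // Sketch only: this would sit alongside the other entries in object SQLConf.
  // The conf key below is a hypothetical placeholder, not the one in the PR.
  val PARQUET_WRITE_VARIANT_SPEC_VERSION =
    buildConf("spark.sql.parquet.variant.writeSpecVersion")  // hypothetical key
      .doc("Parquet Variant spec version to use when writing variant columns. " +
        "Spark currently only knows how to write spec version 1.")
      .version("4.1.0")
      .intConf
      .checkValue(_ == 1, "Only Parquet Variant spec version 1 is supported for writes.")
      .createWithDefault(1)
```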
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]