cashmand commented on PR #49874: URL: https://github.com/apache/spark/pull/49874#issuecomment-2654532466
Hi @pan3793, I opened https://github.com/apache/spark/pull/49910 to remove the Variant docs from Spark and link to the Parquet repo instead.

Regarding the status of Variant:

1) Shredded writes are still in a test-only state, although they should be compatible with the latest version of the shredding spec in Parquet. There's currently no API to enable shredded writes other than `spark.sql.variant.forceShreddingSchemaForTest`, which is clearly marked as being meant for test purposes and isn't practical for real use cases.

2) Reads from shredded Variant should work correctly. That said, there are currently no production writers, so there hasn't been much testing beyond the unit tests added to Spark. It's also possible that the shredding spec could change, although I'm hoping that's unlikely at this point.

Given those concerns, I think it's reasonable to disable the flag by default, but I will leave it up to you, @gene-db and @cloud-fan.
