gene-db commented on code in PR #48215:
URL: https://github.com/apache/spark/pull/48215#discussion_r1773613194
##########
common/variant/README.md:
##########
@@ -364,11 +362,7 @@ The Decimal type contains a scale, but no precision. The implied precision of a
The *Logical Type* column indicates logical equivalence of physically encoded
types. For example, a user expression operating on a string value containing
"hello" should behave the same, whether it is encoded with the short string
optimization, or long string encoding. Similarly, user expressions operating on
an *int8* value of 1 should behave the same as a decimal16 with scale 2 and
unscaled value 100.
-The year-month and day-time interval types have one byte at the beginning
indicating the start and end fields. In the case of the year-month interval,
the least significant bit denotes the start field and the next least
significant bit denotes the end field. The remaining 6 bits are unused. A field
value of 0 represents YEAR and 1 represents MONTH. In the case of the day-time
interval, the least significant 2 bits denote the start field and the next
least significant 2 bits denote the end field. The remaining 4 bits are unused.
A field value of 0 represents DAY, 1 represents HOUR, 2 represents MINUTE, and
3 represents SECOND.
-
-Type IDs 17 and 18 were originally reserved for a prototype feature
(string-from-metadata) that was never implemented. These IDs are available for
use by new types.
-
-[1] The parquet format does not have pure equivalents for the year-month and
day-time interval types. Year-month intervals are usually represented using
int32 values and the day-time intervals are usually represented using int64
values. However, these values don't include the start and end fields of these
types. Therefore, Spark stores them in the column metadata.
+Type IDs 17 and 18 were originally reserved for a prototype feature
(string-from-metadata) that was never implemented. These IDs are available for
use by new types. Type IDs 19 and 20 were used to represent interval types for
which support has been temporarily disabled; therefore, these type IDs should
not be used by new types.
Review Comment:
Why can't future types use 19 and 20? Shouldn't intervals just be removed
completely, and added later when the exact encoded format is agreed upon?
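
For reference, the one-byte interval header described in the removed spec text can be decoded with a few bit operations. Below is a minimal Scala sketch; the object and method names are illustrative and not part of the Spark codebase:

```scala
// Sketch of decoding the interval header byte per the removed spec text.
// Names here are illustrative, not from the Spark codebase.
object IntervalHeaderSketch {
  private val yearMonthFields = Array("YEAR", "MONTH")
  private val dayTimeFields = Array("DAY", "HOUR", "MINUTE", "SECOND")

  // Year-month: bit 0 is the start field, bit 1 the end field; bits 2-7 unused.
  def decodeYearMonth(header: Byte): (String, String) = {
    val start = header & 0x1
    val end = (header >> 1) & 0x1
    (yearMonthFields(start), yearMonthFields(end))
  }

  // Day-time: bits 0-1 are the start field, bits 2-3 the end field; bits 4-7 unused.
  def decodeDayTime(header: Byte): (String, String) = {
    val start = header & 0x3
    val end = (header >> 2) & 0x3
    (dayTimeFields(start), dayTimeFields(end))
  }
}

// Example: a day-time header of 0x0C (binary 1100) has start = 0 (DAY)
// and end = 3 (SECOND):
// IntervalHeaderSketch.decodeDayTime(0x0C.toByte)  // ("DAY", "SECOND")
```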
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4461,6 +4461,16 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+  val ALLOW_INTERVAL_TYPES_IN_VARIANT =
+    buildConf("spark.sql.variant.allowIntervalTypes")
Review Comment:
I think making this a conf might be dangerous. We should just never create a
variant value that contains an interval, since interval is not a valid type in
variant right now.
It is possible that a user enables this and writes a variant containing this
version of interval, but the spec is later updated with a different interval
encoding. Then, there would be corrupt variant values.
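
For context, SQLConf entries follow the builder pattern visible in the diff context above (`.booleanConf`, `.createWithDefault`). A hypothetical full definition of the proposed flag might look like the sketch below; the `.doc` text and `.version` value are assumptions, not the PR's actual code:

```scala
// Hypothetical sketch of the proposed conf inside object SQLConf, following
// its usual builder pattern. The doc string and version are illustrative
// assumptions, not the PR's actual code.
val ALLOW_INTERVAL_TYPES_IN_VARIANT =
  buildConf("spark.sql.variant.allowIntervalTypes")
    .doc("When true, allows year-month and day-time interval values to be " +
      "stored inside variant values. Disabled by default because the " +
      "interval encoding is not yet finalized in the variant spec.")
    .version("4.0.0")
    .booleanConf
    .createWithDefault(false)
```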
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]