gene-db commented on code in PR #48215:
URL: https://github.com/apache/spark/pull/48215#discussion_r1773613194
##########
common/variant/README.md:
##########
@@ -364,11 +362,7 @@ The Decimal type contains a scale, but no precision. The implied precision of a
The *Logical Type* column indicates logical equivalence of physically encoded
types. For example, a user expression operating on a string value containing
"hello" should behave the same, whether it is encoded with the short string
optimization, or long string encoding. Similarly, user expressions operating on
an *int8* value of 1 should behave the same as a decimal16 with scale 2 and
unscaled value 100.
-The year-month and day-time interval types have one byte at the beginning
indicating the start and end fields. In the case of the year-month interval,
the least significant bit denotes the start field and the next least
significant bit denotes the end field. The remaining 6 bits are unused. A field
value of 0 represents YEAR and 1 represents MONTH. In the case of the day-time
interval, the least significant 2 bits denote the start field and the next
least significant 2 bits denote the end field. The remaining 4 bits are unused.
A field value of 0 represents DAY, 1 represents HOUR, 2 represents MINUTE, and
3 represents SECOND.
-
-Type IDs 17 and 18 were originally reserved for a prototype feature
(string-from-metadata) that was never implemented. These IDs are available for
use by new types.
-
-[1] The parquet format does not have pure equivalents for the year-month and
day-time interval types. Year-month intervals are usually represented using
int32 values and the day-time intervals are usually represented using int64
values. However, these values don't include the start and end fields of these
types. Therefore, Spark stores them in the column metadata.
+Type IDs 17 and 18 were originally reserved for a prototype feature
(string-from-metadata) that was never implemented. These IDs are available for
use by new types. Type IDs 19 and 20 were used to represent interval types for
which support has been temporarily disabled; therefore, these type IDs should
not be used by new types.
Review Comment:
Why can't future types use 19 and 20? Shouldn't intervals just be removed
completely, and added later when the exact encoded format is agreed upon?
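
For reference, the one-byte interval header described in the removed spec text can be decoded with a few bit operations. Below is a minimal Scala sketch; the object and method names are illustrative and not part of the Spark codebase:

```scala
// Sketch of decoding the interval header byte per the removed spec text.
// Names here are illustrative, not from the Spark codebase.
object IntervalHeaderSketch {
  private val yearMonthFields = Array("YEAR", "MONTH")
  private val dayTimeFields = Array("DAY", "HOUR", "MINUTE", "SECOND")

  // Year-month: bit 0 is the start field, bit 1 the end field; bits 2-7 unused.
  def decodeYearMonth(header: Byte): (String, String) = {
    val start = header & 0x1
    val end = (header >> 1) & 0x1
    (yearMonthFields(start), yearMonthFields(end))
  }

  // Day-time: bits 0-1 are the start field, bits 2-3 the end field; bits 4-7 unused.
  def decodeDayTime(header: Byte): (String, String) = {
    val start = header & 0x3
    val end = (header >> 2) & 0x3
    (dayTimeFields(start), dayTimeFields(end))
  }
}

// Example: a day-time header of 0x0C (binary 1100) has start = 0 (DAY)
// and end = 3 (SECOND):
// IntervalHeaderSketch.decodeDayTime(0x0C.toByte)  // ("DAY", "SECOND")
```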
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4461,6 +4461,16 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+  val ALLOW_INTERVAL_TYPES_IN_VARIANT =
+    buildConf("spark.sql.variant.allowIntervalTypes")
Review Comment:
I think making this a conf might be dangerous. We should just never create a
variant value that contains an interval, since interval is not a valid type in
variant right now.
It is possible that a user enables this and writes a variant containing this
version of interval, but the spec is later updated with a different interval
encoding. Then, there would be corrupt variant values.
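
For context, SQLConf entries follow the builder pattern visible in the diff context above (`.booleanConf`, `.createWithDefault`). A hypothetical full definition of the proposed flag might look like the sketch below; the `.doc` text and `.version` value are assumptions, not the PR's actual code:

```scala
// Hypothetical sketch of the proposed conf inside object SQLConf, following
// its usual builder pattern. The doc string and version are illustrative
// assumptions, not the PR's actual code.
val ALLOW_INTERVAL_TYPES_IN_VARIANT =
  buildConf("spark.sql.variant.allowIntervalTypes")
    .doc("When true, allows year-month and day-time interval values to be " +
      "stored inside variant values. Disabled by default because the " +
      "interval encoding is not yet finalized in the variant spec.")
    .version("4.0.0")
    .booleanConf
    .createWithDefault(false)
```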
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]