cloud-fan commented on a change in pull request #32949:
URL: https://github.com/apache/spark/pull/32949#discussion_r659867783
##########
File path: docs/sql-migration-guide.md
##########
@@ -83,7 +83,9 @@ license: |
- In Spark 3.2, the `TRANSFORM` operator can support `ArrayType/MapType/StructType` without Hive SerDe. In this mode, we use `StructsToJson` to convert an `ArrayType/MapType/StructType` column to `STRING` and `JsonToStructs` to parse a `STRING` back to `ArrayType/MapType/StructType`. In Spark 3.1, Spark only supports casting an `ArrayType/MapType/StructType` column to `STRING`, but does not support parsing a `STRING` into `ArrayType/MapType/StructType` output columns.
- - In Spark 3.2, the unit-to-unit interval literals like `INTERVAL '1-1' YEAR TO MONTH` are converted to ANSI interval types: `YearMonthIntervalType` or `DayTimeIntervalType`. In Spark 3.1 and earlier, such interval literals are converted to `CalendarIntervalType`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.interval.enabled` to `true`.
+ - In Spark 3.2, the unit-to-unit interval literals like `INTERVAL '1-1' YEAR TO MONTH` and the unit list interval literals like `INTERVAL '3' DAYS '1' HOUR` are converted to ANSI interval types: `YearMonthIntervalType` or `DayTimeIntervalType`. In Spark 3.1 and earlier, such interval literals are converted to `CalendarIntervalType`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.interval.enabled` to `true`.
+
+ - In Spark 3.2, the unit list interval literals support two categories of units: year-month units (YEAR and MONTH) and day-time units (DAY, HOUR, MINUTE and SECOND). Units may only be combined within the same category, like `5 YEARS 8 MONTHS` or `1 DAY 2 MINUTE`. In Spark 3.1 and earlier, WEEK, MILLISECOND, MICROSECOND and NANOSECOND are also supported as units, and units from any category can be combined in one literal. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.interval.enabled` to `true`.
Review comment:
Do we have to forbid `WEEK`, `MILLISECOND`, `MICROSECOND` and `NANOSECOND`? I think the only limitation now is that people can't mix year-month and day-time fields, but I don't see why we need to forbid `WEEK`, `MILLISECOND`, `MICROSECOND` and `NANOSECOND`.
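For reference, a minimal sketch of the restriction being discussed (assuming a Spark 3.2 session named `spark` with the default `spark.sql.legacy.interval.enabled=false`; the comments paraphrase the migration note above, not verbatim behavior):
```scala
// Units from the same category can be combined freely.
spark.sql("SELECT INTERVAL 5 YEARS 8 MONTHS")  // year-month units only: OK
spark.sql("SELECT INTERVAL 1 DAY 2 MINUTES")   // day-time units only: OK

// Mixing categories cannot be represented by a single ANSI interval type,
// since a literal is either YearMonthIntervalType or DayTimeIntervalType.
spark.sql("SELECT INTERVAL 1 YEAR 2 DAYS")     // rejected in Spark 3.2
```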
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -2358,20 +2358,56 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
       // `CalendarInterval` doesn't have enough info. For instance, new CalendarInterval(0, 0, 0)
       // can be derived from INTERVAL '0-0' YEAR TO MONTH as well as from
       // INTERVAL '0 00:00:00' DAY TO SECOND.
+      val fromUnit =
+        ctx.errorCapturingUnitToUnitInterval.body.from.getText.toLowerCase(Locale.ROOT)
       val toUnit = ctx.errorCapturingUnitToUnitInterval.body.to.getText.toLowerCase(Locale.ROOT)
       if (toUnit == "month") {
         assert(calendarInterval.days == 0 && calendarInterval.microseconds == 0)
-        // TODO(SPARK-35773): Parse year-month interval literals to tightest types
-        Literal(calendarInterval.months, YearMonthIntervalType())
+        val start = YearMonthIntervalType.stringToField(fromUnit)
+        val end = YearMonthIntervalType.stringToField(toUnit)
+        Literal(calendarInterval.months, YearMonthIntervalType(start, end))
       } else {
         assert(calendarInterval.months == 0)
-        val fromUnit =
-          ctx.errorCapturingUnitToUnitInterval.body.from.getText.toLowerCase(Locale.ROOT)
         val micros = IntervalUtils.getDuration(calendarInterval, TimeUnit.MICROSECONDS)
         val start = DayTimeIntervalType.stringToField(fromUnit)
         val end = DayTimeIntervalType.stringToField(toUnit)
         Literal(micros, DayTimeIntervalType(start, end))
       }
+    } else if (ctx.errorCapturingMultiUnitsInterval != null && !conf.legacyIntervalEnabled) {
+      val units =
+        ctx.errorCapturingMultiUnitsInterval.body.unit.asScala.map(
+          _.getText.toLowerCase(Locale.ROOT).stripSuffix("s"))
+      if (units.forall(YearMonthIntervalType.stringToField.contains)) {
+        val fields = units.map(YearMonthIntervalType.stringToField)
+        val start = if (0 < IntervalUtils.getYears(calendarInterval)) {
Review comment:
what does this mean? why can't we always use `fields.min`?
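For context, the simplification being asked about would look roughly like this (a sketch against the PR's local variables above, not a tested change; the diff itself shows `stringToField` maps unit names to ordered field values, so `min`/`max` give the outermost and innermost listed units):
```scala
// Sketch: derive the bounds from the listed units alone, instead of
// inspecting the CalendarInterval's component values.
val fields = units.map(YearMonthIntervalType.stringToField)
val start = fields.min  // e.g. YEAR for "5 YEARS 8 MONTHS"
val end = fields.max    // e.g. MONTH
Literal(calendarInterval.months, YearMonthIntervalType(start, end))
```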
##########
File path: sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out
##########
@@ -1215,9 +1251,9 @@ SELECT
   to_csv(named_struct('a', interval 32 month, 'b', interval 70 minute)),
   from_csv(to_csv(named_struct('a', interval 32 month, 'b', interval 70 minute)), 'a interval, b interval')
-- !query schema
-struct<from_csv(1, 1 day):struct<a:int,b:interval>,to_csv(from_csv(1, 1 day)):string,to_csv(named_struct(a, INTERVAL '2 years 8 months', b, INTERVAL '1 hours 10 minutes')):string,from_csv(to_csv(named_struct(a, INTERVAL '2 years 8 months', b, INTERVAL '1 hours 10 minutes'))):struct<a:interval,b:interval>>
+struct<from_csv(1, 1 day):struct<a:int,b:interval>,to_csv(from_csv(1, 1 day)):string,to_csv(named_struct(a, INTERVAL '2-8' YEAR TO MONTH, b, INTERVAL '01:10' HOUR TO MINUTE)):string,from_csv(to_csv(named_struct(a, INTERVAL '2-8' YEAR TO MONTH, b, INTERVAL '01:10' HOUR TO MINUTE))):struct<a:interval,b:interval>>
-- !query output
-{"a":1,"b":1 days}	1,1 days	2 years 8 months,1 hours 10 minutes	{"a":2 years 8 months,"b":1 hours 10 minutes}
+{"a":1,"b":1 days}	1,1 days	32,4200000000	{"a":null,"b":null}
Review comment:
> because the result of named_struct('a', interval 32 month, 'b', interval 70 minute) is 32,4200000000

Is it correct? I thought it should be `interval '32' month`
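To make the question concrete (a sketch; `spark` is an assumed session, and the first commented line is taken from the golden file above):
```scala
spark.sql(
  "SELECT to_csv(named_struct('a', interval 32 month, 'b', interval 70 minute))"
).show(false)
// Golden file in this PR: 32,4200000000
//   (the raw month count and microsecond count of the ANSI interval values)
// Reviewer's expectation: the literal forms, e.g. INTERVAL '32' MONTH for 'a'
```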
##########
File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q12.sf100/explain.txt
##########
@@ -142,15 +142,15 @@ BroadcastExchange (28)
Output [2]: [d_date_sk#13, d_date#24]
Batched: true
Location [not included in comparison]/{warehouse_dir}/date_dim]
-PushedFilters: [IsNotNull(d_date), GreaterThanOrEqual(d_date,1999-02-22), LessThanOrEqual(d_date,1999-03-24), IsNotNull(d_date_sk)]
+PushedFilters: [IsNotNull(d_date), GreaterThanOrEqual(d_date,1999-02-22), IsNotNull(d_date_sk)]
Review comment:
do we break filter pushdown in this PR?
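For context, this plan presumably comes from the q12 date-range predicate, roughly (a hypothetical reconstruction, assuming the usual TPC-DS `date_dim` schema):
```scala
// If the upper-bound expression no longer resolves to DateType, the
// LessThanOrEqual on d_date can no longer be pushed down as a date comparison.
spark.sql("""
  SELECT d_date_sk, d_date FROM date_dim
  WHERE d_date BETWEEN cast('1999-02-22' AS DATE)
                   AND cast('1999-02-22' AS DATE) + INTERVAL 30 DAYS
""")
```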
##########
File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q12.sf100/explain.txt
##########
@@ -142,15 +142,15 @@ BroadcastExchange (28)
Output [2]: [d_date_sk#13, d_date#24]
Batched: true
Location [not included in comparison]/{warehouse_dir}/date_dim]
-PushedFilters: [IsNotNull(d_date), GreaterThanOrEqual(d_date,1999-02-22), LessThanOrEqual(d_date,1999-03-24), IsNotNull(d_date_sk)]
+PushedFilters: [IsNotNull(d_date), GreaterThanOrEqual(d_date,1999-02-22), IsNotNull(d_date_sk)]
Review comment:
> cast('1999-02-22' AS DATE) + INTERVAL 30 days is TimestampType

I checked the corresponding code in `ResolveBinaryArithmetic`:
```
case (DateType, DayTimeIntervalType(DAY, DAY)) => DateAdd(l, ExtractANSIIntervalDays(r))
```
`DateAdd` returns a date, so I think we are fine now?
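A quick way to verify (a sketch; `spark` is an assumed session, and the expected schema follows from the rule quoted above):
```scala
// Per the ResolveBinaryArithmetic case above, date + a DAY-to-DAY ANSI
// interval resolves to DateAdd, which returns DateType.
val df = spark.sql("SELECT cast('1999-02-22' AS DATE) + INTERVAL 30 DAYS AS d")
df.printSchema()
// expected: d: date (so the filter stays a date comparison and can be pushed down)
```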