[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li resolved SPARK-27790.
-----------------------------
    Fix Version/s: 3.3.0
         Assignee: Max Gekk  (was: Apache Spark)
       Resolution: Fixed

> Support ANSI SQL INTERVAL types
> -------------------------------
>
>                 Key: SPARK-27790
>                 URL: https://issues.apache.org/jira/browse/SPARK-27790
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>             Fix For: 3.3.0
>
>
> Spark has an INTERVAL data type, but it is “broken”:
> # It cannot be persisted.
> # It is not comparable because it crosses the month-day line. That is, there
> is no telling whether “1 Month 1 Day” is longer or shorter than a fixed
> number of days, since not all months have the same number of days.
> I propose here to introduce the two flavours of INTERVAL as described in the
> ANSI SQL Standard and deprecate Spark's interval type.
> * ANSI describes two non-overlapping “classes”:
> ** YEAR-MONTH ranges
> ** DAY-SECOND ranges
> * Members within each class can be compared and sorted.
> * Each class supports datetime arithmetic.
> * Each class can be persisted.
> The old and new flavors of INTERVAL can coexist until Spark INTERVAL is
> eventually retired. Also, any semantic “breakage” can be controlled via legacy
> config settings.
> *Milestone 1* -- Spark Interval equivalency (the new interval types meet
> or exceed all functionality of the existing SQL Interval):
> * Add two new DataType implementations for interval year-month and
> day-second. Includes the JSON format and DDL string.
> * Infra support: check the caller sides of DateType/TimestampType
> * Support the two new interval types in Dataset/UDF.
> * Interval literals (with a legacy config to still allow mixed year-month and
> day-second fields and return legacy interval values)
> * Interval arithmetic (interval * num, interval / num, interval +/- interval)
> * Datetime functions/operators: Datetime - Datetime (to days or day-second),
> Datetime +/- interval
> * Cast to and from the two new interval types, cast string to interval, cast
> interval to string (pretty printing), with the SQL syntax to specify the types
> * Support sorting intervals.
> *Milestone 2* -- Persistence:
> * Ability to create tables of type interval
> * Ability to write to common file formats such as Parquet and JSON.
> * INSERT, SELECT, UPDATE, MERGE
> * Discovery
> *Milestone 3* -- Client support
> * JDBC support
> * Hive Thrift server
> *Milestone 4* -- PySpark and SparkR integration
> * Python UDF can take and return intervals
> * DataFrame support
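
To illustrate why each of the two ANSI classes can be compared and sorted while a mixed month/day interval cannot, here is a minimal Python sketch. It is not Spark code: the names `YearMonthInterval`, `DaySecondInterval`, and `add_year_month` are hypothetical, and normalizing a year-month interval to a month count and a day-second interval to a microsecond count is an assumption made for the illustration.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True, order=True)
class YearMonthInterval:
    """Hypothetical YEAR-MONTH interval, normalized to a total month count."""
    months: int

    @classmethod
    def of(cls, years: int = 0, months: int = 0) -> "YearMonthInterval":
        return cls(years * 12 + months)


@dataclass(frozen=True, order=True)
class DaySecondInterval:
    """Hypothetical DAY-SECOND interval, normalized to total microseconds."""
    micros: int

    @classmethod
    def of(cls, days: int = 0, hours: int = 0, minutes: int = 0,
           seconds: int = 0, micros: int = 0) -> "DaySecondInterval":
        total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
        return cls(total_seconds * 1_000_000 + micros)


def add_year_month(d: date, iv: YearMonthInterval) -> date:
    """Date + year-month interval, clamping to the last day of the target month."""
    total = d.year * 12 + (d.month - 1) + iv.months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


# The same "1 month" spans a different number of days depending on the start
# date, which is why months and days cannot share one comparable scale.
one_month = YearMonthInterval.of(months=1)
feb_len = (add_year_month(date(2021, 2, 1), one_month) - date(2021, 2, 1)).days
mar_len = (add_year_month(date(2021, 3, 1), one_month) - date(2021, 3, 1)).days
```

Within a class, values normalize to a single integer, so `YearMonthInterval.of(years=1, months=2)` equals `YearMonthInterval.of(months=14)` and ordering is well defined. Across classes, an ordering comparison such as `one_month < DaySecondInterval.of(days=30)` raises `TypeError` in this sketch, mirroring the point that the two classes do not overlap.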