maropu commented on a change in pull request #30260:
URL: https://github.com/apache/spark/pull/30260#discussion_r522043455
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##########
@@ -1770,6 +1782,110 @@ case class AnsiCast(child: Expression, dataType: DataType, timeZoneId: Option[St
copy(timeZoneId = Option(timeZoneId))
override protected val ansiEnabled: Boolean = true
+
+  override def canCast(from: DataType, to: DataType): Boolean = AnsiCast.canCast(from, to)
+}
+
+object AnsiCast {
+ /**
+   * As per section 6.13 "cast specification" in "Information technology — Database languages " +
Review comment:
Nice update. Thanks!
##########
File path: docs/sql-ref-ansi-compliance.md
##########
@@ -107,6 +107,22 @@ SELECT * FROM t;
+---+
```
+The valid combinations of target data type and source data type in a `Cast` expression are given by the following table.
+“Y” indicates that the combination is syntactically valid without restriction and “N” indicates that the combination is not valid.
+
+| From\To   | Numeric | String | Date | Timestamp | Interval | Boolean | Binary | Array | Map | Struct |
+|-----------|---------|--------|------|-----------|----------|---------|--------|-------|-----|--------|
+| Numeric   | Y       | Y      | N    | N         | N        | Y       | N      | N     | N   | N      |
+| String    | Y       | Y      | Y    | Y         | Y        | Y       | Y      | N     | N   | N      |
+| Date      | N       | Y      | Y    | Y         | N        | N       | N      | N     | N   | N      |
+| Timestamp | N       | Y      | Y    | Y         | N        | N       | N      | N     | N   | N      |
+| Interval  | N       | Y      | N    | N         | Y        | N       | N      | N     | N   | N      |
+| Boolean   | Y       | Y      | N    | N         | N        | Y       | N      | N     | N   | N      |
+| Binary    | N       | Y      | N    | N         | N        | N       | Y      | N     | N   | N      |
+| Array     | N       | N      | N    | N         | N        | N       | N      | Y     | N   | N      |
+| Map       | N       | N      | N    | N         | N        | N       | N      | N     | Y   | N      |
+| Struct    | N       | N      | N    | N         | N        | N       | N      | N     | N   | Y      |
Review comment:
How about following the wording in the data type document, e.g.,
`String` -> `StringType`
https://spark.apache.org/docs/3.0.1/sql-ref-datatypes.html
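For readers skimming the thread, the cast table above can be encoded as a small lookup. The following is a hypothetical illustration only, not Spark's actual `Cast.canCast` implementation; the category names simply mirror the table headers:

```scala
// Hypothetical sketch of the ANSI cast-compatibility table from the doc.
// Not Spark's real implementation; categories mirror the doc table headers.
object AnsiCastTableSketch {
  // For each source category, the set of valid target categories (the "Y" cells).
  private val validTargets: Map[String, Set[String]] = Map(
    "Numeric"   -> Set("Numeric", "String", "Boolean"),
    "String"    -> Set("Numeric", "String", "Date", "Timestamp", "Interval", "Boolean", "Binary"),
    "Date"      -> Set("String", "Date", "Timestamp"),
    "Timestamp" -> Set("String", "Date", "Timestamp"),
    "Interval"  -> Set("String", "Interval"),
    "Boolean"   -> Set("Numeric", "String", "Boolean"),
    "Binary"    -> Set("String", "Binary"),
    "Array"     -> Set("Array"),
    "Map"       -> Set("Map"),
    "Struct"    -> Set("Struct")
  )

  def canCast(from: String, to: String): Boolean =
    validTargets.get(from).exists(_.contains(to))
}
```

For example, `canCast("String", "Binary")` is true while `canCast("Numeric", "Date")` is false, matching the rows above.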
##########
File path: docs/sql-ref-ansi-compliance.md
##########
@@ -107,6 +107,22 @@ SELECT * FROM t;
+---+
```
+The valid combinations of target data type and source data type in a `Cast` expression are given by the following table.
Review comment:
nit: `Cast` expression -> `CAST` syntax ?
##########
File path: docs/sql-ref-ansi-compliance.md
##########
@@ -107,6 +107,22 @@ SELECT * FROM t;
+---+
```
+The valid combinations of target data type and source data type in a `Cast` expression are given by the following table.
Review comment:
How about moving this statement into L61-62?
##########
File path:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
##########
@@ -635,16 +585,74 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper {
checkEvaluation(cast("", BooleanType), null)
}
+ private def checkInvalidCastFromNumericType(to: DataType): Unit = {
+ assert(cast(1.toByte, to).checkInputDataTypes().isFailure)
+ assert(cast(1.toShort, to).checkInputDataTypes().isFailure)
+ assert(cast(1, to).checkInputDataTypes().isFailure)
+ assert(cast(1L, to).checkInputDataTypes().isFailure)
+ assert(cast(1.0.toFloat, to).checkInputDataTypes().isFailure)
+ assert(cast(1.0, to).checkInputDataTypes().isFailure)
+ }
+
test("SPARK-16729 type checking for casting to date type") {
assert(cast("1234", DateType).checkInputDataTypes().isSuccess)
assert(cast(new Timestamp(1), DateType).checkInputDataTypes().isSuccess)
assert(cast(false, DateType).checkInputDataTypes().isFailure)
- assert(cast(1.toByte, DateType).checkInputDataTypes().isFailure)
- assert(cast(1.toShort, DateType).checkInputDataTypes().isFailure)
- assert(cast(1, DateType).checkInputDataTypes().isFailure)
- assert(cast(1L, DateType).checkInputDataTypes().isFailure)
- assert(cast(1.0.toFloat, DateType).checkInputDataTypes().isFailure)
- assert(cast(1.0, DateType).checkInputDataTypes().isFailure)
+ checkInvalidCastFromNumericType(DateType)
+ }
+
+  test("ANSI mode: disallow type conversions between Numeric types and Timestamp type") {
+ import DataTypeTestUtils.numericTypes
+ withSQLConf(ansiConfForExceptionOnOverflow) {
Review comment:
Is this test related to overflow tests? (The config name seems to suggest so.)
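As an aside for readers following along: `withSQLConf` scopes a config change to the body of the test and restores the previous value afterwards. A standalone sketch of that pattern (hypothetical, not Spark's actual test-helper implementation; `ConfSketch` and its config map are stand-ins):

```scala
// Hypothetical sketch of the withSQLConf test-helper pattern.
// A mutable map stands in for Spark's SQLConf; not Spark's real code.
object ConfSketch {
  private val conf = scala.collection.mutable.Map[String, String]()

  def get(key: String): Option[String] = conf.get(key)

  // Set the given pairs, run the body, then restore the previous values,
  // even if the body throws -- mirroring the scoped-config test pattern.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val saved = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body finally saved.foreach {
      case (k, Some(v)) => conf(k) = v
      case (k, None)    => conf.remove(k)
    }
  }
}
```

With this shape, the question above is really about which key the helper sets: if `ansiConfForExceptionOnOverflow` only toggles the ANSI flag, a more neutrally named helper might read better here.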
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##########
@@ -1770,6 +1782,110 @@ case class AnsiCast(child: Expression, dataType: DataType, timeZoneId: Option[St
copy(timeZoneId = Option(timeZoneId))
override protected val ansiEnabled: Boolean = true
+
+  override def canCast(from: DataType, to: DataType): Boolean = AnsiCast.canCast(from, to)
+}
+
+object AnsiCast {
+ /**
+ * As per section 6.13 "cast specification" in "Information technology —
Database languages " +
+ * "- SQL — Part 2: Foundation (SQL/Foundation)":
+ * If the <cast operand> is a <value expression>, then the valid
combinations of TD and SD
+ * in a <cast specification> are given by the following table. “Y” indicates
that the
+ * combination is syntactically valid without restriction; “M” indicates
that the combination
+ * is valid subject to other Syntax Rules in this Sub- clause being
satisfied; and “N” indicates
+ * that the combination is not valid:
+   * SD \ TD   EN   AN   C    D    T    TS   YM   DT   BO   UDT  B    RT   CT   RW
+   * EN        Y    Y    Y    N    N    N    M    M    N    M    N    M    N    N
+   * AN        Y    Y    Y    N    N    N    N    N    N    M    N    M    N    N
+   * C         Y    Y    Y    Y    Y    Y    Y    Y    Y    M    N    M    N    N
+   * D         N    N    Y    Y    N    Y    N    N    N    M    N    M    N    N
+   * T         N    N    Y    N    Y    Y    N    N    N    M    N    M    N    N
+   * TS        N    N    Y    Y    Y    Y    N    N    N    M    N    M    N    N
+   * YM        M    N    Y    N    N    N    Y    N    N    M    N    M    N    N
+   * DT        M    N    Y    N    N    N    N    Y    N    M    N    M    N    N
+   * BO        N    N    Y    N    N    N    N    N    Y    M    N    M    N    N
+   * UDT       M    M    M    M    M    M    M    M    M    M    M    M    M    N
+   * B         N    N    N    N    N    N    N    N    N    M    Y    M    N    N
+   * RT        M    M    M    M    M    M    M    M    M    M    M    M    N    N
+   * CT        N    N    N    N    N    N    N    N    N    M    N    N    M    N
+   * RW        N    N    N    N    N    N    N    N    N    N    N    N    N    M
+ *
+ * Where:
+ * EN = Exact Numeric
+ * AN = Approximate Numeric
+ * C = Character (Fixed- or Variable-Length, or Character Large Object)
+ * D = Date
+ * T = Time
+ * TS = Timestamp
+ * YM = Year-Month Interval
+ * DT = Day-Time Interval
+ * BO = Boolean
+ * UDT = User-Defined Type
+ * B = Binary (Fixed- or Variable-Length or Binary Large Object)
+ * RT = Reference type
+ * CT = Collection type
+ * RW = Row type
+ *
+   * Spark's ANSI mode follows the syntax rules, except that it specially allows the following
+   * straightforward type conversions, which are disallowed as per the SQL standard:
+ * - Numeric <=> Boolean
+ * - String <=> Binary
Review comment:
How about describing this in `sql-ref-ansi-compliance.md`, too?
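Agreed; one possible wording for `sql-ref-ansi-compliance.md` (a sketch of suggested text, not final wording):

```markdown
Spark's ANSI mode follows the syntax rules of section 6.13 "cast specification"
in the SQL standard, except that it additionally allows the following
straightforward type conversions, which the standard disallows:

  - Numeric <=> Boolean
  - String <=> Binary
```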
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]