shardulm94 commented on a change in pull request #35332:
URL: https://github.com/apache/spark/pull/35332#discussion_r793102644
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##########
@@ -310,13 +310,16 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
protected def ansiEnabled: Boolean
+ protected def withDataType(dataType: DataType): CastBase
+
Review comment:
The `copy` methods are not generated for abstract classes.
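To illustrate (a minimal sketch with stand-in types, not Spark's actual class hierarchy): Scala only synthesizes `copy` for concrete case classes, so shared logic in an abstract parent cannot call `copy` generically; an abstract factory method like `withDataType` lets each concrete subclass delegate to its own generated `copy`.

```scala
// Sketch only: `String` stands in for Spark's DataType, and the class
// names mimic (but are not) the real Cast hierarchy.
abstract class CastBase {
  def dataType: String
  // No compiler-generated `copy` here, so subclasses must provide this:
  protected def withDataType(dt: String): CastBase
  // Shared logic in the parent can now rebuild `this` with a new type.
  def retyped(dt: String): CastBase = withDataType(dt)
}

case class Cast(child: String, dataType: String) extends CastBase {
  protected def withDataType(dt: String): CastBase = copy(dataType = dt)
}

case class TryCast(child: String, dataType: String) extends CastBase {
  protected def withDataType(dt: String): CastBase = copy(dataType = dt)
}

val c = Cast("col", "string NOT NULL")
assert(c.retyped("string") == Cast("col", "string"))
```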
##########
File path:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
##########
@@ -177,4 +177,18 @@ class CanonicalizeSuite extends SparkFunSuite {
assert(expr.semanticEquals(attr))
assert(attr.semanticEquals(expr))
}
+
+ test("SPARK-38030: Canonicalize Cast should remove nullability of target dataType") {
+ val structType = StructType(Seq(StructField("name", StringType, nullable = false)))
+ val attr = AttributeReference("col", structType)()
+ for (cast <- Seq(
+ Cast(attr, structType),
+ AnsiCast(attr, structType),
+ TryCast(attr, structType))) {
+ assert(cast.resolved)
+ // canonicalization should not convert a resolved cast to unresolved
+ assert(cast.canonicalized.resolved)
Review comment:
The original issue was detected with the plan `UnionExec -> ProjectExec -> Alias -> Cast -> AttributeReference`, where a call to `.output` on `UnionExec` in turn calls `ProjectExec.output.nullable`. I can recreate this chain here, but I'm not sure that's useful. The root issue was that a resolved node was being converted to unresolved, which is what we are testing here.
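The behavior under test can be sketched with a toy model (hypothetical names, not Spark internals): an expression counts as resolved only while its target type passes a check against the child, so if canonicalization rewrote the target type into a stricter one, a resolved node could silently become unresolved. Canonicalizing to the nullable form of the target type keeps the check passing.

```scala
// Toy model only: FieldType and ToyCast are illustrative stand-ins for
// Spark's DataType and Cast.
final case class FieldType(name: String, nullable: Boolean) {
  def asNullable: FieldType = copy(nullable = true)
}

final case class ToyCast(childType: FieldType, target: FieldType) {
  // Example check: the target may not claim non-nullability the child lacks.
  def resolved: Boolean = target.nullable || !childType.nullable
  // The fixed canonicalization: strip nullability constraints from the target.
  def canonicalized: ToyCast = copy(target = target.asNullable)
}

val cast = ToyCast(FieldType("name", nullable = false),
                   FieldType("name", nullable = false))
assert(cast.resolved)                // resolved before canonicalization
assert(cast.canonicalized.resolved)  // and it must stay resolved afterwards
```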
##########
File path:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
##########
@@ -177,4 +177,18 @@ class CanonicalizeSuite extends SparkFunSuite {
assert(expr.semanticEquals(attr))
assert(attr.semanticEquals(expr))
}
+
+ test("SPARK-38030: Canonicalize Cast should remove nullability of target dataType") {
+ val structType = StructType(Seq(StructField("name", StringType, nullable = false)))
+ val attr = AttributeReference("col", structType)()
+ for (cast <- Seq(
+ Cast(attr, structType),
+ AnsiCast(attr, structType),
+ TryCast(attr, structType))) {
Review comment:
The existing approach is simpler to read IMO, but I'm open to changing it if more folks think the second approach is better.
##########
File path:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
##########
@@ -177,4 +177,18 @@ class CanonicalizeSuite extends SparkFunSuite {
assert(expr.semanticEquals(attr))
assert(attr.semanticEquals(expr))
}
+
+ test("SPARK-38030: Canonicalize Cast should remove nullability of target dataType") {
+ val structType = StructType(Seq(StructField("name", StringType, nullable = false)))
+ val attr = AttributeReference("col", structType)()
+ for (cast <- Seq(
+ Cast(attr, structType),
+ AnsiCast(attr, structType),
+ TryCast(attr, structType))) {
+ assert(cast.resolved)
+ // canonicalization should not convert a resolved cast to unresolved
+ assert(cast.canonicalized.resolved)
+ assert(cast.canonicalized.dataType == structType.asNullable)
Review comment:
Done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]