[GitHub] [spark] cloud-fan commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type
cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1161440201

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:

@@ -293,6 +297,34 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
     }
   }

+  /**
+   * Move the cast to the literal side, because we can only get the minimum value of timestamp,
+   * so some BinaryComparison needs to be changed,
+   * such as CAST(ts AS date) > DATE '2023-01-01' ===> ts >= TIMESTAMP '2023-01-02 00:00:00'
+   */
+  private def unwrapDateToTimestamp(
+      exp: BinaryComparison,
+      fromExp: Expression,
+      date: Literal,
+      tz: Option[String],
+      evalMode: EvalMode.Value): Expression = {
+    val dateAddOne = DateAdd(date, Literal(1, IntegerType))
+    exp match {
+      case _: GreaterThan =>
+        GreaterThanOrEqual(fromExp, Cast(dateAddOne, fromExp.dataType, tz, evalMode))
+      case _: GreaterThanOrEqual =>
+        GreaterThanOrEqual(fromExp, Cast(date, fromExp.dataType, tz, evalMode))
+      case Equality(_, _) =>

Review Comment:
   I think this is wrong for `EqualNullSafe`. If the column value is null, then `EqualNullSafe` should return false, but the rewritten expression returns null.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
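The rewrite under review can be sanity-checked outside Catalyst with plain `java.time`. This is a minimal sketch, not Spark code: it checks that `CAST(ts AS date) > d` agrees with `ts >= startOfDay(d + 1 day)`, which is why the `GreaterThan` branch in the diff flips to `GreaterThanOrEqual` on `dateAddOne`.

```scala
import java.time.{LocalDate, LocalDateTime}

object UnwrapDateSketch {
  // Original predicate: CAST(ts AS date) > d
  def original(ts: LocalDateTime, d: LocalDate): Boolean =
    ts.toLocalDate.isAfter(d)

  // Rewritten predicate: ts >= TIMESTAMP '(d + 1 day) 00:00:00'
  def rewritten(ts: LocalDateTime, d: LocalDate): Boolean =
    !ts.isBefore(d.plusDays(1).atStartOfDay)
}
```

For any non-null `ts` the two predicates agree; the `EqualNullSafe` concern in the review comment is precisely the case this sketch cannot model, because a `LocalDateTime` is never null here while SQL comparisons are three-valued.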
cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1132445403

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala:

@@ -368,6 +370,61 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
     assertEquivalent(castInt(f4) < t, trueIfNotNull(f4))
   }

+  test("SPARK-42597: Support unwrap date to timestamp type") {
+    val tsRelation = LocalRelation($"a".timestamp, $"b".timestampNTZ)
+    val dateLit = Literal.create(Date.valueOf("2023-01-01"), DateType)
+    val nullLit = Literal.create(null, DateType)
+
+    def assertUnwrapDateToTimestamp(e1: Expression, e2: Expression): Unit = {
+      val plan = tsRelation.where(e1).analyze
+      val actual = Optimize.execute(plan)
+      val expected = tsRelation.where(e2).analyze
+      comparePlans(actual, expected)
+    }
+
+    assertUnwrapDateToTimestamp(

Review Comment:
   Or we can add timestamp test data, so that we can evaluate as well.
cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1132445884

sql/core/src/test/scala/org/apache/spark/sql/UnwrapCastInComparisonEndToEndSuite.scala:

@@ -240,5 +242,27 @@ class UnwrapCastInComparisonEndToEndSuite extends QueryTest with SharedSparkSess
     }
   }

+  test("SPARK-42597: Support unwrap date type to timestamp type") {
+    val ts1 = Timestamp.valueOf("2023-01-01 23:59:59.09")
+    val ts2 = Timestamp.valueOf("2023-01-01 23:59:59.98")
+    val ts3 = Timestamp.valueOf("2023-01-02 23:59:59.08")
+
+    withTable(t) {
+      Seq(ts1, ts2, ts3).toDF("ts").write.saveAsTable(t)
+      val df = spark.table(t)
+
+      checkAnswer(
+        df.where("cast(ts as date) > cast('2023-01-01' as date)"), Seq(ts3).map(Row(_)))

Review Comment:
   It's more common to write a date literal with `date''`, e.g. `date'2023-01-01'`.
cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1132444507

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala:

@@ -368,6 +370,61 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
     assertEquivalent(castInt(f4) < t, trueIfNotNull(f4))
   }

+  test("SPARK-42597: Support unwrap date to timestamp type") {
+    val tsRelation = LocalRelation($"a".timestamp, $"b".timestampNTZ)
+    val dateLit = Literal.create(Date.valueOf("2023-01-01"), DateType)
+    val nullLit = Literal.create(null, DateType)
+
+    def assertUnwrapDateToTimestamp(e1: Expression, e2: Expression): Unit = {
+      val plan = tsRelation.where(e1).analyze
+      val actual = Optimize.execute(plan)
+      val expected = tsRelation.where(e2).analyze
+      comparePlans(actual, expected)
+    }
+
+    assertUnwrapDateToTimestamp(

Review Comment:
   Shall we just call `assertEquivalent` with `evaluate = false`?
cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1132439974

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala:

@@ -368,6 +370,61 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest with ExpressionEvalHelp
     assertEquivalent(castInt(f4) < t, trueIfNotNull(f4))
   }

+  test("SPARK-42597: Support unwrap date to timestamp type") {
+    val tsRelation = LocalRelation($"a".timestamp, $"b".timestampNTZ)
+    val dateLit = Literal.create(Date.valueOf("2023-01-01"), DateType)

Review Comment:
   Let's use the new `LocalDate` Java class.
cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1132437369

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:

@@ -43,6 +43,7 @@ import org.apache.spark.sql.types._
  *      decimal, etc) or boolean type
  *   2). `fromType` can be safely coerced to `toType` without precision loss (e.g., short to int,
  *      int to long, but not long to int, nor int to boolean)
+ *   3). `fromExp` is a timestamp type and `toType` is a date type

Review Comment:
   Can we simply update `1)`? `numeric type` -> `numeric or datetime type`
cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1130941590

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:

@@ -113,7 +114,7 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
       // literal and cast side, then process the result and swap the literal and cast again to
       // restore the original order.
       case BinaryComparison(Literal(_, literalType), Cast(fromExp, toType, _, _))
-        if canImplicitlyCast(fromExp, toType, literalType) =>
+        if needToCanonicalizeExpression(fromExp, toType, literalType) =>

Review Comment:
   I see, maybe it's better if `unwrapCast` returns `Option[Expression]`. Then here we can just do `unwrapCast(swap(exp)).map(swap)`, and the caller side can do `unwrapCast(e).getOrElse(e)`.
cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1129686241

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:

@@ -113,7 +114,7 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
       // literal and cast side, then process the result and swap the literal and cast again to
       // restore the original order.
       case BinaryComparison(Literal(_, literalType), Cast(fromExp, toType, _, _))
-        if canImplicitlyCast(fromExp, toType, literalType) =>
+        if needToCanonicalizeExpression(fromExp, toType, literalType) =>

Review Comment:
   OK, now I get it. I feel `needToCanonicalizeExpression` is confusing, and updating `canImplicitlyCast` directly is fine. We may need to rename `simplifyNumericComparison`, as it handles datetime as well.
cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1129677751

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:

@@ -43,6 +43,7 @@ import org.apache.spark.sql.types._
  *      decimal, etc) or boolean type
  *   2). `fromType` can be safely coerced to `toType` without precision loss (e.g., short to int,
  *      int to long, but not long to int, nor int to boolean)
+ *   3). `fromExp` is a timestamp type and `toType` is a date type

Review Comment:
   This is not a `3)`, but an additional supported case.
cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1127955295

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:

@@ -350,7 +384,7 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
       literalType: DataType): Boolean = {
     toType.sameType(literalType) &&
       !fromExp.foldable &&
-      toType.isInstanceOf[NumericType] &&
+      (toType.isInstanceOf[NumericType] || toType.isInstanceOf[DateType]) &&

Review Comment:
   Do we need to change this? The new code does not call `canImplicitlyCast`.

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:

@@ -361,6 +395,7 @@
     case (IntegerType, FloatType) => false
     case (LongType, FloatType) => false
    case (LongType, DoubleType) => false
+    case (TimestampType | TimestampNTZType, DateType) => true

Review Comment:
   ditto