[GitHub] [spark] cloud-fan commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-04-09 Thread via GitHub


cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1161440201


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##
@@ -293,6 +297,34 @@ object UnwrapCastInBinaryComparison extends 
Rule[LogicalPlan] {
 }
   }
 
+  /**
+   * Move the cast to the literal side, because we can only get the minimum 
value of timestamp,
+   * so some BinaryComparison needs to be changed,
+   * such as CAST(ts AS date) > DATE '2023-01-01' ===> ts >= TIMESTAMP 
'2023-01-02 00:00:00'
+   */
+  private def unwrapDateToTimestamp(
+  exp: BinaryComparison,
+  fromExp: Expression,
+  date: Literal,
+  tz: Option[String],
+  evalMode: EvalMode.Value): Expression = {
+val dateAddOne = DateAdd(date, Literal(1, IntegerType))
+exp match {
+  case _: GreaterThan =>
+GreaterThanOrEqual(fromExp, Cast(dateAddOne, fromExp.dataType, tz, 
evalMode))
+  case _: GreaterThanOrEqual =>
+GreaterThanOrEqual(fromExp, Cast(date, fromExp.dataType, tz, evalMode))
+  case Equality(_, _) =>

Review Comment:
   I think this is wrong for `EqualNullSafe`. If the column value is null, then 
`EqualNullSafe` should return false, but the rewritten expression returns null.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-10 Thread via GitHub


cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1132445403


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala:
##
@@ -368,6 +370,61 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest 
with ExpressionEvalHelp
 assertEquivalent(castInt(f4) < t, trueIfNotNull(f4))
   }
 
+  test("SPARK-42597: Support unwrap date to timestamp type") {
+val tsRelation = LocalRelation($"a".timestamp, $"b".timestampNTZ)
+val dateLit = Literal.create(Date.valueOf("2023-01-01"), DateType)
+val nullLit = Literal.create(null, DateType)
+
+def assertUnwrapDateToTimestamp(e1: Expression, e2: Expression): Unit = {
+  val plan = tsRelation.where(e1).analyze
+  val actual = Optimize.execute(plan)
+  val expected = tsRelation.where(e2).analyze
+  comparePlans(actual, expected)
+}
+
+assertUnwrapDateToTimestamp(

Review Comment:
   or we can add timestamp test data, so that we can evaluate as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-10 Thread via GitHub


cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1132445884


##
sql/core/src/test/scala/org/apache/spark/sql/UnwrapCastInComparisonEndToEndSuite.scala:
##
@@ -240,5 +242,27 @@ class UnwrapCastInComparisonEndToEndSuite extends 
QueryTest with SharedSparkSess
 }
   }
 
+  test("SPARK-42597: Support unwrap date type to timestamp type") {
+val ts1 = Timestamp.valueOf("2023-01-01 23:59:59.09")
+val ts2 = Timestamp.valueOf("2023-01-01 23:59:59.98")
+val ts3 = Timestamp.valueOf("2023-01-02 23:59:59.08")
+
+withTable(t) {
+  Seq(ts1, ts2, ts3).toDF("ts").write.saveAsTable(t)
+  val df = spark.table(t)
+
+  checkAnswer(
+df.where("cast(ts as date) > cast('2023-01-01' as date)"), 
Seq(ts3).map(Row(_)))

Review Comment:
   it's more common to write date literal with `date''`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-10 Thread via GitHub


cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1132444507


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala:
##
@@ -368,6 +370,61 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest 
with ExpressionEvalHelp
 assertEquivalent(castInt(f4) < t, trueIfNotNull(f4))
   }
 
+  test("SPARK-42597: Support unwrap date to timestamp type") {
+val tsRelation = LocalRelation($"a".timestamp, $"b".timestampNTZ)
+val dateLit = Literal.create(Date.valueOf("2023-01-01"), DateType)
+val nullLit = Literal.create(null, DateType)
+
+def assertUnwrapDateToTimestamp(e1: Expression, e2: Expression): Unit = {
+  val plan = tsRelation.where(e1).analyze
+  val actual = Optimize.execute(plan)
+  val expected = tsRelation.where(e2).analyze
+  comparePlans(actual, expected)
+}
+
+assertUnwrapDateToTimestamp(

Review Comment:
   shall we just call `assertEquivalent` with `evaluate = false`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-10 Thread via GitHub


cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1132439974


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala:
##
@@ -368,6 +370,61 @@ class UnwrapCastInBinaryComparisonSuite extends PlanTest 
with ExpressionEvalHelp
 assertEquivalent(castInt(f4) < t, trueIfNotNull(f4))
   }
 
+  test("SPARK-42597: Support unwrap date to timestamp type") {
+val tsRelation = LocalRelation($"a".timestamp, $"b".timestampNTZ)
+val dateLit = Literal.create(Date.valueOf("2023-01-01"), DateType)

Review Comment:
   let's use the new `LocalDate` java class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-10 Thread via GitHub


cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1132437369


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##
@@ -43,6 +43,7 @@ import org.apache.spark.sql.types._
  * decimal, etc) or boolean type
  *   2). `fromType` can be safely coerced to `toType` without precision loss 
(e.g., short to int,
  * int to long, but not long to int, nor int to boolean)
+ *   3). `fromExp` is a timestamp type and `toType` is a date type

Review Comment:
   Can we simply update `1)`? `numeric type` -> `numeric or datetime type`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-09 Thread via GitHub


cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1130941590


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##
@@ -113,7 +114,7 @@ object UnwrapCastInBinaryComparison extends 
Rule[LogicalPlan] {
 // literal and cast side, then process the result and swap the literal and 
cast again to
 // restore the original order.
 case BinaryComparison(Literal(_, literalType), Cast(fromExp, toType, _, _))
-if canImplicitlyCast(fromExp, toType, literalType) =>
+if needToCanonicalizeExpression(fromExp, toType, literalType) =>

Review Comment:
   I see, maybe it's better if `unwrapCast` returns `Option[Expression]`, then 
here we can just do `unwrapCast(swap(exp)).map(swap)`, and the caller side can 
do `unwrapCast(e).getOrElse(e)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-08 Thread via GitHub


cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1129686241


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##
@@ -113,7 +114,7 @@ object UnwrapCastInBinaryComparison extends 
Rule[LogicalPlan] {
 // literal and cast side, then process the result and swap the literal and 
cast again to
 // restore the original order.
 case BinaryComparison(Literal(_, literalType), Cast(fromExp, toType, _, _))
-if canImplicitlyCast(fromExp, toType, literalType) =>
+if needToCanonicalizeExpression(fromExp, toType, literalType) =>

Review Comment:
   OK now I got it. I feel `needToCanonicalizeExpression` is confusing and 
updating `canImplicitlyCast` directly is fine. We may need to rename 
`simplifyNumericComparison` as it handles datetime as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-08 Thread via GitHub


cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1129677751


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##
@@ -43,6 +43,7 @@ import org.apache.spark.sql.types._
  * decimal, etc) or boolean type
  *   2). `fromType` can be safely coerced to `toType` without precision loss 
(e.g., short to int,
  * int to long, but not long to int, nor int to boolean)
+ *   3). `fromExp` is a timestamp type and `toType` is a date type

Review Comment:
   this is not a `3)`, but an additional supported case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40190: [SPARK-42597][SQL] Support unwrap date type to timestamp type

2023-03-07 Thread via GitHub


cloud-fan commented on code in PR #40190:
URL: https://github.com/apache/spark/pull/40190#discussion_r1127955295


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##
@@ -350,7 +384,7 @@ object UnwrapCastInBinaryComparison extends 
Rule[LogicalPlan] {
   literalType: DataType): Boolean = {
 toType.sameType(literalType) &&
   !fromExp.foldable &&
-  toType.isInstanceOf[NumericType] &&
+  (toType.isInstanceOf[NumericType] || toType.isInstanceOf[DateType]) &&

Review Comment:
   do we need to change here? The new code does not call `canImplicitlyCast`



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala:
##
@@ -361,6 +395,7 @@ object UnwrapCastInBinaryComparison extends 
Rule[LogicalPlan] {
 case (IntegerType, FloatType) => false
 case (LongType, FloatType) => false
 case (LongType, DoubleType) => false
+case (TimestampType | TimestampNTZType, DateType) => true

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org