beliefer commented on a change in pull request #32311:
URL: https://github.com/apache/spark/pull/32311#discussion_r619758753
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
##########
@@ -744,8 +745,8 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
checkEvaluation(new Sequence(
Literal(Timestamp.valueOf("2018-01-01 00:00:00")),
- Literal(Timestamp.valueOf("2018-01-02 00:00:01")),
Review comment:
I'm puzzled by this.
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
##########
@@ -934,12 +1071,25 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
EmptyRow,
s"sequence boundaries: 0 to 2678400000000 by -${28 * MICROS_PER_DAY}")
+ checkExceptionInExpression[IllegalArgumentException](
+ new Sequence(
+ Literal(Date.valueOf("1970-01-01")),
+ Literal(Date.valueOf("1970-02-01")),
+ Literal(Period.ofMonths(-1))),
+ EmptyRow,
+ s"sequence boundaries: 0 to 2678400000000 by -${28 * MICROS_PER_DAY}")
Review comment:
Good find.
It seems we should change the behavior of `CalendarInterval` so that it can avoid such an assumption.
Or perhaps the assumption is a bug in `CalendarInterval`.
If so, could we fix the bug in another PR?
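For reference, here is a rough sketch of where those numbers come from. A month step has no fixed length in microseconds, so the bounds check estimates a month at its minimum of 28 days, and that estimate is what leaks into the message. The helper names below are illustrative only, not the exact code in `collectionOperations.scala`:

```scala
import java.util.concurrent.TimeUnit

object SequenceBoundsSketch {
  val MICROS_PER_DAY: Long = TimeUnit.DAYS.toMicros(1) // 86400000000
  // A month is estimated at its minimum length of 28 days.
  val microsPerMonth: Long = 28 * MICROS_PER_DAY       // 2419200000000

  def checkBounds(startMicros: Long, stopMicros: Long, stepMonths: Int): Unit = {
    val estimatedStep = stepMonths * microsPerMonth
    require(
      (estimatedStep > 0 && startMicros <= stopMicros) ||
        (estimatedStep < 0 && startMicros >= stopMicros) ||
        (estimatedStep == 0 && startMicros == stopMicros),
      s"sequence boundaries: $startMicros to $stopMicros by $estimatedStep")
  }

  def main(args: Array[String]): Unit = {
    // 1970-01-01 is 0 micros and 1970-02-01 is 31 days = 2678400000000 micros,
    // while Period.ofMonths(-1) is estimated as -28 days, so this fails with
    // "sequence boundaries: 0 to 2678400000000 by -2419200000000".
    checkBounds(0L, 31 * MICROS_PER_DAY, -1)
  }
}
```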
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
##########
@@ -934,12 +1071,25 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
EmptyRow,
s"sequence boundaries: 0 to 2678400000000 by -${28 * MICROS_PER_DAY}")
+ checkExceptionInExpression[IllegalArgumentException](
+ new Sequence(
+ Literal(Date.valueOf("1970-01-01")),
+ Literal(Date.valueOf("1970-02-01")),
+ Literal(Period.ofMonths(-1))),
+ EmptyRow,
+ s"sequence boundaries: 0 to 2678400000000 by -${28 * MICROS_PER_DAY}")
Review comment:
Furthermore, `microsPerDay` is just used to estimate the length of the sequence.
We only need to improve the exception message so that it does not expose the
assumed value.
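To make that concrete, a hypothetical rewording could report the boundaries and step as the user wrote them, instead of the internal 28-day estimate. The names and wording below are my own, not from this PR:

```scala
// Hypothetical message builder: echo the original boundaries and step
// expressions rather than the internally estimated microsecond values.
def boundsMessage(start: String, stop: String, step: String): String =
  s"sequence boundaries $start to $stop are inconsistent with step $step " +
    "(the step must move from start toward stop)"

// e.g. boundsMessage("DATE '1970-01-01'", "DATE '1970-02-01'", "INTERVAL '-1' MONTH")
```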
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -2723,28 +2745,94 @@ object Sequence {
}
}
+ private class PeriodSequenceImpl[T: ClassTag]
+ (dt: IntegralType, scale: Long, fromLong: Long => T, zoneId: ZoneId)
+ (implicit num: Integral[T]) extends InternalSequenceBase(dt, scale, fromLong, zoneId) {
Review comment:
In fact, the current implementation uses `DateTimeUtils.timestampAddInterval`,
and its behavior seems good.
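For anyone reading along, the point is that month arithmetic moves a calendar field rather than adding a fixed number of microseconds. A rough `java.time` illustration of that semantics (this mirrors the idea behind `DateTimeUtils.timestampAddInterval`; it is not that method's code):

```scala
import java.time.{Instant, ZoneId}

// Adding a month shifts the calendar field and clamps the day-of-month when
// needed, so a Period step cannot be reduced to a constant micros value.
def addMonths(epochMicros: Long, months: Int, zoneId: ZoneId): Long = {
  val shifted = Instant.EPOCH.plusNanos(epochMicros * 1000L)
    .atZone(zoneId)
    .plusMonths(months)
    .toInstant
  shifted.getEpochSecond * 1000000L + shifted.getNano / 1000L
}

// e.g. 2021-01-31 plus one month yields 2021-02-28 (the day is clamped).
```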
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
##########
@@ -934,12 +1071,25 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
EmptyRow,
s"sequence boundaries: 0 to 2678400000000 by -${28 * MICROS_PER_DAY}")
+ checkExceptionInExpression[IllegalArgumentException](
+ new Sequence(
+ Literal(Date.valueOf("1970-01-01")),
+ Literal(Date.valueOf("1970-02-01")),
+ Literal(Period.ofMonths(-1))),
+ EmptyRow,
+ s"sequence boundaries: 0 to 2678400000000 by -${28 * MICROS_PER_DAY}")
Review comment:
Good find.
~~It seems we should change the behavior of `CalendarInterval` so that it can avoid such an assumption.~~
~~Or perhaps the assumption is a bug in `CalendarInterval`.~~
~~If so, could we fix the bug in another PR?~~
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
##########
@@ -934,12 +1071,25 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
EmptyRow,
s"sequence boundaries: 0 to 2678400000000 by -${28 * MICROS_PER_DAY}")
+ checkExceptionInExpression[IllegalArgumentException](
+ new Sequence(
+ Literal(Date.valueOf("1970-01-01")),
+ Literal(Date.valueOf("1970-02-01")),
+ Literal(Period.ofMonths(-1))),
+ EmptyRow,
+ s"sequence boundaries: 0 to 2678400000000 by -${28 * MICROS_PER_DAY}")
Review comment:
Furthermore, `microsPerDay` is just used to estimate the length of the
sequence. We only need to improve the exception message so that it does not
expose the assumed value.
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -2484,8 +2484,8 @@ case class Flatten(child: Expression) extends UnaryExpression with NullIntoleran
The start and stop expressions must resolve to the same type.
If start and stop expressions resolve to the 'date' or 'timestamp' type
- then the step expression must resolve to the 'interval' type, otherwise to the same type
- as the start and stop expressions.
+ then the step expression must resolve to the 'interval' or 'year-month' or 'day-time' type,
Review comment:
OK
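To make the new wording concrete, here is a sketch in the style of `CollectionExpressionsSuite` of a 'date' sequence stepped by a year-month interval. The expected values are my assumption, not taken from this PR:

```scala
// A date sequence stepped by java.time.Period (a year-month interval).
checkEvaluation(new Sequence(
  Literal(Date.valueOf("2021-01-01")),
  Literal(Date.valueOf("2021-04-01")),
  Literal(Period.ofMonths(1))),
  Seq(
    Date.valueOf("2021-01-01"),
    Date.valueOf("2021-02-01"),
    Date.valueOf("2021-03-01"),
    Date.valueOf("2021-04-01")))
```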
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -2561,29 +2568,50 @@ case class Sequence(
TypeCheckResult.TypeCheckSuccess
} else {
TypeCheckResult.TypeCheckFailure(
- s"$prettyName only supports integral, timestamp or date types")
+ s"""
+ |$prettyName uses the wrong parameter type. The parameter type must
conform to:
+ |1. The start and stop expressions must resolve to the same type.
+ |2. If start and stop expressions resolve to the 'date' or
'timestamp' type
+ |then the step expression must resolve to the 'interval' or
'year-month' or
Review comment:
OK
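One small note: since the new failure text is built with a margin-prefixed interpolator, it presumably ends with `.stripMargin` (cut off in the hunk above). A generic Scala sketch of how such a message renders, not the PR's exact code:

```scala
val prettyName = "sequence"
val failure =
  s"""
     |$prettyName uses the wrong parameter type. The parameter type must conform to:
     |1. The start and stop expressions must resolve to the same type.
     |""".stripMargin // drops the leading '|' markers from each line
println(failure)
```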
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]