amaliujia commented on code in PR #35975:
URL: https://github.com/apache/spark/pull/35975#discussion_r852406001


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##########
@@ -830,6 +848,30 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
     check(plan)
   }
 
+  /**
+   * Validate whether the [[Offset]] is valid.
+   */
+  private def checkOffsetOperator(plan: LogicalPlan): Unit = {
+    plan.foreachUp {
+      case o if !o.isInstanceOf[GlobalLimit] && !o.isInstanceOf[LocalLimit]
+        && o.children.exists(_.isInstanceOf[Offset]) =>
+        failAnalysis(
+          s"""
+             |The OFFSET clause is only allowed in the LIMIT clause, but the OFFSET
+             |clause found in: ${o.nodeName}.""".stripMargin.replace("\n", " "))
+      case _ =>
+    }
+    plan match {
+      case Offset(offsetExpr, _) =>
+        checkLimitLikeClause("offset", offsetExpr)
+        failAnalysis(
+          s"""
+             |The OFFSET clause is only allowed in the LIMIT clause, but the OFFSET
+             |clause is found to be the outermost node.""".stripMargin.replace("\n", " "))
+      case _ =>
+    }

Review Comment:
   Just checking: could lines 864 to 871 be merged with lines 855 to 862?
   Does `case Offset(offsetExpr, _)` ever match during `plan.foreachUp`?
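
To make the question concrete, here is a minimal sketch using toy tree classes (`Node`, `Relation`, `Project`, `Limit` and `Offset` below are hypothetical stand-ins, not Catalyst's `LogicalPlan` nodes). It assumes `foreachUp` visits children before their parent. Under that assumption, the guarded case only fires on a node that has an `Offset` child, so an `Offset` at the root is never reported by the traversal and needs its own check.

```scala
object OffsetCheckSketch {
  // Hypothetical stand-ins for plan nodes; these are not Spark classes.
  sealed trait Node extends Product {
    def children: Seq[Node]
    def nodeName: String = productPrefix
    // Bottom-up traversal: visit all children first, then this node.
    def foreachUp(f: Node => Unit): Unit = {
      children.foreach(_.foreachUp(f))
      f(this)
    }
  }
  case class Relation(name: String) extends Node { def children: Seq[Node] = Nil }
  case class Project(child: Node) extends Node { def children: Seq[Node] = Seq(child) }
  case class Limit(n: Int, child: Node) extends Node { def children: Seq[Node] = Seq(child) }
  case class Offset(n: Int, child: Node) extends Node { def children: Seq[Node] = Seq(child) }

  // Names of non-Limit nodes that directly wrap an Offset.
  def badParents(plan: Node): Seq[String] = {
    val found = scala.collection.mutable.ArrayBuffer.empty[String]
    plan.foreachUp {
      case o if !o.isInstanceOf[Limit] && o.children.exists(_.isInstanceOf[Offset]) =>
        found += o.nodeName
      case _ =>
    }
    found.toSeq
  }

  // Separate check for an Offset that has no parent at all.
  def rootIsOffset(plan: Node): Boolean = plan.isInstanceOf[Offset]
}
```

In this sketch, `badParents(Offset(1, Relation("t")))` is empty while `rootIsOffset` is true. Putting a plain `case Offset(...)` inside the traversal would match every `Offset` in the tree, including the legal ones under a `Limit`.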



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##########
@@ -531,6 +532,52 @@ class AnalysisErrorSuite extends AnalysisTest {
     "The limit expression must be equal to or greater than 0, but got -1" :: Nil
   )
 
+  errorTest(
+    "an evaluated offset class must not be string",
+    testRelation.offset(Literal(UTF8String.fromString("abc"), StringType)),
+    "The offset expression must be integer type, but got string" :: Nil
+  )
+
+  errorTest(
+    "an evaluated offset class must not be long",
+    testRelation.offset(Literal(10L, LongType)),

Review Comment:
   I am a bit confused here: is there a way in SQL to specify that a number is
   a long, so that we hit this error message?
   
   For example, `LIMIT 1`, where 1 is an integer, versus `LIMIT 1L`, where 1L is a long?



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala:
##########
@@ -37,15 +37,27 @@ trait LimitExec extends UnaryExecNode {
 }
 
 /**
- * Take the first `limit` elements and collect them to a single partition.
+ * Take the first `limit` + `offset` elements and collect them to a single partition and then to
+ * drop the first `offset` elements.
  *
  * This operator will be used when a logical `Limit` operation is the final operator in an
  * logical plan, which happens when the user is collecting results back to the driver.
  */
-case class CollectLimitExec(limit: Int, child: SparkPlan) extends LimitExec {
+case class CollectLimitExec(limit: Int, offset: Int, child: SparkPlan) extends LimitExec {
   override def output: Seq[Attribute] = child.output
   override def outputPartitioning: Partitioning = SinglePartition
-  override def executeCollect(): Array[InternalRow] = child.executeTake(limit)
+  override def executeCollect(): Array[InternalRow] = {
+    // Because CollectLimitExec collect all the output of child to a single partition, so we need
+    // collect the first `limit` + `offset` elements and then to drop the first `offset` elements.
+    // For example: limit is 1 and offset is 2 and the child output two partition.
+    // The first partition output [1, 2] and the Second partition output [3, 4, 5].
+    // Then [1, 2, 3] will be taken and output [3].
+    if (offset > 0) {
+      child.executeTake(limit + offset).drop(offset)
+    } else {

Review Comment:
   So the assumption here is that when offset is not > 0, offset is not set?
   
   Would using Option and None be better, to indicate:
   1. offset is set and legal: Some(value).
   2. offset is set but not legal: won't reach here; it should be rejected in
      the analyzer.
   3. offset is not set: None.
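
A minimal sketch of the `Option` variant, assuming the analyzer has already rejected illegal offsets. `collectWithOffset` is a hypothetical helper over a plain `Seq` standing in for the child's output; it is not `CollectLimitExec` or `executeTake`.

```scala
object OffsetOptionSketch {
  // Hypothetical helper, not Spark's API: `rows` stands in for the child's output.
  def collectWithOffset[A](rows: Seq[A], limit: Int, offset: Option[Int]): Seq[A] =
    offset match {
      // Offset is set (and assumed valid): take limit + offset rows, drop the first offset.
      case Some(n) => rows.take(limit + n).drop(n)
      // Offset is not set: plain limit.
      case None    => rows.take(limit)
    }
}
```

With the example from the code comment (limit 1, offset 2, rows 1 to 5), `collectWithOffset(Seq(1, 2, 3, 4, 5), 1, Some(2))` takes `[1, 2, 3]` and returns `[3]`. The trade-off is that `Some(0)` and `None` behave identically here, so the `Option` mostly documents intent.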



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##########
@@ -531,6 +532,52 @@ class AnalysisErrorSuite extends AnalysisTest {
     "The limit expression must be equal to or greater than 0, but got -1" :: Nil
   )
 
+  errorTest(
+    "an evaluated offset class must not be string",
+    testRelation.offset(Literal(UTF8String.fromString("abc"), StringType)),
+    "The offset expression must be integer type, but got string" :: Nil
+  )
+
+  errorTest(
+    "an evaluated offset class must not be long",
+    testRelation.offset(Literal(10L, LongType)),
+    "The offset expression must be integer type, but got bigint" :: Nil
+  )
+
+  errorTest(
+    "an evaluated offset class must not be null",
+    testRelation.offset(Literal(null, IntegerType)),
+    "The evaluated offset expression must not be null, but got " :: Nil
+  )
+
+  errorTest(
+    "num_rows in offset clause must be equal to or greater than 0",
+    testRelation.offset(-1),
+    "The offset expression must be equal to or greater than 0, but got -1" :: Nil
+  )
+
+  errorTest(
+    "OFFSET clause is outermost node",
+    testRelation.offset(Literal(10, IntegerType)),
+    "The OFFSET clause is only allowed in the LIMIT clause, but the OFFSET" +
+      " clause is found to be the outermost node." :: Nil

Review Comment:
   I found the second half of the error message a bit confusing.
   
   I would guess it is trying to say that the `OFFSET is found to be used without a
   LIMIT` (as phrased, it says the OFFSET is the outermost node)?


