[GitHub] [spark] huaxingao commented on a change in pull request #34291: [SPARK-37020][SQL] DS V2 LIMIT push down

GitBox Sat, 16 Oct 2021 22:18:37 -0700


huaxingao commented on a change in pull request #34291:
URL: https://github.com/apache/spark/pull/34291#discussion_r730355466




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -225,6 +226,31 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
       withProjection
   }
 
+  def applyLimit(plan: LogicalPlan): LogicalPlan = plan.transform {
+    case globalLimit @ GlobalLimit(_,
+        LocalLimit(limitExpr, DataSourceV2ScanRelation(_, scan, _))) =>
+      val supportsPushDownLimit = scan match {
+        case _: SupportsPushDownLimit => true
+        case v1: V1ScanWrapper =>
+          v1.v1Scan match {
+            case _: SupportsPushDownLimit => true
+            case _ => false
+          }
+        case _ => false
+      }
+      if (supportsPushDownLimit) {
+        assert(limitExpr.isInstanceOf[Literal] &&
+          limitExpr.asInstanceOf[Literal].value.isInstanceOf[Integer],
+          "Limit has to be an Integer")
+        val value = limitExpr.asInstanceOf[Literal].value.asInstanceOf[Integer]
+        val limit = LogicalExpressions.limit(LiteralValue(value, IntegerType))
+        PushDownUtils.pushLimit(scan, limit)
+        globalLimit

Review comment:
       Even though we push LIMIT down to the data source, we still want to keep
this LIMIT operation in Spark. That is safer: if the data source somehow returns
more rows than the LIMIT requests, Spark's own LIMIT still trims the result to
the requested number of rows.
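To make the safety argument concrete, here is a minimal Scala sketch under stated assumptions. `LimitableSource`, `StrictSource`, `SloppySource`, and `scanWithLimit` are hypothetical names invented for illustration; they are not Spark's `SupportsPushDownLimit` or `PushDownUtils` APIs. It shows that a pushed limit acts only as a hint: because the engine still applies its own `take(limit)`, the final row count is correct whether or not the source honors that hint.

```scala
// Hypothetical stand-ins for illustration only; NOT Spark's Scan / SupportsPushDownLimit API.
trait LimitableSource {
  // Receives the limit as a hint; the source may or may not honor it.
  def pushLimit(limit: Int): Unit
  def read(): Iterator[Int]
}

// A source that honors the pushed limit exactly.
final class StrictSource(rows: Seq[Int]) extends LimitableSource {
  private var pushed: Option[Int] = None
  def pushLimit(limit: Int): Unit = pushed = Some(limit)
  def read(): Iterator[Int] = pushed.fold(rows.iterator)(n => rows.iterator.take(n))
}

// A source that accepts the limit but ignores it (e.g. a buggy connector).
final class SloppySource(rows: Seq[Int]) extends LimitableSource {
  private var pushed: Option[Int] = None // recorded but never applied
  def pushLimit(limit: Int): Unit = pushed = Some(limit)
  def read(): Iterator[Int] = rows.iterator
}

object LimitPushDownSketch {
  // Push the limit down as a hint, then re-apply it on the engine side,
  // mirroring why the rule returns `globalLimit` instead of removing it.
  def scanWithLimit(source: LimitableSource, limit: Int): List[Int] = {
    source.pushLimit(limit)
    source.read().take(limit).toList
  }

  def main(args: Array[String]): Unit = {
    val data = 1 to 10
    println(scanWithLimit(new StrictSource(data), 3)) // prints List(1, 2, 3)
    println(scanWithLimit(new SloppySource(data), 3)) // prints List(1, 2, 3)
  }
}
```

Without the engine-side `take(limit)`, `SloppySource` would return all ten rows; that is the failure mode the retained `GlobalLimit` guards against, at the cost of a cheap extra pass over at most `limit` rows when the source behaves correctly.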




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

