huaxingao commented on a change in pull request #34291:
URL: https://github.com/apache/spark/pull/34291#discussion_r730355466
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##########
@@ -225,6 +226,31 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper {
     withProjection
   }
+  def applyLimit(plan: LogicalPlan): LogicalPlan = plan.transform {
+    case globalLimit @ GlobalLimit(_,
+        LocalLimit(limitExpr, DataSourceV2ScanRelation(_, scan, _))) =>
+      val supportsPushDownLimit = scan match {
+        case _: SupportsPushDownLimit => true
+        case v1: V1ScanWrapper =>
+          v1.v1Scan match {
+            case _: SupportsPushDownLimit => true
+            case _ => false
+          }
+        case _ => false
+      }
+      if (supportsPushDownLimit) {
+        assert(limitExpr.isInstanceOf[Literal] &&
+          limitExpr.asInstanceOf[Literal].value.isInstanceOf[Integer],
+          "Limit has to be an Integer")
+        val value = limitExpr.asInstanceOf[Literal].value.asInstanceOf[Integer]
+        val limit = LogicalExpressions.limit(LiteralValue(value, IntegerType))
+        PushDownUtils.pushLimit(scan, limit)
+        globalLimit
Review comment:
Even though we push the LIMIT down to the data source, we still keep the
LIMIT operator in Spark. This is safer: if the data source returns more rows
than the limit requested, Spark still enforces the limit on the final result.
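
To illustrate the point, here is a minimal sketch using plain Scala collections rather than Spark's API. The names `LimitPushDownSketch`, `bestEffortScan`, and `sparkSideLimit` are hypothetical, and the behavior modeled (a source that applies the pushed limit per partition only) is an assumption chosen for illustration, not something this PR specifies.

```scala
object LimitPushDownSketch {
  // Hypothetical stand-in for a data source: three partitions of rows.
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9))

  // Assumption: the source honors the pushed limit per partition only,
  // so the combined result may exceed the requested limit.
  def bestEffortScan(limit: Int): Seq[Int] =
    partitions.flatMap(_.take(limit))

  // Stand-in for the GlobalLimit that stays in the Spark plan.
  def sparkSideLimit(rows: Seq[Int], limit: Int): Seq[Int] =
    rows.take(limit)

  def main(args: Array[String]): Unit = {
    val limit = 2
    val scanned = bestEffortScan(limit)         // 6 rows: 1, 2, 4, 5, 7, 8
    val result = sparkSideLimit(scanned, limit) // 2 rows: 1, 2
    println(s"source returned ${scanned.size} rows, query returns ${result.size}")
  }
}
```

With the Spark-side limit kept in place, the query result stays correct whether or not the source returns exactly the requested number of rows.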
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]