HeartSaVioR commented on a change in pull request #34642:
URL: https://github.com/apache/spark/pull/34642#discussion_r767325385



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala
##########
@@ -71,6 +71,11 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ
 
   val id: Int = SparkPlan.newPlanId()
 
+  /**
+   * Return true if this stage of the plan supports row-based execution.

Review comment:
       For now I guess the columnar route is considered superior; otherwise there would need to be a cost calculation for the plan to choose between row and columnar.
   
   This is just to cover the case where the downstream doesn't support columnar but the upstream can support both row and columnar, and its performance of producing output (fastest to slowest) is `columnar output > row output > columnar output + columnar-to-row conversion`. In that case the upstream wants to directly produce whatever the downstream wants, without conversion.
   
   If the upstream can produce columnar output fast enough to cover the overhead of columnar-to-row conversion (`columnar output > columnar output + columnar-to-row conversion > row output`), then it could just tactically say "it only covers columnar output" and Spark will take care of the conversion.
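   To illustrate the trade-off above, here is a minimal sketch in plain Scala (not the actual Spark API; `PlanNode`, `supportsRowBased`, and `needsColumnarToRow` are hypothetical names modeled on the flag under review): a node that reports only columnar support forces a conversion, while a node that reports both can hand the downstream rows directly.

```scala
// Hypothetical sketch, not real Spark classes: models how a plan node
// could advertise row vs. columnar support, so a columnar-to-row
// transition is inserted only when the child cannot produce rows itself.
trait PlanNode {
  // Loosely mirrors SparkPlan.supportsColumnar.
  def supportsColumnar: Boolean = false
  // The flag discussed in this review: can this node emit rows directly?
  def supportsRowBased: Boolean = true
}

// Assumption for this sketch: the parent consumes rows only, so a
// conversion is needed exactly when the child is columnar-only.
def needsColumnarToRow(child: PlanNode): Boolean =
  child.supportsColumnar && !child.supportsRowBased

// Upstream whose output ordering is
// columnar > row > columnar + conversion:
// it reports both formats, so no conversion is inserted for a row parent.
val flexible = new PlanNode {
  override def supportsColumnar = true
  override def supportsRowBased = true
}

// Upstream whose ordering is
// columnar > columnar + conversion > row:
// it "tactically" reports columnar only, letting the planner insert the
// columnar-to-row conversion on its behalf.
val columnarOnly = new PlanNode {
  override def supportsColumnar = true
  override def supportsRowBased = false
}

println(needsColumnarToRow(flexible))     // false: downstream gets rows directly
println(needsColumnarToRow(columnarOnly)) // true: conversion is inserted
```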




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]