sunchao commented on a change in pull request #34642:
URL: https://github.com/apache/spark/pull/34642#discussion_r766884028



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala
##########
@@ -71,6 +71,11 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ
 
   val id: Int = SparkPlan.newPlanId()
 
+  /**
+   * Return true if this stage of the plan supports row-based execution.

Review comment:
       > Not every plan supports row-based output.
   
   Hmm, any example? Every physical plan node has to implement `doExecute`, which outputs rows, while `doExecuteColumnar` throws an exception by default.
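   To make the contract concrete, here is a minimal, self-contained sketch of the shape described above. This is illustration only, not the real Spark classes: `MiniPlan` and `RowOnlyScan` are made-up names, and the real `SparkPlan` methods operate on `RDD[InternalRow]`/`RDD[ColumnarBatch]` rather than plain iterators.

```scala
// Hypothetical, stripped-down mirror of the SparkPlan contract discussed
// above. MiniPlan and RowOnlyScan are illustration-only names.
abstract class MiniPlan {
  // Row-based execution is mandatory: every physical node must implement it.
  def doExecute(): Iterator[String]

  // Columnar execution is opt-in; the defaults refuse it.
  def supportsColumnar: Boolean = false
  def doExecuteColumnar(): Iterator[Array[String]] =
    throw new UnsupportedOperationException(
      s"${getClass.getSimpleName} does not support columnar execution")
}

// A node that only implements the row-based path, like most operators today.
class RowOnlyScan extends MiniPlan {
  override def doExecute(): Iterator[String] = Iterator("row1", "row2")
}

object Demo extends App {
  val plan = new RowOnlyScan
  assert(!plan.supportsColumnar)      // columnar is off by default
  assert(plan.doExecute().size == 2)  // the row path always works
}
```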
   
   > As I see, we usually prefer columnar output already.
   
   I'm not sure about this part. To my understanding, at the moment it appears we prefer columnar output because 1) the vectorized readers for ORC/Parquet yield much better performance, so we always want to use them over the default row-based implementations, and 2) `supportsColumnar` defaults to false since most operators don't support columnar execution yet, so we do the columnar-to-row conversion and switch back to whole-stage codegen.
   
   However, this may not hold true if we add columnar support for more operators like filter/project etc. in the future. Do we want to prefer columnar execution over the whole-stage codegen approach? I'm not sure yet, and maybe some evaluation is required. `prefersColumnar` could give us a knob to control this.
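   A rough sketch of what such a knob could look like. This is a hypothetical design, not existing Spark API: the `prefersColumnar` flag and the `useColumnar` helper below are assumptions made up for illustration. The idea is that the planner would pick the columnar path only when a node both supports it and prefers it.

```scala
// Illustration only: a hypothetical prefersColumnar knob layered on top of
// the existing supportsColumnar flag. None of this is existing Spark API.
abstract class PlanNode {
  def supportsColumnar: Boolean = false
  // Hypothetical knob: even when a columnar impl exists, should we prefer
  // it over row-based whole-stage codegen?
  def prefersColumnar: Boolean = false
}

// The planner would consult both flags when choosing an execution path.
def useColumnar(node: PlanNode): Boolean =
  node.supportsColumnar && node.prefersColumnar

object KnobDemo extends App {
  val filter = new PlanNode {
    override def supportsColumnar = true   // a columnar impl exists...
    override def prefersColumnar = false   // ...but codegen may still win
  }
  assert(!useColumnar(filter))  // stays on the row-based/codegen path
}
```

   Keeping the two flags separate would let an operator ship a columnar implementation without immediately forcing the planner away from codegen, until the evaluation mentioned above settles which path is faster.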




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
