sunchao commented on a change in pull request #34642:
URL: https://github.com/apache/spark/pull/34642#discussion_r766884028
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala
##########
@@ -71,6 +71,11 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with
Logging with Serializ
val id: Int = SparkPlan.newPlanId()
+ /**
+ * Return true if this stage of the plan supports row-based execution.
Review comment:
> Not every plan supports row-based output.
Hmm any example? every physical plan node has to implement `doExecute` which
outputs rows, while `doExecuteColumnar` throws exception by default.
> As I see, we usually prefer columnar output already.
I'm not sure about this part. To my understanding, at the moment it appears
we prefer columnar output because 1) vectorized readers for OPC/Parquet yield
much better performance so we always want to use that over the default
row-based impls, and 2) `supportsColumnar` defaults to false as most operators
don't support columnar execution yet, so we'll do the columnar-row conversion
and switch back to whole-stage codegen.
However this may not hold true if we add columnar support for more operators
like filter/project etc in future. Do we want to prefer columnar execution over
the whole-stage codegen approach? I'm not sure yet and maybe some evaluation is
required. `prefersColumnar` could give us a knob to control this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]