viirya commented on a change in pull request #34642:
URL: https://github.com/apache/spark/pull/34642#discussion_r766931441
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala
##########
@@ -71,6 +71,11 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializable
val id: Int = SparkPlan.newPlanId()
+ /**
+ * Return true if this stage of the plan supports row-based execution.
Review comment:
> Hmm, any example? Every physical plan node has to implement doExecute, which outputs rows, while doExecuteColumnar throws an exception by default.
I think there is no guarantee that a physical node must implement a working
`doExecute`. A columnar node can simply throw an exception saying row-based
execution is not implemented (like the default `doExecuteColumnar` does) if it
is not designed to run under row-based execution.
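For illustration, here is a minimal sketch of such a columnar-only node (the name `MyColumnarScanExec` and its body are hypothetical, not from this PR or from Spark itself):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.execution.LeafExecNode
import org.apache.spark.sql.vectorized.ColumnarBatch

// Hypothetical columnar-only operator: it implements doExecuteColumnar,
// while doExecute throws, mirroring (in reverse) the default
// doExecuteColumnar behavior described above.
case class MyColumnarScanExec(output: Seq[Attribute]) extends LeafExecNode {

  override def supportsColumnar: Boolean = true

  // Row-based execution is deliberately unsupported.
  override protected def doExecute(): RDD[InternalRow] =
    throw new UnsupportedOperationException(
      s"$nodeName does not support row-based execution")

  // Placeholder body for the sketch; a real node would produce batches.
  override protected def doExecuteColumnar(): RDD[ColumnarBatch] =
    sparkContext.emptyRDD[ColumnarBatch]
}
```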
In general, I also don't see a need to implement both working row-based and
columnar execution for a node. But because we don't actually have official
columnar execution nodes in Spark, I cannot point to an example from Spark
itself. Hopefully I've conveyed the idea clearly.
> prefersColumnar: this plan prefers to output columnar batches even if it is not explicitly requested (e.g., outputsColumnar is false).
BTW, I don't think `outputsColumnar` is a preference option (at least not in
its current usage in the rule). It actually means whether the output should be
columnar or not. Once `outputsColumnar` is false, the plan should produce
row-based output, which is why we add `ColumnarToRowExec` in that case.
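To make that concrete, here is a much-simplified sketch of the transition (not the actual `ApplyColumnarRulesAndInsertTransitions` rule, which also tracks the requested output format through the plan):

```scala
import org.apache.spark.sql.execution.{ColumnarToRowExec, SparkPlan}

// Simplified sketch: when row-based output is required but the operator
// produces columnar batches, wrap it so downstream operators see rows.
def ensureRowBasedOutput(plan: SparkPlan): SparkPlan =
  if (plan.supportsColumnar) ColumnarToRowExec(plan) else plan
```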
Yea, the preference I mentioned is pretty limited so far. I agree that we may
need a preference rule (or something like it) in the future. Since we don't
have real built-in columnar operators in Spark, the current situation is that
some columnar extensions/libraries replace row-based operators with columnar
operators during planning, as sketched below. I'm not sure we can estimate
which one is preferred during planning.
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala
##########
@@ -71,6 +71,11 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializable
val id: Int = SparkPlan.newPlanId()
+ /**
+ * Return true if this stage of the plan supports row-based execution.
Review comment:
BTW, IMHO, if we add columnar support for more operators in the future, I
guess that already implicitly indicates we "prefer" it over the current
execution (whole-stage codegen or the interpreted one)? Just like whole-stage
codegen, it seems we simply prefer it once we verify that it generally
performs better. This is similar to the third-party extensions/libraries
situation, I think.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]