GitHub user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/1071#discussion_r13734140
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -22,45 +22,69 @@ import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{SQLContext, Row}
import org.apache.spark.sql.catalyst.expressions.{GenericRow, Attribute}
+trait PhysicalCommand {
+  /**
+   * A concrete command should override this lazy field to wrap up any side effects caused by the
+   * command or any other computation that should be evaluated exactly once. The value of this field
+   * can be used as the contents of the corresponding RDD generated from the physical plan of this
+   * command.
+   *
+   * The `execute()` method of all the physical command classes should reference `sideEffectResult`
+   * so that the command can be executed eagerly right after the command query is created.
+   */
+  protected[sql] lazy val sideEffectResult: Seq[Any] = Seq.empty[Any]
--- End diff --
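To make the exactly-once contract in the doc comment concrete, here is a minimal sketch in plain Scala, assuming no Spark on the classpath. `SetConfCommand`, `sideEffectRuns`, and `CommandQuery` are hypothetical names for illustration only; rows are modeled as `Seq[Any]` rather than `Row`, and `protected[sql]` is relaxed to `protected` because the sketch does not live in the `sql` package. The point is just that `execute()` reads the lazy val, so the side effect runs once no matter how often `execute()` is called.

```scala
import scala.collection.mutable

// Mirrors the trait in the diff (visibility relaxed for this standalone sketch).
trait PhysicalCommand {
  protected lazy val sideEffectResult: Seq[Any] = Seq.empty[Any]
}

// Hypothetical command that writes a config entry and counts how often that actually happens.
class SetConfCommand(key: String, value: String, conf: mutable.Map[String, String])
    extends PhysicalCommand {
  var sideEffectRuns: Int = 0

  // The side effect lives inside the lazy val, so it is evaluated at most once.
  override protected lazy val sideEffectResult: Seq[Any] = {
    sideEffectRuns += 1
    conf(key) = value
    Seq(s"$key=$value")
  }

  // execute() only reads the cached value; each result value becomes a one-column row.
  def execute(): Seq[Seq[Any]] = sideEffectResult.map(v => Seq(v))
}

// Hypothetical stand-in for the query object: forcing execute() in the constructor
// runs the command eagerly, as soon as the query is created.
class CommandQuery(command: SetConfCommand) {
  val rows: Seq[Seq[Any]] = command.execute()
}
```

Constructing `CommandQuery` forces `execute()`, which mirrors running the command eagerly on query creation; later `execute()` calls return the cached rows without touching `conf` again.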
> After some thought I think it's not only about naming; the semantics are
> wrong: an RDD[Row] with a single empty row indicates that the result schema
> has no fields, while an empty RDD[Row] can fit schemas with any number of
> fields. And, for SELECT 1, shouldn't it be an RDD[Row] with a single row
> containing a 1?
We use a Project to build the actual result.
However, it looks like I already did the separation, so that plan sounds
good to me :)
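The schema argument in the quoted comment can be checked with a toy model. This is a hypothetical sketch, not Spark's API: `MiniRelations`, `fitsSchema`, and `project` are made-up names, a row is a `Seq[Any]`, a schema is just a list of column names, and `project` is a simplified stand-in for a Project operator evaluated over a single empty input row.

```scala
object MiniRelations {
  // Hypothetical model: a row is an untyped sequence of values.
  type Row = Seq[Any]

  // A result fits a schema only if every row has exactly one value per column.
  def fitsSchema(rows: Seq[Row], schema: Seq[String]): Boolean =
    rows.forall(_.length == schema.length)

  // Zero rows: vacuously fits a schema of any width.
  val emptyResult: Seq[Row] = Seq.empty[Row]

  // One row with zero fields: fits only a schema with no columns.
  val oneEmptyRow: Seq[Row] = Seq(Seq.empty[Any])

  // Simplified Project: evaluate each expression against each input row.
  def project(exprs: Seq[Row => Any], input: Seq[Row]): Seq[Row] =
    input.map(row => exprs.map(expr => expr(row)))

  // SELECT 1: project the constant 1 over the single empty row,
  // yielding exactly one output row that contains 1.
  val selectOne: Seq[Row] = project(Seq((_: Row) => 1), oneEmptyRow)
}
```

Here `emptyResult` fits any schema, `oneEmptyRow` fits only the empty schema, and `selectOne` is a single row holding `1`, which is the shape the quoted question expects for SELECT 1.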