Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/1071#discussion_r13734016
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -22,45 +22,69 @@ import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{SQLContext, Row}
import org.apache.spark.sql.catalyst.expressions.{GenericRow, Attribute}
+trait PhysicalCommand {
+ /**
+ * A concrete command should override this lazy field to wrap up any side effects caused by the
+ * command or any other computation that should be evaluated exactly once. The value of this field
+ * can be used as the contents of the corresponding RDD generated from the physical plan of this
+ * command.
+ *
+ * The `execute()` method of all the physical command classes should reference `sideEffect` so
+ * that the command can be executed eagerly right after the command query is created.
+ */
+ protected[sql] lazy val sideEffectResult: Seq[Any] = Seq.empty[Any]
--- End diff --
After some thought, I think this is not only about naming; the semantics are
wrong: an `RDD[Row]` with an empty row indicates that the schema of the result
has *no* fields, while an empty `RDD[Row]` fits schemas with any number of
fields. And, for `SELECT 1`, shouldn't the result be an `RDD[Row]` with a
single row containing a `1`?
There does exist a `singleRowRdd` in `SparkStrategies`, used to translate
`logical.NoRelation`.
Currently `emptyResult` is only used in `execution.CacheCommand` and
`execution.NativeCommand`, so I'll just fix the definition of `emptyResult`
and see if all test suites pass.
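
To make the distinction above concrete, here is a minimal sketch in plain Scala. It does not use Spark: `Row` is modeled as `Seq[Any]` and an `RDD[Row]` as `Seq[Row]`, and `conforms` is a hypothetical helper invented for this illustration, not part of any Spark API.

```scala
// Illustration only: these are stand-ins, not Spark types.
// A Row is modeled as Seq[Any]; an RDD[Row] is modeled as Seq[Row].
object EmptyResultSketch {
  type Row = Seq[Any]

  // Hypothetical helper: every row must carry exactly one value per schema field.
  def conforms(rows: Seq[Row], schema: Seq[String]): Boolean =
    rows.forall(_.length == schema.length)

  // No rows at all: vacuously compatible with any schema.
  val noRows: Seq[Row] = Seq.empty

  // One row with zero values: only compatible with a schema that has no fields.
  val oneEmptyRow: Seq[Row] = Seq(Seq.empty)

  // What a `SELECT 1` result would look like: one row holding a single value.
  val selectOne: Seq[Row] = Seq(Seq(1))

  def main(args: Array[String]): Unit = {
    val twoFields = Seq("key", "value")
    println(conforms(noRows, twoFields))      // empty result vs. two-field schema
    println(conforms(oneEmptyRow, twoFields)) // single empty row vs. two-field schema
    println(conforms(oneEmptyRow, Seq.empty)) // single empty row vs. zero-field schema
    println(conforms(selectOne, Seq("c0")))   // one-value row vs. one-field schema
  }
}
```

Under this model, a command that returns nothing should yield `noRows` rather than `oneEmptyRow`, because only the former stays valid regardless of how many fields the result schema declares.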