[GitHub] spark pull request: [SPARK-2094][SQL] Exactly once command

liancheng Thu, 12 Jun 2014 17:15:28 -0700

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1071#discussion_r13734016
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
    @@ -22,45 +22,69 @@ import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.{SQLContext, Row}
     import org.apache.spark.sql.catalyst.expressions.{GenericRow, Attribute}
     
    +trait PhysicalCommand {
    +  /**
    +   * A concrete command should override this lazy field to wrap up any side effects caused by the
    +   * command or any other computation that should be evaluated exactly once. The value of this field
    +   * can be used as the contents of the corresponding RDD generated from the physical plan of this
    +   * command.
    +   *
    +   * The `execute()` method of all the physical command classes should reference `sideEffect` so
    +   * that the command can be executed eagerly right after the command query is created.
    +   */
    +  protected[sql] lazy val sideEffectResult: Seq[Any] = Seq.empty[Any]
    --- End diff --
    
    After some thought, I think it's not only about naming; the semantics are 
wrong: an `RDD[Row]` with an empty row indicates that the schema of the result 
has *no* fields, while an empty `RDD[Row]` can fit schemas with any number 
of fields. And for `SELECT 1`, shouldn't it be an `RDD[Row]` with a single row 
containing a `1`?
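
    To make the distinction concrete, here is a minimal sketch that uses plain 
Scala collections as stand-ins for `RDD[Row]`, so it runs without Spark. The 
names `Row`, `Result`, `singleEmptyRow`, `selectOne` and `fitsSchema` are 
hypothetical and exist only for illustration; they are not part of this patch.

```scala
// Sketch only: plain Scala collections stand in for RDD[Row]; no Spark needed.
object EmptyResultSketch {
  type Row = Seq[Any]      // hypothetical stand-in for a Spark SQL Row
  type Result = Seq[Row]   // hypothetical stand-in for RDD[Row]

  // One row with zero fields: implies the result schema has no columns.
  val singleEmptyRow: Result = Seq(Seq.empty[Any])

  // Zero rows: compatible with a schema of any width.
  val emptyResult: Result = Seq.empty[Row]

  // What `SELECT 1` would be expected to yield: one row holding the value 1.
  val selectOne: Result = Seq(Seq(1))

  // True when every row in `result` has exactly `fieldCount` fields.
  def fitsSchema(result: Result, fieldCount: Int): Boolean =
    result.forall(_.length == fieldCount)

  def main(args: Array[String]): Unit = {
    println(fitsSchema(emptyResult, 0))     // empty result, zero-column schema
    println(fitsSchema(emptyResult, 3))     // empty result, three-column schema
    println(fitsSchema(singleEmptyRow, 0))  // single empty row, zero columns
    println(fitsSchema(singleEmptyRow, 3))  // single empty row, three columns
    println(fitsSchema(selectOne, 1))       // SELECT 1, one-column schema
  }
}
```

    Under these assumptions, the empty result satisfies the check for every 
field count, while the single empty row satisfies it only for a schema with 
zero fields.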
    
    There does exist a `singleRowRdd` in `SparkStrategies`, used to translate 
`logical.NoRelation`.
    
    Currently `emptyResult` is only used in `execution.CacheCommand` and 
`execution.NativeCommand`, so I'll just fix the definition of `emptyResult` and 
see whether all test suites pass.



