spark git commit: [SPARK-22977][SQL] fix web UI SQL tab for CTAS

2018-02-12 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 79e8650cc -> 1e3118c2e


[SPARK-22977][SQL] fix web UI SQL tab for CTAS

## What changes were proposed in this pull request?

This is a regression in Spark 2.3.

In Spark 2.2, we had fragile UI support for SQL data writing commands: we only 
tracked the input query plan of `FileFormatWriter` and displayed its metrics. 
This was not ideal because we didn't know what triggered the writing (it could 
be a table insertion, CTAS, etc.), but it was still useful to see the metrics 
of the input query.

In Spark 2.3, we introduced a new mechanism, `DataWritingCommand`, to fix the 
UI issue entirely. Now these writing commands have real children, and we don't 
need to hack into `FileFormatWriter` for the UI. This also helps with 
`explain`: now `explain` can show the physical plan of the input query, while 
in 2.2 the physical writing plan was simply `ExecutedCommandExec` with no 
child.

However, there is a regression for CTAS. CTAS commands don't extend 
`DataWritingCommand`, and we no longer have the UI hack in `FileFormatWriter`, 
so the UI for CTAS is just an empty node. See 
https://issues.apache.org/jira/browse/SPARK-22977 for more information about 
this UI issue.

To fix it, we should apply the `DataWritingCommand` mechanism to CTAS commands.

TODO: In the future, we should refactor this part, create some physical-layer 
code pieces for data writing, and reuse them across the different writing 
commands. We should have different logical nodes for different operators, even 
when some of them share logic (e.g. CTAS, CREATE TABLE, INSERT TABLE); 
internally they can share the same physical logic.
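
The difference the `DataWritingCommand` mechanism makes can be illustrated 
with a minimal, self-contained sketch. These are not the actual Spark classes 
(the real ones are `RunnableCommand`, `DataWritingCommand`, and 
`ExecutedCommandExec`); this model only shows why a command that hides its 
query as an "inner child" renders as an empty node in the UI, while one that 
exposes it as a real child lets the UI traverse into the query plan:

```scala
// Hypothetical model of plan-tree traversal, as a UI visitor would do it.
sealed trait Plan {
  def name: String
  def children: Seq[Plan]
  // Collect every node reachable through *real* children, like the SQL tab.
  def visibleNodes: Seq[String] = name +: children.flatMap(_.visibleNodes)
}

final case class Scan(name: String) extends Plan {
  def children: Seq[Plan] = Nil
}

// Models pre-fix CTAS: the query is tracked internally but is not a child,
// so traversal stops at the command node and the UI shows an empty node.
final case class RunnableCtas(query: Plan) extends Plan {
  def name: String = "ExecutedCommandExec"
  def children: Seq[Plan] = Nil
}

// Models post-fix CTAS: the query is a real child of the command node,
// so the UI (and `explain`) can show the physical plan of the input query.
final case class DataWritingCtas(query: Plan) extends Plan {
  def name: String = "CreateTableAsSelect"
  def children: Seq[Plan] = Seq(query)
}

object Demo {
  def main(args: Array[String]): Unit = {
    val query = Scan("FileScan")
    println(RunnableCtas(query).visibleNodes)    // only the command node
    println(DataWritingCtas(query).visibleNodes) // command node plus the query
  }
}
```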

## How was this patch tested?

Manually tested.
For a data source table:
https://user-images.githubusercontent.com/3182036/35874155-bdffab28-0ba6-11e8-94a8-e32e106ba069.png
For a Hive table:
https://user-images.githubusercontent.com/3182036/35874161-c437e2a8-0ba6-11e8-98ed-7930f01432c5.png

Author: Wenchen Fan 

Closes #20521 from cloud-fan/UI.

(cherry picked from commit 0e2c266de7189473177f45aa68ea6a45c7e47ec3)
Signed-off-by: Wenchen Fan 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1e3118c2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1e3118c2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1e3118c2

Branch: refs/heads/branch-2.3
Commit: 1e3118c2ee0fe7d2c59cb3e2055709bb2809a6db
Parents: 79e8650
Author: Wenchen Fan 
Authored: Mon Feb 12 22:07:59 2018 +0800
Committer: Wenchen Fan 
Committed: Mon Feb 12 22:08:16 2018 +0800

--
 .../command/createDataSourceTables.scala| 21 
 .../sql/execution/datasources/DataSource.scala  | 44 +---
 .../datasources/DataSourceStrategy.scala|  2 +-
 .../apache/spark/sql/hive/HiveStrategies.scala  |  2 +-
 .../CreateHiveTableAsSelectCommand.scala| 55 +++-
 .../sql/hive/execution/HiveExplainSuite.scala   | 26 -
 6 files changed, 80 insertions(+), 70 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1e3118c2/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
index 306f43d..e974776 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
@@ -21,7 +21,9 @@ import java.net.URI
 
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources._
 import org.apache.spark.sql.sources.BaseRelation
 import org.apache.spark.sql.types.StructType
@@ -136,12 +138,11 @@ case class CreateDataSourceTableCommand(table: 
CatalogTable, ignoreIfExists: Boo
 case class CreateDataSourceTableAsSelectCommand(
 table: CatalogTable,
 mode: SaveMode,
-query: LogicalPlan)
-  extends RunnableCommand {
-
-  override protected def innerChildren: Seq[LogicalPlan] = Seq(query)
+query: LogicalPlan,
+outputColumns: Seq[Attribute])
+  extends DataWritingCommand {
 
-  override def run(sparkSession: SparkSession): Seq[Row] = {
+  override def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row] = {
 assert(table.tableType != CatalogTableType.VIEW)
 

spark git commit: [SPARK-22977][SQL] fix web UI SQL tab for CTAS

2018-02-12 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master caeb108e2 -> 0e2c266de


[SPARK-22977][SQL] fix web UI SQL tab for CTAS

## What changes were proposed in this pull request?

This is a regression in Spark 2.3.

In Spark 2.2, we had fragile UI support for SQL data writing commands: we only 
tracked the input query plan of `FileFormatWriter` and displayed its metrics. 
This was not ideal because we didn't know what triggered the writing (it could 
be a table insertion, CTAS, etc.), but it was still useful to see the metrics 
of the input query.

In Spark 2.3, we introduced a new mechanism, `DataWritingCommand`, to fix the 
UI issue entirely. Now these writing commands have real children, and we don't 
need to hack into `FileFormatWriter` for the UI. This also helps with 
`explain`: now `explain` can show the physical plan of the input query, while 
in 2.2 the physical writing plan was simply `ExecutedCommandExec` with no 
child.

However, there is a regression for CTAS. CTAS commands don't extend 
`DataWritingCommand`, and we no longer have the UI hack in `FileFormatWriter`, 
so the UI for CTAS is just an empty node. See 
https://issues.apache.org/jira/browse/SPARK-22977 for more information about 
this UI issue.

To fix it, we should apply the `DataWritingCommand` mechanism to CTAS commands.

TODO: In the future, we should refactor this part, create some physical-layer 
code pieces for data writing, and reuse them across the different writing 
commands. We should have different logical nodes for different operators, even 
when some of them share logic (e.g. CTAS, CREATE TABLE, INSERT TABLE); 
internally they can share the same physical logic.

## How was this patch tested?

Manually tested.
For a data source table:
https://user-images.githubusercontent.com/3182036/35874155-bdffab28-0ba6-11e8-94a8-e32e106ba069.png
For a Hive table:
https://user-images.githubusercontent.com/3182036/35874161-c437e2a8-0ba6-11e8-98ed-7930f01432c5.png

Author: Wenchen Fan 

Closes #20521 from cloud-fan/UI.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0e2c266d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0e2c266d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0e2c266d

Branch: refs/heads/master
Commit: 0e2c266de7189473177f45aa68ea6a45c7e47ec3
Parents: caeb108
Author: Wenchen Fan 
Authored: Mon Feb 12 22:07:59 2018 +0800
Committer: Wenchen Fan 
Committed: Mon Feb 12 22:07:59 2018 +0800

--
 .../command/createDataSourceTables.scala| 21 
 .../sql/execution/datasources/DataSource.scala  | 44 +---
 .../datasources/DataSourceStrategy.scala|  2 +-
 .../apache/spark/sql/hive/HiveStrategies.scala  |  2 +-
 .../CreateHiveTableAsSelectCommand.scala| 55 +++-
 .../sql/hive/execution/HiveExplainSuite.scala   | 26 -
 6 files changed, 80 insertions(+), 70 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0e2c266d/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
index 306f43d..e974776 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
@@ -21,7 +21,9 @@ import java.net.URI
 
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources._
 import org.apache.spark.sql.sources.BaseRelation
 import org.apache.spark.sql.types.StructType
@@ -136,12 +138,11 @@ case class CreateDataSourceTableCommand(table: 
CatalogTable, ignoreIfExists: Boo
 case class CreateDataSourceTableAsSelectCommand(
 table: CatalogTable,
 mode: SaveMode,
-query: LogicalPlan)
-  extends RunnableCommand {
-
-  override protected def innerChildren: Seq[LogicalPlan] = Seq(query)
+query: LogicalPlan,
+outputColumns: Seq[Attribute])
+  extends DataWritingCommand {
 
-  override def run(sparkSession: SparkSession): Seq[Row] = {
+  override def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row] = {
 assert(table.tableType != CatalogTableType.VIEW)
 assert(table.provider.isDefined)
 
@@ -163,7 +164,7 @@ case class CreateDataSourceTableAsSelectCommand(
   }