[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/22941

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22941#discussion_r230622708

Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala

```
@@ -589,4 +590,33 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
       sql("INSERT INTO TABLE test_table SELECT 2, null")
     }
   }
+
+  test("SPARK-25936 InsertIntoDataSourceCommand does not use Cached Data") {
```

It works. Do we need to fix this plan issue?
[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22941#discussion_r230609046

Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala

```
@@ -589,4 +590,33 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
       sql("INSERT INTO TABLE test_table SELECT 2, null")
     }
   }
+
+  test("SPARK-25936 InsertIntoDataSourceCommand does not use Cached Data") {
```

You can move this test to CachedTableSuite.scala and use the helper functions there to verify whether the cache is used. See the example:

```
spark.range(2).createTempView("test_view")
spark.catalog.cacheTable("test_view")
val rddId = rddIdOf("test_view")
assert(!isMaterialized(rddId))

sql("INSERT INTO TABLE test_table SELECT * FROM test_view")
assert(isMaterialized(rddId))
```
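The assertions in the reviewer's example rely on the fact that caching a table registers it without materializing it; only the first read (here, the INSERT) computes the cached RDD. A minimal stdlib-only sketch of that lazy-materialization pattern (`LazyCacheEntry` and its `isMaterialized` are hypothetical names for illustration, not the actual CachedTableSuite helpers):

```scala
// A lazily computed cache entry that records whether it has ever been
// materialized, mirroring the cached-but-not-yet-computed state that
// the test's isMaterialized(rddId) assertions check for.
class LazyCacheEntry[T](compute: => T) {
  private var materializedFlag = false
  lazy val value: T = { materializedFlag = true; compute }
  def isMaterialized: Boolean = materializedFlag
}

object LazyCacheDemo {
  def main(args: Array[String]): Unit = {
    val entry = new LazyCacheEntry(Seq(0L, 1L)) // like caching test_view
    assert(!entry.isMaterialized)               // cached, but nothing computed yet
    val rows = entry.value                      // first read, e.g. by the INSERT
    assert(entry.isMaterialized)                // now the cache is materialized
    println(rows)
  }
}
```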
[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22941#discussion_r230608937

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoDataSourceCommand.scala

```
@@ -30,14 +30,13 @@ import org.apache.spark.sql.sources.InsertableRelation
 case class InsertIntoDataSourceCommand(
     logicalRelation: LogicalRelation,
     query: LogicalPlan,
-    overwrite: Boolean)
-  extends RunnableCommand {
+    overwrite: Boolean,
+    outputColumnNames: Seq[String])
+  extends DataWritingCommand {

-  override protected def innerChildren: Seq[QueryPlan[_]] = Seq(query)
-
-  override def run(sparkSession: SparkSession): Seq[Row] = {
+  override def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row] = {
     val relation = logicalRelation.relation.asInstanceOf[InsertableRelation]
-    val data = Dataset.ofRows(sparkSession, query)
```

This will use the cached data, although the plan does not show that the cached data is used.
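The point being made is that `Dataset.ofRows(sparkSession, query)` re-analyzes the logical plan, which gives Spark's cache manager a chance to substitute cached in-memory data for a matching sub-plan, even though the printed plan for the command does not reflect that substitution. A self-contained, stdlib-only sketch of that substitution step (the types `Plan`, `Scan`, `Project`, `CachedScan` and the function `replaceCached` are illustrative names, not Spark APIs):

```scala
// Minimal stand-in for the cache manager's rewrite: when a sub-plan
// matches a cached entry, replace it with a cached-scan node.
sealed trait Plan
case class Scan(table: String) extends Plan
case class Project(child: Plan) extends Plan
case class CachedScan(table: String) extends Plan

object CacheDemo {
  // The "cache": tables whose scans should be served from memory.
  val cached: Set[String] = Set("test_view")

  // Rewrite the plan bottom-up, swapping in cached scans where they match.
  def replaceCached(plan: Plan): Plan = plan match {
    case Scan(t) if cached(t) => CachedScan(t)
    case Project(child)       => Project(replaceCached(child))
    case other                => other
  }

  def main(args: Array[String]): Unit = {
    val plan = Project(Scan("test_view"))
    println(replaceCached(plan)) // prints Project(CachedScan(test_view))
  }
}
```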
[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/22941

[SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does not use Cached Data

## What changes were proposed in this pull request?

```scala
spark.sql("""
  CREATE TABLE jdbcTable
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url "jdbc:mysql://localhost:3306/test",
    dbtable "test.InsertIntoDataSourceCommand",
    user "hive",
    password "hive"
  )""")
spark.range(2).createTempView("test_view")
spark.catalog.cacheTable("test_view")
spark.sql("INSERT INTO TABLE jdbcTable SELECT * FROM test_view").explain
```

Before this PR:
```
== Physical Plan ==
Execute InsertIntoDataSourceCommand
   +- InsertIntoDataSourceCommand
         +- Project
            +- SubqueryAlias
               +- Range (0, 2, step=1, splits=Some(8))
```

After this PR:
```
== Physical Plan ==
Execute InsertIntoDataSourceCommand InsertIntoDataSourceCommand Relation[id#8L] JDBCRelation(test.InsertIntoDataSourceCommand) [numPartitions=1], false, [id]
+- *(1) InMemoryTableScan [id#0L]
      +- InMemoryRelation [id#0L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Range (0, 2, step=1, splits=8)
```

## How was this patch tested?

Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25936

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22941.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22941

----

commit 2968b2c34f42f6b0bcb5e373a400377abfd09e86
Author: Yuming Wang
Date: 2018-11-04T10:36:20Z

    Fix InsertIntoDataSourceCommand does not use Cached Data