[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19941

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19941#discussion_r157118985

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala ---

    @@ -67,8 +67,9 @@ case class InsertIntoDataSourceDirCommand(
         val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists
         try {
    -      sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query))
    -      dataSource.writeAndRead(saveMode, query)

--- End diff --

Revert done. Sorry, maybe I misunderstood your phrase "get rid of dataSource.writeAndRead". Following your discussion with Wenchen in https://github.com/apache/spark/pull/16481, shouldn't we make `writeAndRead` just return a BaseRelation without writing to the destination? Thank you for your patient reply.
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19941#discussion_r157017206

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala ---

    @@ -67,8 +67,9 @@ case class InsertIntoDataSourceDirCommand(
         val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists
         try {
    -      sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query))
    -      dataSource.writeAndRead(saveMode, query)

--- End diff --

Could we revert all the changes except this?
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19941#discussion_r156960449

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---

    @@ -490,20 +489,19 @@ case class DataSource(
       }

       /**
    -   * Writes the given [[LogicalPlan]] out to this [[DataSource]] and returns a [[BaseRelation]] for
    -   * the following reading.
    +   * Returns a [[BaseRelation]] for creating table after `planForWriting`. Only use
    +   * in `CreateDataSourceTableAsSelectCommand` while saving data to non-existing table.
        */
    -  def writeAndRead(mode: SaveMode, data: LogicalPlan): BaseRelation = {
    +  def getRelation(mode: SaveMode, data: LogicalPlan): BaseRelation = {
         if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
           throw new AnalysisException("Cannot save interval data type into external storage.")
         }
         providingClass.newInstance() match {
    -      case dataSource: CreatableRelationProvider =>
    +      case dataSource: RelationProvider =>

--- End diff --

Ah, I see.
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19941#discussion_r156958793

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala ---

    @@ -67,8 +67,9 @@ case class InsertIntoDataSourceDirCommand(
         val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists
         try {
    -      sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query))
    -      dataSource.writeAndRead(saveMode, query)

--- End diff --

You're right. I tested the query `INSERT OVERWRITE DIRECTORY '/home/liyuanjian/tmp' USING json SELECT 1 AS a, 'c' AS b;`:

![image](https://user-images.githubusercontent.com/4833765/33997144-6d159446-e11e-11e7-8bae-f485873d84c3.png)
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19941#discussion_r156811247

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---

    @@ -490,20 +489,19 @@ case class DataSource(
       }

       /**
    -   * Writes the given [[LogicalPlan]] out to this [[DataSource]] and returns a [[BaseRelation]] for
    -   * the following reading.
    +   * Returns a [[BaseRelation]] for creating table after `planForWriting`. Only use
    +   * in `CreateDataSourceTableAsSelectCommand` while saving data to non-existing table.
        */
    -  def writeAndRead(mode: SaveMode, data: LogicalPlan): BaseRelation = {
    +  def getRelation(mode: SaveMode, data: LogicalPlan): BaseRelation = {
         if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
           throw new AnalysisException("Cannot save interval data type into external storage.")
         }
         providingClass.newInstance() match {
    -      case dataSource: CreatableRelationProvider =>
    +      case dataSource: RelationProvider =>

--- End diff --

Yes, it will write to the destination, but `InsertIntoDataSourceDirCommand` only supports `FileFormat`.
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19941#discussion_r156810647

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala ---

    @@ -67,8 +67,9 @@ case class InsertIntoDataSourceDirCommand(
         val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists
         try {
    -      sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query))
    -      dataSource.writeAndRead(saveMode, query)

--- End diff --

I think it is not needed. You can try it.
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19941#discussion_r156661304

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---

    @@ -490,20 +489,19 @@ case class DataSource(
       }

       /**
    -   * Writes the given [[LogicalPlan]] out to this [[DataSource]] and returns a [[BaseRelation]] for
    -   * the following reading.
    +   * Returns a [[BaseRelation]] for creating table after `planForWriting`. Only use
    +   * in `CreateDataSourceTableAsSelectCommand` while saving data to non-existing table.
        */
    -  def writeAndRead(mode: SaveMode, data: LogicalPlan): BaseRelation = {
    +  def getRelation(mode: SaveMode, data: LogicalPlan): BaseRelation = {
         if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
           throw new AnalysisException("Cannot save interval data type into external storage.")
         }
         providingClass.newInstance() match {
    -      case dataSource: CreatableRelationProvider =>
    +      case dataSource: RelationProvider =>

--- End diff --

If `CreatableRelationProvider.createRelation` is used here, will it write to the destination one more time?
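The double-write concern above can be sketched with a toy model (these are illustrative stand-ins, not Spark's real `RelationProvider`/`CreatableRelationProvider` traits): the creatable variant of `createRelation` performs the write as a side effect, so calling it again after the write plan has already run would write the data twice, whereas a read-only relation lookup leaves the destination untouched.

```scala
// Toy stand-ins for the two provider styles discussed in the thread.
// All names here are illustrative; only the side-effect shape matters.
object DoubleWriteSketch {
  var writeCount = 0 // counts writes to the pretend "destination"

  trait ReadOnlyProvider {
    def createRelation(path: String): String // returns a relation handle, no write
  }
  trait WritingProvider {
    def createRelation(path: String, data: Seq[Int]): String // writes, then returns
  }

  class JsonLikeSource extends ReadOnlyProvider with WritingProvider {
    def createRelation(path: String): String = s"relation@$path"
    def createRelation(path: String, data: Seq[Int]): String = {
      writeCount += 1 // the side effect: data lands in the destination
      s"relation@$path"
    }
  }

  // Stand-in for the already-executed write plan (planForWriting in the diff).
  def runPlanForWriting(src: JsonLikeSource, data: Seq[Int]): Unit =
    src.createRelation("/tmp/out", data)
}
```

After `runPlanForWriting`, fetching the relation via the read-only variant keeps `writeCount` at 1, while invoking the writing variant again bumps it to 2, which is exactly the extra write the question is about.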
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19941#discussion_r156659744

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala ---

    @@ -67,8 +67,9 @@ case class InsertIntoDataSourceDirCommand(
         val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists
         try {
    -      sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query))
    -      dataSource.writeAndRead(saveMode, query)

--- End diff --

I implemented it like this at first, but after checking this patch (https://github.com/apache/spark/pull/18064/files) I changed to the current implementation. Is the wrapping of the execution id unnecessary here?
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19941#discussion_r156473299

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala ---

    @@ -67,8 +67,9 @@ case class InsertIntoDataSourceDirCommand(
         val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists
         try {
    -      sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query))
    -      dataSource.writeAndRead(saveMode, query)

--- End diff --

How about changing it to

```Scala
sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query)).toRdd
```
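The suggestion rests on the fact that building a query execution is lazy: constructing it plans the write but runs nothing, and appending `.toRdd` is what forces execution. A minimal sketch of that laziness pattern (toy classes, not Spark's actual `QueryExecution` API):

```scala
// Toy model of a lazy query execution: constructing it plans the write
// but executes nothing; first access to `toRdd` forces the side effect.
object LazyPlanSketch {
  var executed = 0 // how many times the planned write actually ran

  class QueryExecution(plan: () => Unit) {
    lazy val toRdd: Unit = plan() // first access triggers execution, exactly once
  }

  def executePlan(plan: () => Unit): QueryExecution = new QueryExecution(plan)
}
```

Calling `executePlan(...)` alone leaves `executed` at 0, which is why the original line without `.toRdd` would be a no-op; accessing `toRdd` bumps it to 1, and repeated accesses do not re-run the plan.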
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19941#discussion_r156466045

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---

    @@ -490,20 +489,19 @@ case class DataSource(
       }

       /**
    -   * Writes the given [[LogicalPlan]] out to this [[DataSource]] and returns a [[BaseRelation]] for
    -   * the following reading.
    +   * Returns a [[BaseRelation]] for creating table after `planForWriting`. Only use
    +   * in `CreateDataSourceTableAsSelectCommand` while saving data to non-existing table.
        */
    -  def writeAndRead(mode: SaveMode, data: LogicalPlan): BaseRelation = {
    +  def getRelation(mode: SaveMode, data: LogicalPlan): BaseRelation = {
         if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
           throw new AnalysisException("Cannot save interval data type into external storage.")
         }
         providingClass.newInstance() match {
    -      case dataSource: CreatableRelationProvider =>
    +      case dataSource: RelationProvider =>

--- End diff --

They are different public traits. We are unable to change this.
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/19941#discussion_r156258956

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---

    @@ -490,20 +489,19 @@ case class DataSource(
       }

       /**
    -   * Writes the given [[LogicalPlan]] out to this [[DataSource]] and returns a [[BaseRelation]] for
    -   * the following reading.
    +   * Returns a [[BaseRelation]] for creating table after `planForWriting`. Only use
    +   * in `CreateDataSourceTableAsSelectCommand` while saving data to non-existing table.
        */
    -  def writeAndRead(mode: SaveMode, data: LogicalPlan): BaseRelation = {
    +  def getRelation(mode: SaveMode, data: LogicalPlan): BaseRelation = {
         if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
           throw new AnalysisException("Cannot save interval data type into external storage.")
         }
         providingClass.newInstance() match {
    -      case dataSource: CreatableRelationProvider =>
    +      case dataSource: RelationProvider =>

--- End diff --

To the reviewers: the current implementation requires a data source to inherit both `RelationProvider` and `CreatableRelationProvider` ([like the UT](https://github.com/apache/spark/pull/19941/files#diff-d15c669bd62aabde751ae6c54f768d36R28)), because here we only want to get the relation via `RelationProvider.createRelation`, not save a DataFrame to a destination via `CreatableRelationProvider.createRelation`. I need more advice on how to do this better.
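The dispatch shape that forces the dual inheritance can be sketched as follows. This mirrors the `providingClass.newInstance() match` in the diff, but the trait members and class names here are illustrative, not Spark's real APIs:

```scala
// Sketch of dispatching on which provider trait an instantiated source
// implements. A source that must survive both the write path and the new
// relation-lookup path has to mix in both traits.
object ProviderDispatchSketch {
  trait RelationProvider { def relation: String }
  trait CreatableRelationProvider { def write(data: Seq[Int]): String }

  // Needed by planForWriting (writing) AND getRelation (reading back).
  class BothSource extends RelationProvider with CreatableRelationProvider {
    def relation: String = "relation"
    def write(data: Seq[Int]): String = relation
  }

  def getRelation(instance: Any): String = instance match {
    case p: RelationProvider => p.relation // read-only lookup, no write
    case _ =>
      throw new IllegalArgumentException("source does not support relation lookup")
  }
}
```

A source implementing only the creatable trait would fall through to the error case, which is the constraint on data sources that the comment is asking about.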
[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...
GitHub user xuanyuanking opened a pull request:

https://github.com/apache/spark/pull/19941

[SPARK-22753][SQL] Get rid of dataSource.writeAndRead

## What changes were proposed in this pull request?

As discussed in https://github.com/apache/spark/pull/16481 and https://github.com/apache/spark/pull/18975#discussion_r155454606, the BaseRelation returned by `dataSource.writeAndRead` is currently only used in `CreateDataSourceTableAsSelect`, and `planForWriting` and `writeAndRead` share some common code paths. In this patch I removed the `writeAndRead` function and added a `getRelation` function that is only used in `CreateDataSourceTableAsSelectCommand` while saving data to a non-existing table.

## How was this patch tested?

Existing UT.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuanyuanking/spark SPARK-22753

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19941.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19941

commit 14811636692810809033bc7caf03fcecb6939aa3
Author: Yuanjian Li
Date: 2017-12-11T12:12:45Z

    Get rid of dataSource.writeAndRead