[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19941


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-14 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19941#discussion_r157118985
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala
 ---
@@ -67,8 +67,9 @@ case class InsertIntoDataSourceDirCommand(
 
 val saveMode = if (overwrite) SaveMode.Overwrite else 
SaveMode.ErrorIfExists
 try {
-  
sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, 
query))
-  dataSource.writeAndRead(saveMode, query)
--- End diff --

Revert done.
Sorry, maybe I misunderstood your words about 'get rid of 
dataSource.writeAndRead'. Following your discussion with Wenchen in 
https://github.com/apache/spark/pull/16481, shouldn't we make `writeAndRead` 
just return a BaseRelation without writing to the destination? Thank you for 
your patient reply.


---




[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19941#discussion_r157017206
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala ---
@@ -67,8 +67,9 @@ case class InsertIntoDataSourceDirCommand(
 
     val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists
     try {
-      sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query))
-      dataSource.writeAndRead(saveMode, query)
--- End diff --

Could we revert all the changes except this?


---




[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-14 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19941#discussion_r156960449
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -490,20 +489,19 @@ case class DataSource(
   }
 
   /**
-   * Writes the given [[LogicalPlan]] out to this [[DataSource]] and returns a [[BaseRelation]] for
-   * the following reading.
+   * Returns a [[BaseRelation]] for creating table after `planForWriting`. Only use
+   * in `CreateDataSourceTableAsSelectCommand` while saving data to non-existing table.
    */
-  def writeAndRead(mode: SaveMode, data: LogicalPlan): BaseRelation = {
+  def getRelation(mode: SaveMode, data: LogicalPlan): BaseRelation = {
     if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
       throw new AnalysisException("Cannot save interval data type into external storage.")
     }
 
     providingClass.newInstance() match {
-      case dataSource: CreatableRelationProvider =>
+      case dataSource: RelationProvider =>
--- End diff --

Ah, I see. 


---




[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-14 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19941#discussion_r156958793
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala ---
@@ -67,8 +67,9 @@ case class InsertIntoDataSourceDirCommand(
 
     val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists
     try {
-      sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query))
-      dataSource.writeAndRead(saveMode, query)
--- End diff --

You're right. I tested the query `INSERT OVERWRITE DIRECTORY 
'/home/liyuanjian/tmp' USING json SELECT 1 AS a, 'c' AS b;`.

![image](https://user-images.githubusercontent.com/4833765/33997144-6d159446-e11e-11e7-8bae-f485873d84c3.png)


---




[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19941#discussion_r156811247
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -490,20 +489,19 @@ case class DataSource(
   }
 
   /**
-   * Writes the given [[LogicalPlan]] out to this [[DataSource]] and returns a [[BaseRelation]] for
-   * the following reading.
+   * Returns a [[BaseRelation]] for creating table after `planForWriting`. Only use
+   * in `CreateDataSourceTableAsSelectCommand` while saving data to non-existing table.
    */
-  def writeAndRead(mode: SaveMode, data: LogicalPlan): BaseRelation = {
+  def getRelation(mode: SaveMode, data: LogicalPlan): BaseRelation = {
     if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
       throw new AnalysisException("Cannot save interval data type into external storage.")
     }
 
     providingClass.newInstance() match {
-      case dataSource: CreatableRelationProvider =>
+      case dataSource: RelationProvider =>
--- End diff --

Yes, it will write to the destination, but `InsertIntoDataSourceDirCommand` only supports `FileFormat`.


---




[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19941#discussion_r156810647
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala ---
@@ -67,8 +67,9 @@ case class InsertIntoDataSourceDirCommand(
 
     val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists
     try {
-      sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query))
-      dataSource.writeAndRead(saveMode, query)
--- End diff --

I think it is not needed. You can try it. 


---




[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-13 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19941#discussion_r156661304
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -490,20 +489,19 @@ case class DataSource(
   }
 
   /**
-   * Writes the given [[LogicalPlan]] out to this [[DataSource]] and returns a [[BaseRelation]] for
-   * the following reading.
+   * Returns a [[BaseRelation]] for creating table after `planForWriting`. Only use
+   * in `CreateDataSourceTableAsSelectCommand` while saving data to non-existing table.
    */
-  def writeAndRead(mode: SaveMode, data: LogicalPlan): BaseRelation = {
+  def getRelation(mode: SaveMode, data: LogicalPlan): BaseRelation = {
     if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
       throw new AnalysisException("Cannot save interval data type into external storage.")
     }
 
     providingClass.newInstance() match {
-      case dataSource: CreatableRelationProvider =>
+      case dataSource: RelationProvider =>
--- End diff --

If we used `CreatableRelationProvider.createRelation` here, would it write to 
the destination one more time?


---




[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-13 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19941#discussion_r156659744
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala ---
@@ -67,8 +67,9 @@ case class InsertIntoDataSourceDirCommand(
 
     val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists
     try {
-      sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query))
-      dataSource.writeAndRead(saveMode, query)
--- End diff --

I implemented it like this at first, but after checking this patch 
(https://github.com/apache/spark/pull/18064/files) I changed it to the current 
implementation. Is the wrapping of the execution id unnecessary here?
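For context, here is a plain-Scala sketch of what the execution-id wrapping from #18064 provides. This is an analogy with made-up names, not Spark's actual `SQLExecution` API: the idea is that everything run inside the block is tagged with one id, so the UI can group all jobs launched by a single SQL statement.

```scala
// Analogy only: a stand-in for Spark's SQLExecution.withNewExecutionId.
// The real helper stores the id in a local property so that every job
// launched by `body` is grouped under one SQL execution in the web UI.
object ExecutionIdSketch {
  private var counter = 0L

  def withNewExecutionId[T](body: => T): (Long, T) = {
    counter += 1
    val id = counter        // hypothetical id; Spark uses a shared counter
    val result = body       // the write (or any action) runs "under" this id
    (id, result)
  }
}
```

Whether the wrapper is needed here depends on whether the `planForWriting` path already runs under an execution id, which is exactly the question raised above.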


---




[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19941#discussion_r156473299
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala ---
@@ -67,8 +67,9 @@ case class InsertIntoDataSourceDirCommand(
 
     val saveMode = if (overwrite) SaveMode.Overwrite else SaveMode.ErrorIfExists
     try {
-      sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query))
-      dataSource.writeAndRead(saveMode, query)
--- End diff --

How about changing it to
```Scala
sparkSession.sessionState.executePlan(dataSource.planForWriting(saveMode, query)).toRdd
```
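The `.toRdd` at the end is what actually triggers the write: `executePlan` only builds a `QueryExecution`, and nothing runs until an action forces it. A plain-Scala sketch of that laziness, using a stand-in class rather than the real `QueryExecution`:

```scala
// Minimal stand-in type; not the real Spark class.
case class Row(values: Seq[Any])

// Stand-in for QueryExecution: `toRdd` is lazy, so merely constructing the
// object performs no work; forcing `toRdd` is what runs the write.
class QueryExecutionSketch(val planName: String) {
  var executed = false
  lazy val toRdd: Seq[Row] = {   // Spark's version returns an RDD[InternalRow]
    executed = true              // side effect stands in for the actual write
    Seq.empty
  }
}
```

This is why calling `executePlan(...)` alone, as in the diff above, leaves the command un-executed.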


---




[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19941#discussion_r156466045
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -490,20 +489,19 @@ case class DataSource(
   }
 
   /**
-   * Writes the given [[LogicalPlan]] out to this [[DataSource]] and returns a [[BaseRelation]] for
-   * the following reading.
+   * Returns a [[BaseRelation]] for creating table after `planForWriting`. Only use
+   * in `CreateDataSourceTableAsSelectCommand` while saving data to non-existing table.
    */
-  def writeAndRead(mode: SaveMode, data: LogicalPlan): BaseRelation = {
+  def getRelation(mode: SaveMode, data: LogicalPlan): BaseRelation = {
     if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
       throw new AnalysisException("Cannot save interval data type into external storage.")
     }
 
     providingClass.newInstance() match {
-      case dataSource: CreatableRelationProvider =>
+      case dataSource: RelationProvider =>
--- End diff --

They are different public traits. We are unable to change this.
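The difference in shape is visible from the signatures. The following is paraphrased from `org.apache.spark.sql.sources` in Spark 2.2, with stub types substituted so the sketch is self-contained; the real traits take Spark's `SQLContext`, `DataFrame`, and so on:

```scala
// Stub stand-ins for the Spark types involved; only the trait shapes matter here.
case class SQLContext()
case class DataFrame()
class BaseRelation
sealed trait SaveMode
object SaveMode { case object Overwrite extends SaveMode; case object ErrorIfExists extends SaveMode }

// Read-side provider: builds a relation from options alone, writes nothing.
trait RelationProvider {
  def createRelation(sqlContext: SQLContext, parameters: Map[String, String]): BaseRelation
}

// Write-side provider: the extra `mode` and `data` parameters mean this
// variant persists `data` before returning the relation, so calling it
// after planForWriting has already run would write the output twice.
trait CreatableRelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation
}
```

Since both are public developer APIs, their signatures cannot be merged or changed without breaking third-party sources.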


---




[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-11 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19941#discussion_r156258956
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -490,20 +489,19 @@ case class DataSource(
   }
 
   /**
-   * Writes the given [[LogicalPlan]] out to this [[DataSource]] and returns a [[BaseRelation]] for
-   * the following reading.
+   * Returns a [[BaseRelation]] for creating table after `planForWriting`. Only use
+   * in `CreateDataSourceTableAsSelectCommand` while saving data to non-existing table.
    */
-  def writeAndRead(mode: SaveMode, data: LogicalPlan): BaseRelation = {
+  def getRelation(mode: SaveMode, data: LogicalPlan): BaseRelation = {
     if (data.schema.map(_.dataType).exists(_.isInstanceOf[CalendarIntervalType])) {
       throw new AnalysisException("Cannot save interval data type into external storage.")
     }
 
     providingClass.newInstance() match {
-      case dataSource: CreatableRelationProvider =>
+      case dataSource: RelationProvider =>
--- End diff --

To the reviewers: the current implementation requires a data source to inherit 
both `RelationProvider` and `CreatableRelationProvider` ([like the 
UT](https://github.com/apache/spark/pull/19941/files#diff-d15c669bd62aabde751ae6c54f768d36R28)),
 because here we only want to get the relation via 
`RelationProvider.createRelation`, not save a DataFrame to a destination via 
`CreatableRelationProvider.createRelation`. I need more advice on how to do this better.
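A sketch of the shape this forces on data sources, using stand-in trait names and a hypothetical `path` option rather than the real Spark API: the provider must mix in both traits, using one `createRelation` overload purely to read the relation back and the other for the write path.

```scala
// Stand-in types; not the real Spark sources API.
case class Relation(path: String)

trait ReadProvider {
  def createRelation(parameters: Map[String, String]): Relation
}
trait WriteProvider {
  def createRelation(parameters: Map[String, String], data: Seq[String]): Relation
}

// A source inheriting both, as the PR's test data source does: the read-side
// overload can be called without re-writing data that planForWriting already wrote.
class DualProvider extends ReadProvider with WriteProvider {
  override def createRelation(parameters: Map[String, String]): Relation =
    Relation(parameters.getOrElse("path", ""))        // read side: no write happens
  override def createRelation(parameters: Map[String, String], data: Seq[String]): Relation = {
    // write side: a real source would persist `data` here before returning
    Relation(parameters.getOrElse("path", ""))
  }
}
```

The cost of this design, as the comment notes, is that sources implementing only one of the two traits no longer fit this code path.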


---




[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-11 Thread xuanyuanking
GitHub user xuanyuanking opened a pull request:

https://github.com/apache/spark/pull/19941

[SPARK-22753][SQL] Get rid of dataSource.writeAndRead

## What changes were proposed in this pull request?

As discussed in https://github.com/apache/spark/pull/16481 and 
https://github.com/apache/spark/pull/18975#discussion_r155454606, the 
`BaseRelation` returned by `dataSource.writeAndRead` is currently only used in 
`CreateDataSourceTableAsSelect`, and `planForWriting` and `writeAndRead` share 
some common code paths.
In this patch I removed the `writeAndRead` function and added a `getRelation` 
function that is used only in `CreateDataSourceTableAsSelectCommand` when saving 
data to a non-existing table.

## How was this patch tested?

Existing UT


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuanyuanking/spark SPARK-22753

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19941.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19941


commit 14811636692810809033bc7caf03fcecb6939aa3
Author: Yuanjian Li 
Date:   2017-12-11T12:12:45Z

Get rid of dataSource.writeAndRead




---
