[GitHub] [spark] AmplabJenkins removed a comment on pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29107:
URL: https://github.com/apache/spark/pull/29107#issuecomment-662831333







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29190: [do not review][testing master PR build][discard] comment change

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29190:
URL: https://github.com/apache/spark/pull/29190#issuecomment-662831354










[GitHub] [spark] AmplabJenkins commented on pull request #29190: [do not review][testing master PR build][discard] comment change

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29190:
URL: https://github.com/apache/spark/pull/29190#issuecomment-662831354










[GitHub] [spark] AmplabJenkins commented on pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29107:
URL: https://github.com/apache/spark/pull/29107#issuecomment-662831333










[GitHub] [spark] SparkQA commented on pull request #29190: [do not review][testing master PR build][discard] comment change

2020-07-22 Thread GitBox


SparkQA commented on pull request #29190:
URL: https://github.com/apache/spark/pull/29190#issuecomment-662831076


   **[Test build #126387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126387/testReport)** for PR 29190 at commit [`fa41817`](https://github.com/apache/spark/commit/fa41817475bb6c1bdb067979f297fe63ce5b7790).






[GitHub] [spark] cloud-fan commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-22 Thread GitBox


cloud-fan commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-662829893


   I agree `host` is better as it's general to any deployment model. I think we 
should only use `worker` when it refers to the standalone worker. Maybe we can 
start following this principle from now on.






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459225681



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##
@@ -361,7 +361,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   connectionProperties: Properties): DataFrame = {
 assertNoSpecifiedSchema("jdbc")
 // connectionProperties should override settings in extraOptions.
-val params = extraOptions.toMap ++ connectionProperties.asScala.toMap
+val params = extraOptions ++ connectionProperties.asScala

Review comment:
   `extraOptions ++ connectionProperties.asScala` guarantees that `connectionProperties` overrides `extraOptions`; the `CaseInsensitiveMap.++` operation was added for exactly that.
   
   Also, `new JDBCOptions(url, table, params)` works like the following. The passed `params: CaseInsensitiveMap` guarantees that `JDBC_URL` and `JDBC_TABLE_NAME` override `parameters`.
   ```scala
   def this(url: String, table: String, parameters: Map[String, String]) = {
     this(CaseInsensitiveMap(parameters ++ Map(
       JDBCOptions.JDBC_URL -> url,
       JDBCOptions.JDBC_TABLE_NAME -> table)))
   }
   ```
   
   The code above implements the logic you described.
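[Editor's note] The override direction being discussed follows from how `++` behaves on plain Scala maps. A minimal sketch (plain Scala, not Spark's actual `CaseInsensitiveMap`; the option names below are made up):

```scala
object MergeOrderSketch {
  def main(args: Array[String]): Unit = {
    // In Scala's immutable Map, the right-hand operand of ++ wins on duplicate
    // keys. This is the property that `extraOptions ++ connectionProperties.asScala`
    // relies on: connection properties override previously set options.
    val extraOptions = Map("user" -> "default", "fetchsize" -> "100")
    val connectionProperties = Map("user" -> "admin")

    val params = extraOptions ++ connectionProperties

    assert(params("user") == "admin")    // overridden by the right operand
    assert(params("fetchsize") == "100") // keys absent from the right side survive
    println(params)
  }
}
```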








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29191:
URL: https://github.com/apache/spark/pull/29191#issuecomment-662829543










[GitHub] [spark] AmplabJenkins commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29191:
URL: https://github.com/apache/spark/pull/29191#issuecomment-662829543
















[GitHub] [spark] SparkQA commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


SparkQA commented on pull request #29191:
URL: https://github.com/apache/spark/pull/29191#issuecomment-662829231


   **[Test build #126386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126386/testReport)** for PR 29191 at commit [`3e55841`](https://github.com/apache/spark/commit/3e558415be4238fa955a5ace6ca7a480f45abffd).












[GitHub] [spark] agrawaldevesh commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-22 Thread GitBox


agrawaldevesh commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-662828785


   > Merged, thanks everyone.
   > 
   > I don't know if we have a good principle around naming host v. worker with 
the deprecation.
   
   Thanks @holdenk for championing and shepherding this change!
   
   @cloud-fan, I agree that this host <=> worker naming is confusing. It was especially confusing for me as a newcomer: `ExecutorProcessLost` has a field `workerLost` (renamed from the erstwhile `slaveLost`). I had to pick between `Worker` and `Host` in `ExecutorDecommissionInfo.isHostDecommissioned`, and I went with `Worker` initially but later changed it to `Host`.
   
   But in my mind they are the same thing: there is basically one external shuffle service (if any) per worker (or host). The word `worker` might carry an unintended connotation of Standalone mode, so I refrained from using it. But its usage is quite pervasive already.
   
   It would be nice to have consistent terminology around `worker` vs `host` vs `node` :-P. In the absence of any, I went with `host` here.






[GitHub] [spark] cloud-fan commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459223512



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##
@@ -361,7 +361,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   connectionProperties: Properties): DataFrame = {
 assertNoSpecifiedSchema("jdbc")
 // connectionProperties should override settings in extraOptions.
-val params = extraOptions.toMap ++ connectionProperties.asScala.toMap
+val params = extraOptions ++ connectionProperties.asScala

Review comment:
   and to guarantee `connectionProperties` overrides `extraOptions`, we may 
also need to do `map1.filterKeys... ++ map2`
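[Editor's note] A sketch of that suggestion in plain Scala. The `mergeOverriding` helper is hypothetical; it illustrates the `map1.filterKeys... ++ map2` idea with case-insensitive key matching:

```scala
object FilterKeysMergeSketch {
  // Hypothetical helper: drop keys from `base` that `overrides` also defines
  // (compared case-insensitively), then concatenate. An override such as
  // "user" then replaces a previously set "User" instead of coexisting with it.
  def mergeOverriding(base: Map[String, String],
                      overrides: Map[String, String]): Map[String, String] = {
    val overrideKeys = overrides.keySet.map(_.toLowerCase)
    base.filter { case (k, _) => !overrideKeys.contains(k.toLowerCase) } ++ overrides
  }

  def main(args: Array[String]): Unit = {
    val extraOptions = Map("User" -> "default", "fetchsize" -> "100")
    val connectionProperties = Map("user" -> "admin")

    // Plain ++ keeps both "User" and "user" because the keys differ by case.
    assert((extraOptions ++ connectionProperties).size == 3)

    // The filtered merge keeps only the overriding entry.
    val merged = mergeOverriding(extraOptions, connectionProperties)
    assert(merged == Map("fetchsize" -> "100", "user" -> "admin"))
    println(merged)
  }
}
```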








[GitHub] [spark] agrawaldevesh commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-22 Thread GitBox


agrawaldevesh commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r459223391



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
##
@@ -454,6 +478,28 @@ case class BroadcastHashJoinExec(
 val (matched, checkCondition, _) = getJoinCondition(ctx, input)
 val numOutput = metricTerm(ctx, "numOutputRows")
 
+if (isNullAwareAntiJoin) {
+  require(leftKeys.length == 1, "leftKeys length should be 1")
+  require(rightKeys.length == 1, "rightKeys length should be 1")
+  require(joinType == LeftAnti, "joinType must be LeftAnti.")
+  require(buildSide == BuildRight, "buildSide must be BuildRight.")
+  require(SQLConf.get.nullAwareAntiJoinOptimizeEnabled,
+    "nullAwareAntiJoinOptimizeEnabled must be on for null aware anti join optimize.")
+  require(checkCondition == "", "null aware anti join optimize condition should be empty.")
+
+  if (broadcastRelation.value.inputEmpty) {
+return s"""
+   |// singleColumn NAAJ inputEmpty(true) accept all
+   |$numOutput.add(1);
+   |${consume(ctx, input)}
+ """.stripMargin
+  } else if (broadcastRelation.value.anyNullKeyExists) {
+return s"""
+   |// singleColumn NAAJ inputEmpty(false) anyNullKeyExists(true) reject all

Review comment:
   > @maryannxue @agrawaldevesh @maropu @viirya what do you think?
   
   Thanks for the empirical analysis @cloud-fan ... that always helps to seal an argument :-)
   
   I do agree that introducing a new node is sometimes more pain than it's worth, so I would be equally okay with just modifying BHJ. I think both are fine choices, and I would leave it to @leanken to make the call. The ideal scenario would be a minimal code diff in BHJ, with a new node only as a fallback if the diff cannot be reduced to our satisfaction.
   
   Either way, we ought to decide one way or the other soon, since the PR is becoming slightly confusing with both approaches co-existing :-)
   
   Are there any performance benchmarks we should re-run for BHJ to ensure there is no regression?








[GitHub] [spark] cloud-fan commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459223066



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##
@@ -361,7 +361,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   connectionProperties: Properties): DataFrame = {
 assertNoSpecifiedSchema("jdbc")
 // connectionProperties should override settings in extraOptions.
-val params = extraOptions.toMap ++ connectionProperties.asScala.toMap
+val params = extraOptions ++ connectionProperties.asScala

Review comment:
   Shall we call `extraOptions.toMap` here, to be case-preserving?
   
   `params` is used to create `new JDBCOptions(url, table, params)`, and `JDBCOptions.asProperties` needs to be case-preserving.
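[Editor's note] The case-preservation concern can be illustrated with a toy case-insensitive map. This is a sketch only: `ToyCIMap` is invented for illustration, standing in for Spark's `CaseInsensitiveMap`, which keeps the original entries and (per this PR) exposes them via `toMap`:

```scala
object CasePreservationSketch {
  // Toy case-insensitive map: lookups normalize the key, but the original
  // entries are also kept so they can be recovered with their original casing.
  final class ToyCIMap(val originalMap: Map[String, String]) {
    private val normalized = originalMap.map { case (k, v) => k.toLowerCase -> v }
    def get(key: String): Option[String] = normalized.get(key.toLowerCase)
    def toMap: Map[String, String] = originalMap // case-preserving view
  }

  def main(args: Array[String]): Unit = {
    val m = new ToyCIMap(Map("FetchSize" -> "100"))
    assert(m.get("fetchsize").contains("100")) // case-insensitive lookup works
    assert(m.toMap.keySet == Set("FetchSize")) // original casing is preserved
  }
}
```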








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29191:
URL: https://github.com/apache/spark/pull/29191#issuecomment-662825995










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-662825993










[GitHub] [spark] AmplabJenkins commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29191:
URL: https://github.com/apache/spark/pull/29191#issuecomment-662825995










[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-662825993










[GitHub] [spark] SparkQA commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


SparkQA commented on pull request #29191:
URL: https://github.com/apache/spark/pull/29191#issuecomment-662825656


   **[Test build #126384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126384/testReport)** for PR 29191 at commit [`e64e7ad`](https://github.com/apache/spark/commit/e64e7adfab001a508591ef1e0a6ecf214af1ac9e).






[GitHub] [spark] SparkQA commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-22 Thread GitBox


SparkQA commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-662825676


   **[Test build #126385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126385/testReport)** for PR 29014 at commit [`7203980`](https://github.com/apache/spark/commit/72039805bbf2d6ba149d03ea1a3be1d62a58d092).






[GitHub] [spark] HyukjinKwon commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


HyukjinKwon commented on pull request #29191:
URL: https://github.com/apache/spark/pull/29191#issuecomment-662824703


   Looks good






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29191:
URL: https://github.com/apache/spark/pull/29191#issuecomment-662824244










[GitHub] [spark] AmplabJenkins commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29191:
URL: https://github.com/apache/spark/pull/29191#issuecomment-662824244










[GitHub] [spark] SparkQA commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


SparkQA commented on pull request #29191:
URL: https://github.com/apache/spark/pull/29191#issuecomment-662824010


   **[Test build #126383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126383/testReport)** for PR 29191 at commit [`e5ddea4`](https://github.com/apache/spark/commit/e5ddea4cd0fc8b6f4b9da722a0c1b1d2fb890126).






[GitHub] [spark] HyukjinKwon commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


HyukjinKwon commented on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662824076


   Yeah, we still should address 
https://github.com/apache/spark/pull/29196#discussion_r459209060 but the 
changes here look good.






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459219673



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##
@@ -288,7 +288,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
   val provider = maybeV2Provider.get
   val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
 provider, df.sparkSession.sessionState.conf)
-  val options = sessionOptions ++ extraOptions

Review comment:
   Oh, got it.








[GitHub] [spark] SparkQA commented on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…

2020-07-22 Thread GitBox


SparkQA commented on pull request #29199:
URL: https://github.com/apache/spark/pull/29199#issuecomment-662822321


   **[Test build #126382 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126382/testReport)**
 for PR 29199 at commit 
[`9b1f28a`](https://github.com/apache/spark/commit/9b1f28a72dc140f7b33a1ff91113c4633535b580).






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459218262



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##
@@ -288,7 +288,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) 
{
   val provider = maybeV2Provider.get
   val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
 provider, df.sparkSession.sessionState.conf)
-  val options = sessionOptions ++ extraOptions
+  val options = sessionOptions ++ extraOptions.originalMap

Review comment:
   Yes. `toMap` is added to support the existing code pattern.
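The diff above swaps `extraOptions` for `extraOptions.originalMap` when merging options. A minimal sketch of why the original map matters (the `CIMap` class and option names below are illustrative stand-ins, not Spark's actual `CaseInsensitiveMap`): the wrapper normalizes keys for lookup, so exposing the stored map directly would leak lower-cased keys instead of the user-supplied ones.

```scala
class CIMap[T](val originalMap: Map[String, T]) {
  // Lookups go through a lower-cased index, so reads ignore case.
  private val lowerKeyed: Map[String, T] =
    originalMap.map { case (k, v) => k.toLowerCase -> v }

  def get(key: String): Option[T] = lowerKeyed.get(key.toLowerCase)

  // Mirrors the `toMap` added in this follow-up: hand back the map with
  // the caller's original key casing intact.
  def toMap: Map[String, T] = originalMap
}

val extraOptions = new CIMap(Map("PathStyleAccess" -> "true"))

assert(extraOptions.get("pathstyleaccess").contains("true")) // case-insensitive read
assert(extraOptions.toMap.keySet == Set("PathStyleAccess"))  // original casing preserved
```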








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29199:
URL: https://github.com/apache/spark/pull/29199#issuecomment-662820990










[GitHub] [spark] AmplabJenkins commented on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29199:
URL: https://github.com/apache/spark/pull/29199#issuecomment-662820990










[GitHub] [spark] SparkQA commented on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…

2020-07-22 Thread GitBox


SparkQA commented on pull request #29199:
URL: https://github.com/apache/spark/pull/29199#issuecomment-662820692


   **[Test build #126381 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126381/testReport)**
 for PR 29199 at commit 
[`a007fa1`](https://github.com/apache/spark/commit/a007fa1e5f57e22682881903c6076d46c9676905).






[GitHub] [spark] kujon commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support

2020-07-22 Thread GitBox


kujon commented on pull request #29192:
URL: https://github.com/apache/spark/pull/29192#issuecomment-662819772


   > Could you add tests in PostgresIntegrationSuite then check if the test can 
pass on your local env? (Note: our testing framework, Jenkins, does not run it).
   
   will do!






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662819514










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29199:
URL: https://github.com/apache/spark/pull/29199#issuecomment-662819301










[GitHub] [spark] AmplabJenkins commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662819514










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29186: [SPARK-32386][SS][TESTS] Fix temp view leaking in Structured Streaming tests

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29186:
URL: https://github.com/apache/spark/pull/29186#issuecomment-662819334










[GitHub] [spark] AmplabJenkins commented on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29199:
URL: https://github.com/apache/spark/pull/29199#issuecomment-662819301










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29167: [SPARK-32374][SQL] Disallow setting properties when creating temporary views

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29167:
URL: https://github.com/apache/spark/pull/29167#issuecomment-662819369










[GitHub] [spark] AngersZhuuuu commented on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…

2020-07-22 Thread GitBox


AngersZhuuuu commented on pull request #29199:
URL: https://github.com/apache/spark/pull/29199#issuecomment-662819271


   FYI @maropu 






[GitHub] [spark] AmplabJenkins commented on pull request #29186: [SPARK-32386][SS][TESTS] Fix temp view leaking in Structured Streaming tests

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29186:
URL: https://github.com/apache/spark/pull/29186#issuecomment-662819334










[GitHub] [spark] SparkQA removed a comment on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


SparkQA removed a comment on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662785553


   **[Test build #126371 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126371/testReport)**
 for PR 29196 at commit 
[`9c12088`](https://github.com/apache/spark/commit/9c120880b35f1ddeaec6161481380c0b3e601eea).






[GitHub] [spark] AmplabJenkins commented on pull request #29167: [SPARK-32374][SQL] Disallow setting properties when creating temporary views

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29167:
URL: https://github.com/apache/spark/pull/29167#issuecomment-662819369










[GitHub] [spark] SparkQA commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


SparkQA commented on pull request #29196:
URL: https://github.com/apache/spark/pull/29196#issuecomment-662819159


   **[Test build #126371 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126371/testReport)**
 for PR 29196 at commit 
[`9c12088`](https://github.com/apache/spark/commit/9c120880b35f1ddeaec6161481380c0b3e601eea).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #29186: [SPARK-32386][SS][TESTS] Fix temp view leaking in Structured Streaming tests

2020-07-22 Thread GitBox


SparkQA commented on pull request #29186:
URL: https://github.com/apache/spark/pull/29186#issuecomment-662819020


   **[Test build #126379 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126379/testReport)**
 for PR 29186 at commit 
[`b654ebd`](https://github.com/apache/spark/commit/b654ebd022c2dad1cbd085664f8baa0ef3db154d).






[GitHub] [spark] SparkQA commented on pull request #29167: [SPARK-32374][SQL] Disallow setting properties when creating temporary views

2020-07-22 Thread GitBox


SparkQA commented on pull request #29167:
URL: https://github.com/apache/spark/pull/29167#issuecomment-662819073


   **[Test build #126380 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126380/testReport)**
 for PR 29167 at commit 
[`477b696`](https://github.com/apache/spark/commit/477b696ecd2dd75b114809f2b04e36e8a78f2007).






[GitHub] [spark] viirya commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict pl

2020-07-22 Thread GitBox


viirya commented on a change in pull request #29166:
URL: https://github.com/apache/spark/pull/29166#discussion_r459214912



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1237,20 +1250,79 @@ class Analyzer(
   if (conflictPlans.isEmpty) {
 right
   } else {
-val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-  case (oldRelation, newRelation) => 
oldRelation.output.zip(newRelation.output)})
-val conflictPlanMap = conflictPlans.toMap
-// transformDown so that we can replace all the old Relations in one 
turn due to
-// the reason that `conflictPlans` are also collected in pre-order.
-right transformDown {
-  case r => conflictPlanMap.getOrElse(r, r)
-} transformUp {
-  case other => other transformExpressions {
+rewritePlan(right, conflictPlans.toMap)._1
+  }
+}
+
+private def rewritePlan(plan: LogicalPlan, conflictPlanMap: 
Map[LogicalPlan, LogicalPlan])
+  : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+  if (conflictPlanMap.contains(plan)) {
+// If the plan is the one that conflict the with left one, we'd
+// just replace it with the new plan and collect the rewrite
+// attributes for the parent node.
+val newRelation = conflictPlanMap(plan)
+newRelation -> plan.output.zip(newRelation.output)
+  } else {
+val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+val newPlan = plan.mapChildren { child =>
+  // If not, we'd rewrite child plan recursively until we find the
+  // conflict node or reach the leaf node.
+  val (newChild, childAttrMapping) = rewritePlan(child, 
conflictPlanMap)
+  // Only return rewrite attributes which could be used by the parent 
node.
+  // Otherwise, it could introduce duplicate rewrite attributes. For 
example,
+  // for the following plan, if we don't do filter for the 
`childAttrMapping`,
+  // the node `SubqueryAlias b` will return rewrite attribute of 
[kind#220 -> kind#228]
+  // (which is from the conflict plan `Project [id#218, foo AS 
kind#228]`), and the node
+  // `SubqueryAlias c` will return rewrite attribute of [kind#220 -> 
kind#229] (which
+  // is from the conflict plan `Project [id#227, foo AS kind#229]`). 
As a result, the top
+  // Join will have duplicated rewrite attribute.
+  //
+  // The problem is, the plan `Join Inner, (kind#229 = kind#223)` 
shouldn't keep returning
+  // rewrite attribute of [kind#220 -> kind#229] to its parent node 
`Project [id#227]` as
+  // it doesn't really need it.
+  //
+  // Join Inner, (id#218 = id#227)
+  // :- SubqueryAlias b
+  // :  +- Project [id#218, foo AS kind#228]
+  // : +- SubqueryAlias a
+  // :+- Project [1 AS id#218]
+  // :   +- OneRowRelation
+  // +- SubqueryAlias c
+  //+- Project [id#227]
+  //   +- Join Inner, (kind#229 = kind#223)
+  //  :- SubqueryAlias l
+  //  :  +- SubqueryAlias b
+  //  : +- Project [id#227, foo AS kind#229]
+  //  :+- SubqueryAlias a
+  //  :   +- Project [1 AS id#227]
+  //  :  +- OneRowRelation
+  //  +- SubqueryAlias r
+  // +- SubqueryAlias b
+  //+- Project [id#224, foo AS kind#223]
+  //   +- SubqueryAlias a
+  //  +- Project [1 AS id#224]
+  // +- OneRowRelation
+  attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+(plan.references ++ plan.outputSet ++ 
plan.producedAttributes).contains(oldAttr)

Review comment:
   Nvm, I actually referred to `plan.outputSet`, but now I see that 
`plan.references` is also needed.








[GitHub] [spark] SparkQA commented on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…

2020-07-22 Thread GitBox


SparkQA commented on pull request #29199:
URL: https://github.com/apache/spark/pull/29199#issuecomment-662818996


   **[Test build #126378 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126378/testReport)**
 for PR 29199 at commit 
[`8364f1f`](https://github.com/apache/spark/commit/8364f1fa182827f29ae16bb528604de85f52449c).






[GitHub] [spark] AngersZhuuuu opened a new pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…

2020-07-22 Thread GitBox


AngersZhuuuu opened a new pull request #29199:
URL: https://github.com/apache/spark/pull/29199


   # What changes were proposed in this pull request?
   
   - Extract a common script IO schema, `ScriptTransformationIOSchema`
   - Avoid repeated checks when selecting the process-output-row method 
(`createOutputIteratorWithoutSerde` / `createOutputIteratorWithSerde`)
   - Add a default no-serde IO schema, 
`ScriptTransformationIOSchema.defaultIOSchema`
   
   
   ### Why are the changes needed?
   Refactor code
   
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   
   ### How was this patch tested?
   NO
   






[GitHub] [spark] xuanyuanking commented on pull request #29186: [SPARK-32386][SS][TESTS] Fix temp view leaking in Structured Streaming tests

2020-07-22 Thread GitBox


xuanyuanking commented on pull request #29186:
URL: https://github.com/apache/spark/pull/29186#issuecomment-662818538


   retest this please






[GitHub] [spark] HyukjinKwon closed pull request #29195: [SPARK-32338][SQL][PYSPARK][FOLLOW-UP] Update slice to accept Column for start and length.

2020-07-22 Thread GitBox


HyukjinKwon closed pull request #29195:
URL: https://github.com/apache/spark/pull/29195


   






[GitHub] [spark] cloud-fan commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459213960



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##
@@ -288,7 +288,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) 
{
   val provider = maybeV2Provider.get
   val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
 provider, df.sparkSession.sessionState.conf)
-  val options = sessionOptions ++ extraOptions

Review comment:
   actually, `originalMap` should be `toMap`








[GitHub] [spark] cloud-fan commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459213858



##
File path: 
sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala
##
@@ -52,6 +52,8 @@ class CaseInsensitiveMap[T] private (val originalMap: 
Map[String, T]) extends Ma
   override def -(key: String): Map[String, T] = {
 new CaseInsensitiveMap(originalMap.filter(!_._1.equalsIgnoreCase(key)))
   }
+
+  def toMap: Map[String, T] = originalMap

Review comment:
   that's too many. let's keep `toMap` then








[GitHub] [spark] HyukjinKwon commented on pull request #29195: [SPARK-32338][SQL][PYSPARK][FOLLOW-UP] Update slice to accept Column for start and length.

2020-07-22 Thread GitBox


HyukjinKwon commented on pull request #29195:
URL: https://github.com/apache/spark/pull/29195#issuecomment-662817669


   Merged to master.






[GitHub] [spark] imback82 commented on a change in pull request #29167: [SPARK-32374][SQL] Disallow setting properties when creating temporary views

2020-07-22 Thread GitBox


imback82 commented on a change in pull request #29167:
URL: https://github.com/apache/spark/pull/29167#discussion_r459212899



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala
##
@@ -266,6 +266,16 @@ abstract class SQLViewSuite extends QueryTest with 
SQLTestUtils {
 }
   }
 
+  test("SPARK-32374: disallow setting properties for CREATE TEMPORARY VIEW") {
+withTempView("myabcdview") {
+  val e = intercept[AnalysisException] {

Review comment:
   Oh because `ParseException` extends `AnalysisException`. I will update 
this.
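The subtyping noted above can be sketched as follows (the class bodies and message are illustrative stand-ins, not Spark's real exception classes): because the parse error is a subclass of the analysis error, a handler written against the supertype still catches it, which is why the test passed.

```scala
// Hypothetical miniature of Spark's hierarchy, where
// ParseException extends AnalysisException.
class AnalysisException(msg: String) extends Exception(msg)
class ParseException(msg: String) extends AnalysisException(msg)

val caught: Option[AnalysisException] =
  try {
    throw new ParseException("TBLPROPERTIES is not allowed for CREATE TEMPORARY VIEW")
  } catch {
    case e: AnalysisException => Some(e) // the supertype handler matches
  }

assert(caught.exists(_.isInstanceOf[ParseException])) // subclass caught via supertype
```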








[GitHub] [spark] cloud-fan commented on a change in pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29107:
URL: https://github.com/apache/spark/pull/29107#discussion_r459212692



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##
@@ -271,7 +281,7 @@ case class Union(children: Seq[LogicalPlan]) extends 
LogicalPlan {
 child.output.zip(children.head.output).forall {
   case (l, r) => l.dataType.sameType(r.dataType)
 })
-children.length > 1 && childrenResolved && allChildrenCompatible
+children.length > 1 && childrenResolved && allChildrenCompatible && 
!(byName || allowMissingCol)

Review comment:
   nit: `!(byName || allowMissingCol)` should be checked before 
`childrenResolved`, as it's cheaper.
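The reasoning behind the nit can be sketched with Scala's short-circuiting `&&` (the `childrenResolvedCheck` function below is a hypothetical stand-in for the real, more expensive resolution traversal):

```scala
var traversals = 0
// Stand-in for the costly `childrenResolved` plan traversal.
def childrenResolvedCheck(): Boolean = { traversals += 1; true }

val byName = true
val allowMissingCol = false

// Cheap flag test first: for a by-name union, `&&` short-circuits
// and the expensive check never runs.
val resolved = !(byName || allowMissingCol) && childrenResolvedCheck()

assert(!resolved)
assert(traversals == 0) // the expensive check was skipped
```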








[GitHub] [spark] cloud-fan commented on a change in pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29107:
URL: https://github.com/apache/spark/pull/29107#discussion_r459212340



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1387,10 +1388,11 @@ class Analyzer(
 i.copy(right = dedupRight(left, right))
   case e @ Except(left, right, _) if !e.duplicateResolved =>
 e.copy(right = dedupRight(left, right))
-  case u @ Union(children) if !u.duplicateResolved =>
+  // Only after we finish by-name resolution for Union
+  case u: logical.Union if !u.duplicateResolved && !u.byName =>

Review comment:
   nit: `logical.Union` -> `Union`.
   
   `!u.duplicateResolved && !u.byName` -> `!u.byName && !u.duplicateResolved`








[GitHub] [spark] cloud-fan commented on a change in pull request #29167: [SPARK-32374][SQL] Disallow setting properties when creating temporary views

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29167:
URL: https://github.com/apache/spark/pull/29167#discussion_r459210115



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala
##
@@ -266,6 +266,16 @@ abstract class SQLViewSuite extends QueryTest with 
SQLTestUtils {
 }
   }
 
+  test("SPARK-32374: disallow setting properties for CREATE TEMPORARY VIEW") {
+withTempView("myabcdview") {
+  val e = intercept[AnalysisException] {

Review comment:
   interesting, the tests pass. Do you know why?








[GitHub] [spark] cloud-fan commented on a change in pull request #29167: [SPARK-32374][SQL] Disallow setting properties when creating temporary views

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29167:
URL: https://github.com/apache/spark/pull/29167#discussion_r459209763



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala
##
@@ -266,6 +266,16 @@ abstract class SQLViewSuite extends QueryTest with 
SQLTestUtils {
 }
   }
 
+  test("SPARK-32374: disallow setting properties for CREATE TEMPORARY VIEW") {
+withTempView("myabcdview") {
+  val e = intercept[AnalysisException] {

Review comment:
   parser exception?








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+

2020-07-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29196:
URL: https://github.com/apache/spark/pull/29196#discussion_r459209060



##
File path: pom.xml
##
@@ -892,7 +907,25 @@
   
 org.scalatest
 scalatest_${scala.binary.version}
-3.0.8
+3.2.0
+test
+  
+  
+org.scalatestplus
+
scalatestplus-scalacheck_${scala.binary.version}
+3.1.0.0-RC2
+test
+  
+  
+org.scalatestplus
+scalatestplus-mockito_${scala.binary.version}
+1.0.0-SNAP5
+test
+  
+  
+org.scalatestplus
+scalatestplus-selenium_${scala.binary.version}
+1.0.0-SNAP5

Review comment:
   Oh, interesting. All of these are `test dependency` changes, but `dev/test-dependencies.sh` seems to complain for some reason.
   ```
   --- a/dev/deps/spark-deps-hadoop-2.7-hive-1.2
   +++ b/dev/pr-deps/spark-deps-hadoop-2.7-hive-1.2
   @@ -157,9 +157,9 @@ metrics-json/4.1.1//metrics-json-4.1.1.jar
metrics-jvm/4.1.1//metrics-jvm-4.1.1.jar
minlog/1.3.0//minlog-1.3.0.jar
netty-all/4.1.47.Final//netty-all-4.1.47.Final.jar
   -objenesis/2.5.1//objenesis-2.5.1.jar
   +objenesis/2.6//objenesis-2.6.jar
okhttp/3.12.6//okhttp-3.12.6.jar
   -okio/1.15.0//okio-1.15.0.jar
   +okio/1.14.0//okio-1.14.0.jar
   ```








[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r459208970



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
##
@@ -454,6 +478,28 @@ case class BroadcastHashJoinExec(
 val (matched, checkCondition, _) = getJoinCondition(ctx, input)
 val numOutput = metricTerm(ctx, "numOutputRows")
 
+if (isNullAwareAntiJoin) {
+  require(leftKeys.length == 1, "leftKeys length should be 1")
+  require(rightKeys.length == 1, "rightKeys length should be 1")
+  require(joinType == LeftAnti, "joinType must be LeftAnti.")
+  require(buildSide == BuildRight, "buildSide must be BuildRight.")
+  require(SQLConf.get.nullAwareAntiJoinOptimizeEnabled,
+"nullAwareAntiJoinOptimizeEnabled must be on for null aware anti join 
optimize.")
+  require(checkCondition == "", "null aware anti join optimize condition 
should be empty.")
+
+  if (broadcastRelation.value.inputEmpty) {
+return s"""
+   |// singleColumn NAAJ inputEmpty(true) accept all

Review comment:
   We can do that with a custom `BroadcastMode`.
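For context on the single-column null-aware anti join (`NOT IN` subquery) semantics under discussion, here is a minimal Python sketch of the intended behavior, including the empty-build-side accept-all case quoted in the diff above (illustrative only, not Spark code; all names are made up):

```python
def null_aware_anti_join(left_keys, build_side):
    """Single-column NOT IN semantics: a left row survives only if the
    build side is non-empty-safe for it -- i.e. the build side contains
    no null, the left key is non-null, and the key is absent from the
    build side. An empty build side accepts every left row."""
    if not build_side:                      # inputEmpty(true): accept all rows
        return list(left_keys)
    if any(k is None for k in build_side):  # any null on the build side rejects all
        return []
    build = set(build_side)
    return [k for k in left_keys if k is not None and k not in build]

# NOT IN over an empty subquery keeps everything, even null keys
print(null_aware_anti_join([1, None, 3], []))   # [1, None, 3]
# a null on the build side rejects every left row
print(null_aware_anti_join([1, 2], [2, None]))  # []
# otherwise it behaves like a plain anti join that also drops null keys
print(null_aware_anti_join([1, None, 3], [3]))  # [1]
```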








[GitHub] [spark] asfgit closed pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-22 Thread GitBox


asfgit closed pull request #29032:
URL: https://github.com/apache/spark/pull/29032


   






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459207445



##
File path: 
sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala
##
@@ -52,6 +52,8 @@ class CaseInsensitiveMap[T] private (val originalMap: 
Map[String, T]) extends Ma
   override def -(key: String): Map[String, T] = {
 new CaseInsensitiveMap(originalMap.filter(!_._1.equalsIgnoreCase(key)))
   }
+
+  def toMap: Map[String, T] = originalMap

Review comment:
   Just for my understanding. In that case, among all the existing `extraOptions.toMap`s, do I need to rename only the `AppendData.byName` instance?
   ```
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:val 
params = extraOptions.toMap ++ connectionProperties.asScala.toMap
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:  
extraOptions.toMap,
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:  
extraOptions.toMap,
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:  
AppendData.byName(relation, df.logicalPlan, extraOptions.toMap)
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:  
  relation, df.logicalPlan, Literal(true), extraOptions.toMap)
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:  
extraOptions.toMap,
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:
options = extraOptions.toMap).planForWriting(mode, df.logicalPlan)
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:
AppendData.byPosition(table, df.logicalPlan, extraOptions.toMap)
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:  
OverwritePartitionsDynamic.byPosition(table, df.logicalPlan, extraOptions.toMap)
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:  
OverwriteByExpression.byPosition(table, df.logicalPlan, Literal(true), 
extraOptions.toMap)
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:
AppendData.byName(v2Relation, df.logicalPlan, extraOptions.toMap)
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:  
extraOptions.toMap,
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:  
extraOptions.toMap,
   sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:val 
storage = DataSource.buildStorageFormatFromOptions(extraOptions.toMap)
   
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala:  
options = extraOptions.toMap)
   
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala:  
  extraOptions.toMap,
   
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala:  
  extraOptions.toMap,
   
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala:  
  extraOptions.toMap,
   
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala:  
  extraOptions.toMap,
   
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala:  
options = extraOptions.toMap,
   ```











[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459206761



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##
@@ -288,7 +288,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) 
{
   val provider = maybeV2Provider.get
   val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
 provider, df.sparkSession.sessionState.conf)
-  val options = sessionOptions ++ extraOptions

Review comment:
   Sure. I'll use `sessionOptions.filterKeys(!extraOptions.contains(_)) ++ 
extraOptions.originalMap`.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29198: [SPARK-32401][SQL] Migrate function related commands to new resolution framework

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29198:
URL: https://github.com/apache/spark/pull/29198#issuecomment-662810535










[GitHub] [spark] SparkQA commented on pull request #29198: [SPARK-32401][SQL] Migrate function related commands to new resolution framework

2020-07-22 Thread GitBox


SparkQA commented on pull request #29198:
URL: https://github.com/apache/spark/pull/29198#issuecomment-662810236


   **[Test build #126377 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126377/testReport)**
 for PR 29198 at commit 
[`38756fc`](https://github.com/apache/spark/commit/38756fc8f0232dd62dce3cd4eb4bf6b467742061).






[GitHub] [spark] LantaoJin commented on pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-07-22 Thread GitBox


LantaoJin commented on pull request #29062:
URL: https://github.com/apache/spark/pull/29062#issuecomment-662810277


   Thanks. I will file a backport PR.






[GitHub] [spark] imback82 commented on a change in pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-22 Thread GitBox


imback82 commented on a change in pull request #28840:
URL: https://github.com/apache/spark/pull/28840#discussion_r459205275



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala
##
@@ -155,4 +155,31 @@ private[sql] trait LookupCatalog extends Logging {
 None
 }
   }
+
+  // TODO: move function related v2 statements to the new framework.

Review comment:
   Created #29198








[GitHub] [spark] AmplabJenkins commented on pull request #29198: [SPARK-32401][SQL] Migrate function related commands to new resolution framework

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29198:
URL: https://github.com/apache/spark/pull/29198#issuecomment-662810535










[GitHub] [spark] imback82 commented on a change in pull request #29198: [SPARK-32401][SQL] Migrate function related commands to new resolution framework

2020-07-22 Thread GitBox


imback82 commented on a change in pull request #29198:
URL: https://github.com/apache/spark/pull/29198#discussion_r459204665



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -3650,12 +3652,24 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 }
 
-val functionIdentifier = visitMultipartIdentifier(ctx.multipartIdentifier)
-CreateFunctionStatement(
-  functionIdentifier,
+val nameParts = visitMultipartIdentifier(ctx.multipartIdentifier)
+val isTemp = ctx.TEMPORARY != null
+val func: LogicalPlan = if (isTemp) {
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+  // temp func doesn't belong to any catalog and we shouldn't resolve 
catalog in the name.
+  if (nameParts.length > 2) {
+throw new AnalysisException(s"Unsupported function name 
'${nameParts.quoted}'")
+  }
+  ResolvedFunc(nameParts.asIdentifier)

Review comment:
   This logic is moved from `ResolveSessionCatalog`, and I think we can resolve it here directly.








[GitHub] [spark] dongjoon-hyun commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


dongjoon-hyun commented on pull request #29191:
URL: https://github.com/apache/spark/pull/29191#issuecomment-662809395


   Thank you, @cloud-fan  and @maropu .






[GitHub] [spark] holdenk commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-22 Thread GitBox


holdenk commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-662808939


   Merged, thanks everyone.
   
   I don't know if we have a good principle around naming host v. worker with 
the deprecation.






[GitHub] [spark] imback82 opened a new pull request #29198: [SPARK-32401][SQL] Migrate function related commands to new resolution framework

2020-07-22 Thread GitBox


imback82 opened a new pull request #29198:
URL: https://github.com/apache/spark/pull/29198


   
   
   ### What changes were proposed in this pull request?
   
   This PR proposes to migrate the following function related commands to the 
new resolution framework:
   - CREATE FUNCTION
   - DROP FUNCTION
   - DESCRIBE FUNCTION
   - SHOW FUNCTIONS
   
   ### Why are the changes needed?
   
   Migrating to the new resolution framework.
   
   ### Does this PR introduce _any_ user-facing change?
   
   The exception message thrown when a catalog is resolved to v2 has been merged into:
   `function is only supported in v1 catalog`
   
   Previously, it printed the command used, e.g.:
   `CREATE FUNCTION is only supported in v1 catalog`
   
   
   ### How was this patch tested?
   
   Updated existing tests.






[GitHub] [spark] cloud-fan commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-22 Thread GitBox


cloud-fan commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-662805220


   I have a general question (not related to this PR): We have deprecated the 
usage of "multi-workers on one host", so host and worker should be the same 
thing. However, in the codebase sometimes we use `host` and sometimes `worker`, 
do we have a general rule for the naming?
   
   This PR LGTM; I'm just asking out of curiosity.






[GitHub] [spark] cloud-fan commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29166:
URL: https://github.com/apache/spark/pull/29166#discussion_r459199272



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1237,20 +1250,48 @@ class Analyzer(
   if (conflictPlans.isEmpty) {
 right
   } else {
-val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-  case (oldRelation, newRelation) => 
oldRelation.output.zip(newRelation.output)})
-val conflictPlanMap = conflictPlans.toMap
-// transformDown so that we can replace all the old Relations in one 
turn due to
-// the reason that `conflictPlans` are also collected in pre-order.
-right transformDown {
-  case r => conflictPlanMap.getOrElse(r, r)
-} transformUp {
-  case other => other transformExpressions {
+rewritePlan(right, conflictPlans.toMap)._1
+  }
+}
+
+private def rewritePlan(plan: LogicalPlan, conflictPlanMap: 
Map[LogicalPlan, LogicalPlan])
+  : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+  if (conflictPlanMap.contains(plan)) {
+// If the plan is the one that conflict the with left one, we'd
+// just replace it with the new plan and collect the rewrite
+// attributes for the parent node.
+val newRelation = conflictPlanMap(plan)
+newRelation -> plan.output.zip(newRelation.output)
+  } else {
+val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+val newPlan = plan.mapChildren { child =>
+  // If not, we'd rewrite child plan recursively until we find the
+  // conflict node or reach the leaf node.
+  val (newChild, childAttrMapping) = rewritePlan(child, 
conflictPlanMap)
+  attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+// If an attribute from the child is not output or referenced by 
the parent plan,
+// it's useless to propagate this attribute to the parent plan, as 
they are not
+// going to use this attribute.
+(plan.outputSet ++ plan.references).contains(oldAttr)

Review comment:
   maybe something like
   ```
   // `attrMapping` is used to replace the attributes of the current `plan`, 
and also to be propagated to
   // parent plans, so the `oldAttr` must either be part of `plan.references` 
(so that it can be used to
   // replace attributes of current plan) or `plan.outputSet` (so that it can 
be used by parent plans).
   ```








[GitHub] [spark] cloud-fan commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29166:
URL: https://github.com/apache/spark/pull/29166#discussion_r459198562



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1237,20 +1250,48 @@ class Analyzer(
   if (conflictPlans.isEmpty) {
 right
   } else {
-val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-  case (oldRelation, newRelation) => 
oldRelation.output.zip(newRelation.output)})
-val conflictPlanMap = conflictPlans.toMap
-// transformDown so that we can replace all the old Relations in one 
turn due to
-// the reason that `conflictPlans` are also collected in pre-order.
-right transformDown {
-  case r => conflictPlanMap.getOrElse(r, r)
-} transformUp {
-  case other => other transformExpressions {
+rewritePlan(right, conflictPlans.toMap)._1
+  }
+}
+
+private def rewritePlan(plan: LogicalPlan, conflictPlanMap: 
Map[LogicalPlan, LogicalPlan])
+  : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+  if (conflictPlanMap.contains(plan)) {
+// If the plan is the one that conflict the with left one, we'd
+// just replace it with the new plan and collect the rewrite
+// attributes for the parent node.
+val newRelation = conflictPlanMap(plan)
+newRelation -> plan.output.zip(newRelation.output)
+  } else {
+val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+val newPlan = plan.mapChildren { child =>
+  // If not, we'd rewrite child plan recursively until we find the
+  // conflict node or reach the leaf node.
+  val (newChild, childAttrMapping) = rewritePlan(child, 
conflictPlanMap)
+  attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+// If an attribute from the child is not output or referenced by 
the parent plan,
+// it's useless to propagate this attribute to the parent plan, as 
they are not
+// going to use this attribute.
+(plan.outputSet ++ plan.references).contains(oldAttr)

Review comment:
   ah sorry, you are right. `attrMapping` is not only propagated to parents, but is also used to rewrite attributes of the current `plan`, so `plan.references` is necessary. Can we make the comment clearer about it?
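The recursion being discussed — replace conflicting subtrees, then propagate an old-to-new attribute mapping upward so each ancestor can rewrite its own references — can be sketched generically (a toy tree in Python, not Catalyst's `LogicalPlan` API; `Node`, `refs`, and the string keys are all illustrative):

```python
class Node:
    """A toy plan node: a name, the attributes it references/outputs,
    and child nodes."""
    def __init__(self, name, refs=(), children=()):
        self.name, self.refs, self.children = name, list(refs), list(children)

def rewrite(plan, conflict_map):
    """Replace conflicting subtrees (looked up by node name here, for
    simplicity) and return (new_plan, old_attr -> new_attr mapping).
    The mapping rewrites the current node's refs and is returned so the
    parent can do the same."""
    if plan.name in conflict_map:
        new = conflict_map[plan.name]
        return new, dict(zip(plan.refs, new.refs))
    mapping = {}
    new_children = []
    for child in plan.children:
        new_child, child_map = rewrite(child, conflict_map)
        new_children.append(new_child)
        mapping.update(child_map)
    plan.children = new_children
    # rewrite this node's own references using the collected child mapping
    plan.refs = [mapping.get(r, r) for r in plan.refs]
    return plan, mapping

leaf = Node("scan", refs=["a#1"])
root = Node("filter", refs=["a#1"], children=[leaf])
_, mapping = rewrite(root, {"scan": Node("scan'", refs=["a#2"])})
print(root.refs)  # ['a#2'] -- the parent's reference was rewritten
```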








[GitHub] [spark] cloud-fan commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29166:
URL: https://github.com/apache/spark/pull/29166#discussion_r459198329



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1237,20 +1250,48 @@ class Analyzer(
   if (conflictPlans.isEmpty) {
 right
   } else {
-val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-  case (oldRelation, newRelation) => 
oldRelation.output.zip(newRelation.output)})
-val conflictPlanMap = conflictPlans.toMap
-// transformDown so that we can replace all the old Relations in one 
turn due to
-// the reason that `conflictPlans` are also collected in pre-order.
-right transformDown {
-  case r => conflictPlanMap.getOrElse(r, r)
-} transformUp {
-  case other => other transformExpressions {
+rewritePlan(right, conflictPlans.toMap)._1
+  }
+}
+
+private def rewritePlan(plan: LogicalPlan, conflictPlanMap: 
Map[LogicalPlan, LogicalPlan])
+  : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+  if (conflictPlanMap.contains(plan)) {
+// If the plan is the one that conflict the with left one, we'd
+// just replace it with the new plan and collect the rewrite
+// attributes for the parent node.
+val newRelation = conflictPlanMap(plan)
+newRelation -> plan.output.zip(newRelation.output)
+  } else {
+val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+val newPlan = plan.mapChildren { child =>
+  // If not, we'd rewrite child plan recursively until we find the
+  // conflict node or reach the leaf node.
+  val (newChild, childAttrMapping) = rewritePlan(child, 
conflictPlanMap)
+  attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+// If an attribute from the child is not output or referenced by 
the parent plan,
+// it's useless to propagate this attribute to the parent plan, as 
they are not
+// going to use this attribute.
+(plan.outputSet ++ plan.references).contains(oldAttr)

Review comment:
   the parents of `plan` can only reference `plan.outputSet`, that's why I 
think only `plan.outputSet` is necessary.








[GitHub] [spark] kevinjmh commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-07-22 Thread GitBox


kevinjmh commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-662802864


   how about using the `CACHE TABLE` command to do that?






[GitHub] [spark] cloud-fan commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459197779



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##
@@ -288,7 +288,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) 
{
   val provider = maybeV2Provider.get
   val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
 provider, df.sparkSession.sessionState.conf)
-  val options = sessionOptions ++ extraOptions

Review comment:
   maybe `sessionOptions.filterNot(extraOptions.contains) ++ 
extraOptions.originalMap`








[GitHub] [spark] cloud-fan commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


cloud-fan commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459197666



##
File path: 
sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala
##
@@ -52,6 +52,8 @@ class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) extends Ma
   override def -(key: String): Map[String, T] = {
 new CaseInsensitiveMap(originalMap.filter(!_._1.equalsIgnoreCase(key)))
   }
+
+  def toMap: Map[String, T] = originalMap

Review comment:
   shall we use consistent name `asCaseSensitiveMap`?





[GitHub] [spark] cloud-fan commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap

2020-07-22 Thread GitBox


cloud-fan commented on pull request #29191:
URL: https://github.com/apache/spark/pull/29191#issuecomment-662802631


   Let's define the behavior more clearly:
   1. DFReader/Writer options are case-insensitive, so a later option should override a prior one if their keys are equal case-insensitively. This was already fixed by your previous PR. We should also clearly document it in DFReader/Writer.
   2. For DS v1, the options are passed as `Map[String, String]`, so v1 sources don't really have a way to get the `originalMap`. All keys are lowercased (unless you cast it to `CaseInsensitiveMap`, which is private).
   3. For DS v2, the options are passed as `CaseInsensitiveStringMap`, which provides `asCaseSensitiveMap`, so it's case-preserving.
   
   This PR is trying to fix 3. There are two places we should fix:
   1. `val options = sessionOptions ++ extraOptions`. Using `extraOptions` directly as a map will return lower-cased keys. There is another concern: if `sessionOptions` and `extraOptions` have duplicate keys, we should make sure `extraOptions` takes precedence.
   2. `AppendData.byName...`. It should also get the case-preserving map, as `AppendDataExec` will wrap it with `CaseInsensitiveStringMap`.
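   The intended precedence can be sketched outside Spark (a minimal Python illustration with hypothetical names, not the actual `CaseInsensitiveMap` implementation): later user-supplied options win over session options when keys clash case-insensitively, while the user's original key casing is preserved.

```python
# Hypothetical sketch of the option-merge semantics described above.
def merge_options(session_options, extra_options):
    extra_keys_lower = {k.lower() for k in extra_options}
    # Drop session options shadowed (case-insensitively) by user options.
    merged = {k: v for k, v in session_options.items()
              if k.lower() not in extra_keys_lower}
    merged.update(extra_options)  # user options win, with original casing kept
    return merged

merged = merge_options({"Path": "/tmp/a", "mode": "append"}, {"path": "/tmp/b"})
# Session "Path" is shadowed by the user's "path"; the user's casing survives.
```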
   



[GitHub] [spark] AngersZhuuuu commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZhuuuu commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-662801978


   > Looks almost okay now, so could you split this PR into pieces? I think it's somewhat big for reviews. For example:
   > 
   > 1. More refactoring PR for `HiveScriptTransformationExec` and `BaseScriptTransformationExec`, just like 
[SPARK-32105](https://issues.apache.org/jira/browse/SPARK-32105)
   > 2. PR to improve test coverage of `HiveScriptTransformationExec`
   > 3. Then, PR to implement Spark-native TRANSFORM
   > 
   > WDYT?
   
   Yea, will raise PRs one by one.



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZhuuuu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459194715



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
##
@@ -87,17 +181,72 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
   }
 }
   }
+
+  private lazy val outputFieldWriters: Seq[String => Any] = output.map { attr =>
+    val converter = CatalystTypeConverters.createToCatalystConverter(attr.dataType)
+    attr.dataType match {
+      case StringType => wrapperConvertException(data => data, converter)
+      case BooleanType => wrapperConvertException(data => data.toBoolean, converter)
+      case ByteType => wrapperConvertException(data => data.toByte, converter)
+      case BinaryType =>
+        wrapperConvertException(data => UTF8String.fromString(data).getBytes, converter)
+      case IntegerType => wrapperConvertException(data => data.toInt, converter)
+      case ShortType => wrapperConvertException(data => data.toShort, converter)
+      case LongType => wrapperConvertException(data => data.toLong, converter)
+      case FloatType => wrapperConvertException(data => data.toFloat, converter)
+      case DoubleType => wrapperConvertException(data => data.toDouble, converter)
+      case _: DecimalType => wrapperConvertException(data => BigDecimal(data), converter)
+      case DateType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToDate(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.daysToLocalDate).orNull, converter)
+      case DateType => wrapperConvertException(data => DateTimeUtils.stringToDate(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaDate).orNull, converter)
+      case TimestampType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.microsToInstant).orNull, converter)
+      case TimestampType => wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaTimestamp).orNull, converter)
+      case CalendarIntervalType => wrapperConvertException(
+        data => IntervalUtils.stringToInterval(UTF8String.fromString(data)),
+        converter)
+      case udt: UserDefinedType[_] =>
+        wrapperConvertException(data => udt.deserialize(data), converter)
+      case dt =>
+        throw new SparkException("TRANSFORM without serde does not support " +

Review comment:
   > nit: `TRANSFORM` -> `s"$nodeName...`
   
   Done
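   The per-type writer table in the diff above can be illustrated with a small sketch (Python, hypothetical names, a handful of types only; Spark's real writers use Catalyst converters and cover many more types): one string-to-value converter is built per output column, and each cast is wrapped so a malformed field degrades to null instead of failing.

```python
# Hypothetical sketch of the outputFieldWriters / wrapperConvertException idea.
def wrap_convert_exception(cast):
    def writer(data):
        try:
            return cast(data)
        except ValueError:
            return None  # a bad field degrades to null rather than failing
    return writer

def field_writer(data_type):
    casts = {
        "string": lambda s: s,
        "int": int,
        "double": float,
        "boolean": lambda s: s.lower() == "true",
    }
    if data_type not in casts:
        raise ValueError(f"unsupported output data type: {data_type}")
    return wrap_convert_exception(casts[data_type])

# "oops" is not a valid double, so that column becomes None.
row = [field_writer(t)(v) for t, v in
       [("int", "42"), ("double", "oops"), ("boolean", "true")]]
# -> [42, None, True]
```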





[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-662799097







[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-662799097







[GitHub] [spark] AmplabJenkins removed a comment on pull request #29197: [SPARK-32395][SQL] Commit output files after task attempt succeed in dynamic partition datawriter

2020-07-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29197:
URL: https://github.com/apache/spark/pull/29197#issuecomment-662798617


   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins commented on pull request #29197: [SPARK-32395][SQL] Commit output files after task attempt succeed in dynamic partition datawriter

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29197:
URL: https://github.com/apache/spark/pull/29197#issuecomment-662798935


   Can one of the admins verify this patch?



[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


SparkQA commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-662798740


   **[Test build #126376 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126376/testReport)**
 for PR 29085 at commit 
[`03d3409`](https://github.com/apache/spark/commit/03d3409f6ca641a4f10fdc2ac71479445220f676).



[GitHub] [spark] AmplabJenkins commented on pull request #29197: [SPARK-32395][SQL] Commit output files after task attempt succeed in dynamic partition datawriter

2020-07-22 Thread GitBox


AmplabJenkins commented on pull request #29197:
URL: https://github.com/apache/spark/pull/29197#issuecomment-662798617


   Can one of the admins verify this patch?



[GitHub] [spark] cloud-fan commented on pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-07-22 Thread GitBox


cloud-fan commented on pull request #29062:
URL: https://github.com/apache/spark/pull/29062#issuecomment-662798239


   merged to master, thanks!
   
   @LantaoJin can you send a backport to 3.0?



[GitHub] [spark] wankunde opened a new pull request #29197: [SPARK-32395][SQL] Commit output files after task attempt succeed in dynamic partition datawriter

2020-07-22 Thread GitBox


wankunde opened a new pull request #29197:
URL: https://github.com/apache/spark/pull/29197


   ### What changes were proposed in this pull request?
   Generally, distributed jobs have two stages of committing files: committing a task's output files and committing the job's output files. If one attempt fails, another attempt will run the task again; after all tasks succeed, the job commits the output of all tasks.
   But currently, if we run a dynamic partition overwrite job, for example `INSERT OVERWRITE TABLE dst PARTITION(part) SELECT * FROM src`, and one of the final-stage tasks fails, the whole job fails.
   The data writer of the first task attempt in the final stage writes its output directly to the Spark stage directory. If that first attempt fails, the data writer of the second attempt fails to set up, because the task's output file already exists, and the job fails.
   Therefore, I think we should write the temporary data to the task attempt's work directory, and commit the result files after the task attempt succeeds.
   
   ### Why are the changes needed?
   Bug fix for the case where a dynamic partition data writer of a final-stage task fails.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added UT
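   The proposed commit flow can be sketched as follows (a minimal Python illustration with hypothetical helpers, not Spark's actual `FileCommitProtocol`): each attempt writes under its own attempt directory, and files are published into the task output location only after the attempt succeeds, so a retried attempt never collides with a failed one's output.

```python
# Hypothetical sketch of "write to the attempt's work dir, commit on success".
import os
import shutil
import tempfile

def run_task_attempt(task_output_dir, attempt_id, write_fn):
    attempt_dir = os.path.join(task_output_dir, f"_attempt_{attempt_id}")
    os.makedirs(attempt_dir, exist_ok=True)
    try:
        write_fn(attempt_dir)       # the task writes only into its attempt dir
    except Exception:
        shutil.rmtree(attempt_dir)  # a failed attempt leaves no output behind
        raise
    # Commit: publish the attempt's files into the task output location.
    for name in os.listdir(attempt_dir):
        os.replace(os.path.join(attempt_dir, name),
                   os.path.join(task_output_dir, name))
    os.rmdir(attempt_dir)

def write_part(d):
    with open(os.path.join(d, "part-0"), "w") as f:
        f.write("x")

base = tempfile.mkdtemp()
run_task_attempt(base, 1, write_part)
# base/part-0 now exists; the attempt directory has been cleaned up, so a
# retry with the same file name would not collide with a failed attempt.
```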



[GitHub] [spark] cloud-fan closed pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE

2020-07-22 Thread GitBox


cloud-fan closed pull request #29062:
URL: https://github.com/apache/spark/pull/29062


   



[GitHub] [spark] cloud-fan commented on pull request #29194: [SPARK-30616][SQL][FOLLOW-UP] Use only config key name in the config doc.

2020-07-22 Thread GitBox


cloud-fan commented on pull request #29194:
URL: https://github.com/apache/spark/pull/29194#issuecomment-662797390


   thanks, merging to master!



[GitHub] [spark] cloud-fan closed pull request #29194: [SPARK-30616][SQL][FOLLOW-UP] Use only config key name in the config doc.

2020-07-22 Thread GitBox


cloud-fan closed pull request #29194:
URL: https://github.com/apache/spark/pull/29194


   



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-22 Thread GitBox


AngersZhuuuu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459189448



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
##
@@ -87,17 +181,72 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
   }
 }
   }
+
+  private lazy val outputFieldWriters: Seq[String => Any] = output.map { attr =>
+    val converter = CatalystTypeConverters.createToCatalystConverter(attr.dataType)
+    attr.dataType match {
+      case StringType => wrapperConvertException(data => data, converter)
+      case BooleanType => wrapperConvertException(data => data.toBoolean, converter)
+      case ByteType => wrapperConvertException(data => data.toByte, converter)
+      case BinaryType =>
+        wrapperConvertException(data => UTF8String.fromString(data).getBytes, converter)
+      case IntegerType => wrapperConvertException(data => data.toInt, converter)
+      case ShortType => wrapperConvertException(data => data.toShort, converter)
+      case LongType => wrapperConvertException(data => data.toLong, converter)
+      case FloatType => wrapperConvertException(data => data.toFloat, converter)
+      case DoubleType => wrapperConvertException(data => data.toDouble, converter)
+      case _: DecimalType => wrapperConvertException(data => BigDecimal(data), converter)
+      case DateType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToDate(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.daysToLocalDate).orNull, converter)
+      case DateType => wrapperConvertException(data => DateTimeUtils.stringToDate(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaDate).orNull, converter)
+      case TimestampType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.microsToInstant).orNull, converter)
+      case TimestampType => wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaTimestamp).orNull, converter)
+      case CalendarIntervalType => wrapperConvertException(
+        data => IntervalUtils.stringToInterval(UTF8String.fromString(data)),
+        converter)
+      case udt: UserDefinedType[_] =>
+        wrapperConvertException(data => udt.deserialize(data), converter)
+      case dt =>
+        throw new SparkException("TRANSFORM without serde does not support " +
+          s"${dt.getClass.getSimpleName} as output data type")

Review comment:
   > `dt.getClass.getSimpleName` -> `dt.catalogString`
   
   It is not general: ArrayType will show `array`, and StructType will show `struct`.
   WDYT?




