[GitHub] [spark] AmplabJenkins removed a comment on pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase
AmplabJenkins removed a comment on pull request #29107: URL: https://github.com/apache/spark/pull/29107#issuecomment-662831333 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29190: [do not review][testing master PR build][discard] comment change
AmplabJenkins removed a comment on pull request #29190: URL: https://github.com/apache/spark/pull/29190#issuecomment-662831354
[GitHub] [spark] AmplabJenkins commented on pull request #29190: [do not review][testing master PR build][discard] comment change
AmplabJenkins commented on pull request #29190: URL: https://github.com/apache/spark/pull/29190#issuecomment-662831354
[GitHub] [spark] AmplabJenkins commented on pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase
AmplabJenkins commented on pull request #29107: URL: https://github.com/apache/spark/pull/29107#issuecomment-662831333
[GitHub] [spark] SparkQA commented on pull request #29190: [do not review][testing master PR build][discard] comment change
SparkQA commented on pull request #29190: URL: https://github.com/apache/spark/pull/29190#issuecomment-662831076 **[Test build #126387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126387/testReport)** for PR 29190 at commit [`fa41817`](https://github.com/apache/spark/commit/fa41817475bb6c1bdb067979f297fe63ce5b7790).
[GitHub] [spark] cloud-fan commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
cloud-fan commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-662829893 I agree `host` is better as it's general to any deployment model. I think we should only use `worker` when it refers to the standalone worker. Maybe we can start following this principle from now on.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
dongjoon-hyun commented on a change in pull request #29191: URL: https://github.com/apache/spark/pull/29191#discussion_r459225681 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ## @@ -361,7 +361,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { connectionProperties: Properties): DataFrame = { assertNoSpecifiedSchema("jdbc") // connectionProperties should override settings in extraOptions. -val params = extraOptions.toMap ++ connectionProperties.asScala.toMap +val params = extraOptions ++ connectionProperties.asScala Review comment: `extraOptions ++ connectionProperties.asScala` guarantees that `connectionProperties` overrides `extraOptions`; we added the `CaseInsensitiveMap.++` operation for that. Also, `new JDBCOptions(url, table, params)` works like the following. The passed `params: CaseInsensitiveMap` guarantees that JDBC_URL and JDBC_TABLE_NAME override `parameters`. ```scala def this(url: String, table: String, parameters: Map[String, String]) = { this(CaseInsensitiveMap(parameters ++ Map( JDBCOptions.JDBC_URL -> url, JDBCOptions.JDBC_TABLE_NAME -> table))) } ``` The above code already covers that logic.
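The override ordering described above can be illustrated with plain Scala Maps; this is a minimal sketch, not Spark's `CaseInsensitiveMap`, and the option keys and values are made up for illustration.

```scala
// Illustrative sketch with plain Scala Maps (not Spark's CaseInsensitiveMap):
// in `left ++ right`, entries from the right operand win on key collisions,
// which is why `extraOptions ++ connectionProperties.asScala` lets
// connectionProperties override extraOptions.
val extraOptions = Map("url" -> "jdbc:h2:mem:old", "fetchsize" -> "100")
val connectionProperties = Map("url" -> "jdbc:h2:mem:new")

val params = extraOptions ++ connectionProperties

assert(params("url") == "jdbc:h2:mem:new") // right-hand side wins on collision
assert(params("fetchsize") == "100")       // keys only on the left survive
```

The same right-biased merge is what the `JDBCOptions` constructor quoted above relies on when it appends `JDBC_URL` and `JDBC_TABLE_NAME` last.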
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
AmplabJenkins removed a comment on pull request #29191: URL: https://github.com/apache/spark/pull/29191#issuecomment-662829543
[GitHub] [spark] AmplabJenkins commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
AmplabJenkins commented on pull request #29191: URL: https://github.com/apache/spark/pull/29191#issuecomment-662829543
[GitHub] [spark] SparkQA commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
SparkQA commented on pull request #29191: URL: https://github.com/apache/spark/pull/29191#issuecomment-662829231 **[Test build #126386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126386/testReport)** for PR 29191 at commit [`3e55841`](https://github.com/apache/spark/commit/3e558415be4238fa955a5ace6ca7a480f45abffd).
[GitHub] [spark] agrawaldevesh commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
agrawaldevesh commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-662828785 > Merged, thanks everyone. > > I don't know if we have a good principle around naming host v. worker with the deprecation. Thanks @holdenk for championing and shepherding this change! @cloud-fan, I agree that this host <=> worker naming is confusing. It was especially confusing for me as a newcomer: `ExecutorProcessLost` has a field `workerLost` (renamed from the erstwhile `slaveLost`). I had to pick between `Worker` and `Host` in `ExecutorDecommissionInfo.isHostDecommissioned`; I went with `Worker` initially but later changed it to `Host`. In my mind they are the same: there is basically one external shuffle service (if any) per worker (or host). The word `worker` might carry an unintended connotation of Standalone mode, so I refrained from using it, though its usage is already quite pervasive. It would be nice to have consistent terminology around `worker` vs `host` vs `node` :-P. In the absence of any, I went with `host` here.
[GitHub] [spark] cloud-fan commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
cloud-fan commented on a change in pull request #29191: URL: https://github.com/apache/spark/pull/29191#discussion_r459223512 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ## @@ -361,7 +361,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { connectionProperties: Properties): DataFrame = { assertNoSpecifiedSchema("jdbc") // connectionProperties should override settings in extraOptions. -val params = extraOptions.toMap ++ connectionProperties.asScala.toMap +val params = extraOptions ++ connectionProperties.asScala Review comment: and to guarantee `connectionProperties` overrides `extraOptions`, we may also need to do `map1.filterKeys... ++ map2`
[GitHub] [spark] agrawaldevesh commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
agrawaldevesh commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r459223391 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ## @@ -454,6 +478,28 @@ case class BroadcastHashJoinExec( val (matched, checkCondition, _) = getJoinCondition(ctx, input) val numOutput = metricTerm(ctx, "numOutputRows") +if (isNullAwareAntiJoin) { + require(leftKeys.length == 1, "leftKeys length should be 1") + require(rightKeys.length == 1, "rightKeys length should be 1") + require(joinType == LeftAnti, "joinType must be LeftAnti.") + require(buildSide == BuildRight, "buildSide must be BuildRight.") + require(SQLConf.get.nullAwareAntiJoinOptimizeEnabled, +"nullAwareAntiJoinOptimizeEnabled must be on for null aware anti join optimize.") + require(checkCondition == "", "null aware anti join optimize condition should be empty.") + + if (broadcastRelation.value.inputEmpty) { +return s""" + |// singleColumn NAAJ inputEmpty(true) accept all + |$numOutput.add(1); + |${consume(ctx, input)} + """.stripMargin + } else if (broadcastRelation.value.anyNullKeyExists) { +return s""" + |// singleColumn NAAJ inputEmpty(false) anyNullKeyExists(true) reject all Review comment: > @maryannxue @agrawaldevesh @maropu @viirya what do you think? Thanks for the empirical analysis @cloud-fan ... that always helps to seal an argument :-) I do agree that introducing a new node is sometimes more pain than worth, so I would be equally be okay with just modifying BHJ. I think they are both fine choices and I would leave it to @leanken to make a call. The ideal scenario would be to have minimal code diff in BHJ and introducing a new node is only a fallback if the diff cannot be reduced to our satisfaction. 
Either way, we ought to decide one way or the other soon, since the PR is becoming slightly confusing with both approaches co-existing :-) Are there any performance benchmarks we should re-run for BHJ to ensure no regression?
[GitHub] [spark] cloud-fan commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
cloud-fan commented on a change in pull request #29191: URL: https://github.com/apache/spark/pull/29191#discussion_r459223066 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ## @@ -361,7 +361,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { connectionProperties: Properties): DataFrame = { assertNoSpecifiedSchema("jdbc") // connectionProperties should override settings in extraOptions. -val params = extraOptions.toMap ++ connectionProperties.asScala.toMap +val params = extraOptions ++ connectionProperties.asScala Review comment: shall we call `extraOptions.toMap` here, to be case-preserving? `params` is used to create `new JDBCOptions(url, table, params)`, and `JDBCOptions.asProperties` needs to be case-preserving.
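The case-preserving concern raised here can be sketched with a simplified, hypothetical stand-in for Spark's `CaseInsensitiveMap` (the class name and fields below are illustrative, not Spark's actual implementation): lookups fold the key to lower case, so a case-preserving result has to come from the original map, which is what this follow-up's `toMap` returns.

```scala
// Hypothetical, simplified stand-in for Spark's CaseInsensitiveMap, to show
// why a case-preserving view matters: lookups lowercase the key, but toMap
// hands back the original (case-preserving) map.
class SimpleCaseInsensitiveMap(val originalMap: Map[String, String]) {
  private val lowered = originalMap.map { case (k, v) => (k.toLowerCase, v) }
  def get(key: String): Option[String] = lowered.get(key.toLowerCase)
  def toMap: Map[String, String] = originalMap // preserves the caller's casing
}

val m = new SimpleCaseInsensitiveMap(Map("FetchSize" -> "100"))
assert(m.get("fetchsize").contains("100"))    // lookup is case-insensitive
assert(m.toMap.keySet.contains("FetchSize"))  // original casing is preserved
```

Returning the lowered map instead of `originalMap` would lose the caller's key casing, which is exactly the problem for consumers like `JDBCOptions.asProperties`.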
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
AmplabJenkins removed a comment on pull request #29191: URL: https://github.com/apache/spark/pull/29191#issuecomment-662825995
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-662825993
[GitHub] [spark] AmplabJenkins commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
AmplabJenkins commented on pull request #29191: URL: https://github.com/apache/spark/pull/29191#issuecomment-662825995
[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-662825993
[GitHub] [spark] SparkQA commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
SparkQA commented on pull request #29191: URL: https://github.com/apache/spark/pull/29191#issuecomment-662825656 **[Test build #126384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126384/testReport)** for PR 29191 at commit [`e64e7ad`](https://github.com/apache/spark/commit/e64e7adfab001a508591ef1e0a6ecf214af1ac9e).
[GitHub] [spark] SparkQA commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
SparkQA commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-662825676 **[Test build #126385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126385/testReport)** for PR 29014 at commit [`7203980`](https://github.com/apache/spark/commit/72039805bbf2d6ba149d03ea1a3be1d62a58d092).
[GitHub] [spark] HyukjinKwon commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
HyukjinKwon commented on pull request #29191: URL: https://github.com/apache/spark/pull/29191#issuecomment-662824703 Looks good
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
AmplabJenkins removed a comment on pull request #29191: URL: https://github.com/apache/spark/pull/29191#issuecomment-662824244
[GitHub] [spark] AmplabJenkins commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
AmplabJenkins commented on pull request #29191: URL: https://github.com/apache/spark/pull/29191#issuecomment-662824244
[GitHub] [spark] SparkQA commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
SparkQA commented on pull request #29191: URL: https://github.com/apache/spark/pull/29191#issuecomment-662824010 **[Test build #126383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126383/testReport)** for PR 29191 at commit [`e5ddea4`](https://github.com/apache/spark/commit/e5ddea4cd0fc8b6f4b9da722a0c1b1d2fb890126).
[GitHub] [spark] HyukjinKwon commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
HyukjinKwon commented on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662824076 Yeah, we still should address https://github.com/apache/spark/pull/29196#discussion_r459209060 but the changes here look good.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
dongjoon-hyun commented on a change in pull request #29191: URL: https://github.com/apache/spark/pull/29191#discussion_r459219673 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ## @@ -288,7 +288,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { val provider = maybeV2Provider.get val sessionOptions = DataSourceV2Utils.extractSessionConfigs( provider, df.sparkSession.sessionState.conf) - val options = sessionOptions ++ extraOptions Review comment: Oh, got it.
[GitHub] [spark] SparkQA commented on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…
SparkQA commented on pull request #29199: URL: https://github.com/apache/spark/pull/29199#issuecomment-662822321 **[Test build #126382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126382/testReport)** for PR 29199 at commit [`9b1f28a`](https://github.com/apache/spark/commit/9b1f28a72dc140f7b33a1ff91113c4633535b580).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
dongjoon-hyun commented on a change in pull request #29191: URL: https://github.com/apache/spark/pull/29191#discussion_r459218262 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ## @@ -288,7 +288,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { val provider = maybeV2Provider.get val sessionOptions = DataSourceV2Utils.extractSessionConfigs( provider, df.sparkSession.sessionState.conf) - val options = sessionOptions ++ extraOptions + val options = sessionOptions ++ extraOptions.originalMap Review comment: Yes. `toMap` is added to support the existing code pattern.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…
AmplabJenkins removed a comment on pull request #29199: URL: https://github.com/apache/spark/pull/29199#issuecomment-662820990
[GitHub] [spark] AmplabJenkins commented on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…
AmplabJenkins commented on pull request #29199: URL: https://github.com/apache/spark/pull/29199#issuecomment-662820990
[GitHub] [spark] SparkQA commented on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…
SparkQA commented on pull request #29199: URL: https://github.com/apache/spark/pull/29199#issuecomment-662820692

**[Test build #126381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126381/testReport)** for PR 29199 at commit [`a007fa1`](https://github.com/apache/spark/commit/a007fa1e5f57e22682881903c6076d46c9676905).
[GitHub] [spark] kujon commented on pull request #29192: [SPARK-32393][SQL] Fix postgres bpchar array support
kujon commented on pull request #29192: URL: https://github.com/apache/spark/pull/29192#issuecomment-662819772

> Could you add tests in PostgresIntegrationSuite then check if the test can pass on your local env? (Note: our testing framework, Jenkins, does not run it).

will do!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
AmplabJenkins removed a comment on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662819514
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…
AmplabJenkins removed a comment on pull request #29199: URL: https://github.com/apache/spark/pull/29199#issuecomment-662819301
[GitHub] [spark] AmplabJenkins commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
AmplabJenkins commented on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662819514
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29186: [SPARK-32386][SS][TESTS] Fix temp view leaking in Structured Streaming tests
AmplabJenkins removed a comment on pull request #29186: URL: https://github.com/apache/spark/pull/29186#issuecomment-662819334
[GitHub] [spark] AmplabJenkins commented on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…
AmplabJenkins commented on pull request #29199: URL: https://github.com/apache/spark/pull/29199#issuecomment-662819301
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29167: [SPARK-32374][SQL] Disallow setting properties when creating temporary views
AmplabJenkins removed a comment on pull request #29167: URL: https://github.com/apache/spark/pull/29167#issuecomment-662819369
[GitHub] [spark] AngersZhuuuu commented on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…
AngersZhuuuu commented on pull request #29199: URL: https://github.com/apache/spark/pull/29199#issuecomment-662819271

FYI @maropu
[GitHub] [spark] AmplabJenkins commented on pull request #29186: [SPARK-32386][SS][TESTS] Fix temp view leaking in Structured Streaming tests
AmplabJenkins commented on pull request #29186: URL: https://github.com/apache/spark/pull/29186#issuecomment-662819334
[GitHub] [spark] SparkQA removed a comment on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
SparkQA removed a comment on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662785553

**[Test build #126371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126371/testReport)** for PR 29196 at commit [`9c12088`](https://github.com/apache/spark/commit/9c120880b35f1ddeaec6161481380c0b3e601eea).
[GitHub] [spark] AmplabJenkins commented on pull request #29167: [SPARK-32374][SQL] Disallow setting properties when creating temporary views
AmplabJenkins commented on pull request #29167: URL: https://github.com/apache/spark/pull/29167#issuecomment-662819369
[GitHub] [spark] SparkQA commented on pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
SparkQA commented on pull request #29196: URL: https://github.com/apache/spark/pull/29196#issuecomment-662819159

**[Test build #126371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126371/testReport)** for PR 29196 at commit [`9c12088`](https://github.com/apache/spark/commit/9c120880b35f1ddeaec6161481380c0b3e601eea).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29186: [SPARK-32386][SS][TESTS] Fix temp view leaking in Structured Streaming tests
SparkQA commented on pull request #29186: URL: https://github.com/apache/spark/pull/29186#issuecomment-662819020

**[Test build #126379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126379/testReport)** for PR 29186 at commit [`b654ebd`](https://github.com/apache/spark/commit/b654ebd022c2dad1cbd085664f8baa0ef3db154d).
[GitHub] [spark] SparkQA commented on pull request #29167: [SPARK-32374][SQL] Disallow setting properties when creating temporary views
SparkQA commented on pull request #29167: URL: https://github.com/apache/spark/pull/29167#issuecomment-662819073

**[Test build #126380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126380/testReport)** for PR 29167 at commit [`477b696`](https://github.com/apache/spark/commit/477b696ecd2dd75b114809f2b04e36e8a78f2007).
[GitHub] [spark] viirya commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict pl
viirya commented on a change in pull request #29166: URL: https://github.com/apache/spark/pull/29166#discussion_r459214912

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -1237,20 +1250,79 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
+      }
+    }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflict the with left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        newRelation -> plan.output.zip(newRelation.output)
+      } else {
+        val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+        val newPlan = plan.mapChildren { child =>
+          // If not, we'd rewrite child plan recursively until we find the
+          // conflict node or reach the leaf node.
+          val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap)
+          // Only return rewrite attributes which could be used by the parent node.
+          // Otherwise, it could introduce duplicate rewrite attributes. For example,
+          // for the following plan, if we don't do filter for the `childAttrMapping`,
+          // the node `SubqueryAlias b` will return rewrite attribute of [kind#220 -> kind#228]
+          // (which is from the conflict plan `Project [id#218, foo AS kind#228]`), and the node
+          // `SubqueryAlias c` will return rewrite attribute of [kind#220 -> kind#229] (which
+          // is from the conflict plan `Project [id#227, foo AS kind#229]`). As a result, the top
+          // Join will have duplicated rewrite attribute.
+          //
+          // The problem is, the plan `Join Inner, (kind#229 = kind#223)` shouldn't keep returning
+          // rewrite attribute of [kind#220 -> kind#229] to its parent node `Project [id#227]` as
+          // it doesn't really need it.
+          //
+          // Join Inner, (id#218 = id#227)
+          // :- SubqueryAlias b
+          // :  +- Project [id#218, foo AS kind#228]
+          // :     +- SubqueryAlias a
+          // :        +- Project [1 AS id#218]
+          // :           +- OneRowRelation
+          // +- SubqueryAlias c
+          //    +- Project [id#227]
+          //       +- Join Inner, (kind#229 = kind#223)
+          //          :- SubqueryAlias l
+          //          :  +- SubqueryAlias b
+          //          :     +- Project [id#227, foo AS kind#229]
+          //          :        +- SubqueryAlias a
+          //          :           +- Project [1 AS id#227]
+          //          :              +- OneRowRelation
+          //          +- SubqueryAlias r
+          //             +- SubqueryAlias b
+          //                +- Project [id#224, foo AS kind#223]
+          //                   +- SubqueryAlias a
+          //                      +- Project [1 AS id#224]
+          //                         +- OneRowRelation
+          attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+            (plan.references ++ plan.outputSet ++ plan.producedAttributes).contains(oldAttr)

Review comment: Nvm, I actually referred `plan.outputSet`. But now I see `plan.references` is also needed.
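To make the filtering argument in that comment concrete outside of Catalyst, here is a toy model (a hypothetical `Attr` type and `forward` helper, not Spark's `LogicalPlan` machinery) showing how propagating every child mapping upward produces duplicate rewrites for one attribute, while filtering by what the parent actually uses keeps the mapping unambiguous:

```scala
// Toy model of dedupRight's attribute propagation -- not Spark code.
// An "attribute" is just (name, exprId).
case class Attr(name: String, exprId: Int)

// Two conflicting subtrees both rewrote kind#220, to #228 and #229.
val fromSubtreeB = Seq(Attr("kind", 220) -> Attr("kind", 228))
val fromSubtreeC = Seq(Attr("kind", 220) -> Attr("kind", 229))

// Naive propagation: the top join sees TWO rewrites for kind#220.
val naive = fromSubtreeB ++ fromSubtreeC
assert(naive.map(_._1).distinct.size < naive.size) // duplicated old attr

// Filtered propagation: each intermediate node only forwards mappings for
// attributes it references or outputs. A `Project [id#227]` above subtree C
// uses only id#227, so kind#220 -> kind#229 is dropped at that node.
def forward(mapping: Seq[(Attr, Attr)], usedByParent: Set[Attr]): Seq[(Attr, Attr)] =
  mapping.filter { case (oldAttr, _) => usedByParent.contains(oldAttr) }

val projectUses = Set(Attr("id", 227))
val filtered = forward(fromSubtreeC, projectUses) ++ fromSubtreeB
assert(filtered == fromSubtreeB) // only one rewrite for kind#220 survives
```

The filter in the real patch plays the role of `forward`, with `plan.references ++ plan.outputSet ++ plan.producedAttributes` as the "used by parent" set.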
[GitHub] [spark] SparkQA commented on pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…
SparkQA commented on pull request #29199: URL: https://github.com/apache/spark/pull/29199#issuecomment-662818996

**[Test build #126378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126378/testReport)** for PR 29199 at commit [`8364f1f`](https://github.com/apache/spark/commit/8364f1fa182827f29ae16bb528604de85f52449c).
[GitHub] [spark] AngersZhuuuu opened a new pull request #29199: [SPARK-32105][SQL][FOLLOWUP]Refactor current ScriptTransformationExec…
AngersZhuuuu opened a new pull request #29199: URL: https://github.com/apache/spark/pull/29199

### What changes were proposed in this pull request?
- Extract a common script IO schema, `ScriptTransformationIOSchema`
- Avoid repeated judgement by extracting the process-output-row methods `createOutputIteratorWithoutSerde` && `createOutputIteratorWithSerde`
- Add a default no-serde IO schema, `ScriptTransformationIOSchema.defaultIOSchema`

### Why are the changes needed?
Refactor code

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
NO
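A rough sketch of the shape such a refactor takes — only the names `ScriptTransformationIOSchema`, `defaultIOSchema`, `createOutputIteratorWithoutSerde`, and `createOutputIteratorWithSerde` come from the PR description above; the fields and the `isNoSerde` helper are assumptions for illustration, not the PR's exact definitions:

```scala
// Illustrative sketch of the refactor's shape -- NOT the PR's actual code.
// One shared case class carries the script transform's IO configuration.
case class ScriptTransformationIOSchema(
    inputRowFormat: Seq[(String, String)],
    outputRowFormat: Seq[(String, String)],
    inputSerdeClass: Option[String],
    outputSerdeClass: Option[String]) {
  // One place to answer "is this transform running without a SerDe?",
  // instead of re-deriving it at every call site.
  def isNoSerde: Boolean = inputSerdeClass.isEmpty && outputSerdeClass.isEmpty
}

object ScriptTransformationIOSchema {
  // Shared default for the common no-serde case.
  val defaultIOSchema: ScriptTransformationIOSchema =
    ScriptTransformationIOSchema(Nil, Nil, None, None)
}

// A caller then selects the output iterator once, from the schema,
// rather than repeating the serde judgement inline.
def outputIterator(schema: ScriptTransformationIOSchema): String =
  if (schema.isNoSerde) "createOutputIteratorWithoutSerde"
  else "createOutputIteratorWithSerde"

assert(outputIterator(ScriptTransformationIOSchema.defaultIOSchema) ==
  "createOutputIteratorWithoutSerde")
```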
[GitHub] [spark] xuanyuanking commented on pull request #29186: [SPARK-32386][SS][TESTS] Fix temp view leaking in Structured Streaming tests
xuanyuanking commented on pull request #29186: URL: https://github.com/apache/spark/pull/29186#issuecomment-662818538

retest this please
[GitHub] [spark] HyukjinKwon closed pull request #29195: [SPARK-32338][SQL][PYSPARK][FOLLOW-UP] Update slice to accept Column for start and length.
HyukjinKwon closed pull request #29195: URL: https://github.com/apache/spark/pull/29195
[GitHub] [spark] cloud-fan commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
cloud-fan commented on a change in pull request #29191: URL: https://github.com/apache/spark/pull/29191#discussion_r459213960

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

@@ -288,7 +288,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       val provider = maybeV2Provider.get
       val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
         provider, df.sparkSession.sessionState.conf)
-      val options = sessionOptions ++ extraOptions

Review comment: actually, `originalMap` should be `toMap`
[GitHub] [spark] cloud-fan commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
cloud-fan commented on a change in pull request #29191: URL: https://github.com/apache/spark/pull/29191#discussion_r459213858

## File path: sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala

@@ -52,6 +52,8 @@ class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) extends Ma
   override def -(key: String): Map[String, T] = {
     new CaseInsensitiveMap(originalMap.filter(!_._1.equalsIgnoreCase(key)))
   }
+
+  def toMap: Map[String, T] = originalMap

Review comment: that's too many. let's keep `toMap` then
[GitHub] [spark] HyukjinKwon commented on pull request #29195: [SPARK-32338][SQL][PYSPARK][FOLLOW-UP] Update slice to accept Column for start and length.
HyukjinKwon commented on pull request #29195: URL: https://github.com/apache/spark/pull/29195#issuecomment-662817669

Merged to master.
[GitHub] [spark] imback82 commented on a change in pull request #29167: [SPARK-32374][SQL] Disallow setting properties when creating temporary views
imback82 commented on a change in pull request #29167: URL: https://github.com/apache/spark/pull/29167#discussion_r459212899

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala

@@ -266,6 +266,16 @@ abstract class SQLViewSuite extends QueryTest with SQLTestUtils {
     }
   }

+  test("SPARK-32374: disallow setting properties for CREATE TEMPORARY VIEW") {
+    withTempView("myabcdview") {
+      val e = intercept[AnalysisException] {

Review comment: Oh because `ParseException` extends `AnalysisException`. I will update this.
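The behavior being explained here falls out of the exception hierarchy: a check for the parent type accepts any subclass, so `intercept[AnalysisException]` succeeds even when the parser is the one throwing. A minimal standalone illustration — the exception classes below are toy mirrors of the Spark hierarchy, and `intercept` is a hand-rolled stand-in for ScalaTest's:

```scala
// Toy mirror of the Spark hierarchy: ParseException IS-A AnalysisException,
// so intercept[AnalysisException] passes even when parsing throws.
class AnalysisException(msg: String) extends Exception(msg)
class ParseException(msg: String) extends AnalysisException(msg)

// Hand-rolled stand-in for ScalaTest's intercept: return the thrown
// exception if it is an instance of T, fail otherwise.
def intercept[T <: Throwable](body: => Any)(implicit ct: scala.reflect.ClassTag[T]): T =
  try { body; sys.error("expected exception was not thrown") }
  catch {
    case t: Throwable if ct.runtimeClass.isInstance(t) => t.asInstanceOf[T]
  }

val e = intercept[AnalysisException] {
  throw new ParseException("properties not allowed for temporary views")
}
// The caught exception is really the subclass -- which is why the test
// passed and why narrowing to intercept[ParseException] is the stricter check.
assert(e.isInstanceOf[ParseException])
```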
[GitHub] [spark] cloud-fan commented on a change in pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase
cloud-fan commented on a change in pull request #29107: URL: https://github.com/apache/spark/pull/29107#discussion_r459212692

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala

@@ -271,7 +281,7 @@ case class Union(children: Seq[LogicalPlan]) extends LogicalPlan {
       child.output.zip(children.head.output).forall { case (l, r) =>
         l.dataType.sameType(r.dataType)
       })
-    children.length > 1 && childrenResolved && allChildrenCompatible
+    children.length > 1 && childrenResolved && allChildrenCompatible && !(byName || allowMissingCol)

Review comment: nit: `!(byName || allowMissingCol)` should be checked before `childrenResolved`, as it's cheaper.
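The nit relies on `&&` short-circuiting left to right: with the cheap flag test first, the potentially deep `childrenResolved` tree walk is skipped entirely for by-name unions. A small standalone illustration with a toy cost counter (not the Analyzer):

```scala
// Toy illustration of why condition order matters under &&'s short-circuit:
// the expensive check never runs when a cheap flag already decides the result.
var expensiveCalls = 0
def childrenResolved: Boolean = { expensiveCalls += 1; true } // stand-in for a tree walk
val byName = true

// Cheap flag first: && never evaluates the right-hand side.
val resolvedCheap = !byName && childrenResolved
assert(expensiveCalls == 0)

// Expensive check first: the tree walk runs even though the flag decides.
val resolvedCostly = childrenResolved && !byName
assert(expensiveCalls == 1)

// Same answer either way -- only the cost differs.
assert(resolvedCheap == resolvedCostly)
```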
[GitHub] [spark] cloud-fan commented on a change in pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase
cloud-fan commented on a change in pull request #29107: URL: https://github.com/apache/spark/pull/29107#discussion_r459212340

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -1387,10 +1388,11 @@ class Analyzer(
         i.copy(right = dedupRight(left, right))
       case e @ Except(left, right, _) if !e.duplicateResolved =>
         e.copy(right = dedupRight(left, right))
-      case u @ Union(children) if !u.duplicateResolved =>
+      // Only after we finish by-name resolution for Union
+      case u: logical.Union if !u.duplicateResolved && !u.byName =>

Review comment: nit: `logical.Union` -> `Union`. `!u.duplicateResolved && !u.byName` -> `!u.byName && !u.duplicateResolved`
[GitHub] [spark] cloud-fan commented on a change in pull request #29167: [SPARK-32374][SQL] Disallow setting properties when creating temporary views
cloud-fan commented on a change in pull request #29167: URL: https://github.com/apache/spark/pull/29167#discussion_r459210115

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala

@@ -266,6 +266,16 @@ abstract class SQLViewSuite extends QueryTest with SQLTestUtils {
     }
   }

+  test("SPARK-32374: disallow setting properties for CREATE TEMPORARY VIEW") {
+    withTempView("myabcdview") {
+      val e = intercept[AnalysisException] {

Review comment: interesting, the tests pass. Do you know why?
[GitHub] [spark] cloud-fan commented on a change in pull request #29167: [SPARK-32374][SQL] Disallow setting properties when creating temporary views
cloud-fan commented on a change in pull request #29167: URL: https://github.com/apache/spark/pull/29167#discussion_r459209763

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala

@@ -266,6 +266,16 @@ abstract class SQLViewSuite extends QueryTest with SQLTestUtils {
     }
   }

+  test("SPARK-32374: disallow setting properties for CREATE TEMPORARY VIEW") {
+    withTempView("myabcdview") {
+      val e = intercept[AnalysisException] {

Review comment: parser exception?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29196: [SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.2.0 for Scala 2.13.3+
dongjoon-hyun commented on a change in pull request #29196: URL: https://github.com/apache/spark/pull/29196#discussion_r459209060

## File path: pom.xml

@@ -892,7 +907,25 @@
       <dependency>
         <groupId>org.scalatest</groupId>
         <artifactId>scalatest_${scala.binary.version}</artifactId>
-        <version>3.0.8</version>
+        <version>3.2.0</version>
         <scope>test</scope>
       </dependency>
+      <dependency>
+        <groupId>org.scalatestplus</groupId>
+        <artifactId>scalatestplus-scalacheck_${scala.binary.version}</artifactId>
+        <version>3.1.0.0-RC2</version>
+        <scope>test</scope>
+      </dependency>
+      <dependency>
+        <groupId>org.scalatestplus</groupId>
+        <artifactId>scalatestplus-mockito_${scala.binary.version}</artifactId>
+        <version>1.0.0-SNAP5</version>
+        <scope>test</scope>
+      </dependency>
+      <dependency>
+        <groupId>org.scalatestplus</groupId>
+        <artifactId>scalatestplus-selenium_${scala.binary.version}</artifactId>
+        <version>1.0.0-SNAP5</version>
+        <scope>test</scope>
+      </dependency>

Review comment: Oh, interesting. All of these are `test dependency` changes, but `dev/test-dependencies.sh` seems to complain for some reasons.
```
--- a/dev/deps/spark-deps-hadoop-2.7-hive-1.2
+++ b/dev/pr-deps/spark-deps-hadoop-2.7-hive-1.2
@@ -157,9 +157,9 @@ metrics-json/4.1.1//metrics-json-4.1.1.jar
 metrics-jvm/4.1.1//metrics-jvm-4.1.1.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.47.Final//netty-all-4.1.47.Final.jar
-objenesis/2.5.1//objenesis-2.5.1.jar
+objenesis/2.6//objenesis-2.6.jar
 okhttp/3.12.6//okhttp-3.12.6.jar
-okio/1.15.0//okio-1.15.0.jar
+okio/1.14.0//okio-1.14.0.jar
```
[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
cloud-fan commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r459208970

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala

@@ -454,6 +478,28 @@ case class BroadcastHashJoinExec(
     val (matched, checkCondition, _) = getJoinCondition(ctx, input)
     val numOutput = metricTerm(ctx, "numOutputRows")

+    if (isNullAwareAntiJoin) {
+      require(leftKeys.length == 1, "leftKeys length should be 1")
+      require(rightKeys.length == 1, "rightKeys length should be 1")
+      require(joinType == LeftAnti, "joinType must be LeftAnti.")
+      require(buildSide == BuildRight, "buildSide must be BuildRight.")
+      require(SQLConf.get.nullAwareAntiJoinOptimizeEnabled,
+        "nullAwareAntiJoinOptimizeEnabled must be on for null aware anti join optimize.")
+      require(checkCondition == "", "null aware anti join optimize condition should be empty.")
+
+      if (broadcastRelation.value.inputEmpty) {
+        return s"""
+          |// singleColumn NAAJ inputEmpty(true) accept all

Review comment: We can do that with a custom `BroadcastMode`.
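For context, the `inputEmpty` special case being generated above is grounded in null-aware anti join (`NOT IN`) semantics: an empty build side accepts every streamed row, a build side containing `NULL` rejects every row, and otherwise a row survives only if its key is non-null and absent from the build side. A standalone sketch of those semantics over plain collections (not the codegen in this PR):

```scala
// Single-column null-aware anti join (NOT IN) over plain collections --
// a semantic sketch, not BroadcastHashJoinExec's generated code.
// Keys are Option[Int]: None models a SQL NULL.
def nullAwareAntiJoin(stream: Seq[Option[Int]], build: Seq[Option[Int]]): Seq[Option[Int]] =
  if (build.isEmpty) stream                    // inputEmpty: accept all rows
  else if (build.contains(None)) Seq.empty     // NULL on build side: reject all
  else stream.filter {
    case None      => false                    // NULL streamed key never satisfies NOT IN
    case Some(key) => !build.contains(Some(key))
  }

val stream = Seq(Some(1), Some(2), None)
assert(nullAwareAntiJoin(stream, Seq.empty) == stream)          // empty build side
assert(nullAwareAntiJoin(stream, Seq(None)).isEmpty)            // NULL in build side
assert(nullAwareAntiJoin(stream, Seq(Some(2))) == Seq(Some(1))) // ordinary case
```

The first branch is exactly what the generated `inputEmpty(true) accept all` comment refers to; the review suggestion is only about where that emptiness fact is computed (a custom `BroadcastMode`), not about the semantics.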
[GitHub] [spark] asfgit closed pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
asfgit closed pull request #29032: URL: https://github.com/apache/spark/pull/29032
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
dongjoon-hyun commented on a change in pull request #29191: URL: https://github.com/apache/spark/pull/29191#discussion_r459207445

## File path: sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala

@@ -52,6 +52,8 @@ class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) extends Ma
   override def -(key: String): Map[String, T] = {
     new CaseInsensitiveMap(originalMap.filter(!_._1.equalsIgnoreCase(key)))
  }
+
+  def toMap: Map[String, T] = originalMap

Review comment: Just for my understand. In that case, among all the existing `extraOptions.toMap`s, do I need to rename only `AppendData.byName` instance?
```
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:    val params = extraOptions.toMap ++ connectionProperties.asScala.toMap
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:      extraOptions.toMap,
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:      extraOptions.toMap,
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:    AppendData.byName(relation, df.logicalPlan, extraOptions.toMap)
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:      relation, df.logicalPlan, Literal(true), extraOptions.toMap)
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:      extraOptions.toMap,
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:      options = extraOptions.toMap).planForWriting(mode, df.logicalPlan)
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:    AppendData.byPosition(table, df.logicalPlan, extraOptions.toMap)
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:    OverwritePartitionsDynamic.byPosition(table, df.logicalPlan, extraOptions.toMap)
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:    OverwriteByExpression.byPosition(table, df.logicalPlan, Literal(true), extraOptions.toMap)
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:    AppendData.byName(v2Relation, df.logicalPlan, extraOptions.toMap)
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:      extraOptions.toMap,
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:      extraOptions.toMap,
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:    val storage = DataSource.buildStorageFormatFromOptions(extraOptions.toMap)
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala:      options = extraOptions.toMap)
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala:        extraOptions.toMap,
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala:          extraOptions.toMap,
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala:        extraOptions.toMap,
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala:          extraOptions.toMap,
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala:        options = extraOptions.toMap,
```
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
dongjoon-hyun commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459206761

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

```
## @@ -288,7 +288,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       val provider = maybeV2Provider.get
       val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
         provider, df.sparkSession.sessionState.conf)
-      val options = sessionOptions ++ extraOptions
```

Review comment:
Sure. I'll use `sessionOptions.filterKeys(!extraOptions.contains(_)) ++ extraOptions.originalMap`.
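The merge rule discussed above (user options override session options when keys collide case-insensitively, while keeping the user options' original casing) can be sketched as follows. This is a minimal standalone sketch with hypothetical names, not Spark's actual classes:

```scala
// Sketch of the precedence rule: a user-supplied option must shadow a session
// option whose key matches case-insensitively, and the surviving entry keeps
// the user's original key casing.
object OptionMergeSketch {
  def merge(
      sessionOptions: Map[String, String],
      userOptions: Map[String, String]): Map[String, String] = {
    val userKeysLower = userOptions.keySet.map(_.toLowerCase)
    // Drop any session option shadowed (case-insensitively) by a user option,
    // then append the user options untouched.
    sessionOptions.filterNot { case (k, _) => userKeysLower.contains(k.toLowerCase) } ++
      userOptions
  }
}
```

For example, merging `Map("path" -> "/tmp/a", "compression" -> "snappy")` with `Map("Compression" -> "gzip")` keeps `path`, drops the session `compression`, and preserves the user's `Compression` key as written.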
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29198: [SPARK-32401][SQL] Migrate function related commands to new resolution framework
AmplabJenkins removed a comment on pull request #29198: URL: https://github.com/apache/spark/pull/29198#issuecomment-662810535
[GitHub] [spark] SparkQA commented on pull request #29198: [SPARK-32401][SQL] Migrate function related commands to new resolution framework
SparkQA commented on pull request #29198: URL: https://github.com/apache/spark/pull/29198#issuecomment-662810236 **[Test build #126377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126377/testReport)** for PR 29198 at commit [`38756fc`](https://github.com/apache/spark/commit/38756fc8f0232dd62dce3cd4eb4bf6b467742061).
[GitHub] [spark] LantaoJin commented on pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
LantaoJin commented on pull request #29062: URL: https://github.com/apache/spark/pull/29062#issuecomment-662810277 Thanks. I will file a backport PR.
[GitHub] [spark] imback82 commented on a change in pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command
imback82 commented on a change in pull request #28840:
URL: https://github.com/apache/spark/pull/28840#discussion_r459205275

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala

```
## @@ -155,4 +155,31 @@ private[sql] trait LookupCatalog extends Logging {
       None
     }
   }
+
+  // TODO: move function related v2 statements to the new framework.
```

Review comment:
Created #29198
[GitHub] [spark] AmplabJenkins commented on pull request #29198: [SPARK-32401][SQL] Migrate function related commands to new resolution framework
AmplabJenkins commented on pull request #29198: URL: https://github.com/apache/spark/pull/29198#issuecomment-662810535
[GitHub] [spark] imback82 commented on a change in pull request #29198: [SPARK-32401][SQL] Migrate function related commands to new resolution framework
imback82 commented on a change in pull request #29198:
URL: https://github.com/apache/spark/pull/29198#discussion_r459204665

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

```
## @@ -3650,12 +3652,24 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
       }
     }
-    val functionIdentifier = visitMultipartIdentifier(ctx.multipartIdentifier)
-    CreateFunctionStatement(
-      functionIdentifier,
+    val nameParts = visitMultipartIdentifier(ctx.multipartIdentifier)
+    val isTemp = ctx.TEMPORARY != null
+    val func: LogicalPlan = if (isTemp) {
+      import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+      // temp func doesn't belong to any catalog and we shouldn't resolve catalog in the name.
+      if (nameParts.length > 2) {
+        throw new AnalysisException(s"Unsupported function name '${nameParts.quoted}'")
+      }
+      ResolvedFunc(nameParts.asIdentifier)
```

Review comment:
This logic is moved from `ResolveSessionCatalog`, and I think we can resolve it here directly.
[GitHub] [spark] dongjoon-hyun commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
dongjoon-hyun commented on pull request #29191: URL: https://github.com/apache/spark/pull/29191#issuecomment-662809395 Thank you, @cloud-fan and @maropu .
[GitHub] [spark] holdenk commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
holdenk commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-662808939 Merged, thanks everyone. I don't know if we have a good principle around naming host vs. worker with the deprecation.
[GitHub] [spark] imback82 opened a new pull request #29198: [SPARK-32401][SQL] Migrate function related commands to new resolution framework
imback82 opened a new pull request #29198:
URL: https://github.com/apache/spark/pull/29198

### What changes were proposed in this pull request?
This PR proposes to migrate the following function related commands to the new resolution framework:
- CREATE FUNCTION
- DROP FUNCTION
- DESCRIBE FUNCTION
- SHOW FUNCTIONS

### Why are the changes needed?
Migrating to the new resolution framework.

### Does this PR introduce _any_ user-facing change?
The message of the exception thrown when a catalog is resolved to v2 has been merged to: `function is only supported in v1 catalog`. Previously, it printed out the command used, e.g.: `CREATE FUNCTION is only supported in v1 catalog`.

### How was this patch tested?
Updated existing tests.
[GitHub] [spark] cloud-fan commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor
cloud-fan commented on pull request #29032: URL: https://github.com/apache/spark/pull/29032#issuecomment-662805220 I have a general question (not related to this PR): we have deprecated the usage of "multiple workers on one host", so host and worker should be the same thing. However, in the codebase we sometimes use `host` and sometimes `worker`; do we have a general rule for the naming? This PR LGTM, I'm just asking out of curiosity.
[GitHub] [spark] cloud-fan commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict
cloud-fan commented on a change in pull request #29166:
URL: https://github.com/apache/spark/pull/29166#discussion_r459199272

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```
## @@ -1237,20 +1250,48 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
       }
     }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflict the with left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        newRelation -> plan.output.zip(newRelation.output)
+      } else {
+        val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+        val newPlan = plan.mapChildren { child =>
+          // If not, we'd rewrite child plan recursively until we find the
+          // conflict node or reach the leaf node.
+          val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap)
+          attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+            // If an attribute from the child is not output or referenced by the parent plan,
+            // it's useless to propagate this attribute to the parent plan, as they are not
+            // going to use this attribute.
+            (plan.outputSet ++ plan.references).contains(oldAttr)
```

Review comment:
maybe something like
```
// `attrMapping` is used to replace the attributes of the current `plan`, and also to be propagated to
// parent plans, so the `oldAttr` must either be part of `plan.references` (so that it can be used to
// replace attributes of current plan) or `plan.outputSet` (so that it can be used by parent plans).
```
[GitHub] [spark] cloud-fan commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict
cloud-fan commented on a change in pull request #29166:
URL: https://github.com/apache/spark/pull/29166#discussion_r459198562

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```
## @@ -1237,20 +1250,48 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
       }
     }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflict the with left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        newRelation -> plan.output.zip(newRelation.output)
+      } else {
+        val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+        val newPlan = plan.mapChildren { child =>
+          // If not, we'd rewrite child plan recursively until we find the
+          // conflict node or reach the leaf node.
+          val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap)
+          attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+            // If an attribute from the child is not output or referenced by the parent plan,
+            // it's useless to propagate this attribute to the parent plan, as they are not
+            // going to use this attribute.
+            (plan.outputSet ++ plan.references).contains(oldAttr)
```

Review comment:
ah sorry, you are right. `attrMapping` is not only propagated to parents, but is also used to rewrite attributes of the current `plan`, so `plan.references` is necessary. Can we make the comment clearer about it?
[GitHub] [spark] cloud-fan commented on a change in pull request #29166: [SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict
cloud-fan commented on a change in pull request #29166:
URL: https://github.com/apache/spark/pull/29166#discussion_r459198329

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```
## @@ -1237,20 +1250,48 @@ class Analyzer(
       if (conflictPlans.isEmpty) {
         right
       } else {
-        val attributeRewrites = AttributeMap(conflictPlans.flatMap {
-          case (oldRelation, newRelation) => oldRelation.output.zip(newRelation.output)})
-        val conflictPlanMap = conflictPlans.toMap
-        // transformDown so that we can replace all the old Relations in one turn due to
-        // the reason that `conflictPlans` are also collected in pre-order.
-        right transformDown {
-          case r => conflictPlanMap.getOrElse(r, r)
-        } transformUp {
-          case other => other transformExpressions {
+        rewritePlan(right, conflictPlans.toMap)._1
       }
     }
+
+    private def rewritePlan(plan: LogicalPlan, conflictPlanMap: Map[LogicalPlan, LogicalPlan])
+      : (LogicalPlan, Seq[(Attribute, Attribute)]) = {
+      if (conflictPlanMap.contains(plan)) {
+        // If the plan is the one that conflict the with left one, we'd
+        // just replace it with the new plan and collect the rewrite
+        // attributes for the parent node.
+        val newRelation = conflictPlanMap(plan)
+        newRelation -> plan.output.zip(newRelation.output)
+      } else {
+        val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+        val newPlan = plan.mapChildren { child =>
+          // If not, we'd rewrite child plan recursively until we find the
+          // conflict node or reach the leaf node.
+          val (newChild, childAttrMapping) = rewritePlan(child, conflictPlanMap)
+          attrMapping ++= childAttrMapping.filter { case (oldAttr, _) =>
+            // If an attribute from the child is not output or referenced by the parent plan,
+            // it's useless to propagate this attribute to the parent plan, as they are not
+            // going to use this attribute.
+            (plan.outputSet ++ plan.references).contains(oldAttr)
```

Review comment:
the parents of `plan` can only reference `plan.outputSet`; that's why I think only `plan.outputSet` is necessary.
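The recursion being reviewed above, rewriting only the ancestors of a conflicting node and propagating old-to-new attribute renames upward, can be illustrated with a toy tree. This is a hypothetical sketch (not Spark's `LogicalPlan`/`Attribute` types); attributes are plain strings like `a#1`:

```scala
// Toy plan tree: leaves produce attributes, projections reference them.
sealed trait Node
case class Leaf(name: String, out: Seq[String]) extends Node
case class Proj(refs: Seq[String], child: Node) extends Node

object RewriteSketch {
  /** Rewrites the ancestors of a conflicting leaf. Returns the new node plus
   *  the old -> new renames that parent nodes still need to apply, mirroring
   *  the (LogicalPlan, Seq[(Attribute, Attribute)]) return shape above. */
  def rewrite(n: Node, conflict: Map[Node, Node]): (Node, Map[String, String]) =
    (n, conflict.get(n)) match {
      case (old: Leaf, Some(rep: Leaf)) =>
        // Replace the conflicting leaf and report its attribute renames.
        (rep, old.out.zip(rep.out).toMap)
      case (p: Proj, _) =>
        val (newChild, mapping) = rewrite(p.child, conflict)
        // Fix this node's own references with the child's renames, then
        // propagate the mapping so grandparents can do the same.
        (Proj(p.refs.map(r => mapping.getOrElse(r, r)), newChild), mapping)
      case _ => (n, Map.empty)
    }
}
```

Rewriting `Proj(Seq("a#1"), Leaf("t", Seq("a#1")))` with a conflict map sending that leaf to `Leaf("t", Seq("a#2"))` yields `Proj(Seq("a#2"), ...)`: only the ancestor of the conflict is touched, and siblings outside that path would be left alone.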
[GitHub] [spark] kevinjmh commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table
kevinjmh commented on pull request #28901: URL: https://github.com/apache/spark/pull/28901#issuecomment-662802864 How about using the `CACHE TABLE` command to do that?
[GitHub] [spark] cloud-fan commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
cloud-fan commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459197779

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

```
## @@ -288,7 +288,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       val provider = maybeV2Provider.get
       val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
         provider, df.sparkSession.sessionState.conf)
-      val options = sessionOptions ++ extraOptions
```

Review comment:
maybe `sessionOptions.filterNot(extraOptions.contains) ++ extraOptions.originalMap`
[GitHub] [spark] cloud-fan commented on a change in pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
cloud-fan commented on a change in pull request #29191:
URL: https://github.com/apache/spark/pull/29191#discussion_r459197666

## File path: sql/catalyst/src/main/scala-2.12/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala

```
## @@ -52,6 +52,8 @@ class CaseInsensitiveMap[T] private (val originalMap: Map[String, T]) extends Ma
   override def -(key: String): Map[String, T] = {
     new CaseInsensitiveMap(originalMap.filter(!_._1.equalsIgnoreCase(key)))
   }
+
+  def toMap: Map[String, T] = originalMap
```

Review comment:
shall we use the consistent name `asCaseSensitiveMap`?
[GitHub] [spark] cloud-fan commented on pull request #29191: [SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap
cloud-fan commented on pull request #29191:
URL: https://github.com/apache/spark/pull/29191#issuecomment-662802631

Let's define the behavior more clearly:
1. DFReader/Writer options are case insensitive, so a later option should override prior ones if the keys are equal case-insensitively. This is already fixed by your previous PR. We should also clearly document it in DFReader/Writer.
2. For DS v1, the options are passed as `Map[String, String]`, so v1 sources don't really have a way to get the `originalMap`. All keys are lowercased (unless you cast it to `CaseInsensitiveMap`, which is private).
3. For DS v2, the options are passed as `CaseInsensitiveStringMap`, which provides `asCaseSensitiveMap`. So it's case-preserving.

This PR is trying to fix 3. There are 2 places we should fix:
1. `val options = sessionOptions ++ extraOptions`. Using `extraOptions` directly as a map will return lower-cased keys. Another concern is that if `sessionOptions` and `extraOptions` have duplicated keys, we should make sure `extraOptions` takes precedence.
2. `AppendData.byName...`. It should also get the case-preserving map, as `AppendDataExec` will wrap it with `CaseInsensitiveStringMap`.
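The case-insensitive-but-case-preserving behavior defined above can be sketched with a minimal map wrapper. This is a hypothetical `CIMap` class for illustration, not Spark's `CaseInsensitiveMap` or `CaseInsensitiveStringMap`:

```scala
// Lookups ignore case; the backing map keeps the original key casing so it
// can be handed back verbatim (the "asCaseSensitiveMap" idea above).
class CIMap[T](val originalMap: Map[String, T]) {
  private val lowered: Map[String, T] =
    originalMap.map { case (k, v) => k.toLowerCase -> v }

  def get(key: String): Option[T] = lowered.get(key.toLowerCase)

  // A later entry overrides an earlier one when keys collide
  // case-insensitively, and the newer casing survives.
  def +(kv: (String, T)): CIMap[T] =
    new CIMap(originalMap.filterNot(_._1.equalsIgnoreCase(kv._1)) + kv)

  def asCaseSensitiveMap: Map[String, T] = originalMap
}
```

With this sketch, `new CIMap(Map("Path" -> "/a")) + ("PATH" -> "/b")` resolves `get("path")` to `/b`, and `asCaseSensitiveMap` returns `Map("PATH" -> "/b")` with the latest write's casing intact.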
[GitHub] [spark] AngersZhuuuu commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-662801978

> Looks almost okay now, so could you split this PR into pieces? I think it's somewhat big for reviews. For example:
> 1. More refactoring PR for `HiveScriptTransformationExec` and `BaseScriptTransfromationExec`, just like [SPARK-32105](https://issues.apache.org/jira/browse/SPARK-32105)
> 2. PR to improve test coverage of `HiveScriptTransformationExec`
> 3. Then, PR to implement Spark-native TRANSFORM
>
> WDYT?

Yea, I'll raise the PRs one by one.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r459194715

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala

```
## @@ -87,17 +181,72 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
       }
     }
   }
+
+  private lazy val outputFieldWriters: Seq[String => Any] = output.map { attr =>
+    val converter = CatalystTypeConverters.createToCatalystConverter(attr.dataType)
+    attr.dataType match {
+      case StringType => wrapperConvertException(data => data, converter)
+      case BooleanType => wrapperConvertException(data => data.toBoolean, converter)
+      case ByteType => wrapperConvertException(data => data.toByte, converter)
+      case BinaryType =>
+        wrapperConvertException(data => UTF8String.fromString(data).getBytes, converter)
+      case IntegerType => wrapperConvertException(data => data.toInt, converter)
+      case ShortType => wrapperConvertException(data => data.toShort, converter)
+      case LongType => wrapperConvertException(data => data.toLong, converter)
+      case FloatType => wrapperConvertException(data => data.toFloat, converter)
+      case DoubleType => wrapperConvertException(data => data.toDouble, converter)
+      case _: DecimalType => wrapperConvertException(data => BigDecimal(data), converter)
+      case DateType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToDate(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.daysToLocalDate).orNull, converter)
+      case DateType => wrapperConvertException(data => DateTimeUtils.stringToDate(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaDate).orNull, converter)
+      case TimestampType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.microsToInstant).orNull, converter)
+      case TimestampType => wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaTimestamp).orNull, converter)
+      case CalendarIntervalType => wrapperConvertException(
+        data => IntervalUtils.stringToInterval(UTF8String.fromString(data)),
+        converter)
+      case udt: UserDefinedType[_] =>
+        wrapperConvertException(data => udt.deserialize(data), converter)
+      case dt =>
+        throw new SparkException("TRANSFORM without serde does not support " +
```

Review comment:
> nit: `TRANSFORM` -> `s"$nodeName...`

Done
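The pattern in the hunk above, building one `String => Any` converter per output type up front so each row field is parsed without re-dispatching on the type, can be distilled into a short sketch. Toy `DataType` cases and the `safely` helper are hypothetical stand-ins for Spark's types and `wrapperConvertException`:

```scala
// One converter per declared output type, built once and reused per row.
sealed trait DataType
case object StringType extends DataType
case object IntType extends DataType
case object DoubleType extends DataType
case object BoolType extends DataType

object FieldWriters {
  // Wrap each converter so a malformed field yields null instead of throwing,
  // similar in spirit to wrapperConvertException in the patch above.
  private def safely(f: String => Any): String => Any =
    s => try f(s) catch { case _: Exception => null }

  def writerFor(dt: DataType): String => Any = dt match {
    case StringType => safely(identity)
    case IntType    => safely(_.toInt)
    case DoubleType => safely(_.toDouble)
    case BoolType   => safely(_.toBoolean)
  }
}
```

A schema of N columns then maps to N prebuilt writers, and each incoming text row is converted field-by-field with a plain zip, with bad fields degrading to null rather than failing the task.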
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins removed a comment on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-662799097
[GitHub] [spark] AmplabJenkins commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AmplabJenkins commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-662799097
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29197: [SPARK-32395][SQL] Commit output files after task attempt succeed in dynamic partition datawriter
AmplabJenkins removed a comment on pull request #29197: URL: https://github.com/apache/spark/pull/29197#issuecomment-662798617 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29197: [SPARK-32395][SQL] Commit output files after task attempt succeed in dynamic partition datawriter
AmplabJenkins commented on pull request #29197: URL: https://github.com/apache/spark/pull/29197#issuecomment-662798935 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
SparkQA commented on pull request #29085: URL: https://github.com/apache/spark/pull/29085#issuecomment-662798740 **[Test build #126376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126376/testReport)** for PR 29085 at commit [`03d3409`](https://github.com/apache/spark/commit/03d3409f6ca641a4f10fdc2ac71479445220f676).
[GitHub] [spark] AmplabJenkins commented on pull request #29197: [SPARK-32395][SQL] Commit output files after task attempt succeed in dynamic partition datawriter
AmplabJenkins commented on pull request #29197: URL: https://github.com/apache/spark/pull/29197#issuecomment-662798617 Can one of the admins verify this patch?
[GitHub] [spark] cloud-fan commented on pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
cloud-fan commented on pull request #29062: URL: https://github.com/apache/spark/pull/29062#issuecomment-662798239 merged to master, thanks! @LantaoJin can you send a backport to 3.0?
[GitHub] [spark] wankunde opened a new pull request #29197: [SPARK-32395][SQL] Commit output files after task attempt succeed in dynamic partition datawriter
wankunde opened a new pull request #29197: URL: https://github.com/apache/spark/pull/29197

### What changes were proposed in this pull request?

Generally, distributed jobs commit files in two stages: committing each task's output files and then committing the job's output files. If one task attempt fails, another attempt runs the task again; after all tasks succeed, the job commits the output of all tasks. But currently, if we run a dynamic partition overwrite job, for example `INSERT OVERWRITE table dst partition(part) SELECT * from src`, the job will fail if one of the final-stage tasks fails. The first task attempt's data writer in the final stage writes its output data directly to the Spark stage directory. If the first task attempt fails, the second task attempt's data writer fails during setup because the task's output file already exists, and the job fails. Therefore, I think we should write the temporary data to the task attempt's work directory and commit the result files after the task attempt succeeds.

### Why are the changes needed?

Bug fix for the case where a dynamic partition data writer in one of the final-stage tasks fails.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added UT
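The fix described in this PR follows the usual two-phase pattern for task output: each attempt writes into its own scratch directory, and only a successful attempt promotes its files to the final location. A minimal, Spark-free sketch of the idea (class and directory names here are illustrative, not the PR's actual API):

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.jdk.CollectionConverters._

// Illustrative two-phase task commit: a failed attempt leaves nothing in
// the final output directory that could block a retry of the same task.
class TaskAttemptWriter(outputDir: Path, taskId: Int, attemptId: Int) {
  // Each attempt gets a private scratch directory keyed by attempt id.
  private val attemptDir: Path =
    Files.createDirectories(
      outputDir.resolve(s"_temporary/task-$taskId-attempt-$attemptId"))

  // Phase 1: write only into the attempt's private work directory.
  def write(name: String, data: String): Unit = {
    Files.write(attemptDir.resolve(name), data.getBytes("UTF-8"))
    ()
  }

  private def listFiles(): List[Path] = {
    val stream = Files.list(attemptDir)
    try stream.iterator().asScala.toList finally stream.close()
  }

  // Phase 2 (task commit): promote the attempt's files to the final dir.
  def commit(): Unit = {
    listFiles().foreach { f =>
      Files.move(f, outputDir.resolve(f.getFileName.toString),
        StandardCopyOption.REPLACE_EXISTING)
    }
    Files.delete(attemptDir)
  }

  // Task abort: discard the attempt's scratch output entirely.
  def abort(): Unit = {
    listFiles().foreach(Files.delete(_))
    Files.delete(attemptDir)
  }
}
```

With this shape, a second attempt never collides with a failed first attempt's files, which is exactly the failure mode the PR description reports for dynamic partition overwrite.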
[GitHub] [spark] cloud-fan closed pull request #29062: [SPARK-32237][SQL] Resolve hint in CTE
cloud-fan closed pull request #29062: URL: https://github.com/apache/spark/pull/29062
[GitHub] [spark] cloud-fan commented on pull request #29194: [SPARK-30616][SQL][FOLLOW-UP] Use only config key name in the config doc.
cloud-fan commented on pull request #29194: URL: https://github.com/apache/spark/pull/29194#issuecomment-662797390 thanks, merging to master!
[GitHub] [spark] cloud-fan closed pull request #29194: [SPARK-30616][SQL][FOLLOW-UP] Use only config key name in the config doc.
cloud-fan closed pull request #29194: URL: https://github.com/apache/spark/pull/29194
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core
AngersZh commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r459189448

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala

## @@ -87,17 +181,72 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
       }
     }
   }
+
+  private lazy val outputFieldWriters: Seq[String => Any] = output.map { attr =>
+    val converter = CatalystTypeConverters.createToCatalystConverter(attr.dataType)
+    attr.dataType match {
+      case StringType => wrapperConvertException(data => data, converter)
+      case BooleanType => wrapperConvertException(data => data.toBoolean, converter)
+      case ByteType => wrapperConvertException(data => data.toByte, converter)
+      case BinaryType =>
+        wrapperConvertException(data => UTF8String.fromString(data).getBytes, converter)
+      case IntegerType => wrapperConvertException(data => data.toInt, converter)
+      case ShortType => wrapperConvertException(data => data.toShort, converter)
+      case LongType => wrapperConvertException(data => data.toLong, converter)
+      case FloatType => wrapperConvertException(data => data.toFloat, converter)
+      case DoubleType => wrapperConvertException(data => data.toDouble, converter)
+      case _: DecimalType => wrapperConvertException(data => BigDecimal(data), converter)
+      case DateType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToDate(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.daysToLocalDate).orNull, converter)
+      case DateType => wrapperConvertException(data => DateTimeUtils.stringToDate(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaDate).orNull, converter)
+      case TimestampType if conf.datetimeJava8ApiEnabled =>
+        wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+          UTF8String.fromString(data),
+          DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+          .map(DateTimeUtils.microsToInstant).orNull, converter)
+      case TimestampType => wrapperConvertException(data => DateTimeUtils.stringToTimestamp(
+        UTF8String.fromString(data),
+        DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+        .map(DateTimeUtils.toJavaTimestamp).orNull, converter)
+      case CalendarIntervalType => wrapperConvertException(
+        data => IntervalUtils.stringToInterval(UTF8String.fromString(data)),
+        converter)
+      case udt: UserDefinedType[_] =>
+        wrapperConvertException(data => udt.deserialize(data), converter)
+      case dt =>
+        throw new SparkException("TRANSFORM without serde does not support " +
+          s"${dt.getClass.getSimpleName} as output data type")

Review comment:
> `dt.getClass.getSimpleName` -> `dt.catalogString`

It is not general: for ArrayType it will show `array`, and for StructType `struct`. WDYT?
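The trade-off the reviewers are weighing is how the offending type renders in the error message. A toy model of the two options (the mini type hierarchy below is illustrative only, not Spark's `DataType`; Spark's real `catalogString` renders names such as `array<int>`):

```scala
// A miniature type hierarchy contrasting a catalogString-style renderer
// with the class's simple name.
sealed trait MiniType {
  def catalogString: String
  def simpleClassName: String = getClass.getSimpleName.stripSuffix("$")
}
case object MiniIntType extends MiniType {
  def catalogString: String = "int"
}
final case class MiniArrayType(elementType: MiniType) extends MiniType {
  // Recursive: the element type stays visible in the rendered name.
  def catalogString: String = s"array<${elementType.catalogString}>"
}
```

For `MiniArrayType(MiniIntType)`, `catalogString` yields `array<int>` while `simpleClassName` yields only `MiniArrayType`: the first tells the user which element type was unsupported, the second names the implementation class, which is the distinction behind the `dt.catalogString` suggestion above.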