[GitHub] [spark] grundprinzip commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

2023-03-15 Thread via GitHub
grundprinzip commented on code in PR #40447: URL: https://github.com/apache/spark/pull/40447#discussion_r1138163227 ## python/pyspark/sql/connect/client.py: ## @@ -122,6 +122,8 @@ class ChannelBuilder: PARAM_TOKEN = "token" PARAM_USER_ID = "user_id"

[GitHub] [spark] navinvishy commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-15 Thread via GitHub
navinvishy commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1471352117 > @navinvishy would you mind addressing wenchen's comments? we can merge it then. I've addressed them. Thanks for checking, @zhengruifeng ! -- This is an automated message

[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-15 Thread via GitHub
navinvishy commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1138152130 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1399,6 +1399,151 @@ case class ArrayContains(left:

[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-15 Thread via GitHub
navinvishy commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1138150738 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1399,6 +1399,151 @@ case class ArrayContains(left:

[GitHub] [spark] zhouyejoe commented on pull request #40412: [SPARK-42784] should still create subDir when the number of subDir in merge dir is less than conf

2023-03-15 Thread via GitHub
zhouyejoe commented on PR #40412: URL: https://github.com/apache/spark/pull/40412#issuecomment-1471342162 Thanks for creating the PR. Will review ASAP @Stove-hust -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] dongjoon-hyun closed pull request #40453: [SPARK-42820][BUILD] Update ORC to 1.8.3

2023-03-15 Thread via GitHub
dongjoon-hyun closed pull request #40453: [SPARK-42820][BUILD] Update ORC to 1.8.3 URL: https://github.com/apache/spark/pull/40453

[GitHub] [spark] dongjoon-hyun commented on pull request #40453: [SPARK-42820][BUILD] Update ORC to 1.8.3

2023-03-15 Thread via GitHub
dongjoon-hyun commented on PR #40453: URL: https://github.com/apache/spark/pull/40453#issuecomment-1471315962 Let me merge this~ Merged to master/3.4.

[GitHub] [spark] dongjoon-hyun commented on pull request #40453: [SPARK-42820][BUILD] Update ORC to 1.8.3

2023-03-15 Thread via GitHub
dongjoon-hyun commented on PR #40453: URL: https://github.com/apache/spark/pull/40453#issuecomment-1471314521 It seems that there is some GitHub Action setting issue on William's side. Actually, I was the release manager of Apache ORC 1.8.3 and tested this here in my repo. -

[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-15 Thread via GitHub
ulysses-you commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1138120937 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -864,6 +864,14 @@ object SQLConf { .checkValue(_ >= 0, "The maximum must

[GitHub] [spark] zhengruifeng commented on pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
zhengruifeng commented on PR #40432: URL: https://github.com/apache/spark/pull/40432#issuecomment-1471307277 @WeichenXu123 not ready. `sql slow` failed with message related to `mllib-common`: ``` [error]

[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-15 Thread via GitHub
ulysses-you commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1138102904 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -128,10 +131,23 @@ class EquivalentExpressions { //

[GitHub] [spark] WeichenXu123 commented on pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
WeichenXu123 commented on PR #40432: URL: https://github.com/apache/spark/pull/40432#issuecomment-1471290106 Is it ready to merge ?

[GitHub] [spark] shuwang21 commented on pull request #40448: [SPARK-42817][CORE] Logging the shuffle service name once in ApplicationMaster

2023-03-15 Thread via GitHub
shuwang21 commented on PR #40448: URL: https://github.com/apache/spark/pull/40448#issuecomment-1471288187 LGTM. thanks!

[GitHub] [spark] viirya commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-15 Thread via GitHub
viirya commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1138097640 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -128,10 +131,23 @@ class EquivalentExpressions { // There

[GitHub] [spark] viirya commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-15 Thread via GitHub
viirya commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1138095296 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -864,6 +864,14 @@ object SQLConf { .checkValue(_ >= 0, "The maximum must not be

[GitHub] [spark] LuciferYang commented on pull request #40452: [MINOR] Add comments of `xercesImpl` upgrade precautions in `pom.xml`

2023-03-15 Thread via GitHub
LuciferYang commented on PR #40452: URL: https://github.com/apache/spark/pull/40452#issuecomment-1471278961 Thanks @dongjoon-hyun @yaooqinn

[GitHub] [spark] dongjoon-hyun closed pull request #40452: [MINOR] Add comments of `xercesImpl` upgrade precautions in `pom.xml`

2023-03-15 Thread via GitHub
dongjoon-hyun closed pull request #40452: [MINOR] Add comments of `xercesImpl` upgrade precautions in `pom.xml` URL: https://github.com/apache/spark/pull/40452

[GitHub] [spark] LuciferYang commented on a diff in pull request #40452: [MINOR] Add comments of `xercesImpl` upgrade precautions in `pom.xml`

2023-03-15 Thread via GitHub
LuciferYang commented on code in PR #40452: URL: https://github.com/apache/spark/pull/40452#discussion_r1138092261 ## pom.xml: ## @@ -1426,6 +1426,7 @@ test + Review Comment: done

[GitHub] [spark] LuciferYang commented on pull request #40389: [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests strongly d

2023-03-15 Thread via GitHub
LuciferYang commented on PR #40389: URL: https://github.com/apache/spark/pull/40389#issuecomment-1471275097 Thanks @HyukjinKwon and @Hisoka-X

[GitHub] [spark] yaooqinn commented on pull request #40453: [SPARK-42820][BUILD] Update ORC to 1.8.3

2023-03-15 Thread via GitHub
yaooqinn commented on PR #40453: URL: https://github.com/apache/spark/pull/40453#issuecomment-1471274601 LGTM, thanks @williamhyun @dongjoon-hyun

[GitHub] [spark] HyukjinKwon closed pull request #40389: [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend

2023-03-15 Thread via GitHub
HyukjinKwon closed pull request #40389: [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive URL: https://github.com/apache/spark/pull/40389

[GitHub] [spark] HyukjinKwon commented on pull request #40389: [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests strongly d

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #40389: URL: https://github.com/apache/spark/pull/40389#issuecomment-1471270622 Merged to master and branch-3.4.

[GitHub] [spark] dongjoon-hyun commented on pull request #40448: [SPARK-42817][CORE] Logging the shuffle service name once in ApplicationMaster

2023-03-15 Thread via GitHub
dongjoon-hyun commented on PR #40448: URL: https://github.com/apache/spark/pull/40448#issuecomment-1471270199 cc @mridulm

[GitHub] [spark] HyukjinKwon commented on pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #39239: URL: https://github.com/apache/spark/pull/39239#issuecomment-1471268820 Yup, I meant that most cases work except a few cases that can happen because of the timezone. We're on the same page.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40448: [SPARK-42817][CORE] Logging the shuffle service name once in ApplicationMaster

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40448: URL: https://github.com/apache/spark/pull/40448#discussion_r1138090342 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala: ## @@ -498,7 +498,10 @@ private[spark] class ApplicationMaster(

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40452: [MINOR] Add comments of `xercesImpl` upgrade precautions in `pom.xml`

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40452: URL: https://github.com/apache/spark/pull/40452#discussion_r1138089863 ## pom.xml: ## @@ -1426,6 +1426,7 @@ test + Review Comment: Thank you, @LuciferYang . Could you split this into two lines?

[GitHub] [spark] zhengruifeng commented on pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
zhengruifeng commented on PR #40432: URL: https://github.com/apache/spark/pull/40432#issuecomment-1471248695 `sql - slow` failed, not sure whether it is related, let me investigate it first

[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut conditional expression

2023-03-15 Thread via GitHub
ulysses-you commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1138066754 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -130,7 +133,19 @@ class EquivalentExpressions { //

[GitHub] [spark] cloud-fan commented on a diff in pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-15 Thread via GitHub
cloud-fan commented on code in PR #40116: URL: https://github.com/apache/spark/pull/40116#discussion_r1138054456 ## sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala: ## @@ -40,12 +40,15 @@ abstract class SQLImplicits extends LowPrioritySQLImplicits { */

[GitHub] [spark] cloud-fan commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-15 Thread via GitHub
cloud-fan commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1471235372 The single quote indicates that the expression is unresolved, I think it doesn't matter here.

[GitHub] [spark] zhengruifeng commented on pull request #40451: [SPARK-42818][CONNECT][PYTHON][FOLLOWUP] Add versionchanged

2023-03-15 Thread via GitHub
zhengruifeng commented on PR #40451: URL: https://github.com/apache/spark/pull/40451#issuecomment-1471228184 merged to master/branch-3.4

[GitHub] [spark] zhengruifeng closed pull request #40451: [SPARK-42818][CONNECT][PYTHON][FOLLOWUP] Add versionchanged

2023-03-15 Thread via GitHub
zhengruifeng closed pull request #40451: [SPARK-42818][CONNECT][PYTHON][FOLLOWUP] Add versionchanged URL: https://github.com/apache/spark/pull/40451

[GitHub] [spark] beliefer commented on pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
beliefer commented on PR #40396: URL: https://github.com/apache/spark/pull/40396#issuecomment-1471226525 @dongjoon-hyun @cloud-fan @huaxingao @sadikovi Thank you.

[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut conditional expression

2023-03-15 Thread via GitHub
cloud-fan commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1138040356 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -130,7 +133,19 @@ class EquivalentExpressions { //

[GitHub] [spark] panbingkun opened a new pull request, #40454: [SPARK-42821][SQL] Remove unused parameters in splitFiles methods

2023-03-15 Thread via GitHub
panbingkun opened a new pull request, #40454: URL: https://github.com/apache/spark/pull/40454 ### What changes were proposed in this pull request? The pr aims to remove unused parameters in PartitionedFileUtil.splitFiles methods ### Why are the changes needed? ###

[GitHub] [spark] zhengruifeng commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-15 Thread via GitHub
zhengruifeng commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1471205562 @navinvishy would you mind addressing wenchen's comments? we can merge it then.

[GitHub] [spark] zhengruifeng commented on pull request #40450: [SPARK-42818][CONNECT][PYTHON] Implement DataFrameReader/Writer.jdbc

2023-03-15 Thread via GitHub
zhengruifeng commented on PR #40450: URL: https://github.com/apache/spark/pull/40450#issuecomment-1471199868 Late LGTM

[GitHub] [spark] williamhyun opened a new pull request, #40453: [SPARK-42820][BUILD] Update ORC to 1.8.3

2023-03-15 Thread via GitHub
williamhyun opened a new pull request, #40453: URL: https://github.com/apache/spark/pull/40453 ### What changes were proposed in this pull request? This PR aims to update ORC to 1.8.3. ### Why are the changes needed? This will bring the following bug fixes. -

[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut conditional expression

2023-03-15 Thread via GitHub
ulysses-you commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1138020573 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -130,7 +133,19 @@ class EquivalentExpressions { //

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
zhengruifeng commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1138017400 ## dev/sparktestsupport/modules.py: ## @@ -781,6 +740,57 @@ def __hash__(self): ], ) + +pyspark_connect = Module( +name="pyspark-connect", +

[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut conditional expression

2023-03-15 Thread via GitHub
cloud-fan commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1138016464 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -130,7 +133,19 @@ class EquivalentExpressions { //

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
zhengruifeng commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1138015354 ## python/pyspark/ml/connect/functions.py: ## @@ -0,0 +1,76 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] LuciferYang commented on pull request #40431: [SPARK-42799][BUILD] Update SBT build `xercesImpl` version to match with `pom.xml`

2023-03-15 Thread via GitHub
LuciferYang commented on PR #40431: URL: https://github.com/apache/spark/pull/40431#issuecomment-1471187025 https://github.com/apache/spark/pull/40452/files : I added a comment in `pom.xml` to prevent us from forgetting this

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
zhengruifeng commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1138014700 ## dev/sparktestsupport/modules.py: ## @@ -655,6 +655,7 @@ def __hash__(self): "pyspark.ml.tests.test_wrapper",

[GitHub] [spark] LuciferYang opened a new pull request, #40452: [MINOR] Add comments of `xercesImpl` upgrade precautions in `pom.xml`

2023-03-15 Thread via GitHub
LuciferYang opened a new pull request, #40452: URL: https://github.com/apache/spark/pull/40452 ### What changes were proposed in this pull request? This PR just adds comments on `xercesImpl` upgrade precautions in `pom.xml`. ### Why are the changes needed? Add comments to remind

[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut conditional expression

2023-03-15 Thread via GitHub
cloud-fan commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1138011206 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -864,6 +864,15 @@ object SQLConf { .checkValue(_ >= 0, "The maximum must not

[GitHub] [spark] amaliujia commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

2023-03-15 Thread via GitHub
amaliujia commented on code in PR #40447: URL: https://github.com/apache/spark/pull/40447#discussion_r1138008293 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -47,6 +47,13 @@ object Connect { .bytesConf(ByteUnit.MiB)

[GitHub] [spark] amaliujia commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

2023-03-15 Thread via GitHub
amaliujia commented on code in PR #40447: URL: https://github.com/apache/spark/pull/40447#discussion_r1138007108 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -47,6 +47,13 @@ object Connect { .bytesConf(ByteUnit.MiB)

[GitHub] [spark] ueshin opened a new pull request, #40451: [SPARK-42818][CONNECT][PYTHON][FOLLOWUP] Add versionchanged

2023-03-15 Thread via GitHub
ueshin opened a new pull request, #40451: URL: https://github.com/apache/spark/pull/40451 ### What changes were proposed in this pull request? Follow-up of #40450. Adds `versionchanged` to the docstring. ### Why are the changes needed? The `versionchanged` is

[GitHub] [spark] HyukjinKwon closed pull request #40450: [SPARK-42818][CONNECT][PYTHON] Implement DataFrameReader/Writer.jdbc

2023-03-15 Thread via GitHub
HyukjinKwon closed pull request #40450: [SPARK-42818][CONNECT][PYTHON] Implement DataFrameReader/Writer.jdbc URL: https://github.com/apache/spark/pull/40450

[GitHub] [spark] HyukjinKwon commented on pull request #40450: [SPARK-42818][CONNECT][PYTHON] Implement DataFrameReader/Writer.jdbc

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #40450: URL: https://github.com/apache/spark/pull/40450#issuecomment-1471147451 Merged to master and branch-3.4.

[GitHub] [spark] ulysses-you commented on pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut conditional expression

2023-03-15 Thread via GitHub
ulysses-you commented on PR #40446: URL: https://github.com/apache/spark/pull/40446#issuecomment-1471123749 cc @viirya @cloud-fan thank you

[GitHub] [spark] cloud-fan closed pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
cloud-fan closed pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true URL: https://github.com/apache/spark/pull/40396

[GitHub] [spark] cloud-fan commented on pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
cloud-fan commented on PR #40396: URL: https://github.com/apache/spark/pull/40396#issuecomment-1471116456 thanks, merging to master!

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40447: URL: https://github.com/apache/spark/pull/40447#discussion_r1137915076 ## python/pyspark/sql/connect/client.py: ## @@ -122,6 +122,8 @@ class ChannelBuilder: PARAM_TOKEN = "token" PARAM_USER_ID = "user_id"

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40447: URL: https://github.com/apache/spark/pull/40447#discussion_r1137914674 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -47,6 +47,13 @@ object Connect {

[GitHub] [spark] github-actions[bot] commented on pull request #37725: [DO-NOT-MERGE] Exceptions without error classes in SQL golden files

2023-03-15 Thread via GitHub
github-actions[bot] commented on PR #37725: URL: https://github.com/apache/spark/pull/37725#issuecomment-1471019944 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] HyukjinKwon commented on pull request #40448: [SPARK-42817][CORE] Logging the shuffle service name once in ApplicationMaster

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #40448: URL: https://github.com/apache/spark/pull/40448#issuecomment-1471014495 AppVeyor failure (`continuous-integration/appveyor/pr`) should be fine to ignore for now.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40444: URL: https://github.com/apache/spark/pull/40444#discussion_r1137906960 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala: ## @@ -95,8 +95,8 @@ private[k8s] class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
HyukjinKwon commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1137906670 ## python/pyspark/ml/tests/connect/test_connect_function.py: ## @@ -0,0 +1,113 @@ +# Review Comment: I think you can remove this test - I believe the doctests

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
HyukjinKwon commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1137906453 ## dev/sparktestsupport/modules.py: ## @@ -655,6 +655,7 @@ def __hash__(self): "pyspark.ml.tests.test_wrapper",

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40447: [SPARK-42816] Support Max Message size up to 128MB

2023-03-15 Thread via GitHub
HyukjinKwon commented on code in PR #40447: URL: https://github.com/apache/spark/pull/40447#discussion_r1137904888 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -47,6 +47,13 @@ object Connect {

[GitHub] [spark] HyukjinKwon closed pull request #40442: [SPARK-42809][BUILD] Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-15 Thread via GitHub
HyukjinKwon closed pull request #40442: [SPARK-42809][BUILD] Upgrade scala-maven-plugin from 4.8.0 to 4.8.1 URL: https://github.com/apache/spark/pull/40442

[GitHub] [spark] HyukjinKwon commented on pull request #40442: [SPARK-42809][BUILD] Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #40442: URL: https://github.com/apache/spark/pull/40442#issuecomment-1471000337 Merged to master.

[GitHub] [spark] dtenedor commented on pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-15 Thread via GitHub
dtenedor commented on PR #40449: URL: https://github.com/apache/spark/pull/40449#issuecomment-1470971482 @gengliangwang alright I made this change, please look again when you are ready.

[GitHub] [spark] ueshin opened a new pull request, #40450: [SPARK-42818][CONNECT][PYTHON] Implement DataFrameReader/Writer.jdbc

2023-03-15 Thread via GitHub
ueshin opened a new pull request, #40450: URL: https://github.com/apache/spark/pull/40450 ### What changes were proposed in this pull request? Implements `DataFrameReader/Writer.jdbc`. ### Why are the changes needed? Missing API. ### Does this PR introduce _any_

[GitHub] [spark] gengliangwang commented on pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-15 Thread via GitHub
gengliangwang commented on PR #40449: URL: https://github.com/apache/spark/pull/40449#issuecomment-1470835260 > I will put the analyzer results in separate files. Sounds great! Thanks for the work!

[GitHub] [spark] dtenedor commented on pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-15 Thread via GitHub
dtenedor commented on PR #40449: URL: https://github.com/apache/spark/pull/40449#issuecomment-1470824639 @gengliangwang from past experience we will want to keep the query plans separate from the SQL results, otherwise the SQL results become hard to read. I will put the analyzer results in

[GitHub] [spark] dtenedor commented on pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-15 Thread via GitHub
dtenedor commented on PR #40449: URL: https://github.com/apache/spark/pull/40449#issuecomment-1470823193 @gengliangwang Sure, I was thinking about this too. We can reuse the same input SQL query files if we want, and just generate and test against different analyzer test output files. Let

[GitHub] [spark] rithwik-db closed pull request #40423: [SPARK-41775][PYTHON][FOLLOW-UP] Torch distributor multiple gpus per task

2023-03-15 Thread via GitHub
rithwik-db closed pull request #40423: [SPARK-41775][PYTHON][FOLLOW-UP] Torch distributor multiple gpus per task URL: https://github.com/apache/spark/pull/40423

[GitHub] [spark] rithwik-db commented on pull request #40423: [SPARK-41775][PYTHON][FOLLOW-UP] Torch distributor multiple gpus per task

2023-03-15 Thread via GitHub
rithwik-db commented on PR #40423: URL: https://github.com/apache/spark/pull/40423#issuecomment-1470815851 This ticket will be closed for now; related changes may be in a V2 of the TorchDistributor -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] steveloughran commented on pull request #39124: [DON'T MERGE] Test build and test with hadoop 3.3.5-RC2

2023-03-15 Thread via GitHub
steveloughran commented on PR #39124: URL: https://github.com/apache/spark/pull/39124#issuecomment-1470813750 got a new RC up to play with...hopefully RC3 will ship. main changes are fixes to some HDFS cases which can trigger NPEs -- This is an automated message from the Apache Git

[GitHub] [spark] bjornjorgensen commented on pull request #40431: [SPARK-42799][BUILD] Update SBT build `xercesImpl` version to match with `pom.xml`

2023-03-15 Thread via GitHub
bjornjorgensen commented on PR #40431: URL: https://github.com/apache/spark/pull/40431#issuecomment-1470803805 CC @panbingkun so you too are aware of this and hopefully don't make the same mistake I did. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] gengliangwang commented on pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-15 Thread via GitHub
gengliangwang commented on PR #40449: URL: https://github.com/apache/spark/pull/40449#issuecomment-1470801546 @dtenedor Since we already have `SQLQueryTestSuite` which has good basic Spark SQL features coverage, shall we combine both? E.g. let `SQLQueryTestSuite` show analyzed

[GitHub] [spark] dtenedor commented on pull request #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-15 Thread via GitHub
dtenedor commented on PR #40449: URL: https://github.com/apache/spark/pull/40449#issuecomment-1470712114 Hi @gengliangwang this should be ready for a first look! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] MaxGekk commented on pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2023-03-15 Thread via GitHub
MaxGekk commented on PR #39239: URL: https://github.com/apache/spark/pull/39239#issuecomment-1470569694 > So the problem here would be implementation detail. @HyukjinKwon I think this is not impl details but a fundamental problem of `datetime`, especially in the corner case of

[GitHub] [spark] NarekDW commented on pull request #40422: [SPARK-42803][CORE][SQL][ML] Use getParameterCount function instead of getParameterTypes.length

2023-03-15 Thread via GitHub
NarekDW commented on PR #40422: URL: https://github.com/apache/spark/pull/40422#issuecomment-1470557942 > @NarekDW Are there any more similar cases? > > cc @srowen FYI @LuciferYang no, these are all -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] MaxGekk commented on a diff in pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2023-03-15 Thread via GitHub
MaxGekk commented on code in PR #39239: URL: https://github.com/apache/spark/pull/39239#discussion_r1137571293 ## python/pyspark/sql/types.py: ## @@ -276,7 +276,18 @@ def toInternal(self, dt: datetime.datetime) -> int: def fromInternal(self, ts: int) -> datetime.datetime:

[GitHub] [spark] dtenedor opened a new pull request, #40449: [SPARK-42791][SQL] Create a new golden file test framework for analysis

2023-03-15 Thread via GitHub
dtenedor opened a new pull request, #40449: URL: https://github.com/apache/spark/pull/40449 ### What changes were proposed in this pull request? This PR creates a new `SQLAnalyzerTestSuite` that consumes input SQL queries from files and then performs analysis and generates the string

[GitHub] [spark] otterc commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-15 Thread via GitHub
otterc commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1470501033 @akpatnam25 @shuwang21 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] otterc commented on pull request #40448: Logging the shuffle service name once in ApplicationMaster

2023-03-15 Thread via GitHub
otterc commented on PR #40448: URL: https://github.com/apache/spark/pull/40448#issuecomment-1470498081 @mridulm @xkrogen @akpatnam25 @shuwang21 Please help review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] otterc opened a new pull request, #40448: Logging the shuffle service name once in ApplicationMaster

2023-03-15 Thread via GitHub
otterc opened a new pull request, #40448: URL: https://github.com/apache/spark/pull/40448 ### What changes were proposed in this pull request? Removed the logging of shuffle service name multiple times in the driver log. It gets logged every time a new executor is allocated. ###

[GitHub] [spark] LuciferYang commented on pull request #40395: [SPARK-42770][CONNECT] Add `truncatedTo(ChronoUnit.MICROS)` to make `SQLImplicitsTestSuite` in Java 17 daily test GA task pass

2023-03-15 Thread via GitHub
LuciferYang commented on PR #40395: URL: https://github.com/apache/spark/pull/40395#issuecomment-1470433831 https://github.com/apache/spark/actions/runs/4420600519 https://user-images.githubusercontent.com/1475305/225388240-9f85593f-f6d6-47dd-be07-9ab906bf53a8.png The latest

[GitHub] [spark] pan3793 commented on a diff in pull request #40447: [SPARK-42816] Support Max Message size up to 128MB

2023-03-15 Thread via GitHub
pan3793 commented on code in PR #40447: URL: https://github.com/apache/spark/pull/40447#discussion_r1137446054 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -47,6 +47,13 @@ object Connect { .bytesConf(ByteUnit.MiB)

[GitHub] [spark] jdferreira commented on pull request #40398: [MINOR][DOCS] Update `translate` docblock

2023-03-15 Thread via GitHub
jdferreira commented on PR #40398: URL: https://github.com/apache/spark/pull/40398#issuecomment-1470420003 @srowen I have enabled it, but now I don't know how to progress. Is there a "re-run" button to re-trigger the build? Or do I push an empty commit into this branch? -- This is an

[GitHub] [spark] pan3793 commented on a diff in pull request #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false

2023-03-15 Thread via GitHub
pan3793 commented on code in PR #40444: URL: https://github.com/apache/spark/pull/40444#discussion_r1137429300 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala: ## @@ -95,8 +95,8 @@ private[k8s] class

[GitHub] [spark] grundprinzip opened a new pull request, #40447: [SPARK-42816] Support Max Message size up to 128MB

2023-03-15 Thread via GitHub
grundprinzip opened a new pull request, #40447: URL: https://github.com/apache/spark/pull/40447 ### What changes were proposed in this pull request? This change lifts the default message size of 4MB to 128MB and makes it configurable. While 128MB is a "random number" it supports

[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-15 Thread via GitHub
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1470374550 Can anyone tell me how I am getting this single quote in the count expression? Attaching the picture. This can potentially cause problems down the line where tree nodes are compared in the

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1137367307 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -583,12 +583,16 @@ abstract class JdbcDialect extends Serializable with Logging {

[GitHub] [spark] dongjoon-hyun commented on pull request #40410: [SPARK-42783][SQL] Infer window group limit should run as late as possible

2023-03-15 Thread via GitHub
dongjoon-hyun commented on PR #40410: URL: https://github.com/apache/spark/pull/40410#issuecomment-1470311818 Thank you, @beliefer and @cloud-fan . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40358: [SPARK-42733][CONNECT][Followup] Write without path or table

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40358: URL: https://github.com/apache/spark/pull/40358#discussion_r1137339742 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -175,6 +176,26 @@ class ClientE2ETestSuite extends

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40444: URL: https://github.com/apache/spark/pull/40444#discussion_r1137331510 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/LoggingPodStatusWatcher.scala: ## @@ -95,8 +95,8 @@ private[k8s] class

[GitHub] [spark] ulysses-you opened a new pull request, #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut conditional expression

2023-03-15 Thread via GitHub
ulysses-you opened a new pull request, #40446: URL: https://github.com/apache/spark/pull/40446 ### What changes were proposed in this pull request? Add a new config to shortcut subexpression elimination for conditional expression. The subexpression in conditional

[GitHub] [spark] LuciferYang opened a new pull request, #40445: [SPARK-42814][BUILD] Upgrade some maven plugins

2023-03-15 Thread via GitHub
LuciferYang opened a new pull request, #40445: URL: https://github.com/apache/spark/pull/40445 ### What changes were proposed in this pull request? This PR aims to upgrade the following Maven plugins: - maven-enforcer-plugin 3.0.0-M2 -> 3.2.1 - build-helper-maven-plugin 3.2.0 ->

[GitHub] [spark] pan3793 commented on pull request #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false

2023-03-15 Thread via GitHub
pan3793 commented on PR #40444: URL: https://github.com/apache/spark/pull/40444#issuecomment-1470098995 cc @slothspot @dongjoon-hyun @yaooqinn, please take a look when you get time, thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log
