[GitHub] [spark] wangyum commented on a diff in pull request #39512: [SPARK-41986][SQL] Introduce shuffle on SinglePartition

2023-01-11 Thread GitBox
wangyum commented on code in PR #39512: URL: https://github.com/apache/spark/pull/39512#discussion_r1067797166 ## sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala: ## @@ -541,7 +541,8 @@ class CachedTableSuite extends QueryTest with SQLTestUtils

[GitHub] [spark] viirya commented on pull request #39508: [SPARK-41985][SQL] Centralize more column resolution rules

2023-01-11 Thread GitBox
viirya commented on PR #39508: URL: https://github.com/apache/spark/pull/39508#issuecomment-1379904716 There are some conflicts. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] mridulm commented on pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2023-01-11 Thread GitBox
mridulm commented on PR #37638: URL: https://github.com/apache/spark/pull/37638#issuecomment-1379889561 Merged to master. Thanks for working on this @rmcyang ! Thanks for the reviews @zhouyejoe, @otterc :-)

[GitHub] [spark] asfgit closed pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2023-01-11 Thread GitBox
asfgit closed pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle URL: https://github.com/apache/spark/pull/37638

[GitHub] [spark] techaddict commented on pull request #39451: [SPARK-41832][CONNECT][PYTHON] Fix `DataFrame.unionByName`, add allow_missing_columns

2023-01-11 Thread GitBox
techaddict commented on PR #39451: URL: https://github.com/apache/spark/pull/39451#issuecomment-1379881170 Cc @amaliujia @HyukjinKwon @zhengruifeng can you review this PR? I think it's a straightforward change.

[GitHub] [spark] HyukjinKwon closed pull request #39522: [SPARK-41998][CONNECT][TESTS] Reuse pyspark.sql.tests.test_readwriter test cases

2023-01-11 Thread GitBox
HyukjinKwon closed pull request #39522: [SPARK-41998][CONNECT][TESTS] Reuse pyspark.sql.tests.test_readwriter test cases URL: https://github.com/apache/spark/pull/39522

[GitHub] [spark] HyukjinKwon closed pull request #39521: [SPARK-41887][CONNECT][TESTS][FOLLOW-UP] Enable test_extended_hint_types test case

2023-01-11 Thread GitBox
HyukjinKwon closed pull request #39521: [SPARK-41887][CONNECT][TESTS][FOLLOW-UP] Enable test_extended_hint_types test case URL: https://github.com/apache/spark/pull/39521

[GitHub] [spark] HyukjinKwon commented on pull request #39522: [SPARK-41998][CONNECT][TESTS] Reuse pyspark.sql.tests.test_readwriter test cases

2023-01-11 Thread GitBox
HyukjinKwon commented on PR #39522: URL: https://github.com/apache/spark/pull/39522#issuecomment-1379874535 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #39521: [SPARK-41887][CONNECT][TESTS][FOLLOW-UP] Enable test_extended_hint_types test case

2023-01-11 Thread GitBox
HyukjinKwon commented on PR #39521: URL: https://github.com/apache/spark/pull/39521#issuecomment-1379874451 Merged to master.

[GitHub] [spark] LuciferYang commented on pull request #39435: [SPARK-41926][UI][TESTS] Add Github action test job with RocksDB as UI backend

2023-01-11 Thread GitBox
LuciferYang commented on PR #39435: URL: https://github.com/apache/spark/pull/39435#issuecomment-1379870165 Manually checked the core module with this pr as follows: ``` gh pr checkout 39435 export LIVE_UI_LOCAL_STORE_DIR=/Users/yangjie01/SourceCode/spark-ui build/mvn clean

[GitHub] [spark] kuwii commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-11 Thread GitBox
kuwii commented on PR #39190: URL: https://github.com/apache/spark/pull/39190#issuecomment-1379868894 > Hi. this impacts Jobs API so this is a user facing change right? @VindhyaG Thanks for the comment. I've updated the PR description.

[GitHub] [spark] LuciferYang commented on pull request #39487: [SPARK-41968][CORE][SQL] Refactor `ProtobufSerDe` to `ProtobufSerDe[T]`

2023-01-11 Thread GitBox
LuciferYang commented on PR #39487: URL: https://github.com/apache/spark/pull/39487#issuecomment-1379868522 Thanks @gengliangwang

[GitHub] [spark] gengliangwang closed pull request #39487: [SPARK-41968][CORE][SQL] Refactor `ProtobufSerDe` to `ProtobufSerDe[T]`

2023-01-11 Thread GitBox
gengliangwang closed pull request #39487: [SPARK-41968][CORE][SQL] Refactor `ProtobufSerDe` to `ProtobufSerDe[T]` URL: https://github.com/apache/spark/pull/39487

[GitHub] [spark] gengliangwang commented on pull request #39487: [SPARK-41968][CORE][SQL] Refactor `ProtobufSerDe` to `ProtobufSerDe[T]`

2023-01-11 Thread GitBox
gengliangwang commented on PR #39487: URL: https://github.com/apache/spark/pull/39487#issuecomment-1379867822 @LuciferYang Thanks for the work. Merging to master

[GitHub] [spark] mridulm commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-11 Thread GitBox
mridulm commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1067741652 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -321,6 +321,12 @@ class BlockManagerMasterEndpoint( } private def

[GitHub] [spark] gengliangwang commented on a diff in pull request #39487: [SPARK-41968][CORE][SQL] Refactor `ProtobufSerDe` to `ProtobufSerDe[T]`

2023-01-11 Thread GitBox
gengliangwang commented on code in PR #39487: URL: https://github.com/apache/spark/pull/39487#discussion_r1067741516 ## core/src/main/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializer.scala: ## @@ -40,10 +41,16 @@ private[spark] class KVStoreProtobufSerializer

[GitHub] [spark] HyukjinKwon opened a new pull request, #39529: [SPARK-42019][CONNECT][TESTS] Reuse pyspark.sql.tests.test_types test cases

2023-01-11 Thread GitBox
HyukjinKwon opened a new pull request, #39529: URL: https://github.com/apache/spark/pull/39529 ### What changes were proposed in this pull request? This PR reuses PySpark `pyspark.sql.tests.test_types` tests in Spark Connect that pass for now. ### Why are the changes needed?

[GitHub] [spark] mridulm commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-11 Thread GitBox
mridulm commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1067733179 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -321,6 +321,12 @@ class BlockManagerMasterEndpoint( } private def

[GitHub] [spark] HyukjinKwon opened a new pull request, #39528: [SPARK-42010][CONNECT][TESTS] Reuse pyspark.sql.tests.test_column test cases

2023-01-11 Thread GitBox
HyukjinKwon opened a new pull request, #39528: URL: https://github.com/apache/spark/pull/39528 ### What changes were proposed in this pull request? This PR reuses PySpark `pyspark.sql.tests.test_column` tests in Spark Connect that pass for now. ### Why are the changes needed?

[GitHub] [spark] HyukjinKwon opened a new pull request, #39527: [SPARK-42009][CONNECT][TESTS] Reuse pyspark.sql.tests.test_serde test cases

2023-01-11 Thread GitBox
HyukjinKwon opened a new pull request, #39527: URL: https://github.com/apache/spark/pull/39527 ### What changes were proposed in this pull request? This PR reuses PySpark `pyspark.sql.tests.test_serde` tests in Spark Connect that pass for now. ### Why are the changes needed?

[GitHub] [spark] HyukjinKwon opened a new pull request, #39526: [SPARK-42008][CONNECT][TESTS] Reuse pyspark.sql.tests.test_datasources test cases

2023-01-11 Thread GitBox
HyukjinKwon opened a new pull request, #39526: URL: https://github.com/apache/spark/pull/39526 ### What changes were proposed in this pull request? This PR reuses PySpark `pyspark.sql.tests.test_datasources` tests in Spark Connect that pass for now. ### Why are the changes

[GitHub] [spark] HyukjinKwon opened a new pull request, #39525: [SPARK-42007][CONNECT][TESTS] Reuse pyspark.sql.tests.test_group test cases

2023-01-11 Thread GitBox
HyukjinKwon opened a new pull request, #39525: URL: https://github.com/apache/spark/pull/39525 ### What changes were proposed in this pull request? This PR reuses PySpark `pyspark.sql.tests.test_group` tests in Spark Connect that pass for now. ### Why are the changes needed?

[GitHub] [spark] wankunde commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-11 Thread GitBox
wankunde commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1067694549 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -321,6 +321,12 @@ class BlockManagerMasterEndpoint( } private def

[GitHub] [spark] cloud-fan commented on a diff in pull request #39037: [SPARK-41214][SQL] Fix AQE cache does not update plan and metrics

2023-01-11 Thread GitBox
cloud-fan commented on code in PR #39037: URL: https://github.com/apache/spark/pull/39037#discussion_r1067691706 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -2693,6 +2694,21 @@ class AdaptiveQueryExecSuite

[GitHub] [spark] HeartSaVioR closed pull request #39520: [SPARK-41996][SQL][SS] Fix kafka test to verify lost partitions to account for slow Kafka operations

2023-01-11 Thread GitBox
HeartSaVioR closed pull request #39520: [SPARK-41996][SQL][SS] Fix kafka test to verify lost partitions to account for slow Kafka operations URL: https://github.com/apache/spark/pull/39520

[GitHub] [spark] HeartSaVioR commented on pull request #39520: [SPARK-41996][SQL][SS] Fix kafka test to verify lost partitions to account for slow Kafka operations

2023-01-11 Thread GitBox
HeartSaVioR commented on PR #39520: URL: https://github.com/apache/spark/pull/39520#issuecomment-1379775649 Thanks! Merging to master.

[GitHub] [spark] itholic commented on pull request #39496: [SPARK-41974][SQL] Turn `INCORRECT_END_OFFSET` into `INTERNAL_ERROR`

2023-01-11 Thread GitBox
itholic commented on PR #39496: URL: https://github.com/apache/spark/pull/39496#issuecomment-1379771335 @srielau Yeah, that makes sense. Just created [OSS ticket](https://issues.apache.org/jira/browse/SPARK-42004) to handle this. Thanks!!

[GitHub] [spark] cloud-fan commented on a diff in pull request #39517: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-11 Thread GitBox
cloud-fan commented on code in PR #39517: URL: https://github.com/apache/spark/pull/39517#discussion_r1067668533 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/AgnosticEncoder.scala: ## @@ -46,35 +46,42 @@ object AgnosticEncoders { override val

[GitHub] [spark] cloud-fan commented on a diff in pull request #39523: [SPARK-42003][SQL] Reduce duplicate code in ResolveGroupByAll

2023-01-11 Thread GitBox
cloud-fan commented on code in PR #39523: URL: https://github.com/apache/spark/pull/39523#discussion_r1067665831 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByAll.scala: ## @@ -47,25 +47,40 @@ object ResolveGroupByAll extends

[GitHub] [spark] gengliangwang commented on a diff in pull request #39509: [SPARK-41635][SQL] Fix group by all error reporting

2023-01-11 Thread GitBox
gengliangwang commented on code in PR #39509: URL: https://github.com/apache/spark/pull/39509#discussion_r1067663382 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByAll.scala: ## @@ -93,8 +93,9 @@ object ResolveGroupByAll extends

[GitHub] [spark] panbingkun opened a new pull request, #39524: [WIP][SPARK-41990][SQL] Fix bug for FieldReference

2023-01-11 Thread GitBox
panbingkun opened a new pull request, #39524: URL: https://github.com/apache/spark/pull/39524 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was this patch

[GitHub] [spark] gengliangwang opened a new pull request, #39523: [SPARK-42003][SQL] Reduce duplicate code in ResolveGroupByAll

2023-01-11 Thread GitBox
gengliangwang opened a new pull request, #39523: URL: https://github.com/apache/spark/pull/39523 ### What changes were proposed in this pull request? Reduce duplicate code in ResolveGroupByAll by moving the group by expression inference into a new method. ### Why are

[GitHub] [spark] rangareddy commented on pull request #39515: [SPARK-38743][SQL][TEST] Test the error class: MISSING_STATIC_PARTITION_COLUMNAdding test case for Missing Static Partition Column

2023-01-11 Thread GitBox
rangareddy commented on PR #39515: URL: https://github.com/apache/spark/pull/39515#issuecomment-1379755267 Hi @maxgekk Could you please review this PR?

[GitHub] [spark] cloud-fan commented on a diff in pull request #39517: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-11 Thread GitBox
cloud-fan commented on code in PR #39517: URL: https://github.com/apache/spark/pull/39517#discussion_r1067657414 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala: ## @@ -377,27 +408,96 @@ object ScalaReflection extends ScalaReflection {

[GitHub] [spark] cloud-fan commented on a diff in pull request #39517: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-11 Thread GitBox
cloud-fan commented on code in PR #39517: URL: https://github.com/apache/spark/pull/39517#discussion_r1067657104 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala: ## @@ -155,11 +169,19 @@ object ScalaReflection extends ScalaReflection {

[GitHub] [spark] cloud-fan commented on a diff in pull request #39517: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-11 Thread GitBox
cloud-fan commented on code in PR #39517: URL: https://github.com/apache/spark/pull/39517#discussion_r1067656569 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala: ## @@ -125,7 +125,7 @@ class RowEncoderSuite extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #39512: [SPARK-41986][SQL] Introduce shuffle on SinglePartition

2023-01-11 Thread GitBox
cloud-fan commented on code in PR #39512: URL: https://github.com/apache/spark/pull/39512#discussion_r1067653257 ## sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala: ## @@ -76,13 +76,17 @@ case class EnsureRequirements( case _ =>

[GitHub] [spark] cloud-fan commented on a diff in pull request #39512: [SPARK-41986][SQL] Introduce shuffle on SinglePartition

2023-01-11 Thread GitBox
cloud-fan commented on code in PR #39512: URL: https://github.com/apache/spark/pull/39512#discussion_r1067652958 ## sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala: ## @@ -541,7 +541,8 @@ class CachedTableSuite extends QueryTest with SQLTestUtils

[GitHub] [spark] LuciferYang commented on a diff in pull request #39487: [SPARK-41968][CORE][SQL] Refactor `ProtobufSerDe` to `ProtobufSerDe[T]`

2023-01-11 Thread GitBox
LuciferYang commented on code in PR #39487: URL: https://github.com/apache/spark/pull/39487#discussion_r1067649883 ## core/src/main/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializer.scala: ## @@ -40,10 +41,16 @@ private[spark] class KVStoreProtobufSerializer

[GitHub] [spark] HyukjinKwon opened a new pull request, #39522: [SPARK-41998][CONNECT][TESTS] Reuse pyspark.sql.tests.test_readwriter test cases

2023-01-11 Thread GitBox
HyukjinKwon opened a new pull request, #39522: URL: https://github.com/apache/spark/pull/39522 ### What changes were proposed in this pull request? This PR reuses PySpark `pyspark.sql.tests.test_readwriter` tests in Spark Connect that pass for now. ### Why are the changes

[GitHub] [spark] LuciferYang commented on a diff in pull request #39487: [SPARK-41968][CORE][SQL] Refactor `ProtobufSerDe` to `ProtobufSerDe[T]`

2023-01-11 Thread GitBox
LuciferYang commented on code in PR #39487: URL: https://github.com/apache/spark/pull/39487#discussion_r1067650529 ## core/src/main/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializer.scala: ## @@ -40,10 +41,16 @@ private[spark] class KVStoreProtobufSerializer

[GitHub] [spark] LuciferYang commented on a diff in pull request #39487: [SPARK-41968][CORE][SQL] Refactor `ProtobufSerDe` to `ProtobufSerDe[T]`

2023-01-11 Thread GitBox
LuciferYang commented on code in PR #39487: URL: https://github.com/apache/spark/pull/39487#discussion_r1067647694 ## core/src/main/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializer.scala: ## @@ -40,10 +41,16 @@ private[spark] class KVStoreProtobufSerializer

[GitHub] [spark] itholic commented on pull request #39505: [WIP][SPARK-41979][SQL] Add missing dots for error messages in error classes.

2023-01-11 Thread GitBox
itholic commented on PR #39505: URL: https://github.com/apache/spark/pull/39505#issuecomment-1379739624 Let me fix the related tests as well while we're here. Will update the PR description.

[GitHub] [spark] rmcyang commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2023-01-11 Thread GitBox
rmcyang commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1067643931 ## docs/monitoring.md: ## @@ -1421,6 +1421,21 @@ Note: applies to the shuffle service - shuffle-server.usedDirectMemory - shuffle-server.usedHeapMemory +Note:

[GitHub] [spark] mridulm commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2023-01-11 Thread GitBox
mridulm commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1067641509 ## docs/monitoring.md: ## @@ -1421,6 +1421,21 @@ Note: applies to the shuffle service - shuffle-server.usedDirectMemory - shuffle-server.usedHeapMemory +Note:

[GitHub] [spark] cloud-fan commented on a diff in pull request #39479: [SPARK-41961][SQL] Support table-valued functions with LATERAL

2023-01-11 Thread GitBox
cloud-fan commented on code in PR #39479: URL: https://github.com/apache/spark/pull/39479#discussion_r1067639729 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeOneRowRelationSubquerySuite.scala: ## @@ -177,4 +177,27 @@ class

[GitHub] [spark] HyukjinKwon opened a new pull request, #39521: [SPARK-41887][CONNECT][TESTS][FOLLOW-UP] Enable test_extended_hint_types test case

2023-01-11 Thread GitBox
HyukjinKwon opened a new pull request, #39521: URL: https://github.com/apache/spark/pull/39521 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/39491 that enables `test_extended_hint_types` test back by avoiding `_jdf`

[GitHub] [spark] cloud-fan commented on a diff in pull request #39479: [SPARK-41961][SQL] Support table-valued functions with LATERAL

2023-01-11 Thread GitBox
cloud-fan commented on code in PR #39479: URL: https://github.com/apache/spark/pull/39479#discussion_r1067638085 ## sql/core/src/test/resources/sql-tests/inputs/join-lateral.sql: ## @@ -177,6 +177,25 @@ SELECT * FROM t3 JOIN LATERAL (SELECT EXPLODE_OUTER(c2)); SELECT * FROM t3

[GitHub] [spark] mridulm commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-11 Thread GitBox
mridulm commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1067637314 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java: ## @@ -256,6 +256,22 @@ public void onFailure(Throwable e) {

[GitHub] [spark] cloud-fan commented on a diff in pull request #39479: [SPARK-41961][SQL] Support table-valued functions with LATERAL

2023-01-11 Thread GitBox
cloud-fan commented on code in PR #39479: URL: https://github.com/apache/spark/pull/39479#discussion_r1067637188 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeOneRowRelationSubquerySuite.scala: ## @@ -177,4 +177,27 @@ class

[GitHub] [spark] anishshri-db commented on pull request #39520: [SPARK-41996][SS] Fix kafka test to verify lost partitions to account for slow Kafka operations

2023-01-11 Thread GitBox
anishshri-db commented on PR #39520: URL: https://github.com/apache/spark/pull/39520#issuecomment-1379722527 @HeartSaVioR - please take a look. Thx

[GitHub] [spark] cloud-fan closed pull request #39509: [SPARK-41635][SQL] Fix group by all error reporting

2023-01-11 Thread GitBox
cloud-fan closed pull request #39509: [SPARK-41635][SQL] Fix group by all error reporting URL: https://github.com/apache/spark/pull/39509

[GitHub] [spark] cloud-fan commented on pull request #39509: [SPARK-41635][SQL] Fix group by all error reporting

2023-01-11 Thread GitBox
cloud-fan commented on PR #39509: URL: https://github.com/apache/spark/pull/39509#issuecomment-1379720548 thanks for review, merging to master!

[GitHub] [spark] cloud-fan commented on a diff in pull request #39509: [SPARK-41635][SQL] Fix group by all error reporting

2023-01-11 Thread GitBox
cloud-fan commented on code in PR #39509: URL: https://github.com/apache/spark/pull/39509#discussion_r1067633765 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByAll.scala: ## @@ -93,8 +93,9 @@ object ResolveGroupByAll extends

[GitHub] [spark] anishshri-db opened a new pull request, #39520: [SPARK-41996] Fix kafka test to verify lost partitions to account for slow Kafka operations

2023-01-11 Thread GitBox
anishshri-db opened a new pull request, #39520: URL: https://github.com/apache/spark/pull/39520 ### What changes were proposed in this pull request? Fix Kafka test to verify lost partitions to account for slow Kafka operations. Basically, it's possible that Kafka operations around

[GitHub] [spark] gengliangwang commented on pull request #39435: [SPARK-41926][UI][TESTS] Add Github action test job with RocksDB as UI backend

2023-01-11 Thread GitBox
gengliangwang commented on PR #39435: URL: https://github.com/apache/spark/pull/39435#issuecomment-1379718544 cc @LuciferYang @panbingkun @techaddict as well. I tried with hard coding a rocksdb backend path before commit

[GitHub] [spark] HyukjinKwon closed pull request #39500: [SPARK-41980][CONNECT][TESTS] Enable test_functions_broadcast in functions parity test

2023-01-11 Thread GitBox
HyukjinKwon closed pull request #39500: [SPARK-41980][CONNECT][TESTS] Enable test_functions_broadcast in functions parity test URL: https://github.com/apache/spark/pull/39500

[GitHub] [spark] HyukjinKwon commented on pull request #39500: [SPARK-41980][CONNECT][TESTS] Enable test_functions_broadcast in functions parity test

2023-01-11 Thread GitBox
HyukjinKwon commented on PR #39500: URL: https://github.com/apache/spark/pull/39500#issuecomment-1379696692 All related tests passed. Merged to master.

[GitHub] [spark] eric-maynard commented on pull request #39519: [SPARK-41995][SQL] Accept non-foldable expressions in schema_of_json

2023-01-11 Thread GitBox
eric-maynard commented on PR #39519: URL: https://github.com/apache/spark/pull/39519#issuecomment-1379685811 I see, where should we change to add to the other APIs? It looks like SchemaOfJson is used under the hood by the Scala API.

[GitHub] [spark] erenavsarogullari commented on pull request #39037: [SPARK-41214][SQL] Fix AQE cache does not update plan and metrics

2023-01-11 Thread GitBox
erenavsarogullari commented on PR #39037: URL: https://github.com/apache/spark/pull/39037#issuecomment-1379684356 Thanks @ulysses-you for this fix.

[GitHub] [spark] wangyum commented on a diff in pull request #39512: [SPARK-41986][SQL] Introduce shuffle on SinglePartition

2023-01-11 Thread GitBox
wangyum commented on code in PR #39512: URL: https://github.com/apache/spark/pull/39512#discussion_r1067605737 ## sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala: ## @@ -76,13 +76,17 @@ case class EnsureRequirements( case _ =>

[GitHub] [spark] HyukjinKwon commented on pull request #39499: [SPARK-41977][SPARK-41978][CONNECT] SparkSession.range to take float as arguments

2023-01-11 Thread GitBox
HyukjinKwon commented on PR #39499: URL: https://github.com/apache/spark/pull/39499#issuecomment-1379677495 Yeah .. so technically it should only take `int`s as that's what the method wants. There are a lot of cases like that in PySpark (e.g., `DataFrameReader.jdbc`), and a lot of

[GitHub] [spark] zhengruifeng commented on pull request #39499: [SPARK-41977][SPARK-41978][CONNECT] SparkSession.range to take float as arguments

2023-01-11 Thread GitBox
zhengruifeng commented on PR #39499: URL: https://github.com/apache/spark/pull/39499#issuecomment-1379668005 @dongjoon-hyun good question. `range` in Connect has the same signature as PySpark's, which should only accept integers. But PySpark's implementation doesn't

[GitHub] [spark] zhouyejoe commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2023-01-11 Thread GitBox
zhouyejoe commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1067596021 ## docs/monitoring.md: ## @@ -1421,6 +1421,21 @@ Note: applies to the shuffle service - shuffle-server.usedDirectMemory - shuffle-server.usedHeapMemory +Note:

[GitHub] [spark] HyukjinKwon commented on pull request #39518: [SPARK-41991][SQL] `CheckOverflowInTableInsert` should accept ExpressionProxy as child

2023-01-11 Thread GitBox
HyukjinKwon commented on PR #39518: URL: https://github.com/apache/spark/pull/39518#issuecomment-1379655842 cc @rednaxelafx @cloud-fan FYI

[GitHub] [spark] HyukjinKwon commented on pull request #39516: [SPARK-41989][PYTHON] Avoid breaking logging config from pyspark.pandas

2023-01-11 Thread GitBox
HyukjinKwon commented on PR #39516: URL: https://github.com/apache/spark/pull/39516#issuecomment-1379647638 Thanks for the fix @soxofaan. I think it was a mistake, cc @itholic

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39516: [SPARK-41989][PYTHON] Avoid breaking logging config from pyspark.pandas

2023-01-11 Thread GitBox
HyukjinKwon commented on code in PR #39516: URL: https://github.com/apache/spark/pull/39516#discussion_r1067585079 ## python/pyspark/pandas/__init__.py: ## @@ -49,7 +49,8 @@ ): import logging -logging.warning( +logger = logging.getLogger(__name__) +
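
For context on the diff quoted above (which is truncated in this archive): the change swaps a module-level `logging.warning(` call for a named logger obtained with `logging.getLogger(__name__)`. A likely reason, going by the PR title, is that the standard library's `logging.warning()` implicitly calls `logging.basicConfig()` when the root logger has no handlers, which can pre-empt an application's own logging setup. The sketch below only illustrates that standard-library behavior; the helper name and the logger name `example.module` are hypothetical and are not taken from the PR.

```python
import logging


def count_root_handlers_after_warnings():
    """Return root-handler counts: (initial, after a named-logger warning,
    after a module-level logging.warning call)."""
    root = logging.getLogger()
    saved = list(root.handlers)
    # Start from an unconfigured root logger so the effect is visible.
    for handler in saved:
        root.removeHandler(handler)
    try:
        before = len(root.handlers)
        # A named logger (hypothetical name) emits via logging's last-resort
        # handler and leaves the root logger's handler list untouched.
        logging.getLogger("example.module").warning("warning via named logger")
        after_named = len(root.handlers)
        # The module-level function calls basicConfig() when the root logger
        # has no handlers, which installs a StreamHandler on it.
        logging.warning("warning via module-level function")
        after_root = len(root.handlers)
        return before, after_named, after_root
    finally:
        # Restore whatever handlers were installed before the demonstration.
        for handler in list(root.handlers):
            root.removeHandler(handler)
        for handler in saved:
            root.addHandler(handler)


print(count_root_handlers_after_warnings())
```

Once the root logger has a handler, a later `logging.basicConfig(...)` call in user code is a no-op unless `force=True` is passed, which is why emitting warnings through a named logger at import time is the less intrusive choice.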

[GitHub] [spark] HyukjinKwon commented on pull request #39519: [SPARK-41995][SQL] Accept non-foldable expressions in schema_of_json

2023-01-11 Thread GitBox
HyukjinKwon commented on PR #39519: URL: https://github.com/apache/spark/pull/39519#issuecomment-1379641861 Would be great if we can add it to Scala, Python, and R API. I don't mind doing that in a separate PR. -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39519: [SPARK-41995][SQL] Accept non-foldable expressions in schema_of_json

2023-01-11 Thread GitBox
HyukjinKwon commented on code in PR #39519: URL: https://github.com/apache/spark/pull/39519#discussion_r1067581392 ## sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala: ## @@ -583,6 +583,17 @@ class JsonFunctionsSuite extends QueryTest with

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39519: [SPARK-41995] Accept non-foldable expressions in schema_of_json

2023-01-11 Thread GitBox
HyukjinKwon commented on code in PR #39519: URL: https://github.com/apache/spark/pull/39519#discussion_r1067580794 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala: ## @@ -793,27 +796,17 @@ case class SchemaOfJson( @transient

[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-11 Thread GitBox
rithwik-db commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1067576354 ## python/pyspark/ml/torch/distributor.py: ## @@ -407,13 +418,6 @@ def _run_local_training( try: if self.use_gpu: gpus_owned

[GitHub] [spark] HyukjinKwon closed pull request #39188: [SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-11 Thread GitBox
HyukjinKwon closed pull request #39188: [SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU URL: https://github.com/apache/spark/pull/39188 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HyukjinKwon commented on pull request #39188: [SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-11 Thread GitBox
HyukjinKwon commented on PR #39188: URL: https://github.com/apache/spark/pull/39188#issuecomment-1379623516 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] gengliangwang commented on a diff in pull request #39487: [SPARK-41968][CORE][SQL] Refactor `ProtobufSerDe` to `ProtobufSerDe[T]`

2023-01-11 Thread GitBox
gengliangwang commented on code in PR #39487: URL: https://github.com/apache/spark/pull/39487#discussion_r1067558519 ## core/src/main/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializer.scala: ## @@ -40,10 +41,16 @@ private[spark] class KVStoreProtobufSerializer

[GitHub] [spark] gengliangwang commented on a diff in pull request #39487: [SPARK-41968][CORE][SQL] Refactor `ProtobufSerDe` to `ProtobufSerDe[T]`

2023-01-11 Thread GitBox
gengliangwang commented on code in PR #39487: URL: https://github.com/apache/spark/pull/39487#discussion_r1067558195 ## core/src/main/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializer.scala: ## @@ -40,10 +41,16 @@ private[spark] class KVStoreProtobufSerializer

[GitHub] [spark] eric-maynard opened a new pull request, #39519: [SPARK-41995] Accept non-foldable expressions in schema_of_json

2023-01-11 Thread GitBox
eric-maynard opened a new pull request, #39519: URL: https://github.com/apache/spark/pull/39519 ### What changes were proposed in this pull request? Presently, only foldable expressions can be passed in to schema_of_json, e.g. `SCHEMA_OF_JSON(CONCAT('', ''))`. With this change, we

[GitHub] [spark] mridulm commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-11 Thread GitBox
mridulm commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1067554329 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/NoOpMergedShuffleFileManager.java: ## @@ -84,4 +85,9 @@ public MergedBlockMeta

[GitHub] [spark] mridulm commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-11 Thread GitBox
mridulm commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1067551485 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java: ## @@ -256,6 +256,22 @@ public void onFailure(Throwable e) {

[GitHub] [spark] mridulm commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-11 Thread GitBox
mridulm commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1067548067 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -321,6 +321,12 @@ class BlockManagerMasterEndpoint( } private def

[GitHub] [spark] allisonwang-db commented on pull request #39479: [SPARK-41961][SQL] Support table-valued functions with LATERAL

2023-01-11 Thread GitBox
allisonwang-db commented on PR #39479: URL: https://github.com/apache/spark/pull/39479#issuecomment-1379590363 cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] rmcyang commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2023-01-11 Thread GitBox
rmcyang commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1067536500 ## docs/monitoring.md: ## @@ -1421,6 +1421,21 @@ Note: applies to the shuffle service - shuffle-server.usedDirectMemory - shuffle-server.usedHeapMemory +Note:

[GitHub] [spark] rmcyang commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2023-01-11 Thread GitBox
rmcyang commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1067534717 ## docs/monitoring.md: ## @@ -1421,6 +1421,21 @@ Note: applies to the shuffle service - shuffle-server.usedDirectMemory - shuffle-server.usedHeapMemory +Note:

[GitHub] [spark] srowen closed pull request #39511: [SPARK-41047][SQL] Improve docs for round

2023-01-11 Thread GitBox
srowen closed pull request #39511: [SPARK-41047][SQL] Improve docs for round URL: https://github.com/apache/spark/pull/39511 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] panbingkun commented on pull request #39511: [SPARK-41047][SQL] Improve docs for round

2023-01-11 Thread GitBox
panbingkun commented on PR #39511: URL: https://github.com/apache/spark/pull/39511#issuecomment-1379568848 Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] mridulm commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2023-01-11 Thread GitBox
mridulm commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1067505986 ## docs/monitoring.md: ## @@ -1421,6 +1421,21 @@ Note: applies to the shuffle service - shuffle-server.usedDirectMemory - shuffle-server.usedHeapMemory +Note:

[GitHub] [spark] akpatnam25 commented on a diff in pull request #38959: SPARK-41415: SASL Request Retries

2023-01-11 Thread GitBox
akpatnam25 commented on code in PR #38959: URL: https://github.com/apache/spark/pull/38959#discussion_r1067498072 ## core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala: ## @@ -251,6 +251,10 @@ private[spark] class ShuffleBlockPusher(conf: SparkConf) extends

[GitHub] [spark] bersprockets opened a new pull request, #39518: [SPARK-41991][SQL] `CheckOverflowInTableInsert` should accept ExpressionProxy as child

2023-01-11 Thread GitBox
bersprockets opened a new pull request, #39518: URL: https://github.com/apache/spark/pull/39518 ### What changes were proposed in this pull request? Change `CheckOverflowInTableInsert` to accept a `Cast` wrapped by an `ExpressionProxy` as a child. ### Why are the changes

[GitHub] [spark] hvanhovell commented on a diff in pull request #39517: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-11 Thread GitBox
hvanhovell commented on code in PR #39517: URL: https://github.com/apache/spark/pull/39517#discussion_r1067435036 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala: ## @@ -377,27 +408,96 @@ object ScalaReflection extends ScalaReflection {

[GitHub] [spark] hvanhovell commented on pull request #39517: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-11 Thread GitBox
hvanhovell commented on PR #39517: URL: https://github.com/apache/spark/pull/39517#issuecomment-1379453290 A note for the reviewers. I know that Catalyst tests pass. I have not run other tests, so there might still be a few things to iron out. -- This is an automated message from the

[GitHub] [spark] hvanhovell commented on a diff in pull request #39517: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-11 Thread GitBox
hvanhovell commented on code in PR #39517: URL: https://github.com/apache/spark/pull/39517#discussion_r1067433716 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala: ## @@ -306,7 +330,7 @@ object ScalaReflection extends ScalaReflection { *
