[GitHub] [spark] techaddict commented on pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread GitBox
techaddict commented on PR #39104: URL: https://github.com/apache/spark/pull/39104#issuecomment-1358568558 @gengliangwang addressed comments -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] amaliujia commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox
amaliujia commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1052727913 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -534,6 +536,36 @@ class SparkConnectPlanner(session:

[GitHub] [spark] gengliangwang commented on a diff in pull request #39040: [SPARK-27561][SQL][FOLLOWUP] Support implicit lateral column alias resolution on Aggregate

2022-12-19 Thread GitBox
gengliangwang commented on code in PR #39040: URL: https://github.com/apache/spark/pull/39040#discussion_r1052735150 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -244,7 +303,67 @@ object

[GitHub] [spark] anchovYu commented on a diff in pull request #39040: [SPARK-27561][SQL][FOLLOWUP] Support implicit lateral column alias resolution on Aggregate

2022-12-19 Thread GitBox
anchovYu commented on code in PR #39040: URL: https://github.com/apache/spark/pull/39040#discussion_r1052738997 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -244,7 +303,67 @@ object

[GitHub] [spark] srielau commented on a diff in pull request #38861: [SPARK-41294][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1203 / 1168

2022-12-19 Thread GitBox
srielau commented on code in PR #38861: URL: https://github.com/apache/spark/pull/38861#discussion_r1052745779 ## sql/core/src/test/resources/sql-tests/results/postgreSQL/numeric.sql.out: ## @@ -3831,12 +3831,12 @@ struct<> -- !query output

[GitHub] [spark] HyukjinKwon commented on pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox
HyukjinKwon commented on PR #39041: URL: https://github.com/apache/spark/pull/39041#issuecomment-1358666500 Let me get this in in a few days if there are no more comments.

[GitHub] [spark] HyukjinKwon commented on pull request #39129: [SPARK-41587][BUILD] Upgrade `org.scalatestplus:selenium-4-4` to `org.scalatestplus:selenium-4-7`

2022-12-19 Thread GitBox
HyukjinKwon commented on PR #39129: URL: https://github.com/apache/spark/pull/39129#issuecomment-1358669841 cc @sarutak FYI

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox
HyukjinKwon commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052751097 ## python/pyspark/errors/__init__.py: ## @@ -0,0 +1,140 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox
HyukjinKwon commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052751697 ## python/pyspark/errors/error_classes.py: ## @@ -0,0 +1,30 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox
HyukjinKwon commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052751875 ## python/pyspark/testing/utils.py: ## @@ -138,6 +140,32 @@ def setUpClass(cls): def tearDownClass(cls): cls.sc.stop() +def checkError( Review

[GitHub] [spark] HyukjinKwon closed pull request #39117: [SPARK-41535][SQL] Set null correctly for calendar interval fields in `InterpretedUnsafeProjection` and `InterpretedMutableProjection`

2022-12-19 Thread GitBox
HyukjinKwon closed pull request #39117: [SPARK-41535][SQL] Set null correctly for calendar interval fields in `InterpretedUnsafeProjection` and `InterpretedMutableProjection` URL: https://github.com/apache/spark/pull/39117

[GitHub] [spark] WweiL opened a new pull request, #39132: [MINOR][DOC] Fix for Kafka Consumer Config Link

2022-12-19 Thread GitBox
WweiL opened a new pull request, #39132: URL: https://github.com/apache/spark/pull/39132 ### What changes were proposed in this pull request? Fix the redirect link for the Kafka consumer config. Before, it pointed you to the top of the page; now it redirects you to the correct

[GitHub] [spark] HeartSaVioR closed pull request #39132: [MINOR][DOC] Fix for Kafka Consumer Config Link

2022-12-19 Thread GitBox
HeartSaVioR closed pull request #39132: [MINOR][DOC] Fix for Kafka Consumer Config Link URL: https://github.com/apache/spark/pull/39132

[GitHub] [spark] techaddict commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread GitBox
techaddict commented on code in PR #39104: URL: https://github.com/apache/spark/pull/39104#discussion_r1052723531 ## core/src/main/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializer.scala: ## @@ -17,7 +17,7 @@ package org.apache.spark.status.protobuf -import

[GitHub] [spark] amaliujia commented on a diff in pull request #39078: [SPARK-41534][CONNECT][SQL] Setup initial client module for Spark Connect

2022-12-19 Thread GitBox
amaliujia commented on code in PR #39078: URL: https://github.com/apache/spark/pull/39078#discussion_r1052725981 ## connector/connect/client/src/main/scala/org/apache/spark/sql/connect/client/SparkSession.scala: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] amaliujia commented on a diff in pull request #39078: [SPARK-41534][CONNECT][SQL] Setup initial client module for Spark Connect

2022-12-19 Thread GitBox
amaliujia commented on code in PR #39078: URL: https://github.com/apache/spark/pull/39078#discussion_r1052725694 ## connector/connect/client/src/main/scala/org/apache/spark/sql/connect/client/SparkSession.scala: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38517: [SPARK-39591][SS] Async Progress Tracking

2022-12-19 Thread GitBox
HeartSaVioR commented on code in PR #38517: URL: https://github.com/apache/spark/pull/38517#discussion_r1052728943 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/AsyncProgressTrackingMicroBatchExecutionSuite.scala: ## @@ -0,0 +1,1865 @@ +/* + * Licensed to

[GitHub] [spark] gengliangwang commented on a diff in pull request #39040: [SPARK-27561][SQL][FOLLOWUP] Support implicit lateral column alias resolution on Aggregate

2022-12-19 Thread GitBox
gengliangwang commented on code in PR #39040: URL: https://github.com/apache/spark/pull/39040#discussion_r1052735930 ## sql/core/src/test/scala/org/apache/spark/sql/LateralColumnAliasSuite.scala: ## @@ -689,4 +713,38 @@ class LateralColumnAliasSuite extends

[GitHub] [spark] gengliangwang commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread GitBox
gengliangwang commented on code in PR #39104: URL: https://github.com/apache/spark/pull/39104#discussion_r1052742075 ## core/src/main/scala/org/apache/spark/status/protobuf/RDDStorageInfoWrapperSerializer.scala: ## @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39123: [SPARK-41583][CONNECT][PROTOBUF] Add Spark Connect and protobuf into setup.py with specifying dependencies

2022-12-19 Thread GitBox
HyukjinKwon commented on code in PR #39123: URL: https://github.com/apache/spark/pull/39123#discussion_r1052742314 ## python/setup.py: ## @@ -113,6 +113,7 @@ def _supports_symlinks(): # Also don't forget to update python/docs/source/getting_started/install.rst.

[GitHub] [spark] HyukjinKwon closed pull request #39123: [SPARK-41583][CONNECT][PROTOBUF] Add Spark Connect and protobuf into setup.py with specifying dependencies

2022-12-19 Thread GitBox
HyukjinKwon closed pull request #39123: [SPARK-41583][CONNECT][PROTOBUF] Add Spark Connect and protobuf into setup.py with specifying dependencies URL: https://github.com/apache/spark/pull/39123

[GitHub] [spark] HyukjinKwon commented on pull request #39123: [SPARK-41583][CONNECT][PROTOBUF] Add Spark Connect and protobuf into setup.py with specifying dependencies

2022-12-19 Thread GitBox
HyukjinKwon commented on PR #39123: URL: https://github.com/apache/spark/pull/39123#issuecomment-1358655672 Merged to master.

[GitHub] [spark] github-actions[bot] closed pull request #37831: [SPARK-40354][SQL] Support eliminate dynamic partition for datasource v1 writes

2022-12-19 Thread GitBox
github-actions[bot] closed pull request #37831: [SPARK-40354][SQL] Support eliminate dynamic partition for datasource v1 writes URL: https://github.com/apache/spark/pull/37831

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox
HyukjinKwon commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052750600 ## python/pyspark/errors/__init__.py: ## @@ -0,0 +1,140 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] HyukjinKwon commented on pull request #39117: [SPARK-41535][SQL] Set null correctly for calendar interval fields in `InterpretedUnsafeProjection` and `InterpretedMutableProjection`

2022-12-19 Thread GitBox
HyukjinKwon commented on PR #39117: URL: https://github.com/apache/spark/pull/39117#issuecomment-1358675707 Merged to master, branch-3.3, and branch-3.2.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox
HyukjinKwon commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052752065 ## python/pyspark/sql/functions.py: ## @@ -8122,15 +8130,13 @@ def _get_lambda_parameters(f: Callable) -> ValuesView[inspect.Parameter]: # Validate that

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox
zhengruifeng commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1052762202 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -534,6 +536,36 @@ class

[GitHub] [spark] HeartSaVioR commented on pull request #39132: [MINOR][DOC] Fix for Kafka Consumer Config Link

2022-12-19 Thread GitBox
HeartSaVioR commented on PR #39132: URL: https://github.com/apache/spark/pull/39132#issuecomment-1358709546 Thanks! Merging to master. (It's just a small doc change so won't wait for CI build.)

[GitHub] [spark] amaliujia commented on a diff in pull request #39078: [SPARK-41534][CONNECT][SQL] Setup initial client module for Spark Connect

2022-12-19 Thread GitBox
amaliujia commented on code in PR #39078: URL: https://github.com/apache/spark/pull/39078#discussion_r1052834300 ## connector/connect/client/src/main/scala/org/apache/spark/sql/connect/client/SparkSession.scala: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] LuciferYang commented on pull request #39124: [DON'T MERGE] Test build and test with hadoop 3.3.5-RC0

2022-12-19 Thread GitBox
LuciferYang commented on PR #39124: URL: https://github.com/apache/spark/pull/39124#issuecomment-1358794388 Many tests failed as follows: ``` 2022-12-20T03:15:37.0609530Z [info] org.apache.spark.sql.hive.execution.command.AlterTableAddColumnsSuite *** ABORTED *** (28 milliseconds)

[GitHub] [spark] LuciferYang commented on pull request #39124: [DON'T MERGE] Test build and test with hadoop 3.3.5-RC0

2022-12-19 Thread GitBox
LuciferYang commented on PR #39124: URL: https://github.com/apache/spark/pull/39124#issuecomment-1358800857 also cc @wangyum

[GitHub] [spark] itholic commented on pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox
itholic commented on PR #39128: URL: https://github.com/apache/spark/pull/39128#issuecomment-1358810969 Let me close it for now, and re-create the PR to change the logic to re-use JVM.

[GitHub] [spark] itholic closed pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox
itholic closed pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes. URL: https://github.com/apache/spark/pull/39128

[GitHub] [spark] zhengruifeng closed pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-19 Thread GitBox
zhengruifeng closed pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint URL: https://github.com/apache/spark/pull/38984

[GitHub] [spark] zhengruifeng commented on pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-19 Thread GitBox
zhengruifeng commented on PR #38984: URL: https://github.com/apache/spark/pull/38984#issuecomment-1358823132 merged into master, thank you @dengziming for working on it!

[GitHub] [spark] jerrypeng commented on pull request #38517: [SPARK-39591][SS] Async Progress Tracking

2022-12-19 Thread GitBox
jerrypeng commented on PR #38517: URL: https://github.com/apache/spark/pull/38517#issuecomment-1358828963 @HeartSaVioR I have addressed your comments; please take another look.

[GitHub] [spark] rxin opened a new pull request, #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox
rxin opened a new pull request, #39134: URL: https://github.com/apache/spark/pull/39134 ### What changes were proposed in this pull request? This patch implements group by star. This is similar to the "group by all" implemented in DuckDB. Note that I'm not done yet. We need to decide if

[GitHub] [spark] rxin commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox
rxin commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052909245 ## sql/core/src/test/resources/sql-tests/inputs/group-by-star.sql: ## @@ -0,0 +1,45 @@ +-- group by all Review Comment: do we need a test case for window functions?

[GitHub] [spark] cloud-fan commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox
cloud-fan commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052911386 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox
cloud-fan commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052911008 ## sql/core/src/test/resources/sql-tests/results/group-by-star-mosha.sql.out: ## @@ -0,0 +1,141 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query

[GitHub] [spark] rxin commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox
rxin commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052910941 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] gengliangwang commented on pull request #39100: [SPARK-41427][UI] Protobuf serializer for ExecutorStageSummaryWrapper

2022-12-19 Thread GitBox
gengliangwang commented on PR #39100: URL: https://github.com/apache/spark/pull/39100#issuecomment-1358848179 Merging to master

[GitHub] [spark] gengliangwang closed pull request #39100: [SPARK-41427][UI] Protobuf serializer for ExecutorStageSummaryWrapper

2022-12-19 Thread GitBox
gengliangwang closed pull request #39100: [SPARK-41427][UI] Protobuf serializer for ExecutorStageSummaryWrapper URL: https://github.com/apache/spark/pull/39100

[GitHub] [spark] gengliangwang closed pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread GitBox
gengliangwang closed pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper URL: https://github.com/apache/spark/pull/39104

[GitHub] [spark] rxin commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox
rxin commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052911704 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] gengliangwang commented on pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread GitBox
gengliangwang commented on PR #39104: URL: https://github.com/apache/spark/pull/39104#issuecomment-1358850152 Thanks, merging to master

[GitHub] [spark] gengliangwang commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox
gengliangwang commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052919912 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] gengliangwang commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox
gengliangwang commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052919759 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] gengliangwang commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox
gengliangwang commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052920262 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] gengliangwang commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox
gengliangwang commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052920528 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] HeartSaVioR commented on pull request #39082: [SPARK-41539][SQL] Remap stats and constraints against output in logical plan for LogicalRDD

2022-12-19 Thread GitBox
HeartSaVioR commented on PR #39082: URL: https://github.com/apache/spark/pull/39082#issuecomment-1357261191 cc. @cloud-fan @viirya Friendly reminder.

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #39116: [SPARK-41566][BUILD] Upgrade `netty` to 4.1.86.Final

2022-12-19 Thread GitBox
bjornjorgensen commented on code in PR #39116: URL: https://github.com/apache/spark/pull/39116#discussion_r1051924268 ## dev/deps/spark-deps-hadoop-2-hive-2.3: ## @@ -200,24 +200,25 @@ metrics-jmx/4.2.13//metrics-jmx-4.2.13.jar metrics-json/4.2.13//metrics-json-4.2.13.jar

[GitHub] [spark] HyukjinKwon commented on pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox
HyukjinKwon commented on PR #39041: URL: https://github.com/apache/spark/pull/39041#issuecomment-1357290022 We will use the same package and introduce an option, but this PR doesn't cover that yet. I'll work on that soon.

[GitHub] [spark] LuciferYang commented on a diff in pull request #39116: [SPARK-41566][BUILD] Upgrade `netty` to 4.1.86.Final

2022-12-19 Thread GitBox
LuciferYang commented on code in PR #39116: URL: https://github.com/apache/spark/pull/39116#discussion_r1051953415 ## dev/deps/spark-deps-hadoop-2-hive-2.3: ## @@ -200,24 +200,25 @@ metrics-jmx/4.2.13//metrics-jmx-4.2.13.jar metrics-json/4.2.13//metrics-json-4.2.13.jar

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #39116: [SPARK-41566][BUILD] Upgrade `netty` to 4.1.86.Final

2022-12-19 Thread GitBox
bjornjorgensen commented on code in PR #39116: URL: https://github.com/apache/spark/pull/39116#discussion_r1051972925 ## dev/deps/spark-deps-hadoop-2-hive-2.3: ## @@ -200,24 +200,25 @@ metrics-jmx/4.2.13//metrics-jmx-4.2.13.jar metrics-json/4.2.13//metrics-json-4.2.13.jar

[GitHub] [spark] gboo-infa commented on pull request #39097: [SPARK-41049][SQL] Make to_csv function deterministic

2022-12-19 Thread GitBox
gboo-infa commented on PR #39097: URL: https://github.com/apache/spark/pull/39097#issuecomment-1357352474 While this change may be good to have for its own reasons, it doesn't really address the problem in the JIRA. The problem is that CodegenFallback is incompatible with nondeterministic

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox
HyukjinKwon commented on code in PR #39041: URL: https://github.com/apache/spark/pull/39041#discussion_r1052017485 ## python/pyspark/sql/observation.py: ## @@ -109,7 +111,9 @@ def _on(self, df: DataFrame, *exprs: Column) -> DataFrame: ) return

[GitHub] [spark] HyukjinKwon commented on pull request #39123: [SPARK-41583][CONNECT][PROTOBUF] Add Spark Connect and protobuf into setup.py with specifying dependencies

2022-12-19 Thread GitBox
HyukjinKwon commented on PR #39123: URL: https://github.com/apache/spark/pull/39123#issuecomment-1357448291 Build: https://github.com/HyukjinKwon/spark/actions/runs/3730423845

[GitHub] [spark] LuciferYang commented on pull request #39125: [SPARK-41584][BUILD] Upgrade RoaringBitmap to 0.9.36

2022-12-19 Thread GitBox
LuciferYang commented on PR #39125: URL: https://github.com/apache/spark/pull/39125#issuecomment-1357245572 will update result of `MapStatusesConvertBenchmark`

[GitHub] [spark] cloud-fan commented on pull request #39081: [SPARK-41538][SQL] Metadata column should be appended at the end of project list

2022-12-19 Thread GitBox
cloud-fan commented on PR #39081: URL: https://github.com/apache/spark/pull/39081#issuecomment-1357246940 late LGTM

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox
zhengruifeng commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1052020129 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -167,4 +169,26 @@ message Expression { // (Optional) Alias metadata

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox
zhengruifeng commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1052021125 ## python/pyspark/sql/connect/functions.py: ## @@ -80,6 +84,78 @@ def _invoke_binary_math_function(name: str, col1: Any, col2: Any) -> Column: return

[GitHub] [spark] beliefer commented on pull request #39084: [SPARK-41464][CONNECT][PYTHON] Implement `DataFrame.to`

2022-12-19 Thread GitBox
beliefer commented on PR #39084: URL: https://github.com/apache/spark/pull/39084#issuecomment-1357476253 ping @zhengruifeng @amaliujia

[GitHub] [spark] cloud-fan commented on pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox
cloud-fan commented on PR #39041: URL: https://github.com/apache/spark/pull/39041#issuecomment-1357280213 The PR description looks pretty clear. One thing I'm not very clear on is step 0: `pip install pyspark`. Do we need a different package to use `--remote`, or the `pyspark` package

[GitHub] [spark] cloud-fan commented on a diff in pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox
cloud-fan commented on code in PR #39041: URL: https://github.com/apache/spark/pull/39041#discussion_r1051932455 ## python/pyspark/sql/observation.py: ## @@ -109,7 +111,9 @@ def _on(self, df: DataFrame, *exprs: Column) -> DataFrame: ) return

[GitHub] [spark] cloud-fan commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox
cloud-fan commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1051940915 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -167,4 +169,26 @@ message Expression { // (Optional) Alias metadata

[GitHub] [spark] LuciferYang commented on a diff in pull request #39116: [SPARK-41566][BUILD] Upgrade `netty` to 4.1.86.Final

2022-12-19 Thread GitBox
LuciferYang commented on code in PR #39116: URL: https://github.com/apache/spark/pull/39116#discussion_r1051961044 ## dev/deps/spark-deps-hadoop-2-hive-2.3: ## @@ -200,24 +200,25 @@ metrics-jmx/4.2.13//metrics-jmx-4.2.13.jar metrics-json/4.2.13//metrics-json-4.2.13.jar

[GitHub] [spark] LucaCanali opened a new pull request, #39127: [SPARK-41585][YARN] Set excludeNodes for executor allocation in YARN besides dynamic allo…

2022-12-19 Thread GitBox
LucaCanali opened a new pull request, #39127: URL: https://github.com/apache/spark/pull/39127 ### What changes were proposed in this pull request? The Spark exclude node functionality for Spark on YARN, introduced in [SPARK-26688](https://issues.apache.org/jira/browse/SPARK-26688),

[GitHub] [spark] LuciferYang commented on pull request #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread GitBox
LuciferYang commented on PR #39104: URL: https://github.com/apache/spark/pull/39104#issuecomment-1357385626 If so, we should add `[UI]`

[GitHub] [spark] cloud-fan commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox
cloud-fan commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1051941366 ## python/pyspark/sql/connect/functions.py: ## @@ -80,6 +84,78 @@ def _invoke_binary_math_function(name: str, col1: Any, col2: Any) -> Column: return

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox
HyukjinKwon commented on code in PR #39041: URL: https://github.com/apache/spark/pull/39041#discussion_r1052015205 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -229,15 +229,20 @@ private[spark] class SparkSubmit extends Logging { var

[GitHub] [spark] fred-db commented on a diff in pull request #38941: [SPARK-41498] Propagate metadata through Union

2022-12-19 Thread GitBox
fred-db commented on code in PR #38941: URL: https://github.com/apache/spark/pull/38941#discussion_r1051925750 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -449,22 +449,54 @@ case class Union(

[GitHub] [spark] NarekDW commented on pull request #39097: [SPARK-41049][SQL] Make to_csv function deterministic

2022-12-19 Thread GitBox
NarekDW commented on PR #39097: URL: https://github.com/apache/spark/pull/39097#issuecomment-1357494252 > While this change may be good to have for its own reasons, it doesn't really address the problem in the JIRA. The problem is that CodegenFallback is incompatible with nondeterministic

[GitHub] [spark] grundprinzip commented on a diff in pull request #39084: [SPARK-41464][CONNECT][PYTHON] Implement `DataFrame.to`

2022-12-19 Thread GitBox
grundprinzip commented on code in PR #39084: URL: https://github.com/apache/spark/pull/39084#discussion_r1052226595 ## python/pyspark/sql/tests/connect/test_connect_plan_only.py: ## @@ -498,6 +499,19 @@ def test_coalesce_and_repartition(self):

[GitHub] [spark] itholic opened a new pull request, #39128: [SPARK-41586][Python] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox
itholic opened a new pull request, #39128: URL: https://github.com/apache/spark/pull/39128 ### What changes were proposed in this pull request? This PR proposes to introduce `pyspark.errors` and error classes to unify and improve errors generated by PySpark under a single path.

[GitHub] [spark] cloud-fan commented on a diff in pull request #39095: [SPARK-41565][SQL] Add the error class `UNRESOLVED_ROUTINE`

2022-12-19 Thread GitBox
cloud-fan commented on code in PR #39095: URL: https://github.com/apache/spark/pull/39095#discussion_r1052134644 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2185,7 +2185,10 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox
HyukjinKwon commented on code in PR #39041: URL: https://github.com/apache/spark/pull/39041#discussion_r1052134706 ## python/pyspark/sql/observation.py: ## @@ -109,7 +111,9 @@ def _on(self, df: DataFrame, *exprs: Column) -> DataFrame: ) return

[GitHub] [spark] MaxGekk commented on a diff in pull request #39095: [SPARK-41565][SQL] Add the error class `UNRESOLVED_ROUTINE`

2022-12-19 Thread GitBox
MaxGekk commented on code in PR #39095: URL: https://github.com/apache/spark/pull/39095#discussion_r1052159400 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -227,7 +227,7 @@ trait SimpleFunctionRegistryBase[T] extends

[GitHub] [spark] MaxGekk commented on a diff in pull request #39095: [SPARK-41565][SQL] Add the error class `UNRESOLVED_ROUTINE`

2022-12-19 Thread GitBox
MaxGekk commented on code in PR #39095: URL: https://github.com/apache/spark/pull/39095#discussion_r1052182744 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2283,7 +2286,12 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] grundprinzip commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox
grundprinzip commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1051953537 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -167,4 +169,26 @@ message Expression { // (Optional) Alias metadata

[GitHub] [spark] cloud-fan commented on a diff in pull request #38941: [SPARK-41498] Propagate metadata through Union

2022-12-19 Thread GitBox
cloud-fan commented on code in PR #38941: URL: https://github.com/apache/spark/pull/38941#discussion_r1052269541 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -449,22 +449,54 @@ case class Union(

[GitHub] [spark] cloud-fan commented on a diff in pull request #38941: [SPARK-41498] Propagate metadata through Union

2022-12-19 Thread GitBox
cloud-fan commented on code in PR #38941: URL: https://github.com/apache/spark/pull/38941#discussion_r1052283207 ## sql/core/src/test/scala/org/apache/spark/sql/connector/MetadataColumnSuite.scala: ## @@ -232,4 +239,191 @@ class MetadataColumnSuite extends DatasourceV2SQLBase {

[GitHub] [spark] itholic commented on a diff in pull request #39128: [SPARK-41586][Python] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox
itholic commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052117811 ## python/pyspark/errors/__init__.py: ## @@ -0,0 +1,140 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements.

[GitHub] [spark] yabola commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2022-12-19 Thread GitBox
yabola commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1052111525 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -396,6 +403,56 @@ public void applicationRemoved(String

[GitHub] [spark] cloud-fan commented on pull request #38968: [SPARK-41441][SQL] Support Generate with no required child output to host outer references

2022-12-19 Thread GitBox
cloud-fan commented on PR #38968: URL: https://github.com/apache/spark/pull/38968#issuecomment-1357560347 thanks, merging to master!

[GitHub] [spark] cloud-fan closed pull request #38968: [SPARK-41441][SQL] Support Generate with no required child output to host outer references

2022-12-19 Thread GitBox
cloud-fan closed pull request #38968: [SPARK-41441][SQL] Support Generate with no required child output to host outer references URL: https://github.com/apache/spark/pull/38968

[GitHub] [spark] monkeyboy123 commented on a diff in pull request #39102: [SPARK-41555][SQL] Multi sparkSession should share single SQLAppStatusStore

2022-12-19 Thread GitBox
monkeyboy123 commented on code in PR #39102: URL: https://github.com/apache/spark/pull/39102#discussion_r1052167027 ## sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala: ## @@ -109,13 +109,18 @@ private[sql] class SharedState( * A status store to query

[GitHub] [spark] grundprinzip commented on a diff in pull request #39084: [SPARK-41464][CONNECT][PYTHON] Implement `DataFrame.to`

2022-12-19 Thread GitBox
grundprinzip commented on code in PR #39084: URL: https://github.com/apache/spark/pull/39084#discussion_r1052227007 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -389,6 +389,21 @@ def test_schema(self): self.connect.sql(query).schema.__repr__(),

[GitHub] [spark] beliefer commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2022-12-19 Thread GitBox
beliefer commented on PR #39091: URL: https://github.com/apache/spark/pull/39091#issuecomment-1357520476 > I think it would be possible to add another result batch type for observed metrics and simply pass them at the end. I have an idea: 1. cache the `Observation` at server.

[GitHub] [spark] itholic commented on a diff in pull request #39128: [SPARK-41586][Python] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox
itholic commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052115930 ## python/pyspark/errors/__init__.py: ## @@ -0,0 +1,140 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements.

[GitHub] [spark] LuciferYang opened a new pull request, #39129: [SPARK-41587][BUILD] Upgrade `org.scalatestplus:selenium-4-4` to `org.scalatestplus:selenium-4-7`

2022-12-19 Thread GitBox
LuciferYang opened a new pull request, #39129: URL: https://github.com/apache/spark/pull/39129 ### What changes were proposed in this pull request? This PR aims to upgrade `org.scalatestplus:selenium-4-4` to `org.scalatestplus:selenium-4-7`: - `org.scalatestplus:selenium-4-4` ->

[GitHub] [spark] cloud-fan commented on a diff in pull request #39095: [SPARK-41565][SQL] Add the error class `UNRESOLVED_ROUTINE`

2022-12-19 Thread GitBox
cloud-fan commented on code in PR #39095: URL: https://github.com/apache/spark/pull/39095#discussion_r1052136601 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -227,7 +227,7 @@ trait SimpleFunctionRegistryBase[T] extends

[GitHub] [spark] wecharyu commented on pull request #39115: [SPARK-41563][SQL] Support partition filter in MSCK REPAIR TABLE statement

2022-12-19 Thread GitBox
wecharyu commented on PR #39115: URL: https://github.com/apache/spark/pull/39115#issuecomment-1357617492 @MaxGekk @cloud-fan @dongjoon-hyun could you please take a look?

[GitHub] [spark] Shooter23 opened a new pull request, #39130: Update dataframe.py

2022-12-19 Thread GitBox
Shooter23 opened a new pull request, #39130: URL: https://github.com/apache/spark/pull/39130 Fix docstring. ### What changes were proposed in this pull request? Grammatical fix to documentation. ### Why are the changes needed? The documentation has a

[GitHub] [spark] itholic commented on a diff in pull request #39128: [SPARK-41586][Python] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox
itholic commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052123143 ## python/pyspark/errors/error_classes.py: ## @@ -0,0 +1,30 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] LuciferYang commented on a diff in pull request #39120: [WIP] Make "rule id not found" error slightly easier to debug.

2022-12-19 Thread GitBox
LuciferYang commented on code in PR #39120: URL: https://github.com/apache/spark/pull/39120#discussion_r1052127797 ## core/src/main/resources/error/error-classes.json: ## @@ -4298,7 +4298,7 @@ }, "_LEGACY_ERROR_TEMP_2175" : { Review Comment: From the intention of the

[GitHub] [spark] cloud-fan commented on a diff in pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox
cloud-fan commented on code in PR #39041: URL: https://github.com/apache/spark/pull/39041#discussion_r1052132399 ## python/pyspark/sql/observation.py: ## @@ -109,7 +111,9 @@ def _on(self, df: DataFrame, *exprs: Column) -> DataFrame: ) return
