date:20221219

[GitHub] [spark] techaddict commented on pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread GitBox

techaddict commented on PR #39104: URL: https://github.com/apache/spark/pull/39104#issuecomment-1358568558 @gengliangwang addressed comments -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] amaliujia commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox

amaliujia commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1052727913 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -534,6 +536,36 @@ class SparkConnectPlanner(session:

[GitHub] [spark] gengliangwang commented on a diff in pull request #39040: [SPARK-27561][SQL][FOLLOWUP] Support implicit lateral column alias resolution on Aggregate

2022-12-19 Thread GitBox

gengliangwang commented on code in PR #39040: URL: https://github.com/apache/spark/pull/39040#discussion_r1052735150 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -244,7 +303,67 @@ object

[GitHub] [spark] anchovYu commented on a diff in pull request #39040: [SPARK-27561][SQL][FOLLOWUP] Support implicit lateral column alias resolution on Aggregate

2022-12-19 Thread GitBox

anchovYu commented on code in PR #39040: URL: https://github.com/apache/spark/pull/39040#discussion_r1052738997 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAlias.scala: ## @@ -244,7 +303,67 @@ object

[GitHub] [spark] srielau commented on a diff in pull request #38861: [SPARK-41294][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1203 / 1168

2022-12-19 Thread GitBox

srielau commented on code in PR #38861: URL: https://github.com/apache/spark/pull/38861#discussion_r1052745779 ## sql/core/src/test/resources/sql-tests/results/postgreSQL/numeric.sql.out: ## @@ -3831,12 +3831,12 @@ struct<> -- !query output

[GitHub] [spark] HyukjinKwon commented on pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox

HyukjinKwon commented on PR #39041: URL: https://github.com/apache/spark/pull/39041#issuecomment-1358666500 Let me get this in in a few days if there are no more comments.

[GitHub] [spark] HyukjinKwon commented on pull request #39129: [SPARK-41587][BUILD] Upgrade `org.scalatestplus:selenium-4-4` to `org.scalatestplus:selenium-4-7`

2022-12-19 Thread GitBox

HyukjinKwon commented on PR #39129: URL: https://github.com/apache/spark/pull/39129#issuecomment-1358669841 cc @sarutak FYI

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox

HyukjinKwon commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052751097 ## python/pyspark/errors/__init__.py: ## @@ -0,0 +1,140 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox

HyukjinKwon commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052751697 ## python/pyspark/errors/error_classes.py: ## @@ -0,0 +1,30 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox

HyukjinKwon commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052751875 ## python/pyspark/testing/utils.py: ## @@ -138,6 +140,32 @@ def setUpClass(cls): def tearDownClass(cls): cls.sc.stop() +def checkError( Review

[GitHub] [spark] HyukjinKwon closed pull request #39117: [SPARK-41535][SQL] Set null correctly for calendar interval fields in `InterpretedUnsafeProjection` and `InterpretedMutableProjection`

2022-12-19 Thread GitBox

HyukjinKwon closed pull request #39117: [SPARK-41535][SQL] Set null correctly for calendar interval fields in `InterpretedUnsafeProjection` and `InterpretedMutableProjection` URL: https://github.com/apache/spark/pull/39117

[GitHub] [spark] WweiL opened a new pull request, #39132: [MINOR][DOC] Fix for Kafka Consumer Config Link

2022-12-19 Thread GitBox

WweiL opened a new pull request, #39132: URL: https://github.com/apache/spark/pull/39132 ### What changes were proposed in this pull request? Right the redirect link for the Kafka consumer config: before, it pointed you to the top of the page; now it redirects you to the correct

[GitHub] [spark] HeartSaVioR closed pull request #39132: [MINOR][DOC] Fix for Kafka Consumer Config Link

2022-12-19 Thread GitBox

HeartSaVioR closed pull request #39132: [MINOR][DOC] Fix for Kafka Consumer Config Link URL: https://github.com/apache/spark/pull/39132

[GitHub] [spark] techaddict commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread GitBox

techaddict commented on code in PR #39104: URL: https://github.com/apache/spark/pull/39104#discussion_r1052723531 ## core/src/main/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializer.scala: ## @@ -17,7 +17,7 @@ package org.apache.spark.status.protobuf -import

[GitHub] [spark] amaliujia commented on a diff in pull request #39078: [SPARK-41534][CONNECT][SQL] Setup initial client module for Spark Connect

2022-12-19 Thread GitBox

amaliujia commented on code in PR #39078: URL: https://github.com/apache/spark/pull/39078#discussion_r1052725981 ## connector/connect/client/src/main/scala/org/apache/spark/sql/connect/client/SparkSession.scala: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] amaliujia commented on a diff in pull request #39078: [SPARK-41534][CONNECT][SQL] Setup initial client module for Spark Connect

2022-12-19 Thread GitBox

amaliujia commented on code in PR #39078: URL: https://github.com/apache/spark/pull/39078#discussion_r1052725694 ## connector/connect/client/src/main/scala/org/apache/spark/sql/connect/client/SparkSession.scala: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38517: [SPARK-39591][SS] Async Progress Tracking

2022-12-19 Thread GitBox

HeartSaVioR commented on code in PR #38517: URL: https://github.com/apache/spark/pull/38517#discussion_r1052728943 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/AsyncProgressTrackingMicroBatchExecutionSuite.scala: ## @@ -0,0 +1,1865 @@ +/* + * Licensed to

[GitHub] [spark] gengliangwang commented on a diff in pull request #39040: [SPARK-27561][SQL][FOLLOWUP] Support implicit lateral column alias resolution on Aggregate

2022-12-19 Thread GitBox

gengliangwang commented on code in PR #39040: URL: https://github.com/apache/spark/pull/39040#discussion_r1052735930 ## sql/core/src/test/scala/org/apache/spark/sql/LateralColumnAliasSuite.scala: ## @@ -689,4 +713,38 @@ class LateralColumnAliasSuite extends

[GitHub] [spark] gengliangwang commented on a diff in pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread GitBox

gengliangwang commented on code in PR #39104: URL: https://github.com/apache/spark/pull/39104#discussion_r1052742075 ## core/src/main/scala/org/apache/spark/status/protobuf/RDDStorageInfoWrapperSerializer.scala: ## @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39123: [SPARK-41583][CONNECT][PROTOBUF] Add Spark Connect and protobuf into setup.py with specifying dependencies

2022-12-19 Thread GitBox

HyukjinKwon commented on code in PR #39123: URL: https://github.com/apache/spark/pull/39123#discussion_r1052742314 ## python/setup.py: ## @@ -113,6 +113,7 @@ def _supports_symlinks(): # Also don't forget to update python/docs/source/getting_started/install.rst.

[GitHub] [spark] HyukjinKwon closed pull request #39123: [SPARK-41583][CONNECT][PROTOBUF] Add Spark Connect and protobuf into setup.py with specifying dependencies

2022-12-19 Thread GitBox

HyukjinKwon closed pull request #39123: [SPARK-41583][CONNECT][PROTOBUF] Add Spark Connect and protobuf into setup.py with specifying dependencies URL: https://github.com/apache/spark/pull/39123

[GitHub] [spark] HyukjinKwon commented on pull request #39123: [SPARK-41583][CONNECT][PROTOBUF] Add Spark Connect and protobuf into setup.py with specifying dependencies

2022-12-19 Thread GitBox

HyukjinKwon commented on PR #39123: URL: https://github.com/apache/spark/pull/39123#issuecomment-1358655672 Merged to master.

[GitHub] [spark] github-actions[bot] closed pull request #37831: [SPARK-40354][SQL] Support eliminate dynamic partition for datasource v1 writes

2022-12-19 Thread GitBox

github-actions[bot] closed pull request #37831: [SPARK-40354][SQL] Support eliminate dynamic partition for datasource v1 writes URL: https://github.com/apache/spark/pull/37831

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox

HyukjinKwon commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052750600 ## python/pyspark/errors/__init__.py: ## @@ -0,0 +1,140 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] HyukjinKwon commented on pull request #39117: [SPARK-41535][SQL] Set null correctly for calendar interval fields in `InterpretedUnsafeProjection` and `InterpretedMutableProjection`

2022-12-19 Thread GitBox

HyukjinKwon commented on PR #39117: URL: https://github.com/apache/spark/pull/39117#issuecomment-1358675707 Merged to master, branch-3.3, and branch-3.2.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox

HyukjinKwon commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052752065 ## python/pyspark/sql/functions.py: ## @@ -8122,15 +8130,13 @@ def _get_lambda_parameters(f: Callable) -> ValuesView[inspect.Parameter]: # Validate that

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox

zhengruifeng commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1052762202 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -534,6 +536,36 @@ class

[GitHub] [spark] HeartSaVioR commented on pull request #39132: [MINOR][DOC] Fix for Kafka Consumer Config Link

2022-12-19 Thread GitBox

HeartSaVioR commented on PR #39132: URL: https://github.com/apache/spark/pull/39132#issuecomment-1358709546 Thanks! Merging to master. (It's just a small doc change so won't wait for CI build.)

[GitHub] [spark] amaliujia commented on a diff in pull request #39078: [SPARK-41534][CONNECT][SQL] Setup initial client module for Spark Connect

2022-12-19 Thread GitBox

amaliujia commented on code in PR #39078: URL: https://github.com/apache/spark/pull/39078#discussion_r1052834300 ## connector/connect/client/src/main/scala/org/apache/spark/sql/connect/client/SparkSession.scala: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] LuciferYang commented on pull request #39124: [DON'T MERGE] Test build and test with hadoop 3.3.5-RC0

2022-12-19 Thread GitBox

LuciferYang commented on PR #39124: URL: https://github.com/apache/spark/pull/39124#issuecomment-1358794388 Many tests failed as follows: ``` 2022-12-20T03:15:37.0609530Z [info] org.apache.spark.sql.hive.execution.command.AlterTableAddColumnsSuite *** ABORTED *** (28 milliseconds)

[GitHub] [spark] LuciferYang commented on pull request #39124: [DON'T MERGE] Test build and test with hadoop 3.3.5-RC0

2022-12-19 Thread GitBox

LuciferYang commented on PR #39124: URL: https://github.com/apache/spark/pull/39124#issuecomment-1358800857 also cc @wangyum

[GitHub] [spark] itholic commented on pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox

itholic commented on PR #39128: URL: https://github.com/apache/spark/pull/39128#issuecomment-1358810969 Let me close it for now, and re-create the PR to change the logic to re-use JVM.

[GitHub] [spark] itholic closed pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox

itholic closed pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes. URL: https://github.com/apache/spark/pull/39128

[GitHub] [spark] zhengruifeng closed pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-19 Thread GitBox

zhengruifeng closed pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint URL: https://github.com/apache/spark/pull/38984

[GitHub] [spark] zhengruifeng commented on pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-19 Thread GitBox

zhengruifeng commented on PR #38984: URL: https://github.com/apache/spark/pull/38984#issuecomment-1358823132 merged into master, thank you @dengziming for working on it!

[GitHub] [spark] jerrypeng commented on pull request #38517: [SPARK-39591][SS] Async Progress Tracking

2022-12-19 Thread GitBox

jerrypeng commented on PR #38517: URL: https://github.com/apache/spark/pull/38517#issuecomment-1358828963 @HeartSaVioR I have addressed your comments; please take another look.

[GitHub] [spark] rxin opened a new pull request, #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox

rxin opened a new pull request, #39134: URL: https://github.com/apache/spark/pull/39134 ### What changes were proposed in this pull request? This patch implements group by star. This is similar to the "group by all" implemented in DuckDB. Note that I'm not done yet. We need to decide if
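For readers unfamiliar with the DuckDB feature referenced above: `GROUP BY ALL` groups by every select-list expression that does not contain an aggregate function. A minimal Python sketch of that inference rule follows. It uses a hypothetical toy expression tree (`Col`, `Agg`, `Call` are illustrative names, not Spark's Catalyst classes), and it is not the PR's `ResolveGroupByStar` implementation.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical toy expression tree, for illustration only.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Agg:
    func: str
    child: "Expr"


@dataclass(frozen=True)
class Call:
    func: str
    args: tuple


Expr = Union[Col, Agg, Call]


def contains_agg(e: Expr) -> bool:
    """Return True if an aggregate appears anywhere in the expression tree."""
    if isinstance(e, Agg):
        return True
    if isinstance(e, Call):
        return any(contains_agg(a) for a in e.args)
    return False


def infer_group_by_all(select_list: List[Expr]) -> List[Expr]:
    """GROUP BY ALL: group by every select item that contains no aggregate."""
    return [e for e in select_list if not contains_agg(e)]


# SELECT dept, upper(city), sum(salary) FROM t GROUP BY ALL
select = [Col("dept"), Call("upper", (Col("city"),)), Agg("sum", Col("salary"))]
print(infer_group_by_all(select))
```

In this sketch the inferred grouping keys are `dept` and `upper(city)`; `sum(salary)` is excluded because it contains an aggregate.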

[GitHub] [spark] rxin commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox

rxin commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052909245 ## sql/core/src/test/resources/sql-tests/inputs/group-by-star.sql: ## @@ -0,0 +1,45 @@ +-- group by all Review Comment: do we need a test case for window functions?

[GitHub] [spark] cloud-fan commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox

cloud-fan commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052911386 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox

cloud-fan commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052911008 ## sql/core/src/test/resources/sql-tests/results/group-by-star-mosha.sql.out: ## @@ -0,0 +1,141 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query

[GitHub] [spark] rxin commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox

rxin commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052910941 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] gengliangwang commented on pull request #39100: [SPARK-41427][UI] Protobuf serializer for ExecutorStageSummaryWrapper

2022-12-19 Thread GitBox

gengliangwang commented on PR #39100: URL: https://github.com/apache/spark/pull/39100#issuecomment-1358848179 Merging to master

[GitHub] [spark] gengliangwang closed pull request #39100: [SPARK-41427][UI] Protobuf serializer for ExecutorStageSummaryWrapper

2022-12-19 Thread GitBox

gengliangwang closed pull request #39100: [SPARK-41427][UI] Protobuf serializer for ExecutorStageSummaryWrapper URL: https://github.com/apache/spark/pull/39100

[GitHub] [spark] gengliangwang closed pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread GitBox

gengliangwang closed pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper URL: https://github.com/apache/spark/pull/39104

[GitHub] [spark] rxin commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox

rxin commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052911704 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] gengliangwang commented on pull request #39104: [SPARK-41425][UI] Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread GitBox

gengliangwang commented on PR #39104: URL: https://github.com/apache/spark/pull/39104#issuecomment-1358850152 Thanks, merging to master

[GitHub] [spark] gengliangwang commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox

gengliangwang commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052919912 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] gengliangwang commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox

gengliangwang commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052919759 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] gengliangwang commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox

gengliangwang commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052920262 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] gengliangwang commented on a diff in pull request #39134: [WIP] Implement group by star (aka group by all)

2022-12-19 Thread GitBox

gengliangwang commented on code in PR #39134: URL: https://github.com/apache/spark/pull/39134#discussion_r1052920528 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupByStar.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] HeartSaVioR commented on pull request #39082: [SPARK-41539][SQL] Remap stats and constraints against output in logical plan for LogicalRDD

2022-12-19 Thread GitBox

HeartSaVioR commented on PR #39082: URL: https://github.com/apache/spark/pull/39082#issuecomment-1357261191 cc. @cloud-fan @viirya Friendly reminder.

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #39116: [SPARK-41566][BUILD] Upgrade `netty` to 4.1.86.Final

2022-12-19 Thread GitBox

bjornjorgensen commented on code in PR #39116: URL: https://github.com/apache/spark/pull/39116#discussion_r1051924268 ## dev/deps/spark-deps-hadoop-2-hive-2.3: ## @@ -200,24 +200,25 @@ metrics-jmx/4.2.13//metrics-jmx-4.2.13.jar metrics-json/4.2.13//metrics-json-4.2.13.jar

[GitHub] [spark] HyukjinKwon commented on pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox

HyukjinKwon commented on PR #39041: URL: https://github.com/apache/spark/pull/39041#issuecomment-1357290022 We will use the same package and introduce an option, but this PR doesn't cover that yet. I'll work on that soon.

[GitHub] [spark] LuciferYang commented on a diff in pull request #39116: [SPARK-41566][BUILD] Upgrade `netty` to 4.1.86.Final

2022-12-19 Thread GitBox

LuciferYang commented on code in PR #39116: URL: https://github.com/apache/spark/pull/39116#discussion_r1051953415 ## dev/deps/spark-deps-hadoop-2-hive-2.3: ## @@ -200,24 +200,25 @@ metrics-jmx/4.2.13//metrics-jmx-4.2.13.jar metrics-json/4.2.13//metrics-json-4.2.13.jar

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #39116: [SPARK-41566][BUILD] Upgrade `netty` to 4.1.86.Final

2022-12-19 Thread GitBox

bjornjorgensen commented on code in PR #39116: URL: https://github.com/apache/spark/pull/39116#discussion_r1051972925 ## dev/deps/spark-deps-hadoop-2-hive-2.3: ## @@ -200,24 +200,25 @@ metrics-jmx/4.2.13//metrics-jmx-4.2.13.jar metrics-json/4.2.13//metrics-json-4.2.13.jar

[GitHub] [spark] gboo-infa commented on pull request #39097: [SPARK-41049][SQL] Make to_csv function deterministic

2022-12-19 Thread GitBox

gboo-infa commented on PR #39097: URL: https://github.com/apache/spark/pull/39097#issuecomment-1357352474 While this change may be good to have for its own reasons, it doesn't really address the problem in the JIRA. The problem is that CodegenFallback is incompatible with nondeterministic

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox

HyukjinKwon commented on code in PR #39041: URL: https://github.com/apache/spark/pull/39041#discussion_r1052017485 ## python/pyspark/sql/observation.py: ## @@ -109,7 +111,9 @@ def _on(self, df: DataFrame, *exprs: Column) -> DataFrame: ) return

[GitHub] [spark] HyukjinKwon commented on pull request #39123: [SPARK-41583][CONNECT][PROTOBUF] Add Spark Connect and protobuf into setup.py with specifying dependencies

2022-12-19 Thread GitBox

HyukjinKwon commented on PR #39123: URL: https://github.com/apache/spark/pull/39123#issuecomment-1357448291 Build: https://github.com/HyukjinKwon/spark/actions/runs/3730423845

[GitHub] [spark] LuciferYang commented on pull request #39125: [SPARK-41584][BUILD] Upgrade RoaringBitmap to 0.9.36

2022-12-19 Thread GitBox

LuciferYang commented on PR #39125: URL: https://github.com/apache/spark/pull/39125#issuecomment-1357245572 will update result of `MapStatusesConvertBenchmark`

[GitHub] [spark] cloud-fan commented on pull request #39081: [SPARK-41538][SQL] Metadata column should be appended at the end of project list

2022-12-19 Thread GitBox

cloud-fan commented on PR #39081: URL: https://github.com/apache/spark/pull/39081#issuecomment-1357246940 late LGTM

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox

zhengruifeng commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1052020129 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -167,4 +169,26 @@ message Expression { // (Optional) Alias metadata

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox

zhengruifeng commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1052021125 ## python/pyspark/sql/connect/functions.py: ## @@ -80,6 +84,78 @@ def _invoke_binary_math_function(name: str, col1: Any, col2: Any) -> Column: return

[GitHub] [spark] beliefer commented on pull request #39084: [SPARK-41464][CONNECT][PYTHON] Implement `DataFrame.to`

2022-12-19 Thread GitBox

beliefer commented on PR #39084: URL: https://github.com/apache/spark/pull/39084#issuecomment-1357476253 ping @zhengruifeng @amaliujia

[GitHub] [spark] cloud-fan commented on pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox

cloud-fan commented on PR #39041: URL: https://github.com/apache/spark/pull/39041#issuecomment-1357280213 The PR description looks pretty clear. One thing I'm not very clear about is step 0: `pip install pyspark`. Do we need a different package to use `--remote`, or the `pyspark` package

[GitHub] [spark] cloud-fan commented on a diff in pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox

cloud-fan commented on code in PR #39041: URL: https://github.com/apache/spark/pull/39041#discussion_r1051932455 ## python/pyspark/sql/observation.py: ## @@ -109,7 +111,9 @@ def _on(self, df: DataFrame, *exprs: Column) -> DataFrame: ) return

[GitHub] [spark] cloud-fan commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox

cloud-fan commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1051940915 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -167,4 +169,26 @@ message Expression { // (Optional) Alias metadata

[GitHub] [spark] LuciferYang commented on a diff in pull request #39116: [SPARK-41566][BUILD] Upgrade `netty` to 4.1.86.Final

2022-12-19 Thread GitBox

LuciferYang commented on code in PR #39116: URL: https://github.com/apache/spark/pull/39116#discussion_r1051961044 ## dev/deps/spark-deps-hadoop-2-hive-2.3: ## @@ -200,24 +200,25 @@ metrics-jmx/4.2.13//metrics-jmx-4.2.13.jar metrics-json/4.2.13//metrics-json-4.2.13.jar

[GitHub] [spark] LucaCanali opened a new pull request, #39127: [SPARK-41585][YARN] Set excludeNodes for executor allocation in YARN besides dynamic allo…

2022-12-19 Thread GitBox

LucaCanali opened a new pull request, #39127: URL: https://github.com/apache/spark/pull/39127 ### What changes were proposed in this pull request? The Spark exclude node functionality for Spark on YARN, introduced in [SPARK-26688](https://issues.apache.org/jira/browse/SPARK-26688),

[GitHub] [spark] LuciferYang commented on pull request #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread GitBox

LuciferYang commented on PR #39104: URL: https://github.com/apache/spark/pull/39104#issuecomment-1357385626 If so, we should add `[UI]`

[GitHub] [spark] cloud-fan commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox

cloud-fan commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1051941366 ## python/pyspark/sql/connect/functions.py: ## @@ -80,6 +84,78 @@ def _invoke_binary_math_function(name: str, col1: Any, col2: Any) -> Column: return

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox

HyukjinKwon commented on code in PR #39041: URL: https://github.com/apache/spark/pull/39041#discussion_r1052015205 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -229,15 +229,20 @@ private[spark] class SparkSubmit extends Logging { var

[GitHub] [spark] fred-db commented on a diff in pull request #38941: [SPARK-41498] Propagate metadata through Union

2022-12-19 Thread GitBox

fred-db commented on code in PR #38941: URL: https://github.com/apache/spark/pull/38941#discussion_r1051925750 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -449,22 +449,54 @@ case class Union(

[GitHub] [spark] NarekDW commented on pull request #39097: [SPARK-41049][SQL] Make to_csv function deterministic

2022-12-19 Thread GitBox

NarekDW commented on PR #39097: URL: https://github.com/apache/spark/pull/39097#issuecomment-1357494252 > While this change may be good to have for its own reasons, it doesn't really address the problem in the JIRA. The problem is that CodegenFallback is incompatible with nondeterministic

[GitHub] [spark] grundprinzip commented on a diff in pull request #39084: [SPARK-41464][CONNECT][PYTHON] Implement `DataFrame.to`

2022-12-19 Thread GitBox

grundprinzip commented on code in PR #39084: URL: https://github.com/apache/spark/pull/39084#discussion_r1052226595 ## python/pyspark/sql/tests/connect/test_connect_plan_only.py: ## @@ -498,6 +499,19 @@ def test_coalesce_and_repartition(self):

[GitHub] [spark] itholic opened a new pull request, #39128: [SPARK-41586][Python] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox

itholic opened a new pull request, #39128: URL: https://github.com/apache/spark/pull/39128 ### What changes were proposed in this pull request? This PR proposes to introduce `pyspark.errors` and error classes to unify and improve errors generated by PySpark under a single path.

[GitHub] [spark] cloud-fan commented on a diff in pull request #39095: [SPARK-41565][SQL] Add the error class `UNRESOLVED_ROUTINE`

2022-12-19 Thread GitBox

cloud-fan commented on code in PR #39095: URL: https://github.com/apache/spark/pull/39095#discussion_r1052134644 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2185,7 +2185,10 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox

HyukjinKwon commented on code in PR #39041: URL: https://github.com/apache/spark/pull/39041#discussion_r1052134706 ## python/pyspark/sql/observation.py: ## @@ -109,7 +111,9 @@ def _on(self, df: DataFrame, *exprs: Column) -> DataFrame: ) return

[GitHub] [spark] MaxGekk commented on a diff in pull request #39095: [SPARK-41565][SQL] Add the error class `UNRESOLVED_ROUTINE`

2022-12-19 Thread GitBox

MaxGekk commented on code in PR #39095: URL: https://github.com/apache/spark/pull/39095#discussion_r1052159400 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -227,7 +227,7 @@ trait SimpleFunctionRegistryBase[T] extends

[GitHub] [spark] MaxGekk commented on a diff in pull request #39095: [SPARK-41565][SQL] Add the error class `UNRESOLVED_ROUTINE`

2022-12-19 Thread GitBox

MaxGekk commented on code in PR #39095: URL: https://github.com/apache/spark/pull/39095#discussion_r1052182744 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2283,7 +2286,12 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] grundprinzip commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-19 Thread GitBox

grundprinzip commented on code in PR #39068: URL: https://github.com/apache/spark/pull/39068#discussion_r1051953537 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -167,4 +169,26 @@ message Expression { // (Optional) Alias metadata

[GitHub] [spark] cloud-fan commented on a diff in pull request #38941: [SPARK-41498] Propagate metadata through Union

2022-12-19 Thread GitBox

cloud-fan commented on code in PR #38941: URL: https://github.com/apache/spark/pull/38941#discussion_r1052269541 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -449,22 +449,54 @@ case class Union(

[GitHub] [spark] cloud-fan commented on a diff in pull request #38941: [SPARK-41498] Propagate metadata through Union

2022-12-19 Thread GitBox

cloud-fan commented on code in PR #38941: URL: https://github.com/apache/spark/pull/38941#discussion_r1052283207 ## sql/core/src/test/scala/org/apache/spark/sql/connector/MetadataColumnSuite.scala: ## @@ -232,4 +239,191 @@ class MetadataColumnSuite extends DatasourceV2SQLBase {

[GitHub] [spark] itholic commented on a diff in pull request #39128: [SPARK-41586][Python] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox

itholic commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052117811 ## python/pyspark/errors/__init__.py: ## @@ -0,0 +1,140 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements.

[GitHub] [spark] yabola commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2022-12-19 Thread GitBox

yabola commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1052111525 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -396,6 +403,56 @@ public void applicationRemoved(String

[GitHub] [spark] cloud-fan commented on pull request #38968: [SPARK-41441][SQL] Support Generate with no required child output to host outer references

2022-12-19 Thread GitBox

cloud-fan commented on PR #38968: URL: https://github.com/apache/spark/pull/38968#issuecomment-1357560347 thanks, merging to master!

[GitHub] [spark] cloud-fan closed pull request #38968: [SPARK-41441][SQL] Support Generate with no required child output to host outer references

2022-12-19 Thread GitBox

cloud-fan closed pull request #38968: [SPARK-41441][SQL] Support Generate with no required child output to host outer references URL: https://github.com/apache/spark/pull/38968

[GitHub] [spark] monkeyboy123 commented on a diff in pull request #39102: [SPARK-41555][SQL] Multi sparkSession should share single SQLAppStatusStore

2022-12-19 Thread GitBox

monkeyboy123 commented on code in PR #39102: URL: https://github.com/apache/spark/pull/39102#discussion_r1052167027 ## sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala: ## @@ -109,13 +109,18 @@ private[sql] class SharedState( * A status store to query

[GitHub] [spark] grundprinzip commented on a diff in pull request #39084: [SPARK-41464][CONNECT][PYTHON] Implement `DataFrame.to`

2022-12-19 Thread GitBox

grundprinzip commented on code in PR #39084: URL: https://github.com/apache/spark/pull/39084#discussion_r1052227007 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -389,6 +389,21 @@ def test_schema(self): self.connect.sql(query).schema.__repr__(),

[GitHub] [spark] beliefer commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2022-12-19 Thread GitBox

beliefer commented on PR #39091: URL: https://github.com/apache/spark/pull/39091#issuecomment-1357520476 > I think it would be possible to add another result batch type for observed metrics and simply pass them at the end. I have an idea: 1. cache the `Observation` at server.

[GitHub] [spark] itholic commented on a diff in pull request #39128: [SPARK-41586][Python] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox

itholic commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052115930 ## python/pyspark/errors/__init__.py: ## @@ -0,0 +1,140 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements.

[GitHub] [spark] LuciferYang opened a new pull request, #39129: [SPARK-41587][BUILD] Upgrade `org.scalatestplus:selenium-4-4` to `org.scalatestplus:selenium-4-7`

2022-12-19 Thread GitBox

LuciferYang opened a new pull request, #39129: URL: https://github.com/apache/spark/pull/39129 ### What changes were proposed in this pull request? This PR aims to upgrade `org.scalatestplus:selenium-4-4` to `org.scalatestplus:selenium-4-7`: - `org.scalatestplus:selenium-4-4` ->

[GitHub] [spark] cloud-fan commented on a diff in pull request #39095: [SPARK-41565][SQL] Add the error class `UNRESOLVED_ROUTINE`

2022-12-19 Thread GitBox

cloud-fan commented on code in PR #39095: URL: https://github.com/apache/spark/pull/39095#discussion_r1052136601 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -227,7 +227,7 @@ trait SimpleFunctionRegistryBase[T] extends

[GitHub] [spark] wecharyu commented on pull request #39115: [SPARK-41563][SQL] Support partition filter in MSCK REPAIR TABLE statement

2022-12-19 Thread GitBox

wecharyu commented on PR #39115: URL: https://github.com/apache/spark/pull/39115#issuecomment-1357617492 @MaxGekk @cloud-fan @dongjoon-hyun could you please take a look?

[GitHub] [spark] Shooter23 opened a new pull request, #39130: Update dataframe.py

2022-12-19 Thread GitBox

Shooter23 opened a new pull request, #39130: URL: https://github.com/apache/spark/pull/39130 Fix docstring. ### What changes were proposed in this pull request? Grammatical fix to documentation. ### Why are the changes needed? The documentation has a

[GitHub] [spark] itholic commented on a diff in pull request #39128: [SPARK-41586][Python] Introduce new PySpark package: `pyspark.errors` and error classes.

2022-12-19 Thread GitBox

itholic commented on code in PR #39128: URL: https://github.com/apache/spark/pull/39128#discussion_r1052123143 ## python/pyspark/errors/error_classes.py: ## @@ -0,0 +1,30 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] LuciferYang commented on a diff in pull request #39120: [WIP] Make "rule id not found" error slightly easier to debug.

2022-12-19 Thread GitBox

LuciferYang commented on code in PR #39120: URL: https://github.com/apache/spark/pull/39120#discussion_r1052127797 ## core/src/main/resources/error/error-classes.json: ## @@ -4298,7 +4298,7 @@ }, "_LEGACY_ERROR_TEMP_2175" : { Review Comment: From the intention of the

[GitHub] [spark] cloud-fan commented on a diff in pull request #39041: [SPARK-41528][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-19 Thread GitBox

cloud-fan commented on code in PR #39041: URL: https://github.com/apache/spark/pull/39041#discussion_r1052132399 ## python/pyspark/sql/observation.py: ## @@ -109,7 +111,9 @@ def _on(self, df: DataFrame, *exprs: Column) -> DataFrame: ) return
