[GitHub] [spark] tedyu commented on pull request #39250: [SQL][MINOR] Use Diamond operator for constructing HashMap

2022-12-28 Thread GitBox
tedyu commented on PR #39250: URL: https://github.com/apache/spark/pull/39250#issuecomment-1367136917 @srowen Please let me know what else should be done for this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] MaxGekk commented on a diff in pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2022-12-28 Thread GitBox
MaxGekk commented on code in PR #39239: URL: https://github.com/apache/spark/pull/39239#discussion_r1058794838 ## python/pyspark/pandas/tests/test_resample.py: ## @@ -263,7 +263,7 @@ def test_dataframe_resample(self): def test_series_resample(self): self._test_resa

[GitHub] [spark] grundprinzip commented on a diff in pull request #39283: [SPARK-41767][CONNECT][PYTHON] Implement `Column.{withField, dropFields}`

2022-12-28 Thread GitBox
grundprinzip commented on code in PR #39283: URL: https://github.com/apache/spark/pull/39283#discussion_r1058789739 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -241,6 +242,20 @@ message Expression { Expression extraction = 2; }

[GitHub] [spark] cloud-fan closed pull request #39269: [SPARK-41631][FOLLOWUP][SQL] Fix two issues in implicit lateral column alias resolution on Aggregate

2022-12-28 Thread GitBox
cloud-fan closed pull request #39269: [SPARK-41631][FOLLOWUP][SQL] Fix two issues in implicit lateral column alias resolution on Aggregate URL: https://github.com/apache/spark/pull/39269

[GitHub] [spark] cloud-fan commented on pull request #39269: [SPARK-41631][FOLLOWUP][SQL] Fix two issues in implicit lateral column alias resolution on Aggregate

2022-12-28 Thread GitBox
cloud-fan commented on PR #39269: URL: https://github.com/apache/spark/pull/39269#issuecomment-1367129829 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on pull request #39269: [SPARK-41631][FOLLOWUP][SQL] Fix two issues in implicit lateral column alias resolution on Aggregate

2022-12-28 Thread GitBox
cloud-fan commented on PR #39269: URL: https://github.com/apache/spark/pull/39269#issuecomment-1367129489 The failure is unrelated: `python/pyspark/sql/connect/client.py:25: error: Skipping analyzing "grpc_status": module is installed, but missing library stubs or py.typed marker`

[GitHub] [spark] cloud-fan commented on a diff in pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39239: URL: https://github.com/apache/spark/pull/39239#discussion_r1058787040 ## python/pyspark/pandas/tests/test_resample.py: ## @@ -263,7 +263,7 @@ def test_dataframe_resample(self): def test_series_resample(self): self._test_re

[GitHub] [spark] cloud-fan commented on a diff in pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39239: URL: https://github.com/apache/spark/pull/39239#discussion_r1058786637 ## python/pyspark/sql/types.py: ## @@ -276,7 +276,18 @@ def toInternal(self, dt: datetime.datetime) -> int: def fromInternal(self, ts: int) -> datetime.datetime:
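As background for the `fromInternal` discussion above, here is a minimal sketch of turning an internal timestamp into a timezone-aware `datetime` pinned to UTC. It assumes the internal value is an integer count of microseconds since the Unix epoch; the helper name `from_internal_utc` is hypothetical and this is not the code proposed in the PR.

```python
import datetime


def from_internal_utc(ts: int) -> datetime.datetime:
    """Convert microseconds since the Unix epoch to an aware UTC datetime.

    Hypothetical helper for illustration only (not pyspark's implementation).
    """
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    # timedelta arithmetic avoids any dependence on the local time zone
    return epoch + datetime.timedelta(microseconds=ts)
```

Because the result carries an explicit `tzinfo`, comparisons in tests should not depend on the time zone of the machine running them.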

[GitHub] [spark] zhengruifeng commented on pull request #39283: [SPARK-41767][CONNECT][PYTHON] Implement `Column.{withField, dropFields}`

2022-12-28 Thread GitBox
zhengruifeng commented on PR #39283: URL: https://github.com/apache/spark/pull/39283#issuecomment-1367128645 cc @HyukjinKwon @cloud-fan @grundprinzip

[GitHub] [spark] mridulm commented on pull request #39275: [SPARK-41759][CORE] Use `weakIntern` on string values in create new objects during deserialization

2022-12-28 Thread GitBox
mridulm commented on PR #39275: URL: https://github.com/apache/spark/pull/39275#issuecomment-1367126643 Primarily use weakIntern for cases where there are a large number of duplicated strings with the same value (so app start won't qualify), and only for the most common values, not for others.
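The idea behind interning heavily duplicated strings can be sketched as follows. This is only an analogy: it uses Python's `sys.intern` rather than Spark's `weakIntern` utility, and the helper name `intern_status` and the sample value are assumptions made for illustration.

```python
import sys


def intern_status(raw: str) -> str:
    """Return a canonical shared copy of a frequently repeated string.

    Hypothetical helper; an analogy for interning during deserialization.
    """
    return sys.intern(raw)


# Two equal strings built at runtime, as a deserializer might produce them.
a = "".join(["RUN", "NING"])
b = "".join(["RUN", "NING"])

# After interning, both names refer to one shared object.
shared_a = intern_status(a)
shared_b = intern_status(b)
```

The saving only matters when the same value recurs many times, which matches the comment's point that rarely repeated values are not worth interning.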

[GitHub] [spark] zhengruifeng opened a new pull request, #39283: [SPARK-41767][CONNECT][PYTHON] Implement `Column.{withField, dropFields}`

2022-12-28 Thread GitBox
zhengruifeng opened a new pull request, #39283: URL: https://github.com/apache/spark/pull/39283 ### What changes were proposed in this pull request? Implement `Column.{withField, dropFields}` ### Why are the changes needed? For API coverage ### Does this PR introdu

[GitHub] [spark] LuciferYang commented on pull request #39255: [DON'T MERGE][BUILD] Switch default protobuf-java version to 3.x

2022-12-28 Thread GitBox
LuciferYang commented on PR #39255: URL: https://github.com/apache/spark/pull/39255#issuecomment-1367117093 @bjornjorgensen The PR is still being tested. I find it strange that the yarn module can pass the test with `-Phadoop-2`.

[GitHub] [spark] viirya commented on a diff in pull request #39248: [SPARK-41049][SQL] Revisit stateful expression handling

2022-12-28 Thread GitBox
viirya commented on code in PR #39248: URL: https://github.com/apache/spark/pull/39248#discussion_r1058775725 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala: ## @@ -127,6 +125,54 @@ abstract class Expression extends TreeNode[Expression]

[GitHub] [spark] LuciferYang commented on pull request #39215: [SPARK-41709][CORE][SQL][UI] Explicitly define `Seq` as `collection.Seq` to avoid `toSeq` when create ui objects from protobuf objects fo

2022-12-28 Thread GitBox
LuciferYang commented on PR #39215: URL: https://github.com/apache/spark/pull/39215#issuecomment-1367116092 Thanks @srowen

[GitHub] [spark] LuciferYang commented on pull request #39265: [SPARK-41750][BUILD] Upgrade `dev.ludovic.netlib` to 3.0.3

2022-12-28 Thread GitBox
LuciferYang commented on PR #39265: URL: https://github.com/apache/spark/pull/39265#issuecomment-1367116153 Thanks @dongjoon-hyun

[GitHub] [spark] cloud-fan commented on a diff in pull request #39248: [SPARK-41049][SQL] Revisit stateful expression handling

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39248: URL: https://github.com/apache/spark/pull/39248#discussion_r1058768775 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala: ## @@ -127,6 +125,54 @@ abstract class Expression extends TreeNode[Expressi

[GitHub] [spark] cloud-fan commented on a diff in pull request #39248: [SPARK-41049][SQL] Revisit stateful expression handling

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39248: URL: https://github.com/apache/spark/pull/39248#discussion_r1058768172 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionsEvaluator.scala: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Found

[GitHub] [spark] cloud-fan commented on a diff in pull request #39248: [SPARK-41049][SQL] Revisit stateful expression handling

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39248: URL: https://github.com/apache/spark/pull/39248#discussion_r1058767972 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala: ## @@ -127,6 +125,54 @@ abstract class Expression extends TreeNode[Expressi

[GitHub] [spark] cloud-fan commented on a diff in pull request #39248: [SPARK-41049][SQL] Revisit stateful expression handling

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39248: URL: https://github.com/apache/spark/pull/39248#discussion_r1058767012 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionsEvaluator.scala: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Found

[GitHub] [spark] cloud-fan commented on a diff in pull request #39248: [SPARK-41049][SQL] Revisit stateful expression handling

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39248: URL: https://github.com/apache/spark/pull/39248#discussion_r1058766773 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionsEvaluator.scala: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Found

[GitHub] [spark] itholic opened a new pull request, #39282: [SPARK-41581][SQL] Assign name to _LEGACY_ERROR_TEMP_1230

2022-12-28 Thread GitBox
itholic opened a new pull request, #39282: URL: https://github.com/apache/spark/pull/39282 ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_1230, "NEGATIVE_SCALE_NOT_ALLOWED". ### Why are the changes needed?

[GitHub] [spark] LuciferYang commented on a diff in pull request #39270: [SPARK-41754][UI] Add simple developer guides for UI Protobuf serializer

2022-12-28 Thread GitBox
LuciferYang commented on code in PR #39270: URL: https://github.com/apache/spark/pull/39270#discussion_r1058762420 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -18,8 +18,15 @@ syntax = "proto3"; package org.apache.spark.status.protobuf;

[GitHub] [spark] LuciferYang commented on a diff in pull request #39270: [SPARK-41754][UI] Add simple developer guides for UI Protobuf serializer

2022-12-28 Thread GitBox
LuciferYang commented on code in PR #39270: URL: https://github.com/apache/spark/pull/39270#discussion_r1058761067 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -18,8 +18,15 @@ syntax = "proto3"; package org.apache.spark.status.protobuf;

[GitHub] [spark] viirya commented on a diff in pull request #39248: [SPARK-41049][SQL] Revisit stateful expression handling

2022-12-28 Thread GitBox
viirya commented on code in PR #39248: URL: https://github.com/apache/spark/pull/39248#discussion_r1058759293 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala: ## @@ -127,6 +125,54 @@ abstract class Expression extends TreeNode[Expression]

[GitHub] [spark] LuciferYang commented on a diff in pull request #39226: [SPARK-41694][CORE] Add new config to clean up `spark.ui.store.path` directory when `SparkContext.stop()`

2022-12-28 Thread GitBox
LuciferYang commented on code in PR #39226: URL: https://github.com/apache/spark/pull/39226#discussion_r1058754762 ## core/src/main/scala/org/apache/spark/status/AppStatusStore.scala: ## @@ -733,6 +734,15 @@ private[spark] class AppStatusStore( def close(): Unit = { st

[GitHub] [spark] viirya commented on a diff in pull request #39248: [SPARK-41049][SQL] Revisit stateful expression handling

2022-12-28 Thread GitBox
viirya commented on code in PR #39248: URL: https://github.com/apache/spark/pull/39248#discussion_r1058750103 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala: ## @@ -117,10 +111,6 @@ object InterpretedMutableProjection

[GitHub] [spark] viirya commented on a diff in pull request #39248: [SPARK-41049][SQL] Revisit stateful expression handling

2022-12-28 Thread GitBox
viirya commented on code in PR #39248: URL: https://github.com/apache/spark/pull/39248#discussion_r1058750004 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionsEvaluator.scala: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [spark] viirya commented on a diff in pull request #39248: [SPARK-41049][SQL] Revisit stateful expression handling

2022-12-28 Thread GitBox
viirya commented on code in PR #39248: URL: https://github.com/apache/spark/pull/39248#discussion_r1058748428 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionsEvaluator.scala: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [spark] itholic opened a new pull request, #39281: [SPARK-41576][SQL] Assign name to _LEGACY_ERROR_TEMP_2051

2022-12-28 Thread GitBox
itholic opened a new pull request, #39281: URL: https://github.com/apache/spark/pull/39281 ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_2051, "DATA_SOURCE_NOT_FOUND". ### Why are the changes needed?

[GitHub] [spark] itholic commented on a diff in pull request #39258: [SPARK-41572][SQL] Assign name to _LEGACY_ERROR_TEMP_2149

2022-12-28 Thread GitBox
itholic commented on code in PR #39258: URL: https://github.com/apache/spark/pull/39258#discussion_r1058741568 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala: ## @@ -370,8 +370,11 @@ abstract class CSVSuite .load(testFile(cars

[GitHub] [spark] itholic commented on a diff in pull request #39258: [SPARK-41572][SQL] Assign name to _LEGACY_ERROR_TEMP_2149

2022-12-28 Thread GitBox
itholic commented on code in PR #39258: URL: https://github.com/apache/spark/pull/39258#discussion_r1058741274 ## core/src/main/resources/error/error-classes.json: ## @@ -851,6 +851,11 @@ "Cannot name the managed table as , as its associated location already exists. Ple

[GitHub] [spark] gengliangwang commented on a diff in pull request #39192: [SPARK-41423][CORE] Protobuf serializer for StageDataWrapper

2022-12-28 Thread GitBox
gengliangwang commented on code in PR #39192: URL: https://github.com/apache/spark/pull/39192#discussion_r1058739922 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -390,3 +390,214 @@ message SQLExecutionUIData { repeated int64 stages = 11;

[GitHub] [spark] gengliangwang commented on pull request #39270: [SPARK-41754][UI] Add simple developer guides for UI Protobuf serializer

2022-12-28 Thread GitBox
gengliangwang commented on PR #39270: URL: https://github.com/apache/spark/pull/39270#issuecomment-1367086148 This is for the issue in https://github.com/apache/spark/pull/39192#discussion_r1058002256 cc @LuciferYang @panbingkun

[GitHub] [spark] gengliangwang closed pull request #39202: [SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server

2022-12-28 Thread GitBox
gengliangwang closed pull request #39202: [SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server URL: https://github.com/apache/spark/pull/39202

[GitHub] [spark] gengliangwang commented on pull request #39202: [SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server

2022-12-28 Thread GitBox
gengliangwang commented on PR #39202: URL: https://github.com/apache/spark/pull/39202#issuecomment-1367085611 @techaddict @mridulm @LuciferYang @cloud-fan thanks for the review. Merging to master

[GitHub] [spark] warrenzhu25 commented on pull request #38852: [SPARK-41341][CORE] Wait shuffle fetch to finish when decommission executor

2022-12-28 Thread GitBox
warrenzhu25 commented on PR #38852: URL: https://github.com/apache/spark/pull/38852#issuecomment-1367081140 @holdenk @dongjoon-hyun @Ngone51 Help take a look?

[GitHub] [spark] warrenzhu25 commented on pull request #39280: [SPARK-41766][CORE] Handle decommission request sent before executor registration

2022-12-28 Thread GitBox
warrenzhu25 commented on PR #39280: URL: https://github.com/apache/spark/pull/39280#issuecomment-1367081016 @dongjoon-hyun @mridulm @Ngone51 Help take a look?

[GitHub] [spark] warrenzhu25 opened a new pull request, #39280: [SPARK-41766][CORE] Handle decommission request sent before executor registration

2022-12-28 Thread GitBox
warrenzhu25 opened a new pull request, #39280: URL: https://github.com/apache/spark/pull/39280 ### What changes were proposed in this pull request? Handle decommission request sent before executor registration ### Why are the changes needed? Current behavior is such requests will

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39236: [SPARK-41068][CONNECT][PYTHON] Implement `DataFrame.stat.corr`

2022-12-28 Thread GitBox
zhengruifeng commented on code in PR #39236: URL: https://github.com/apache/spark/pull/39236#discussion_r1058734162 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -352,6 +353,16 @@ class SparkConnectPlanner(sessio

[GitHub] [spark] itholic opened a new pull request, #39279: [SPARK-41578][SQL] Assign name to _LEGACY_ERROR_TEMP_2141

2022-12-28 Thread GitBox
itholic opened a new pull request, #39279: URL: https://github.com/apache/spark/pull/39279 ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_2141, "ENCODER_NOT_FOUND". ### Why are the changes needed?

[GitHub] [spark] beliefer commented on pull request #39262: [SPARK-41069][CONNECT][PYTHON] Implement `DataFrame.approxQuantile` and `DataFrame.stat.approxQuantile`

2022-12-28 Thread GitBox
beliefer commented on PR #39262: URL: https://github.com/apache/spark/pull/39262#issuecomment-1367067211 ping @HyukjinKwon @zhengruifeng @grundprinzip @amaliujia

[GitHub] [spark] beliefer commented on a diff in pull request #39236: [SPARK-41068][CONNECT][PYTHON] Implement `DataFrame.stat.corr`

2022-12-28 Thread GitBox
beliefer commented on code in PR #39236: URL: https://github.com/apache/spark/pull/39236#discussion_r1058721777 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -352,6 +353,16 @@ class SparkConnectPlanner(session: S

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39236: [SPARK-41068][CONNECT][PYTHON] Implement `DataFrame.stat.corr`

2022-12-28 Thread GitBox
zhengruifeng commented on code in PR #39236: URL: https://github.com/apache/spark/pull/39236#discussion_r1058720184 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -352,6 +353,16 @@ class SparkConnectPlanner(sessio

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39236: [SPARK-41068][CONNECT][PYTHON] Implement `DataFrame.stat.corr`

2022-12-28 Thread GitBox
zhengruifeng commented on code in PR #39236: URL: https://github.com/apache/spark/pull/39236#discussion_r1058719947 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -352,6 +353,16 @@ class SparkConnectPlanner(sessio

[GitHub] [spark] anchovYu commented on a diff in pull request #39269: [SPARK-41631][FOLLOWUP][SQL] Fix two issues in implicit lateral column alias resolution on Aggregate

2022-12-28 Thread GitBox
anchovYu commented on code in PR #39269: URL: https://github.com/apache/spark/pull/39269#discussion_r1058719511 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAliasReference.scala: ## @@ -168,19 +168,18 @@ object ResolveLateralColumnAli

[GitHub] [spark] cloud-fan commented on a diff in pull request #39269: [SPARK-41631][FOLLOWUP][SQL] Fix two issues in implicit lateral column alias resolution on Aggregate

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39269: URL: https://github.com/apache/spark/pull/39269#discussion_r1058718417 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAliasReference.scala: ## @@ -168,19 +168,18 @@ object ResolveLateralColumnAl

[GitHub] [spark] cloud-fan commented on a diff in pull request #39269: [SPARK-41631][FOLLOWUP][SQL] Fix two issues in implicit lateral column alias resolution on Aggregate

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39269: URL: https://github.com/apache/spark/pull/39269#discussion_r1058718270 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAliasReference.scala: ## @@ -168,19 +168,18 @@ object ResolveLateralColumnAl

[GitHub] [spark] zhengruifeng commented on pull request #39272: [SPARK-41751][CONNECT][PYTHON] Fix `Column.{bitwiseAND, bitwiseOR, bitwiseXOR}`

2022-12-28 Thread GitBox
zhengruifeng commented on PR #39272: URL: https://github.com/apache/spark/pull/39272#issuecomment-1367059119 merged into master

[GitHub] [spark] zhengruifeng closed pull request #39272: [SPARK-41751][CONNECT][PYTHON] Fix `Column.{bitwiseAND, bitwiseOR, bitwiseXOR}`

2022-12-28 Thread GitBox
zhengruifeng closed pull request #39272: [SPARK-41751][CONNECT][PYTHON] Fix `Column.{bitwiseAND, bitwiseOR, bitwiseXOR}` URL: https://github.com/apache/spark/pull/39272

[GitHub] [spark] cloud-fan commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

2022-12-28 Thread GitBox
cloud-fan commented on PR #39268: URL: https://github.com/apache/spark/pull/39268#issuecomment-1367058299 also cc @ulysses-you

[GitHub] [spark] anchovYu commented on a diff in pull request #39269: [SPARK-41631][FOLLOWUP][SQL] Fix two issues in implicit lateral column alias resolution on Aggregate

2022-12-28 Thread GitBox
anchovYu commented on code in PR #39269: URL: https://github.com/apache/spark/pull/39269#discussion_r1058716520 ## sql/core/src/test/scala/org/apache/spark/sql/LateralColumnAliasSuite.scala: ## @@ -547,7 +547,8 @@ class LateralColumnAliasSuite extends LateralColumnAliasSuiteBas

[GitHub] [spark] cloud-fan commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39268: URL: https://github.com/apache/spark/pull/39268#discussion_r1058717124 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala: ## @@ -26,40 +26,65 @@ import scala.xml.{Node, NodeSeq} import org.apache.spa

[GitHub] [spark] cloud-fan commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39268: URL: https://github.com/apache/spark/pull/39268#discussion_r1058716626 ## core/src/main/scala/org/apache/spark/internal/config/UI.scala: ## @@ -229,4 +229,11 @@ private[spark] object UI { .stringConf .transform(_.toUpperCase(Lo

[GitHub] [spark] zhengruifeng opened a new pull request, #39278: [SPARK-41764][CONNECT][PYTHON] Make the internal string op name consistent with FunctionRegistry

2022-12-28 Thread GitBox
zhengruifeng opened a new pull request, #39278: URL: https://github.com/apache/spark/pull/39278 ### What changes were proposed in this pull request? 1, Make the internal string op names `startswith`, `endswith` consistent with FunctionRegistry 2, add test for string ops ### Why

[GitHub] [spark] cloud-fan commented on a diff in pull request #39269: [SPARK-41631][FOLLOWUP][SQL] Fix two issues in implicit lateral column alias resolution on Aggregate

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39269: URL: https://github.com/apache/spark/pull/39269#discussion_r1058715468 ## sql/core/src/test/scala/org/apache/spark/sql/LateralColumnAliasSuite.scala: ## @@ -547,7 +547,8 @@ class LateralColumnAliasSuite extends LateralColumnAliasSuiteBa

[GitHub] [spark] anchovYu commented on a diff in pull request #39269: [WIP][SPARK-41631][FOLLOWUP][SQL] Fix two issues in implicit lateral column alias resolution on Aggregate

2022-12-28 Thread GitBox
anchovYu commented on code in PR #39269: URL: https://github.com/apache/spark/pull/39269#discussion_r1058714323 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAliasReference.scala: ## @@ -168,19 +168,18 @@ object ResolveLateralColumnAli

[GitHub] [spark] anchovYu commented on a diff in pull request #39269: [WIP][SPARK-41631][FOLLOWUP][SQL] Fix two issues in implicit lateral column alias resolution on Aggregate

2022-12-28 Thread GitBox
anchovYu commented on code in PR #39269: URL: https://github.com/apache/spark/pull/39269#discussion_r1058714106 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAliasReference.scala: ## @@ -168,19 +168,18 @@ object ResolveLateralColumnAli

[GitHub] [spark] ulysses-you commented on a diff in pull request #39263: [SPARK-41726][SQL] Remove OptimizedCreateHiveTableAsSelectCommand

2022-12-28 Thread GitBox
ulysses-you commented on code in PR #39263: URL: https://github.com/apache/spark/pull/39263#discussion_r1058714055 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala: ## @@ -232,15 +233,35 @@ case class RelationConversions( if DDLUtils.isHiveTab

[GitHub] [spark] cloud-fan commented on a diff in pull request #39269: [WIP][SPARK-41631][FOLLOWUP][SQL] Fix two issues in implicit lateral column alias resolution on Aggregate

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39269: URL: https://github.com/apache/spark/pull/39269#discussion_r1058713809 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAliasReference.scala: ## @@ -168,19 +168,18 @@ object ResolveLateralColumnAl

[GitHub] [spark] beliefer commented on pull request #39236: [SPARK-41068][CONNECT][PYTHON] Implement `DataFrame.stat.corr`

2022-12-28 Thread GitBox
beliefer commented on PR #39236: URL: https://github.com/apache/spark/pull/39236#issuecomment-1367052882 ping @HyukjinKwon @zhengruifeng @grundprinzip @amaliujia

[GitHub] [spark] ulysses-you opened a new pull request, #39277: [SPARK-41708][SQL] Pull v1write information to write file node

2022-12-28 Thread GitBox
ulysses-you opened a new pull request, #39277: URL: https://github.com/apache/spark/pull/39277 ### What changes were proposed in this pull request? This pr aims to pull out the v1write information from `V1WriteCommand` to `WriteFiles`: ```scala case class WriteFiles(chil

[GitHub] [spark] zhengruifeng opened a new pull request, #39276: [SPARK-41761][CONNECT][PYTHON] Fix arithmetic ops: `__neg__`, `__pow__`, `__rpow__`

2022-12-28 Thread GitBox
zhengruifeng opened a new pull request, #39276: URL: https://github.com/apache/spark/pull/39276 ### What changes were proposed in this pull request? Fix arithmetic ops: `__neg__`, `__pow__`: 1, `__neg__` fix `[UNRESOLVED_ROUTINE] Cannot resolve function `negate` on search path [`s
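For readers unfamiliar with the operators named in this PR, the sketch below shows how Python dispatches `__neg__`, `__pow__`, and `__rpow__`. The `Expr` class is a hypothetical stand-in, not pyspark's `Column`, and the function names `negative` and `power` in the generated text are assumptions chosen for illustration, not taken from the PR.

```python
class Expr:
    """Hypothetical stand-in for a column expression (not pyspark's Column)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __neg__(self) -> "Expr":
        # Unary minus: -col
        return Expr(f"negative({self.text})")

    def __pow__(self, other: object) -> "Expr":
        # col ** other: the column is the base
        return Expr(f"power({self.text}, {_lit(other)})")

    def __rpow__(self, other: object) -> "Expr":
        # other ** col: Python falls back to the right operand,
        # so the column is the exponent and the operand order must be swapped
        return Expr(f"power({_lit(other)}, {self.text})")


def _lit(value: object) -> str:
    """Render a nested expression or a plain Python literal."""
    return value.text if isinstance(value, Expr) else repr(value)


col = Expr("a")
```

The point to note is the operand order in `__rpow__`: `2 ** col` must produce the literal as the base, which is easy to get backwards when the reflected method reuses the forward one.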

[GitHub] [spark] panbingkun opened a new pull request, #39275: [SPARK-41759][CORE] Use `weakIntern` on string values in create new objects during deserialization

2022-12-28 Thread GitBox
panbingkun opened a new pull request, #39275: URL: https://github.com/apache/spark/pull/39275 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was this patch te

[GitHub] [spark] cloud-fan commented on a diff in pull request #39202: [SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39202: URL: https://github.com/apache/spark/pull/39202#discussion_r1058711270 ## core/src/main/scala/org/apache/spark/internal/config/History.scala: ## @@ -79,6 +79,21 @@ private[spark] object History { .stringConf .createOptional +

[GitHub] [spark] cloud-fan commented on a diff in pull request #39263: [SPARK-41726][SQL] Remove OptimizedCreateHiveTableAsSelectCommand

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39263: URL: https://github.com/apache/spark/pull/39263#discussion_r1058706279 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala: ## @@ -232,15 +233,35 @@ case class RelationConversions( if DDLUtils.isHiveTable

[GitHub] [spark] dengziming opened a new pull request, #39274: [SPARK-41760][BUILD][CONNECT] Enforce scalafmt for Connect Client module

2022-12-28 Thread GitBox
dengziming opened a new pull request, #39274: URL: https://github.com/apache/spark/pull/39274 ### What changes were proposed in this pull request? 1. This change enables enforcing `scalafmt` for the Connect client module since it's a new module. 2. This change applies `scalafmt` o

[GitHub] [spark] mridulm commented on pull request #39202: [SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server

2022-12-28 Thread GitBox
mridulm commented on PR #39202: URL: https://github.com/apache/spark/pull/39202#issuecomment-1367044296 +CC @thejdeep, @shardulm94

[GitHub] [spark] mridulm commented on a diff in pull request #39226: [SPARK-41694][CORE] Add new config to clean up `spark.ui.store.path` directory when `SparkContext.stop()`

2022-12-28 Thread GitBox
mridulm commented on code in PR #39226: URL: https://github.com/apache/spark/pull/39226#discussion_r1058705480 ## core/src/main/scala/org/apache/spark/status/AppStatusStore.scala: ## @@ -733,6 +734,15 @@ private[spark] class AppStatusStore( def close(): Unit = { store.

[GitHub] [spark] zhengruifeng commented on pull request #39273: [SPARK-41751][CONNECT][PYTHON] Fix `Column.{isNull, isNotNull, eqNullSafe}`

2022-12-28 Thread GitBox
zhengruifeng commented on PR #39273: URL: https://github.com/apache/spark/pull/39273#issuecomment-1367042645 @HyukjinKwon

[GitHub] [spark] zhengruifeng opened a new pull request, #39273: [SPARK-41751][CONNECT][PYTHON] Fix `Column.{isNull, isNotNull, eqNullSafe}`

2022-12-28 Thread GitBox
zhengruifeng opened a new pull request, #39273: URL: https://github.com/apache/spark/pull/39273 ### What changes were proposed in this pull request? Fix `Column.{isNull, isNotNull, eqNullSafe}` ### Why are the changes needed? they were wrongly implemented ### Does

[GitHub] [spark] ulysses-you commented on a diff in pull request #39263: [SPARK-41726][SQL] Remove OptimizedCreateHiveTableAsSelectCommand

2022-12-28 Thread GitBox
ulysses-you commented on code in PR #39263: URL: https://github.com/apache/spark/pull/39263#discussion_r1058703839 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala: ## @@ -232,15 +233,35 @@ case class RelationConversions( if DDLUtils.isHiveTab

[GitHub] [spark] cloud-fan commented on a diff in pull request #39099: [SPARK-41554] fix changing of Decimal scale when scale decreased by m…

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39099: URL: https://github.com/apache/spark/pull/39099#discussion_r1058702415 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala: ## @@ -374,7 +374,7 @@ final class Decimal extends Ordered[Decimal] with Serializable {

[GitHub] [spark] cloud-fan commented on a diff in pull request #39099: [SPARK-41554] fix changing of Decimal scale when scale decreased by m…

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39099: URL: https://github.com/apache/spark/pull/39099#discussion_r1058702308 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala: ## @@ -374,7 +374,7 @@ final class Decimal extends Ordered[Decimal] with Serializable {

[GitHub] [spark] cloud-fan commented on a diff in pull request #39263: [SPARK-41726][SQL] Remove OptimizedCreateHiveTableAsSelectCommand

2022-12-28 Thread GitBox
cloud-fan commented on code in PR #39263: URL: https://github.com/apache/spark/pull/39263#discussion_r1058702056 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala: ## @@ -232,15 +233,35 @@ case class RelationConversions( if DDLUtils.isHiveTable

[GitHub] [spark] mridulm commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-12-28 Thread GitBox
mridulm commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1058701274 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -100,11 +100,21 @@ message TaskDataWrapper { int64 shuffle_remote_bytes_read_to_d

[GitHub] [spark] dengziming commented on pull request #39158: [SPARK-41354][CONNECT] Add `RepartitionByExpression` to proto

2022-12-28 Thread GitBox
dengziming commented on PR #39158: URL: https://github.com/apache/spark/pull/39158#issuecomment-1367036120 > adding the repartitionBy* APIs in Client ? Do you mean adding them to python client? yes, I'm working on it. -- This is an automated message from the Apache Git Service. To r

[GitHub] [spark] zhengruifeng commented on pull request #39272: [SPARK-41751][CONNECT][PYTHON] Implement `Column.{bitwiseAND, bitwiseOR, bitwiseXOR}`

2022-12-28 Thread GitBox
zhengruifeng commented on PR #39272: URL: https://github.com/apache/spark/pull/39272#issuecomment-1367036019 cc @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] zhengruifeng opened a new pull request, #39272: [SPARK-41751][CONNECT][PYTHON] Implement `Column.{bitwiseAND, bitwiseOR, bitwiseXOR}`

2022-12-28 Thread GitBox
zhengruifeng opened a new pull request, #39272: URL: https://github.com/apache/spark/pull/39272 ### What changes were proposed in this pull request? Implement `Column.{bitwiseAND, bitwiseOR, bitwiseXOR}` ### Why are the changes needed? fix ### Does this PR introdu

[GitHub] [spark] cloud-fan closed pull request #39266: [SPARK-41753][SQL][TEST] Add tests for ArrayZip to check the result size and nullability

2022-12-28 Thread GitBox
cloud-fan closed pull request #39266: [SPARK-41753][SQL][TEST] Add tests for ArrayZip to check the result size and nullability URL: https://github.com/apache/spark/pull/39266 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

[GitHub] [spark] cloud-fan commented on pull request #39266: [SPARK-41753][SQL][TEST] Add tests for ArrayZip to check the result size and nullability

2022-12-28 Thread GitBox
cloud-fan commented on PR #39266: URL: https://github.com/apache/spark/pull/39266#issuecomment-1367035659 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

[GitHub] [spark] techaddict commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

2022-12-28 Thread GitBox
techaddict commented on code in PR #39249: URL: https://github.com/apache/spark/pull/39249#discussion_r1058694818 ## python/pyspark/sql/connect/column.py: ## @@ -390,3 +391,61 @@ def __nonzero__(self) -> None: Column.__doc__ = PySparkColumn.__doc__ + + +def _test() -> None:

[GitHub] [spark] techaddict commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

2022-12-28 Thread GitBox
techaddict commented on code in PR #39249: URL: https://github.com/apache/spark/pull/39249#discussion_r1058693270 ## python/pyspark/sql/connect/column.py: ## @@ -390,3 +391,61 @@ def __nonzero__(self) -> None: Column.__doc__ = PySparkColumn.__doc__ + + +def _test() -> None:

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

2022-12-28 Thread GitBox
HyukjinKwon commented on code in PR #39249: URL: https://github.com/apache/spark/pull/39249#discussion_r1058690914 ## python/pyspark/sql/connect/column.py: ## @@ -390,3 +391,61 @@ def __nonzero__(self) -> None: Column.__doc__ = PySparkColumn.__doc__ + + +def _test() -> None

[GitHub] [spark] HyukjinKwon closed pull request #39271: [SPARK-41747][SPARK-41744][SPARK-41748][SPARK-41749][CONNECT][TESTS] Reeanble tests for multiple arguments in max, min, sum and avg in groupby

2022-12-28 Thread GitBox
HyukjinKwon closed pull request #39271: [SPARK-41747][SPARK-41744][SPARK-41748][SPARK-41749][CONNECT][TESTS] Reeanble tests for multiple arguments in max, min, sum and avg in groupby URL: https://github.com/apache/spark/pull/39271 -- This is an automated message from the Apache Git Service.

[GitHub] [spark] HyukjinKwon commented on pull request #39271: [SPARK-41747][SPARK-41744][SPARK-41748][SPARK-41749][CONNECT][TESTS] Reeanble tests for multiple arguments in max, min, sum and avg in gr

2022-12-28 Thread GitBox
HyukjinKwon commented on PR #39271: URL: https://github.com/apache/spark/pull/39271#issuecomment-1367024253 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

2022-12-28 Thread GitBox
HyukjinKwon commented on code in PR #39249: URL: https://github.com/apache/spark/pull/39249#discussion_r1058687939 ## python/pyspark/sql/connect/column.py: ## @@ -390,3 +391,61 @@ def __nonzero__(self) -> None: Column.__doc__ = PySparkColumn.__doc__ + + +def _test() -> None

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

2022-12-28 Thread GitBox
HyukjinKwon commented on code in PR #39249: URL: https://github.com/apache/spark/pull/39249#discussion_r1058687670 ## python/pyspark/sql/column.py: ## @@ -200,17 +200,17 @@ class Column: ... [(2, "Alice"), (5, "Bob")], ["age", "name"]) Select a column out of a D

[GitHub] [spark] techaddict commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

2022-12-28 Thread GitBox
techaddict commented on code in PR #39249: URL: https://github.com/apache/spark/pull/39249#discussion_r1058687601 ## python/pyspark/sql/connect/column.py: ## @@ -390,3 +391,61 @@ def __nonzero__(self) -> None: Column.__doc__ = PySparkColumn.__doc__ + + +def _test() -> None:

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

2022-12-28 Thread GitBox
HyukjinKwon commented on code in PR #39249: URL: https://github.com/apache/spark/pull/39249#discussion_r1058687254 ## python/pyspark/sql/connect/column.py: ## @@ -390,3 +391,61 @@ def __nonzero__(self) -> None: Column.__doc__ = PySparkColumn.__doc__ + + +def _test() -> None

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

2022-12-28 Thread GitBox
HyukjinKwon commented on code in PR #39249: URL: https://github.com/apache/spark/pull/39249#discussion_r1058686957 ## python/pyspark/sql/connect/column.py: ## @@ -388,5 +389,62 @@ def __nonzero__(self) -> None: __bool__ = __nonzero__ Review Comment: ```suggestion

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

2022-12-28 Thread GitBox
HyukjinKwon commented on code in PR #39249: URL: https://github.com/apache/spark/pull/39249#discussion_r1058686880 ## python/pyspark/sql/column.py: ## @@ -1258,8 +1258,7 @@ def over(self, window: "WindowSpec") -> "Column": >>> from pyspark.sql import Window >>>

[GitHub] [spark] HyukjinKwon commented on pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

2022-12-28 Thread GitBox
HyukjinKwon commented on PR #39249: URL: https://github.com/apache/spark/pull/39249#issuecomment-1367019627 Thanks for working on this @techaddict -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2022-12-28 Thread GitBox
HyukjinKwon commented on code in PR #39239: URL: https://github.com/apache/spark/pull/39239#discussion_r1058683926 ## python/pyspark/sql/types.py: ## @@ -276,7 +276,15 @@ def toInternal(self, dt: datetime.datetime) -> int: def fromInternal(self, ts: int) -> datetime.datetim

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

2022-12-28 Thread GitBox
HyukjinKwon commented on code in PR #39249: URL: https://github.com/apache/spark/pull/39249#discussion_r1058682621 ## python/pyspark/sql/connect/column.py: ## @@ -390,3 +391,61 @@ def __nonzero__(self) -> None: Column.__doc__ = PySparkColumn.__doc__ + + +def _test() -> None

[GitHub] [spark] HyukjinKwon commented on pull request #39271: [SPARK-41747][SPARK-41744][SPARK-41748][SPARK-41749][CONNECT][TESTS] Reeanble tests for multiple arguments in max, min, sum and avg in gr

2022-12-28 Thread GitBox
HyukjinKwon commented on PR #39271: URL: https://github.com/apache/spark/pull/39271#issuecomment-1367016580 cc @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] techaddict commented on a diff in pull request #39110: [SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper

2022-12-28 Thread GitBox
techaddict commented on code in PR #39110: URL: https://github.com/apache/spark/pull/39110#discussion_r1058682080 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -390,3 +390,38 @@ message SQLExecutionUIData { repeated int64 stages = 11;

[GitHub] [spark] techaddict commented on pull request #39110: [SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper

2022-12-28 Thread GitBox
techaddict commented on PR #39110: URL: https://github.com/apache/spark/pull/39110#issuecomment-1367016502 @gengliangwang updated the PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif
