[GitHub] [spark] HyukjinKwon closed pull request #39390: [SPARK-41840][CONNECT][PYTHON] Add the missing alias `groupby`

2023-01-04 Thread GitBox
HyukjinKwon closed pull request #39390: [SPARK-41840][CONNECT][PYTHON] Add the missing alias `groupby` URL: https://github.com/apache/spark/pull/39390 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] HyukjinKwon closed pull request #39392: [SPARK-41846][CONNECT][PYTHON] Enable doctests for window functions

2023-01-04 Thread GitBox
HyukjinKwon closed pull request #39392: [SPARK-41846][CONNECT][PYTHON] Enable doctests for window functions URL: https://github.com/apache/spark/pull/39392

[GitHub] [spark] HyukjinKwon commented on pull request #39392: [SPARK-41846][CONNECT][PYTHON] Enable doctests for window functions

2023-01-04 Thread GitBox
HyukjinKwon commented on PR #39392: URL: https://github.com/apache/spark/pull/39392#issuecomment-1371554769 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #39390: [SPARK-41840][CONNECT][PYTHON] Add the missing alias `groupby`

2023-01-04 Thread GitBox
HyukjinKwon commented on PR #39390: URL: https://github.com/apache/spark/pull/39390#issuecomment-1371554309 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39393: [SPARK-41871][CONNECT] DataFrame hint parameter can be str, list, float or int

2023-01-04 Thread GitBox
HyukjinKwon commented on code in PR #39393: URL: https://github.com/apache/spark/pull/39393#discussion_r1061976193 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -553,13 +553,17 @@ def test_generic_hints(self): def test_extended_hint_types(self): df =

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
lu-wang-dl commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061973010 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] akpatnam25 commented on pull request #38959: SPARK-41415: SASL Request Retries

2023-01-04 Thread GitBox
akpatnam25 commented on PR #38959: URL: https://github.com/apache/spark/pull/38959#issuecomment-1371548903 @mridulm updated the PR to not have protocol/server side changes. In this case, we are creating a new connection every time the SASL retry is triggered. Confirmed that this is the

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
lu-wang-dl commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061968170 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] rithwik-db commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
rithwik-db commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061943616 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] rithwik-db commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
rithwik-db commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061941062 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] rithwik-db commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
rithwik-db commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061940175 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

2023-01-04 Thread GitBox
dongjoon-hyun commented on code in PR #39268: URL: https://github.com/apache/spark/pull/39268#discussion_r1061939747 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -140,6 +166,7 @@ object SQLExecution { } finally {

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2023-01-04 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1061936672 ## python/pyspark/ml/functions.py: ## @@ -106,6 +138,605 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

2023-01-04 Thread GitBox
dongjoon-hyun commented on code in PR #39268: URL: https://github.com/apache/spark/pull/39268#discussion_r1061935560 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -55,6 +56,28 @@ object SQLExecution { } } + /** + * Track the

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

2023-01-04 Thread GitBox
dongjoon-hyun commented on code in PR #39268: URL: https://github.com/apache/spark/pull/39268#discussion_r1061933315 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -55,6 +56,28 @@ object SQLExecution { } } + /** + * Track the

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

2023-01-04 Thread GitBox
dongjoon-hyun commented on code in PR #39268: URL: https://github.com/apache/spark/pull/39268#discussion_r1061928098 ## core/src/main/scala/org/apache/spark/internal/config/UI.scala: ## @@ -229,4 +229,11 @@ private[spark] object UI { .stringConf

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

2023-01-04 Thread GitBox
dongjoon-hyun commented on code in PR #39268: URL: https://github.com/apache/spark/pull/39268#discussion_r1061927117 ## core/src/main/resources/org/apache/spark/ui/static/webui.css: ## @@ -187,6 +187,18 @@ pre { display: none; } +.sub-execution-list { + font-size:0.9rem;

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2023-01-04 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1061926376 ## python/pyspark/ml/model_cache.py: ## @@ -0,0 +1,44 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements.

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2023-01-04 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1061926045 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,474 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2023-01-04 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1061925652 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,474 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] gengliangwang closed pull request #39357: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for StreamingQueryProgressWrapper

2023-01-04 Thread GitBox
gengliangwang closed pull request #39357: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for StreamingQueryProgressWrapper URL: https://github.com/apache/spark/pull/39357

[GitHub] [spark] gengliangwang commented on pull request #39357: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for StreamingQueryProgressWrapper

2023-01-04 Thread GitBox
gengliangwang commented on PR #39357: URL: https://github.com/apache/spark/pull/39357#issuecomment-1371482013 Thanks, merging to master

[GitHub] [spark] gengliangwang commented on a diff in pull request #39357: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for StreamingQueryProgressWrapper

2023-01-04 Thread GitBox
gengliangwang commented on code in PR #39357: URL: https://github.com/apache/spark/pull/39357#discussion_r1061918139 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -684,3 +684,54 @@ message ExecutorPeakMetricsDistributions { repeated

[GitHub] [spark] gengliangwang commented on a diff in pull request #39226: [SPARK-41694][CORE] Isolate RocksDB path for Live UI and automatically cleanup when `SparkContext.stop()`

2023-01-04 Thread GitBox
gengliangwang commented on code in PR #39226: URL: https://github.com/apache/spark/pull/39226#discussion_r1061892643 ## core/src/test/scala/org/apache/spark/status/AutoCleanupLiveUIDirSuite.scala: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] gengliangwang closed pull request #39286: [SPARK-41768][CORE] Refactor the definition of enum to follow with the code style

2023-01-04 Thread GitBox
gengliangwang closed pull request #39286: [SPARK-41768][CORE] Refactor the definition of enum to follow with the code style URL: https://github.com/apache/spark/pull/39286

[GitHub] [spark] gengliangwang commented on pull request #39286: [SPARK-41768][CORE] Refactor the definition of enum to follow with the code style

2023-01-04 Thread GitBox
gengliangwang commented on PR #39286: URL: https://github.com/apache/spark/pull/39286#issuecomment-1371426746 Thanks, merging to master

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
lu-wang-dl commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061884522 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
lu-wang-dl commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061884189 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] srowen commented on pull request #39391: [SPARK-41883][BUILD] Upgrade dropwizard metrics 4.2.15

2023-01-04 Thread GitBox
srowen commented on PR #39391: URL: https://github.com/apache/spark/pull/39391#issuecomment-1371405185 Can you try rerunning the tests? they're stuck or something

[GitHub] [spark] gerashegalov commented on a diff in pull request #39383: [SPARK-41780][SQL] Should throw INVALID_PARAMETER_VALUE when the parameters `regexp` in regexp_replace is invalid

2023-01-04 Thread GitBox
gerashegalov commented on code in PR #39383: URL: https://github.com/apache/spark/pull/39383#discussion_r1061843543 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -634,7 +634,12 @@ case class RegExpReplace(subject:

[GitHub] [spark] srowen commented on a diff in pull request #39286: [SPARK-41768][CORE] Refactor the definition of enum to follow with the code style

2023-01-04 Thread GitBox
srowen commented on code in PR #39286: URL: https://github.com/apache/spark/pull/39286#discussion_r1061854567 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -27,10 +27,10 @@ package org.apache.spark.status.protobuf; enum

[GitHub] [spark] gengliangwang commented on a diff in pull request #39286: [SPARK-41768][CORE] Refactor the definition of enum to follow with the code style

2023-01-04 Thread GitBox
gengliangwang commented on code in PR #39286: URL: https://github.com/apache/spark/pull/39286#discussion_r1061842276 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -27,10 +27,10 @@ package org.apache.spark.status.protobuf; enum

[GitHub] [spark] gengliangwang commented on a diff in pull request #39385: [SPARK-41882][CORE][SQL][UI] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
gengliangwang commented on code in PR #39385: URL: https://github.com/apache/spark/pull/39385#discussion_r1061839322 ## sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala: ## @@ -1007,6 +1004,36 @@ class SQLAppStatusListenerSuite extends

[GitHub] [spark] dongjoon-hyun commented on pull request #39371: [SPARK-41030][BUILD][3.2] Upgrade `Apache Ivy` to 2.5.1

2023-01-04 Thread GitBox
dongjoon-hyun commented on PR #39371: URL: https://github.com/apache/spark/pull/39371#issuecomment-1371346517 Apache Spark community always recommends to use the latest one. In case of SPARK-41030, `v3.3.2` is the fastest release with that.

[GitHub] [spark] dongjoon-hyun commented on pull request #39371: [SPARK-41030][BUILD][3.2] Upgrade `Apache Ivy` to 2.5.1

2023-01-04 Thread GitBox
dongjoon-hyun commented on PR #39371: URL: https://github.com/apache/spark/pull/39371#issuecomment-1371344946 Before `v3.2.4`, - `v3.3.2` will arrive on Feb/March timeframe - `v3.4.0` feature freeze will start on January 16th and RC will start on

[GitHub] [spark] dongjoon-hyun commented on pull request #39371: [SPARK-41030][BUILD][3.2] Upgrade `Apache Ivy` to 2.5.1

2023-01-04 Thread GitBox
dongjoon-hyun commented on PR #39371: URL: https://github.com/apache/spark/pull/39371#issuecomment-1371342486 BTW, `v3.2.4` is expected on April 2023 as EOL release according to the release cadence.

[GitHub] [spark] bjornjorgensen commented on pull request #39371: [SPARK-41030][BUILD][3.2] Upgrade `Apache Ivy` to 2.5.1

2023-01-04 Thread GitBox
bjornjorgensen commented on PR #39371: URL: https://github.com/apache/spark/pull/39371#issuecomment-1371339204 @kyle-ai2 Yes, this PR is a part of the 3.2 branch now.

[GitHub] [spark] kyle-ai2 commented on pull request #39371: [SPARK-41030][BUILD][3.2] Upgrade `Apache Ivy` to 2.5.1

2023-01-04 Thread GitBox
kyle-ai2 commented on PR #39371: URL: https://github.com/apache/spark/pull/39371#issuecomment-1371329812 Thanks everyone. Will this be released in a new Spark 3.2.4 image?

[GitHub] [spark] MaxGekk closed pull request #39284: [SPARK-41573][SQL] Assign name to _LEGACY_ERROR_TEMP_2136

2023-01-04 Thread GitBox
MaxGekk closed pull request #39284: [SPARK-41573][SQL] Assign name to _LEGACY_ERROR_TEMP_2136 URL: https://github.com/apache/spark/pull/39284

[GitHub] [spark] MaxGekk commented on pull request #39284: [SPARK-41573][SQL] Assign name to _LEGACY_ERROR_TEMP_2136

2023-01-04 Thread GitBox
MaxGekk commented on PR #39284: URL: https://github.com/apache/spark/pull/39284#issuecomment-1371324425 +1, LGTM. Merging to master. Thank you, @itholic.

[GitHub] [spark] neshkeev commented on pull request #39350: [MINOR] Fix a typo "from from" -> "from"

2023-01-04 Thread GitBox
neshkeev commented on PR #39350: URL: https://github.com/apache/spark/pull/39350#issuecomment-1371297053 @srowen , thank you. Please tell me when I can safely delete the branch

[GitHub] [spark] MaxGekk commented on a diff in pull request #39282: [SPARK-41581][SQL] Assign name to _LEGACY_ERROR_TEMP_1230

2023-01-04 Thread GitBox
MaxGekk commented on code in PR #39282: URL: https://github.com/apache/spark/pull/39282#discussion_r1061771537 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala: ## @@ -680,6 +681,18 @@ class QueryCompilationErrorsSuite context =

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
lu-wang-dl commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061769537 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] viirya commented on a diff in pull request #39131: [SPARK-41162][SQL] Fix anti- and semi-join for self-join with aggregations

2023-01-04 Thread GitBox
viirya commented on code in PR #39131: URL: https://github.com/apache/spark/pull/39131#discussion_r1061750586 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LeftSemiAntiJoinPushDownSuite.scala: ## @@ -46,7 +46,7 @@ class LeftSemiPushdownSuite extends

[GitHub] [spark] smallzhongfeng commented on pull request #39368: [SPARK-28764][CORE][TEST] Remove writePartitionedFile in ExternalSorter

2023-01-04 Thread GitBox
smallzhongfeng commented on PR #39368: URL: https://github.com/apache/spark/pull/39368#issuecomment-1371229625 cc @mccheah @cloud-fan @HyukjinKwon If I have misunderstood, thank you very much for pointing it out :).

[GitHub] [spark] cloud-fan commented on pull request #39131: [SPARK-41162][SQL] Fix anti- and semi-join for self-join with aggregations

2023-01-04 Thread GitBox
cloud-fan commented on PR #39131: URL: https://github.com/apache/spark/pull/39131#issuecomment-1371227066 @EnricoMi thanks for the fix! which spark version starts to have this bug?

[GitHub] [spark] thejdeep commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2023-01-04 Thread GitBox
thejdeep commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1061723365 ## core/src/main/scala/org/apache/spark/status/storeTypes.scala: ## @@ -233,6 +243,38 @@ private[spark] class TaskDataWrapper( val shuffleLocalBytesRead: Long,

[GitHub] [spark] thejdeep commented on pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2023-01-04 Thread GitBox
thejdeep commented on PR #36165: URL: https://github.com/apache/spark/pull/36165#issuecomment-1371213103 Fixed failing tests and updated commit messages to reflect the overall changes. PTAL. Thanks

[GitHub] [spark] itholic commented on pull request #39388: [SPARK-41354][CONNECT][PYTHON] implement RepartitionByExpression

2023-01-04 Thread GitBox
itholic commented on PR #39388: URL: https://github.com/apache/spark/pull/39388#issuecomment-1371198964 nit: > Implement DataFrame.hint for pyspark Maybe `DataFrame.repartition` or `RepartitionByExpression` instead of `DataFrame.hint` ?

[GitHub] [spark] itholic commented on a diff in pull request #39383: [SPARK-41780][SQL] Should throw INVALID_PARAMETER_VALUE when the parameters `regexp` in regexp_replace is invalid

2023-01-04 Thread GitBox
itholic commented on code in PR #39383: URL: https://github.com/apache/spark/pull/39383#discussion_r1061705413 ## sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala: ## @@ -663,4 +664,18 @@ class StringFunctionsSuite extends QueryTest with

[GitHub] [spark] EnricoMi commented on pull request #38223: [SPARK-40770][PYTHON] Improved error messages for applyInPandas for schema mismatch

2023-01-04 Thread GitBox
EnricoMi commented on PR #38223: URL: https://github.com/apache/spark/pull/38223#issuecomment-1371196200 @HyukjinKwon @cloud-fan @xinrong-meng can we get this into Spark 3.4?

[GitHub] [spark] EnricoMi closed pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

2023-01-04 Thread GitBox
EnricoMi closed pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous URL: https://github.com/apache/spark/pull/38676

[GitHub] [spark] EnricoMi commented on pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

2023-01-04 Thread GitBox
EnricoMi commented on PR #38676: URL: https://github.com/apache/spark/pull/38676#issuecomment-1371192971 Closed in favour of #39131.

[GitHub] [spark] srowen commented on a diff in pull request #39326: [SPARK-41800][BUILD] Upgrade commons-compress to 1.22

2023-01-04 Thread GitBox
srowen commented on code in PR #39326: URL: https://github.com/apache/spark/pull/39326#discussion_r1061696714 ## core/pom.xml: ## @@ -181,6 +181,10 @@ commons-codec commons-codec + + org.apache.commons + commons-compress + Review Comment:

[GitHub] [spark] itholic commented on a diff in pull request #39393: [SPARK-41871][CONNECT] DataFrame hint parameter can be str, list, float or int

2023-01-04 Thread GitBox
itholic commented on code in PR #39393: URL: https://github.com/apache/spark/pull/39393#discussion_r1061696091 ## python/pyspark/sql/connect/dataframe.py: ## @@ -478,9 +478,10 @@ def to_jcols( def hint(self, name: str, *params: Any) -> "DataFrame": for param in

[GitHub] [spark] srowen commented on a diff in pull request #39286: [SPARK-41768][CORE] Refactor the definition of enum to follow with the code style

2023-01-04 Thread GitBox
srowen commented on code in PR #39286: URL: https://github.com/apache/spark/pull/39286#discussion_r1061695522 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -27,10 +27,10 @@ package org.apache.spark.status.protobuf; enum

[GitHub] [spark] srowen closed pull request #39350: [MINOR] Fix a typo "from from" -> "from"

2023-01-04 Thread GitBox
srowen closed pull request #39350: [MINOR] Fix a typo "from from" -> "from" URL: https://github.com/apache/spark/pull/39350

[GitHub] [spark] srowen commented on pull request #39350: [MINOR] Fix a typo "from from" -> "from"

2023-01-04 Thread GitBox
srowen commented on PR #39350: URL: https://github.com/apache/spark/pull/39350#issuecomment-1371179995 I'll merge it. You didn't enable tests to run, but, these are just .md file changes that can't affect anything else

[GitHub] [spark] MaxGekk commented on a diff in pull request #39260: [SPARK-41579][SQL] Assign name to _LEGACY_ERROR_TEMP_1249

2023-01-04 Thread GitBox
MaxGekk commented on code in PR #39260: URL: https://github.com/apache/spark/pull/39260#discussion_r1061672988 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -2405,22 +2405,24 @@ private[sql] object QueryCompilationErrors extends

[GitHub] [spark] techaddict commented on pull request #39385: [SPARK-41882][CORE][SQL][UI] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
techaddict commented on PR #39385: URL: https://github.com/apache/spark/pull/39385#issuecomment-1371130899 @LuciferYang  good catch, thanks

[GitHub] [spark] itholic opened a new pull request, #39394: [SPARK-41575][SQL] Assign name to _LEGACY_ERROR_TEMP_2054

2023-01-04 Thread GitBox
itholic opened a new pull request, #39394: URL: https://github.com/apache/spark/pull/39394 ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_2054, "TASK_WRITE_FAILED". ### Why are the changes needed?

[GitHub] [spark] olaky commented on a diff in pull request #39314: [SPARK-41791] Add new metadata types

2023-01-04 Thread GitBox
olaky commented on code in PR #39314: URL: https://github.com/apache/spark/pull/39314#discussion_r1061637682 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -249,8 +255,26 @@ object FileSourceStrategy extends Strategy with

[GitHub] [spark] techaddict closed pull request #39355: [SPARK-40263][CORE] Use interruptible lock instead of synchronized in TransportClientFactory.createClient()

2023-01-04 Thread GitBox
techaddict closed pull request #39355: [SPARK-40263][CORE] Use interruptible lock instead of synchronized in TransportClientFactory.createClient() URL: https://github.com/apache/spark/pull/39355

[GitHub] [spark] cloud-fan commented on a diff in pull request #39277: [SPARK-41708][SQL] Pull v1write information to `WriteFiles`

2023-01-04 Thread GitBox
cloud-fan commented on code in PR #39277: URL: https://github.com/apache/spark/pull/39277#discussion_r1061624700 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/V1WritesHiveUtils.scala: ## @@ -105,4 +112,164 @@ trait V1WritesHiveUtils { .map(_ =>

[GitHub] [spark] cloud-fan commented on a diff in pull request #39277: [SPARK-41708][SQL] Pull v1write information to `WriteFiles`

2023-01-04 Thread GitBox
cloud-fan commented on code in PR #39277: URL: https://github.com/apache/spark/pull/39277#discussion_r1061620498 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala: ## @@ -294,3 +282,40 @@ case class InsertIntoHiveTable( override

[GitHub] [spark] cloud-fan commented on a diff in pull request #39277: [SPARK-41708][SQL] Pull v1write information to `WriteFiles`

2023-01-04 Thread GitBox
cloud-fan commented on code in PR #39277: URL: https://github.com/apache/spark/pull/39277#discussion_r1061619079 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala: ## @@ -53,13 +59,17 @@ case class WriteFiles(child: LogicalPlan) extends

[GitHub] [spark] techaddict opened a new pull request, #39393: [SPARK-41871][CONNECT] DataFrame hint parameter can be str, list, float or int

2023-01-04 Thread GitBox
techaddict opened a new pull request, #39393: URL: https://github.com/apache/spark/pull/39393 ### What changes were proposed in this pull request? Spark Connect DataFrame hint parameter can be str, list, float, or int. This is done in parity with pyspark DataFrame.hint ### Why

[GitHub] [spark] grundprinzip commented on a diff in pull request #39361: [SPARK-41822][CONNECT] Setup gRPC connection for Scala/JVM client

2023-01-04 Thread GitBox
grundprinzip commented on code in PR #39361: URL: https://github.com/apache/spark/pull/39361#discussion_r1061588135 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -18,14 +18,15 @@ package org.apache.spark.sql.connect.config

[GitHub] [spark] cloud-fan commented on pull request #39343: [SPARK-41816][SQL] Not close filesystem when log out ThriftServer

2023-01-04 Thread GitBox
cloud-fan commented on PR #39343: URL: https://github.com/apache/spark/pull/39343#issuecomment-1371044732 cc @bogdanghit

[GitHub] [spark] cloud-fan commented on a diff in pull request #38163: [SPARK-40711][SQL] Add spill size metrics for window

2023-01-04 Thread GitBox
cloud-fan commented on code in PR #38163: URL: https://github.com/apache/spark/pull/38163#discussion_r1061576773 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala: ## @@ -337,6 +338,7 @@ case class WindowInPandasExec( if

[GitHub] [spark] cloud-fan commented on pull request #39170: [SPARK-41674][SQL] Runtime filter should supports multi level shuffle join side as filter creation side

2023-01-04 Thread GitBox
cloud-fan commented on PR #39170: URL: https://github.com/apache/spark/pull/39170#issuecomment-1371031081 ``` SELECT * FROM bf1 JOIN bf2 JOIN bf3 ON bf1.c1 = bf2.c2 AND bf3.c3 = bf2.c2 WHERE bf2.a2 = 5 ``` Can you show the query plan before

[GitHub] [spark] cloud-fan commented on a diff in pull request #39314: [SPARK-41791] Add new metadata types

2023-01-04 Thread GitBox
cloud-fan commented on code in PR #39314: URL: https://github.com/apache/spark/pull/39314#discussion_r1061563125 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -249,8 +255,26 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] vicennial commented on a diff in pull request #39361: [SPARK-41822][CONNECT] Setup gRPC connection for Scala/JVM client

2023-01-04 Thread GitBox
vicennial commented on code in PR #39361: URL: https://github.com/apache/spark/pull/39361#discussion_r1061524162 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala: ## @@ -16,17 +16,79 @@ */ package

[GitHub] [spark] Daniel-Davies commented on a diff in pull request #38867: [SPARK-41234][SQL][PYTHON] Add `array_insert` function

2023-01-04 Thread GitBox
Daniel-Davies commented on code in PR #38867: URL: https://github.com/apache/spark/pull/38867#discussion_r1061515779 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4601,6 +4601,231 @@ case class ArrayExcept(left:

[GitHub] [spark] zhengruifeng opened a new pull request, #39392: [SPARK-41846][CONNECT][PYTHON] Enable doctests for window functions

2023-01-04 Thread GitBox
zhengruifeng opened a new pull request, #39392: URL: https://github.com/apache/spark/pull/39392 ### What changes were proposed in this pull request? Enable doctests for window functions ### Why are the changes needed? for test coverage ### Does this PR introduce

[GitHub] [spark] LuciferYang opened a new pull request, #39391: [SPARK-41883][BUILD] Upgrade dropwizard metrics 4.2.15

2023-01-04 Thread GitBox
LuciferYang opened a new pull request, #39391: URL: https://github.com/apache/spark/pull/39391 ### What changes were proposed in this pull request? This PR aims to upgrade dropwizard metrics to 4.2.15. ### Why are the changes needed? The release notes are as follows: -

[GitHub] [spark] zhengruifeng commented on pull request #39390: [SPARK-41840][CONNECT][PYTHON] Add the missing alias `groupby`

2023-01-04 Thread GitBox
zhengruifeng commented on PR #39390: URL: https://github.com/apache/spark/pull/39390#issuecomment-1370856219 cc @HyukjinKwon

[GitHub] [spark] zhengruifeng opened a new pull request, #39390: [SPARK-41840][CONNECT][PYTHON] Add the missing alias `groupby`

2023-01-04 Thread GitBox
zhengruifeng opened a new pull request, #39390: URL: https://github.com/apache/spark/pull/39390 ### What changes were proposed in this pull request? Add the missing alias `groupby` ### Why are the changes needed? for api coverage and test coverage ### Does this PR

[GitHub] [spark] LuciferYang commented on pull request #39385: [SPARK-41882][SQL] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
LuciferYang commented on PR #39385: URL: https://github.com/apache/spark/pull/39385#issuecomment-1370840366 > @LuciferYang Thanks for fixing this! Could you add tests for the SQL UI with RocksDB as the backend? For example, you can have a new test suite based on `SQLAppStatusListenerSuite`

[GitHub] [spark] LuciferYang commented on a diff in pull request #39385: [SPARK-41882][SQL] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
LuciferYang commented on code in PR #39385: URL: https://github.com/apache/spark/pull/39385#discussion_r1061409595 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -395,7 +395,8 @@ message SQLExecutionUIData { optional string error_message

[GitHub] [spark] itholic opened a new pull request, #39389: [SPARK-41574][SQL] Assign name to _LEGACY_ERROR_TEMP_2009

2023-01-04 Thread GitBox
itholic opened a new pull request, #39389: URL: https://github.com/apache/spark/pull/39389 ### What changes were proposed in this pull request? This PR proposes to migrate `_LEGACY_ERROR_TEMP_2136` into `INTERNAL_ERROR`. ### Why are the changes needed? We

[GitHub] [spark] LuciferYang commented on a diff in pull request #39385: [SPARK-41882][SQL] Add tests for `SQLAppStatusStore` with RocksDB backend and fix some bugs

2023-01-04 Thread GitBox
LuciferYang commented on code in PR #39385: URL: https://github.com/apache/spark/pull/39385#discussion_r1061400974 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SparkPlanGraphWrapperSerializer.scala: ## @@ -53,8 +53,9 @@ class SparkPlanGraphWrapperSerializer

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39188: [WIP][SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-04 Thread GitBox
WeichenXu123 commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1061399015 ## python/pyspark/ml/torch/distributor.py: ## @@ -0,0 +1,491 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] juechen507 commented on pull request #30003: [SPARK-32709][SQL] Support write Hive ORC/Parquet bucketed table (for Hive 1,2)

2023-01-04 Thread GitBox
juechen507 commented on PR #30003: URL: https://github.com/apache/spark/pull/30003#issuecomment-1370825133 Can a Hive bucketed table written by Spark be read by spark-sql to do bucket filter pruning, join, and group-by? In my test, bucket information cannot be used for group-by

[GitHub] [spark] itholic commented on a diff in pull request #39387: [SPARK-41586][PYTHON] Introduce `pyspark.errors` and error classes for PySpark.

2023-01-04 Thread GitBox
itholic commented on code in PR #39387: URL: https://github.com/apache/spark/pull/39387#discussion_r1061376821 ## python/pyspark/sql/tests/test_functions.py: ## @@ -1030,9 +1031,13 @@ def test_lit_list(self): self.assertEqual(actual, expected) df =

[GitHub] [spark] itholic commented on a diff in pull request #39387: [SPARK-41586][PYTHON] Introduce `pyspark.errors` and error classes for PySpark.

2023-01-04 Thread GitBox
itholic commented on code in PR #39387: URL: https://github.com/apache/spark/pull/39387#discussion_r1061372891 ## python/pyspark/errors/error-classes.json: ## @@ -0,0 +1,7 @@ +{ + "COLUMN_IN_LIST" : { +"message" : [ + " does not allow a column in a list" +]

[GitHub] [spark] dengziming opened a new pull request, #39388: [SPARK-41354][CONNECT][PYTHON] implement RepartitionByExpression

2023-01-04 Thread GitBox
dengziming opened a new pull request, #39388: URL: https://github.com/apache/spark/pull/39388 ### What changes were proposed in this pull request? Implement DataFrame.hint for pyspark ### Why are the changes needed? For API coverage ### Does this PR introduce _any_

[GitHub] [spark] itholic commented on a diff in pull request #39387: [SPARK-41586][PYTHON] Introduce `pyspark.errors` and error classes for PySpark.

2023-01-04 Thread GitBox
itholic commented on code in PR #39387: URL: https://github.com/apache/spark/pull/39387#discussion_r1061374539 ## python/pyspark/sql/functions.py: ## @@ -172,7 +173,9 @@ def lit(col: Any) -> Column: return col elif isinstance(col, list): if

[GitHub] [spark] itholic commented on a diff in pull request #39387: [SPARK-41586][PYTHON] Introduce `pyspark.errors` and error classes for PySpark.

2023-01-04 Thread GitBox
itholic commented on code in PR #39387: URL: https://github.com/apache/spark/pull/39387#discussion_r1061372713 ## python/pyspark/errors/error-classes.json: ## @@ -0,0 +1,7 @@ +{ + "COLUMN_IN_LIST" : { +"message" : [ + " does not allow a column in a list" +]

[GitHub] [spark] itholic commented on a diff in pull request #39387: [SPARK-41586][PYTHON] Introduce `pyspark.errors` and error classes for PySpark.

2023-01-04 Thread GitBox
itholic commented on code in PR #39387: URL: https://github.com/apache/spark/pull/39387#discussion_r1061372309 ## python/pyspark/errors/error-classes.json: ## @@ -0,0 +1,7 @@ +{ + "COLUMN_IN_LIST" : { Review Comment: Will add more error-classes in follow-up PRs.

[GitHub] [spark] LuciferYang commented on a diff in pull request #39385: [SPARK-41432][SQL][FOLLOWUP] Fix npe when `SparkPlanGraphWrapperSerializer#serializeSparkPlanGraphNodeWrapper`

2023-01-04 Thread GitBox
LuciferYang commented on code in PR #39385: URL: https://github.com/apache/spark/pull/39385#discussion_r1061369728 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -395,7 +395,8 @@ message SQLExecutionUIData { optional string error_message

[GitHub] [spark] itholic opened a new pull request, #39387: [SPARK-41586][PYTHON] Introduce `pyspark.errors` and error classes for PySpark.

2023-01-04 Thread GitBox
itholic opened a new pull request, #39387: URL: https://github.com/apache/spark/pull/39387 ### What changes were proposed in this pull request? This PR proposes to introduce `pyspark.errors` and error classes to unify and improve errors generated by PySpark under a single path.

[GitHub] [spark] zhengruifeng commented on pull request #39386: [SPARK-41833][SPARK-41881][SPARK-41815][CONNECT][PYTHON] Make `DataFrame.collect` handle None/NaN/Array/Binary porperly

2023-01-04 Thread GitBox
zhengruifeng commented on PR #39386: URL: https://github.com/apache/spark/pull/39386#issuecomment-1370782139 cc @HyukjinKwon @grundprinzip

[GitHub] [spark] AmplabJenkins commented on pull request #39350: [MINOR] Fix a typo "from from" -> "from"

2023-01-04 Thread GitBox
AmplabJenkins commented on PR #39350: URL: https://github.com/apache/spark/pull/39350#issuecomment-1370754460 Can one of the admins verify this patch?

[GitHub] [spark] vicennial commented on a diff in pull request #39361: [SPARK-41822][CONNECT] Setup gRPC connection for Scala/JVM client

2023-01-04 Thread GitBox
vicennial commented on code in PR #39361: URL: https://github.com/apache/spark/pull/39361#discussion_r1061307438 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala: ## @@ -16,17 +16,79 @@ */ package

[GitHub] [spark] vicennial commented on a diff in pull request #39361: [SPARK-41822][CONNECT] Setup gRPC connection for Scala/JVM client

2023-01-04 Thread GitBox
vicennial commented on code in PR #39361: URL: https://github.com/apache/spark/pull/39361#discussion_r1061305904 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -18,14 +18,15 @@ package org.apache.spark.sql.connect.config

[GitHub] [spark] LorenzoMartini commented on pull request #39367: [SPARK-41861][SQL] Make v2 ScanBuilders' build() return typed scan

2023-01-04 Thread GitBox
LorenzoMartini commented on PR #39367: URL: https://github.com/apache/spark/pull/39367#issuecomment-1370667960 Tests are passing, except for `linters`, which seems to be a very common flake, and `docker integration tests`, which doesn't seem related at all, so it is probably just another flake

[GitHub] [spark] LorenzoMartini commented on pull request #39366: [SPARK-41860][SQL] Make AvroScanBuilder and JsonScanBuilder case classes

2023-01-04 Thread GitBox
LorenzoMartini commented on PR #39366: URL: https://github.com/apache/spark/pull/39366#issuecomment-1370664846 Tests are passing. Only `python linter` tests are failing, but there is no related change and it seems like they are failing in many other PRs, so it's a flake
