[GitHub] [spark] mcdull-zhang commented on a diff in pull request #38877: [SPARK-41361] [SQL] Invalid call toAttribute on unresolved object exception caused by WidenSetOperationTypes

2023-01-12 Thread GitBox
mcdull-zhang commented on code in PR #38877: URL: https://github.com/apache/spark/pull/38877#discussion_r1068089336 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala: ## @@ -32,7 +32,13 @@ case class ScriptTransformation(

[GitHub] [spark] huaxingao commented on pull request #39533: [SPARK-42031][CORE][SQL] Clean up `remove` methods that do not need override

2023-01-12 Thread GitBox
huaxingao commented on PR #39533: URL: https://github.com/apache/spark/pull/39533#issuecomment-1380734995 LGTM cc @sunchao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] itholic commented on a diff in pull request #39387: [SPARK-41586][PYTHON] Introduce `pyspark.errors` and error classes for PySpark.

2023-01-12 Thread GitBox
itholic commented on code in PR #39387: URL: https://github.com/apache/spark/pull/39387#discussion_r1068212755 ## dev/pyproject.toml: ## @@ -31,4 +31,4 @@ required-version = "22.6.0" line-length = 100 target-version = ['py37'] include = '\.pyi?$' -extend-exclude =

[GitHub] [spark] AmplabJenkins commented on pull request #39524: [WIP][SPARK-41990][SQL] Fix bug for FieldReference

2023-01-12 Thread GitBox
AmplabJenkins commented on PR #39524: URL: https://github.com/apache/spark/pull/39524#issuecomment-1380641678 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #39519: [SPARK-41995][SQL] Accept non-foldable expressions in schema_of_json

2023-01-12 Thread GitBox
AmplabJenkins commented on PR #39519: URL: https://github.com/apache/spark/pull/39519#issuecomment-1380641820 Can one of the admins verify this patch?

[GitHub] [spark] LuciferYang commented on pull request #39534: [MINOR][CONNECT][DOCS] Fix typo in `connect/README.md`

2023-01-12 Thread GitBox
LuciferYang commented on PR #39534: URL: https://github.com/apache/spark/pull/39534#issuecomment-1380375275 cc @HyukjinKwon just a typo fix

[GitHub] [spark] cloud-fan commented on a diff in pull request #39524: [WIP][SPARK-41990][SQL] Fix bug for FieldReference

2023-01-12 Thread GitBox
cloud-fan commented on code in PR #39524: URL: https://github.com/apache/spark/pull/39524#discussion_r1068166049 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/expressions/expressions.scala: ## @@ -365,11 +366,11 @@ private[sql] final case class

[GitHub] [spark] zhengruifeng commented on pull request #39534: [MINOR][CONNECT][DOCS] Fix typo in `connect/README.md`

2023-01-12 Thread GitBox
zhengruifeng commented on PR #39534: URL: https://github.com/apache/spark/pull/39534#issuecomment-1380478700 merged into master

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39535: [SPARK-41746][SPARK-41838][SPARK-41837][SPARK-41835][SPARK-41836][SPARK-41847][CONNECT][PYTHON] Make `createDataFrame(rows/lis

2023-01-12 Thread GitBox
zhengruifeng commented on code in PR #39535: URL: https://github.com/apache/spark/pull/39535#discussion_r1068252930 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -439,7 +439,7 @@ def test_with_local_list(self): with self.assertRaisesRegex(

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39535: [SPARK-41746][SPARK-41838][SPARK-41837][SPARK-41835][SPARK-41836][SPARK-41847][CONNECT][PYTHON] Make `createDataFrame(rows/lis

2023-01-12 Thread GitBox
zhengruifeng commented on code in PR #39535: URL: https://github.com/apache/spark/pull/39535#discussion_r1068251650 ## python/pyspark/sql/connect/session.py: ## @@ -289,65 +295,48 @@ def createDataFrame( else: _data = list(data) -if _schema

[GitHub] [spark] itholic commented on pull request #39506: [SPARK-41983][SQL] Rename & improve error message for `NULL_COMPARISON_RESULT`

2023-01-12 Thread GitBox
itholic commented on PR #39506: URL: https://github.com/apache/spark/pull/39506#issuecomment-1380547746 cc @MaxGekk @srielau

[GitHub] [spark] itholic commented on pull request #39507: [SPARK-41984][SQL] Rename & improve error message for `RESET_PERMISSION_TO_ORIGINAL`

2023-01-12 Thread GitBox
itholic commented on PR #39507: URL: https://github.com/apache/spark/pull/39507#issuecomment-1380547638 cc @MaxGekk @srielau

[GitHub] [spark] eejbyfeldt commented on pull request #38428: [SPARK-40912][CORE]Overhead of Exceptions in KryoDeserializationStream

2023-01-12 Thread GitBox
eejbyfeldt commented on PR #38428: URL: https://github.com/apache/spark/pull/38428#issuecomment-1380312253 > The PR as such looks reasonable to me - can we add a test to explicitly test for EOF behavior ? @mridulm I added a spec for this in:

[GitHub] [spark] attilapiros commented on pull request #38828: [SPARK-35084][CORE] Spark 3: supporting --packages in k8s cluster mode

2023-01-12 Thread GitBox
attilapiros commented on PR #38828: URL: https://github.com/apache/spark/pull/38828#issuecomment-1380652767 I have started to review this PR.

[GitHub] [spark] zhengruifeng closed pull request #39534: [MINOR][CONNECT][DOCS] Fix typo in `connect/README.md`

2023-01-12 Thread GitBox
zhengruifeng closed pull request #39534: [MINOR][CONNECT][DOCS] Fix typo in `connect/README.md` URL: https://github.com/apache/spark/pull/39534

[GitHub] [spark] LuciferYang opened a new pull request, #39533: [DON'T MERGE] Clean up `remove` methods that do not need override

2023-01-12 Thread GitBox
LuciferYang opened a new pull request, #39533: URL: https://github.com/apache/spark/pull/39533 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] LuciferYang opened a new pull request, #39534: [MINOR][CONNECT][DOCS] Fix typo in `connect/README.md`

2023-01-12 Thread GitBox
LuciferYang opened a new pull request, #39534: URL: https://github.com/apache/spark/pull/39534 ### What changes were proposed in this pull request? This PR fixes a typo in `connect/README.md` ### Why are the changes needed? Fix typo in `connect/README.md` ### Does

[GitHub] [spark] wangyum commented on pull request #39460: [SPARK-39217][SQL] Makes DPP support the pruning side has Union

2023-01-12 Thread GitBox
wangyum commented on PR #39460: URL: https://github.com/apache/spark/pull/39460#issuecomment-1380395079 cc @cloud-fan

[GitHub] [spark] wangyum commented on a diff in pull request #39512: [SPARK-41986][SQL] Introduce shuffle on SinglePartition

2023-01-12 Thread GitBox
wangyum commented on code in PR #39512: URL: https://github.com/apache/spark/pull/39512#discussion_r1068155569 ## sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala: ## @@ -76,13 +76,17 @@ case class EnsureRequirements( case _ =>

[GitHub] [spark] zhengruifeng opened a new pull request, #39535: [SPARK-41746][SPARK-41838][SPARK-41837][SPARK-41835][SPARK-41836][SPARK-41847][CONNECT][PYTHON] Make `createDataFrame(rows/lists/tuples

2023-01-12 Thread GitBox
zhengruifeng opened a new pull request, #39535: URL: https://github.com/apache/spark/pull/39535 ### What changes were proposed in this pull request? Make `createDataFrame` support nested types when the input data are rows, lists, tuples, dicts ### Why are the changes needed?

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39535: [SPARK-41746][SPARK-41838][SPARK-41837][SPARK-41835][SPARK-41836][SPARK-41847][CONNECT][PYTHON] Make `createDataFrame(rows/lis

2023-01-12 Thread GitBox
zhengruifeng commented on code in PR #39535: URL: https://github.com/apache/spark/pull/39535#discussion_r1068248250 ## python/pyspark/sql/connect/conversion.py: ## @@ -0,0 +1,208 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] mengxr commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2023-01-12 Thread GitBox
mengxr commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1068357996 ## python/pyspark/ml/functions.py: ## @@ -106,6 +152,597 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] rangadi opened a new pull request, #39536: [SQL][PROTOBUF] Fix how exception is handled in error reporting.

2023-01-12 Thread GitBox
rangadi opened a new pull request, #39536: URL: https://github.com/apache/spark/pull/39536 ### What changes were proposed in this pull request? Protobuf connector related error handlers incorrectly report the exception. This makes it hard for users to see the actual issue. E.g. if

[GitHub] [spark] rangadi commented on pull request #39536: [SQL][PROTOBUF] Fix how exception is handled in error reporting.

2023-01-12 Thread GitBox
rangadi commented on PR #39536: URL: https://github.com/apache/spark/pull/39536#issuecomment-1381070483 @SandishKumarHN please take a look.

[GitHub] [spark] rednaxelafx commented on pull request #39518: [SPARK-41991][SQL] `CheckOverflowInTableInsert` should accept ExpressionProxy as child

2023-01-12 Thread GitBox
rednaxelafx commented on PR #39518: URL: https://github.com/apache/spark/pull/39518#issuecomment-1381080180 I did a search as well and I haven't found any other `Expression.withNewChildrenInternal()` implementations that have casts in them. Most scalar expressions are indeed only requiring

[GitHub] [spark] HyukjinKwon commented on pull request #39518: [SPARK-41991][SQL] `CheckOverflowInTableInsert` should accept ExpressionProxy as child

2023-01-12 Thread GitBox
HyukjinKwon commented on PR #39518: URL: https://github.com/apache/spark/pull/39518#issuecomment-1381161461 @bersprockets mind creating a backporting PR for branch-3.3 please?

[GitHub] [spark] HyukjinKwon commented on pull request #39518: [SPARK-41991][SQL] `CheckOverflowInTableInsert` should accept ExpressionProxy as child

2023-01-12 Thread GitBox
HyukjinKwon commented on PR #39518: URL: https://github.com/apache/spark/pull/39518#issuecomment-1381160678 Let me just get this in as a one time fix for now (since it needs to be backported too, and has to be a minimized fix). If similar things come up to the surface next time, maybe we

[GitHub] [spark] HyukjinKwon closed pull request #39518: [SPARK-41991][SQL] `CheckOverflowInTableInsert` should accept ExpressionProxy as child

2023-01-12 Thread GitBox
HyukjinKwon closed pull request #39518: [SPARK-41991][SQL] `CheckOverflowInTableInsert` should accept ExpressionProxy as child URL: https://github.com/apache/spark/pull/39518

[GitHub] [spark] itholic commented on a diff in pull request #39387: [SPARK-41586][PYTHON] Introduce `pyspark.errors` and error classes for PySpark.

2023-01-12 Thread GitBox
itholic commented on code in PR #39387: URL: https://github.com/apache/spark/pull/39387#discussion_r1068831796 ## python/pyspark/errors/error_classes.py: ## @@ -0,0 +1,30 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] rednaxelafx commented on pull request #39518: [SPARK-41991][SQL] `CheckOverflowInTableInsert` should accept ExpressionProxy as child

2023-01-12 Thread GitBox
rednaxelafx commented on PR #39518: URL: https://github.com/apache/spark/pull/39518#issuecomment-1381033109 The fix sure works. I feel like polluting the `ExpressionProxy` into core expression implementation is unfortunate, but off the top of my head I don't have a better alternative

[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-12 Thread GitBox
rithwik-db commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1068755416 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] sunchao opened a new pull request, #39540: [SPARK-42039][SQL] SPJ: Remove Option in KeyGroupedPartitioning#partitionValuesOpt

2023-01-12 Thread GitBox
sunchao opened a new pull request, #39540: URL: https://github.com/apache/spark/pull/39540 ### What changes were proposed in this pull request? Currently `KeyGroupedPartitioning#partitionValuesOpt` is of type: `Option[Seq[InternalRow]]`. This refactors it into

[GitHub] [spark] zhengruifeng commented on pull request #39514: [SPARK-41987][CONNECT][PYTHON] Connect API: createDataFrame should supports column with map type

2023-01-12 Thread GitBox
zhengruifeng commented on PR #39514: URL: https://github.com/apache/spark/pull/39514#issuecomment-1381181758 @beliefer Thanks for working on this, but I prefer to supporting all the complex types together [in one batch](https://github.com/apache/spark/pull/39535). Would you mind taking a

[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-12 Thread GitBox
rithwik-db commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1068757070 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39539: [SPARK-42037][INFRA] Remove prefix in build environment variables

2023-01-12 Thread GitBox
dongjoon-hyun opened a new pull request, #39539: URL: https://github.com/apache/spark/pull/39539 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-12 Thread GitBox
lu-wang-dl commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1068709600 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] rangadi commented on a diff in pull request #39536: [SQL][PROTOBUF] Fix how exception is handled in error reporting.

2023-01-12 Thread GitBox
rangadi commented on code in PR #39536: URL: https://github.com/apache/spark/pull/39536#discussion_r1068737942 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala: ## @@ -229,7 +229,7 @@ private[sql] object ProtobufUtils extends Logging

[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-12 Thread GitBox
rithwik-db commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1068751565 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] HyukjinKwon closed pull request #39435: [SPARK-41926][UI][TESTS] Add Github action test job with RocksDB as UI backend

2023-01-12 Thread GitBox
HyukjinKwon closed pull request #39435: [SPARK-41926][UI][TESTS] Add Github action test job with RocksDB as UI backend URL: https://github.com/apache/spark/pull/39435

[GitHub] [spark] HyukjinKwon commented on pull request #39435: [SPARK-41926][UI][TESTS] Add Github action test job with RocksDB as UI backend

2023-01-12 Thread GitBox
HyukjinKwon commented on PR #39435: URL: https://github.com/apache/spark/pull/39435#issuecomment-1381174218 Merged to master. cc @Yikun FYI if you find some time for a posthoc review.

[GitHub] [spark] LuciferYang commented on pull request #39532: [SPARK-42030][CORE] Remove unused Constructor from RocksDB.TypeAliases and LevelDB.TypeAliases

2023-01-12 Thread GitBox
LuciferYang commented on PR #39532: URL: https://github.com/apache/spark/pull/39532#issuecomment-1381236764 close this one due to jackson need this Constructor

[GitHub] [spark] LuciferYang closed pull request #39532: [SPARK-42030][CORE] Remove unused Constructor from RocksDB.TypeAliases and LevelDB.TypeAliases

2023-01-12 Thread GitBox
LuciferYang closed pull request #39532: [SPARK-42030][CORE] Remove unused Constructor from RocksDB.TypeAliases and LevelDB.TypeAliases URL: https://github.com/apache/spark/pull/39532

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-12 Thread GitBox
lu-wang-dl commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1068711508 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-12 Thread GitBox
rithwik-db commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1068751120 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] srielau opened a new pull request, #39537: [SPARK-41994] [DRAFT] Assign SQLSTATE's (1/?)

2023-01-12 Thread GitBox
srielau opened a new pull request, #39537: URL: https://github.com/apache/spark/pull/39537 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] jerrypeng opened a new pull request, #39538: [SPARK-41596] Document the new feature "Async Progress Tracking" to Structured Streaming guide doc

2023-01-12 Thread GitBox
jerrypeng opened a new pull request, #39538: URL: https://github.com/apache/spark/pull/39538 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] attilapiros commented on a diff in pull request #38828: [SPARK-35084][CORE] Spark 3: supporting --packages in k8s cluster mode

2023-01-12 Thread GitBox
attilapiros commented on code in PR #38828: URL: https://github.com/apache/spark/pull/38828#discussion_r1068765407 ## core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala: ## @@ -486,6 +486,34 @@ class SparkSubmitSuite

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39539: [SPARK-42037][INFRA] Remove `AMPLAB_` prefix in build environment variables

2023-01-12 Thread GitBox
HyukjinKwon commented on code in PR #39539: URL: https://github.com/apache/spark/pull/39539#discussion_r1068790662 ## dev/run-tests.py: ## @@ -500,12 +500,12 @@ def main(): else: print("Cannot install SparkR as R was not found in PATH") -if

[GitHub] [spark] bersprockets commented on pull request #39518: [SPARK-41991][SQL] `CheckOverflowInTableInsert` should accept ExpressionProxy as child

2023-01-12 Thread GitBox
bersprockets commented on PR #39518: URL: https://github.com/apache/spark/pull/39518#issuecomment-1381168964 Thanks @HyukjinKwon ! I will work on the backport to 3.3.

[GitHub] [spark] huaxingao commented on pull request #39533: [SPARK-42031][CORE][SQL] Clean up `remove` methods that do not need override

2023-01-12 Thread GitBox
huaxingao commented on PR #39533: URL: https://github.com/apache/spark/pull/39533#issuecomment-1381031450 Merged to master. Thanks @LuciferYang @sunchao

[GitHub] [spark] huaxingao closed pull request #39533: [SPARK-42031][CORE][SQL] Clean up `remove` methods that do not need override

2023-01-12 Thread GitBox
huaxingao closed pull request #39533: [SPARK-42031][CORE][SQL] Clean up `remove` methods that do not need override URL: https://github.com/apache/spark/pull/39533

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-12 Thread GitBox
lu-wang-dl commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1068708067 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-12 Thread GitBox
lu-wang-dl commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1068713107 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] bersprockets commented on pull request #39518: [SPARK-41991][SQL] `CheckOverflowInTableInsert` should accept ExpressionProxy as child

2023-01-12 Thread GitBox
bersprockets commented on PR #39518: URL: https://github.com/apache/spark/pull/39518#issuecomment-1381055130 @rednaxelafx >Should we fix all occurrences of such pattern I checked this at some point. I could find only two additional cases: `SortOrder` `GeneratorOuter`

[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-12 Thread GitBox
rithwik-db commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1068750849 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] dongjoon-hyun commented on pull request #39539: [SPARK-42037][INFRA] Remove `AMPLAB_` prefix in build environment variables

2023-01-12 Thread GitBox
dongjoon-hyun commented on PR #39539: URL: https://github.com/apache/spark/pull/39539#issuecomment-1381116307 cc @HyukjinKwon

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39512: [SPARK-41986][SQL] Introduce shuffle on SinglePartition

2023-01-12 Thread GitBox
dongjoon-hyun commented on code in PR #39512: URL: https://github.com/apache/spark/pull/39512#discussion_r1068470209 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -513,6 +513,14 @@ object SQLConf { .booleanConf

[GitHub] [spark] mridulm commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-12 Thread GitBox
mridulm commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1068470671 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -470,6 +530,40 @@ void closeAndDeleteOutdatedPartitions(

[GitHub] [spark] akpatnam25 commented on pull request #38959: SPARK-41415: SASL Request Retries

2023-01-12 Thread GitBox
akpatnam25 commented on PR #38959: URL: https://github.com/apache/spark/pull/38959#issuecomment-1380920219 @mridulm should be good to review now

[GitHub] [spark] gengliangwang commented on pull request #39530: [SPARK-42026][CORE] Protobuf serializer for `AppSummary` and `PoolData`

2023-01-12 Thread GitBox
gengliangwang commented on PR #39530: URL: https://github.com/apache/spark/pull/39530#issuecomment-1380875264 Thanks, merging to master

[GitHub] [spark] AmplabJenkins commented on pull request #39515: [SPARK-38743][SQL][TEST] Test the error class: MISSING_STATIC_PARTITION_COLUMN

2023-01-12 Thread GitBox
AmplabJenkins commented on PR #39515: URL: https://github.com/apache/spark/pull/39515#issuecomment-1380953631 Can one of the admins verify this patch?

[GitHub] [spark] gengliangwang closed pull request #39530: [SPARK-42026][CORE] Protobuf serializer for `AppSummary` and `PoolData`

2023-01-12 Thread GitBox
gengliangwang closed pull request #39530: [SPARK-42026][CORE] Protobuf serializer for `AppSummary` and `PoolData` URL: https://github.com/apache/spark/pull/39530

[GitHub] [spark] mridulm commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-12 Thread GitBox
mridulm commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1068472060 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java: ## @@ -256,6 +256,22 @@ public void onFailure(Throwable e) {

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39512: [SPARK-41986][SQL] Introduce shuffle on SinglePartition

2023-01-12 Thread GitBox
dongjoon-hyun commented on code in PR #39512: URL: https://github.com/apache/spark/pull/39512#discussion_r1068473346 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -513,6 +513,14 @@ object SQLConf { .booleanConf

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2023-01-12 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1068492718 ## python/pyspark/ml/functions.py: ## @@ -106,6 +152,597 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] mengxr commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2023-01-12 Thread GitBox
mengxr commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1068515229 ## python/pyspark/ml/functions.py: ## @@ -106,6 +152,597 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39512: [SPARK-41986][SQL] Introduce shuffle on SinglePartition

2023-01-12 Thread GitBox
dongjoon-hyun commented on code in PR #39512: URL: https://github.com/apache/spark/pull/39512#discussion_r1068472465 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -513,6 +513,14 @@ object SQLConf { .booleanConf

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2023-01-12 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1068598889 ## python/pyspark/ml/functions.py: ## @@ -106,6 +152,597 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] otterc commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-12 Thread GitBox
otterc commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1068388955 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -396,6 +403,59 @@ public void applicationRemoved(String

[GitHub] [spark] otterc commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-12 Thread GitBox
otterc commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1068509859 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -470,6 +530,40 @@ void closeAndDeleteOutdatedPartitions(

[GitHub] [spark] itholic opened a new pull request, #39543: [SPARK-42044][SQL] Fix incorrect error message for `MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY`

2023-01-12 Thread GitBox
itholic opened a new pull request, #39543: URL: https://github.com/apache/spark/pull/39543 ### What changes were proposed in this pull request? This PR proposes to fix incorrect error message for `MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY` ### Why are the changes needed?

[GitHub] [spark] hvanhovell commented on a diff in pull request #39517: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-12 Thread GitBox
hvanhovell commented on code in PR #39517: URL: https://github.com/apache/spark/pull/39517#discussion_r1068889994 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala: ## @@ -155,11 +169,19 @@ object ScalaReflection extends ScalaReflection {

[GitHub] [spark] hvanhovell commented on a diff in pull request #39517: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-12 Thread GitBox
hvanhovell commented on code in PR #39517: URL: https://github.com/apache/spark/pull/39517#discussion_r106785 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/AgnosticEncoder.scala: ## @@ -46,35 +46,42 @@ object AgnosticEncoders { override val

[GitHub] [spark] harupy commented on pull request #39188: [SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-12 Thread GitBox
harupy commented on PR #39188: URL: https://github.com/apache/spark/pull/39188#issuecomment-1381288242 @rithwik-db Left a few comments on strings. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] harupy commented on a diff in pull request #39188: [SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-12 Thread GitBox
harupy commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1068903058 ## python/pyspark/ml/torch/distributor.py: ## @@ -261,6 +312,130 @@ def __init__( super().__init__(num_processes, local_mode, use_gpu) self.ssl_conf =

[GitHub] [spark] harupy commented on a diff in pull request #39188: [SPARK-41591][PYTHON][ML] Training PyTorch Files on Single Node Multi GPU

2023-01-12 Thread GitBox
harupy commented on code in PR #39188: URL: https://github.com/apache/spark/pull/39188#discussion_r1068923718 ## python/pyspark/ml/torch/distributor.py: ## @@ -261,6 +312,130 @@ def __init__( super().__init__(num_processes, local_mode, use_gpu) self.ssl_conf =

[GitHub] [spark] zhengruifeng commented on pull request #39514: [SPARK-41987][CONNECT][PYTHON] Connect API: createDataFrame should supports column with map type

2023-01-12 Thread GitBox
zhengruifeng commented on PR #39514: URL: https://github.com/apache/spark/pull/39514#issuecomment-1381312106 Thank you @beliefer -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng opened a new pull request, #39548: [SPARK-42014][CONNECT][PYTHON] Enable 2 tests in test_parity_serde

2023-01-12 Thread GitBox
zhengruifeng opened a new pull request, #39548: URL: https://github.com/apache/spark/pull/39548 ### What changes were proposed in this pull request? Enable 2 tests in test_parity_serde ### Why are the changes needed? test coverage ### Does this PR introduce _any_

[GitHub] [spark] LuciferYang commented on pull request #38496: [SPARK-40708][SQL] Auto update table statistics based on write metrics

2023-01-12 Thread GitBox
LuciferYang commented on PR #38496: URL: https://github.com/apache/spark/pull/38496#issuecomment-1381349599 @wankunde could you resolve the conflicts? @wangyum Does this feature need to be finished in Spark 3.4.0? -- This is an automated message from the Apache Git Service.

[GitHub] [spark] HyukjinKwon closed pull request #39544: [SPARK-42028][CONNECT][PYTHON][FOLLOW-UP] Uses the same logic with PySpark, and reeanbles skipped test

2023-01-12 Thread GitBox
HyukjinKwon closed pull request #39544: [SPARK-42028][CONNECT][PYTHON][FOLLOW-UP] Uses the same logic with PySpark, and reeanbles skipped test URL: https://github.com/apache/spark/pull/39544 -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] HyukjinKwon commented on pull request #39544: [SPARK-42028][CONNECT][PYTHON][FOLLOW-UP] Uses the same logic with PySpark, and reeanbles skipped test

2023-01-12 Thread GitBox
HyukjinKwon commented on PR #39544: URL: https://github.com/apache/spark/pull/39544#issuecomment-1381364417 Thank you guys! Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] rangadi opened a new pull request, #39550: [SQL][PROTOBUF] Add missing options for Protobuf functions

2023-01-12 Thread GitBox
rangadi opened a new pull request, #39550: URL: https://github.com/apache/spark/pull/39550 This adds missing options for Protobuf functions in both Scala & Python. We should be able to pass options for both `from_protobuf()` and `to_protobuf()`. This PR fixes various gaps:

[GitHub] [spark] rangadi commented on pull request #39550: [SQL][PROTOBUF] Add missing options for Protobuf functions

2023-01-12 Thread GitBox
rangadi commented on PR #39550: URL: https://github.com/apache/spark/pull/39550#issuecomment-1381366565 @SandishKumarHN PTAL. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39551: [SPARK-42047][SPARK-41900][CONNECT][PYTHON] Literal should support numpy datatypes

2023-01-12 Thread GitBox
zhengruifeng commented on code in PR #39551: URL: https://github.com/apache/spark/pull/39551#discussion_r1068985761 ## python/pyspark/sql/tests/connect/test_parity_functions.py: ## @@ -59,7 +59,7 @@ def test_inverse_trig_functions(self): def test_lit_list(self):

[GitHub] [spark] allisonwang-db commented on a diff in pull request #39479: [SPARK-41961][SQL] Support table-valued functions with LATERAL

2023-01-12 Thread GitBox
allisonwang-db commented on code in PR #39479: URL: https://github.com/apache/spark/pull/39479#discussion_r1069000543 ## sql/core/src/test/resources/sql-tests/inputs/join-lateral.sql: ## @@ -177,6 +177,25 @@ SELECT * FROM t3 JOIN LATERAL (SELECT EXPLODE_OUTER(c2)); SELECT *

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39361: [SPARK-41822][CONNECT] Setup gRPC connection for Scala/JVM client

2023-01-12 Thread GitBox
dongjoon-hyun commented on code in PR #39361: URL: https://github.com/apache/spark/pull/39361#discussion_r1069027035 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala: ## @@ -16,17 +16,151 @@ */ package

[GitHub] [spark] AmplabJenkins commented on pull request #39501: [SPARK-41295][SQL] Rename the error classes

2023-01-12 Thread GitBox
AmplabJenkins commented on PR #39501: URL: https://github.com/apache/spark/pull/39501#issuecomment-1381437612 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] AmplabJenkins commented on pull request #39502: [SPARK-41981][SQL] Collapse percentile functions if possible

2023-01-12 Thread GitBox
AmplabJenkins commented on PR #39502: URL: https://github.com/apache/spark/pull/39502#issuecomment-1381437566 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] zhengruifeng opened a new pull request, #39545: [SPARK-42042][CONNECT][PYTHON] `DataFrameReader` should support StructType schema

2023-01-12 Thread GitBox
zhengruifeng opened a new pull request, #39545: URL: https://github.com/apache/spark/pull/39545 ### What changes were proposed in this pull request? `DataFrameReader` should support StructType schema ### Why are the changes needed? for parity ### Does this PR

[GitHub] [spark] wankunde commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-12 Thread GitBox
wankunde commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1068910109 ## core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala: ## @@ -913,6 +918,59 @@ class MapOutputTrackerSuite extends SparkFunSuite with LocalSparkContext

[GitHub] [spark] mridulm commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-12 Thread GitBox
mridulm commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1068956139 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java: ## @@ -256,6 +256,23 @@ public void onFailure(Throwable e) {

[GitHub] [spark] LuciferYang commented on a diff in pull request #39531: [SPARK-42029][CONNECT] Add Guava Shading rules to `connect-common` to avoid startup failure

2023-01-12 Thread GitBox
LuciferYang commented on code in PR #39531: URL: https://github.com/apache/spark/pull/39531#discussion_r1068958012 ## connector/connect/common/pom.xml: ## @@ -156,6 +156,43 @@ + + +

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39544: [SPARK-42028][CONNECT][PYTHON][FOLLOW-UP] Uses the same logic with PySpark, and reeanbles skipped test

2023-01-12 Thread GitBox
HyukjinKwon commented on code in PR #39544: URL: https://github.com/apache/spark/pull/39544#discussion_r1068966627 ## python/pyspark/sql/connect/session.py: ## @@ -215,47 +219,37 @@ def createDataFrame( _inferred_schema: Optional[StructType] = None if

[GitHub] [spark] dongjoon-hyun commented on pull request #39549: [SPARK-42046][TESTS] Add `connect-client-jvm` to `connect` module

2023-01-12 Thread GitBox
dongjoon-hyun commented on PR #39549: URL: https://github.com/apache/spark/pull/39549#issuecomment-1381362037 cc @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #39538: [SPARK-41596][SS][DOCS] Document the new feature "Async Progress Tracking" to Structured Streaming guide doc

2023-01-12 Thread GitBox
HeartSaVioR commented on code in PR #39538: URL: https://github.com/apache/spark/pull/39538#discussion_r1068980633 ## docs/structured-streaming-programming-guide.md: ## @@ -3569,7 +3569,63 @@ the effect of the change is not well-defined. For all of them: structures into

[GitHub] [spark] dongjoon-hyun commented on pull request #39549: [SPARK-42046][TESTS] Add `connect-client-jvm` to `connect` module

2023-01-12 Thread GitBox
dongjoon-hyun commented on PR #39549: URL: https://github.com/apache/spark/pull/39549#issuecomment-1381418225 Oh, one test seems to fail in GitHub Action environment. ``` [info] - Check URI: sc://localhost:123/, isCorrect: true *** FAILED *** (6 milliseconds) [info]

[GitHub] [spark] HyukjinKwon closed pull request #39542: [SPARK-41591][PYTHON][ML][FOLLOW-UP] Fix type hints that are incompatible with Python <= 3.8

2023-01-12 Thread GitBox
HyukjinKwon closed pull request #39542: [SPARK-41591][PYTHON][ML][FOLLOW-UP] Fix type hints that are incompatible with Python <= 3.8 URL: https://github.com/apache/spark/pull/39542 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub
