[GitHub] [spark] ivoson commented on a diff in pull request #39410: [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile

2023-01-08 Thread GitBox
ivoson commented on code in PR #39410: URL: https://github.com/apache/spark/pull/39410#discussion_r1064356843 ## core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala: ## @@ -403,6 +405,92 @@ class CoarseGrainedSchedulerBackendSuite extends

[GitHub] [spark] WangGuangxin commented on a diff in pull request #38877: [SPARK-41361] [SQL] Invalid call toAttribute on unresolved object exception caused by WidenSetOperationTypes

2023-01-08 Thread GitBox
WangGuangxin commented on code in PR #38877: URL: https://github.com/apache/spark/pull/38877#discussion_r1064352128 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala: ## @@ -32,7 +32,13 @@ case class ScriptTransformation(

[GitHub] [spark] MaxGekk commented on a diff in pull request #39464: [SPARK-41947][CORE][DOCS] Update the contents of error class guidelines

2023-01-08 Thread GitBox
MaxGekk commented on code in PR #39464: URL: https://github.com/apache/spark/pull/39464#discussion_r1064350822 ## core/src/main/resources/error/README.md: ## @@ -24,27 +24,27 @@ Throw with arbitrary error message: ### After -`error-class.json` +`error-classes.json`

[GitHub] [spark] MaxGekk closed pull request #39282: [SPARK-41581][SQL] Update `_LEGACY_ERROR_TEMP_1230` as `INTERNAL_ERROR`

2023-01-08 Thread GitBox
MaxGekk closed pull request #39282: [SPARK-41581][SQL] Update `_LEGACY_ERROR_TEMP_1230` as `INTERNAL_ERROR` URL: https://github.com/apache/spark/pull/39282 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] MaxGekk commented on pull request #39282: [SPARK-41581][SQL] Update `_LEGACY_ERROR_TEMP_1230` as `INTERNAL_ERROR`

2023-01-08 Thread GitBox
MaxGekk commented on PR #39282: URL: https://github.com/apache/spark/pull/39282#issuecomment-1375213258 +1, LGTM. Merging to master. Thank you, @itholic. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] itholic opened a new pull request, #39464: [SPARK-41947][CORE][DOCS] Update the contents of error class guidelines

2023-01-08 Thread GitBox
itholic opened a new pull request, #39464: URL: https://github.com/apache/spark/pull/39464 ### What changes were proposed in this pull request? This PR proposes to update error class guidelines for `core/src/main/resources/error/README.md`. ### Why are the changes

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39461: [WIP][SPARK-41945][CONNECT][PYTHON] Python: connect client lost column data with pyarrow.Table.to_pylist

2023-01-08 Thread GitBox
zhengruifeng commented on code in PR #39461: URL: https://github.com/apache/spark/pull/39461#discussion_r1064336239 ## python/pyspark/sql/tests/connect/test_parity_functions.py: ## @@ -90,8 +90,6 @@ def test_nested_higher_order_function(self): def

[GitHub] [spark] zhengruifeng commented on pull request #39461: [WIP][SPARK-41945][CONNECT][PYTHON] Python: connect client lost column data with pyarrow.Table.to_pylist

2023-01-08 Thread GitBox
zhengruifeng commented on PR #39461: URL: https://github.com/apache/spark/pull/39461#issuecomment-1375197987 due to the duplicated column names? can you update the example with a simpler one? -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] zhengruifeng commented on pull request #39388: [SPARK-41354][CONNECT][PYTHON] Implement RepartitionByExpression

2023-01-08 Thread GitBox
zhengruifeng commented on PR #39388: URL: https://github.com/apache/spark/pull/39388#issuecomment-1375190045 merged into master, thank you @dengziming for working on this! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] zhengruifeng closed pull request #39388: [SPARK-41354][CONNECT][PYTHON] Implement RepartitionByExpression

2023-01-08 Thread GitBox
zhengruifeng closed pull request #39388: [SPARK-41354][CONNECT][PYTHON] Implement RepartitionByExpression URL: https://github.com/apache/spark/pull/39388 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39410: [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile

2023-01-08 Thread GitBox
dongjoon-hyun commented on code in PR #39410: URL: https://github.com/apache/spark/pull/39410#discussion_r1064326242 ## core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala: ## @@ -403,6 +405,92 @@ class CoarseGrainedSchedulerBackendSuite

[GitHub] [spark] zhengruifeng commented on pull request #39456: [SPARK-41904][CONNECT][PYTHON] Fix Function `nth_value` functions output

2023-01-08 Thread GitBox
zhengruifeng commented on PR #39456: URL: https://github.com/apache/spark/pull/39456#issuecomment-1375171107 @beliefer you may also need to change

[GitHub] [spark] EnricoMi commented on a diff in pull request #39431: [SPARK-41914][SQL] FileFormatWriter materializes AQE plan before accessing outputOrdering

2023-01-08 Thread GitBox
EnricoMi commented on code in PR #39431: URL: https://github.com/apache/spark/pull/39431#discussion_r1064317964 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala: ## @@ -138,9 +144,21 @@ object FileFormatWriter extends Logging {

[GitHub] [spark] HyukjinKwon commented on pull request #39463: [SPARK-41944][CONNECT] Pass configurations when local remote mode is on

2023-01-08 Thread GitBox
HyukjinKwon commented on PR #39463: URL: https://github.com/apache/spark/pull/39463#issuecomment-1375166461 cc @zhengruifeng @amaliujia FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon opened a new pull request, #39463: [SPARK-41944][CONNECT] Pass configurations when local remote mode is on

2023-01-08 Thread GitBox
HyukjinKwon opened a new pull request, #39463: URL: https://github.com/apache/spark/pull/39463 ### What changes were proposed in this pull request? This PR mainly proposes to pass the user-specified configurations to local remote mode. Previously, all user-specific

[GitHub] [spark] smallzhongfeng commented on pull request #39448: [SPARK-41943][CORE] Use java api to create files and grant permissions

2023-01-08 Thread GitBox
smallzhongfeng commented on PR #39448: URL: https://github.com/apache/spark/pull/39448#issuecomment-1375166196 > There was a long discussion thread regarding this implementation in this [PR](https://github.com/apache/spark/pull/35085#discussion_r786892744). There will be some issue with

[GitHub] [spark] zhengruifeng opened a new pull request, #39462: [SPARK-41879][CONNECT][PYTHON] Make `DataFrame.collect` support nested types

2023-01-08 Thread GitBox
zhengruifeng opened a new pull request, #39462: URL: https://github.com/apache/spark/pull/39462 ### What changes were proposed in this pull request? Make `DataFrame.collect` support nested types, by introducing a new data converter ### Why are the changes needed? to be

[GitHub] [spark] dengziming commented on pull request #39388: [SPARK-41354][CONNECT][PYTHON] Implement RepartitionByExpression

2023-01-08 Thread GitBox
dengziming commented on PR #39388: URL: https://github.com/apache/spark/pull/39388#issuecomment-1375160517 > @dengziming would you mind resolving the conflicts? thanks Done! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] beliefer opened a new pull request, #39461: [SPARK-41945][CONNECT][PYTHON] Python: connect client lost column data with pyarrow.Table.to_pylist

2023-01-08 Thread GitBox
beliefer opened a new pull request, #39461: URL: https://github.com/apache/spark/pull/39461 ### What changes were proposed in this pull request? Python: connect client should not use pyarrow.Table.to_pylist to transform fetched data. For example: the data in pyarrow.Table show

[GitHub] [spark] cloud-fan commented on pull request #39333: [SPARK-41805][SQL] Reuse expressions in WindowSpecDefinition

2023-01-08 Thread GitBox
cloud-fan commented on PR #39333: URL: https://github.com/apache/spark/pull/39333#issuecomment-1375126878 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan closed pull request #39333: [SPARK-41805][SQL] Reuse expressions in WindowSpecDefinition

2023-01-08 Thread GitBox
cloud-fan closed pull request #39333: [SPARK-41805][SQL] Reuse expressions in WindowSpecDefinition URL: https://github.com/apache/spark/pull/39333 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] zhouyejoe commented on pull request #39448: [SPARK-41943][CORE] Use java api to create files and grant permissions

2023-01-08 Thread GitBox
zhouyejoe commented on PR #39448: URL: https://github.com/apache/spark/pull/39448#issuecomment-1375124798 There was a long discussion thread regarding this implementation in this [PR](https://github.com/apache/spark/pull/35085#discussion_r786892744). There will be some issue with setgid.

[GitHub] [spark] LuciferYang commented on pull request #39458: [SPARK-41941][BUILD] Upgrade `scalatest` related test dependencies to 3.2.15

2023-01-08 Thread GitBox
LuciferYang commented on PR #39458: URL: https://github.com/apache/spark/pull/39458#issuecomment-1375122436 Thanks @dongjoon-hyun @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] dongjoon-hyun commented on pull request #39458: [SPARK-41941][BUILD] Upgrade `scalatest` related test dependencies to 3.2.15

2023-01-08 Thread GitBox
dongjoon-hyun commented on PR #39458: URL: https://github.com/apache/spark/pull/39458#issuecomment-1375122255 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun closed pull request #39458: [SPARK-41941][BUILD] Upgrade `scalatest` related test dependencies to 3.2.15

2023-01-08 Thread GitBox
dongjoon-hyun closed pull request #39458: [SPARK-41941][BUILD] Upgrade `scalatest` related test dependencies to 3.2.15 URL: https://github.com/apache/spark/pull/39458 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] itholic commented on a diff in pull request #39387: [SPARK-41586][PYTHON] Introduce `pyspark.errors` and error classes for PySpark.

2023-01-08 Thread GitBox
itholic commented on code in PR #39387: URL: https://github.com/apache/spark/pull/39387#discussion_r1064286705 ## python/pyspark/errors/tests/test_errors.py: ## @@ -0,0 +1,48 @@ +# -*- encoding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] itholic commented on a diff in pull request #39387: [SPARK-41586][PYTHON] Introduce `pyspark.errors` and error classes for PySpark.

2023-01-08 Thread GitBox
itholic commented on code in PR #39387: URL: https://github.com/apache/spark/pull/39387#discussion_r1064285339 ## python/pyspark/testing/utils.py: ## @@ -138,6 +140,32 @@ def setUpClass(cls): def tearDownClass(cls): cls.sc.stop() +def check_error( +

[GitHub] [spark] cloud-fan commented on pull request #38163: [SPARK-40711][SQL] Add spill size metrics for window

2023-01-08 Thread GitBox
cloud-fan commented on PR #38163: URL: https://github.com/apache/spark/pull/38163#issuecomment-1375102383 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan closed pull request #38163: [SPARK-40711][SQL] Add spill size metrics for window

2023-01-08 Thread GitBox
cloud-fan closed pull request #38163: [SPARK-40711][SQL] Add spill size metrics for window URL: https://github.com/apache/spark/pull/38163 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] smallzhongfeng commented on a diff in pull request #39448: [SPARK-41943][CORE] Use java api to create files and grant permissions

2023-01-08 Thread GitBox
smallzhongfeng commented on code in PR #39448: URL: https://github.com/apache/spark/pull/39448#discussion_r1064268534 ## core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala: ## @@ -301,9 +301,6 @@ private[spark] class DiskBlockManager( * Create a directory

[GitHub] [spark] beliefer commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-01-08 Thread GitBox
beliefer commented on PR #39091: URL: https://github.com/apache/spark/pull/39091#issuecomment-1375067246 > In particular, the discussion on the `isObservation` flag in the proto message needs to be addressed to simplify. Hi, @grundprinzip . In fact, I removed the `Observation` that

[GitHub] [spark] cloud-fan commented on a diff in pull request #39431: [SPARK-41914][SQL] FileFormatWriter materializes AQE plan before accessing outputOrdering

2023-01-08 Thread GitBox
cloud-fan commented on code in PR #39431: URL: https://github.com/apache/spark/pull/39431#discussion_r1064261329 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/V1WriteCommandSuite.scala: ## @@ -181,13 +178,111 @@ class V1WriteCommandSuite extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #39431: [SPARK-41914][SQL] FileFormatWriter materializes AQE plan before accessing outputOrdering

2023-01-08 Thread GitBox
cloud-fan commented on code in PR #39431: URL: https://github.com/apache/spark/pull/39431#discussion_r1064261228 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala: ## @@ -138,9 +144,21 @@ object FileFormatWriter extends Logging {

[GitHub] [spark] cloud-fan commented on pull request #35709: [SPARK-38389][SQL] Add the `DATEDIFF()` and `DATE_DIFF()` aliases for `TIMESTAMPDIFF()`

2023-01-08 Thread GitBox
cloud-fan commented on PR #35709: URL: https://github.com/apache/spark/pull/35709#issuecomment-1375060899 Maybe we should deprecate the old `datediff` function. The new `datediff` function has a special parser rule so that it won't conflict with the old one, but I agree that 2 `datediff`

[GitHub] [spark] wankunde opened a new pull request, #39460: [SPARK-39217][SQL] Makes DPP support the pruning side has Union

2023-01-08 Thread GitBox
wankunde opened a new pull request, #39460: URL: https://github.com/apache/spark/pull/39460 ### What changes were proposed in this pull request? Makes DPP support the pruning side has `Union`. For example: ```sql SELECT f.store_id, f.date_id, s.state_province

[GitHub] [spark] itholic commented on a diff in pull request #39260: [SPARK-41579][SQL] Assign name to _LEGACY_ERROR_TEMP_1249

2023-01-08 Thread GitBox
itholic commented on code in PR #39260: URL: https://github.com/apache/spark/pull/39260#discussion_r1064249717 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala: ## @@ -2166,6 +2166,22 @@ abstract class DDLSuite extends QueryTest with

[GitHub] [spark] LuciferYang commented on pull request #39406: [SPARK-41894][SS][TESTS] Restore the write permission of `commitDir` after run `testAsyncWriteErrorsPermissionsIssue`

2023-01-08 Thread GitBox
LuciferYang commented on PR #39406: URL: https://github.com/apache/spark/pull/39406#issuecomment-1375032421 Thanks @HeartSaVioR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] itholic commented on pull request #39282: [SPARK-41581][SQL] Update `_LEGACY_ERROR_TEMP_1230` as `INTERNAL_ERROR`

2023-01-08 Thread GitBox
itholic commented on PR #39282: URL: https://github.com/apache/spark/pull/39282#issuecomment-1375031781 Update the title & description, thanks :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] HeartSaVioR closed pull request #39406: [SPARK-41894][SS][TESTS] Restore the write permission of `commitDir` after run `testAsyncWriteErrorsPermissionsIssue`

2023-01-08 Thread GitBox
HeartSaVioR closed pull request #39406: [SPARK-41894][SS][TESTS] Restore the write permission of `commitDir` after run `testAsyncWriteErrorsPermissionsIssue` URL: https://github.com/apache/spark/pull/39406 -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] HeartSaVioR commented on pull request #39406: [SPARK-41894][SS][TESTS] Restore the write permission of `commitDir` after run `testAsyncWriteErrorsPermissionsIssue`

2023-01-08 Thread GitBox
HeartSaVioR commented on PR #39406: URL: https://github.com/apache/spark/pull/39406#issuecomment-1375031024 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] itholic commented on a diff in pull request #39389: [SPARK-41574][SQL] Update `_LEGACY_ERROR_TEMP_2009` as `INTERNAL_ERROR`.

2023-01-08 Thread GitBox
itholic commented on code in PR #39389: URL: https://github.com/apache/spark/pull/39389#discussion_r1064247087 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -378,10 +378,9 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] itholic commented on pull request #39394: [SPARK-41575][SQL] Assign name to _LEGACY_ERROR_TEMP_2054

2023-01-08 Thread GitBox
itholic commented on PR #39394: URL: https://github.com/apache/spark/pull/39394#issuecomment-1375027707 Sounds good. Just exposed `path` to error message. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] ulysses-you commented on a diff in pull request #39431: [SPARK-41914][SQL] FileFormatWriter materializes AQE plan before accessing outputOrdering

2023-01-08 Thread GitBox
ulysses-you commented on code in PR #39431: URL: https://github.com/apache/spark/pull/39431#discussion_r1064243471 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala: ## @@ -138,9 +144,21 @@ object FileFormatWriter extends Logging {

[GitHub] [spark] ulysses-you commented on pull request #38163: [SPARK-40711][SQL] Add spill size metrics for window

2023-01-08 Thread GitBox
ulysses-you commented on PR #38163: URL: https://github.com/apache/spark/pull/38163#issuecomment-1375015821 cc @cloud-fan @HyukjinKwon if you find some time to take another look, thank you -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] LuciferYang commented on pull request #39458: [SPARK-41941][BUILD] Upgrade `scalatest` related test dependencies to 3.2.15

2023-01-08 Thread GitBox
LuciferYang commented on PR #39458: URL: https://github.com/apache/spark/pull/39458#issuecomment-1375013705 re-trigger the failed task -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] ulysses-you commented on a diff in pull request #39277: [SPARK-41708][SQL] Pull v1write information to `WriteFiles`

2023-01-08 Thread GitBox
ulysses-you commented on code in PR #39277: URL: https://github.com/apache/spark/pull/39277#discussion_r1064238626 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala: ## @@ -294,3 +285,40 @@ case class InsertIntoHiveTable( override

[GitHub] [spark] HyukjinKwon closed pull request #39368: [SPARK-28764][CORE][TEST] Remove writePartitionedFile in ExternalSorter

2023-01-08 Thread GitBox
HyukjinKwon closed pull request #39368: [SPARK-28764][CORE][TEST] Remove writePartitionedFile in ExternalSorter URL: https://github.com/apache/spark/pull/39368 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HyukjinKwon commented on pull request #39368: [SPARK-28764][CORE][TEST] Remove writePartitionedFile in ExternalSorter

2023-01-08 Thread GitBox
HyukjinKwon commented on PR #39368: URL: https://github.com/apache/spark/pull/39368#issuecomment-1374983004 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39448: [SPARK-41943][CORE] Use java api to create files and grant permissions

2023-01-08 Thread GitBox
HyukjinKwon commented on code in PR #39448: URL: https://github.com/apache/spark/pull/39448#discussion_r1064230294 ## core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala: ## @@ -301,9 +301,6 @@ private[spark] class DiskBlockManager( * Create a directory that

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39456: [SPARK-41904][CONNECT][PYTHON] Fix Function `nth_value` functions output

2023-01-08 Thread GitBox
HyukjinKwon commented on code in PR #39456: URL: https://github.com/apache/spark/pull/39456#discussion_r1064229998 ## python/pyspark/sql/tests/connect/test_parity_functions.py: ## @@ -122,8 +122,6 @@ def test_nested_higher_order_function(self): def

[GitHub] [spark] HyukjinKwon closed pull request #39453: [SPARK-41938][BUILD] Upgrade sbt to 1.8.2

2023-01-08 Thread GitBox
HyukjinKwon closed pull request #39453: [SPARK-41938][BUILD] Upgrade sbt to 1.8.2 URL: https://github.com/apache/spark/pull/39453 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon commented on pull request #39453: [SPARK-41938][BUILD] Upgrade sbt to 1.8.2

2023-01-08 Thread GitBox
HyukjinKwon commented on PR #39453: URL: https://github.com/apache/spark/pull/39453#issuecomment-1374979616 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon closed pull request #39454: [SPARK-41937][R] Fix error in R (>= 4.2.0) for SparkR datetime column comparing with Sys.time()

2023-01-08 Thread GitBox
HyukjinKwon closed pull request #39454: [SPARK-41937][R] Fix error in R (>= 4.2.0) for SparkR datetime column comparing with Sys.time() URL: https://github.com/apache/spark/pull/39454 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] HyukjinKwon commented on pull request #39454: [SPARK-41937][R] Fix error in R (>= 4.2.0) for SparkR datetime column comparing with Sys.time()

2023-01-08 Thread GitBox
HyukjinKwon commented on PR #39454: URL: https://github.com/apache/spark/pull/39454#issuecomment-1374979008 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] github-actions[bot] commented on pull request #37759: [SPARK-40306][SQL]Support more than Integer.MAX_VALUE of the same join key

2023-01-08 Thread GitBox
github-actions[bot] commented on PR #37759: URL: https://github.com/apache/spark/pull/37759#issuecomment-1374972033 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #36850: [SPARK-39069][SQL] Enhance ConstantPropagation to replace constants in inequality predicates

2023-01-08 Thread GitBox
github-actions[bot] commented on PR #36850: URL: https://github.com/apache/spark/pull/36850#issuecomment-1374972048 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #36588: [SPARK-39217][SQL] Makes DPP support the pruning side has Union

2023-01-08 Thread GitBox
github-actions[bot] closed pull request #36588: [SPARK-39217][SQL] Makes DPP support the pruning side has Union URL: https://github.com/apache/spark/pull/36588 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] vicennial commented on a diff in pull request #39361: [SPARK-41822][CONNECT] Setup gRPC connection for Scala/JVM client

2023-01-08 Thread GitBox
vicennial commented on code in PR #39361: URL: https://github.com/apache/spark/pull/39361#discussion_r1064210276 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClient.scala: ## @@ -17,31 +17,147 @@ package

[GitHub] [spark] vicennial commented on a diff in pull request #39361: [SPARK-41822][CONNECT] Setup gRPC connection for Scala/JVM client

2023-01-08 Thread GitBox
vicennial commented on code in PR #39361: URL: https://github.com/apache/spark/pull/39361#discussion_r1064217874 ## connector/connect/client/jvm/pom.xml: ## @@ -52,6 +53,12 @@ ${protobuf.version} compile + + com.google.guava + guava +

[GitHub] [spark] grundprinzip commented on a diff in pull request #39456: [SPARK-41904][CONNECT][PYTHON] Fix Function `nth_value` functions output

2023-01-08 Thread GitBox
grundprinzip commented on code in PR #39456: URL: https://github.com/apache/spark/pull/39456#discussion_r1064210177 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -209,11 +209,15 @@ message Expression { // (Required) Indicate if this

[GitHub] [spark] grundprinzip commented on pull request #39361: [SPARK-41822][CONNECT] Setup gRPC connection for Scala/JVM client

2023-01-08 Thread GitBox
grundprinzip commented on PR #39361: URL: https://github.com/apache/spark/pull/39361#issuecomment-1374940010 Hi @vicennial, please resolve the addressed comments for easier review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] grundprinzip commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-01-08 Thread GitBox
grundprinzip commented on PR #39091: URL: https://github.com/apache/spark/pull/39091#issuecomment-1374939250 Hi @beliefer, when you're ready for another round of reviews, I would suggest to resolve the comments that you think you have addressed because otherwise it's going to be

[GitHub] [spark] grundprinzip commented on pull request #38879: [SPARK-41362][CONNECT][PYTHON] Better error messages for invalid argument types.

2023-01-08 Thread GitBox
grundprinzip commented on PR #38879: URL: https://github.com/apache/spark/pull/38879#issuecomment-1374938706 Closing for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] grundprinzip closed pull request #38879: [SPARK-41362][CONNECT][PYTHON] Better error messages for invalid argument types.

2023-01-08 Thread GitBox
grundprinzip closed pull request #38879: [SPARK-41362][CONNECT][PYTHON] Better error messages for invalid argument types. URL: https://github.com/apache/spark/pull/38879 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] warrenzhu25 commented on a diff in pull request #39280: [SPARK-41766][CORE] Handle decommission request sent before executor registration

2023-01-08 Thread GitBox
warrenzhu25 commented on code in PR #39280: URL: https://github.com/apache/spark/pull/39280#discussion_r1064192028 ## core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala: ## @@ -102,6 +104,15 @@ class

[GitHub] [spark] wankunde closed pull request #39457: [WIP][SPARK-41940][SQL] Infer IsNotNull constraints for complex join expressions

2023-01-08 Thread GitBox
wankunde closed pull request #39457: [WIP][SPARK-41940][SQL] Infer IsNotNull constraints for complex join expressions URL: https://github.com/apache/spark/pull/39457 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] wankunde commented on pull request #39457: [WIP][SPARK-41940][SQL] Infer IsNotNull constraints for complex join expressions

2023-01-08 Thread GitBox
wankunde commented on PR #39457: URL: https://github.com/apache/spark/pull/39457#issuecomment-1374887807 This PR may infer too many unnecessary constraints. Maybe we can add a `MayBeNull` trait for expressions whose output may evaluate to null even when all inputs are non-null. And

[GitHub] [spark] smallzhongfeng commented on pull request #39448: [CORE] Use java api to create files and grant permissions

2023-01-08 Thread GitBox
smallzhongfeng commented on PR #39448: URL: https://github.com/apache/spark/pull/39448#issuecomment-1374881384 cc @cloud-fan @HyukjinKwon @LuciferYang @zhouyejoe Hope to get your opinion. :) -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] ivoson opened a new pull request, #39459: [WIP][SPARK-41497][CORE] Fixing accumulator undercount in the case of the retry task with rdd cache.

2023-01-08 Thread GitBox
ivoson opened a new pull request, #39459: URL: https://github.com/apache/spark/pull/39459 ### What changes were proposed in this pull request? Make an RDD block (RDD cache) available only when the task that generates the block succeeds. ### Why are the changes needed? Fixing the bug as

[GitHub] [spark] ivoson commented on a diff in pull request #39410: [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile

2023-01-08 Thread GitBox
ivoson commented on code in PR #39410: URL: https://github.com/apache/spark/pull/39410#discussion_r1064168456 ## core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala: ## @@ -262,9 +269,11 @@ private[spark] class CoarseGrainedExecutorBackend(

[GitHub] [spark] ivoson commented on a diff in pull request #39410: [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile

2023-01-08 Thread GitBox
ivoson commented on code in PR #39410: URL: https://github.com/apache/spark/pull/39410#discussion_r1064168342 ## core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala: ## @@ -403,6 +405,92 @@ class CoarseGrainedSchedulerBackendSuite extends

[GitHub] [spark] ivoson commented on a diff in pull request #39410: [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile

2023-01-08 Thread GitBox
ivoson commented on code in PR #39410: URL: https://github.com/apache/spark/pull/39410#discussion_r1064167868 ## core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala: ## @@ -403,6 +405,92 @@ class CoarseGrainedSchedulerBackendSuite extends

[GitHub] [spark] ivoson commented on a diff in pull request #39410: [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile

2023-01-08 Thread GitBox
ivoson commented on code in PR #39410: URL: https://github.com/apache/spark/pull/39410#discussion_r1064167770 ## core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala: ## @@ -403,6 +405,92 @@ class CoarseGrainedSchedulerBackendSuite extends

[GitHub] [spark] ivoson commented on a diff in pull request #39410: [SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile

2023-01-08 Thread GitBox
ivoson commented on code in PR #39410: URL: https://github.com/apache/spark/pull/39410#discussion_r1064156234 ## core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala: ## @@ -262,9 +269,11 @@ private[spark] class CoarseGrainedExecutorBackend(

[GitHub] [spark] Daniel-Davies commented on a diff in pull request #38867: [SPARK-41234][SQL][PYTHON] Add `array_insert` function

2023-01-08 Thread GitBox
Daniel-Davies commented on code in PR #38867: URL: https://github.com/apache/spark/pull/38867#discussion_r1064153224 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4601,6 +4601,231 @@ case class ArrayExcept(left:

[GitHub] [spark] Daniel-Davies commented on a diff in pull request #38867: [SPARK-41234][SQL][PYTHON] Add `array_insert` function

2023-01-08 Thread GitBox
Daniel-Davies commented on code in PR #38867: URL: https://github.com/apache/spark/pull/38867#discussion_r1059780745 ## sql/core/src/test/resources/sql-tests/results/array.sql.out: ## @@ -427,6 +427,103 @@ struct NULL +-- !query +select array_insert(array(1, 2, 3), 4, 4)

[GitHub] [spark] Daniel-Davies commented on a diff in pull request #38867: [SPARK-41234][SQL][PYTHON] Add `array_insert` function

2023-01-08 Thread GitBox
Daniel-Davies commented on code in PR #38867: URL: https://github.com/apache/spark/pull/38867#discussion_r1064152931 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4601,6 +4601,231 @@ case class ArrayExcept(left:

[GitHub] [spark] LuciferYang opened a new pull request, #39458: [SPARK-41941][BUILD] Upgrade `scalatest` related test dependencies to 3.2.15

2023-01-08 Thread GitBox
LuciferYang opened a new pull request, #39458: URL: https://github.com/apache/spark/pull/39458 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] LuciferYang commented on pull request #39406: [SPARK-41894][SS][TESTS] Restore the write permission of `commitDir` after run `testAsyncWriteErrorsPermissionsIssue`

2023-01-08 Thread GitBox
LuciferYang commented on PR #39406: URL: https://github.com/apache/spark/pull/39406#issuecomment-1374832851 friendly ping @HyukjinKwon @HeartSaVioR

[GitHub] [spark] MaxGekk commented on pull request #39332: [WIP][SPARK-40822][SQL] Stable derived column aliases

2023-01-08 Thread GitBox
MaxGekk commented on PR #39332: URL: https://github.com/apache/spark/pull/39332#issuecomment-1374816921 @cloud-fan @srielau Could you please review the generation of column aliases?

[GitHub] [spark] xinrong-meng commented on a diff in pull request #39384: [SPARK-40307][PYTHON] Introduce Arrow-optimized Python UDFs

2023-01-08 Thread GitBox
xinrong-meng commented on code in PR #39384: URL: https://github.com/apache/spark/pull/39384#discussion_r1064122141 ## python/pyspark/sql/tests/test_arrow_python_udf.py: ## @@ -0,0 +1,131 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] [spark] xinrong-meng commented on a diff in pull request #39384: [SPARK-40307][PYTHON] Introduce Arrow-optimized Python UDFs

2023-01-08 Thread GitBox
xinrong-meng commented on code in PR #39384: URL: https://github.com/apache/spark/pull/39384#discussion_r1064122105 ## python/pyspark/sql/udf.py: ## @@ -75,6 +81,104 @@ def _create_udf( return udf_obj._wrapped() +def _create_py_udf( +f: Callable[..., Any], +

[GitHub] [spark] wankunde opened a new pull request, #39457: [SPARK-41940][SQL] Infer IsNotNull constraints for complex join expressions

2023-01-08 Thread GitBox
wankunde opened a new pull request, #39457: URL: https://github.com/apache/spark/pull/39457 ### What changes were proposed in this pull request? Infer IsNotNull constraints for complex join expressions along with IsNotNull constraints for the attribute. For example,

[GitHub] [spark] techaddict commented on a diff in pull request #39450: [SPARK-41897][CONNECT][TESTS] Enable tests with error mismatch in connect/test_parity_functions.py

2023-01-08 Thread GitBox
techaddict commented on code in PR #39450: URL: https://github.com/apache/spark/pull/39450#discussion_r1064101499 ## python/pyspark/sql/tests/test_functions.py: ## @@ -24,6 +24,7 @@ from py4j.protocol import Py4JJavaError from pyspark.sql import Row, Window, types +from

[GitHub] [spark] techaddict commented on a diff in pull request #39451: [SPARK-41832][CONNECT][PYTHON] Fix `DataFrame.unionByName`, add allow_missing_columns

2023-01-08 Thread GitBox
techaddict commented on code in PR #39451: URL: https://github.com/apache/spark/pull/39451#discussion_r1064101288 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1123,7 +1123,7 @@ class