[GitHub] [spark] amaliujia commented on pull request #38899: [SPARK-41349][CONNECT] Implement DataFrame.hint

2022-12-04 Thread GitBox
amaliujia commented on PR #38899: URL: https://github.com/apache/spark/pull/38899#issuecomment-1336537633 @dengziming there are several techniques that can help you: when you are working on Scala, you can run `./dev/lint-scala` locally, which runs the Scala-side lint check. One of the check

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38884: [SPARK-41363][CONNECT][PYTHON] Implement normal functions

2022-12-04 Thread GitBox
zhengruifeng commented on code in PR #38884: URL: https://github.com/apache/spark/pull/38884#discussion_r1039063296 ## python/pyspark/sql/connect/functions.py: ## @@ -89,6 +93,505 @@ def lit(col: Any) -> Column: return Column(LiteralExpression(col)) +# def

[GitHub] [spark] panbingkun commented on a diff in pull request #38861: [SPARK-41294][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1203 / 1168

2022-12-04 Thread GitBox
panbingkun commented on code in PR #38861: URL: https://github.com/apache/spark/pull/38861#discussion_r1039071288 ## core/src/main/resources/error/error-classes.json: ## @@ -876,6 +876,13 @@ ], "sqlState" : "42000" }, + "NOT_ENOUGH_DATA_COLUMNS" : { +"message"

[GitHub] [spark] zwangsheng commented on pull request #38202: [SPARK-40763][K8S] Should expose driver service name to config for user features

2022-12-04 Thread GitBox
zwangsheng commented on PR #38202: URL: https://github.com/apache/spark/pull/38202#issuecomment-1336652652 > Could you add a test case please, @zwangsheng ? Thanks for your reminder, the corresponding unit test has been added. -- This is an automated message from the Apache Git

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38202: [SPARK-40763][K8S] Should expose driver service name to config for user features

2022-12-04 Thread GitBox
dongjoon-hyun commented on code in PR #38202: URL: https://github.com/apache/spark/pull/38202#discussion_r1039122836 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala: ## @@ -50,6 +50,8 @@ private[spark]

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1039142575 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,69 @@ case class ArrayExcept(left:

[GitHub] [spark] mridulm commented on pull request #38901: [SPARK-41376][CORE] Correct the Netty preferDirectBufs check logic on executor start

2022-12-04 Thread GitBox
mridulm commented on PR #38901: URL: https://github.com/apache/spark/pull/38901#issuecomment-1336751517 +CC @Ngone51 (who last updated this) and @cloud-fan (who merged the commit). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] cloud-fan commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
cloud-fan commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039153768 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self, *cols:

[GitHub] [spark] cloud-fan commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
cloud-fan commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039153585 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self, *cols:

[GitHub] [spark] dongjoon-hyun commented on pull request #38909: [SPARK-41385][K8S] Replace deprecated `.newInstance()` in K8s module

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #38909: URL: https://github.com/apache/spark/pull/38909#issuecomment-1336810296 All tests (except documentation generation) are finished. This PR is irrelevant to the doc generation. Merged to master/3.3. -- This is an automated message from the Apache

[GitHub] [spark] dongjoon-hyun closed pull request #38909: [SPARK-41385][K8S] Replace deprecated `.newInstance()` in K8s module

2022-12-04 Thread GitBox
dongjoon-hyun closed pull request #38909: [SPARK-41385][K8S] Replace deprecated `.newInstance()` in K8s module URL: https://github.com/apache/spark/pull/38909 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] Yaohua628 commented on pull request #38910: [SPARK-41151][FOLLOW-UP][SQL][3.3] Keep built-in file _metadata fields nullable value consistent

2022-12-04 Thread GitBox
Yaohua628 commented on PR #38910: URL: https://github.com/apache/spark/pull/38910#issuecomment-1336816189 @cloud-fan Here's the 3.3 cherry-pick, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] wankunde commented on pull request #38682: [SPARK-41167][SQL] Improve multi like performance by creating a balanced expression tree predicate

2022-12-04 Thread GitBox
wankunde commented on PR #38682: URL: https://github.com/apache/spark/pull/38682#issuecomment-1336852190 Retest this please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] DenineLu commented on pull request #35806: [SPARK-38505][SQL] Make partial aggregation adaptive

2022-12-04 Thread GitBox
DenineLu commented on PR #35806: URL: https://github.com/apache/spark/pull/35806#issuecomment-1336634413 I was interested in working on this, but I tested it with an online production task and found that the performance was regressing. Even though the aggregation time is shortened, the

[GitHub] [spark] pan3793 commented on a diff in pull request #38901: [SPARK-41376][CORE] Executor netty direct memory check should respect spark.shuffle.io.preferDirectBufs

2022-12-04 Thread GitBox
pan3793 commented on code in PR #38901: URL: https://github.com/apache/spark/pull/38901#discussion_r1039086498 ## core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala: ## @@ -85,7 +85,8 @@ private[spark] class CoarseGrainedExecutorBackend(

[GitHub] [spark] ulysses-you commented on a diff in pull request #38875: [SPARK-40988][SQL][TEST] Test case for insert partition should verify value

2022-12-04 Thread GitBox
ulysses-you commented on code in PR #38875: URL: https://github.com/apache/spark/pull/38875#discussion_r1039123347 ## sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala: ## @@ -2313,6 +2313,33 @@ class InsertSuite extends DataSourceTest with

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1039114830 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,118 @@ case class ArrayExcept(left:

[GitHub] [spark] dongjoon-hyun opened a new pull request, #38907: [SPARK-40379][K8S][FOLLOWUP] Fix scalastyle failure

2022-12-04 Thread GitBox
dongjoon-hyun opened a new pull request, #38907: URL: https://github.com/apache/spark/pull/38907 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] dongjoon-hyun commented on pull request #37821: [SPARK-40379][K8S] Propagate decommission executor loss reason in K8s

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #37821: URL: https://github.com/apache/spark/pull/37821#issuecomment-1336723918 Oh, the fixed indentation causes scalastyle failure. :( -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] mridulm commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

2022-12-04 Thread GitBox
mridulm commented on code in PR #38876: URL: https://github.com/apache/spark/pull/38876#discussion_r1039131641 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -583,7 +586,12 @@ class BlockManagerMasterEndpoint( val time =

[GitHub] [spark] dongjoon-hyun commented on pull request #38909: [SPARK-41385][K8S] Replace deprecated `.newInstance()` in K8s module

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #38909: URL: https://github.com/apache/spark/pull/38909#issuecomment-1336800566 Thank you for review and approval, @Yikun . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] cloud-fan commented on a diff in pull request #38877: [SPARK-41361] [SQL] Invalid call toAttribute on unresolved object exception caused by WidenSetOperationTypes

2022-12-04 Thread GitBox
cloud-fan commented on code in PR #38877: URL: https://github.com/apache/spark/pull/38877#discussion_r1039178253 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala: ## @@ -32,7 +32,13 @@ case class ScriptTransformation(

[GitHub] [spark] cloud-fan commented on a diff in pull request #38795: [SPARK-41259][SQL] Spark-sql cli query results should correspond to schema

2022-12-04 Thread GitBox
cloud-fan commented on code in PR #38795: URL: https://github.com/apache/spark/pull/38795#discussion_r1039187199 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala: ## @@ -50,8 +52,21 @@ private[hive] class SparkSQLDriver(val

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
HyukjinKwon commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039219748 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self,

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
HyukjinKwon commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039219061 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self,

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2022-12-04 Thread GitBox
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1039260520 ## core/src/main/resources/error/error-classes.json: ## @@ -1266,6 +1266,11 @@ "DISTRIBUTE BY clause." ] }, +

[GitHub] [spark] amaliujia commented on a diff in pull request #38899: [SPARK-41349][CONNECT] Implement DataFrame.hint

2022-12-04 Thread GitBox
amaliujia commented on code in PR #38899: URL: https://github.com/apache/spark/pull/38899#discussion_r1039041717 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectPlannerSuite.scala: ## @@ -571,4 +572,28 @@ class SparkConnectPlannerSuite

[GitHub] [spark] HyukjinKwon commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
HyukjinKwon commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1336598962 Thanks for reviewing this. @LuciferYang let me know when you think it's ready to go. -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] HyukjinKwon commented on pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
HyukjinKwon commented on PR #38865: URL: https://github.com/apache/spark/pull/38865#issuecomment-1336598796 @LuciferYang let me know when you think it's ready to go ahead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] HyukjinKwon closed pull request #38862: [SPARK-41350][SQL] Allow simple name access of join hidden columns after subquery alias

2022-12-04 Thread GitBox
HyukjinKwon closed pull request #38862: [SPARK-41350][SQL] Allow simple name access of join hidden columns after subquery alias URL: https://github.com/apache/spark/pull/38862 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] wineternity commented on a diff in pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-12-04 Thread GitBox
wineternity commented on code in PR #38702: URL: https://github.com/apache/spark/pull/38702#discussion_r1039103663 ## core/src/main/scala/org/apache/spark/status/AppStatusListener.scala: ## @@ -645,8 +645,11 @@ private[spark] class AppStatusListener( } override def

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1039127876 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala: ## @@ -2596,4 +2596,82 @@ class

[GitHub] [spark] zwangsheng commented on a diff in pull request #38202: [SPARK-40763][K8S] Should expose driver service name to config for user features

2022-12-04 Thread GitBox
zwangsheng commented on code in PR #38202: URL: https://github.com/apache/spark/pull/38202#discussion_r1039131007 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala: ## @@ -50,6 +50,8 @@ private[spark] class

[GitHub] [spark] dongjoon-hyun commented on pull request #38907: [SPARK-40379][K8S][FOLLOWUP] Fix scalastyle failure

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #38907: URL: https://github.com/apache/spark/pull/38907#issuecomment-1336742706 Could you review this, @HyukjinKwon ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] wangyum closed pull request #38907: [SPARK-40379][K8S][FOLLOWUP] Fix scalastyle failure

2022-12-04 Thread GitBox
wangyum closed pull request #38907: [SPARK-40379][K8S][FOLLOWUP] Fix scalastyle failure URL: https://github.com/apache/spark/pull/38907 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun commented on pull request #38907: [SPARK-40379][K8S][FOLLOWUP] Fix scalastyle failure

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #38907: URL: https://github.com/apache/spark/pull/38907#issuecomment-1336767713 Thank you so much, @wangyum ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] wangyum commented on pull request #38907: [SPARK-40379][K8S][FOLLOWUP] Fix scalastyle failure

2022-12-04 Thread GitBox
wangyum commented on PR #38907: URL: https://github.com/apache/spark/pull/38907#issuecomment-1336767774 Merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
LuciferYang commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1336766817 Please add some invalid input test cases @sandeep-katta and add some sql test to `src/test/resources/sql-tests/inputs/array.sql` -- This is an automated message from the Apache

[GitHub] [spark] amaliujia commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
amaliujia commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039159115 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self, *cols:

[GitHub] [spark] HeartSaVioR commented on pull request #38906: [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark

2022-12-04 Thread GitBox
HeartSaVioR commented on PR #38906: URL: https://github.com/apache/spark/pull/38906#issuecomment-1336777465 Thanks! Merging to master/3.3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HeartSaVioR commented on pull request #38517: [SPARK-39591][SS] Async Progress Tracking

2022-12-04 Thread GitBox
HeartSaVioR commented on PR #38517: URL: https://github.com/apache/spark/pull/38517#issuecomment-1336784998 cc. @zsxwing @viirya @xuanyuanking to seek a chance for getting help on reviewing. I'll look into the PR sooner as well. -- This is an automated message from the Apache Git

[GitHub] [spark] dongjoon-hyun commented on pull request #38909: [SPARK-41385][K8S] Replace deprecated `.newInstance()` in K8s module

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #38909: URL: https://github.com/apache/spark/pull/38909#issuecomment-1336784356 Could you review this, @Yikun ? I missed this at SPARK-37145. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] Yikf commented on a diff in pull request #38795: [SPARK-41259][SQL] Spark-sql cli query results should correspond to schema

2022-12-04 Thread GitBox
Yikf commented on code in PR #38795: URL: https://github.com/apache/spark/pull/38795#discussion_r1039170262 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala: ## @@ -50,8 +52,21 @@ private[hive] class SparkSQLDriver(val context:

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
dongjoon-hyun commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1039174225 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -273,7 +275,24 @@ abstract class InMemoryBaseTable( }

[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

2022-12-04 Thread GitBox
Ngone51 commented on code in PR #38876: URL: https://github.com/apache/spark/pull/38876#discussion_r1039195161 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -583,7 +586,12 @@ class BlockManagerMasterEndpoint( val time =

[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
sandeep-katta commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1039205150 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala: ## @@ -2596,4 +2596,33 @@ class

[GitHub] [spark] sandeep-katta commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
sandeep-katta commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1336834908 @LuciferYang I added SQL tests; could you please review again? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] HeartSaVioR commented on pull request #38911: [SPARK-41387][SS] Add defensive assertions to Kafka data source for Trigger.AvailableNow

2022-12-04 Thread GitBox
HeartSaVioR commented on PR #38911: URL: https://github.com/apache/spark/pull/38911#issuecomment-1336865300 cc. @zsxwing @viirya @jerrypeng Please take a look. Thanks in advance! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] AmplabJenkins commented on pull request #38901: [SPARK-41376][CORE] Executor netty direct memory check should respect spark.shuffle.io.preferDirectBufs

2022-12-04 Thread GitBox
AmplabJenkins commented on PR #38901: URL: https://github.com/apache/spark/pull/38901#issuecomment-1336533550 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] AmplabJenkins commented on pull request #38899: [SPARK-41349][CONNECT] Implement DataFrame.hint

2022-12-04 Thread GitBox
AmplabJenkins commented on PR #38899: URL: https://github.com/apache/spark/pull/38899#issuecomment-1336533561 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] AmplabJenkins commented on pull request #38898: [SPARK-41375][SS] Avoid empty latest KafkaSourceOffset

2022-12-04 Thread GitBox
AmplabJenkins commented on PR #38898: URL: https://github.com/apache/spark/pull/38898#issuecomment-1336533569 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] huaxingao opened a new pull request, #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
huaxingao opened a new pull request, #38904: URL: https://github.com/apache/spark/pull/38904 ### What changes were proposed in this pull request? Support Col Stats in DS v2 ### Why are the changes needed? Currently only Table stats is supported in DS V2. Column stats

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38886: [SPARK-41346][CONNECT][PYTHON][FOLLOWUP] `test_connect_function` cleanup

2022-12-04 Thread GitBox
HyukjinKwon commented on code in PR #38886: URL: https://github.com/apache/spark/pull/38886#discussion_r1039066850 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -21,7 +21,6 @@ from pyspark.testing.sqlutils import have_pandas, SQLTestUtils from

[GitHub] [spark] HyukjinKwon commented on pull request #38886: [SPARK-41346][CONNECT][PYTHON][FOLLOWUP] `test_connect_function` cleanup

2022-12-04 Thread GitBox
HyukjinKwon commented on PR #38886: URL: https://github.com/apache/spark/pull/38886#issuecomment-1336586450 build: https://github.com/zhengruifeng/spark/actions/runs/3615952109/jobs/6093460025 -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

2022-12-04 Thread GitBox
Ngone51 commented on code in PR #38876: URL: https://github.com/apache/spark/pull/38876#discussion_r1039080517 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -583,7 +586,12 @@ class BlockManagerMasterEndpoint( val time =

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-04 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1039084796 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -383,8 +383,8 @@ private[spark] class DAGScheduler( /** * Called by the

[GitHub] [spark] pan3793 commented on a diff in pull request #38901: [SPARK-41376][CORE] Executor netty direct memory check should respect spark.shuffle.io.preferDirectBufs

2022-12-04 Thread GitBox
pan3793 commented on code in PR #38901: URL: https://github.com/apache/spark/pull/38901#discussion_r1039084916 ## core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala: ## @@ -85,7 +85,8 @@ private[spark] class CoarseGrainedExecutorBackend(

[GitHub] [spark] zhengruifeng commented on pull request #38886: [SPARK-41346][CONNECT][PYTHON][FOLLOWUP] `test_connect_function` cleanup

2022-12-04 Thread GitBox
zhengruifeng commented on PR #38886: URL: https://github.com/apache/spark/pull/38886#issuecomment-1336668016 thank you! merged into master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] zhengruifeng closed pull request #38886: [SPARK-41346][CONNECT][PYTHON][FOLLOWUP] `test_connect_function` cleanup

2022-12-04 Thread GitBox
zhengruifeng closed pull request #38886: [SPARK-41346][CONNECT][PYTHON][FOLLOWUP] `test_connect_function` cleanup URL: https://github.com/apache/spark/pull/38886 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1039110793 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,118 @@ case class ArrayExcept(left:

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions

2022-12-04 Thread GitBox
zhengruifeng commented on code in PR #38905: URL: https://github.com/apache/spark/pull/38905#discussion_r1039124841 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -356,6 +356,96 @@ def test_math_functions(self):

[GitHub] [spark] dongjoon-hyun commented on pull request #37821: [SPARK-40379][K8S] Propagate decommission executor loss reason in K8s

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #37821: URL: https://github.com/apache/spark/pull/37821#issuecomment-1336725688 My bad. I created a follow-up to make it sure. - https://github.com/apache/spark/pull/38907 -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] HeartSaVioR commented on pull request #38906: [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark

2022-12-04 Thread GitBox
HeartSaVioR commented on PR #38906: URL: https://github.com/apache/spark/pull/38906#issuecomment-1336743969 Yes, as long as they use `DF.sparkSession`. This still does not cover the case 100%, though, as there is no way to prevent end users from accessing sparkSession outside of user function

[GitHub] [spark] cloud-fan commented on pull request #38862: [SPARK-41350][SQL] Allow simple name access of join hidden columns after subquery alias

2022-12-04 Thread GitBox
cloud-fan commented on PR #38862: URL: https://github.com/apache/spark/pull/38862#issuecomment-1336744721 @HyukjinKwon this is a bug fix and needs to go to 3.3 as well, can you help to backport via local git operation? -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] mridulm commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-12-04 Thread GitBox
mridulm commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1039144282 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] LuciferYang commented on pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on PR #38865: URL: https://github.com/apache/spark/pull/38865#issuecomment-1336752983 @infoankitp ``` 2022-12-03T13:00:36.5812875Z [info] ExpressionsSchemaSuite: 2022-12-03T13:00:37.5439485Z [info]

[GitHub] [spark] grundprinzip commented on pull request #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions

2022-12-04 Thread GitBox
grundprinzip commented on PR #38905: URL: https://github.com/apache/spark/pull/38905#issuecomment-1336772089 > shall we consider sharing code between pyspark and spark connect python client? Yes, as part of the packaging we have to merge this code back with the PySpark code.

[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
LuciferYang commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1336772413 > I personally think the output `[[1, null, 3], [null, 2, 3]]` is expected, let me confirm it. If this case is correct, I think the Scala part is OK except for lack of `invalid

[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
LuciferYang commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1336792961 > > Please add some invalid input test cases @sandeep-katta and add some sql test to `src/test/resources/sql-tests/inputs/array.sql` > > Thanks @LuciferYang for the review, I

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
dongjoon-hyun commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1039216766 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -273,7 +275,24 @@ abstract class InMemoryBaseTable( }

[GitHub] [spark] HeartSaVioR opened a new pull request, #38911: [SPARK-41387][SS] Add defensive assertions to Kafka data source for Trigger.AvailableNow

2022-12-04 Thread GitBox
HeartSaVioR opened a new pull request, #38911: URL: https://github.com/apache/spark/pull/38911 ### What changes were proposed in this pull request? This PR proposes to add defensive assertions to Kafka data source for Trigger.AvailableNow, so that the query will rather fail fast

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1039256946 ## sql/core/src/test/resources/sql-tests/inputs/array.sql: ## @@ -119,3 +119,21 @@ select get(array(1, 2, 3), 0); select get(array(1, 2, 3), 3); select

[GitHub] [spark] HyukjinKwon commented on pull request #38890: [SPARK-41305][CONNECT] Improve Documentation for Command proto

2022-12-04 Thread GitBox
HyukjinKwon commented on PR #38890: URL: https://github.com/apache/spark/pull/38890#issuecomment-1336584701 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon closed pull request #38890: [SPARK-41305][CONNECT] Improve Documentation for Command proto

2022-12-04 Thread GitBox
HyukjinKwon closed pull request #38890: [SPARK-41305][CONNECT] Improve Documentation for Command proto URL: https://github.com/apache/spark/pull/38890 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

2022-12-04 Thread GitBox
Ngone51 commented on code in PR #38876: URL: https://github.com/apache/spark/pull/38876#discussion_r1039081212 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -616,10 +624,29 @@ class BlockManagerMasterEndpoint( if

[GitHub] [spark] pan3793 commented on pull request #38901: [SPARK-41376][CORE] Executor netty direct memory check should respect spark.shuffle.io.preferDirectBufs

2022-12-04 Thread GitBox
pan3793 commented on PR #38901: URL: https://github.com/apache/spark/pull/38901#issuecomment-1336649430 @srowen code and comment have been updated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] zhengruifeng opened a new pull request, #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions

2022-12-04 Thread GitBox
zhengruifeng opened a new pull request, #38905: URL: https://github.com/apache/spark/pull/38905 ### What changes were proposed in this pull request? Implement aggregation functions, except: 1, `approxCountDistinct`, `countDistinct`, `sumDistinct` - deprecated 2, `count_distinct`,

[GitHub] [spark] mridulm commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-12-04 Thread GitBox
mridulm commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1039144282 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] amaliujia commented on pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
amaliujia commented on PR #38908: URL: https://github.com/apache/spark/pull/38908#issuecomment-1336759090 cc @HyukjinKwon @zhengruifeng @grundprinzip @xinrong-meng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] amaliujia opened a new pull request, #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
amaliujia opened a new pull request, #38908: URL: https://github.com/apache/spark/pull/38908 ### What changes were proposed in this pull request? We can depend on the server-side SQL parser to parse the strings in projection so that the client side does not need to reason about

[GitHub] [spark] LuciferYang commented on pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on PR #38865: URL: https://github.com/apache/spark/pull/38865#issuecomment-1336759147 @infoankitp Would you mind adding some sql related tests to `sql-tests/inputs/array.sql`? -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] vinodkc commented on pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2022-12-04 Thread GitBox
vinodkc commented on PR #38419: URL: https://github.com/apache/spark/pull/38419#issuecomment-1336776081 @cloud-fan , Can you please review it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] LuciferYang commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1039157936 ## core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala: ## @@ -55,7 +55,8 @@ case class SparkListenerTaskGettingResult(taskInfo: TaskInfo) extends

[GitHub] [spark] Yikf commented on pull request #38795: [SPARK-41259][SQL] Spark-sql cli query results should correspond to schema

2022-12-04 Thread GitBox
Yikf commented on PR #38795: URL: https://github.com/apache/spark/pull/38795#issuecomment-1336789289 > Would be easier to follow if you post before/after results in the PR description. Yea, updated the PR description, thanks -- This is an automated message from the Apache Git

[GitHub] [spark] zhengruifeng commented on pull request #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions

2022-12-04 Thread GitBox
zhengruifeng commented on PR #38905: URL: https://github.com/apache/spark/pull/38905#issuecomment-1336789658 merged into master, thanks for reviews! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] sandeep-katta commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
sandeep-katta commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1336789676 > Please add some invalid input test cases @sandeep-katta and add some sql test to `src/test/resources/sql-tests/inputs/array.sql` Thanks @LuciferYang for the review, I will

[GitHub] [spark] zhengruifeng closed pull request #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions

2022-12-04 Thread GitBox
zhengruifeng closed pull request #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions URL: https://github.com/apache/spark/pull/38905 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1039177239 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -273,7 +275,24 @@ abstract class InMemoryBaseTable( } }

[GitHub] [spark] huaxingao commented on pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
huaxingao commented on PR #38904: URL: https://github.com/apache/spark/pull/38904#issuecomment-1336843298 also cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1039216402 ## sql/core/src/test/resources/sql-tests/inputs/array.sql: ## @@ -119,3 +119,21 @@ select get(array(1, 2, 3), 0); select get(array(1, 2, 3), 3); select

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1039215759 ## sql/core/src/test/resources/sql-tests/inputs/array.sql: ## @@ -119,3 +119,21 @@ select get(array(1, 2, 3), 0); select get(array(1, 2, 3), 3); select

[GitHub] [spark] HyukjinKwon commented on pull request #38863: [SPARK-41351][CONNECT] Column should support != operator

2022-12-04 Thread GitBox
HyukjinKwon commented on PR #38863: URL: https://github.com/apache/spark/pull/38863#issuecomment-1336587827 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon closed pull request #38863: [SPARK-41351][CONNECT] Column should support != operator

2022-12-04 Thread GitBox
HyukjinKwon closed pull request #38863: [SPARK-41351][CONNECT] Column should support != operator URL: https://github.com/apache/spark/pull/38863 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] panbingkun commented on a diff in pull request #38861: [SPARK-41294][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1203 / 1168

2022-12-04 Thread GitBox
panbingkun commented on code in PR #38861: URL: https://github.com/apache/spark/pull/38861#discussion_r1039071288 ## core/src/main/resources/error/error-classes.json: ## @@ -876,6 +876,13 @@ ], "sqlState" : "42000" }, + "NOT_ENOUGH_DATA_COLUMNS" : { +"message"

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38891: [SPARK-41372][CONNECT][PYTHON] Implement DataFrame TempView

2022-12-04 Thread GitBox
HyukjinKwon commented on code in PR #38891: URL: https://github.com/apache/spark/pull/38891#discussion_r1039073594 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1554,6 +1554,52 @@ def explain( """ print(self._explain_string(extended=extended, mode=mode))

[GitHub] [spark] HeartSaVioR commented on pull request #38906: [SPARK-41379][SS] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark

2022-12-04 Thread GitBox
HeartSaVioR commented on PR #38906: URL: https://github.com/apache/spark/pull/38906#issuecomment-1336687453 That said, the example for foreachBatch sink should be changed to use session from given DataFrame, both Scala/Java API and PySpark API. -- This is an automated message from the
