[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2022-12-05 Thread GitBox
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1039265860 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -171,6 +171,12 @@ class ResolveSessionCatalog(val catalogManager:

[GitHub] [spark] cloud-fan commented on pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2022-12-05 Thread GitBox
cloud-fan commented on PR #38823: URL: https://github.com/apache/spark/pull/38823#issuecomment-1336912205 also cc @huaxingao @sunchao

[GitHub] [spark] cloud-fan commented on pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2022-12-05 Thread GitBox
cloud-fan commented on PR #38823: URL: https://github.com/apache/spark/pull/38823#issuecomment-1336911874 Does the SQL standard say anything about the restrictions of the generate expression? Can we allow `GENERATED AS rand()`? I think at least catalyst should have a rule to check the expre
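A minimal sketch of the kind of check being asked about, assuming only that spark-sql 3.x is on the classpath; the dummy `price` column and the two candidate expressions are made up for illustration, and this is not the rule the PR would add:

```scala
import org.apache.spark.sql.SparkSession

// Resolve candidate generation expressions against a dummy column and ask Catalyst
// whether they are deterministic; rand() is not, so a generated-column rule could
// reject something like GENERATED ALWAYS AS (price + rand()) on this basis.
object GenerationExprCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("gen-expr-check").getOrCreate()
    import spark.implicits._

    val df = Seq(1L).toDF("price")
    def isDeterministic(sqlExpr: String): Boolean =
      df.selectExpr(sqlExpr).queryExecution.analyzed.expressions.forall(_.deterministic)

    println(isDeterministic("price * 2"))      // true  -> acceptable generation expression
    println(isDeterministic("price + rand()")) // false -> would be rejected by such a rule
    spark.stop()
  }
}
```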

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38912: [SPARK-41388][K8S] `getReusablePVCs` should ignore recently created PVCs in the previous batch

2022-12-05 Thread GitBox
dongjoon-hyun commented on code in PR #38912: URL: https://github.com/apache/spark/pull/38912#discussion_r1039267464 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocatorSuite.scala: ## @@ -721,8 +722,10 @@ class Executo

[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON] High-order function: array_compact implementation

2022-12-05 Thread GitBox
sandeep-katta commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1039276727 ## sql/core/src/test/resources/sql-tests/inputs/array.sql: ## @@ -119,3 +119,21 @@ select get(array(1, 2, 3), 0); select get(array(1, 2, 3), 3); select get(arra

[GitHub] [spark] cloud-fan commented on pull request #38765: [SPARK-41355][SQL] Workaround hive table name validation issue

2022-12-05 Thread GitBox
cloud-fan commented on PR #38765: URL: https://github.com/apache/spark/pull/38765#issuecomment-1336938464 thanks, merging to master!

[GitHub] [spark] cloud-fan closed pull request #38765: [SPARK-41355][SQL] Workaround hive table name validation issue

2022-12-05 Thread GitBox
cloud-fan closed pull request #38765: [SPARK-41355][SQL] Workaround hive table name validation issue URL: https://github.com/apache/spark/pull/38765

[GitHub] [spark] AmplabJenkins commented on pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-05 Thread GitBox
AmplabJenkins commented on PR #38908: URL: https://github.com/apache/spark/pull/38908#issuecomment-1336942447 Can one of the admins verify this patch?

[GitHub] [spark] LuciferYang opened a new pull request, #38913: [SPARK-41389][SQL] Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043` and `_LEGACY_ERROR_TEMP_1044`

2022-12-05 Thread GitBox
LuciferYang opened a new pull request, #38913: URL: https://github.com/apache/spark/pull/38913 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] LuciferYang commented on pull request #38913: [SPARK-41389][SQL] Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043` and `_LEGACY_ERROR_TEMP_1044`

2022-12-05 Thread GitBox
LuciferYang commented on PR #38913: URL: https://github.com/apache/spark/pull/38913#issuecomment-1336946077 Test first

[GitHub] [spark] dongjoon-hyun commented on pull request #38912: [SPARK-41388][K8S] `getReusablePVCs` should ignore recently created PVCs in the previous batch

2022-12-05 Thread GitBox
dongjoon-hyun commented on PR #38912: URL: https://github.com/apache/spark/pull/38912#issuecomment-1336960628 Could you review this when you have some time, @viirya ?

[GitHub] [spark] grundprinzip commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-05 Thread GitBox
grundprinzip commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039311509 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self, *cols

[GitHub] [spark] dongjoon-hyun commented on pull request #38912: [SPARK-41388][K8S] `getReusablePVCs` should ignore recently created PVCs in the previous batch

2022-12-05 Thread GitBox
dongjoon-hyun commented on PR #38912: URL: https://github.com/apache/spark/pull/38912#issuecomment-1336973835 Thank you so much!

[GitHub] [spark] dongjoon-hyun commented on pull request #38912: [SPARK-41388][K8S] `getReusablePVCs` should ignore recently created PVCs in the previous batch

2022-12-05 Thread GitBox
dongjoon-hyun commented on PR #38912: URL: https://github.com/apache/spark/pull/38912#issuecomment-1336996286 All tests passed. Merged to master/3.3/3.2.

[GitHub] [spark] dongjoon-hyun closed pull request #38912: [SPARK-41388][K8S] `getReusablePVCs` should ignore recently created PVCs in the previous batch

2022-12-05 Thread GitBox
dongjoon-hyun closed pull request #38912: [SPARK-41388][K8S] `getReusablePVCs` should ignore recently created PVCs in the previous batch URL: https://github.com/apache/spark/pull/38912

[GitHub] [spark] zhengruifeng opened a new pull request, #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-05 Thread GitBox
zhengruifeng opened a new pull request, #38914: URL: https://github.com/apache/spark/pull/38914 ### What changes were proposed in this pull request? Implement `count_distinct` and `sum_distinct` functions ### Why are the changes needed? for API coverage ### Does th
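For reference, a usage sketch of the two aggregates the PR exposes to the Python Connect client, written against the existing (non-Connect) Scala API where `count_distinct` and `sum_distinct` exist since Spark 3.2; the data and the `v` column are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count_distinct, sum_distinct}

object DistinctAggregates {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("distinct-aggs").getOrCreate()
    import spark.implicits._

    // count_distinct counts unique values (3 here); sum_distinct sums them (1 + 2 + 3 = 6).
    val df = Seq(1, 1, 2, 3, 3).toDF("v")
    df.agg(count_distinct($"v"), sum_distinct($"v")).show()
    spark.stop()
  }
}
```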

[GitHub] [spark] grundprinzip commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-05 Thread GitBox
grundprinzip commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039339231 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self, *cols

[GitHub] [spark] grundprinzip commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-05 Thread GitBox
grundprinzip commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039348490 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self, *cols

[GitHub] [spark] Yikf commented on a diff in pull request #38795: [SPARK-41259][SQL] Spark-sql cli query results should correspond to schema

2022-12-05 Thread GitBox
Yikf commented on code in PR #38795: URL: https://github.com/apache/spark/pull/38795#discussion_r1039355440 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala: ## @@ -50,8 +52,21 @@ private[hive] class SparkSQLDriver(val context:

[GitHub] [spark] dengziming commented on a diff in pull request #38899: [SPARK-41349][CONNECT] Implement DataFrame.hint

2022-12-05 Thread GitBox
dengziming commented on code in PR #38899: URL: https://github.com/apache/spark/pull/38899#discussion_r1039377343 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverterSuite.scala: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache So

[GitHub] [spark] zhengruifeng opened a new pull request, #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

2022-12-05 Thread GitBox
zhengruifeng opened a new pull request, #38915: URL: https://github.com/apache/spark/pull/38915 ### What changes were proposed in this pull request? Implement `product` function ### Why are the changes needed? for API coverage ### Does this PR introduce _any_ user-

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

2022-12-05 Thread GitBox
zhengruifeng commented on code in PR #38915: URL: https://github.com/apache/spark/pull/38915#discussion_r1039400753 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -539,6 +539,15 @@ class SparkConnectPlanner(session: Spar

[GitHub] [spark] wankunde commented on pull request #38682: [SPARK-41167][SQL] Improve multi like performance by creating a balanced expression tree predicate

2022-12-05 Thread GitBox
wankunde commented on PR #38682: URL: https://github.com/apache/spark/pull/38682#issuecomment-1337096043 Retest this please

[GitHub] [spark] holdenk commented on pull request #37821: [SPARK-40379][K8S] Propagate decommission executor loss reason in K8s

2022-12-05 Thread GitBox
holdenk commented on PR #37821: URL: https://github.com/apache/spark/pull/37821#issuecomment-1337135389 Oh hey sorry for my slow ping time on this and thanks for fixing the indentation, I'm on vacation (started last week).

[GitHub] [spark] LuciferYang opened a new pull request, #38916: [MINOR][SQL] Update the script used to generate `register` function in `UDFRegistration`

2022-12-05 Thread GitBox
LuciferYang opened a new pull request, #38916: URL: https://github.com/apache/spark/pull/38916 ### What changes were proposed in this pull request? SPARK-35065 use `QueryCompilationErrors.invalidFunctionArgumentsError` instead of `throw new AnalysisException(...)` for `register` function

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-05 Thread GitBox
zhengruifeng commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1039464912 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,69 @@ case class ArrayExcept(left: Express

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON] High-order function: array_compact implementation

2022-12-05 Thread GitBox
zhengruifeng commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1039470282 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Express

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-05 Thread GitBox
zhengruifeng commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1039470572 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,69 @@ case class ArrayExcept(left: Express

[GitHub] [spark] zhengruifeng opened a new pull request, #38917: [SPARK-41391][SQL] The output column name of `groupBy.agg(count_distinct)` is incorrect

2022-12-05 Thread GitBox
zhengruifeng opened a new pull request, #38917: URL: https://github.com/apache/spark/pull/38917 ### What changes were proposed in this pull request? correct the output column name of `groupBy.agg(count_distinct)` ### Why are the changes needed? before this PR: `[id: bigin
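The issue is which alias `groupBy.agg(count_distinct(...))` produces. A small way to observe it with the classic Scala API is sketched below; the column names are illustrative, and depending on whether SPARK-41391 is applied the aggregate columns print as `count(b)` or `count(DISTINCT b)`, the same two spellings that show up in the #38914 test output further down in this digest:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count_distinct

object CountDistinctColumnName {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("cd-name").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "x", "y"), (1, "x", "z")).toDF("a", "b", "c")
    // Print the auto-generated aggregate column names to inspect the alias in question.
    val out = df.groupBy($"a").agg(count_distinct($"b"), count_distinct($"c"))
    println(out.columns.mkString(", "))
    spark.stop()
  }
}
```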

[GitHub] [spark] melin commented on pull request #38496: [SPARK-40708][SQL] Auto update table statistics based on write metrics

2022-12-05 Thread GitBox
melin commented on PR #38496: URL: https://github.com/apache/spark/pull/38496#issuecomment-1337210022 Support partition statistics?

[GitHub] [spark] zhengruifeng commented on pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-05 Thread GitBox
zhengruifeng commented on PR #38914: URL: https://github.com/apache/spark/pull/38914#issuecomment-1337212273 ``` DataFrame.columns values are different (66.7 %) [left]: Index(['a', 'count(DISTINCT b)', 'count(DISTINCT c)'], dtype='object') [right]: Index(['a', 'count(b)', 'coun

[GitHub] [spark] Daniel-Davies commented on pull request #38867: [WIP] [SPARK-41234][SQL][PYTHON] Add functionality for array_insert

2022-12-05 Thread GitBox
Daniel-Davies commented on PR #38867: URL: https://github.com/apache/spark/pull/38867#issuecomment-1337219140 @LuciferYang for quick feedback I'd be grateful for an overarching review of the method, and some assistance on the following questions: - Core behaviour: one interesting prop

[GitHub] [spark] jackylee-ch commented on pull request #38496: [SPARK-40708][SQL] Auto update table statistics based on write metrics

2022-12-05 Thread GitBox
jackylee-ch commented on PR #38496: URL: https://github.com/apache/spark/pull/38496#issuecomment-1337252351 > Support partition statistics? @melin I'm working on support for partition statistics updates; it relies on workers to return detailed partition statistics.

[GitHub] [spark] LuciferYang commented on a diff in pull request #38873: [SPARK-41358][SQL] Refactor `ColumnVectorUtils#populate` method to use `PhysicalDataType` instead of `DataType`

2022-12-05 Thread GitBox
LuciferYang commented on code in PR #38873: URL: https://github.com/apache/spark/pull/38873#discussion_r1039618680 ## sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java: ## @@ -125,32 +125,45 @@ public static Map toJavaIntMap(ColumnarMap map

[GitHub] [spark] wangyum closed pull request #38682: [SPARK-41167][SQL] Improve multi like performance by creating a balanced expression tree predicate

2022-12-05 Thread GitBox
wangyum closed pull request #38682: [SPARK-41167][SQL] Improve multi like performance by creating a balanced expression tree predicate URL: https://github.com/apache/spark/pull/38682

[GitHub] [spark] wangyum commented on pull request #38682: [SPARK-41167][SQL] Improve multi like performance by creating a balanced expression tree predicate

2022-12-05 Thread GitBox
wangyum commented on PR #38682: URL: https://github.com/apache/spark/pull/38682#issuecomment-1337393910 Merged to master.

[GitHub] [spark] srowen commented on a diff in pull request #38896: [WIP][SQL] Replace `require()` by an internal error in catalyst

2022-12-05 Thread GitBox
srowen commented on code in PR #38896: URL: https://github.com/apache/spark/pull/38896#discussion_r1039687033 ## core/src/main/scala/org/apache/spark/SparkException.scala: ## @@ -87,6 +87,16 @@ object SparkException { messageParameters = Map("message" -> msg), caus

[GitHub] [spark] srowen closed pull request #35017: [SPARK-36853][BUILD] Code failing on checkstyle

2022-12-05 Thread GitBox
srowen closed pull request #35017: [SPARK-36853][BUILD] Code failing on checkstyle URL: https://github.com/apache/spark/pull/35017

[GitHub] [spark] srowen closed pull request #37738: add Support Java Class with circular references

2022-12-05 Thread GitBox
srowen closed pull request #37738: add Support Java Class with circular references URL: https://github.com/apache/spark/pull/37738

[GitHub] [spark] thejdeep commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-12-05 Thread GitBox
thejdeep commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1039691827 ## core/src/main/scala/org/apache/spark/executor/Executor.scala: ## @@ -791,6 +770,53 @@ private[spark] class Executor( } } +private def incrementShu

[GitHub] [spark] srowen closed pull request #37795: fix the question of SparkSQL call iceberg's expire_snapshots procedur…

2022-12-05 Thread GitBox
srowen closed pull request #37795: fix the question of SparkSQL call iceberg's expire_snapshots procedur… URL: https://github.com/apache/spark/pull/37795

[GitHub] [spark] srowen closed pull request #37862: [MINOR][SQL] Remove an unnecessary parameter of the PartitionedFileUtil.splitFiles

2022-12-05 Thread GitBox
srowen closed pull request #37862: [MINOR][SQL] Remove an unnecessary parameter of the PartitionedFileUtil.splitFiles URL: https://github.com/apache/spark/pull/37862

[GitHub] [spark] srowen closed pull request #38080: [SPARK][Structured Streaming] Fix StructuredNetworkWordCountWindowed example

2022-12-05 Thread GitBox
srowen closed pull request #38080: [SPARK][Structured Streaming] Fix StructuredNetworkWordCountWindowed example URL: https://github.com/apache/spark/pull/38080

[GitHub] [spark] srowen closed pull request #38085: [SPARK-40642][DOC] doc for String memory size since java>=9

2022-12-05 Thread GitBox
srowen closed pull request #38085: [SPARK-40642][DOC] doc for String memory size since java>=9 URL: https://github.com/apache/spark/pull/38085

[GitHub] [spark] MaxGekk commented on a diff in pull request #38896: [WIP][SQL] Replace `require()` by an internal error in catalyst

2022-12-05 Thread GitBox
MaxGekk commented on code in PR #38896: URL: https://github.com/apache/spark/pull/38896#discussion_r1039699912 ## core/src/main/scala/org/apache/spark/SparkException.scala: ## @@ -87,6 +87,16 @@ object SparkException { messageParameters = Map("message" -> msg), cau

[GitHub] [spark] NarekDW closed pull request #38895: [MINOR][SQL] Get rid of redundant type cast

2022-12-05 Thread GitBox
NarekDW closed pull request #38895: [MINOR][SQL] Get rid of redundant type cast URL: https://github.com/apache/spark/pull/38895

[GitHub] [spark] NarekDW commented on pull request #38895: [MINOR][SQL] Get rid of redundant type cast

2022-12-05 Thread GitBox
NarekDW commented on PR #38895: URL: https://github.com/apache/spark/pull/38895#issuecomment-1337519684 > Oh, these are all in hive code. This is a copy of some code from Hive, so unless we must change something, it's simpler to leave it as-is to facilitate updating by copying new source fil

[GitHub] [spark] MaxGekk commented on a diff in pull request #38896: [WIP][SQL] Replace `require()` by an internal error in catalyst

2022-12-05 Thread GitBox
MaxGekk commented on code in PR #38896: URL: https://github.com/apache/spark/pull/38896#discussion_r1039702821 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala: ## @@ -28,7 +29,10 @@ import org.apache.spark.util.collection.Utils * eleme

[GitHub] [spark] srielau commented on a diff in pull request #38861: [SPARK-41294][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1203 / 1168

2022-12-05 Thread GitBox
srielau commented on code in PR #38861: URL: https://github.com/apache/spark/pull/38861#discussion_r1039709066 ## core/src/main/resources/error/error-classes.json: ## @@ -876,6 +876,13 @@ ], "sqlState" : "42000" }, + "NOT_ENOUGH_DATA_COLUMNS" : { +"message" : [

[GitHub] [spark] LuciferYang commented on pull request #38913: [SPARK-41389][CORE][SQL] Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1044`

2022-12-05 Thread GitBox
LuciferYang commented on PR #38913: URL: https://github.com/apache/spark/pull/38913#issuecomment-1337531860 GA passed

[GitHub] [spark] MaxGekk commented on pull request #38913: [SPARK-41389][CORE][SQL] Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1044`

2022-12-05 Thread GitBox
MaxGekk commented on PR #38913: URL: https://github.com/apache/spark/pull/38913#issuecomment-1337534800 +1, LGTM. Merging to master. Thank you, @LuciferYang.

[GitHub] [spark] LuciferYang commented on pull request #38913: [SPARK-41389][CORE][SQL] Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1044`

2022-12-05 Thread GitBox
LuciferYang commented on PR #38913: URL: https://github.com/apache/spark/pull/38913#issuecomment-1337535532 Thanks @MaxGekk

[GitHub] [spark] MaxGekk closed pull request #38913: [SPARK-41389][CORE][SQL] Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1044`

2022-12-05 Thread GitBox
MaxGekk closed pull request #38913: [SPARK-41389][CORE][SQL] Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1044` URL: https://github.com/apache/spark/pull/38913

[GitHub] [spark] LuciferYang commented on pull request #37725: [DO-NOT-MERGE] Exceptions without error classes in SQL golden files

2022-12-05 Thread GitBox
LuciferYang commented on PR #37725: URL: https://github.com/apache/spark/pull/37725#issuecomment-1337540603 @MaxGekk Have all the issues mentioned in this pr been solved?

[GitHub] [spark] srielau commented on a diff in pull request #38861: [SPARK-41294][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1203 / 1168

2022-12-05 Thread GitBox
srielau commented on code in PR #38861: URL: https://github.com/apache/spark/pull/38861#discussion_r1039709066 ## core/src/main/resources/error/error-classes.json: ## @@ -876,6 +876,13 @@ ], "sqlState" : "42000" }, + "NOT_ENOUGH_DATA_COLUMNS" : { +"message" : [

[GitHub] [spark] MaxGekk commented on pull request #37725: [DO-NOT-MERGE] Exceptions without error classes in SQL golden files

2022-12-05 Thread GitBox
MaxGekk commented on PR #37725: URL: https://github.com/apache/spark/pull/37725#issuecomment-1337553614 > @MaxGekk Have all the issues mentioned in this pr been solved? Let me rebase it on the recent master and regenerate golden files, then we will see.

[GitHub] [spark] tedyu commented on pull request #38902: [SPARK-41136][K8S][FOLLOW-ON] Adjust graceful shutdown time of ExecutorPodsSnapshotsStoreImpl according to hadoop config

2022-12-05 Thread GitBox
tedyu commented on PR #38902: URL: https://github.com/apache/spark/pull/38902#issuecomment-1337558157 We can define `KUBERNETES_EXECUTOR_SNAPSHOTS_SUBSCRIBERS_GRACE_PERIOD` as the percentage based on the value of `hadoop.service.shutdown.timeout`. In @pan3793 's PR, the percentage is 30%.
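Purely illustrative arithmetic for the "percentage of `hadoop.service.shutdown.timeout`" idea; the 30-second Hadoop default and the 30% figure are assumptions taken from the thread and Hadoop's documented defaults, not the PR's final values:

```scala
object GracePeriodSketch extends App {
  // hadoop.service.shutdown.timeout defaults to 30 seconds in recent Hadoop releases (assumption).
  val hadoopShutdownTimeoutMs = 30000L
  val gracePeriodFraction = 0.30                                   // the 30% mentioned in the thread
  val gracePeriodMs = (hadoopShutdownTimeoutMs * gracePeriodFraction).toLong
  println(s"derived subscriber grace period: $gracePeriodMs ms")   // 9000 ms with these inputs
}
```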

[GitHub] [spark] Ngone51 commented on a diff in pull request #38901: [SPARK-41376][CORE] Correct the Netty preferDirectBufs check logic on executor start

2022-12-05 Thread GitBox
Ngone51 commented on code in PR #38901: URL: https://github.com/apache/spark/pull/38901#discussion_r1039732976 ## core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala: ## @@ -85,7 +85,19 @@ private[spark] class CoarseGrainedExecutorBackend( log

[GitHub] [spark] LuciferYang opened a new pull request, #38918: [SPARK-41393][BUILD] Upgrade slf4j to 2.0.5

2022-12-05 Thread GitBox
LuciferYang opened a new pull request, #38918: URL: https://github.com/apache/spark/pull/38918 ### What changes were proposed in this pull request? This PR aims to upgrade the slf4j-related dependencies from 2.0.4 to 2.0.5. ### Why are the changes needed? A version add SecurityM

[GitHub] [spark] xinrong-meng commented on a diff in pull request #38891: [SPARK-41372][CONNECT][PYTHON] Implement DataFrame TempView

2022-12-05 Thread GitBox
xinrong-meng commented on code in PR #38891: URL: https://github.com/apache/spark/pull/38891#discussion_r1039853137 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -498,11 +498,23 @@ def test_create_global_temp_view(self): self.connect.sql("SELECT 2

[GitHub] [spark] MaxGekk opened a new pull request, #38919: [SPARK-40419][SQL][TESTS][FOLLOWUP] Remove results/udaf.sql.out

2022-12-05 Thread GitBox
MaxGekk opened a new pull request, #38919: URL: https://github.com/apache/spark/pull/38919 ### What changes were proposed in this pull request? Remove the file `results/udaf.sql.out` because it is not generated anymore after https://github.com/apache/spark/pull/37873. ### Why are t

[GitHub] [spark] MaxGekk commented on pull request #38919: [SPARK-40419][SQL][TESTS][FOLLOWUP] Remove results/udaf.sql.out

2022-12-05 Thread GitBox
MaxGekk commented on PR #38919: URL: https://github.com/apache/spark/pull/38919#issuecomment-1337736363 @itholic Could you look at this PR since it is related to your PR (reviewed by @HyukjinKwon).

[GitHub] [spark] MaxGekk commented on a diff in pull request #37725: [DO-NOT-MERGE] Exceptions without error classes in SQL golden files

2022-12-05 Thread GitBox
MaxGekk commented on code in PR #37725: URL: https://github.com/apache/spark/pull/37725#discussion_r1039856927 ## sql/core/src/test/resources/sql-tests/results/udaf.sql.out: ## @@ -31,7 +31,12 @@ SELECT default.myDoubleAvg(int_col1, 3) as my_avg from t1 struct<> -- !query outp

[GitHub] [spark] xinrong-meng commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

2022-12-05 Thread GitBox
xinrong-meng commented on code in PR #38915: URL: https://github.com/apache/spark/pull/38915#discussion_r1039865556 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -539,6 +539,15 @@ class SparkConnectPlanner(session: Spar

[GitHub] [spark] amaliujia commented on pull request #38889: [SPARK-41369][CONNECT][BUILD] Split connect project into common and server projects

2022-12-05 Thread GitBox
amaliujia commented on PR #38889: URL: https://github.com/apache/spark/pull/38889#issuecomment-1337766852 +1 to do this refactoring. With the proto split out, clients that need to depend on proto will now only depend on proto (not including the server, which was not the case before this PR).

[GitHub] [spark] xinrong-meng closed pull request #38891: [SPARK-41372][CONNECT][PYTHON] Implement DataFrame TempView

2022-12-05 Thread GitBox
xinrong-meng closed pull request #38891: [SPARK-41372][CONNECT][PYTHON] Implement DataFrame TempView URL: https://github.com/apache/spark/pull/38891

[GitHub] [spark] xinrong-meng commented on pull request #38891: [SPARK-41372][CONNECT][PYTHON] Implement DataFrame TempView

2022-12-05 Thread GitBox
xinrong-meng commented on PR #38891: URL: https://github.com/apache/spark/pull/38891#issuecomment-1337773738 Merged to master, thanks!

[GitHub] [spark] amaliujia commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

2022-12-05 Thread GitBox
amaliujia commented on code in PR #38915: URL: https://github.com/apache/spark/pull/38915#discussion_r1039881972 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -539,6 +539,15 @@ class SparkConnectPlanner(session: SparkSe

[GitHub] [spark] amaliujia commented on a diff in pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-05 Thread GitBox
amaliujia commented on code in PR #38914: URL: https://github.com/apache/spark/pull/38914#discussion_r1039886159 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -141,11 +141,14 @@ message Expression { // (Optional) Function arguments. Empty arg

[GitHub] [spark] amaliujia commented on a diff in pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-05 Thread GitBox
amaliujia commented on code in PR #38914: URL: https://github.com/apache/spark/pull/38914#discussion_r1039886159 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -141,11 +141,14 @@ message Expression { // (Optional) Function arguments. Empty arg

[GitHub] [spark] dongjoon-hyun closed pull request #38919: [SPARK-40419][SQL][TESTS][FOLLOWUP] Remove results/udaf.sql.out

2022-12-05 Thread GitBox
dongjoon-hyun closed pull request #38919: [SPARK-40419][SQL][TESTS][FOLLOWUP] Remove results/udaf.sql.out URL: https://github.com/apache/spark/pull/38919

[GitHub] [spark] dongjoon-hyun commented on pull request #38919: [SPARK-40419][SQL][TESTS][FOLLOWUP] Remove results/udaf.sql.out

2022-12-05 Thread GitBox
dongjoon-hyun commented on PR #38919: URL: https://github.com/apache/spark/pull/38919#issuecomment-1337839622 I verified manually. We use `results/udaf/udaf.sql.out` instead of `results/udaf.sql.out`. Merged to master.

[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-12-05 Thread GitBox
vinodkc commented on PR #38146: URL: https://github.com/apache/spark/pull/38146#issuecomment-1337857038 @HyukjinKwon , @dtenedor , Can you please check this PR?

[GitHub] [spark] pan3793 commented on a diff in pull request #38901: [SPARK-41376][CORE] Correct the Netty preferDirectBufs check logic on executor start

2022-12-05 Thread GitBox
pan3793 commented on code in PR #38901: URL: https://github.com/apache/spark/pull/38901#discussion_r1039928534 ## core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala: ## @@ -85,7 +85,19 @@ private[spark] class CoarseGrainedExecutorBackend( log

[GitHub] [spark] dtenedor commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-12-05 Thread GitBox
dtenedor commented on PR #38146: URL: https://github.com/apache/spark/pull/38146#issuecomment-1337916285 @vinodkc Yes, I said the change LGTM :) sadly I am unable to merge this PR on my own though. We will need @HyukjinKwon to merge it.

[GitHub] [spark] dongjoon-hyun opened a new pull request, #38920: [SPARK-41394][PYTHON][TESTS] Skip MemoryProfilerTests when pandas is not installed

2022-12-05 Thread GitBox
dongjoon-hyun opened a new pull request, #38920: URL: https://github.com/apache/spark/pull/38920 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] dongjoon-hyun commented on pull request #38920: [SPARK-41394][PYTHON][TESTS] Skip `MemoryProfilerTests` when pandas is not installed

2022-12-05 Thread GitBox
dongjoon-hyun commented on PR #38920: URL: https://github.com/apache/spark/pull/38920#issuecomment-1338005526 Thank you, @ueshin .

[GitHub] [spark] pan3793 commented on pull request #38902: [SPARK-41136][K8S][FOLLOW-ON] Adjust graceful shutdown time of ExecutorPodsSnapshotsStoreImpl according to hadoop config

2022-12-05 Thread GitBox
pan3793 commented on PR #38902: URL: https://github.com/apache/spark/pull/38902#issuecomment-1338013789 Usually, it's a good idea to make the default value of configurations adaptive, but I'm not sure about this one. (choosing the 20s as the default value is because I don't want to change t

[GitHub] [spark] MaxGekk commented on pull request #37725: [DO-NOT-MERGE] Exceptions without error classes in SQL golden files

2022-12-05 Thread GitBox
MaxGekk commented on PR #37725: URL: https://github.com/apache/spark/pull/37725#issuecomment-1338028956 All issues have been fixed. At the moment, all sql tests (in *.sql files) raise exception with error classes. I would propose to merge this change to detect exceptions that are not ported

[GitHub] [spark] grundprinzip commented on a diff in pull request #38915: [SPARK-41382][CONNECT][PYTHON] Implement `product` function

2022-12-05 Thread GitBox
grundprinzip commented on code in PR #38915: URL: https://github.com/apache/spark/pull/38915#discussion_r1039998516 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -539,6 +539,15 @@ class SparkConnectPlanner(session: Spar

[GitHub] [spark] dongjoon-hyun commented on pull request #37821: [SPARK-40379][K8S] Propagate decommission executor loss reason in K8s

2022-12-05 Thread GitBox
dongjoon-hyun commented on PR #37821: URL: https://github.com/apache/spark/pull/37821#issuecomment-1338036777 No problem at all. Have a nice vacation, @holdenk ;)

[GitHub] [spark] xinrong-meng opened a new pull request, #38921: [WIP] Implement String/Binary functions

2022-12-05 Thread GitBox
xinrong-meng opened a new pull request, #38921: URL: https://github.com/apache/spark/pull/38921 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this p

[GitHub] [spark] tedyu commented on pull request #38902: [SPARK-41136][K8S][FOLLOW-ON] Adjust graceful shutdown time of ExecutorPodsSnapshotsStoreImpl according to hadoop config

2022-12-05 Thread GitBox
tedyu commented on PR #38902: URL: https://github.com/apache/spark/pull/38902#issuecomment-1338070280 @pan3793 Thanks for sharing the background. The current formation of this PR goes along with your change. If `hadoop.service.shutdown.timeout` is lowered, we want to use smaller

[GitHub] [spark] MaxGekk commented on a diff in pull request #38664: [SPARK-41147][SQL] Assign a name to the legacy error class `_LEGACY_ERROR_TEMP_1042`

2022-12-05 Thread GitBox
MaxGekk commented on code in PR #38664: URL: https://github.com/apache/spark/pull/38664#discussion_r1040017671 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -637,17 +637,21 @@ private[sql] object QueryCompilationErrors extends Qu

[GitHub] [spark] gengliangwang commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-12-05 Thread GitBox
gengliangwang commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1040029861 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [spark] gengliangwang commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-12-05 Thread GitBox
gengliangwang commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1040029497 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [spark] anchovYu commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project and refactor Analyzer

2022-12-05 Thread GitBox
anchovYu commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1040031835 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -258,6 +417,17 @@ class Analyzer(override val catalogManager: CatalogManage

[GitHub] [spark] xkrogen commented on pull request #38864: [SPARK-41271][SQL] Support parameterized SQL queries by `sql()`

2022-12-05 Thread GitBox
xkrogen commented on PR #38864: URL: https://github.com/apache/spark/pull/38864#issuecomment-1338091122 I managed to find a better SQL standard reference in the form of _SQL: The Complete Reference_ (2003), which has an entire chapter devoted to Dynamic SQL (beginning from page 547). You ar

[GitHub] [spark] MaxGekk commented on pull request #38916: [SPARK-41390][SQL] Update the script used to generate `register` function in `UDFRegistration`

2022-12-05 Thread GitBox
MaxGekk commented on PR #38916: URL: https://github.com/apache/spark/pull/38916#issuecomment-1338103769 +1, LGTM. Merging to master. Thank you, @LuciferYang.

[GitHub] [spark] MaxGekk closed pull request #38916: [SPARK-41390][SQL] Update the script used to generate `register` function in `UDFRegistration`

2022-12-05 Thread GitBox
MaxGekk closed pull request #38916: [SPARK-41390][SQL] Update the script used to generate `register` function in `UDFRegistration` URL: https://github.com/apache/spark/pull/38916

[GitHub] [spark] amaliujia commented on pull request #38899: [SPARK-41349][CONNECT] Implement DataFrame.hint

2022-12-05 Thread GitBox
amaliujia commented on PR #38899: URL: https://github.com/apache/spark/pull/38899#issuecomment-1338112177 @cloud-fan can you take a look?

[GitHub] [spark] MaxGekk commented on a diff in pull request #38896: [WIP][SQL] Replace `require()` by an internal error in catalyst

2022-12-05 Thread GitBox
MaxGekk commented on code in PR #38896: URL: https://github.com/apache/spark/pull/38896#discussion_r1040051601 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala: ## @@ -26,6 +26,9 @@ import org.apache.spark.util.collection.Utils * * No

[GitHub] [spark] gengliangwang commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-12-05 Thread GitBox
gengliangwang commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1040051982 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Found

[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-12-05 Thread GitBox
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1040069687 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-12-05 Thread GitBox
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1040069687 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] SandishKumarHN opened a new pull request, #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-05 Thread GitBox
SandishKumarHN opened a new pull request, #38922: URL: https://github.com/apache/spark/pull/38922 Oneof fields allow a message to contain one and only one of a defined set of field types, while recursive fields provide a way to define messages that can refer to themselves, allowing for the
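A conceptual Scala analogue of the two protobuf features described above (not the connector's code): a `oneof` behaves like an either-or choice among field types, and a recursive message refers to itself, which is why converting it to a flat schema needs a recursion-depth check. The message shapes here are made up for illustration.

```scala
// "oneof payment_method { Card card = 1; Wire wire = 2; }" modelled as an ADT:
sealed trait Payment
final case class Card(number: String) extends Payment
final case class Wire(iban: String) extends Payment

// "message TreeNode { int32 value = 1; repeated TreeNode children = 2; }" as a recursive type:
final case class TreeNode(value: Int, children: Seq[TreeNode])

object OneOfAndRecursionDemo extends App {
  val payment: Payment = Card("4111-XXXX")        // exactly one alternative is set at a time
  val tree = TreeNode(1, Seq(TreeNode(2, Nil)))   // self-referencing structure
  // Mapping such a type onto a fixed, flat schema only terminates if the recursion is
  // bounded somewhere, hence the recursion checks the PR adds.
  println((payment, tree))
}
```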

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-05 Thread GitBox
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1040138657 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala: ## @@ -92,9 +92,13 @@ object SchemaConverters { Map

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-05 Thread GitBox
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1040138657 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala: ## @@ -92,9 +92,13 @@ object SchemaConverters { Map

[GitHub] [spark] amaliujia commented on a diff in pull request #38883: [SPARK-41366][CONNECT] DF.groupby.agg() should be compatible

2022-12-05 Thread GitBox
amaliujia commented on code in PR #38883: URL: https://github.com/apache/spark/pull/38883#discussion_r1040141502 ## python/pyspark/sql/connect/dataframe.py: ## @@ -55,8 +59,109 @@ def __init__(self, df: "DataFrame", *grouping_cols: Union[Column, str]) -> None: self._df

[GitHub] [spark] amaliujia commented on a diff in pull request #38883: [SPARK-41366][CONNECT] DF.groupby.agg() should be compatible

2022-12-05 Thread GitBox
amaliujia commented on code in PR #38883: URL: https://github.com/apache/spark/pull/38883#discussion_r1040141502 ## python/pyspark/sql/connect/dataframe.py: ## @@ -55,8 +59,109 @@ def __init__(self, df: "DataFrame", *grouping_cols: Union[Column, str]) -> None: self._df
