[GitHub] [spark] LuciferYang commented on pull request #39683: [SPARK-42144][CORE][SQL] Handle null string values in StageDataWrapper/StreamBlockData/StreamingQueryData

2023-01-22 Thread via GitHub
LuciferYang commented on PR #39683: URL: https://github.com/apache/spark/pull/39683#issuecomment-1399721161 thanks @gengliangwang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific
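PR #39683 (like #39684) handles null string values when these UI wrapper objects are converted to Protobuf. Protobuf string fields cannot hold null. One common pattern, assumed here rather than taken from the PR, is to declare the field `optional`, set it only when the value is non-null, and read it back as null when the field is absent. In the Python sketch below, a plain dict stands in for a generated message class, and `set_string_field` and `get_string_field` are hypothetical helper names.

```python
from typing import Dict, Optional, Tuple

# A plain dict stands in for a Protobuf message: a missing key means "field not set".
Message = Dict[str, str]


def set_string_field(msg: Message, field: str, value: Optional[str]) -> None:
    """Set the field only when value is non-null; Protobuf strings cannot hold null."""
    if value is not None:
        msg[field] = value


def get_string_field(msg: Message, field: str) -> Optional[str]:
    """Read an optional string field back, mapping 'not set' to None."""
    return msg.get(field)


def round_trip(name: Optional[str], description: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    msg: Message = {}
    set_string_field(msg, "name", name)
    set_string_field(msg, "description", description)
    return get_string_field(msg, "name"), get_string_field(msg, "description")
```

Because presence is tracked separately from the value, an empty string survives the round trip as `""` and stays distinct from an unset (null) field.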

[GitHub] [spark] LuciferYang commented on pull request #39684: [SPARK-42140][CORE] Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper

2023-01-22 Thread via GitHub
LuciferYang commented on PR #39684: URL: https://github.com/apache/spark/pull/39684#issuecomment-1399721398 thanks @gengliangwang

[GitHub] [spark] grundprinzip commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-22 Thread via GitHub
grundprinzip commented on code in PR #39695: URL: https://github.com/apache/spark/pull/39695#discussion_r1083641679 ## python/pyspark/sql/connect/client.py: ## @@ -640,6 +669,136 @@ def _handle_error(self, rpc_error: grpc.RpcError) -> NoReturn: raise
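PR #39695 adds retry policies to the Spark Connect Python client. The diff is truncated in this archive, so the following is only a generic sketch of a retry loop with exponential backoff. It assumes a policy defined by a maximum number of retries, an initial backoff, a multiplier, and a cap. The names `RetryPolicy` and `call_with_retries` and their parameters are hypothetical and are not the client's actual API.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass
class RetryPolicy:
    # Hypothetical knobs; not the PR's actual parameter names.
    max_retries: int = 3
    initial_backoff: float = 0.05  # seconds
    backoff_multiplier: float = 2.0
    max_backoff: float = 1.0  # seconds


def call_with_retries(
    fn: Callable[[], T],
    policy: RetryPolicy,
    can_retry: Callable[[Exception], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable failures with exponential backoff."""
    backoff = policy.initial_backoff
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as e:
            # Give up on non-retryable errors or once the retry budget is spent.
            if not can_retry(e) or attempt >= policy.max_retries:
                raise
            sleep(min(backoff, policy.max_backoff))
            backoff *= policy.backoff_multiplier
            attempt += 1
```

Injecting `sleep` keeps the loop testable without real delays. A real gRPC client would typically base `can_retry` on the RPC status code, for example retrying only on transient unavailability.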

[GitHub] [spark] grundprinzip commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-22 Thread via GitHub
grundprinzip commented on code in PR #39695: URL: https://github.com/apache/spark/pull/39695#discussion_r1083642191 ## python/pyspark/sql/connect/client.py: ## @@ -640,6 +669,136 @@ def _handle_error(self, rpc_error: grpc.RpcError) -> NoReturn: raise

[GitHub] [spark] mridulm commented on a diff in pull request #39703: [SPARK-42157][CORE] `spark.scheduler.mode=FAIR` should provide FAIR scheduler

2023-01-22 Thread via GitHub
mridulm commented on code in PR #39703: URL: https://github.com/apache/spark/pull/39703#discussion_r1083660245 ## conf/fairscheduler-default.xml.template: ## @@ -0,0 +1,26 @@ +FAIR +1 +0 Review Comment: There is a
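The values that survive from the template (`FAIR`, `1`, `0`) look like a pool's scheduling mode, weight, and minShare. That reading is an assumption, because the XML tags were lost in the archive. To illustrate how weight and minShare affect ordering, here is a simplified Python sketch modeled on the comparison Spark's fair scheduling algorithm makes between two schedulables. A pool running fewer tasks than its minShare goes first. Between two such pools, the one furthest below its minShare wins. Otherwise, running tasks per unit of weight decides, and names break ties. The `Pool` class is a hypothetical stand-in.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import List


@dataclass
class Pool:
    # Hypothetical stand-in for a schedulable pool.
    name: str
    running_tasks: int
    weight: int = 1
    min_share: int = 0


def fair_compare(a: Pool, b: Pool) -> int:
    """Return a negative number if a should be offered resources before b."""
    a_needy = a.running_tasks < a.min_share
    b_needy = b.running_tasks < b.min_share
    if a_needy and not b_needy:
        return -1
    if b_needy and not a_needy:
        return 1
    if a_needy and b_needy:
        # Both below minShare: the one furthest below it goes first.
        ra = a.running_tasks / max(a.min_share, 1.0)
        rb = b.running_tasks / max(b.min_share, 1.0)
    else:
        # Neither is below minShare: compare running tasks per unit of weight.
        ra = a.running_tasks / a.weight
        rb = b.running_tasks / b.weight
    if ra != rb:
        return -1 if ra < rb else 1
    # Tie-break on pool name.
    return -1 if a.name < b.name else (1 if a.name > b.name else 0)


def schedule_order(pools: List[Pool]) -> List[str]:
    return [p.name for p in sorted(pools, key=cmp_to_key(fair_compare))]
```

With weight 1 and minShare 0, as in the template values above, a pool is never treated as needy, so ordering falls back to running tasks per unit of weight.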

[GitHub] [spark] itholic commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-22 Thread via GitHub
itholic commented on code in PR #39695: URL: https://github.com/apache/spark/pull/39695#discussion_r1083661917 ## python/pyspark/sql/connect/client.py: ## @@ -365,6 +385,15 @@ def __init__( # Parse the connection string. self._builder =

[GitHub] [spark] itholic commented on pull request #39501: [SPARK-41295][SPARK-41296][SQL] Rename the error classes

2023-01-22 Thread via GitHub
itholic commented on PR #39501: URL: https://github.com/apache/spark/pull/39501#issuecomment-1399836266 @srowen Could you help create a JIRA account for @NarekDW when you find some time?

[GitHub] [spark] itholic commented on pull request #39501: [SPARK-41295][SPARK-41296][SQL] Rename the error classes

2023-01-22 Thread via GitHub
itholic commented on PR #39501: URL: https://github.com/apache/spark/pull/39501#issuecomment-1399836177 Oh, I just submitted a PR for SPARK-41488, so please take a look at SPARK-41302 when you have some time.

[GitHub] [spark] LuciferYang commented on a diff in pull request #39555: [SPARK-42051][SQL] Codegen Support for HiveGenericUDF

2023-01-22 Thread via GitHub
LuciferYang commented on code in PR #39555: URL: https://github.com/apache/spark/pull/39555#discussion_r1083674070 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -154,17 +154,19 @@ private[hive] case class HiveGenericUDF(

[GitHub] [spark] viirya commented on a diff in pull request #39508: [SPARK-41985][SQL] Centralize more column resolution rules

2023-01-22 Thread via GitHub
viirya commented on code in PR #39508: URL: https://github.com/apache/spark/pull/39508#discussion_r1083695957 ## sql/core/src/test/resources/sql-tests/inputs/group-by.sql: ## @@ -45,6 +45,15 @@ SELECT COUNT(DISTINCT b), COUNT(DISTINCT b, c) FROM (SELECT 1 AS a, 2 AS b, 3 AS

[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-22 Thread via GitHub
LuciferYang commented on code in PR #39642: URL: https://github.com/apache/spark/pull/39642#discussion_r1083634888 ## sql/core/src/test/scala/org/apache/spark/status/protobuf/sql/KVStoreProtobufSerializerSuite.scala: ## @@ -271,4 +278,254 @@ class KVStoreProtobufSerializerSuite

[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-22 Thread via GitHub
LuciferYang commented on code in PR #39642: URL: https://github.com/apache/spark/pull/39642#discussion_r1083634709 ## sql/core/src/test/scala/org/apache/spark/status/protobuf/sql/KVStoreProtobufSerializerSuite.scala: ## @@ -271,4 +278,254 @@ class KVStoreProtobufSerializerSuite

[GitHub] [spark] itholic commented on a diff in pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-01-22 Thread via GitHub
itholic commented on code in PR #39691: URL: https://github.com/apache/spark/pull/39691#discussion_r1083665941 ## sql/core/src/test/resources/sql-tests/results/window.sql.out: ## @@ -1342,3 +1342,139 @@ org.apache.spark.sql.AnalysisException "windowName" : "w" } } + +
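PR #39691 adds a QUALIFY clause. QUALIFY filters rows on the result of window functions, much as HAVING filters on aggregates. A query such as `SELECT ... QUALIFY row_number() OVER (PARTITION BY k ORDER BY v DESC) <= n` is generally equivalent to computing the window value in a subquery and filtering it in the outer `WHERE`. The Python sketch below emulates that evaluation order on plain tuples: it computes the window value per row, then filters. The function and data are hypothetical and do not use Spark.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, List, Tuple

Row = Tuple


def qualify_top_n_per_group(
    rows: List[Row],
    key: Callable[[Row], object],
    order_by: Callable[[Row], object],
    n: int = 1,
) -> List[Row]:
    """Emulate: SELECT * FROM rows
    QUALIFY row_number() OVER (PARTITION BY key ORDER BY order_by DESC) <= n
    """
    result = []
    # PARTITION BY key: sort so that each partition is contiguous for groupby.
    for _, part in groupby(sorted(rows, key=key), key=key):
        # ORDER BY order_by DESC, then number the rows starting at 1.
        ordered = sorted(part, key=order_by, reverse=True)
        for row_number, row in enumerate(ordered, start=1):
            # The QUALIFY predicate runs only after the window value exists.
            if row_number <= n:
                result.append(row)
    return result


rows = [("a", 1), ("a", 3), ("b", 2), ("b", 5), ("b", 4)]
top_per_group = qualify_top_n_per_group(rows, key=itemgetter(0), order_by=itemgetter(1))
```

As in SQL, `row_number()` over ties in the `ORDER BY` column is not well defined. This sketch simply keeps Python's stable sort order.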

[GitHub] [spark] itholic opened a new pull request, #39705: [SPARK-41488][SQL] Assign name to _LEGACY_ERROR_TEMP_1176

2023-01-22 Thread via GitHub
itholic opened a new pull request, #39705: URL: https://github.com/apache/spark/pull/39705 ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_1176, "INCOMPATIBLE_VIEW_SCHEMA_CHANGE". ### Why are the changes

[GitHub] [spark] imhunterand commented on pull request #39566: Patched()Fix Protobuf Java vulnerable to Uncontrolled Resource Consumption

2023-01-22 Thread via GitHub
imhunterand commented on PR #39566: URL: https://github.com/apache/spark/pull/39566#issuecomment-1399849534 **Hi!** @everyone @apache, is there any update? This has been waiting for a fix since last week. Could you merge this pull request as fixed/patched? Kind regards,

[GitHub] [spark] dongjoon-hyun commented on pull request #39703: [SPARK-42157][CORE] `spark.scheduler.mode=FAIR` should provide FAIR scheduler

2023-01-22 Thread via GitHub
dongjoon-hyun commented on PR #39703: URL: https://github.com/apache/spark/pull/39703#issuecomment-1399888545 No problem. I totally understand your concern about the usage of the template file. I'll also think about a new way. Thank you for your thoughtful review, @mridulm.

[GitHub] [spark] viirya commented on a diff in pull request #39508: [SPARK-41985][SQL] Centralize more column resolution rules

2023-01-22 Thread via GitHub
viirya commented on code in PR #39508: URL: https://github.com/apache/spark/pull/39508#discussion_r1083698974 ## sql/core/src/test/resources/sql-tests/inputs/group-by.sql: ## @@ -45,6 +45,15 @@ SELECT COUNT(DISTINCT b), COUNT(DISTINCT b, c) FROM (SELECT 1 AS a, 2 AS b, 3 AS

[GitHub] [spark] grundprinzip commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-22 Thread via GitHub
grundprinzip commented on code in PR #39695: URL: https://github.com/apache/spark/pull/39695#discussion_r1083642374 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -2591,6 +2591,73 @@ def test_unsupported_io_functions(self): getattr(df.write,

[GitHub] [spark] grundprinzip commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-22 Thread via GitHub
grundprinzip commented on code in PR #39695: URL: https://github.com/apache/spark/pull/39695#discussion_r1083643160 ## python/pyspark/sql/connect/client.py: ## @@ -531,12 +560,16 @@ def _analyze(self, plan: pb2.Plan, explain_mode: str = "extended") -> AnalyzeRes

[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-22 Thread via GitHub
LuciferYang commented on code in PR #39642: URL: https://github.com/apache/spark/pull/39642#discussion_r1083643178 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -765,3 +765,54 @@ message PoolData { optional string name = 1; repeated

[GitHub] [spark] grundprinzip commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-22 Thread via GitHub
grundprinzip commented on code in PR #39695: URL: https://github.com/apache/spark/pull/39695#discussion_r1083643562 ## python/pyspark/sql/connect/client.py: ## @@ -365,6 +385,15 @@ def __init__( # Parse the connection string. self._builder =

[GitHub] [spark] itholic commented on pull request #39705: [SPARK-41488][SQL] Assign name to _LEGACY_ERROR_TEMP_1176 (and 1177)

2023-01-22 Thread via GitHub
itholic commented on PR #39705: URL: https://github.com/apache/spark/pull/39705#issuecomment-1399864027 I referred to code path in `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala` below: ```scala case class GetViewColumnByNameAndOrdinal(

[GitHub] [spark] itholic opened a new pull request, #39706: [SPARK-42158][SQL] Integrate `_LEGACY_ERROR_TEMP_1003` into `FIELD_NOT_FOUND`

2023-01-22 Thread via GitHub
itholic opened a new pull request, #39706: URL: https://github.com/apache/spark/pull/39706 ### What changes were proposed in this pull request? This PR proposes to integrate `_LEGACY_ERROR_TEMP_1003` into `FIELD_NOT_FOUND` ### Why are the changes needed? We should

[GitHub] [spark] wangyum commented on a diff in pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-01-22 Thread via GitHub
wangyum commented on code in PR #39691: URL: https://github.com/apache/spark/pull/39691#discussion_r1083632654 ## docs/sql-ref-syntax-qry-select-qualify.md: ## @@ -0,0 +1,98 @@ +--- +layout: global +title: QUALIFY Clause +displayTitle: QUALIFY Clause +license: | + Licensed to

[GitHub] [spark] itholic commented on pull request #39702: [SPARK-41487][SQL] Assign name to _LEGACY_ERROR_TEMP_1020

2023-01-22 Thread via GitHub
itholic commented on PR #39702: URL: https://github.com/apache/spark/pull/39702#issuecomment-1399797614 cc @MaxGekk @srielau @cloud-fan

[GitHub] [spark] itholic commented on pull request #39701: [SPARK-41489][SQL] Assign name to _LEGACY_ERROR_TEMP_2415

2023-01-22 Thread via GitHub
itholic commented on PR #39701: URL: https://github.com/apache/spark/pull/39701#issuecomment-1399797647 cc @MaxGekk @srielau @cloud-fan

[GitHub] [spark] itholic commented on pull request #39700: [SPARK-41490][SQL] Assign name to _LEGACY_ERROR_TEMP_2441

2023-01-22 Thread via GitHub
itholic commented on PR #39700: URL: https://github.com/apache/spark/pull/39700#issuecomment-1399797488 cc @MaxGekk @srielau @cloud-fan

[GitHub] [spark] itholic commented on pull request #39705: [SPARK-41488][SQL] Assign name to _LEGACY_ERROR_TEMP_1176

2023-01-22 Thread via GitHub
itholic commented on PR #39705: URL: https://github.com/apache/spark/pull/39705#issuecomment-1399837315 cc @srielau @MaxGekk @cloud-fan

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39703: [SPARK-42157][CORE] `spark.scheduler.mode=FAIR` should provide FAIR scheduler

2023-01-22 Thread via GitHub
dongjoon-hyun commented on code in PR #39703: URL: https://github.com/apache/spark/pull/39703#discussion_r1083692686 ## core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala: ## @@ -86,10 +87,17 @@ private[spark] class FairSchedulableBuilder(val rootPool:

[GitHub] [spark] LuciferYang commented on a diff in pull request #39555: [SPARK-42051][SQL] Codegen Support for HiveGenericUDF

2023-01-22 Thread via GitHub
LuciferYang commented on code in PR #39555: URL: https://github.com/apache/spark/pull/39555#discussion_r1083687701 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -192,6 +194,48 @@ private[hive] case class HiveGenericUDF( override protected def

[GitHub] [spark] wangyum commented on a diff in pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-01-22 Thread via GitHub
wangyum commented on code in PR #39691: URL: https://github.com/apache/spark/pull/39691#discussion_r1083632796 ## sql/core/src/test/resources/sql-tests/results/window.sql.out: ## @@ -1342,3 +1342,139 @@ org.apache.spark.sql.AnalysisException "windowName" : "w" } } + +

[GitHub] [spark] grundprinzip commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-22 Thread via GitHub
grundprinzip commented on code in PR #39695: URL: https://github.com/apache/spark/pull/39695#discussion_r1083641243 ## python/pyspark/sql/connect/client.py: ## @@ -567,54 +602,48 @@ def _execute_and_fetch( logger.info("ExecuteAndFetch") m:

[GitHub] [spark] grundprinzip commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-22 Thread via GitHub
grundprinzip commented on code in PR #39695: URL: https://github.com/apache/spark/pull/39695#discussion_r1083641323 ## python/pyspark/sql/connect/client.py: ## @@ -567,54 +602,48 @@ def _execute_and_fetch( logger.info("ExecuteAndFetch") m:

[GitHub] [spark] grundprinzip commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-22 Thread via GitHub
grundprinzip commented on code in PR #39695: URL: https://github.com/apache/spark/pull/39695#discussion_r1083641393 ## python/pyspark/sql/connect/client.py: ## @@ -567,54 +602,48 @@ def _execute_and_fetch( logger.info("ExecuteAndFetch") m:

[GitHub] [spark] grundprinzip commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-22 Thread via GitHub
grundprinzip commented on code in PR #39695: URL: https://github.com/apache/spark/pull/39695#discussion_r1083640867 ## python/pyspark/sql/connect/client.py: ## @@ -531,12 +560,16 @@ def _analyze(self, plan: pb2.Plan, explain_mode: str = "extended") -> AnalyzeRes

[GitHub] [spark] beliefer commented on a diff in pull request #39660: [SPARK-42128][SQL] Support TOP (N) for MS SQL Server dialect as an alternative to Limit pushdown

2023-01-22 Thread via GitHub
beliefer commented on code in PR #39660: URL: https://github.com/apache/spark/pull/39660#discussion_r1083729512 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -544,6 +544,14 @@ abstract class JdbcDialect extends Serializable with Logging {
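PR #39660 lets the MS SQL Server dialect push a limit down as `TOP (N)` instead of a trailing `LIMIT n`, because T-SQL does not support `LIMIT`. The diff is truncated here, so the Python sketch below only illustrates where each dialect places the row limit. `build_select` and its `dialect` argument are hypothetical and are not methods of Spark's `JdbcDialect`.

```python
from typing import Optional, Sequence


def build_select(
    columns: Sequence[str],
    table: str,
    where: Optional[str] = None,
    limit: Optional[int] = None,
    dialect: str = "generic",
) -> str:
    """Build a SELECT statement, placing the row limit where the dialect expects it."""
    cols = ", ".join(columns) if columns else "*"
    # SQL Server puts the limit directly after SELECT as TOP (n).
    top = f"TOP ({limit}) " if dialect == "mssql" and limit is not None else ""
    sql = f"SELECT {top}{cols} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    # Most other databases accept a trailing LIMIT n.
    if dialect != "mssql" and limit is not None:
        sql += f" LIMIT {limit}"
    return sql
```

For example, `build_select(["a", "b"], "t", "a > 1", 10, "mssql")` yields `SELECT TOP (10) a, b FROM t WHERE a > 1`, while the generic dialect appends `LIMIT 10` at the end instead.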

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39704: [MINOR][DOCS] Add all supported resource managers in `Scheduling Within an Application` section

2023-01-22 Thread via GitHub
dongjoon-hyun opened a new pull request, #39704: URL: https://github.com/apache/spark/pull/39704 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] LuciferYang commented on pull request #39694: [SPARK-42152][BUILD][CORE][SQL][PYTHON][PROTOBUF] Use `_` instead of `-` for relocation package name

2023-01-22 Thread via GitHub
LuciferYang commented on PR #39694: URL: https://github.com/apache/spark/pull/39694#issuecomment-1399724327 @itholic Thanks for your suggestion, the PR description has been updated.

[GitHub] [spark] LuciferYang commented on pull request #39694: [SPARK-42152][BUILD][CORE][SQL][PYTHON][PROTOBUF] Use `_` instead of `-` for relocation package name

2023-01-22 Thread via GitHub
LuciferYang commented on PR #39694: URL: https://github.com/apache/spark/pull/39694#issuecomment-1399724538 also cc @srowen @dongjoon-hyun @HyukjinKwon
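PR #39694 switches relocation package names from `-` to `_`. One general reason this matters, not necessarily the PR's stated motivation, is that a hyphen is not legal in a Java identifier. A package segment such as `spark-proto` therefore cannot be written in Java source, while `spark_proto` can. The Python sketch below checks a package name segment by segment. It uses a simplified ASCII-only identifier rule and a small subset of reserved words, and the example package names are hypothetical.

```python
import re

# Simplified Java identifier rule (ASCII only): a letter, '_' or '$',
# followed by letters, digits, '_' or '$'.
_IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# A small subset of Java reserved words, for illustration only.
_RESERVED = {"class", "package", "import", "int", "new", "public", "static"}


def is_valid_package_name(name: str) -> bool:
    """Return True if every dot-separated segment is a (simplified) Java identifier."""
    return all(
        _IDENT.fullmatch(segment) is not None and segment not in _RESERVED
        for segment in name.split(".")
    )
```

For example, `is_valid_package_name("org.sparkproject.spark_proto")` is `True`, while the hyphenated `org.sparkproject.spark-proto` is rejected.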

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39703: [SPARK-42157][CORE] `spark.scheduler.mode=FAIR` should provide FAIR scheduler

2023-01-22 Thread via GitHub
dongjoon-hyun commented on code in PR #39703: URL: https://github.com/apache/spark/pull/39703#discussion_r1083690034 ## conf/fairscheduler-default.xml.template: ## @@ -0,0 +1,26 @@ +FAIR +1 +0 Review Comment: This is not for testing, @mridulm

[GitHub] [spark] purple-dude commented on pull request #30889: [SPARK-33398] Fix loading tree models prior to Spark 3.0

2023-01-22 Thread via GitHub
purple-dude commented on PR #30889: URL: https://github.com/apache/spark/pull/30889#issuecomment-1399881202 Hi All, I trained a random forest model in PySpark 2.4, but when I try to reload it in PySpark 3.0.3, it gives the error below:

[GitHub] [spark] mridulm commented on pull request #39703: [SPARK-42157][CORE] `spark.scheduler.mode=FAIR` should provide FAIR scheduler

2023-01-22 Thread via GitHub
mridulm commented on PR #39703: URL: https://github.com/apache/spark/pull/39703#issuecomment-1399887383 Looks like I misunderstood the PR, I see what you mean @dongjoon-hyun. I am not sure what is a good way to make progress here ... let me think about it more. +CC @tgravescs,

[GitHub] [spark] zhengruifeng commented on pull request #39699: [SPARK-41772][CONNECT][PYTHON] Fix incorrect column name in `withField`'s doctest

2023-01-22 Thread via GitHub
zhengruifeng commented on PR #39699: URL: https://github.com/apache/spark/pull/39699#issuecomment-1399443025 @HyukjinKwon thanks, merged into master

[GitHub] [spark] LuciferYang commented on pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-22 Thread via GitHub
LuciferYang commented on PR #39682: URL: https://github.com/apache/spark/pull/39682#issuecomment-1399435222 This one GA passed

[GitHub] [spark] LuciferYang commented on pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-22 Thread via GitHub
LuciferYang commented on PR #39642: URL: https://github.com/apache/spark/pull/39642#issuecomment-1399441314 cc @gengliangwang Does this one have a chance to make it into Spark 3.4.0, or should it wait for the next version?

[GitHub] [spark] zhengruifeng commented on pull request #39698: [SPARK-41283][CONNECT][PYTHON] Add `array_append` to Connect

2023-01-22 Thread via GitHub
zhengruifeng commented on PR #39698: URL: https://github.com/apache/spark/pull/39698#issuecomment-1399442430 thanks, merged into master

[GitHub] [spark] zhengruifeng closed pull request #39698: [SPARK-41283][CONNECT][PYTHON] Add `array_append` to Connect

2023-01-22 Thread via GitHub
zhengruifeng closed pull request #39698: [SPARK-41283][CONNECT][PYTHON] Add `array_append` to Connect URL: https://github.com/apache/spark/pull/39698

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39693: [SPARK-41712][PYTHON][CONNECT] Migrate the Spark Connect errors into PySpark error framework.

2023-01-22 Thread via GitHub
zhengruifeng commented on code in PR #39693: URL: https://github.com/apache/spark/pull/39693#discussion_r1083425383 ## python/pyspark/sql/connect/client.py: ## @@ -628,21 +612,33 @@ def _handle_error(self, rpc_error: grpc.RpcError) -> NoReturn: if

[GitHub] [spark] itholic commented on a diff in pull request #39693: [SPARK-41712][PYTHON][CONNECT] Migrate the Spark Connect errors into PySpark error framework.

2023-01-22 Thread via GitHub
itholic commented on code in PR #39693: URL: https://github.com/apache/spark/pull/39693#discussion_r1083429604 ## python/pyspark/errors/__init__.py: ## @@ -45,4 +50,11 @@ "SparkUpgradeException", "PySparkTypeError", "PySparkValueError", +

[GitHub] [spark] zhengruifeng closed pull request #39699: [SPARK-41772][CONNECT][PYTHON] Fix incorrect column name in `withField`'s doctest

2023-01-22 Thread via GitHub
zhengruifeng closed pull request #39699: [SPARK-41772][CONNECT][PYTHON] Fix incorrect column name in `withField`'s doctest URL: https://github.com/apache/spark/pull/39699

[GitHub] [spark] LuciferYang commented on pull request #39683: [SPARK-42144][CORE][SQL] Handle null string values in StageDataWrapper/StreamBlockData/StreamingQueryData

2023-01-22 Thread via GitHub
LuciferYang commented on PR #39683: URL: https://github.com/apache/spark/pull/39683#issuecomment-1399442619 GA passed

[GitHub] [spark] itholic commented on pull request #39693: [SPARK-41712][PYTHON][CONNECT] Migrate the Spark Connect errors into PySpark error framework.

2023-01-22 Thread via GitHub
itholic commented on PR #39693: URL: https://github.com/apache/spark/pull/39693#issuecomment-1399452695 > not related to this PR, but we may also need to add tests to check the messages in these exceptions. For sure! I'm planning to improve the tests for new error framework as

[GitHub] [spark] Yikun commented on pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-22 Thread via GitHub
Yikun commented on PR #39690: URL: https://github.com/apache/spark/pull/39690#issuecomment-1399462666 @dongjoon-hyun Thanks! Late LGTM.

[GitHub] [spark] AmplabJenkins commented on pull request #39695: [SPARK-XXXX] SparkConnectClient supports RetryPolicies now

2023-01-22 Thread via GitHub
AmplabJenkins commented on PR #39695: URL: https://github.com/apache/spark/pull/39695#issuecomment-1399473053 Can one of the admins verify this patch?

[GitHub] [spark] grundprinzip commented on pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-22 Thread via GitHub
grundprinzip commented on PR #39695: URL: https://github.com/apache/spark/pull/39695#issuecomment-1399563679 R: @HyukjinKwon @zhengruifeng

[GitHub] [spark] grundprinzip commented on a diff in pull request #39585: [SPARK-42124][PYTHON][CONNECT] Scalar Inline Python UDF in Spark Connect

2023-01-22 Thread via GitHub
grundprinzip commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1083516677 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -217,6 +218,28 @@ message Expression { bool is_user_defined_function =

[GitHub] [spark] grundprinzip commented on a diff in pull request #39693: [SPARK-41712][PYTHON][CONNECT] Migrate the Spark Connect errors into PySpark error framework.

2023-01-22 Thread via GitHub
grundprinzip commented on code in PR #39693: URL: https://github.com/apache/spark/pull/39693#discussion_r1083516891 ## python/pyspark/errors/exceptions.py: ## @@ -288,7 +291,57 @@ class UnknownException(CapturedException): class SparkUpgradeException(CapturedException):

[GitHub] [spark] itholic opened a new pull request, #39700: [SPARK-41490][SQL] Assign name to _LEGACY_ERROR_TEMP_2441

2023-01-22 Thread via GitHub
itholic opened a new pull request, #39700: URL: https://github.com/apache/spark/pull/39700 ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_2441, "UNSUPPORTED_EXPR_FOR_OPERATOR". ### Why are the changes

[GitHub] [spark] itholic opened a new pull request, #39701: [SPARK-41489][SQL] Assign name to _LEGACY_ERROR_TEMP_2415

2023-01-22 Thread via GitHub
itholic opened a new pull request, #39701: URL: https://github.com/apache/spark/pull/39701 ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_2415, "INVALID_TYPE_FOR_FILTER_EXPR". ### Why are the changes

[GitHub] [spark] itholic commented on a diff in pull request #39701: [SPARK-41489][SQL] Assign name to _LEGACY_ERROR_TEMP_2415

2023-01-22 Thread via GitHub
itholic commented on code in PR #39701: URL: https://github.com/apache/spark/pull/39701#discussion_r1083521103 ## core/src/main/resources/error/error-classes.json: ## @@ -933,6 +933,12 @@ ], "sqlState" : "42604" }, + "INVALID_TYPE_FOR_FILTER_EXPR" : { +
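The diff adds an `INVALID_TYPE_FOR_FILTER_EXPR` entry to `error-classes.json`, where each error class maps to a message template and a `sqlState`. The sketch below shows how such a template might be rendered. It assumes `<name>`-style placeholders and a message stored as a list of lines, in the style of Spark's existing entries. The entry name, message text, and SQLSTATE value are invented for illustration and are not the PR's actual content.

```python
import json
import re
from typing import Dict

# Hypothetical entry in the style of error-classes.json; not the PR's real text.
ERROR_CLASSES = json.loads("""
{
  "EXAMPLE_FILTER_TYPE_ERROR" : {
    "message" : [
      "Filter expression <filter> of type <type> is not a boolean."
    ],
    "sqlState" : "XXXXX"
  }
}
""")


def render_error(error_class: str, params: Dict[str, object]) -> str:
    """Fill <name> placeholders in the class's message template."""
    entry = ERROR_CLASSES[error_class]
    template = "\n".join(entry["message"])

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing parameter {name!r} for {error_class}")
        return str(params[name])

    message = re.sub(r"<([A-Za-z0-9_]+)>", substitute, template)
    # Prefix the class name in brackets, mirroring how Spark typically displays error classes.
    return f"[{error_class}] {message}"
```

Failing loudly on a missing parameter makes a mismatch between a template and its call site visible in tests, instead of leaving a raw `<name>` in the user-facing message.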

[GitHub] [spark] itholic commented on a diff in pull request #39693: [SPARK-41712][PYTHON][CONNECT] Migrate the Spark Connect errors into PySpark error framework.

2023-01-22 Thread via GitHub
itholic commented on code in PR #39693: URL: https://github.com/apache/spark/pull/39693#discussion_r1083528302 ## python/pyspark/errors/exceptions.py: ## @@ -288,7 +291,57 @@ class UnknownException(CapturedException): class SparkUpgradeException(CapturedException): """

[GitHub] [spark] itholic commented on pull request #39501: [SPARK-41295][SQL] Rename the error classes

2023-01-22 Thread via GitHub
itholic commented on PR #39501: URL: https://github.com/apache/spark/pull/39501#issuecomment-1399590022 It looks good to me. Also cc @cloud-fan

[GitHub] [spark] itholic commented on pull request #39501: [SPARK-41295][SQL] Rename the error classes

2023-01-22 Thread via GitHub
itholic commented on PR #39501: URL: https://github.com/apache/spark/pull/39501#issuecomment-1399591723 @NarekDW Can you add `[SPARK-41296]` to the PR title instead of explaining it in the PR description? `[SPARK-41295][SPARK-41296][SQL] Rename the error classes`

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38428: [SPARK-40912][CORE]Overhead of Exceptions in KryoDeserializationStream

2023-01-22 Thread via GitHub
dongjoon-hyun commented on code in PR #38428: URL: https://github.com/apache/spark/pull/38428#discussion_r1083538101 ## core/src/test/scala/org/apache/spark/serializer/KryoIteratorBenchmark.scala: ## @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] itholic commented on pull request #39501: [SPARK-41295][SPARK-41296][SQL] Rename the error classes

2023-01-22 Thread via GitHub
itholic commented on PR #39501: URL: https://github.com/apache/spark/pull/39501#issuecomment-1399602887 Cool, thanks!! BTW, if you happen to be interested in contributing more error class renames, could you try resolving SPARK-41302 and SPARK-41488? I believe these are pretty

[GitHub] [spark] NarekDW commented on pull request #39501: [SPARK-41295][SPARK-41296][SQL] Rename the error classes

2023-01-22 Thread via GitHub
NarekDW commented on PR #39501: URL: https://github.com/apache/spark/pull/39501#issuecomment-1399605596 > Cool, thanks!! > > BTW, if you happen to interested in more contribution to rename error class, could you try resolving

[GitHub] [spark] itholic commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-22 Thread via GitHub
itholic commented on code in PR #39695: URL: https://github.com/apache/spark/pull/39695#discussion_r1083547431 ## python/pyspark/sql/connect/client.py: ## @@ -567,54 +602,48 @@ def _execute_and_fetch( logger.info("ExecuteAndFetch") m:

[GitHub] [spark] gengliangwang commented on pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-22 Thread via GitHub
gengliangwang commented on PR #39682: URL: https://github.com/apache/spark/pull/39682#issuecomment-1399617495 Thanks, merging to master

[GitHub] [spark] gengliangwang closed pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-22 Thread via GitHub
gengliangwang closed pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric URL: https://github.com/apache/spark/pull/39682

[GitHub] [spark] gengliangwang commented on pull request #39683: [SPARK-42144][CORE][SQL] Handle null string values in StageDataWrapper/StreamBlockData/StreamingQueryData

2023-01-22 Thread via GitHub
gengliangwang commented on PR #39683: URL: https://github.com/apache/spark/pull/39683#issuecomment-1399617671 Thanks, merging to master

[GitHub] [spark] gengliangwang closed pull request #39683: [SPARK-42144][CORE][SQL] Handle null string values in StageDataWrapper/StreamBlockData/StreamingQueryData

2023-01-22 Thread via GitHub
gengliangwang closed pull request #39683: [SPARK-42144][CORE][SQL] Handle null string values in StageDataWrapper/StreamBlockData/StreamingQueryData URL: https://github.com/apache/spark/pull/39683

[GitHub] [spark] gengliangwang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-22 Thread via GitHub
gengliangwang commented on code in PR #39642: URL: https://github.com/apache/spark/pull/39642#discussion_r1083550779 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -765,3 +765,54 @@ message PoolData { optional string name = 1; repeated

[GitHub] [spark] itholic commented on a diff in pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-01-22 Thread via GitHub
itholic commented on code in PR #39691: URL: https://github.com/apache/spark/pull/39691#discussion_r1083550178 ## sql/core/src/test/resources/sql-tests/results/window.sql.out: ## @@ -1342,3 +1342,139 @@ org.apache.spark.sql.AnalysisException "windowName" : "w" } } + +

[GitHub] [spark] sadikovi commented on a diff in pull request #39660: [SPARK-42128][SQL] Support TOP (N) for MS SQL Server dialect as an alternative to Limit pushdown

2023-01-22 Thread via GitHub
sadikovi commented on code in PR #39660: URL: https://github.com/apache/spark/pull/39660#discussion_r1083554624 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala: ## @@ -167,10 +167,15 @@ private object MsSqlServerDialect extends JdbcDialect {

[GitHub] [spark] sadikovi commented on a diff in pull request #39660: [SPARK-42128][SQL] Support TOP (N) for MS SQL Server dialect as an alternative to Limit pushdown

2023-01-22 Thread via GitHub
sadikovi commented on code in PR #39660: URL: https://github.com/apache/spark/pull/39660#discussion_r1083554734 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala: ## @@ -1001,6 +1001,35 @@ class JDBCSuite extends QueryTest with SharedSparkSession { }

[GitHub] [spark] sadikovi commented on a diff in pull request #39660: [SPARK-42128][SQL] Support TOP (N) for MS SQL Server dialect as an alternative to Limit pushdown

2023-01-22 Thread via GitHub
sadikovi commented on code in PR #39660: URL: https://github.com/apache/spark/pull/39660#discussion_r1083555135 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -544,6 +544,14 @@ abstract class JdbcDialect extends Serializable with Logging {

[GitHub] [spark] sadikovi commented on a diff in pull request #39667: [SPARK-42131][SQL] Extract the function that construct the select statement for JDBC dialect.

2023-01-22 Thread via GitHub
sadikovi commented on code in PR #39667: URL: https://github.com/apache/spark/pull/39667#discussion_r1083555237 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -551,6 +552,63 @@ abstract class JdbcDialect extends Serializable with Logging {

[GitHub] [spark] sadikovi commented on a diff in pull request #39660: [SPARK-42128][SQL] Support TOP (N) for MS SQL Server dialect as an alternative to Limit pushdown

2023-01-22 Thread via GitHub
sadikovi commented on code in PR #39660: URL: https://github.com/apache/spark/pull/39660#discussion_r1083555750 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -544,6 +544,14 @@ abstract class JdbcDialect extends Serializable with Logging {

[GitHub] [spark] sadikovi commented on a diff in pull request #39660: [SPARK-42128][SQL] Support TOP (N) for MS SQL Server dialect as an alternative to Limit pushdown

2023-01-22 Thread via GitHub
sadikovi commented on code in PR #39660: URL: https://github.com/apache/spark/pull/39660#discussion_r1083555430 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala: ## @@ -307,11 +307,12 @@ private[jdbc] class JDBCRDD( "" } +

[GitHub] [spark] xinrong-meng commented on a diff in pull request #39585: [SPARK-42124][PYTHON][CONNECT] Scalar Inline Python UDF in Spark Connect

2023-01-22 Thread via GitHub
xinrong-meng commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1083556210 ## python/pyspark/sql/connect/udf.py: ## @@ -0,0 +1,165 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39703: [SPARK-42157][CORE] spark.scheduler.mode=FAIR should provide FAIR scheduler

2023-01-22 Thread via GitHub
dongjoon-hyun opened a new pull request, #39703: URL: https://github.com/apache/spark/pull/39703 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39703: [SPARK-42157][CORE] `spark.scheduler.mode=FAIR` should provide FAIR scheduler

2023-01-22 Thread via GitHub
dongjoon-hyun commented on code in PR #39703: URL: https://github.com/apache/spark/pull/39703#discussion_r1083560182 ## conf/fairscheduler-default.xml.template: ## @@ -0,0 +1,26 @@ + + + + + + Review Comment:

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39703: [SPARK-42157][CORE] `spark.scheduler.mode=FAIR` should provide FAIR scheduler

2023-01-22 Thread via GitHub
dongjoon-hyun commented on code in PR #39703: URL: https://github.com/apache/spark/pull/39703#discussion_r1083560381 ## conf/fairscheduler-default.xml.template: ## @@ -0,0 +1,26 @@ + + + + + + +FAIR +1 Review Comment:

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39703: [SPARK-42157][CORE] `spark.scheduler.mode=FAIR` should provide FAIR scheduler

2023-01-22 Thread via GitHub
dongjoon-hyun commented on code in PR #39703: URL: https://github.com/apache/spark/pull/39703#discussion_r1083560527 ## core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala: ## @@ -61,6 +61,7 @@ private[spark] class FairSchedulableBuilder(val rootPool: Pool,

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39703: [SPARK-42157][CORE] `spark.scheduler.mode=FAIR` should provide FAIR scheduler

2023-01-22 Thread via GitHub
dongjoon-hyun commented on code in PR #39703: URL: https://github.com/apache/spark/pull/39703#discussion_r1083560421 ## conf/fairscheduler-default.xml.template: ## @@ -0,0 +1,26 @@ + + + + + + +FAIR +1 +0 Review Comment:

[GitHub] [spark] AmplabJenkins commented on pull request #39672: [SPARK-42133] Add basic Dataset API methods to Spark Connect Scala Client

2023-01-22 Thread via GitHub
AmplabJenkins commented on PR #39672: URL: https://github.com/apache/spark/pull/39672#issuecomment-1399633308 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #39681: [SPARK-18011] Fix SparkR NA date serialization

2023-01-22 Thread via GitHub
AmplabJenkins commented on PR #39681: URL: https://github.com/apache/spark/pull/39681#issuecomment-1399633268 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #39678: [SPARK-16484][SQL] Add HyperLogLogPlusPlus sketch generator/evaluator/aggregator

2023-01-22 Thread via GitHub
AmplabJenkins commented on PR #39678: URL: https://github.com/apache/spark/pull/39678#issuecomment-1399633285 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #39673: [SPARK-42132][SQL] Deduplicate attributes in groupByKey.cogroup

2023-01-22 Thread via GitHub
AmplabJenkins commented on PR #39673: URL: https://github.com/apache/spark/pull/39673#issuecomment-1399633297 Can one of the admins verify this patch?

[GitHub] [spark] srowen commented on a diff in pull request #39660: [SPARK-42128][SQL] Support TOP (N) for MS SQL Server dialect as an alternative to Limit pushdown

2023-01-22 Thread via GitHub
srowen commented on code in PR #39660: URL: https://github.com/apache/spark/pull/39660#discussion_r1083561744 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala: ## @@ -307,11 +307,12 @@ private[jdbc] class JDBCRDD( "" } +

[GitHub] [spark] sadikovi commented on a diff in pull request #39660: [SPARK-42128][SQL] Support TOP (N) for MS SQL Server dialect as an alternative to Limit pushdown

2023-01-22 Thread via GitHub
sadikovi commented on code in PR #39660: URL: https://github.com/apache/spark/pull/39660#discussion_r1083562843 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala: ## @@ -307,11 +307,12 @@ private[jdbc] class JDBCRDD( "" } +

[GitHub] [spark] viirya commented on a diff in pull request #39697: [SPARK-42154][K8S][TESTS] Enable `Volcano` unit and integration tests in GitHub Action

2023-01-22 Thread via GitHub
viirya commented on code in PR #39697: URL: https://github.com/apache/spark/pull/39697#discussion_r1083562995 ## .github/workflows/build_and_test.yml: ## @@ -952,9 +952,9 @@ jobs: export PVC_TESTS_VM_PATH=$PVC_TMP_DIR minikube mount

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39697: [SPARK-42154][K8S][TESTS] Enable `Volcano` unit and integration tests in GitHub Action

2023-01-22 Thread via GitHub
dongjoon-hyun commented on code in PR #39697: URL: https://github.com/apache/spark/pull/39697#discussion_r1083565704 ## .github/workflows/build_and_test.yml: ## @@ -952,9 +952,9 @@ jobs: export PVC_TESTS_VM_PATH=$PVC_TMP_DIR minikube mount

[GitHub] [spark] dongjoon-hyun commented on pull request #39697: [SPARK-42154][K8S][TESTS] Enable `Volcano` unit and integration tests in GitHub Action

2023-01-22 Thread via GitHub
dongjoon-hyun commented on PR #39697: URL: https://github.com/apache/spark/pull/39697#issuecomment-1399640431 Thank you, @gengliangwang and @viirya . Merged to master.

[GitHub] [spark] dongjoon-hyun closed pull request #39697: [SPARK-42154][K8S][TESTS] Enable `Volcano` unit and integration tests in GitHub Action

2023-01-22 Thread via GitHub
dongjoon-hyun closed pull request #39697: [SPARK-42154][K8S][TESTS] Enable `Volcano` unit and integration tests in GitHub Action URL: https://github.com/apache/spark/pull/39697

[GitHub] [spark] dongjoon-hyun commented on pull request #39697: [SPARK-42154][K8S][TESTS] Enable `Volcano` unit and integration tests in GitHub Action

2023-01-22 Thread via GitHub
dongjoon-hyun commented on PR #39697: URL: https://github.com/apache/spark/pull/39697#issuecomment-1399619557 Could you review this please, @viirya ?

[GitHub] [spark] dongjoon-hyun commented on pull request #39697: [SPARK-42154][K8S][TESTS] Enable `Volcano` unit and integration tests in GitHub Action

2023-01-22 Thread via GitHub
dongjoon-hyun commented on PR #39697: URL: https://github.com/apache/spark/pull/39697#issuecomment-1399619465 All tests passed. ![Screenshot 2023-01-22 at 1 58 03 PM](https://user-images.githubusercontent.com/9700541/213942512-89475afa-bd90-4585-98db-931fa3ecf0bd.png)

[GitHub] [spark] dongjoon-hyun commented on pull request #39555: [SPARK-42051][SQL] Codegen Support for HiveGenericUDF

2023-01-22 Thread via GitHub
dongjoon-hyun commented on PR #39555: URL: https://github.com/apache/spark/pull/39555#issuecomment-1399620284 Also cc @LuciferYang , too

[GitHub] [spark] AmplabJenkins commented on pull request #39687: [SPARK-41470][SQL] Relax constraints on Storage-Partitioned-Join should assume InternalRow implements equals and hashCode

2023-01-22 Thread via GitHub
AmplabJenkins commented on PR #39687: URL: https://github.com/apache/spark/pull/39687#issuecomment-1399572509 Can one of the admins verify this patch?
