[GitHub] [spark] dongjoon-hyun opened a new pull request, #40509: [SPARK-42885][K8S][BUILD] Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #40509: URL: https://github.com/apache/spark/pull/40509 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] dongjoon-hyun commented on pull request #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false

2023-03-21 Thread via GitHub
dongjoon-hyun commented on PR #40444: URL: https://github.com/apache/spark/pull/40444#issuecomment-1478133626 Merged to master branch for Apache Spark 3.5.0. Thank you, @pan3793 and @yaooqinn . -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] dongjoon-hyun closed pull request #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false

2023-03-21 Thread via GitHub
dongjoon-hyun closed pull request #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false URL: https://github.com/apache/spark/pull/40444

[GitHub] [spark] hvanhovell commented on pull request #40498: [SPARK-42878][CONNECT] The table API in DataFrameReader could also accept options

2023-03-21 Thread via GitHub
hvanhovell commented on PR #40498: URL: https://github.com/apache/spark/pull/40498#issuecomment-1478120860 @amaliujia can you add a test that checks if the options are properly propagated from client to server?

[GitHub] [spark] viirya commented on a diff in pull request #40504: [SPARK-42880][DOCS] Update running-on-yarn.md to log4j2 syntax

2023-03-21 Thread via GitHub
viirya commented on code in PR #40504: URL: https://github.com/apache/spark/pull/40504#discussion_r1143616383 ## docs/running-on-yarn.md: ## @@ -137,7 +137,7 @@ Note that for the first option, both executors and the application master will s log4j configuration, which may

[GitHub] [spark] MaxGekk commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

2023-03-21 Thread via GitHub
MaxGekk commented on code in PR #40508: URL: https://github.com/apache/spark/pull/40508#discussion_r1143607484 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -213,7 +213,9 @@ class SparkSession private[sql] ( * @param sqlText

[GitHub] [spark] attilapiros commented on a diff in pull request #38518: [SPARK-33349][K8S] Reset the executor pods watcher when we receive a version changed from k8s

2023-03-21 Thread via GitHub
attilapiros commented on code in PR #38518: URL: https://github.com/apache/spark/pull/38518#discussion_r1143613836 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala: ## @@ -86,8 +97,14 @@ class

[GitHub] [spark] grundprinzip commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

2023-03-21 Thread via GitHub
grundprinzip commented on code in PR #40508: URL: https://github.com/apache/spark/pull/40508#discussion_r1143612707 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -213,7 +213,9 @@ class SparkSession private[sql] ( * @param

[GitHub] [spark] viirya commented on a diff in pull request #40504: [SPARK-42880][DOCS] Update running-on-yarn.md to log4j2 syntax

2023-03-21 Thread via GitHub
viirya commented on code in PR #40504: URL: https://github.com/apache/spark/pull/40504#discussion_r1143610006 ## docs/running-on-yarn.md: ## @@ -137,7 +137,7 @@ Note that for the first option, both executors and the application master will s log4j configuration, which may

[GitHub] [spark] grundprinzip commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

2023-03-21 Thread via GitHub
grundprinzip commented on code in PR #40508: URL: https://github.com/apache/spark/pull/40508#discussion_r1143577084 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -213,7 +213,9 @@ class SparkSession private[sql] ( * @param

[GitHub] [spark] MaxGekk commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

2023-03-21 Thread via GitHub
MaxGekk commented on code in PR #40508: URL: https://github.com/apache/spark/pull/40508#discussion_r1143569959 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -213,7 +213,9 @@ class SparkSession private[sql] ( * @param sqlText

[GitHub] [spark] MaxGekk commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

2023-03-21 Thread via GitHub
MaxGekk commented on code in PR #40508: URL: https://github.com/apache/spark/pull/40508#discussion_r1143566089 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -213,7 +213,9 @@ class SparkSession private[sql] ( * @param sqlText

[GitHub] [spark] vicennial commented on pull request #40368: [SPARK-42748][CONNECT] Server-side Artifact Management

2023-03-21 Thread via GitHub
vicennial commented on PR #40368: URL: https://github.com/apache/spark/pull/40368#issuecomment-1478027975 PR is mergeable @hvanhovell

[GitHub] [spark] grundprinzip commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

2023-03-21 Thread via GitHub
grundprinzip commented on code in PR #40508: URL: https://github.com/apache/spark/pull/40508#discussion_r1143538305 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -213,7 +213,9 @@ class SparkSession private[sql] ( * @param

[GitHub] [spark] LuciferYang commented on pull request #40489: [SPARK-42871][BUILD] Upgrade slf4j to 2.0.7

2023-03-21 Thread via GitHub
LuciferYang commented on PR #40489: URL: https://github.com/apache/spark/pull/40489#issuecomment-1478005750 done, let's wait CI

[GitHub] [spark] yabola commented on a diff in pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice when no filters in vectorized reader

2023-03-21 Thread via GitHub
yabola commented on code in PR #39950: URL: https://github.com/apache/spark/pull/39950#discussion_r1143531801 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala: ## @@ -88,12 +88,18 @@ case class

[GitHub] [spark] srowen commented on pull request #40440: [SPARK-42808][CORE] Avoid getting availableProcessors every time in `MapOutputTrackerMaster#getStatistics`

2023-03-21 Thread via GitHub
srowen commented on PR #40440: URL: https://github.com/apache/spark/pull/40440#issuecomment-1477991828 Merged to master
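The idea behind SPARK-42808 can be sketched generically: query the processor count once and reuse it, rather than asking the runtime on every `getStatistics` call. This is a minimal Python illustration, not Spark's Scala code; the class `StatisticsHelper`, its method, and the way the cached value caps parallelism are hypothetical, and `os.cpu_count()` stands in for the JVM's `Runtime.availableProcessors()`.

```python
import os
from typing import List, Tuple


class StatisticsHelper:
    """Hypothetical helper; not Spark's MapOutputTrackerMaster."""

    def __init__(self) -> None:
        # Look up the processor count once and cache it, instead of
        # querying the OS on every get_statistics() call.
        self.available_processors = os.cpu_count() or 1

    def get_statistics(self, block_sizes: List[int]) -> Tuple[int, int]:
        # Reuse the cached value to pick a degree of parallelism
        # (the capping rule here is illustrative only).
        parallelism = max(1, min(self.available_processors, len(block_sizes)))
        return sum(block_sizes), parallelism


helper = StatisticsHelper()
total, parallelism = helper.get_statistics([10, 20, 30])
```

Caching in the constructor trades a stale value (if the CPU count changes at runtime) for not paying the lookup on a hot path.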

[GitHub] [spark] yabola commented on a diff in pull request #40495: test reading footer within file range

2023-03-21 Thread via GitHub
yabola commented on code in PR #40495: URL: https://github.com/apache/spark/pull/40495#discussion_r1143525196 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala: ## @@ -92,8 +93,13 @@ case class

[GitHub] [spark] srowen closed pull request #40440: [SPARK-42808][CORE] Avoid getting availableProcessors every time in `MapOutputTrackerMaster#getStatistics`

2023-03-21 Thread via GitHub
srowen closed pull request #40440: [SPARK-42808][CORE] Avoid getting availableProcessors every time in `MapOutputTrackerMaster#getStatistics` URL: https://github.com/apache/spark/pull/40440

[GitHub] [spark] srowen commented on pull request #40489: [SPARK-42871][BUILD] Upgrade slf4j to 2.0.7

2023-03-21 Thread via GitHub
srowen commented on PR #40489: URL: https://github.com/apache/spark/pull/40489#issuecomment-1477972190 @LuciferYang if you can resolve the conflict (I merged your other change) I'll merge this

[GitHub] [spark] LuciferYang commented on pull request #40490: [SPARK-42536][BUILD] Upgrade log4j2 to 2.20.0

2023-03-21 Thread via GitHub
LuciferYang commented on PR #40490: URL: https://github.com/apache/spark/pull/40490#issuecomment-1477971639 Thanks @srowen @HyukjinKwon

[GitHub] [spark] srowen commented on pull request #40490: [SPARK-42536][BUILD] Upgrade log4j2 to 2.20.0

2023-03-21 Thread via GitHub
srowen commented on PR #40490: URL: https://github.com/apache/spark/pull/40490#issuecomment-1477970912 Merged to master

[GitHub] [spark] srowen closed pull request #40490: [SPARK-42536][BUILD] Upgrade log4j2 to 2.20.0

2023-03-21 Thread via GitHub
srowen closed pull request #40490: [SPARK-42536][BUILD] Upgrade log4j2 to 2.20.0 URL: https://github.com/apache/spark/pull/40490

[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.

2023-03-21 Thread via GitHub
itholic commented on code in PR #40507: URL: https://github.com/apache/spark/pull/40507#discussion_r1143481973 ## python/pyspark/sql/connect/functions.py: ## @@ -2471,6 +2472,13 @@ def udf( udf.__doc__ = pysparkfuncs.udf.__doc__ +def _distributed_sequence_id() -> Column:

[GitHub] [spark] LuciferYang commented on a diff in pull request #40495: test reading footer within file range

2023-03-21 Thread via GitHub
LuciferYang commented on code in PR #40495: URL: https://github.com/apache/spark/pull/40495#discussion_r1143464079 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala: ## @@ -92,8 +93,13 @@ case class

[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.

2023-03-21 Thread via GitHub
itholic commented on code in PR #40507: URL: https://github.com/apache/spark/pull/40507#discussion_r1143463571 ## python/pyspark/sql/connect/functions.py: ## @@ -2471,6 +2472,13 @@ def udf( udf.__doc__ = pysparkfuncs.udf.__doc__ +def _distributed_sequence_id() -> Column:

[GitHub] [spark] peter-toth closed pull request #40488: [SPARK-42851][SQL] Replace EquivalentExpressions with mutable map in PhysicalAggregation

2023-03-21 Thread via GitHub
peter-toth closed pull request #40488: [SPARK-42851][SQL] Replace EquivalentExpressions with mutable map in PhysicalAggregation URL: https://github.com/apache/spark/pull/40488

[GitHub] [spark] cloud-fan closed pull request #40473: [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression()

2023-03-21 Thread via GitHub
cloud-fan closed pull request #40473: [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression() URL: https://github.com/apache/spark/pull/40473

[GitHub] [spark] cloud-fan commented on pull request #40473: [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression()

2023-03-21 Thread via GitHub
cloud-fan commented on PR #40473: URL: https://github.com/apache/spark/pull/40473#issuecomment-1477837727 thanks, merging to master/3.4!

[GitHub] [spark] beliefer commented on pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-21 Thread via GitHub
beliefer commented on PR #40355: URL: https://github.com/apache/spark/pull/40355#issuecomment-1477836670 ping @hvanhovell @HyukjinKwon Could you have time to take a look?

[GitHub] [spark] zzzzming95 commented on pull request #40477: [SPARK-42805]`DeduplicateRelations` rule show process `LOGICAL_RDD`

2023-03-21 Thread via GitHub
ming95 commented on PR #40477: URL: https://github.com/apache/spark/pull/40477#issuecomment-1477819886 @HyukjinKwon @LuciferYang cc

[GitHub] [spark] MaxGekk commented on pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

2023-03-21 Thread via GitHub
MaxGekk commented on PR #40508: URL: https://github.com/apache/spark/pull/40508#issuecomment-1477781116 @cloud-fan @grundprinzip @HyukjinKwon Could you review this PR, please.

[GitHub] [spark] MaxGekk opened a new pull request, #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

2023-03-21 Thread via GitHub
MaxGekk opened a new pull request, #40508: URL: https://github.com/apache/spark/pull/40508 ### What changes were proposed in this pull request? In the PR, I propose to clarify the comment of `args` in parameterized `sql()`. ### Why are the changes needed? To make the comment

[GitHub] [spark] panbingkun commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object

2023-03-21 Thread via GitHub
panbingkun commented on code in PR #40506: URL: https://github.com/apache/spark/pull/40506#discussion_r1143306325 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala: ## @@ -140,18 +135,92 @@ case class GetJsonObject(json: Expression,

[GitHub] [spark] LuciferYang commented on a diff in pull request #40504: [SPARK-42880] Update running-on-yarn.md to log4j2 syntax

2023-03-21 Thread via GitHub
LuciferYang commented on code in PR #40504: URL: https://github.com/apache/spark/pull/40504#discussion_r1143279989 ## docs/running-on-yarn.md: ## @@ -137,7 +137,7 @@ Note that for the first option, both executors and the application master will s log4j configuration, which

[GitHub] [spark] Stove-hust commented on a diff in pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-21 Thread via GitHub
Stove-hust commented on code in PR #40393: URL: https://github.com/apache/spark/pull/40393#discussion_r1143279779 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4595,6 +4595,184 @@ class DAGSchedulerSuite extends SparkFunSuite with

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #40505: [MINOR][DOCS] Remove SparkSession constructor invocation in the example

2023-03-21 Thread via GitHub
bjornjorgensen commented on code in PR #40505: URL: https://github.com/apache/spark/pull/40505#discussion_r1143270035 ## python/pyspark/sql/session.py: ## @@ -179,10 +179,15 @@ class SparkSession(SparkConversionMixin): ... .getOrCreate() ... ) -Create a

[GitHub] [spark] panbingkun commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object

2023-03-21 Thread via GitHub
panbingkun commented on code in PR #40506: URL: https://github.com/apache/spark/pull/40506#discussion_r1143261154 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala: ## @@ -140,18 +135,92 @@ case class GetJsonObject(json: Expression,

[GitHub] [spark] ulysses-you commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-21 Thread via GitHub
ulysses-you commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1143251312 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -125,13 +128,27 @@ class EquivalentExpressions { }

[GitHub] [spark] EnricoMi commented on pull request #39952: [SPARK-40770][PYTHON][FOLLOW-UP] Improved error messages for mapInPandas for schema mismatch

2023-03-21 Thread via GitHub
EnricoMi commented on PR #39952: URL: https://github.com/apache/spark/pull/39952#issuecomment-1477683110 CC @gatorsmile @xinrong-meng

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.

2023-03-21 Thread via GitHub
zhengruifeng commented on code in PR #40507: URL: https://github.com/apache/spark/pull/40507#discussion_r1143242826 ## python/pyspark/sql/connect/functions.py: ## @@ -2471,6 +2472,13 @@ def udf( udf.__doc__ = pysparkfuncs.udf.__doc__ +def _distributed_sequence_id() ->

[GitHub] [spark] vkn1234 commented on pull request #40034: [SPARK-42447][INFRA] Remove Hadoop 2 GitHub Action job

2023-03-21 Thread via GitHub
vkn1234 commented on PR #40034: URL: https://github.com/apache/spark/pull/40034#issuecomment-1477678151 > Not yet, @bjornjorgensen ~ :) Please hold on any significant changes until Apache Spark 3.4 is released. We still need to backport many bug fixes during Apache Spark 3.4 RC period.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.

2023-03-21 Thread via GitHub
HyukjinKwon commented on code in PR #40507: URL: https://github.com/apache/spark/pull/40507#discussion_r1143230272 ## python/pyspark/sql/connect/functions.py: ## @@ -2471,6 +2472,13 @@ def udf( udf.__doc__ = pysparkfuncs.udf.__doc__ +def _distributed_sequence_id() ->

[GitHub] [spark] LuciferYang commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object

2023-03-21 Thread via GitHub
LuciferYang commented on code in PR #40506: URL: https://github.com/apache/spark/pull/40506#discussion_r1143228608 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala: ## @@ -140,18 +135,92 @@ case class GetJsonObject(json: Expression,

[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.

2023-03-21 Thread via GitHub
itholic commented on code in PR #40507: URL: https://github.com/apache/spark/pull/40507#discussion_r1143211271 ## python/pyspark/sql/connect/functions.py: ## @@ -2471,6 +2472,13 @@ def udf( udf.__doc__ = pysparkfuncs.udf.__doc__ +def _distributed_sequence_id() -> Column:

[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.

2023-03-21 Thread via GitHub
itholic commented on code in PR #40507: URL: https://github.com/apache/spark/pull/40507#discussion_r1143207159 ## python/pyspark/sql/connect/functions.py: ## @@ -2471,6 +2472,13 @@ def udf( udf.__doc__ = pysparkfuncs.udf.__doc__ +def _distributed_sequence_id() -> Column:

[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.

2023-03-21 Thread via GitHub
itholic commented on code in PR #40507: URL: https://github.com/apache/spark/pull/40507#discussion_r1143198700 ## python/pyspark/sql/connect/functions.py: ## @@ -2471,6 +2472,13 @@ def udf( udf.__doc__ = pysparkfuncs.udf.__doc__ +def _distributed_sequence_id() -> Column:

[GitHub] [spark] itholic commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.

2023-03-21 Thread via GitHub
itholic commented on code in PR #40507: URL: https://github.com/apache/spark/pull/40507#discussion_r1143200759 ## python/pyspark/sql/connect/functions.py: ## @@ -2471,6 +2472,13 @@ def udf( udf.__doc__ = pysparkfuncs.udf.__doc__ +def _distributed_sequence_id() -> Column:

[GitHub] [spark] cloud-fan commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-21 Thread via GitHub
cloud-fan commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r1143136328 ## sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala: ## @@ -771,6 +775,163 @@ class ExplainSuiteAE extends ExplainSuiteHelper with

[GitHub] [spark] zwangsheng commented on pull request #40118: [SPARK-26365][K8S] In kuberentes cluster mode, spark submit should pass driver exit code

2023-03-21 Thread via GitHub
zwangsheng commented on PR #40118: URL: https://github.com/apache/spark/pull/40118#issuecomment-1477559688 Hello @dongjoon-hyun @holdenk , please help review this PR, given that there is a lot of user feedback on JIRA about this issue

[GitHub] [spark] cloud-fan commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-21 Thread via GitHub
cloud-fan commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r1143134007 ## sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala: ## @@ -771,6 +775,163 @@ class ExplainSuiteAE extends ExplainSuiteHelper with

[GitHub] [spark] xinrong-meng closed pull request #40486: [SPARK-42340][CONNECT][PYTHON][3.4] Implement Grouped Map API

2023-03-21 Thread via GitHub
xinrong-meng closed pull request #40486: [SPARK-42340][CONNECT][PYTHON][3.4] Implement Grouped Map API URL: https://github.com/apache/spark/pull/40486

[GitHub] [spark] xinrong-meng commented on pull request #40486: [SPARK-42340][CONNECT][PYTHON][3.4] Implement Grouped Map API

2023-03-21 Thread via GitHub
xinrong-meng commented on PR #40486: URL: https://github.com/apache/spark/pull/40486#issuecomment-1477553890 Thanks!

[GitHub] [spark] cloud-fan commented on a diff in pull request #40446: [SPARK-42815][SQL] Subexpression elimination support shortcut expression

2023-03-21 Thread via GitHub
cloud-fan commented on code in PR #40446: URL: https://github.com/apache/spark/pull/40446#discussion_r1143112674 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -125,13 +128,27 @@ class EquivalentExpressions { }

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.

2023-03-21 Thread via GitHub
zhengruifeng commented on code in PR #40507: URL: https://github.com/apache/spark/pull/40507#discussion_r1143100058 ## python/pyspark/sql/connect/functions.py: ## @@ -2471,6 +2472,13 @@ def udf( udf.__doc__ = pysparkfuncs.udf.__doc__ +def _distributed_sequence_id() ->

[GitHub] [spark] LuciferYang commented on a diff in pull request #40408: [SPARK-42780][BUILD] Upgrade `Tink` to 1.8.0

2023-03-21 Thread via GitHub
LuciferYang commented on code in PR #40408: URL: https://github.com/apache/spark/pull/40408#discussion_r1143096360 ## pom.xml: ## @@ -214,7 +214,7 @@ 1.1.0 1.5.0 1.60 -1.7.0 +1.8.0 Review Comment: Got it, we can reuse this jira when upgrading the new

[GitHub] [spark] zhengruifeng commented on pull request #40402: [SPARK-42020][CONNECT][PYTHON] Support UserDefinedType in Spark Connect

2023-03-21 Thread via GitHub
zhengruifeng commented on PR #40402: URL: https://github.com/apache/spark/pull/40402#issuecomment-1477518340 @ueshin Sure, thanks! `StructType().add("label", DoubleType()).add("weight", DoubleType()).add("features", VectorUDT(), False)` works, but the `nullable` in column `features`

[GitHub] [spark] itholic commented on pull request #40270: [WIP][SPARK-42662][CONNECT][PYTHON][PS] Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-21 Thread via GitHub
itholic commented on PR #40270: URL: https://github.com/apache/spark/pull/40270#issuecomment-1477485243 Revisit PR: https://github.com/apache/spark/pull/40507

[GitHub] [spark] peter-toth commented on a diff in pull request #40488: [SPARK-42851][SQL] Replace EquivalentExpressions with mutable map in PhysicalAggregation

2023-03-21 Thread via GitHub
peter-toth commented on code in PR #40488: URL: https://github.com/apache/spark/pull/40488#discussion_r1142961725 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala: ## @@ -296,12 +298,17 @@ object PhysicalAggregation { // build a set of

[GitHub] [spark] itholic opened a new pull request, #40507: [SPARK-42662][CONNECT][PS] Add `_distributed_sequence_id` for distributed-sequence index.

2023-03-21 Thread via GitHub
itholic opened a new pull request, #40507: URL: https://github.com/apache/spark/pull/40507 ### What changes were proposed in this pull request? This PR proposes adding the `_distributed_sequence_id` to support pandas API on Spark in Spark Connect. `_distributed_sequence_id` create

[GitHub] [spark] grundprinzip commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

2023-03-21 Thread via GitHub
grundprinzip commented on code in PR #40447: URL: https://github.com/apache/spark/pull/40447#discussion_r1143057560 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -47,6 +47,16 @@ object Connect {

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #40408: [SPARK-42780][BUILD] Upgrade `Tink` to 1.8.0

2023-03-21 Thread via GitHub
bjornjorgensen commented on code in PR #40408: URL: https://github.com/apache/spark/pull/40408#discussion_r1143050156 ## pom.xml: ## @@ -214,7 +214,7 @@ 1.1.0 1.5.0 1.60 -1.7.0 +1.8.0 Review Comment: yes, there may be a way to do it. I see that maybe

[GitHub] [spark] EnricoMi commented on pull request #40334: [SPARK-42716][SQL] DataSourceV2 supports reporting key-grouped partitioning without HasPartitionKey

2023-03-21 Thread via GitHub
EnricoMi commented on PR #40334: URL: https://github.com/apache/spark/pull/40334#issuecomment-1477439623 @cloud-fan is reporting a clustered distribution still supported? Data sources should be able to report that partitions are partitioned by some columns, without reporting the actual

[GitHub] [spark] cloud-fan commented on pull request #40499: [SPARK-42876][SQL] DataType's physicalDataType should be private[sql]

2023-03-21 Thread via GitHub
cloud-fan commented on PR #40499: URL: https://github.com/apache/spark/pull/40499#issuecomment-1477438241 also cc @xinrong-meng this is from api auditing

[GitHub] [spark] cloud-fan closed pull request #40499: [SPARK-42876][SQL] DataType's physicalDataType should be private[sql]

2023-03-21 Thread via GitHub
cloud-fan closed pull request #40499: [SPARK-42876][SQL] DataType's physicalDataType should be private[sql] URL: https://github.com/apache/spark/pull/40499

[GitHub] [spark] cloud-fan commented on pull request #40499: [SPARK-42876][SQL] DataType's physicalDataType should be private[sql]

2023-03-21 Thread via GitHub
cloud-fan commented on PR #40499: URL: https://github.com/apache/spark/pull/40499#issuecomment-1477436589 thanks, merging to master/3.4!

[GitHub] [spark] EnricoMi commented on pull request #37360: [SPARK-39931][PYTHON][WIP] Improve applyInPandas performance for very small groups

2023-03-21 Thread via GitHub
EnricoMi commented on PR #37360: URL: https://github.com/apache/spark/pull/37360#issuecomment-1477427083 @xinrong-meng what do you think about this?

[GitHub] [spark] LuciferYang commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object

2023-03-21 Thread via GitHub
LuciferYang commented on PR #40506: URL: https://github.com/apache/spark/pull/40506#issuecomment-1477426004 cc @wangyum @cloud-fan FYI

[GitHub] [spark] EnricoMi commented on pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup

2023-03-21 Thread via GitHub
EnricoMi commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1477426095 CC @xinrong-meng

[GitHub] [spark] panbingkun commented on pull request #37588: [SPARK-33393][SQL] Support SHOW TABLE EXTENDED in v2

2023-03-21 Thread via GitHub
panbingkun commented on PR #37588: URL: https://github.com/apache/spark/pull/37588#issuecomment-1477419783 @MaxGekk It would be appreciated if it could be reviewed at your convenience, thanks!

[GitHub] [spark] panbingkun opened a new pull request, #40506: [SPARK-42881][SQL] get_json_object Codegen Support

2023-03-21 Thread via GitHub
panbingkun opened a new pull request, #40506: URL: https://github.com/apache/spark/pull/40506 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was this patch

[GitHub] [spark] frankliee commented on pull request #40504: [SPARK-42880] Update running-on-yarn.md to log4j2 syntax

2023-03-21 Thread via GitHub
frankliee commented on PR #40504: URL: https://github.com/apache/spark/pull/40504#issuecomment-1477403287 YARN NM injects spark.yarn.app.container.log.dir as a system property, so we use ${sys:xxx} to refer to it during logging initialization.

[GitHub] [spark] allisonwang-db commented on a diff in pull request #40505: [MINOR][DOCS] Remove SparkSession constructor invocation in the example

2023-03-21 Thread via GitHub
allisonwang-db commented on code in PR #40505: URL: https://github.com/apache/spark/pull/40505#discussion_r1142993161 ## python/pyspark/sql/session.py: ## @@ -178,11 +178,6 @@ class SparkSession(SparkConversionMixin): ... .config("spark.some.config.option",

[GitHub] [spark] HyukjinKwon opened a new pull request, #40505: [MINOR][DOCS] Remove SparkSession constructor invocation in the example

2023-03-21 Thread via GitHub
HyukjinKwon opened a new pull request, #40505: URL: https://github.com/apache/spark/pull/40505 ### What changes were proposed in this pull request? This PR proposes to remove the SparkSession constructor invocation in the example. ### Why are the changes needed?

[GitHub] [spark] HyukjinKwon commented on pull request #40486: [SPARK-42340][CONNECT][PYTHON][3.4] Implement Grouped Map API

2023-03-21 Thread via GitHub
HyukjinKwon commented on PR #40486: URL: https://github.com/apache/spark/pull/40486#issuecomment-1477393696 Merged to branch-3.4.

[GitHub] [spark] HyukjinKwon commented on pull request #40494: [MINOR][DOCS] Fix typos

2023-03-21 Thread via GitHub
HyukjinKwon commented on PR #40494: URL: https://github.com/apache/spark/pull/40494#issuecomment-1477389181 Mind double checking? Seems they are not running https://github.com/sudoliyang/spark/actions

[GitHub] [spark] HyukjinKwon commented on pull request #40504: [SPARK-42880] Update running-on-yarn.md to log4j2 syntax

2023-03-21 Thread via GitHub
HyukjinKwon commented on PR #40504: URL: https://github.com/apache/spark/pull/40504#issuecomment-1477386014 cc @viirya FYI

[GitHub] [spark] frankliee opened a new pull request, #40504: [SPARK-42880] Update running-on-yarn.md for log4j2

2023-03-21 Thread via GitHub
frankliee opened a new pull request, #40504: URL: https://github.com/apache/spark/pull/40504 ### What changes were proposed in this pull request? Update log4j1 syntax to log4j2, and use ${sys:spark.yarn.app.container.log.dir} to relocate log path. see

[GitHub] [spark] smallzhongfeng commented on a diff in pull request #40341: [WIP][SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException

2023-03-21 Thread via GitHub
smallzhongfeng commented on code in PR #40341: URL: https://github.com/apache/spark/pull/40341#discussion_r1142974426 ## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java: ## @@ -204,7 +204,12 @@ public void initBatch( * by

[GitHub] [spark] grundprinzip commented on a diff in pull request #40447: [SPARK-42816][CONNECT] Support Max Message size up to 128MB

2023-03-21 Thread via GitHub
grundprinzip commented on code in PR #40447: URL: https://github.com/apache/spark/pull/40447#discussion_r1142972792 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -47,6 +47,16 @@ object Connect {

[GitHub] [spark] grundprinzip commented on a diff in pull request #40498: [SPARK-42878][CONNECT] The table API in DataFrameReader could also accept options

2023-03-21 Thread via GitHub
grundprinzip commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142969355 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -122,6 +122,9 @@ message Read { message NamedTable { // (Required)

[GitHub] [spark] MaxGekk closed pull request #39332: [WIP][SPARK-40822][SQL] Stable derived column aliases

2023-03-21 Thread via GitHub
MaxGekk closed pull request #39332: [WIP][SPARK-40822][SQL] Stable derived column aliases URL: https://github.com/apache/spark/pull/39332

[GitHub] [spark] MaxGekk closed pull request #40126: [SPARK-40822][SQL] Stable derived column aliases

2023-03-21 Thread via GitHub
MaxGekk closed pull request #40126: [SPARK-40822][SQL] Stable derived column aliases URL: https://github.com/apache/spark/pull/40126

[GitHub] [spark] MaxGekk commented on pull request #40126: [SPARK-40822][SQL] Stable derived column aliases

2023-03-21 Thread via GitHub
MaxGekk commented on PR #40126: URL: https://github.com/apache/spark/pull/40126#issuecomment-1477333100 Merging to master. Thank you, @srielau and @cloud-fan for the review.

[GitHub] [spark] amaliujia commented on a diff in pull request #40498: [SPARK-42878][CONNECT] The table API in DataFrameReader could also accept options

2023-03-21 Thread via GitHub
amaliujia commented on code in PR #40498: URL: https://github.com/apache/spark/pull/40498#discussion_r1142929343 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -148,6 +143,13 @@ message Read { // This is only supported by the JDBC data

[GitHub] [spark] amaliujia commented on pull request #40498: [SPARK-42878][CONNECT] The table API in DataFrameReader could also accept options

2023-03-21 Thread via GitHub
amaliujia commented on PR #40498: URL: https://github.com/apache/spark/pull/40498#issuecomment-1477332325 @hvanhovell