[GitHub] [spark] pan3793 opened a new pull request, #40444: [SPARK-42813][K8S] Print application info when waitAppCompletion is false

2023-03-15 Thread via GitHub
pan3793 opened a new pull request, #40444: URL: https://github.com/apache/spark/pull/40444 ### What changes were proposed in this pull request? ### Why are the changes needed? On K8s cluster mode, when `spark.kubernetes.submission.waitAppCompletion=false`,

[GitHub] [spark] LuciferYang commented on pull request #40403: [SPARK-42754][SQL][UI] Fix backward compatibility issue in nested SQL execution

2023-03-15 Thread via GitHub
LuciferYang commented on PR #40403: URL: https://github.com/apache/spark/pull/40403#issuecomment-1470085933 late LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] panbingkun commented on a diff in pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-15 Thread via GitHub
panbingkun commented on code in PR #40397: URL: https://github.com/apache/spark/pull/40397#discussion_r1137141359 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -49,68 +48,140 @@ private[hive] case class HiveSimpleUDF( name: String, funcWrapper:

[GitHub] [spark] LuciferYang commented on a diff in pull request #40440: [SPARK-42808][CORE] Avoid getting availableProcessors every time in `MapOutputTrackerMaster#getStatistics`

2023-03-15 Thread via GitHub
LuciferYang commented on code in PR #40440: URL: https://github.com/apache/spark/pull/40440#discussion_r1137081654 ## core/src/main/scala/org/apache/spark/MapOutputTracker.scala: ## @@ -697,6 +697,8 @@ private[spark] class MapOutputTrackerMaster( pool } + private val

[GitHub] [spark] vicennial opened a new pull request, #40443: [SPARK-42812][CONNECT] Add client_type to AddArtifactsRequest protobuf message

2023-03-15 Thread via GitHub
vicennial opened a new pull request, #40443: URL: https://github.com/apache/spark/pull/40443 ### What changes were proposed in this pull request? The missing `client_type` is added to the `AddArtifactsRequest` protobuf message. ### Why are the changes needed?

[GitHub] [spark] LuciferYang commented on pull request #40442: [SPARK-42809][BUILD] Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-15 Thread via GitHub
LuciferYang commented on PR #40442: URL: https://github.com/apache/spark/pull/40442#issuecomment-1470010407 This version fixed a compilation issue with Scala 3.x after 4.7.2

[GitHub] [spark] cloud-fan commented on a diff in pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-15 Thread via GitHub
cloud-fan commented on code in PR #40397: URL: https://github.com/apache/spark/pull/40397#discussion_r1137003368 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -49,68 +48,140 @@ private[hive] case class HiveSimpleUDF( name: String, funcWrapper:

[GitHub] [spark] LuciferYang commented on a diff in pull request #40438: [WIP][SPARK-42806][CONNECT] Catalog

2023-03-15 Thread via GitHub
LuciferYang commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1136832569 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/CatalogSuite.scala: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] beliefer commented on pull request #40410: [SPARK-42783][SQL] Infer window group limit should run as late as possible

2023-03-15 Thread via GitHub
beliefer commented on PR #40410: URL: https://github.com/apache/spark/pull/40410#issuecomment-1469932642 @cloud-fan @dongjoon-hyun Thank you !

[GitHub] [spark] zero323 commented on pull request #40338: [MINOR][PYTHON] Change TypeVar to private symbols

2023-03-15 Thread via GitHub
zero323 commented on PR #40338: URL: https://github.com/apache/spark/pull/40338#issuecomment-1469912441 > cc @zero323 in case you have some feedback on this. @HyukjinKwon I am OK with that, though there is a bigger issue here. We have `TypeVars` in `py` modules in quite a few places,

[GitHub] [spark] panbingkun commented on pull request #40397: [SPARK-42052][SQL] Codegen Support for HiveSimpleUDF

2023-03-15 Thread via GitHub
panbingkun commented on PR #40397: URL: https://github.com/apache/spark/pull/40397#issuecomment-1469911638 cc @cloud-fan

[GitHub] [spark] LuciferYang commented on pull request #40422: [SPARK-42803] Use getParameterCount function instead of getParameterTypes.length

2023-03-15 Thread via GitHub
LuciferYang commented on PR #40422: URL: https://github.com/apache/spark/pull/40422#issuecomment-1469894625 @NarekDW Are there any more similar cases? cc @srowen FYI

[GitHub] [spark] LuciferYang commented on pull request #40422: [SPARK-42803] Use getParameterCount function instead of getParameterTypes.length

2023-03-15 Thread via GitHub
LuciferYang commented on PR #40422: URL: https://github.com/apache/spark/pull/40422#issuecomment-1469893777 The pr title should be `[SPARK-42803][CORE][SQL][ML] Use ... `

[GitHub] [spark] HyukjinKwon closed pull request #40435: [SPARK-42496][CONNECT][DOCS][FOLLOW-UP] Addressing feedback to remove last ">>>" and adding type(spark) example

2023-03-15 Thread via GitHub
HyukjinKwon closed pull request #40435: [SPARK-42496][CONNECT][DOCS][FOLLOW-UP] Addressing feedback to remove last ">>>" and adding type(spark) example URL: https://github.com/apache/spark/pull/40435

[GitHub] [spark] HyukjinKwon commented on pull request #40435: [SPARK-42496][CONNECT][DOCS][FOLLOW-UP] Addressing feedback to remove last ">>>" and adding type(spark) example

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #40435: URL: https://github.com/apache/spark/pull/40435#issuecomment-1469882767 Merged to master and branch-3.4.

[GitHub] [spark] HyukjinKwon commented on pull request #40436: [SPARK-42619][PS] Add `show_counts` parameter for DataFrame.info

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #40436: URL: https://github.com/apache/spark/pull/40436#issuecomment-1469882286 cc @itholic and @Yikun if you find some time to review.

[GitHub] [spark] cloud-fan commented on pull request #40410: [SPARK-42783][SQL] Infer window group limit should run as late as possible

2023-03-15 Thread via GitHub
cloud-fan commented on PR #40410: URL: https://github.com/apache/spark/pull/40410#issuecomment-1469878236 thanks, merging to master!

[GitHub] [spark] cloud-fan closed pull request #40410: [SPARK-42783][SQL] Infer window group limit should run as late as possible

2023-03-15 Thread via GitHub
cloud-fan closed pull request #40410: [SPARK-42783][SQL] Infer window group limit should run as late as possible URL: https://github.com/apache/spark/pull/40410

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
zhengruifeng commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1136941163 ## python/pyspark/sql/connect/ml/functions.py: ## @@ -0,0 +1,38 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
zhengruifeng commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1136937191 ## python/pyspark/ml/functions.py: ## @@ -119,6 +122,9 @@ def array_to_vector(col: Column) -> Column: Review Comment: got it

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
HyukjinKwon commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1136936972 ## python/pyspark/sql/tests/connect/ml/test_connect_ml_function.py: ## @@ -0,0 +1,117 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
HyukjinKwon commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1136935246 ## python/pyspark/sql/connect/ml/functions.py: ## @@ -0,0 +1,38 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
HyukjinKwon commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1136934621 ## python/pyspark/sql/connect/ml/functions.py: ## @@ -0,0 +1,38 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
zhengruifeng commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1136932841 ## python/pyspark/sql/tests/connect/ml/test_connect_ml_function.py: ## @@ -0,0 +1,117 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
zhengruifeng commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1136932191 ## python/pyspark/sql/tests/connect/ml/test_connect_ml_function.py: ## @@ -0,0 +1,117 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
zhengruifeng commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1136931940 ## python/pyspark/sql/tests/connect/ml/test_connect_ml_function.py: ## @@ -0,0 +1,117 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
HyukjinKwon commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1136930751 ## python/pyspark/ml/functions.py: ## @@ -119,6 +122,9 @@ def array_to_vector(col: Column) -> Column: Review Comment: You should probably decorate this via

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
zhengruifeng commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1136929510 ## python/pyspark/sql/tests/connect/ml/test_connect_ml_function.py: ## @@ -0,0 +1,117 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
HyukjinKwon commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1136929247 ## python/pyspark/sql/tests/connect/ml/test_connect_ml_function.py: ## @@ -0,0 +1,117 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-15 Thread via GitHub
zhengruifeng commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1136928670 ## python/pyspark/sql/connect/proto/catalog_pb2.pyi: ## @@ -49,6 +49,7 @@ else: DESCRIPTOR: google.protobuf.descriptor.FileDescriptor

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
HyukjinKwon commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1136928710 ## python/pyspark/sql/tests/connect/ml/test_connect_ml_function.py: ## @@ -0,0 +1,117 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-15 Thread via GitHub
HyukjinKwon commented on code in PR #40432: URL: https://github.com/apache/spark/pull/40432#discussion_r1136927946 ## python/pyspark/sql/tests/connect/ml/test_connect_ml_function.py: ## @@ -0,0 +1,117 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-15 Thread via GitHub
zhengruifeng commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1136925391 ## python/pyspark/sql/connect/ml/utils.py: ## @@ -0,0 +1,55 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] beliefer commented on a diff in pull request #40410: [SPARK-42783][SQL] Infer window group limit should run as late as possible

2023-03-15 Thread via GitHub
beliefer commented on code in PR #40410: URL: https://github.com/apache/spark/pull/40410#discussion_r1136923968 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -84,11 +84,11 @@ object InferWindowGroupLimit extends

[GitHub] [spark] HyukjinKwon commented on pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #40420: URL: https://github.com/apache/spark/pull/40420#issuecomment-1469854808 Cc @Yikun too if you find some time to review.
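
For context on what SPARK-42617 mirrors: pandas' `Series.dt.isocalendar()` returns ISO 8601 year, week and day components (as `year`, `week`, `day` columns). A minimal sketch of those ISO semantics using only the Python standard library; `IsoParts` and `isocalendar_parts` are hypothetical names for illustration, not pandas or pandas-on-Spark APIs.

```python
import datetime
from typing import NamedTuple


class IsoParts(NamedTuple):
    """Field names chosen to match the columns pandas' dt.isocalendar() returns."""
    year: int
    week: int
    day: int  # 1 = Monday ... 7 = Sunday


def isocalendar_parts(d: datetime.date) -> IsoParts:
    # date.isocalendar() already implements ISO 8601 week numbering.
    year, week, day = d.isocalendar()
    return IsoParts(year, week, day)
```

The edge cases are why a dedicated method exists: the ISO year can differ from the calendar year near January 1, and some years have a week 53.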

[GitHub] [spark] LuciferYang commented on a diff in pull request #40389: [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests

2023-03-15 Thread via GitHub
LuciferYang commented on code in PR #40389: URL: https://github.com/apache/spark/pull/40389#discussion_r1136922602 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala: ## @@ -49,6 +51,12 @@ object

[GitHub] [spark] HyukjinKwon commented on pull request #40389: [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests strongly d

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #40389: URL: https://github.com/apache/spark/pull/40389#issuecomment-1469850098 BTW, mind fixing https://github.com/apache/spark/pull/40389#discussion_r1136914961 though

[GitHub] [spark] LuciferYang commented on pull request #40389: [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests strongly d

2023-03-15 Thread via GitHub
LuciferYang commented on PR #40389: URL: https://github.com/apache/spark/pull/40389#issuecomment-1469847935 > > Should we push forward this one? @HyukjinKwon > > Should, I run test on local, can't passed because I build without hive. And the 2.4.0 release can't build passed without

[GitHub] [spark] HyukjinKwon commented on pull request #40338: [MINOR][PYTHON] Change TypeVar to private symbols

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #40338: URL: https://github.com/apache/spark/pull/40338#issuecomment-1469847407 cc @zero323 in case you have some feedback on this.

[GitHub] [spark] HyukjinKwon closed pull request #40338: [MINOR][PYTHON] Change TypeVar to private symbols

2023-03-15 Thread via GitHub
HyukjinKwon closed pull request #40338: [MINOR][PYTHON] Change TypeVar to private symbols URL: https://github.com/apache/spark/pull/40338

[GitHub] [spark] HyukjinKwon commented on pull request #40338: [MINOR][PYTHON] Change TypeVar to private symbols

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #40338: URL: https://github.com/apache/spark/pull/40338#issuecomment-1469845702 Merged to master and branch-3.4.

[GitHub] [spark] HyukjinKwon commented on pull request #40389: [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests strongly d

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #40389: URL: https://github.com/apache/spark/pull/40389#issuecomment-1469844012 cc @grundprinzip and @hvanhovell FYI

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40389: [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests

2023-03-15 Thread via GitHub
HyukjinKwon commented on code in PR #40389: URL: https://github.com/apache/spark/pull/40389#discussion_r1136914961 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala: ## @@ -49,6 +51,12 @@ object

[GitHub] [spark] HyukjinKwon commented on pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #39239: URL: https://github.com/apache/spark/pull/39239#issuecomment-1469834155 Okay, I got the problem now. The fix seems fine but we should probably fix `toInternal` too.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2023-03-15 Thread via GitHub
HyukjinKwon commented on code in PR #39239: URL: https://github.com/apache/spark/pull/39239#discussion_r1136904292 ## python/pyspark/sql/types.py: ## @@ -276,7 +276,18 @@ def toInternal(self, dt: datetime.datetime) -> int: def fromInternal(self, ts: int) ->

[GitHub] [spark] HyukjinKwon commented on pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2023-03-15 Thread via GitHub
HyukjinKwon commented on PR #39239: URL: https://github.com/apache/spark/pull/39239#issuecomment-1469830006 Alright. Python's `datetime` is up to the program to deal with naive datetimes (https://docs.python.org/3/library/datetime.html#aware-and-naive-objects): > Whether a naive
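
As the thread notes, Python leaves the meaning of naive datetimes to the program, so a conversion such as `datetime.datetime.fromtimestamp(seconds)` silently depends on the process's local time zone. A minimal sketch of the UTC-pinned alternative for both directions (`fromInternal` and `toInternal`), assuming the internal value is microseconds since the Unix epoch; the function names are hypothetical, and treating naive input as UTC is a choice of this sketch, not necessarily what the PR does.

```python
import datetime

_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def from_internal_utc(ts: int) -> datetime.datetime:
    """Microseconds since the epoch -> timezone-aware UTC datetime."""
    # timedelta arithmetic is exact and also handles pre-1970 (negative) values.
    return _EPOCH + datetime.timedelta(microseconds=ts)


def to_internal_utc(dt: datetime.datetime) -> int:
    """Datetime -> microseconds since the epoch."""
    if dt.tzinfo is None:
        # Assumption of this sketch: naive values are interpreted as UTC.
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    delta = dt - _EPOCH  # aware minus aware accounts for any UTC offset
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
```

Because neither direction consults the local time zone, a round trip returns the same value regardless of the `TZ` setting.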

[GitHub] [spark] Hisoka-X commented on pull request #40389: [SPARK-42767][CONNECT][TESTS] Add a precondition to start connect server fallback with `in-memory` and auto ignored some tests strongly depe

2023-03-15 Thread via GitHub
Hisoka-X commented on PR #40389: URL: https://github.com/apache/spark/pull/40389#issuecomment-1469820687 > Should we push forward this one? @HyukjinKwon We should. I ran the tests locally and they didn't pass because I built without Hive. The 2.4.0 release also can't pass the build without Hive.

[GitHub] [spark] panbingkun opened a new pull request, #40442: [SPARK-42809][BUILD] Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-15 Thread via GitHub
panbingkun opened a new pull request, #40442: URL: https://github.com/apache/spark/pull/40442 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] panbingkun closed pull request #40441: Spark 42809

2023-03-15 Thread via GitHub
panbingkun closed pull request #40441: Spark 42809 URL: https://github.com/apache/spark/pull/40441

[GitHub] [spark] panbingkun opened a new pull request, #40441: Spark 42809

2023-03-15 Thread via GitHub
panbingkun opened a new pull request, #40441: URL: https://github.com/apache/spark/pull/40441 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] cxzl25 opened a new pull request, #40440: [SPARK-42808][CORE] Avoid getting availableProcessors every time in MapOutputTrackerMaster#getStatistics

2023-03-15 Thread via GitHub
cxzl25 opened a new pull request, #40440: URL: https://github.com/apache/spark/pull/40440 ### What changes were proposed in this pull request? The return value of `Runtime.getRuntime.availableProcessors` is generally a fixed value. It is not necessary to obtain it every time

[GitHub] [spark] cxzl25 opened a new pull request, #40439: [SPARK-42807][CORE] Apply custom log URL pattern for yarn-client AM log URL in SHS

2023-03-15 Thread via GitHub
cxzl25 opened a new pull request, #40439: URL: https://github.com/apache/spark/pull/40439 ### What changes were proposed in this pull request? Add attributes to the MiscellaneousProcessDetails event so that SHS can customize the log URL according to those attributes. ### Why are the
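
The custom log URL feature this PR extends to the yarn-client AM works by substituting `{{NAME}}` placeholders in a user-supplied template (for executors, the template is set with `spark.history.custom.executor.log.url`) using per-process attributes. A minimal sketch of that substitution, assuming placeholder names are identifiers and that any missing attribute means the original URL should be kept; `render_log_url` and the attribute names and values in the usage are illustrative, not taken from Spark's source.

```python
import re
from typing import Mapping, Optional

_PLACEHOLDER = re.compile(r"\{\{([A-Za-z0-9_]+)\}\}")


def render_log_url(pattern: str, attributes: Mapping[str, str]) -> Optional[str]:
    """Fill {{NAME}} placeholders; return None if any attribute is missing."""
    names = _PLACEHOLDER.findall(pattern)
    if any(name not in attributes for name in names):
        return None  # caller keeps the original, non-custom log URL
    return _PLACEHOLDER.sub(lambda m: attributes[m.group(1)], pattern)
```

This is also why the AM needs attributes on its event: without them, every placeholder would be missing and the custom pattern could never apply.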

[GitHub] [spark] WeichenXu123 commented on pull request #40353: [SPARK-42732][PYTHON][CONNECT] Update spark connect session `getOrCreate` behavior to check existing global `_active_spark_session` firs

2023-03-15 Thread via GitHub
WeichenXu123 commented on PR #40353: URL: https://github.com/apache/spark/pull/40353#issuecomment-1469765761 > @WeichenXu123 mind fixing the PR title? This PR doesn't add the getActiveSession method Updated.

[GitHub] [spark] Yikf commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

2023-03-15 Thread via GitHub
Yikf commented on PR #40437: URL: https://github.com/apache/spark/pull/40437#issuecomment-1469756155 @cloud-fan Please take a look if you have time, thanks.

[GitHub] [spark] LuciferYang commented on a diff in pull request #40438: [WIP][CONNECT] Catalog

2023-03-15 Thread via GitHub
LuciferYang commented on code in PR #40438: URL: https://github.com/apache/spark/pull/40438#discussion_r1136832569 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/CatalogSuite.scala: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] LuciferYang opened a new pull request, #40438: [WIP][CONNECT] Catalog

2023-03-15 Thread via GitHub
LuciferYang opened a new pull request, #40438: URL: https://github.com/apache/spark/pull/40438 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] cloud-fan closed pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-15 Thread via GitHub
cloud-fan closed pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF URL: https://github.com/apache/spark/pull/40394

[GitHub] [spark] cloud-fan commented on pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-15 Thread via GitHub
cloud-fan commented on PR #40394: URL: https://github.com/apache/spark/pull/40394#issuecomment-1469727065 thanks, merging to master!

[GitHub] [spark] lordk911 commented on pull request #38358: [SPARK-40588] FileFormatWriter materializes AQE plan before accessing outputOrdering

2023-03-15 Thread via GitHub
lordk911 commented on PR #38358: URL: https://github.com/apache/spark/pull/38358#issuecomment-1469726173 > @wangyum do you know why it's a problem only in 3.2? Spark 3.2.1 doesn't have this problem.

[GitHub] [spark] cloud-fan commented on a diff in pull request #40410: [SPARK-42783][SQL] Infer window group limit should run as late as possible

2023-03-15 Thread via GitHub
cloud-fan commented on code in PR #40410: URL: https://github.com/apache/spark/pull/40410#discussion_r1136819813 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -84,11 +84,11 @@ object InferWindowGroupLimit extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-15 Thread via GitHub
cloud-fan commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r1136812183 ## sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala: ## @@ -771,6 +775,130 @@ class ExplainSuiteAE extends ExplainSuiteHelper with

[GitHub] [spark] Yikf closed pull request #38795: [SPARK-41259][SQL] Spark-sql cli query results should correspond to schema

2023-03-15 Thread via GitHub
Yikf closed pull request #38795: [SPARK-41259][SQL] Spark-sql cli query results should correspond to schema URL: https://github.com/apache/spark/pull/38795 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] Yikf opened a new pull request, #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

2023-03-15 Thread via GitHub
Yikf opened a new pull request, #40437: URL: https://github.com/apache/spark/pull/40437 ### What changes were proposed in this pull request? The spark-sql shell result output of Apache Spark is compatible with Hive, but the schema output is not processed. Let's say

[GitHub] [spark] harupy commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-15 Thread via GitHub
harupy commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1136754416 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLHandler.scala: ## @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] dzhigimont opened a new pull request, #40436: [SPARK-42619][PS] Add `show_counts` parameter for DataFrame.info

2023-03-15 Thread via GitHub
dzhigimont opened a new pull request, #40436: URL: https://github.com/apache/spark/pull/40436 ### What changes were proposed in this pull request? Added `show_counts` parameter for DataFrame.info ### Why are the changes needed? When pandas 2.0.0 is released, we

[GitHub] [spark] LuciferYang commented on a diff in pull request #40160: [SPARK-41725][CONNECT] Eager Execution of DF.sql()

2023-03-15 Thread via GitHub
LuciferYang commented on code in PR #40160: URL: https://github.com/apache/spark/pull/40160#discussion_r1136730120 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -69,6 +69,10 @@ object

[GitHub] [spark] zhenlineo commented on a diff in pull request #40160: [SPARK-41725][CONNECT] Eager Execution of DF.sql()

2023-03-15 Thread via GitHub
zhenlineo commented on code in PR #40160: URL: https://github.com/apache/spark/pull/40160#discussion_r1136726276 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -69,6 +69,10 @@ object

[GitHub] [spark] LuciferYang commented on a diff in pull request #40160: [SPARK-41725][CONNECT] Eager Execution of DF.sql()

2023-03-15 Thread via GitHub
LuciferYang commented on code in PR #40160: URL: https://github.com/apache/spark/pull/40160#discussion_r1136723935 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -69,6 +69,10 @@ object

[GitHub] [spark] zhenlineo commented on a diff in pull request #40358: [SPARK-42733][CONNECT][Followup] Write without path or table

2023-03-15 Thread via GitHub
zhenlineo commented on code in PR #40358: URL: https://github.com/apache/spark/pull/40358#discussion_r1136720132 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -175,6 +176,26 @@ class ClientE2ETestSuite extends

[GitHub] [spark] zhenlineo commented on a diff in pull request #40358: [SPARK-42733][CONNECT][Followup] Write without path or table

2023-03-15 Thread via GitHub
zhenlineo commented on code in PR #40358: URL: https://github.com/apache/spark/pull/40358#discussion_r1136713977 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -175,6 +176,26 @@ class ClientE2ETestSuite extends

[GitHub] [spark] vicennial commented on a diff in pull request #40368: [SPARK-42748][CONNECT] Server-side Artifact Management

2023-03-15 Thread via GitHub
vicennial commented on code in PR #40368: URL: https://github.com/apache/spark/pull/40368#discussion_r1136699340 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAddArtifactsHandler.scala: ## @@ -0,0 +1,246 @@ +/* + * Licensed to the

[GitHub] [spark] vicennial commented on a diff in pull request #40368: [SPARK-42748][CONNECT] Server-side Artifact Management

2023-03-15 Thread via GitHub
vicennial commented on code in PR #40368: URL: https://github.com/apache/spark/pull/40368#discussion_r1136698968 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAddArtifactsHandler.scala: ## @@ -0,0 +1,246 @@ +/* + * Licensed to the

[GitHub] [spark] vicennial commented on a diff in pull request #40368: [SPARK-42748][CONNECT] Server-side Artifact Management

2023-03-15 Thread via GitHub
vicennial commented on code in PR #40368: URL: https://github.com/apache/spark/pull/40368#discussion_r1136697895 ## core/src/main/scala/org/apache/spark/SparkContext.scala: ## @@ -465,9 +473,24 @@ class SparkContext(config: SparkConf) extends Logging { SparkEnv.set(_env)

[GitHub] [spark] vicennial commented on a diff in pull request #40368: [SPARK-42748][CONNECT] Server-side Artifact Management

2023-03-15 Thread via GitHub
vicennial commented on code in PR #40368: URL: https://github.com/apache/spark/pull/40368#discussion_r1136689662 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAddArtifactsHandler.scala: ## @@ -0,0 +1,246 @@ +/* + * Licensed to the

[GitHub] [spark] bjornjorgensen commented on pull request #40431: [SPARK-42799][BUILD] Update SBT build `xercesImpl` version to match with `pom.xml`

2023-03-15 Thread via GitHub
bjornjorgensen commented on PR #40431: URL: https://github.com/apache/spark/pull/40431#issuecomment-1469541755 @dongjoon-hyun Thank you, sir.

[GitHub] [spark] beliefer commented on pull request #40410: [SPARK-42783][SQL] Infer window group limit should run as late as possible

2023-03-15 Thread via GitHub
beliefer commented on PR #40410: URL: https://github.com/apache/spark/pull/40410#issuecomment-1469541110 ping @cloud-fan cc @zhengruifeng

[GitHub] [spark] StevenChenDatabricks commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-15 Thread via GitHub
StevenChenDatabricks commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r1136672487 ## sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala: ## @@ -771,6 +775,130 @@ class ExplainSuiteAE extends ExplainSuiteHelper with

[GitHub] [spark] beliefer commented on pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-15 Thread via GitHub
beliefer commented on PR #40355: URL: https://github.com/apache/spark/pull/40355#issuecomment-1469539349 @hvanhovell Could you have time to take a look ? Thanks!

[GitHub] [spark] allanf-db opened a new pull request, #40435: [SPARK-42496][CONNECT][DOCS] Addressing feedback to remove last ">>>" and adding type(spark) example

2023-03-15 Thread via GitHub
allanf-db opened a new pull request, #40435: URL: https://github.com/apache/spark/pull/40435 ### What changes were proposed in this pull request? Removing the last ">>>" in a Python code example based on feedback and adding type(spark) as an example of checking whether a session

[GitHub] [spark] beliefer commented on pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
beliefer commented on PR #40396: URL: https://github.com/apache/spark/pull/40396#issuecomment-1469534683 > Thank you for update. BTW, `JDBCDialect` is a documented developer API. We should not change the default value. We should keep the original default value to avoid a breaking change to

[GitHub] [spark] beliefer commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
beliefer commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1136665341 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala: ## @@ -212,5 +212,5 @@ private object MsSqlServerDialect extends JdbcDialect {

[GitHub] [spark] beliefer commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
beliefer commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1136664661 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -583,12 +583,16 @@ abstract class JdbcDialect extends Serializable with Logging {

[GitHub] [spark] beliefer commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
beliefer commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1136664231 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -583,12 +583,16 @@ abstract class JdbcDialect extends Serializable with Logging {

[GitHub] [spark] StevenChenDatabricks commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-15 Thread via GitHub
StevenChenDatabricks commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r1136663376 ## sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala: ## @@ -771,6 +775,130 @@ class ExplainSuiteAE extends ExplainSuiteHelper with

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-15 Thread via GitHub
WeichenXu123 commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1136648435 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLHandler.scala: ## @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-15 Thread via GitHub
WeichenXu123 commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1136643875 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLHandler.scala: ## @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dongjoon-hyun closed pull request #40431: [SPARK-42799][BUILD] Update SBT build `xercesImpl` version to match with `pom.xml`

2023-03-15 Thread via GitHub
dongjoon-hyun closed pull request #40431: [SPARK-42799][BUILD] Update SBT build `xercesImpl` version to match with `pom.xml` URL: https://github.com/apache/spark/pull/40431

[GitHub] [spark] dongjoon-hyun commented on pull request #40431: [SPARK-42799][BUILD] Update SBT build `xercesImpl` version to match with `pom.xml`

2023-03-15 Thread via GitHub
dongjoon-hyun commented on PR #40431: URL: https://github.com/apache/spark/pull/40431#issuecomment-1469497908 Thank you all! All tests passed. Merged to master/3.4/3.3/3.2.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1136637368 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala: ## @@ -212,5 +212,5 @@ private object MsSqlServerDialect extends JdbcDialect {

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1136636446 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -583,12 +583,16 @@ abstract class JdbcDialect extends Serializable with Logging {

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1136636035 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -583,12 +583,16 @@ abstract class JdbcDialect extends Serializable with Logging {

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1136633973 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/PostgresIntegrationSuite.scala: ## @@ -49,10 +49,6 @@ class

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1136632994 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala: ## @@ -54,9 +54,6 @@ class MySQLIntegrationSuite

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
dongjoon-hyun commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1136632844 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala: ## @@ -61,9 +61,6 @@ class DB2IntegrationSuite

[GitHub] [spark] NarekDW commented on pull request #40422: [SPARK-42803] Use getParameterCount function instead of getParameterTypes.length

2023-03-15 Thread via GitHub
NarekDW commented on PR #40422: URL: https://github.com/apache/spark/pull/40422#issuecomment-1469468855 > I think we should file a jira to tracking this Done, pls check - https://issues.apache.org/jira/browse/SPARK-42803

[GitHub] [spark] beliefer commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-15 Thread via GitHub
beliefer commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1136607610 ## docs/sql-migration-guide.md: ## @@ -22,6 +22,10 @@ license: | * Table of contents {:toc} +## Upgrading from Spark SQL 3.4 to 3.5 + +- Since Spark 3.5, the JDBC

[GitHub] [spark] beliefer commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-15 Thread via GitHub
beliefer commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1136606050 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala: ## @@ -2065,6 +2065,55 @@ class PlanGenerationTestSuite

[GitHub] [spark] HeartSaVioR closed pull request #40427: [SPARK-42792][SS] Add support for WRITE_FLUSH_BYTES for RocksDB used in streaming stateful operators

2023-03-15 Thread via GitHub
HeartSaVioR closed pull request #40427: [SPARK-42792][SS] Add support for WRITE_FLUSH_BYTES for RocksDB used in streaming stateful operators URL: https://github.com/apache/spark/pull/40427

[GitHub] [spark] HeartSaVioR commented on pull request #40427: [SPARK-42792][SS] Add support for WRITE_FLUSH_BYTES for RocksDB used in streaming stateful operators

2023-03-15 Thread via GitHub
HeartSaVioR commented on PR #40427: URL: https://github.com/apache/spark/pull/40427#issuecomment-1469443629 Thanks! Merging to master.
