[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38719: [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

2022-11-18 Thread GitBox
HeartSaVioR commented on code in PR #38719: URL: https://github.com/apache/spark/pull/38719#discussion_r1026994040 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala: ## @@ -345,7 +345,14 @@ trait ProgressReporter extends Logging {

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38719: [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

2022-11-18 Thread GitBox
HeartSaVioR commented on code in PR #38719: URL: https://github.com/apache/spark/pull/38719#discussion_r1026993747 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala: ## @@ -345,7 +345,14 @@ trait ProgressReporter extends Logging {

[GitHub] [spark] HyukjinKwon commented on pull request #38718: [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync generated files for Python

2022-11-18 Thread GitBox
HyukjinKwon commented on PR #38718: URL: https://github.com/apache/spark/pull/38718#issuecomment-1320707595 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon closed pull request #38718: [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync generated files for Python

2022-11-18 Thread GitBox
HyukjinKwon closed pull request #38718: [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync generated files for Python URL: https://github.com/apache/spark/pull/38718

[GitHub] [spark] panbingkun opened a new pull request, #38721: [WIP][SPARK-41172][SQL] Migrate the ambiguous ref error to an error class

2022-11-18 Thread GitBox
panbingkun opened a new pull request, #38721: URL: https://github.com/apache/spark/pull/38721 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was this patch

[GitHub] [spark] wangyum commented on pull request #38682: [SPARK-41167][SQL] Improve multi like performance by creating a balanced expression tree predicate

2022-11-18 Thread GitBox
wangyum commented on PR #38682: URL: https://github.com/apache/spark/pull/38682#issuecomment-1320804263 cc @cloud-fan

[GitHub] [spark] amaliujia commented on pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-18 Thread GitBox
amaliujia commented on PR #38723: URL: https://github.com/apache/spark/pull/38723#issuecomment-1320811915 @zhengruifeng @HyukjinKwon

[GitHub] [spark] amaliujia opened a new pull request, #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-18 Thread GitBox
amaliujia opened a new pull request, #38723: URL: https://github.com/apache/spark/pull/38723 ### What changes were proposed in this pull request? Implement `DataFrame.SelectExpr` in Python client. `SelectExpr` also has a good amount of usage. ### Why are the changes

[GitHub] [spark] MaxGekk commented on pull request #38696: [SPARK-41175][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1078

2022-11-18 Thread GitBox
MaxGekk commented on PR #38696: URL: https://github.com/apache/spark/pull/38696#issuecomment-1320812183 +1, LGTM. Merging to master. Thank you, @panbingkun and @srielau for review.

[GitHub] [spark] AmplabJenkins commented on pull request #38693: [SPARK-41196] [CONNECT] Homogenize the protobuf version across the Spark connect server to use the same major version.

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38693: URL: https://github.com/apache/spark/pull/38693#issuecomment-1320825239 Can one of the admins verify this patch?

[GitHub] [spark] williamhyun opened a new pull request, #38724: [SPARK-41202][BUILD] Update ORC to 1.7.7

2022-11-18 Thread GitBox
williamhyun opened a new pull request, #38724: URL: https://github.com/apache/spark/pull/38724 ### What changes were proposed in this pull request? This PR aims to update ORC to 1.7.7. ### Why are the changes needed? This will bring the latest bug fixes. ### Does this PR

[GitHub] [spark] viirya commented on a diff in pull request #38719: [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

2022-11-18 Thread GitBox
viirya commented on code in PR #38719: URL: https://github.com/apache/spark/pull/38719#discussion_r1026993007 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala: ## @@ -345,7 +345,14 @@ trait ProgressReporter extends Logging { val

[GitHub] [spark] hvanhovell opened a new pull request, #38720: [SPARK-41165][SPARK-41184][CONNECT] Fix arrow collect (again) and reenable tests.

2022-11-18 Thread GitBox
hvanhovell opened a new pull request, #38720: URL: https://github.com/apache/spark/pull/38720 ### What changes were proposed in this pull request? The arrow collect code path for connect contains a bug where it would always fall back to JSON. This was caused by the assumption that

[GitHub] [spark] HyukjinKwon commented on pull request #38698: [SPARK-41186][INFRA][PS][TESTS] Upgrade infra and replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-18 Thread GitBox
HyukjinKwon commented on PR #38698: URL: https://github.com/apache/spark/pull/38698#issuecomment-1320711448 Merged to master.

[GitHub] [spark] WeichenXu123 commented on pull request #38699: [SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes

2022-11-18 Thread GitBox
WeichenXu123 commented on PR #38699: URL: https://github.com/apache/spark/pull/38699#issuecomment-1320790862 > If we are setting it in `SparkContext`, do we want to get rid of this from other places like `PythonRunner.compute`? I think we can remove the code in `PythonRunner.compute`.

[GitHub] [spark] grundprinzip commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-18 Thread GitBox
grundprinzip commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1027046810 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -271,8 +273,12 @@ class SparkConnectPlanner(session:
