[GitHub] [spark] HeartSaVioR closed pull request #39662: [SPARK-42105][SS][DOCS] Reflect the change of SPARK-40925 to SS guide doc

2023-01-20 Thread GitBox
HeartSaVioR closed pull request #39662: [SPARK-42105][SS][DOCS] Reflect the change of SPARK-40925 to SS guide doc URL: https://github.com/apache/spark/pull/39662 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] AmplabJenkins commented on pull request #39629: [SPARK-42103][PYTHON][ML] Added Instrumentation for PyTorch Distributor

2023-01-20 Thread GitBox
AmplabJenkins commented on PR #39629: URL: https://github.com/apache/spark/pull/39629#issuecomment-1398038934 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #39628: [SPARK-40264][ML][DOCS] Supplement docstring in pyspark.ml.functions.predict_batch_udf

2023-01-20 Thread GitBox
AmplabJenkins commented on PR #39628: URL: https://github.com/apache/spark/pull/39628#issuecomment-1398038997 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #39626: An automatic caching solution for Spark

2023-01-20 Thread GitBox
AmplabJenkins commented on PR #39626: URL: https://github.com/apache/spark/pull/39626#issuecomment-1398039062 Can one of the admins verify this patch?

[GitHub] [spark] gengliangwang commented on a diff in pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary

2023-01-20 Thread GitBox
gengliangwang commented on code in PR #39666: URL: https://github.com/apache/spark/pull/39666#discussion_r1082286371 ## core/src/main/scala/org/apache/spark/status/protobuf/Utils.scala: ## @@ -17,10 +17,24 @@ package org.apache.spark.status.protobuf +import

[GitHub] [spark] dongjoon-hyun commented on pull request #39664: [SPARK-42114][SQL] Test of uniform parquet encryption

2023-01-20 Thread GitBox
dongjoon-hyun commented on PR #39664: URL: https://github.com/apache/spark/pull/39664#issuecomment-1398159305 I merged the newer PR, @ggershinsky . :) - https://github.com/apache/spark/commit/e1c630a98c45ae07c43c8cf95979532b51bf59ec

[GitHub] [spark] HeartSaVioR commented on pull request #39662: [SPARK-42105][SS][DOCS] Reflect the change of SPARK-40925 to SS guide doc

2023-01-20 Thread GitBox
HeartSaVioR commented on PR #39662: URL: https://github.com/apache/spark/pull/39662#issuecomment-1398038094 Thanks for the quick review! Merging to master. (I'll handle a follow-up PR if there are outstanding post-review comments.)

[GitHub] [spark] gengliangwang commented on pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary

2023-01-20 Thread GitBox
gengliangwang commented on PR #39666: URL: https://github.com/apache/spark/pull/39666#issuecomment-1398047138 cc @LuciferYang @panbingkun @techaddict let's **update all the string fields** to make sure null string values are well handled. To avoid creating too many PRs and making the

[GitHub] [spark] LuciferYang commented on pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-20 Thread GitBox
LuciferYang commented on PR #39642: URL: https://github.com/apache/spark/pull/39642#issuecomment-1398075391 Will refactor after https://github.com/apache/spark/pull/39666 is merged.

[GitHub] [spark] gengliangwang commented on a diff in pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary

2023-01-20 Thread GitBox
gengliangwang commented on code in PR #39666: URL: https://github.com/apache/spark/pull/39666#discussion_r1082251439 ## core/src/main/scala/org/apache/spark/status/protobuf/Utils.scala: ## @@ -17,10 +17,24 @@ package org.apache.spark.status.protobuf +import

[GitHub] [spark] sadikovi commented on pull request #39660: [SPARK-42128][SQL] Support TOP (N) for MS SQL Server dialect as an alternative to Limit pushdown

2023-01-20 Thread GitBox
sadikovi commented on PR #39660: URL: https://github.com/apache/spark/pull/39660#issuecomment-1398098578 Thanks @dongjoon-hyun. I will address your comments soon-ish. @beliefer, yes, you are right. The documentation describes TOP (N) returning the N top rows when used together with

[GitHub] [spark] beliefer opened a new pull request, #39667: [SPARK-42131][SQL] Extract the function that construct the select statement for JDBC dialect.

2023-01-20 Thread GitBox
beliefer opened a new pull request, #39667: URL: https://github.com/apache/spark/pull/39667 ### What changes were proposed in this pull request? Currently, JDBCRDD uses fixed format for SELECT statement. ``` val sqlText = options.prepareQuery + s"SELECT $columnList FROM

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary

2023-01-20 Thread GitBox
dongjoon-hyun commented on code in PR #39666: URL: https://github.com/apache/spark/pull/39666#discussion_r1082308565 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -22,7 +22,12 @@ package org.apache.spark.status.protobuf; * Developer

[GitHub] [spark] dongjoon-hyun commented on pull request #39665: [SPARK-42114][SQL][TESTS] Add uniform parquet encryption test case

2023-01-20 Thread GitBox
dongjoon-hyun commented on PR #39665: URL: https://github.com/apache/spark/pull/39665#issuecomment-1398167017 I fixed the `Affected Version` from 3.3.1 to 3.4.0 because this fails in `branch-3.3`. ``` [info] ParquetEncryptionSuite: [info] - SPARK-34990: Write and read an encrypted

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39668: [WIP] Test 3.4.0 tagging

2023-01-20 Thread GitBox
dongjoon-hyun opened a new pull request, #39668: URL: https://github.com/apache/spark/pull/39668 This aims to detect possible test failures on the Spark 3.4.0 RC tag.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-20 Thread GitBox
HyukjinKwon commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1082320316 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -0,0 +1,198 @@ +/* + * Licensed to the

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-20 Thread GitBox
HyukjinKwon commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1082329724 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala: ## @@ -78,7 +78,7 @@ class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-20 Thread GitBox
HyukjinKwon commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1082329225 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -0,0 +1,198 @@ +/* + * Licensed to the

[GitHub] [spark] HyukjinKwon commented on pull request #39668: [WIP] Test 3.4.0 tagging

2023-01-20 Thread GitBox
HyukjinKwon commented on PR #39668: URL: https://github.com/apache/spark/pull/39668#issuecomment-1398189235 cc @xinrong-meng FYI

[GitHub] [spark] antonipp commented on pull request #38376: [SPARK-40817][K8S] `spark.files` should preserve remote files

2023-01-20 Thread GitBox
antonipp commented on PR #38376: URL: https://github.com/apache/spark/pull/38376#issuecomment-1398209302 Thank you for the reviews and for the merge! I am not 100% sure what the backport process is, but I opened 2 PRs (for 3.3 and 3.2) since I believe both are still supported based on

[GitHub] [spark] vicennial opened a new pull request, #39672: [SPARK-42133] Add basic Dataset API methods to Spark Connect Scala Client

2023-01-20 Thread GitBox
vicennial opened a new pull request, #39672: URL: https://github.com/apache/spark/pull/39672 ### What changes were proposed in this pull request? Adds the following methods: - Dataframe API methods - project - filter - limit - SparkSession - range (and

[GitHub] [spark] gengliangwang opened a new pull request, #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary

2023-01-20 Thread GitBox
gengliangwang opened a new pull request, #39666: URL: https://github.com/apache/spark/pull/39666 ### What changes were proposed in this pull request? After revisiting https://github.com/apache/spark/pull/39416 and https://github.com/apache/spark/pull/39623, I propose: *

[GitHub] [spark] HeartSaVioR commented on pull request #39647: [SPARK-42075][DSTREAM] Deprecate DStream API

2023-01-20 Thread GitBox
HeartSaVioR commented on PR #39647: URL: https://github.com/apache/spark/pull/39647#issuecomment-1398045113 Thanks! Merging to master.

[GitHub] [spark] HeartSaVioR closed pull request #39647: [SPARK-42075][DSTREAM] Deprecate DStream API

2023-01-20 Thread GitBox
HeartSaVioR closed pull request #39647: [SPARK-42075][DSTREAM] Deprecate DStream API URL: https://github.com/apache/spark/pull/39647

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39628: [SPARK-40264][ML][DOCS] Supplement docstring in pyspark.ml.functions.predict_batch_udf

2023-01-20 Thread GitBox
HyukjinKwon commented on code in PR #39628: URL: https://github.com/apache/spark/pull/39628#discussion_r1082211081 ## python/pyspark/ml/functions.py: ## @@ -647,37 +386,369 @@ def predict_columnar(x1: np.ndarray, x2: np.ndarray) -> Mapping[str, np.ndarray] Function

[GitHub] [spark] HyukjinKwon commented on pull request #39665: [SPARK-42114][SQL] Test of uniform parquet encryption

2023-01-20 Thread GitBox
HyukjinKwon commented on PR #39665: URL: https://github.com/apache/spark/pull/39665#issuecomment-1398048922 Mind keeping the PR description template https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE?

[GitHub] [spark] LuciferYang commented on pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary

2023-01-20 Thread GitBox
LuciferYang commented on PR #39666: URL: https://github.com/apache/spark/pull/39666#issuecomment-1398066333 > cc @LuciferYang @panbingkun @techaddict let's **update all the string fields** to make sure null string values are well handled. To avoid creating too many PRs and making the

[GitHub] [spark] LuciferYang commented on a diff in pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary

2023-01-20 Thread GitBox
LuciferYang commented on code in PR #39666: URL: https://github.com/apache/spark/pull/39666#discussion_r1082248889 ## core/src/main/scala/org/apache/spark/status/protobuf/Utils.scala: ## @@ -17,10 +17,24 @@ package org.apache.spark.status.protobuf +import

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary

2023-01-20 Thread GitBox
dongjoon-hyun commented on code in PR #39666: URL: https://github.com/apache/spark/pull/39666#discussion_r1082307793 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -22,7 +22,12 @@ package org.apache.spark.status.protobuf; * Developer

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-20 Thread GitBox
HyukjinKwon commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1082319733 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -0,0 +1,198 @@ +/* + * Licensed to the

[GitHub] [spark] dongjoon-hyun commented on pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-20 Thread GitBox
dongjoon-hyun commented on PR #39541: URL: https://github.com/apache/spark/pull/39541#issuecomment-1398177490 BTW, while I was reviewing this PR, I felt the necessity to open an official PR to test any potential test cases on tagging. Here is the general PR to detect any `SNAPSHOT`

[GitHub] [spark] antonipp opened a new pull request, #39669: [SPARK-40817][K8S][3.3] `spark.files` should preserve remote files

2023-01-20 Thread GitBox
antonipp opened a new pull request, #39669: URL: https://github.com/apache/spark/pull/39669 ### What changes were proposed in this pull request? Backport https://github.com/apache/spark/pull/38376 to `branch-3.3` You can find a detailed description of the issue and an example

[GitHub] [spark] antonipp opened a new pull request, #39670: [SPARK-40817][K8S][3.2] `spark.files` should preserve remote files

2023-01-20 Thread GitBox
antonipp opened a new pull request, #39670: URL: https://github.com/apache/spark/pull/39670 ### What changes were proposed in this pull request? Backport https://github.com/apache/spark/pull/38376 to `branch-3.2` You can find a detailed description of the issue and an example

[GitHub] [spark] dongjoon-hyun commented on pull request #39671: [SPARK-40303][DOCS] Recommends users to use JDK 8u362 and later versions

2023-01-20 Thread GitBox
dongjoon-hyun commented on PR #39671: URL: https://github.com/apache/spark/pull/39671#issuecomment-1398234989 Oh, does `Zulu` only have that released version, @wangyum ? - https://bugs.openjdk.org/browse/JDK-8296506 I cannot find a Docker image or an Adoptium (Temurin) Java release yet.

[GitHub] [spark] zhengruifeng commented on pull request #39661: [SPARK-41884][CONNECT] Support naive tuple as a nested row

2023-01-20 Thread GitBox
zhengruifeng commented on PR #39661: URL: https://github.com/apache/spark/pull/39661#issuecomment-1398122520 LGTM, thanks!

[GitHub] [spark] dongjoon-hyun closed pull request #39665: [SPARK-42114][SQL][TESTS] Add uniform parquet encryption test case

2023-01-20 Thread GitBox
dongjoon-hyun closed pull request #39665: [SPARK-42114][SQL][TESTS] Add uniform parquet encryption test case URL: https://github.com/apache/spark/pull/39665

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-20 Thread GitBox
HyukjinKwon commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1082323765 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -0,0 +1,198 @@ +/* + * Licensed to the

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-20 Thread GitBox
HyukjinKwon commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1082326873 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -0,0 +1,198 @@ +/* + * Licensed to the

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-20 Thread GitBox
HyukjinKwon commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1082329957 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39299: [WIP][SPARK-41593][PYTHON][ML] Adding logging from executors

2023-01-20 Thread GitBox
WeichenXu123 commented on code in PR #39299: URL: https://github.com/apache/spark/pull/39299#discussion_r1082414835 ## python/pyspark/ml/torch/log_communication.py: ## @@ -0,0 +1,201 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] [spark] LuciferYang commented on a diff in pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary

2023-01-20 Thread GitBox
LuciferYang commented on code in PR #39666: URL: https://github.com/apache/spark/pull/39666#discussion_r1082425788 ## core/src/main/scala/org/apache/spark/status/protobuf/Utils.scala: ## @@ -17,10 +17,24 @@ package org.apache.spark.status.protobuf +import

[GitHub] [spark] dongjoon-hyun commented on pull request #39668: [WIP] Test 3.4.0 tagging

2023-01-20 Thread GitBox
dongjoon-hyun commented on PR #39668: URL: https://github.com/apache/spark/pull/39668#issuecomment-1398314758 It seems that we have only one failure.

[GitHub] [spark] dongjoon-hyun commented on pull request #39671: [SPARK-40303][DOCS] Deprecate old Java 8 versions prior to 8u362

2023-01-20 Thread GitBox
dongjoon-hyun commented on PR #39671: URL: https://github.com/apache/spark/pull/39671#issuecomment-1398319998 To @LuciferYang , I don't think this is a compatibility or any failure.

[GitHub] [spark] LuciferYang opened a new pull request, #39674: [DON'T MERGE] Test remove SPARK_USE_CONC_INCR_GC

2023-01-20 Thread GitBox
LuciferYang opened a new pull request, #39674: URL: https://github.com/apache/spark/pull/39674 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] panbingkun opened a new pull request, #39675: [MINOR][DOCS] Update the doc of arrow & kubernetes

2023-01-20 Thread GitBox
panbingkun opened a new pull request, #39675: URL: https://github.com/apache/spark/pull/39675 ### What changes were proposed in this pull request? The PR aims to update the docs of Arrow & Kubernetes. ### Why are the changes needed?

[GitHub] [spark] dongjoon-hyun commented on pull request #39671: [SPARK-40303][DOCS] Deprecate old Java 8 versions prior to 8u362

2023-01-20 Thread GitBox
dongjoon-hyun commented on PR #39671: URL: https://github.com/apache/spark/pull/39671#issuecomment-1398367056 BTW, we didn't cut the branch yet and we still have one month for Apache Spark 3.4.0 release. I'm considering that time period for this decision, @LuciferYang . You are also

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39675: [MINOR][DOCS] Update the doc of arrow & kubernetes

2023-01-20 Thread GitBox
dongjoon-hyun commented on code in PR #39675: URL: https://github.com/apache/spark/pull/39675#discussion_r1082532817 ## docs/running-on-kubernetes.md: ## @@ -34,13 +34,13 @@ Please see [Spark Security](security.html) and the specific security sections in Images built from

[GitHub] [spark] LuciferYang commented on a diff in pull request #39674: [DON'T MERGE] Test remove SPARK_USE_CONC_INCR_GC

2023-01-20 Thread GitBox
LuciferYang commented on code in PR #39674: URL: https://github.com/apache/spark/pull/39674#discussion_r1082533145 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -1005,26 +1005,6 @@ private[spark] class Client( val tmpDir = new

[GitHub] [spark] wangyum commented on pull request #39671: [SPARK-40303][DOCS] Recommends users to use JDK 8u362 and later versions

2023-01-20 Thread GitBox
wangyum commented on PR #39671: URL: https://github.com/apache/spark/pull/39671#issuecomment-1398267365 > Oh, does `Zulu` only have that released version, @wangyum ? > > * https://bugs.openjdk.org/browse/JDK-8296506 > > I cannot find docker image and Adoptium (Temurin) Java

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39299: [WIP][SPARK-41593][PYTHON][ML] Adding logging from executors

2023-01-20 Thread GitBox
WeichenXu123 commented on code in PR #39299: URL: https://github.com/apache/spark/pull/39299#discussion_r1082420394 ## python/pyspark/ml/torch/log_communication.py: ## @@ -0,0 +1,201 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] [spark] EnricoMi commented on pull request #39640: [SPARK-38591][SQL] Add flatMapSortedGroups and cogroupSorted

2023-01-20 Thread GitBox
EnricoMi commented on PR #39640: URL: https://github.com/apache/spark/pull/39640#issuecomment-1398282605 @cloud-fan following issue: `ds.groupByKey` adds key columns to the plan: ``` def groupByKey[K: Encoder](func: T => K): KeyValueGroupedDataset[K, T] = { val withGroupingKey

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39299: [WIP][SPARK-41593][PYTHON][ML] Adding logging from executors

2023-01-20 Thread GitBox
WeichenXu123 commented on code in PR #39299: URL: https://github.com/apache/spark/pull/39299#discussion_r1082428873 ## python/pyspark/ml/torch/log_communication.py: ## @@ -0,0 +1,201 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] [spark] wecharyu commented on pull request #39115: [SPARK-41563][SQL] Support partition filter in MSCK REPAIR TABLE statement

2023-01-20 Thread GitBox
wecharyu commented on PR #39115: URL: https://github.com/apache/spark/pull/39115#issuecomment-1398308497 > Can you tune the config spark.sql.addPartitionInBatch.size? Setting it to a larger number can reduce the number of RPCs. It does not help in `RepairTableCommand`, when enable

[GitHub] [spark] LuciferYang commented on pull request #39671: [SPARK-40303][DOCS] Deprecate old Java 8 versions prior to 8u362

2023-01-20 Thread GitBox
LuciferYang commented on PR #39671: URL: https://github.com/apache/spark/pull/39671#issuecomment-1398314217 One problem is that GA is still using Temurin 8u352 for build and test. We need to wait for a while before running GA tasks using 8u362.

[GitHub] [spark] LuciferYang commented on a diff in pull request #39674: [DON'T MERGE] Test remove SPARK_USE_CONC_INCR_GC

2023-01-20 Thread GitBox
LuciferYang commented on code in PR #39674: URL: https://github.com/apache/spark/pull/39674#discussion_r1082496069 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -1005,26 +1005,6 @@ private[spark] class Client( val tmpDir = new

[GitHub] [spark] LuciferYang commented on pull request #39663: [SPARK-42129][BUILD] Upgrade rocksdbjni to 7.9.2

2023-01-20 Thread GitBox
LuciferYang commented on PR #39663: URL: https://github.com/apache/spark/pull/39663#issuecomment-1398346083 Thanks @dongjoon-hyun

[GitHub] [spark] dongjoon-hyun commented on pull request #39671: [SPARK-40303][DOCS] Deprecate old Java 8 versions prior to 8u362

2023-01-20 Thread GitBox
dongjoon-hyun commented on PR #39671: URL: https://github.com/apache/spark/pull/39671#issuecomment-1398362594 Timezone issues are inevitable, and we need to adjust the code for them on a regular basis, @LuciferYang .

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39674: [DON'T MERGE] Test remove SPARK_USE_CONC_INCR_GC

2023-01-20 Thread GitBox
dongjoon-hyun commented on code in PR #39674: URL: https://github.com/apache/spark/pull/39674#discussion_r1082528460 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -1005,26 +1005,6 @@ private[spark] class Client( val tmpDir = new

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39675: [MINOR][DOCS] Update the doc of arrow & kubernetes

2023-01-20 Thread GitBox
dongjoon-hyun commented on code in PR #39675: URL: https://github.com/apache/spark/pull/39675#discussion_r1082529696 ## docs/index.md: ## @@ -45,7 +45,6 @@ Java 8 prior to version 8u201 support is deprecated as of Spark 3.2.0. When using the Scala API, it is necessary for

[GitHub] [spark] srowen commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-20 Thread GitBox
srowen commented on PR #39190: URL: https://github.com/apache/spark/pull/39190#issuecomment-1398393768 Yeah but do you know how it happens, or have a theory? Just want to see if the change seems to match with some theory of how it arises. Or does this change definitely change the output

[GitHub] [spark] EnricoMi opened a new pull request, #39673: [SPARK-42132][SQL] Deduplicate attributes in groupByKey.cogroup

2023-01-20 Thread GitBox
EnricoMi opened a new pull request, #39673: URL: https://github.com/apache/spark/pull/39673 ### What changes were proposed in this pull request? This deduplicates attributes that exist on both sides of a `CoGroup` by aliasing the occurrence on the right side. ### Why are the

[GitHub] [spark] EnricoMi commented on pull request #39673: [SPARK-42132][SQL] Deduplicate attributes in groupByKey.cogroup

2023-01-20 Thread GitBox
EnricoMi commented on PR #39673: URL: https://github.com/apache/spark/pull/39673#issuecomment-1398246138 Ideally, `QueryPlan.rewriteAttrs` would not replace occurrences of `id#0L` with `id#13L` in all fields of `CoGroup`, but only in `rightDeserializer`, `rightGroup`, `rightAttr`,

[GitHub] [spark] kuwii commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-20 Thread GitBox
kuwii commented on PR #39190: URL: https://github.com/apache/spark/pull/39190#issuecomment-1398263751 @srowen We found this issue in some of Spark applications. Here's the event log of an example, which can be loaded through history server:

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39299: [WIP][SPARK-41593][PYTHON][ML] Adding logging from executors

2023-01-20 Thread GitBox
WeichenXu123 commented on code in PR #39299: URL: https://github.com/apache/spark/pull/39299#discussion_r1082416985 ## python/pyspark/ml/torch/log_communication.py: ## @@ -0,0 +1,201 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39299: [WIP][SPARK-41593][PYTHON][ML] Adding logging from executors

2023-01-20 Thread GitBox
WeichenXu123 commented on code in PR #39299: URL: https://github.com/apache/spark/pull/39299#discussion_r1082420774 ## python/pyspark/ml/torch/log_communication.py: ## @@ -0,0 +1,201 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39369: [SPARK-41775][PYTHON][ML] Adding support for PyForch functions

2023-01-20 Thread GitBox
WeichenXu123 commented on code in PR #39369: URL: https://github.com/apache/spark/pull/39369#discussion_r1082443370 ## python/pyspark/ml/torch/distributor.py: ## @@ -495,32 +546,119 @@ def set_gpus(context: "BarrierTaskContext") -> None: def _run_distributed_training(

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39369: [SPARK-41775][PYTHON][ML] Adding support for PyForch functions

2023-01-20 Thread GitBox
WeichenXu123 commented on code in PR #39369: URL: https://github.com/apache/spark/pull/39369#discussion_r1082450887 ## python/pyspark/ml/torch/tests/test_distributor.py: ## @@ -224,8 +293,10 @@ def setUp(self) -> None: self.sc =

[GitHub] [spark] dongjoon-hyun commented on pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests

2023-01-20 Thread GitBox
dongjoon-hyun commented on PR #39541: URL: https://github.com/apache/spark/pull/39541#issuecomment-1398317390 As @HyukjinKwon pointed out, this causes a failure for RC and official release. - https://github.com/apache/spark/pull/39668#issuecomment-1398314758

[GitHub] [spark] LuciferYang commented on pull request #39671: [SPARK-40303][DOCS] Deprecate old Java 8 versions prior to 8u362

2023-01-20 Thread GitBox
LuciferYang commented on PR #39671: URL: https://github.com/apache/spark/pull/39671#issuecomment-1398316727 Could you use 8u362 to run full UTs offline to check compatibility? Thanks ~ @wangyum

[GitHub] [spark] dongjoon-hyun closed pull request #39663: [SPARK-42129][BUILD] Upgrade rocksdbjni to 7.9.2

2023-01-20 Thread GitBox
dongjoon-hyun closed pull request #39663: [SPARK-42129][BUILD] Upgrade rocksdbjni to 7.9.2 URL: https://github.com/apache/spark/pull/39663

[GitHub] [spark] LuciferYang commented on pull request #39671: [SPARK-40303][DOCS] Deprecate old Java 8 versions prior to 8u362

2023-01-20 Thread GitBox
LuciferYang commented on PR #39671: URL: https://github.com/apache/spark/pull/39671#issuecomment-1398356049 @dongjoon-hyun Hmm... do you remember SPARK-40846? When we upgraded from 8u345 to 8u352 for GA testing, there were some time zone issues that needed to be solved by changing the code, so I

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39675: [MINOR][DOCS] Update the doc of arrow & kubernetes

2023-01-20 Thread GitBox
dongjoon-hyun commented on code in PR #39675: URL: https://github.com/apache/spark/pull/39675#discussion_r1082534577 ## docs/running-on-kubernetes.md: ## @@ -34,13 +34,13 @@ Please see [Spark Security](security.html) and the specific security sections in Images built from

[GitHub] [spark] LuciferYang commented on pull request #39671: [SPARK-40303][DOCS] Deprecate old Java 8 versions prior to 8u362

2023-01-20 Thread GitBox
LuciferYang commented on PR #39671: URL: https://github.com/apache/spark/pull/39671#issuecomment-1398388671 Ok, plenty of time. I am fine to make this change

[GitHub] [spark] dtenedor commented on pull request #39678: [SPARK-16484][SQL] Add HyperLogLogPlusPlus sketch generator/evaluator/aggregator

2023-01-20 Thread via GitHub
dtenedor commented on PR #39678: URL: https://github.com/apache/spark/pull/39678#issuecomment-1399101487 Hi @RyanBerti, after a few initial conversations about this proposal, we wanted to express some questions and opinions here for your consideration. In general, we wholeheartedly

[GitHub] [spark] zhengruifeng commented on pull request #39622: [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix `count(*)` and `count(col(*))`

2023-01-20 Thread via GitHub
zhengruifeng commented on PR #39622: URL: https://github.com/apache/spark/pull/39622#issuecomment-1399144510 merged into master, thank you @cloud-fan @HyukjinKwon @dongjoon-hyun !

[GitHub] [spark] mridulm commented on a diff in pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-20 Thread via GitHub
mridulm commented on code in PR #39654: URL: https://github.com/apache/spark/pull/39654#discussion_r1083189284 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -815,7 +815,7 @@ public MergeStatuses

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39638: [SPARK-42082][SPARK-41598][PYTHON][CONNECT] Introduce `PySparkValueError` and `PySparkTypeError`

2023-01-20 Thread via GitHub
zhengruifeng commented on code in PR #39638: URL: https://github.com/apache/spark/pull/39638#discussion_r1083224562 ## python/pyspark/sql/tests/test_functions.py: ## @@ -763,25 +798,55 @@ def test_higher_order_function_failures(self): from pyspark.sql.functions import

[GitHub] [spark] mridulm commented on pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-20 Thread via GitHub
mridulm commented on PR #39682: URL: https://github.com/apache/spark/pull/39682#issuecomment-1399193866 Thanks for clarifying - yeah, you have to use has/get and either Optional(value).isPresent(set) or null check for set

[GitHub] [spark] LuciferYang opened a new pull request, #39684: [SPARK-42140][CORE] Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper

2023-01-20 Thread via GitHub
LuciferYang opened a new pull request, #39684: URL: https://github.com/apache/spark/pull/39684 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] HyukjinKwon closed pull request #39299: [SPARK-41593][PYTHON][ML] Adding logging from executors

2023-01-20 Thread via GitHub
HyukjinKwon closed pull request #39299: [SPARK-41593][PYTHON][ML] Adding logging from executors URL: https://github.com/apache/spark/pull/39299

[GitHub] [spark] HyukjinKwon commented on pull request #39299: [SPARK-41593][PYTHON][ML] Adding logging from executors

2023-01-20 Thread via GitHub
HyukjinKwon commented on PR #39299: URL: https://github.com/apache/spark/pull/39299#issuecomment-1399197595 Merged to master.

[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-20 Thread via GitHub
LuciferYang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083259379 ## sql/core/src/test/scala/org/apache/spark/status/protobuf/sql/KVStoreProtobufSerializerSuite.scala: ## @@ -48,6 +48,43 @@ class KVStoreProtobufSerializerSuite

[GitHub] [spark] dongjoon-hyun closed pull request #39679: [SPARK-42137][CORE] Enable `spark.kryo.unsafe` by default

2023-01-20 Thread via GitHub
dongjoon-hyun closed pull request #39679: [SPARK-42137][CORE] Enable `spark.kryo.unsafe` by default URL: https://github.com/apache/spark/pull/39679

[GitHub] [spark] dongjoon-hyun commented on pull request #39679: [SPARK-42137][CORE] Enable `spark.kryo.unsafe` by default

2023-01-20 Thread via GitHub
dongjoon-hyun commented on PR #39679: URL: https://github.com/apache/spark/pull/39679#issuecomment-1399134868 Merged to master for Apache Spark 3.4.0. The one pyspark pipeline seems to be slow, but it's irrelevant to this PR and was verified here before.

[GitHub] [spark] zhengruifeng closed pull request #39622: [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix `count(*)` and `count(col(*))`

2023-01-20 Thread via GitHub
zhengruifeng closed pull request #39622: [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix `count(*)` and `count(col(*))` URL: https://github.com/apache/spark/pull/39622

[GitHub] [spark] huaxingao commented on pull request #39678: [SPARK-16484][SQL] Add HyperLogLogPlusPlus sketch generator/evaluator/aggregator

2023-01-20 Thread via GitHub
huaxingao commented on PR #39678: URL: https://github.com/apache/spark/pull/39678#issuecomment-1399184316 @RyanBerti Thanks for the great work! +1 for using Apache DataSketches library. I am wondering if we can use [Theta

[GitHub] [spark] HyukjinKwon commented on pull request #39677: [SPARK-42043][CONNECT][TEST][FOLLOWUP] Fix jar finding bug and use better env vars and time measurement

2023-01-20 Thread via GitHub
HyukjinKwon commented on PR #39677: URL: https://github.com/apache/spark/pull/39677#issuecomment-1399194601 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #39677: [SPARK-42043][CONNECT][TEST][FOLLOWUP] Fix jar finding bug and use better env vars and time measurement

2023-01-20 Thread via GitHub
HyukjinKwon closed pull request #39677: [SPARK-42043][CONNECT][TEST][FOLLOWUP] Fix jar finding bug and use better env vars and time measurement URL: https://github.com/apache/spark/pull/39677

[GitHub] [spark] gengliangwang commented on pull request #39685: [SPARK-42142][UI] Handle null string values in CachedQuantile/ExecutorSummary/PoolData

2023-01-20 Thread via GitHub
gengliangwang commented on PR #39685: URL: https://github.com/apache/spark/pull/39685#issuecomment-1399196312 cc @LuciferYang

[GitHub] [spark] gengliangwang opened a new pull request, #39685: [SPARK-42142][UI] Handle null string values in CachedQuantile/ExecutorSummary/PoolData

2023-01-20 Thread via GitHub
gengliangwang opened a new pull request, #39685: URL: https://github.com/apache/spark/pull/39685 ### What changes were proposed in this pull request? Similar to https://github.com/apache/spark/pull/39666, this PR handles null string values in

[GitHub] [spark] LuciferYang commented on pull request #39684: [SPARK-42140][CORE] Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper

2023-01-20 Thread via GitHub
LuciferYang commented on PR #39684: URL: https://github.com/apache/spark/pull/39684#issuecomment-1399196254 cc @gengliangwang

[GitHub] [spark] tedyu closed pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-20 Thread via GitHub
tedyu closed pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge URL: https://github.com/apache/spark/pull/39654

[GitHub] [spark] joveyuan-db opened a new pull request, #39681: [SPARK-18011] Fix SparkR NA date serialization

2023-01-20 Thread via GitHub
joveyuan-db opened a new pull request, #39681: URL: https://github.com/apache/spark/pull/39681 ### What changes were proposed in this pull request? This PR ensures that SparkR serializes `NA` dates as `"NA"` (string) to avoid an undefined length when deserializing in the JVM. ###

[GitHub] [spark] huaxingao commented on pull request #39676: [SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes

2023-01-20 Thread via GitHub
huaxingao commented on PR #39676: URL: https://github.com/apache/spark/pull/39676#issuecomment-1399156214 Merged to 3.3/master. Thanks for fixing this @peter-toth

[GitHub] [spark] huaxingao closed pull request #39676: [SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes

2023-01-20 Thread via GitHub
huaxingao closed pull request #39676: [SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes URL: https://github.com/apache/spark/pull/39676

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39638: [SPARK-42082][SPARK-41598][PYTHON][CONNECT] Introduce `PySparkValueError` and `PySparkTypeError`

2023-01-20 Thread via GitHub
HyukjinKwon commented on code in PR #39638: URL: https://github.com/apache/spark/pull/39638#discussion_r1083233072 ## python/pyspark/sql/tests/test_functions.py: ## @@ -763,25 +798,55 @@ def test_higher_order_function_failures(self): from pyspark.sql.functions import

[GitHub] [spark] mridulm commented on pull request #39644: [SPARK-41415][3.3] SASL Request Retries

2023-01-20 Thread via GitHub
mridulm commented on PR #39644: URL: https://github.com/apache/spark/pull/39644#issuecomment-1399165650 Thanks for the ping @dongjoon-hyun :) I will merge this to 3.3
