[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric
LuciferYang commented on code in PR #39682:
URL: https://github.com/apache/spark/pull/39682#discussion_r1083259379

## sql/core/src/test/scala/org/apache/spark/status/protobuf/sql/KVStoreProtobufSerializerSuite.scala:

@@ -48,6 +48,43 @@ class KVStoreProtobufSerializerSuite extends SparkFunSuite {
     assert(result.metricValues == input.metricValues)
   }

+  test("SQLExecutionUIData with null string fields") {

Review Comment:
[5bb20d9](https://github.com/apache/spark/pull/39682/commits/5bb20d90ce0e17a999423be3d6577b7835f1aaff) moved this into the previous test case with a new loop.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] gengliangwang commented on pull request #39686: [SPARK-42143][UI] Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo
gengliangwang commented on PR #39686:
URL: https://github.com/apache/spark/pull/39686#issuecomment-1399199547

cc @LuciferYang
[GitHub] [spark] gengliangwang opened a new pull request, #39686: [SPARK-42143][UI] Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo
gengliangwang opened a new pull request, #39686:
URL: https://github.com/apache/spark/pull/39686

### What changes were proposed in this pull request?
Similar to https://github.com/apache/spark/pull/39666, this PR handles null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo.

### Why are the changes needed?
Properly handle null string values in the protobuf serializer.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
New UTs
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39637: [SPARK-41777][PYSPARK][ML] Integration testing for TorchDistributor
HyukjinKwon commented on code in PR #39637:
URL: https://github.com/apache/spark/pull/39637#discussion_r1083256527

## dev/requirements.txt:

@@ -59,3 +59,7 @@ googleapis-common-protos==1.56.4
 mypy-protobuf==3.3.0
 googleapis-common-protos-stubs==2.2.0
 grpc-stubs==1.24.11
+
+# TorchDistributor dependencies
+torch==1.13.1
+torchvision==0.14.1

Review Comment:
You actually don't need to pin this version. This is only for the development environment, and I would prefer to let other people test it with the latest versions.
[GitHub] [spark] HyukjinKwon closed pull request #39637: [SPARK-41777][PYSPARK][ML] Integration testing for TorchDistributor
HyukjinKwon closed pull request #39637: [SPARK-41777][PYSPARK][ML] Integration testing for TorchDistributor
URL: https://github.com/apache/spark/pull/39637
[GitHub] [spark] HyukjinKwon commented on pull request #39637: [SPARK-41777][PYSPARK][ML] Integration testing for TorchDistributor
HyukjinKwon commented on PR #39637:
URL: https://github.com/apache/spark/pull/39637#issuecomment-1399198148

Merged to master.
[GitHub] [spark] HyukjinKwon closed pull request #39299: [SPARK-41593][PYTHON][ML] Adding logging from executors
HyukjinKwon closed pull request #39299: [SPARK-41593][PYTHON][ML] Adding logging from executors
URL: https://github.com/apache/spark/pull/39299
[GitHub] [spark] HyukjinKwon commented on pull request #39299: [SPARK-41593][PYTHON][ML] Adding logging from executors
HyukjinKwon commented on PR #39299:
URL: https://github.com/apache/spark/pull/39299#issuecomment-1399197595

Merged to master.
[GitHub] [spark] gengliangwang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric
gengliangwang commented on code in PR #39682:
URL: https://github.com/apache/spark/pull/39682#discussion_r1083255761

## sql/core/src/test/scala/org/apache/spark/status/protobuf/sql/KVStoreProtobufSerializerSuite.scala:

@@ -48,6 +48,43 @@ class KVStoreProtobufSerializerSuite extends SparkFunSuite {
     assert(result.metricValues == input.metricValues)
   }

+  test("SQLExecutionUIData with null string fields") {

Review Comment:
Why don't we just update the previous test case with a new loop for null values? Or we can abstract the code for checking whether two SQLExecutionUIData instances are the same.
[GitHub] [spark] gengliangwang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric
gengliangwang commented on code in PR #39682:
URL: https://github.com/apache/spark/pull/39682#discussion_r1083255631

## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLPlanMetricSerializer.scala:

@@ -19,18 +19,24 @@ package org.apache.spark.status.protobuf.sql

 import org.apache.spark.sql.execution.ui.SQLPlanMetric
 import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils._
+import org.apache.spark.util.Utils.weakIntern

 object SQLPlanMetricSerializer {

   def serialize(metric: SQLPlanMetric): StoreTypes.SQLPlanMetric = {
-    StoreTypes.SQLPlanMetric.newBuilder()
-      .setName(metric.name)
-      .setAccumulatorId(metric.accumulatorId)
-      .setMetricType(metric.metricType)
-      .build()
+    val builder = StoreTypes.SQLPlanMetric.newBuilder()
+    setStringField(metric.name, builder.setName)
+    builder.setAccumulatorId(metric.accumulatorId)
+    setStringField(metric.metricType, builder.setMetricType)
+    builder.build()
   }

   def deserialize(metrics: StoreTypes.SQLPlanMetric): SQLPlanMetric = {
-    SQLPlanMetric(metrics.getName, metrics.getAccumulatorId, metrics.getMetricType)
+    SQLPlanMetric(
+      name = getStringField(metrics.hasName, () => weakIntern(metrics.getName)),

Review Comment:
OK, you are adding weakIntern here.
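The diff above relies on `setStringField`/`getStringField` helpers from `org.apache.spark.status.protobuf.Utils`, which are not shown in this thread. A minimal self-contained Scala sketch of how such helpers plausibly behave (the object name and signatures here are assumptions for illustration, not the actual Spark source):

```scala
// Hypothetical stand-ins for the helpers referenced in the diff.
object NullSafeUtils {
  // Only invoke the builder setter when the value is non-null, because
  // generated protobuf builders throw NullPointerException on null input.
  def setStringField(input: String, f: String => Any): Unit = {
    if (input != null) f(input)
  }

  // Map protobuf's has/get pair back to a nullable String:
  // an unset optional field deserializes to null rather than "".
  def getStringField(condition: Boolean, result: () => String): String = {
    if (condition) result() else null
  }
}
```

With these helpers, a serializer never has to repeat the null check at every call site, and round-tripping preserves the distinction between a null field and an empty string.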
[GitHub] [spark] HyukjinKwon closed pull request #39628: [SPARK-40264][ML][DOCS] Supplement docstring in pyspark.ml.functions.predict_batch_udf
HyukjinKwon closed pull request #39628: [SPARK-40264][ML][DOCS] Supplement docstring in pyspark.ml.functions.predict_batch_udf
URL: https://github.com/apache/spark/pull/39628
[GitHub] [spark] HyukjinKwon commented on pull request #39628: [SPARK-40264][ML][DOCS] Supplement docstring in pyspark.ml.functions.predict_batch_udf
HyukjinKwon commented on PR #39628:
URL: https://github.com/apache/spark/pull/39628#issuecomment-1399197002

Merged to master.
[GitHub] [spark] gengliangwang commented on a diff in pull request #39683: [SPARK-42144][CORE][SQL] Handle null string values in StageDataWrapper/StreamBlockData/StreamingQueryData
gengliangwang commented on code in PR #39683:
URL: https://github.com/apache/spark/pull/39683#discussion_r1083255313

## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto:

@@ -495,9 +495,10 @@ message RDDOperationGraphWrapper {
 }

 message StreamingQueryData {
-  string name = 1;
+  optional string name = 1;
+  // converted from uuid; id should not be null

Review Comment:
Let's simply convert all the string fields. We can even have tests to check whether there is any string field which is not optional.
[GitHub] [spark] gengliangwang commented on pull request #39685: [SPARK-42142][UI] Handle null string values in CachedQuantile/ExecutorSummary/PoolData
gengliangwang commented on PR #39685:
URL: https://github.com/apache/spark/pull/39685#issuecomment-1399196312

cc @LuciferYang
[GitHub] [spark] gengliangwang opened a new pull request, #39685: [SPARK-42142][UI] Handle null string values in CachedQuantile/ExecutorSummary/PoolData
gengliangwang opened a new pull request, #39685:
URL: https://github.com/apache/spark/pull/39685

### What changes were proposed in this pull request?
Similar to https://github.com/apache/spark/pull/39666, this PR handles null string values in CachedQuantile/ExecutorSummary/PoolData.

### Why are the changes needed?
Properly handle null string values in the protobuf serializer.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
New UTs
[GitHub] [spark] LuciferYang commented on pull request #39684: [SPARK-42140][CORE] Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper
LuciferYang commented on PR #39684:
URL: https://github.com/apache/spark/pull/39684#issuecomment-1399196254

cc @gengliangwang
[GitHub] [spark] LuciferYang opened a new pull request, #39684: [SPARK-42140][CORE] Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper
LuciferYang opened a new pull request, #39684:
URL: https://github.com/apache/spark/pull/39684

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] HyukjinKwon closed pull request #39677: [SPARK-42043][CONNECT][TEST][FOLLOWUP] Fix jar finding bug and use better env vars and time measurement
HyukjinKwon closed pull request #39677: [SPARK-42043][CONNECT][TEST][FOLLOWUP] Fix jar finding bug and use better env vars and time measurement
URL: https://github.com/apache/spark/pull/39677
[GitHub] [spark] HyukjinKwon commented on pull request #39677: [SPARK-42043][CONNECT][TEST][FOLLOWUP] Fix jar finding bug and use better env vars and time measurement
HyukjinKwon commented on PR #39677:
URL: https://github.com/apache/spark/pull/39677#issuecomment-1399194601

Merged to master.
[GitHub] [spark] HyukjinKwon closed pull request #39550: [SPARK-42056][SQL][PROTOBUF] Add missing options for Protobuf functions
HyukjinKwon closed pull request #39550: [SPARK-42056][SQL][PROTOBUF] Add missing options for Protobuf functions
URL: https://github.com/apache/spark/pull/39550
[GitHub] [spark] HyukjinKwon commented on pull request #39550: [SPARK-42056][SQL][PROTOBUF] Add missing options for Protobuf functions
HyukjinKwon commented on PR #39550:
URL: https://github.com/apache/spark/pull/39550#issuecomment-1399194512

Merged to master.
[GitHub] [spark] mridulm commented on pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric
mridulm commented on PR #39682:
URL: https://github.com/apache/spark/pull/39682#issuecomment-1399193866

Thanks for clarifying - yeah, you have to use has/get, and for set either Optional(value).isPresent(set) or a null check.
[GitHub] [spark] gengliangwang closed pull request #39680: [SPARK-42138][UI] Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper
gengliangwang closed pull request #39680: [SPARK-42138][UI] Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper
URL: https://github.com/apache/spark/pull/39680
[GitHub] [spark] gengliangwang commented on pull request #39680: [SPARK-42138][UI] Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper
gengliangwang commented on PR #39680:
URL: https://github.com/apache/spark/pull/39680#issuecomment-1399192216

@dongjoon-hyun @LuciferYang Thanks for the review. Merging to master.
[GitHub] [spark] gengliangwang commented on a diff in pull request #39680: [SPARK-42138][UI] Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper
gengliangwang commented on code in PR #39680:
URL: https://github.com/apache/spark/pull/39680#discussion_r1083252155

## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto:

@@ -83,10 +83,10 @@ message TaskDataWrapper {
   int64 launch_time = 5;
   int64 result_fetch_start = 6;
   int64 duration = 7;
-  string executor_id = 8;
-  string host = 9;
-  string status = 10;
-  string task_locality = 11;
+  optional string executor_id = 8;

Review Comment:
Yeah, that's complicated.
[GitHub] [spark] huaxingao commented on pull request #39678: [SPARK-16484][SQL] Add HyperLogLogPlusPlus sketch generator/evaluator/aggregator
huaxingao commented on PR #39678:
URL: https://github.com/apache/spark/pull/39678#issuecomment-1399184316

@RyanBerti Thanks for the great work! +1 for using the Apache DataSketches library. I am wondering if we can use [Theta Sketches](https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html), because they support union, intersection, and difference, while HLL sketches only support union. Trino recently added support for using Apache DataSketches' Theta sketch to calculate distinct values, and Apache Iceberg defines Apache DataSketches' Theta as the common data sketch for keeping track of distinct values in a table.
[GitHub] [spark] LuciferYang commented on a diff in pull request #39680: [SPARK-42138][UI] Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper
LuciferYang commented on code in PR #39680:
URL: https://github.com/apache/spark/pull/39680#discussion_r1083244804

## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto:

@@ -83,10 +83,10 @@ message TaskDataWrapper {
   int64 launch_time = 5;
   int64 result_fetch_start = 6;
   int64 duration = 7;
-  string executor_id = 8;
-  string host = 9;
-  string status = 10;
-  string task_locality = 11;
+  optional string executor_id = 8;

Review Comment:
Have we considered defining a message as follows?

```
message NullableString {
  bool is_null = 1;
  string value = 2;
}
```

In this way, we can distinguish the strings that are originally optional, but we need to record one more state.
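To make the trade-off of the proposed wrapper concrete, here is a Scala sketch modeling it as a plain case class rather than generated protobuf code (the names `NullableString`/`NullableStringCodec` are illustrative only; the PRs in this thread ultimately used proto3 `optional` fields instead):

```scala
// Models the proposed wrapper message and the extra is_null state it carries.
case class NullableString(isNull: Boolean, value: String)

object NullableStringCodec {
  // Serialize: a null Scala string becomes an explicit is_null marker,
  // so "" and null stay distinguishable on the wire.
  def toWrapper(s: String): NullableString =
    if (s == null) NullableString(isNull = true, value = "")
    else NullableString(isNull = false, value = s)

  // Deserialize: restore null exactly when the marker is set.
  def fromWrapper(w: NullableString): String =
    if (w.isNull) null else w.value
}
```

A proto3 `optional` field achieves the same presence tracking via the generated `hasX()` accessor, without a nested message per string.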
[GitHub] [spark] LuciferYang commented on pull request #39683: [SPARK-42144][CORE][SQL] Handle null string values in StageDataWrapper/StreamBlockData/StreamingQueryData
LuciferYang commented on PR #39683:
URL: https://github.com/apache/spark/pull/39683#issuecomment-1399179648

cc @gengliangwang
[GitHub] [spark] LuciferYang opened a new pull request, #39683: [SPARK-42144][CORE][SQL] Handle null string values in StageDataWrapper/StreamBlockData/StreamingQueryData
LuciferYang opened a new pull request, #39683:
URL: https://github.com/apache/spark/pull/39683

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] LuciferYang commented on pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric
LuciferYang commented on PR #39682:
URL: https://github.com/apache/spark/pull/39682#issuecomment-1399169469

And there is another PR doing similar work: https://github.com/apache/spark/pull/39682 @mridulm
[GitHub] [spark] LuciferYang commented on pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric
LuciferYang commented on PR #39682:
URL: https://github.com/apache/spark/pull/39682#issuecomment-1399169247

For the String type, `someProtoBufBuilder.setString(null)` will throw an NPE, so in https://github.com/apache/spark/pull/39666 @gengliangwang proposed:

- checking the nullability of all string fields to avoid NPEs
- using `optional string` in the protobuf definition of all string fields. If the deserialized result is None, then set the string field to null.
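The NPE described above can be reproduced without protobuf: generated Java builders reject null setter arguments. A self-contained Scala sketch, where `FakeBuilder` is a hypothetical stand-in mimicking that behavior:

```scala
// Mimics a generated protobuf builder: setters throw NullPointerException
// on null input, and an unset string defaults to "".
class FakeBuilder {
  private var name: String = ""
  def setName(v: String): FakeBuilder = {
    if (v == null) throw new NullPointerException("name must not be null")
    name = v
    this
  }
  def getName: String = name
}

// The guard pattern the PRs apply: only call the setter when non-null.
def setSafely(b: FakeBuilder, v: String): FakeBuilder = {
  Option(v).foreach(b.setName)
  b
}
```

The guard alone leaves null and "" indistinguishable after a round trip, which is why the PRs also switch the fields to `optional string` and use the `has` accessor on deserialization.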
[GitHub] [spark] mridulm commented on pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric
mridulm commented on PR #39682:
URL: https://github.com/apache/spark/pull/39682#issuecomment-1399168603

I missed the discussion in #39666 ... can you tell me why this is an issue only for protobuf, @LuciferYang? If the field is null, it will be null for the regular case as well, right?
[GitHub] [spark] mridulm commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge
mridulm commented on PR #39654:
URL: https://github.com/apache/spark/pull/39654#issuecomment-1399168240

To clarify @tedyu, I am fine with including the exception message (ex.getMessage) in the warn message - not the entire stack trace - so there is value in the PR!
[GitHub] [spark] mridulm closed pull request #39645: [SPARK-41415][3.2] SASL Request Retries
mridulm closed pull request #39645: [SPARK-41415][3.2] SASL Request Retries
URL: https://github.com/apache/spark/pull/39645
[GitHub] [spark] mridulm commented on pull request #39645: [SPARK-41415][3.2] SASL Request Retries
mridulm commented on PR #39645:
URL: https://github.com/apache/spark/pull/39645#issuecomment-1399167106

Merged to 3.2. Thanks for fixing this, @akpatnam25!
[GitHub] [spark] LuciferYang commented on pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric
LuciferYang commented on PR #39682:
URL: https://github.com/apache/spark/pull/39682#issuecomment-1399167087

cc @gengliangwang
[GitHub] [spark] mridulm commented on pull request #39644: [SPARK-41415][3.3] SASL Request Retries
mridulm commented on PR #39644: URL: https://github.com/apache/spark/pull/39644#issuecomment-1399166933 Closing, as the PR has been merged. Thanks for backporting this @akpatnam25 ! And thanks for all the help @dongjoon-hyun :-) (you are literally responding in seconds !)
[GitHub] [spark] mridulm closed pull request #39644: [SPARK-41415][3.3] SASL Request Retries
mridulm closed pull request #39644: [SPARK-41415][3.3] SASL Request Retries URL: https://github.com/apache/spark/pull/39644
[GitHub] [spark] mridulm commented on a diff in pull request #39645: [SPARK-41415][3.2] SASL Request Retries
mridulm commented on code in PR #39645: URL: https://github.com/apache/spark/pull/39645#discussion_r1083234871

## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java:

@@ -187,13 +196,29 @@ private synchronized void initiateRetry() {
   /**
    * Returns true if we should retry due a block transfer failure. We will retry if and only if
-   * the exception was an IOException and we haven't retried 'maxRetries' times already.
+   * the exception was an IOException or SaslTimeoutException and we haven't retried
+   * 'maxRetries' times already.
    */
   private synchronized boolean shouldRetry(Throwable e) {
     boolean isIOException = e instanceof IOException
-      || (e.getCause() != null && e.getCause() instanceof IOException);
+      || e.getCause() instanceof IOException;

Review Comment: This looks like the main change from 3.3/master related to this diff. This is fine.
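The simplification in this diff relies on a Java language detail worth spelling out: `instanceof` evaluates to `false` when its left operand is `null`, so the old `e.getCause() != null &&` guard was redundant. A minimal, self-contained sketch of the same check (not Spark code; the class name `RetryCheck` is made up for illustration):

```java
import java.io.IOException;

public class RetryCheck {
    // Same shape as the simplified condition in the diff above.
    // getCause() may return null, but `null instanceof IOException` is
    // simply false, so no explicit null check is needed.
    static boolean isIOException(Throwable e) {
        return e instanceof IOException
            || e.getCause() instanceof IOException;
    }

    public static void main(String[] args) {
        System.out.println(isIOException(new IOException("connection reset")));      // true: direct IOException
        System.out.println(isIOException(new RuntimeException("no cause")));         // false: cause is null
        System.out.println(isIOException(new RuntimeException(new IOException())));  // true: wrapped IOException
    }
}
```

The two forms are behaviorally identical; the shorter one is idiomatic Java.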
[GitHub] [spark] dongjoon-hyun commented on pull request #39644: [SPARK-41415][3.3] SASL Request Retries
dongjoon-hyun commented on PR #39644: URL: https://github.com/apache/spark/pull/39644#issuecomment-1399166157 Yes, only the `master` branch is closed automatically.
[GitHub] [spark] mridulm commented on pull request #39644: [SPARK-41415][3.3] SASL Request Retries
mridulm commented on PR #39644: URL: https://github.com/apache/spark/pull/39644#issuecomment-1399166043 Merged to branch-3.3. This does not get automatically closed @dongjoon-hyun ?
[GitHub] [spark] dongjoon-hyun commented on pull request #39644: [SPARK-41415][3.3] SASL Request Retries
dongjoon-hyun commented on PR #39644: URL: https://github.com/apache/spark/pull/39644#issuecomment-1399165752 Thank you!
[GitHub] [spark] mridulm commented on pull request #39644: [SPARK-41415][3.3] SASL Request Retries
mridulm commented on PR #39644: URL: https://github.com/apache/spark/pull/39644#issuecomment-1399165650 Thanks for the ping @dongjoon-hyun :) I will merge this to 3.3
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39638: [SPARK-42082][SPARK-41598][PYTHON][CONNECT] Introduce `PySparkValueError` and `PySparkTypeError`
HyukjinKwon commented on code in PR #39638: URL: https://github.com/apache/spark/pull/39638#discussion_r1083233072

## python/pyspark/sql/tests/test_functions.py:

@@ -763,25 +798,55 @@ def test_higher_order_function_failures(self):
         from pyspark.sql.functions import col, transform

         # Should fail with varargs
-        with self.assertRaises(ValueError):
+        with self.assertRaises(PySparkException) as pe:

Review Comment: +1
[GitHub] [spark] LuciferYang opened a new pull request, #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SQLPlanMetric
LuciferYang opened a new pull request, #39682: URL: https://github.com/apache/spark/pull/39682

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] huaxingao commented on pull request #39676: [SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes
huaxingao commented on PR #39676: URL: https://github.com/apache/spark/pull/39676#issuecomment-1399156214 Merged to 3.3/master. Thanks for fixing this @peter-toth
[GitHub] [spark] huaxingao closed pull request #39676: [SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes
huaxingao closed pull request #39676: [SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes URL: https://github.com/apache/spark/pull/39676
[GitHub] [spark] joveyuan-db opened a new pull request, #39681: [SPARK-18011] Fix SparkR NA date serialization
joveyuan-db opened a new pull request, #39681: URL: https://github.com/apache/spark/pull/39681

### What changes were proposed in this pull request?
This PR ensures that SparkR serializes `NA` dates as `"NA"` (string) to avoid an undefined length when deserializing in the JVM.

### Why are the changes needed?
Currently, SparkR assumes that a `NegativeArraySizeException` when deserializing dates and timestamps represents an `NA`. However, this handling can be made more robust by ensuring that serialization on the R side always provides a valid string length (note that `nchar(as.character(as.Date(NA)))` is `NA`).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
[GitHub] [spark] zhengruifeng commented on a diff in pull request #39638: [SPARK-42082][SPARK-41598][PYTHON][CONNECT] Introduce `PySparkValueError` and `PySparkTypeError`
zhengruifeng commented on code in PR #39638: URL: https://github.com/apache/spark/pull/39638#discussion_r1083224562

## python/pyspark/sql/tests/test_functions.py:

@@ -763,25 +798,55 @@ def test_higher_order_function_failures(self):
         from pyspark.sql.functions import col, transform

         # Should fail with varargs
-        with self.assertRaises(ValueError):
+        with self.assertRaises(PySparkException) as pe:

Review Comment: should we use the subclasses PySparkValueError/PySparkTypeError in those tests?
[GitHub] [spark] LuciferYang commented on pull request #39680: [SPARK-42138][UI] Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper
LuciferYang commented on PR #39680: URL: https://github.com/apache/spark/pull/39680#issuecomment-1399145487

@gengliangwang @dongjoon-hyun I added some new tasks and will start work later. @panbingkun @techaddict feel free to take any of them and leave a comment that you are working on it:
- SPARK-42139: Handle null string values in SQLExecutionUIData/SQLPlanMetric
- SPARK-42140: Handle null string values in ApplicationEnvironmentInfo/RuntimeInfo/PairStrings/ExecutorResourceRequest/TaskResourceRequest
- SPARK-42141: Handle null string values in ApplicationInfo/ApplicationAttemptInfo
- SPARK-42142: Handle null string values in CachedQuantile/ExecutorSummary/PoolData
- SPARK-42143: Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo
- SPARK-42144: Handle null string values in StageData/StreamBlockData/StreamingQueryData
- SPARK-42145: Handle null string values in SparkPlanGraphNode/SparkPlanGraphClusterWrapper

Some may be further consolidated into one task. And I re-opened [SPARK-41677](https://issues.apache.org/jira/browse/SPARK-41677).
[GitHub] [spark] zhengruifeng commented on pull request #39622: [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix `count(*)` and `count(col(*))`
zhengruifeng commented on PR #39622: URL: https://github.com/apache/spark/pull/39622#issuecomment-1399144510 merged into master, thank you @cloud-fan @HyukjinKwon @dongjoon-hyun !
[GitHub] [spark] zhengruifeng closed pull request #39622: [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix `count(*)` and `count(col(*))`
zhengruifeng closed pull request #39622: [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix `count(*)` and `count(col(*))` URL: https://github.com/apache/spark/pull/39622
[GitHub] [spark] dongjoon-hyun closed pull request #39679: [SPARK-42137][CORE] Enable `spark.kryo.unsafe` by default
dongjoon-hyun closed pull request #39679: [SPARK-42137][CORE] Enable `spark.kryo.unsafe` by default URL: https://github.com/apache/spark/pull/39679
[GitHub] [spark] dongjoon-hyun commented on pull request #39679: [SPARK-42137][CORE] Enable `spark.kryo.unsafe` by default
dongjoon-hyun commented on PR #39679: URL: https://github.com/apache/spark/pull/39679#issuecomment-1399134868 Merged to master for Apache Spark 3.4.0. One PySpark pipeline seems slow, but it's irrelevant to this PR and was verified here before.
[GitHub] [spark] dongjoon-hyun commented on pull request #39679: [SPARK-42137][CORE] Enable `spark.kryo.unsafe` by default
dongjoon-hyun commented on PR #39679: URL: https://github.com/apache/spark/pull/39679#issuecomment-1399118708 Thank you so much, @mridulm !
[GitHub] [spark] tedyu closed pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge
tedyu closed pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge URL: https://github.com/apache/spark/pull/39654
[GitHub] [spark] mridulm commented on a diff in pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge
mridulm commented on code in PR #39654: URL: https://github.com/apache/spark/pull/39654#discussion_r1083189284

## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:

@@ -815,7 +815,7 @@ public MergeStatuses finalizeShuffleMerge(FinalizeShuffleMerge msg) {
       } catch (IOException ioe) {
         logger.warn("{} attempt {} shuffle {} shuffleMerge {}: exception while " +
           "finalizing shuffle partition {}", msg.appId, msg.appAttemptId, msg.shuffleId,
-          msg.shuffleMergeId, partition.reduceId);
+          msg.shuffleMergeId, partition.reduceId, ioe);

Review Comment: Including the stack trace in this warn message is not helpful; it is not actionable by users.
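For context on what the proposed change does: Spark logs through SLF4J, where a Throwable passed after the `{}` placeholder arguments is rendered with its full stack trace rather than substituted into the message. As a dependency-free illustration of the same message-plus-stack-trace convention, here is a hedged sketch using the JDK's `java.util.logging` (not Spark's logging stack; the class name is made up):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

public class WarnWithThrowable {
    // Logs a warning with `thrown` as the exception argument and returns
    // the formatted output, so we can inspect what actually gets written.
    static String logAndCapture(Throwable thrown) {
        Logger logger = Logger.getLogger("demo");
        logger.setUseParentHandlers(false);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        StreamHandler handler = new StreamHandler(new PrintStream(buf), new SimpleFormatter());
        logger.addHandler(handler);
        // Passing the Throwable makes the formatter append its stack trace;
        // without it, only the message text would appear.
        logger.log(Level.WARNING, "exception while finalizing shuffle partition", thrown);
        handler.flush();
        logger.removeHandler(handler);
        return buf.toString();
    }

    public static void main(String[] args) {
        String out = logAndCapture(new IOException("disk full"));
        System.out.println(out.contains("java.io.IOException: disk full")); // stack trace is included
    }
}
```

This is the trade-off the review comment is weighing: the extra argument buys a stack trace in the logs, at the cost of noise that operators may not be able to act on.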
[GitHub] [spark] dtenedor commented on pull request #39678: [SPARK-16484][SQL] Add HyperLogLogPlusPlus sketch generator/evaluator/aggregator
dtenedor commented on PR #39678: URL: https://github.com/apache/spark/pull/39678#issuecomment-1399101487

Hi @RyanBerti, after a few initial conversations about this proposal, we wanted to express some questions and opinions here for your consideration. In general, we wholeheartedly appreciate your time, energy, and attention in working on this collection of features in the interest of expanding the functionality and usefulness of Apache Spark (as well as increasing consistency with other engines).

* We feel strongly that the new function names should not begin with "approx_count_distinct", in order to differentiate the new sketch-based approach from the existing approximate distinct count implementation. In particular, the sketch-based approach exposes intermediate buffers suitable for saving into storage and then retrieving and merging later, whereas the existing "approx_count_distinct" aggregate function consumes input values and then returns the result immediately (without exposing any intermediate sketch to the user). This is essentially a function naming detail, and we can consider a prefix like "HLL_" for the new functions in order to accomplish this objective. This change should not consume too much time to bring into effect.
* After examining the existing [HyperLogLogPlusPlusHelper utility](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala), and comparing it to other formats for generating and merging sketches for similar purposes, we would like to express a strong preference for using the Apache DataSketches library instead (https://datasketches.apache.org/). This is a dedicated Apache project with more than a decade of investment behind it, supporting very high performance for multiple types of sketches (such as KLL sketches for computing approximate quantiles, for example) and with client library support in C++, Java, and Python. We can use the Java implementation to integrate with Apache Spark using Scala/Java integration. This also aligns with the principles of Apache Spark as an open source project.

Best, Daniel

@czarifis @mkaravel @gatorsmile @entong
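To illustrate the mergeability property the comment emphasizes (intermediate sketch buffers can be stored, retrieved, and combined later), here is a toy HyperLogLog in Java. This is an illustrative sketch only, not the Apache DataSketches implementation or Spark's `HyperLogLogPlusPlusHelper`; the class name and constants are made up. The key point: merging two sketches is an element-wise max of registers, and by construction that yields exactly the sketch of the combined stream.

```java
import java.util.Arrays;

public class TinyHll {
    static final int P = 10;                // 2^P = 1024 registers
    final byte[] regs = new byte[1 << P];

    // splitmix64 finalizer as a stand-in 64-bit hash
    static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    void update(long item) {
        long h = hash(item);
        int idx = (int) (h >>> (64 - P));   // top P bits pick a register
        // rank = position of first 1-bit in the remaining bits (sentinel bounds it)
        int rank = Long.numberOfLeadingZeros((h << P) | (1L << (P - 1))) + 1;
        if (rank > regs[idx]) regs[idx] = (byte) rank;
    }

    // Union of two sketches: element-wise max of registers.
    void merge(TinyHll other) {
        for (int i = 0; i < regs.length; i++) {
            regs[i] = (byte) Math.max(regs[i], other.regs[i]);
        }
    }

    // Raw HLL estimate (no small-range or bias correction).
    double estimate() {
        double sum = 0;
        for (byte r : regs) sum += Math.pow(2.0, -r);
        int m = regs.length;
        return (0.7213 / (1 + 1.079 / m)) * m * m / sum;
    }

    public static void main(String[] args) {
        TinyHll a = new TinyHll(), b = new TinyHll(), c = new TinyHll();
        for (long i = 0; i < 6000; i++)     { a.update(i); c.update(i); }
        for (long i = 4000; i < 10000; i++) { b.update(i); c.update(i); }
        a.merge(b);                                        // a now summarizes both streams
        System.out.println(Arrays.equals(a.regs, c.regs)); // merged == sketch of the union: true
        System.out.printf("estimate for 10000 distinct items: %.0f%n", a.estimate());
    }
}
```

The raw estimate above omits HyperLogLog++'s small-range and bias corrections, which is part of why a battle-tested library like DataSketches is preferable in practice.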
[GitHub] [spark] dongjoon-hyun closed pull request #39675: [MINOR][DOCS] Remove Python 3.9 and Apache Arrow warning comment
dongjoon-hyun closed pull request #39675: [MINOR][DOCS] Remove Python 3.9 and Apache Arrow warning comment URL: https://github.com/apache/spark/pull/39675
[GitHub] [spark] panbingkun commented on pull request #39675: [MINOR][DOCS] Update the doc of arrow & kubernetes
panbingkun commented on PR #39675: URL: https://github.com/apache/spark/pull/39675#issuecomment-1399080548 > it looks like controversial because K8s v1.25 is not available yet and the proposed one is still beta until v1.24. Ok, let me narrow down to Arrow first.
[GitHub] [spark] gengliangwang commented on pull request #39680: [SPARK-42138][UI] Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper
gengliangwang commented on PR #39680: URL: https://github.com/apache/spark/pull/39680#issuecomment-1399073949 cc @LuciferYang @panbingkun @techaddict
[GitHub] [spark] gengliangwang opened a new pull request, #39680: [SPARK-42138][UI] Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper
gengliangwang opened a new pull request, #39680: URL: https://github.com/apache/spark/pull/39680

### What changes were proposed in this pull request?
Similar to https://github.com/apache/spark/pull/39666, this PR handles null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper

### Why are the changes needed?
Properly handles null string values in the protobuf serializer.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
New UTs
[GitHub] [spark] xinrong-meng commented on a diff in pull request #39585: [SPARK-42124][PYTHON][CONNECT] Scalar Inline Python UDF in Spark Connect
xinrong-meng commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1083112645

## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto:

@@ -217,6 +218,28 @@ message Expression {
     bool is_user_defined_function = 4;
   }

+  message ScalarInlineUserDefinedFunction {

Review Comment: Sounds good. How about the new ScalarInlineUserDefinedFunction as shown below? We will then serialize the existing PythonUDF message into the `language_specific` field. Please let me know if that's what we want. @hvanhovell @grundprinzip

```
message ScalarInlineUserDefinedFunction {
  string function_name = 1;
  bool deterministic = 2;
  repeated Expression arguments = 3;
  FunctionLanguage language = 4;
  bytes language_specific = 5;
}
```
[GitHub] [spark] dongjoon-hyun commented on pull request #38376: [SPARK-40817][K8S] `spark.files` should preserve remote files
dongjoon-hyun commented on PR #38376: URL: https://github.com/apache/spark/pull/38376#issuecomment-1399039783 All PRs (master/3.3/3.2) are merged. I added you to the Apache Spark contributor group and assigned SPARK-40817 to you. Welcome to the Apache Spark community, @antonipp !
[GitHub] [spark] dongjoon-hyun closed pull request #39670: [SPARK-40817][K8S][3.2] `spark.files` should preserve remote files
dongjoon-hyun closed pull request #39670: [SPARK-40817][K8S][3.2] `spark.files` should preserve remote files URL: https://github.com/apache/spark/pull/39670
[GitHub] [spark] dongjoon-hyun closed pull request #39669: [SPARK-40817][K8S][3.3] `spark.files` should preserve remote files
dongjoon-hyun closed pull request #39669: [SPARK-40817][K8S][3.3] `spark.files` should preserve remote files URL: https://github.com/apache/spark/pull/39669
[GitHub] [spark] dongjoon-hyun opened a new pull request, #39679: [SPARK-42137][CORE] Enable `spark.kryo.unsafe` by default
dongjoon-hyun opened a new pull request, #39679: URL: https://github.com/apache/spark/pull/39679

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] rangadi commented on pull request #39550: [SPARK-42056][SQL][PROTOBUF] Add missing options for Protobuf functions
rangadi commented on PR #39550: URL: https://github.com/apache/spark/pull/39550#issuecomment-1399007794 @dongjoon-hyun @HyukjinKwon could you approve this and merge? It is mainly a documentation change, plus passing 'options' from Python to Scala.
[GitHub] [spark] rangadi commented on a diff in pull request #39550: [SPARK-42056][SQL][PROTOBUF] Add missing options for Protobuf functions
rangadi commented on code in PR #39550: URL: https://github.com/apache/spark/pull/39550#discussion_r1083089318

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala:

@@ -26,7 +26,8 @@ import org.apache.spark.sql.types.{BinaryType, DataType}
 private[protobuf] case class CatalystDataToProtobuf(
     child: Expression,
     messageName: String,
-    descFilePath: Option[String] = None)
+    descFilePath: Option[String] = None,
+    options: Map[String, String] = Map.empty)

Review Comment: Agree. Options like 'JSON format' will be in open source (not being worked on yet). I think schema-registry support is the only one missing. It is a bit tricky to get in.
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #39550: [SPARK-42056][SQL][PROTOBUF] Add missing options for Protobuf functions
SandishKumarHN commented on code in PR #39550: URL: https://github.com/apache/spark/pull/39550#discussion_r1083087017

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala:

@@ -26,7 +26,8 @@ import org.apache.spark.sql.types.{BinaryType, DataType}
 private[protobuf] case class CatalystDataToProtobuf(
     child: Expression,
     messageName: String,
-    descFilePath: Option[String] = None)
+    descFilePath: Option[String] = None,
+    options: Map[String, String] = Map.empty)

Review Comment: These features would be useful to have in spark-protobuf.
[GitHub] [spark] gengliangwang closed pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary
gengliangwang closed pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary URL: https://github.com/apache/spark/pull/39666
[GitHub] [spark] gengliangwang commented on pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary
gengliangwang commented on PR #39666: URL: https://github.com/apache/spark/pull/39666#issuecomment-1398990949 @LuciferYang @dongjoon-hyun Thanks for the review. Merging to master
[GitHub] [spark] rangadi commented on a diff in pull request #39550: [SPARK-42056][SQL][PROTOBUF] Add missing options for Protobuf functions
rangadi commented on code in PR #39550: URL: https://github.com/apache/spark/pull/39550#discussion_r1083073269

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala:

@@ -26,7 +26,8 @@ import org.apache.spark.sql.types.{BinaryType, DataType}
 private[protobuf] case class CatalystDataToProtobuf(
     child: Expression,
     messageName: String,
-    descFilePath: Option[String] = None)
+    descFilePath: Option[String] = None,
+    options: Map[String, String] = Map.empty)

Review Comment: We will likely add more options in the future. It is currently used in the Databricks distribution for a couple of things:
- Enable support for shaded Java classes (Databricks disables it by default due to the complexity of shading)
- Schema-registry related options

I can imagine future options like supporting JSON serialization (rather than binary bytes).
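The forward-compatibility pattern under discussion — a new `options` map defaulted to empty so existing call sites keep compiling while future flags can be threaded through — can be sketched with a stand-in case class (illustrative only; the real `CatalystDataToProtobuf` is a Catalyst expression with more fields):

```scala
// Illustrative stand-in for the pattern in the quoted diff, not Spark's class.
final case class ProtobufWriterConfig(
    messageName: String,
    descFilePath: Option[String] = None,
    options: Map[String, String] = Map.empty)

object ProtobufWriterConfig {
  def demo(): Unit = {
    // Old call sites compile unchanged because the new parameter has a default.
    val legacy = ProtobufWriterConfig("org.spark.examples.protobuf.ExampleEvent")
    // New call sites can opt into future behaviors via free-form options
    // (the "mode" key here is hypothetical).
    val tuned = legacy.copy(options = Map("mode" -> "json"))
    assert(legacy.options.isEmpty)
    assert(tuned.options("mode") == "json")
  }
}
```

Because the map is free-form, adding a new recognized key later is a source- and binary-compatible change, which is why the parameter is useful even before any key is consumed.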
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #39550: [SPARK-42056][SQL][PROTOBUF] Add missing options for Protobuf functions
SandishKumarHN commented on code in PR #39550: URL: https://github.com/apache/spark/pull/39550#discussion_r1083066809

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala:

@@ -26,7 +26,8 @@ import org.apache.spark.sql.types.{BinaryType, DataType}
 private[protobuf] case class CatalystDataToProtobuf(
     child: Expression,
     messageName: String,
-    descFilePath: Option[String] = None)
+    descFilePath: Option[String] = None,
+    options: Map[String, String] = Map.empty)

Review Comment: @rangadi I don't see `options` being used anywhere in the process of converting CatalystData to Protobuf. Do we need it?
[GitHub] [spark] dtenedor commented on a diff in pull request #39657: [SPARK-42123][SQL] Include column default values in DESCRIBE output
dtenedor commented on code in PR #39657: URL: https://github.com/apache/spark/pull/39657#discussion_r1083065750

## sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:

@@ -645,6 +645,16 @@ case class DescribeTableCommand(
     } else if (isExtended) {
       describeFormattedTableInfo(metadata, result)
     }
+
+    // If any columns have default values, append them to the result.
+    if (metadata.schema.fields.exists(_.metadata.contains(
+        ResolveDefaultColumns.CURRENT_DEFAULT_COLUMN_METADATA_KEY))) {
+      append(result, "", "", "")
+      append(result, "# Column Default Information", "", "")
+      metadata.schema.foreach { column =>
+        column.getCurrentDefaultValue().map(append(result, column.name, "", _))

Review Comment: good suggestion, this is done!

## sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeTableSuite.scala:

@@ -225,4 +246,30 @@ class DescribeTableSuite extends DescribeTableSuiteBase with CommandSuiteBase {
       Row("Partition Provider", "Catalog", "")))
   }
 }
+
+  test("DESCRIBE TABLE EXTENDED of a table with a default column value") {

Review Comment: yes, I have added this testing as well.
[GitHub] [spark] dongjoon-hyun commented on pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary
dongjoon-hyun commented on PR #39666: URL: https://github.com/apache/spark/pull/39666#issuecomment-1398943624 Oh, sure.
[GitHub] [spark] srielau commented on a diff in pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers
srielau commented on code in PR #38419: URL: https://github.com/apache/spark/pull/38419#discussion_r1083050679

## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala:

@@ -937,4 +937,135 @@ class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
    checkEvaluation(WidthBucket(5.35, 0.024, Double.NegativeInfinity, 5L), null)
    checkEvaluation(WidthBucket(5.35, 0.024, Double.PositiveInfinity, 5L), null)
  }

  test("truncate number") {
    def evaluateForEachItem(from: Int, to: Int, increment: Int)(f: (Int, Int) => Unit): Unit = {
      Range(from, to, increment).zipWithIndex.foreach { case (scale, i) =>
        f(scale, i)
      }
    }

    // Decimal
    val decimalInput = BigDecimal("123456789123456789.123456789")
    val truncDecimalResults = Seq(
      "123456789123456789.123456789", "123456789123456789.123456789",
      "123456789123456789.12345678", "123456789123456789.1234567",
      "123456789123456789.123456", "123456789123456789.12345",
      "123456789123456789.1234", "123456789123456789.123",
      "123456789123456789.12", "123456789123456789.1",
      "123456789123456789", "123456789123456780", "123456789123456700",
      "123456789123456000", "12345678912345", "12345678912340",
      "12345678912300", "12345678912000", "1234567891", "1234567890",
      "1234567800", "1234567000", "123456", "123450", "123400", "123000",
      "12", "10", "0", "0")

    evaluateForEachItem(10, -20, -1)((scale, index) => {
      checkEvaluation(TruncNumber(decimalInput, scale), BigDecimal(truncDecimalResults(index)))
      checkEvaluation(TruncNumber(-decimalInput, scale), -BigDecimal(truncDecimalResults(index)))
    })

    // Double
    val negativeDouble = -12345678.1234
    val neg_DoubleResults: Seq[Double] =
      Seq(-12345678.0, -12345670.0, -12345600.0, -12345000.0, -1234.0, -1230.0,
        -1200.0, -1000.0, 0.0, 0.0)
    val positiveDouble = 12345678.1234
    val pos_DoubleResults: Seq[Double] =
      Seq(12345678.0, 12345670.0, 12345600.0, 12345000.0, 1234.0, 1230.0,
        1200.0, 1000.0, 0.0, 0.0)

    evaluateForEachItem(0, -10, -1)((scale, index) => {
      checkEvaluation(TruncNumber(positiveDouble, scale), pos_DoubleResults(index))
      checkEvaluation(TruncNumber(negativeDouble, scale), neg_DoubleResults(index))
    })

    // Float
    val negativeFloat = -12345678.123f
    val neg_floatResults: Seq[Float] =
      Seq(-12345678f, -12345670f, -12345600f, -12345000f, -1234f, -1230f, -1200f,
        -1000f, 0.0f, 0.0f)
    val positiveFloat = 12345678.123f
    val pos_FloatResults: Seq[Float] =
      Seq(12345678f, 12345670f, 12345600f, 12345000f, 1234f, 1230f, 1200f, 1000f,
        0.0f, 0.0f)

    evaluateForEachItem(0, -10, -1)((scale, index) => {
      checkEvaluation(TruncNumber(positiveFloat, scale), pos_FloatResults(index))
      checkEvaluation(TruncNumber(negativeFloat, scale), neg_floatResults(index))
    })

    // Long
    val longInput = 123456789L
    val longResults: Seq[Long] =
      Seq(123456789L, 123456789L, 123456789L, 123456789L, 123456780L, 123456700L, 123456000L,
        12345L, 12340L, 12300L, 12000L, 1L, 0, 0)
    evaluateForEachItem(3, -11, -1)((scale, index) => {
      checkEvaluation(TruncNumber(longInput, scale), longResults(index))
      checkEvaluation(TruncNumber(-longInput, scale), -longResults(index))
    })

    // Int
    val intInput = 123456789
    val intResults: Seq[Int] =
      Seq(123456789, 123456789, 123456789, 123456789, 123456780, 123456700, 123456000, 12345,
        12340, 12300, 12000, 1, 0, 0)
    evaluateForEachItem(3, -11, -1)((scale, index) => {
      checkEvaluation(TruncNumber(intInput, scale), intResults(index))
      checkEvaluation(TruncNumber(-intInput, scale), -intResults(index))
    })

    // Short
    val shortInput: Short = 32767
    val shortResults: Seq[Short] =
      Seq(32767, 32767, 32767, 32767, 32760, 32700, 32000, 3, 0, 0)
    evaluateForEachItem(3, -6, -1)((scale, index) => {
      checkEvaluation(TruncNumber(shortInput, scale), shortResults(index))
      checkEvaluation(TruncNumber(-shortInput, scale), -shortResults(index))
    })

    // Byte
    val byteInput: Byte = 127
    val byteResults: Seq[Byte] =
      Seq(127, 127, 127, 127, 120, 100, 0, 0)
    evaluateForEachItem(3, -4, -1)((scale, index) => {
      checkEvaluation(TruncNumber(byteInput, scale), byteResults(index))
      checkEvaluation(TruncNumber(-byteInput, scale), -byteResults(index))
    })
  }
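The expected values in the quoted test data follow truncate-toward-zero at a given scale: a positive scale keeps that many fractional digits, while a negative scale zeroes out integer digits. A minimal illustrative helper — an assumption based on the test data, not Spark's actual `TruncNumber` implementation — using plain `BigDecimal` semantics:

```scala
// Truncate toward zero at `scale` digits; a negative scale truncates
// integer digits (e.g. scale = -3 zeroes out the last three digits).
def trunc(value: BigDecimal, scale: Int): BigDecimal =
  value.setScale(scale, BigDecimal.RoundingMode.DOWN)

assert(trunc(BigDecimal("12345678.1234"), 2) == BigDecimal("12345678.12"))
assert(trunc(BigDecimal("12345678.1234"), -3) == BigDecimal("12345000"))
// Truncation rounds toward zero, so negatives mirror positives exactly:
assert(trunc(BigDecimal("-12345678.1234"), -3) == BigDecimal("-12345000"))
```

`RoundingMode.DOWN` is what distinguishes truncation from `FLOOR` for negative inputs: `FLOOR` would move `-12345678.1234` away from zero to `-12346000`.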
[GitHub] [spark] vinodkc commented on a diff in pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers
vinodkc commented on code in PR #38419: URL: https://github.com/apache/spark/pull/38419#discussion_r1083045053

## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala:

@@ -937,4 +937,135 @@ class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
    checkEvaluation(WidthBucket(5.35, 0.024, Double.NegativeInfinity, 5L), null)
    checkEvaluation(WidthBucket(5.35, 0.024, Double.PositiveInfinity, 5L), null)
  }

  test("truncate number") {
    def evaluateForEachItem(from: Int, to: Int, increment: Int)(f: (Int, Int) => Unit): Unit = {
      Range(from, to, increment).zipWithIndex.foreach { case (scale, i) =>
        f(scale, i)
      }
    }

    // Decimal
    val decimalInput = BigDecimal("123456789123456789.123456789")
    val truncDecimalResults = Seq(
      "123456789123456789.123456789", "123456789123456789.123456789",
      "123456789123456789.12345678", "123456789123456789.1234567",
      "123456789123456789.123456", "123456789123456789.12345",
      "123456789123456789.1234", "123456789123456789.123",
      "123456789123456789.12", "123456789123456789.1",
      "123456789123456789", "123456789123456780", "123456789123456700",
      "123456789123456000", "12345678912345", "12345678912340",
      "12345678912300", "12345678912000", "1234567891", "1234567890",
      "1234567800", "1234567000", "123456", "123450", "123400", "123000",
      "12", "10", "0", "0")

    evaluateForEachItem(10, -20, -1)((scale, index) => {
      checkEvaluation(TruncNumber(decimalInput, scale), BigDecimal(truncDecimalResults(index)))
      checkEvaluation(TruncNumber(-decimalInput, scale), -BigDecimal(truncDecimalResults(index)))
    })

    // Double
    val negativeDouble = -12345678.1234
    val neg_DoubleResults: Seq[Double] =
      Seq(-12345678.0, -12345670.0, -12345600.0, -12345000.0, -1234.0, -1230.0,
        -1200.0, -1000.0, 0.0, 0.0)
    val positiveDouble = 12345678.1234
    val pos_DoubleResults: Seq[Double] =
      Seq(12345678.0, 12345670.0, 12345600.0, 12345000.0, 1234.0, 1230.0,
        1200.0, 1000.0, 0.0, 0.0)

    evaluateForEachItem(0, -10, -1)((scale, index) => {
      checkEvaluation(TruncNumber(positiveDouble, scale), pos_DoubleResults(index))
      checkEvaluation(TruncNumber(negativeDouble, scale), neg_DoubleResults(index))
    })

    // Float
    val negativeFloat = -12345678.123f
    val neg_floatResults: Seq[Float] =
      Seq(-12345678f, -12345670f, -12345600f, -12345000f, -1234f, -1230f, -1200f,
        -1000f, 0.0f, 0.0f)
    val positiveFloat = 12345678.123f
    val pos_FloatResults: Seq[Float] =
      Seq(12345678f, 12345670f, 12345600f, 12345000f, 1234f, 1230f, 1200f, 1000f,
        0.0f, 0.0f)

    evaluateForEachItem(0, -10, -1)((scale, index) => {
      checkEvaluation(TruncNumber(positiveFloat, scale), pos_FloatResults(index))
      checkEvaluation(TruncNumber(negativeFloat, scale), neg_floatResults(index))
    })

    // Long
    val longInput = 123456789L
    val longResults: Seq[Long] =
      Seq(123456789L, 123456789L, 123456789L, 123456789L, 123456780L, 123456700L, 123456000L,
        12345L, 12340L, 12300L, 12000L, 1L, 0, 0)
    evaluateForEachItem(3, -11, -1)((scale, index) => {
      checkEvaluation(TruncNumber(longInput, scale), longResults(index))
      checkEvaluation(TruncNumber(-longInput, scale), -longResults(index))
    })

    // Int
    val intInput = 123456789
    val intResults: Seq[Int] =
      Seq(123456789, 123456789, 123456789, 123456789, 123456780, 123456700, 123456000, 12345,
        12340, 12300, 12000, 1, 0, 0)
    evaluateForEachItem(3, -11, -1)((scale, index) => {
      checkEvaluation(TruncNumber(intInput, scale), intResults(index))
      checkEvaluation(TruncNumber(-intInput, scale), -intResults(index))
    })

    // Short
    val shortInput: Short = 32767
    val shortResults: Seq[Short] =
      Seq(32767, 32767, 32767, 32767, 32760, 32700, 32000, 3, 0, 0)
    evaluateForEachItem(3, -6, -1)((scale, index) => {
      checkEvaluation(TruncNumber(shortInput, scale), shortResults(index))
      checkEvaluation(TruncNumber(-shortInput, scale), -shortResults(index))
    })

    // Byte
    val byteInput: Byte = 127
    val byteResults: Seq[Byte] =
      Seq(127, 127, 127, 127, 120, 100, 0, 0)
    evaluateForEachItem(3, -4, -1)((scale, index) => {
      checkEvaluation(TruncNumber(byteInput, scale), byteResults(index))
      checkEvaluation(TruncNumber(-byteInput, scale), -byteResults(index))
    })
  }
[GitHub] [spark] gengliangwang commented on pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary
gengliangwang commented on PR #39666: URL: https://github.com/apache/spark/pull/39666#issuecomment-1398910697 @dongjoon-hyun may I have your approval for this one, so that we can work in parallel to finish the refactoring before Spark 3.4.0 rc0?
[GitHub] [spark] gengliangwang commented on a diff in pull request #39657: [SPARK-42123][SQL] Include column default values in DESCRIBE output
gengliangwang commented on code in PR #39657: URL: https://github.com/apache/spark/pull/39657#discussion_r1082955637

## sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:

@@ -645,6 +645,16 @@ case class DescribeTableCommand(
     } else if (isExtended) {
       describeFormattedTableInfo(metadata, result)
     }
+
+    // If any columns have default values, append them to the result.
+    if (metadata.schema.fields.exists(_.metadata.contains(
+        ResolveDefaultColumns.CURRENT_DEFAULT_COLUMN_METADATA_KEY))) {
+      append(result, "", "", "")
+      append(result, "# Column Default Information", "", "")
+      metadata.schema.foreach { column =>
+        column.getCurrentDefaultValue().map(append(result, column.name, "", _))

Review Comment: Shall we fill the data type column to make it easier to read? https://user-images.githubusercontent.com/1097932/213785998-09d5d34e-8a7b-4ba5-b8c2-6cab52fdc02a.png
[GitHub] [spark] gengliangwang commented on a diff in pull request #39657: [SPARK-42123][SQL] Include column default values in DESCRIBE output
gengliangwang commented on code in PR #39657: URL: https://github.com/apache/spark/pull/39657#discussion_r1082948320

## sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeTableSuite.scala:

@@ -225,4 +246,30 @@ class DescribeTableSuite extends DescribeTableSuiteBase with CommandSuiteBase {
       Row("Partition Provider", "Catalog", "")))
   }
 }
+
+  test("DESCRIBE TABLE EXTENDED of a table with a default column value") {

Review Comment: nit: shall we update `describe.sql` so that we can have a straightforward end-to-end test?
[GitHub] [spark] srielau commented on a diff in pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers
srielau commented on code in PR #38419: URL: https://github.com/apache/spark/pull/38419#discussion_r1082942017

## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala:

@@ -937,4 +937,135 @@ class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
    checkEvaluation(WidthBucket(5.35, 0.024, Double.NegativeInfinity, 5L), null)
    checkEvaluation(WidthBucket(5.35, 0.024, Double.PositiveInfinity, 5L), null)
  }

  test("truncate number") {
    def evaluateForEachItem(from: Int, to: Int, increment: Int)(f: (Int, Int) => Unit): Unit = {
      Range(from, to, increment).zipWithIndex.foreach { case (scale, i) =>
        f(scale, i)
      }
    }

    // Decimal
    val decimalInput = BigDecimal("123456789123456789.123456789")
    val truncDecimalResults = Seq(
      "123456789123456789.123456789", "123456789123456789.123456789",
      "123456789123456789.12345678", "123456789123456789.1234567",
      "123456789123456789.123456", "123456789123456789.12345",
      "123456789123456789.1234", "123456789123456789.123",
      "123456789123456789.12", "123456789123456789.1",
      "123456789123456789", "123456789123456780", "123456789123456700",
      "123456789123456000", "12345678912345", "12345678912340",
      "12345678912300", "12345678912000", "1234567891", "1234567890",
      "1234567800", "1234567000", "123456", "123450", "123400", "123000",
      "12", "10", "0", "0")

    evaluateForEachItem(10, -20, -1)((scale, index) => {
      checkEvaluation(TruncNumber(decimalInput, scale), BigDecimal(truncDecimalResults(index)))
      checkEvaluation(TruncNumber(-decimalInput, scale), -BigDecimal(truncDecimalResults(index)))
    })

    // Double
    val negativeDouble = -12345678.1234
    val neg_DoubleResults: Seq[Double] =
      Seq(-12345678.0, -12345670.0, -12345600.0, -12345000.0, -1234.0, -1230.0,
        -1200.0, -1000.0, 0.0, 0.0)
    val positiveDouble = 12345678.1234
    val pos_DoubleResults: Seq[Double] =
      Seq(12345678.0, 12345670.0, 12345600.0, 12345000.0, 1234.0, 1230.0,
        1200.0, 1000.0, 0.0, 0.0)

    evaluateForEachItem(0, -10, -1)((scale, index) => {
      checkEvaluation(TruncNumber(positiveDouble, scale), pos_DoubleResults(index))
      checkEvaluation(TruncNumber(negativeDouble, scale), neg_DoubleResults(index))
    })

    // Float
    val negativeFloat = -12345678.123f
    val neg_floatResults: Seq[Float] =
      Seq(-12345678f, -12345670f, -12345600f, -12345000f, -1234f, -1230f, -1200f,
        -1000f, 0.0f, 0.0f)
    val positiveFloat = 12345678.123f
    val pos_FloatResults: Seq[Float] =
      Seq(12345678f, 12345670f, 12345600f, 12345000f, 1234f, 1230f, 1200f, 1000f,
        0.0f, 0.0f)

    evaluateForEachItem(0, -10, -1)((scale, index) => {
      checkEvaluation(TruncNumber(positiveFloat, scale), pos_FloatResults(index))
      checkEvaluation(TruncNumber(negativeFloat, scale), neg_floatResults(index))
    })

    // Long
    val longInput = 123456789L
    val longResults: Seq[Long] =
      Seq(123456789L, 123456789L, 123456789L, 123456789L, 123456780L, 123456700L, 123456000L,
        12345L, 12340L, 12300L, 12000L, 1L, 0, 0)
    evaluateForEachItem(3, -11, -1)((scale, index) => {
      checkEvaluation(TruncNumber(longInput, scale), longResults(index))
      checkEvaluation(TruncNumber(-longInput, scale), -longResults(index))
    })

    // Int
    val intInput = 123456789
    val intResults: Seq[Int] =
      Seq(123456789, 123456789, 123456789, 123456789, 123456780, 123456700, 123456000, 12345,
        12340, 12300, 12000, 1, 0, 0)
    evaluateForEachItem(3, -11, -1)((scale, index) => {
      checkEvaluation(TruncNumber(intInput, scale), intResults(index))
      checkEvaluation(TruncNumber(-intInput, scale), -intResults(index))
    })

    // Short
    val shortInput: Short = 32767
    val shortResults: Seq[Short] =
      Seq(32767, 32767, 32767, 32767, 32760, 32700, 32000, 3, 0, 0)
    evaluateForEachItem(3, -6, -1)((scale, index) => {
      checkEvaluation(TruncNumber(shortInput, scale), shortResults(index))
      checkEvaluation(TruncNumber(-shortInput, scale), -shortResults(index))
    })

    // Byte
    val byteInput: Byte = 127
    val byteResults: Seq[Byte] =
      Seq(127, 127, 127, 127, 120, 100, 0, 0)
    evaluateForEachItem(3, -4, -1)((scale, index) => {
      checkEvaluation(TruncNumber(byteInput, scale), byteResults(index))
      checkEvaluation(TruncNumber(-byteInput, scale), -byteResults(index))
    })
  }
[GitHub] [spark] gengliangwang commented on a diff in pull request #39666: [SPARK-42130][UI] Handle null string values in AccumulableInfo and ProcessSummary
gengliangwang commented on code in PR #39666: URL: https://github.com/apache/spark/pull/39666#discussion_r1082936874

## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto:

@@ -22,7 +22,12 @@ package org.apache.spark.status.protobuf;
  * Developer guides:
  * - Coding style: https://developers.google.com/protocol-buffers/docs/style
  * - Use int64 for job/stage IDs, in case of future extension in Spark core.
- * - Use `weakIntern` on string values in create new objects during deserialization.
+ * - For string fields:
+ *   - use `optional string` for protobuf definition
+ *   - on serialization, check if the string is null to avoid NPE
+ *   - on deserialization, set string fields as null if it is not set. Also, use `weakIntern` on
+ *     string values in create new objects during deserialization.
+ *   - add tests with null string inputs

Review Comment: AFAIK no. Unless we modify the protoc for generating java. I think having tests for setting all string fields as nulls should be enough for now.
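The convention being documented — `optional string` in the proto schema, a null check on write, and a `has…` check plus interning on read — can be sketched with a stand-in for a protoc-generated message. The `Info` type and `weakIntern` helper below are hypothetical placeholders; the real code uses protoc-generated builders and Spark's interning utility:

```scala
// Stand-in mimicking a protoc-generated message with `optional string name`:
// hasName/getName mirror the generated has-/get- accessors.
final case class Info(private val name: Option[String]) {
  def hasName: Boolean = name.isDefined
  def getName: String = name.getOrElse("")
}

// Placeholder for Spark's string-interning utility (identity here).
def weakIntern(s: String): String = s

// Serialization: only set the field when the value is non-null (avoids NPE).
def serialize(name: String): Info = Info(Option(name))

// Deserialization: restore null when the field was never set; intern otherwise.
def deserialize(info: Info): String =
  if (info.hasName) weakIntern(info.getName) else null

assert(deserialize(serialize(null)) == null)       // null round-trips as null
assert(deserialize(serialize("app-123")) == "app-123")
```

The `optional` keyword matters because it gives the generated Java class explicit field presence (`hasName`), which is what lets deserialization distinguish "never set" from an empty string.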
[GitHub] [spark] ggwiebe commented on pull request #39519: [SPARK-41995][SQL] Accept non-foldable expressions in schema_of_json
ggwiebe commented on PR #39519: URL: https://github.com/apache/spark/pull/39519#issuecomment-1398786764 This is great! I am helping a customer who receives many different types of messages enveloped in a structured stream. Today I cannot give a schema for the message, because each row of the column can hold a different message type, with a different schema stored in the message value field. The ability to dynamically provide different values without an a priori schema will be super useful. Thanks!
[GitHub] [spark] rangadi commented on a diff in pull request #39550: [SPARK-42056][SQL][PROTOBUF] Add missing options for Protobuf functions
rangadi commented on code in PR #39550: URL: https://github.com/apache/spark/pull/39550#discussion_r1082925694

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/functions.scala:

@@ -80,14 +81,37 @@ object functions {
    *
    * @param data
    *   the binary column.
-   * @param shadedMessageClassName
+   * @param messageClassName
    *   The Protobuf class name. E.g. org.spark.examples.protobuf.ExampleEvent.
    *   The jar with these classes needs to be shaded as described above.
    * @since 3.4.0
    */
   @Experimental
-  def from_protobuf(data: Column, shadedMessageClassName: String): Column = {
-    new Column(ProtobufDataToCatalyst(data.expr, shadedMessageClassName))
+  def from_protobuf(data: Column, messageClassName: String): Column = {
+    new Column(ProtobufDataToCatalyst(data.expr, messageClassName))
+  }
+
+  /**
+   * Converts a binary column of Protobuf format into its corresponding catalyst value. The
+   * specified Protobuf class must match the data, otherwise the behavior is
+   * undefined: it may fail or return arbitrary result. The jar containing Java class should be
+   * shaded. Specifically, `com.google.protobuf.*` should be shaded to

Review Comment: @SandishKumarHN made a big update to both Scala and Python functions. PTAL.
[GitHub] [spark] RyanBerti commented on pull request #39678: [SPARK-16484][SQL] Add HyperLogLogPlusPlus sketch generator/evaluator/aggregator
RyanBerti commented on PR #39678: URL: https://github.com/apache/spark/pull/39678#issuecomment-1398762642 For reference, @dtenedor worked with me on a pre-review of these changes; relevant discussions are available in this PR in my fork: https://github.com/RyanBerti/spark/pull/1
[GitHub] [spark] rithwik-db commented on a diff in pull request #39369: [SPARK-41775][PYTHON][ML] Adding support for PyTorch functions
rithwik-db commented on code in PR #39369: URL: https://github.com/apache/spark/pull/39369#discussion_r1082896210

## python/pyspark/ml/torch/tests/test_distributor.py:

@@ -349,6 +434,13 @@ def test_get_num_tasks_distributed(self) -> None:
         self.spark.sparkContext._conf.set("spark.task.resource.gpu.amount", "1")

+    # def test_end_to_end_run_distributedly(self) -> None:

Review Comment: Sorry, that shouldn't have been commented out. I will quickly update it, adding that test back in.
[GitHub] [spark] lu-wang-dl commented on a diff in pull request #39369: [SPARK-41775][PYTHON][ML] Adding support for PyTorch functions
lu-wang-dl commented on code in PR #39369: URL: https://github.com/apache/spark/pull/39369#discussion_r1082894756 ## python/pyspark/ml/torch/tests/test_distributor.py: ## @@ -349,6 +434,13 @@ def test_get_num_tasks_distributed(self) -> None: self.spark.sparkContext._conf.set("spark.task.resource.gpu.amount", "1") +# def test_end_to_end_run_distributedly(self) -> None: Review Comment: Clean up this?
[GitHub] [spark] dongjoon-hyun commented on pull request #39671: [SPARK-40303][DOCS] Deprecate old Java 8 versions prior to 8u362
dongjoon-hyun commented on PR #39671: URL: https://github.com/apache/spark/pull/39671#issuecomment-1398718941 FYI, GitHub Action is currently on the old version but `jdk8u362` will be automatically applied in one or two weeks. I'll keep monitoring the version change. (screenshot: https://user-images.githubusercontent.com/9700541/213764829-44ae3b35-54e1-4420-8741-9f10e720eec0.png)
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39677: [SPARK-42043][CONNECT][TEST][FOLLOWUP] Better env var and a few bug fixes
dongjoon-hyun commented on code in PR #39677: URL: https://github.com/apache/spark/pull/39677#discussion_r1082842120 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -121,14 +117,14 @@ object SparkConnectServerUtils { s"Fail to locate the spark connect server target folder: '${parentDir.getCanonicalPath}'. " + s"SPARK_HOME='${new File(sparkHome).getCanonicalPath}'. " + "Make sure the spark connect server jar has been built " + -"and the system property `SPARK_HOME` is set correctly.") +"and the env variable `SPARK_HOME` is set correctly.") val jars = recursiveListFiles(parentDir).filter { f => // SBT jar (f.getParentFile.getName.startsWith("scala-") && -f.getName.startsWith("spark-connect-assembly") && f.getName.endsWith("SNAPSHOT.jar")) || +f.getName.startsWith("spark-connect-assembly") && f.getName.endsWith(".jar")) || Review Comment: Thank you for the fix.
[GitHub] [spark] dongjoon-hyun commented on pull request #39677: [SPARK-42043][CONNECT][TEST][FOLLOWUP] Better env var and a few bug fixes
dongjoon-hyun commented on PR #39677: URL: https://github.com/apache/spark/pull/39677#issuecomment-1398677709 Hi, @zhenlineo . When we use the same JIRA id, we need to add `[FOLLOWUP]` in the PR title. I added it this time.
[GitHub] [spark] dongjoon-hyun closed pull request #39671: [SPARK-40303][DOCS] Deprecate old Java 8 versions prior to 8u362
dongjoon-hyun closed pull request #39671: [SPARK-40303][DOCS] Deprecate old Java 8 versions prior to 8u362 URL: https://github.com/apache/spark/pull/39671
[GitHub] [spark] dongjoon-hyun commented on pull request #39671: [SPARK-40303][DOCS] Deprecate old Java 8 versions prior to 8u362
dongjoon-hyun commented on PR #39671: URL: https://github.com/apache/spark/pull/39671#issuecomment-1398671949 Now, both Zulu and Adoptium (Temurin) are available. Thank you all. Merged to master. (screenshot: https://user-images.githubusercontent.com/9700541/213759635-e5163f33-ab6e-4f02-a981-d2044cd171b2.png)
[GitHub] [spark] srowen commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API
srowen commented on PR #39190: URL: https://github.com/apache/spark/pull/39190#issuecomment-1398662080 FWIW, this part was last changed in https://issues.apache.org/jira/browse/SPARK-24415 to fix a different bug (CC @ankuriitg ). It might be worth re-running the simple example there to see if this retains the 'fix', but evidently the tests added in that old change still pass here. While I'm always wary of touching this core code and I myself don't know it well, this seems fairly convincing.
[GitHub] [spark] sunchao commented on pull request #39633: [SPARK-42038][SQL] SPJ: Support partially clustered distribution
sunchao commented on PR #39633: URL: https://github.com/apache/spark/pull/39633#issuecomment-1398661965 @cloud-fan the idea is similar to skew join but for v2 sources, let me try to split the code into a separate rule following your idea.
[GitHub] [spark] zhenlineo commented on pull request #39677: [SPARK-42043][CONNECT][TEST] Better env var and a few bug fixes
zhenlineo commented on PR #39677: URL: https://github.com/apache/spark/pull/39677#issuecomment-1398630857 cc @HyukjinKwon @LuciferYang Thanks for the review. I addressed the immediate fixes in this PR. For other improvements, I've created tickets and we can add them as follow-ups.
[GitHub] [spark] zhenlineo commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests
zhenlineo commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1082775554 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.connect.client.util + +import java.io.{BufferedOutputStream, File} + +import scala.io.Source + +import org.scalatest.BeforeAndAfterAll +import sys.process._ + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.connect.client.SparkConnectClient +import org.apache.spark.sql.connect.common.config.ConnectCommon + +/** + * An util class to start a local spark connect server in a different process for local E2E tests. + * It is designed to start the server once but shared by all tests.
It is equivalent to use the + * following command to start the connect server via command line: + * + * {{{ + * bin/spark-shell \ + * --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \ + * --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin + * }}} + * + * Set system property `SPARK_HOME` if the test is not executed from the Spark project top folder. + * Set system property `DEBUG_SC_JVM_CLIENT=true` to print the server process output in the + * console to debug server start stop problems. + */ +object SparkConnectServerUtils { + // System properties used for testing and debugging + private val SPARK_HOME = "SPARK_HOME" + private val ENV_DEBUG_SC_JVM_CLIENT = "DEBUG_SC_JVM_CLIENT" + + private val sparkHome = System.getProperty(SPARK_HOME, fileSparkHome()) + private val isDebug = System.getProperty(ENV_DEBUG_SC_JVM_CLIENT, "false").toBoolean + + // Log server start stop debug info into console + // scalastyle:off println + private[connect] def debug(msg: String): Unit = if (isDebug) println(msg) + // scalastyle:on println + private[connect] def debug(error: Throwable): Unit = if (isDebug) error.printStackTrace() + + // Server port + private[connect] val port = ConnectCommon.CONNECT_GRPC_BINDING_PORT + util.Random.nextInt(1000) + + @volatile private var stopped = false + + private lazy val sparkConnect: Process = { +debug("Starting the Spark Connect Server...") +val jar = findSparkConnectJar +val builder = Process( + new File(sparkHome, "bin/spark-shell").getCanonicalPath, + Seq( +"--jars", +jar, +"--driver-class-path", +jar, +"--conf", +"spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin", +"--conf", +s"spark.connect.grpc.binding.port=$port")) + +val io = new ProcessIO( + // Hold the input channel to the spark console to keep the console open + in => new BufferedOutputStream(in), + // Only redirect output if debug to avoid channel interruption error on process termination.
+ out => if (isDebug) Source.fromInputStream(out).getLines.foreach(debug), + err => if (isDebug) Source.fromInputStream(err).getLines.foreach(debug)) +val process = builder.run(io) + +// Adding JVM shutdown hook +sys.addShutdownHook(kill()) +process + } + + def start(): Unit = { +assert(!stopped) +sparkConnect Review Comment: I tried to stop the server properly by sending `:quit` to the shell process, but it only works in about 20% of the runs. As it is too unreliable, I had to use kill. Do you know a better way to stop the shell properly? Thanks a lot.
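The graceful-shutdown idea discussed in this comment — writing `:quit` to the shell's stdin and letting the process exit on its own rather than killing it — can be sketched outside Spark. This is a minimal, hypothetical illustration, not the PR's code: `cat` stands in for the `spark-shell` subprocess, since closing stdin makes either one exit.

```python
import subprocess

# Minimal sketch: stop a REPL-like subprocess by writing a quit command
# to its stdin and closing the stream, rather than killing the process.
# `cat` is a stand-in for spark-shell; with spark-shell the line written
# here would be the ":quit" command the reviewer mentions.
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE,
                        stdout=subprocess.DEVNULL)
proc.stdin.write(b":quit\n")  # quit command (cat just echoes it as data)
proc.stdin.close()            # EOF on stdin lets the child exit cleanly
exit_code = proc.wait(timeout=10)
```

When the child ignores EOF (as the comment suggests spark-shell often does), falling back to `proc.terminate()` after a timeout reproduces the kill that the test harness resorts to.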
[GitHub] [spark] zhenlineo commented on a diff in pull request #39541: [SPARK-42043][CONNECT] Scala Client Result with E2E Tests
zhenlineo commented on code in PR #39541: URL: https://github.com/apache/spark/pull/39541#discussion_r1082771233 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.connect.client.util + +import java.io.{BufferedOutputStream, File} + +import scala.io.Source + +import org.scalatest.BeforeAndAfterAll +import sys.process._ + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.connect.client.SparkConnectClient +import org.apache.spark.sql.connect.common.config.ConnectCommon + +/** + * An util class to start a local spark connect server in a different process for local E2E tests. + * It is designed to start the server once but shared by all tests.
It is equivalent to use the + * following command to start the connect server via command line: + * + * {{{ + * bin/spark-shell \ + * --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \ + * --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin + * }}} + * + * Set system property `SPARK_HOME` if the test is not executed from the Spark project top folder. + * Set system property `DEBUG_SC_JVM_CLIENT=true` to print the server process output in the + * console to debug server start stop problems. + */ +object SparkConnectServerUtils { + // System properties used for testing and debugging + private val SPARK_HOME = "SPARK_HOME" + private val ENV_DEBUG_SC_JVM_CLIENT = "DEBUG_SC_JVM_CLIENT" + + private val sparkHome = System.getProperty(SPARK_HOME, fileSparkHome()) + private val isDebug = System.getProperty(ENV_DEBUG_SC_JVM_CLIENT, "false").toBoolean + + // Log server start stop debug info into console + // scalastyle:off println + private[connect] def debug(msg: String): Unit = if (isDebug) println(msg) Review Comment: In normal runs, we shall not print anything out. The prints are merely added for me to quickly debug any server start errors when setting up the tests. The Scala client needs a new logging system, which we will add in a follow-up PR. Let me know if the recommendation is to use log4j for all test outputs; then I will migrate these tests together. https://issues.apache.org/jira/browse/SPARK-42135
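The flag-gated debug printing discussed above — only emit output when `DEBUG_SC_JVM_CLIENT` is set to true — is a small pattern worth showing on its own. This is a hypothetical Python rendering of the quoted Scala gate; the env variable name is borrowed from the diff, the rest is an assumption.

```python
import io
import os
from contextlib import redirect_stdout

# Mirror of the isDebug gate in the quoted Scala test harness: print
# diagnostics only when DEBUG_SC_JVM_CLIENT is set to "true".
IS_DEBUG = os.environ.get("DEBUG_SC_JVM_CLIENT", "false").lower() == "true"

def debug(msg: str) -> None:
    if IS_DEBUG:
        print(msg)

# With the flag unset (the normal run), debug() emits nothing:
buf = io.StringIO()
with redirect_stdout(buf):
    debug("Starting the Spark Connect Server...")
```

Gating on an environment variable rather than a log level keeps the harness silent by default while remaining easy to switch on from the command line, which matches the debugging workflow described in the comment.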