[GitHub] [spark] LuciferYang commented on a diff in pull request #39733: Setting version to 3.5.0-SNAPSHOT

2023-01-24 Thread via GitHub


LuciferYang commented on code in PR #39733:
URL: https://github.com/apache/spark/pull/39733#discussion_r1086307422


##
docs/_config.yml:
##
@@ -19,8 +19,8 @@ include:
 
 # These allow the documentation to be updated with newer releases
 # of Spark, Scala, and Mesos.
-SPARK_VERSION: 3.4.0-SNAPSHOT
-SPARK_VERSION_SHORT: 3.4.0
+SPARK_VERSION: 3.5.0-SNAPSHOT
+SPARK_VERSION_SHORT: 3.5.0
 SCALA_BINARY_VERSION: "2.12"
 SCALA_VERSION: "2.12.16"

Review Comment:
   When should `SCALA_VERSION` be changed? The current master uses `2.12.17` instead of `2.12.16`. Maybe branch-3.4 also needs to be changed?






[GitHub] [spark] itholic commented on a diff in pull request #39700: [SPARK-41490][SQL] Assign name to _LEGACY_ERROR_TEMP_2441

2023-01-24 Thread via GitHub


itholic commented on code in PR #39700:
URL: https://github.com/apache/spark/pull/39700#discussion_r1086300683


##
core/src/main/resources/error/error-classes.json:
##
@@ -1432,6 +1432,12 @@
 },
 "sqlState" : "0A000"
   },
+  "UNSUPPORTED_EXPR_FOR_OPERATOR" : {
+"message" : [
+  "The query operator `` contains one or more unsupported 
expression types Aggregate, Window or Generate.",

Review Comment:
   Thanks! Just applied the comment






[GitHub] [spark] LuciferYang commented on pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread via GitHub


LuciferYang commented on PR #39732:
URL: https://github.com/apache/spark/pull/39732#issuecomment-1403214341

   Another way I can think of is to check each `field` in `StoreTypes.getDescriptor.toProto.getMessageTypeList`, but it probably isn't better than the current approach.
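   For reference, a rough sketch of that descriptor-based alternative (it assumes the generated `StoreTypes` class is on the classpath; nested message types would need extra handling, which is one reason it may not beat the file-based check):
   ```scala
   import scala.collection.JavaConverters._

   import com.google.protobuf.DescriptorProtos.FieldDescriptorProto

   import org.apache.spark.status.protobuf.StoreTypes

   // Walk every top-level message type in the generated file descriptor and
   // collect string fields that are not declared with the proto3 `optional` keyword.
   val fileProto = StoreTypes.getDescriptor.toProto
   val nonOptionalStrings = fileProto.getMessageTypeList.asScala.flatMap { message =>
     message.getFieldList.asScala
       .filter { f =>
         f.getType == FieldDescriptorProto.Type.TYPE_STRING && !f.getProto3Optional
       }
       .map(f => s"${message.getName}.${f.getName}")
   }
   nonOptionalStrings.foreach(println)
   ```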





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39732:
URL: https://github.com/apache/spark/pull/39732#discussion_r1086294697


##
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##
@@ -463,7 +492,7 @@ class KVStoreProtobufSerializerSuite extends SparkFunSuite {
   name = null,
   numPartitions = 8,
   numCachedPartitions = 5,
-  storageLevel = "IN_MEMORY",
+  storageLevel = null,

Review Comment:
   Got it~






[GitHub] [spark] peter-toth commented on pull request #39722: [WIP][SPARK-42162] Introduce MultiAdd expression as a memory optimization for canonicalizing large trees of Add expressions

2023-01-24 Thread via GitHub


peter-toth commented on PR #39722:
URL: https://github.com/apache/spark/pull/39722#issuecomment-1403212470

   > - With the https://github.com/apache/spark/pull/37851 in the expression 
canonicalization, a complex query with a large number of Add operations could 
end up consuming significantly more (sometimes > 10X) memory on the executors.
   
   @db-scnakandala, can you please explain this issue a bit more? Does 
https://github.com/apache/spark/pull/37851 cause performance regression? Why 
exactly? And why on executors? 





[GitHub] [spark] gengliangwang commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread via GitHub


gengliangwang commented on code in PR #39732:
URL: https://github.com/apache/spark/pull/39732#discussion_r1086294168


##
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##
@@ -463,7 +492,7 @@ class KVStoreProtobufSerializerSuite extends SparkFunSuite {
   name = null,
   numPartitions = 8,
   numCachedPartitions = 5,
-  storageLevel = "IN_MEMORY",
+  storageLevel = null,

Review Comment:
   yeah we need to test null string input. There is another object with 
`storageLevel` as `IN_MEMORY` already. Let's simply change the third object.






[GitHub] [spark] dongjoon-hyun commented on pull request #39735: [SPARK-42179][BUILD][SQL][3.3] Upgrade ORC to 1.7.8

2023-01-24 Thread via GitHub


dongjoon-hyun commented on PR #39735:
URL: https://github.com/apache/spark/pull/39735#issuecomment-1403211478

   Could you review this PR, @gengliangwang ?





[GitHub] [spark] gengliangwang commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread via GitHub


gengliangwang commented on code in PR #39732:
URL: https://github.com/apache/spark/pull/39732#discussion_r1086293127


##
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##
@@ -31,6 +34,32 @@ import org.apache.spark.ui.scope.{RDDOperationEdge, 
RDDOperationNode}
 class KVStoreProtobufSerializerSuite extends SparkFunSuite {
   private val serializer = new KVStoreProtobufSerializer()
 
+  test("All the string fields must be optional to avoid NPE") {

Review Comment:
   Example failure output:
   ```
   ArrayBuffer(("  string storage_level = 5;", 267), ("  string name = 1;", 
363)) was not empty
   All the string fields should be defined as `optional string` for handling 
null string.
   Please update the following fields:
   line #267:   string storage_level = 5;
   line #363:   string name = 1;
   ```






[GitHub] [spark] gengliangwang commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread via GitHub


gengliangwang commented on code in PR #39732:
URL: https://github.com/apache/spark/pull/39732#discussion_r1086292734


##
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##
@@ -31,6 +34,32 @@ import org.apache.spark.ui.scope.{RDDOperationEdge, 
RDDOperationNode}
 class KVStoreProtobufSerializerSuite extends SparkFunSuite {
   private val serializer = new KVStoreProtobufSerializer()
 
+  test("All the string fields must be optional to avoid NPE") {
+val protoFile = getWorkspaceFilePath(
+  "core", "src", "main", "protobuf", "org", "apache", "spark", "status", 
"protobuf",
+  "store_types.proto")
+
+val containsStringRegex = "\\s*string .*"
+val invalidDefinition = new mutable.ArrayBuffer[(String, Int)]()
+var lineNumber = 1
+Source.fromFile(protoFile.toFile.getCanonicalPath).getLines().foreach { 
line =>
+  if (line.matches(containsStringRegex)) {

Review Comment:
   ok, updated






[GitHub] [spark] EnricoMi commented on pull request #39717: [SPARK-42168][3.2][SQL][PYTHON] Fix required child distribution of FlatMapCoGroupsInPandas (as in CoGroup)

2023-01-24 Thread via GitHub


EnricoMi commented on PR #39717:
URL: https://github.com/apache/spark/pull/39717#issuecomment-1403210336

   @sunchao good catch!
   
   I have renamed the PR and added the Python example as a unit test.





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39732:
URL: https://github.com/apache/spark/pull/39732#discussion_r1086292404


##
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##
@@ -463,7 +492,7 @@ class KVStoreProtobufSerializerSuite extends SparkFunSuite {
   name = null,
   numPartitions = 8,
   numCachedPartitions = 5,
-  storageLevel = "IN_MEMORY",
+  storageLevel = null,

Review Comment:
   Is this change required in this PR?






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39732:
URL: https://github.com/apache/spark/pull/39732#discussion_r1086292024


##
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##
@@ -31,6 +34,32 @@ import org.apache.spark.ui.scope.{RDDOperationEdge, 
RDDOperationNode}
 class KVStoreProtobufSerializerSuite extends SparkFunSuite {
   private val serializer = new KVStoreProtobufSerializer()
 
+  test("All the string fields must be optional to avoid NPE") {
+val protoFile = getWorkspaceFilePath(
+  "core", "src", "main", "protobuf", "org", "apache", "spark", "status", 
"protobuf",
+  "store_types.proto")
+
+val containsStringRegex = "\\s*string .*"
+val invalidDefinition = new mutable.ArrayBuffer[(String, Int)]()
+var lineNumber = 1
+Source.fromFile(protoFile.toFile.getCanonicalPath).getLines().foreach { 
line =>
+  if (line.matches(containsStringRegex)) {

Review Comment:
   +1 for the above comment.






[GitHub] [spark] dongjoon-hyun opened a new pull request, #39735: [SPARK-42179][BUILD][SQL][3.3] Upgrade ORC to 1.7.8

2023-01-24 Thread via GitHub


dongjoon-hyun opened a new pull request, #39735:
URL: https://github.com/apache/spark/pull/39735

   ### What changes were proposed in this pull request?
   
   This PR aims to upgrade ORC to 1.7.8 for Apache Spark 3.3.2.
   
   ### Why are the changes needed?
   
   Apache ORC 1.7.8 is a maintenance release with important bug fixes.
   - https://orc.apache.org/news/2023/01/21/ORC-1.7.8/
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Pass the CIs.





[GitHub] [spark] itholic commented on a diff in pull request #39702: [SPARK-41487][SQL] Assign name to _LEGACY_ERROR_TEMP_1020

2023-01-24 Thread via GitHub


itholic commented on code in PR #39702:
URL: https://github.com/apache/spark/pull/39702#discussion_r1086279876


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##
@@ -1050,15 +1052,17 @@ class AnalysisErrorSuite extends AnalysisTest {
 val t2 = LocalRelation(b, c).as("t2")
 
 // SELECT * FROM t1 WHERE a = (SELECT sum(c) FROM t2 WHERE t1.* = t2.b)
-assertAnalysisError(
+assertAnalysisErrorClass(
   Filter(EqualTo(a, ScalarSubquery(t2.select(sum(c)).where(star("t1") === 
b))), t1),
-  "Invalid usage of '*' in Filter" :: Nil
+  expectedErrorClass = "INVALID_USAGE_OF_STAR",
+  expectedMessageParameters = Map("elem" -> "'*'", "prettyName" -> 
"Filter")
 )
 
 // SELECT * FROM t1 JOIN t2 ON (EXISTS (SELECT 1 FROM t2 WHERE t1.* = b))
-assertAnalysisError(
+assertAnalysisErrorClass(
   t1.join(t2, condition = Some(Exists(t2.select(1).where(star("t1") === 
b,
-  "Invalid usage of '*' in Filter" :: Nil
+  expectedErrorClass = "INVALID_USAGE_OF_STAR",
+  expectedMessageParameters = Map("elem" -> "'*'", "prettyName" -> 
"Filter")

Review Comment:
   I tried to `operator.expressions.map(toPrettySQL).mkString(",")`, but it 
returns `"(unresolvedstar() = b)"`.



##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##
@@ -1050,15 +1052,17 @@ class AnalysisErrorSuite extends AnalysisTest {
 val t2 = LocalRelation(b, c).as("t2")
 
 // SELECT * FROM t1 WHERE a = (SELECT sum(c) FROM t2 WHERE t1.* = t2.b)
-assertAnalysisError(
+assertAnalysisErrorClass(
   Filter(EqualTo(a, ScalarSubquery(t2.select(sum(c)).where(star("t1") === 
b))), t1),
-  "Invalid usage of '*' in Filter" :: Nil
+  expectedErrorClass = "INVALID_USAGE_OF_STAR",
+  expectedMessageParameters = Map("elem" -> "'*'", "prettyName" -> 
"Filter")
 )
 
 // SELECT * FROM t1 JOIN t2 ON (EXISTS (SELECT 1 FROM t2 WHERE t1.* = b))
-assertAnalysisError(
+assertAnalysisErrorClass(
   t1.join(t2, condition = Some(Exists(t2.select(1).where(star("t1") === 
b,
-  "Invalid usage of '*' in Filter" :: Nil
+  expectedErrorClass = "INVALID_USAGE_OF_STAR",
+  expectedMessageParameters = Map("elem" -> "'*'", "prettyName" -> 
"Filter")

Review Comment:
   I tried to `operator.expressions.map(toPrettySQL).mkString(",")` to build a SQL expression, but it returns `"(unresolvedstar() = b)"`.






[GitHub] [spark] itholic commented on a diff in pull request #39701: [SPARK-41489][SQL] Assign name to _LEGACY_ERROR_TEMP_2415

2023-01-24 Thread via GitHub


itholic commented on code in PR #39701:
URL: https://github.com/apache/spark/pull/39701#discussion_r1086276342


##
core/src/main/resources/error/error-classes.json:
##
@@ -933,6 +933,12 @@
 ],
 "sqlState" : "42604"
   },
+  "INVALID_TYPE_FOR_FILTER_EXPR" : {

Review Comment:
   Yes, I agree that it's not really related to an input parameter.
   
   Let me put it in the `DATATYPE_MISMATCH` and rename it as 
`FILTER_NOT_BOOLEAN` for now.






[GitHub] [spark] dongjoon-hyun commented on pull request #39731: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread via GitHub


dongjoon-hyun commented on PR #39731:
URL: https://github.com/apache/spark/pull/39731#issuecomment-1403185289

   Thank you, @HyukjinKwon !





[GitHub] [spark] zhengruifeng commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now

2023-01-24 Thread via GitHub


zhengruifeng commented on code in PR #39695:
URL: https://github.com/apache/spark/pull/39695#discussion_r1086269322


##
python/pyspark/sql/connect/client.py:
##
@@ -551,37 +592,37 @@ def _execute_and_fetch(
 logger.info("ExecuteAndFetch")
 
 m: Optional[pb2.ExecutePlanResponse.Metrics] = None
-
 batches: List[pa.RecordBatch] = []
 
 try:
-for b in self._stub.ExecutePlan(req, 
metadata=self._builder.metadata()):
-if b.client_id != self._session_id:
-raise SparkConnectException(
-"Received incorrect session identifier for request."
-)
-if b.metrics is not None:
-logger.debug("Received metric batch.")
-m = b.metrics
-if b.HasField("arrow_batch"):
-logger.debug(
-f"Received arrow batch rows={b.arrow_batch.row_count} "
-f"size={len(b.arrow_batch.data)}"
-)
-
-with pa.ipc.open_stream(b.arrow_batch.data) as reader:
-for batch in reader:
-assert isinstance(batch, pa.RecordBatch)
-batches.append(batch)
+for attempt in Retrying(
+can_retry=SparkConnectClient.retry_exception, 
**self._retry_policy
+):
+with attempt:

Review Comment:
   before each attempt, should we set `m = None` and `batches = []`? 






[GitHub] [spark] zhengruifeng commented on a diff in pull request #39734: [SPARK-41812][SPARK-41823][CONNECT][PYTHON] Fix ambiguous columns issue in Join

2023-01-24 Thread via GitHub


zhengruifeng commented on code in PR #39734:
URL: https://github.com/apache/spark/pull/39734#discussion_r1086264760


##
connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectPlannerSuite.scala:
##
@@ -260,21 +260,21 @@ class SparkConnectPlannerSuite extends SparkFunSuite with 
SparkConnectPlanTest {
 .addArguments(unresolvedAttribute)
 .build())
 
-val simpleJoin = proto.Relation.newBuilder
-  .setJoin(
-proto.Join.newBuilder
-  .setLeft(readRel)
-  .setRight(readRel)
-  .setJoinType(proto.Join.JoinType.JOIN_TYPE_INNER)
-  .setJoinCondition(joinCondition)
-  .build())
-  .build()
-
-val res = transform(simpleJoin)
-assert(res.nodeName == "Join")
-assert(res != null)
+val e0 = intercept[AnalysisException] {

Review Comment:
   after this PR, Join with `JoinCondition` will be eagerly analyzed.






[GitHub] [spark] itholic commented on a diff in pull request #39702: [SPARK-41487][SQL] Assign name to _LEGACY_ERROR_TEMP_1020

2023-01-24 Thread via GitHub


itholic commented on code in PR #39702:
URL: https://github.com/apache/spark/pull/39702#discussion_r1086259893


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala:
##
@@ -1050,15 +1052,17 @@ class AnalysisErrorSuite extends AnalysisTest {
 val t2 = LocalRelation(b, c).as("t2")
 
 // SELECT * FROM t1 WHERE a = (SELECT sum(c) FROM t2 WHERE t1.* = t2.b)
-assertAnalysisError(
+assertAnalysisErrorClass(
   Filter(EqualTo(a, ScalarSubquery(t2.select(sum(c)).where(star("t1") === 
b))), t1),
-  "Invalid usage of '*' in Filter" :: Nil
+  expectedErrorClass = "INVALID_USAGE_OF_STAR",
+  expectedMessageParameters = Map("elem" -> "'*'", "prettyName" -> 
"Filter")
 )
 
 // SELECT * FROM t1 JOIN t2 ON (EXISTS (SELECT 1 FROM t2 WHERE t1.* = b))
-assertAnalysisError(
+assertAnalysisErrorClass(
   t1.join(t2, condition = Some(Exists(t2.select(1).where(star("t1") === 
b,
-  "Invalid usage of '*' in Filter" :: Nil
+  expectedErrorClass = "INVALID_USAGE_OF_STAR",
+  expectedMessageParameters = Map("elem" -> "'*'", "prettyName" -> 
"Filter")

Review Comment:
   Thanks @MaxGekk for the review.
   
   I tried to output a SQL statement here, but unfortunately I couldn't find a proper way to expose the SQL statement rather than the node name.
   
   The code path is here:
   
   
https://github.com/apache/spark/blob/866343c7be47d71b88ae9a6b4dda26f8c4f5964b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L247-L250
   
   The type of `operator` is `LogicalPlan`, and `s` is `Star` here.
   
   Can we extract the SQL statement from these objects?
   
   Could you help with outputting a SQL statement here?






[GitHub] [spark] zhengruifeng commented on a diff in pull request #39734: [SPARK-41812][SPARK-41823][CONNECT][PYTHON] Fix ambiguous columns issue in Join

2023-01-24 Thread via GitHub


zhengruifeng commented on code in PR #39734:
URL: https://github.com/apache/spark/pull/39734#discussion_r1086250404


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -1174,27 +1174,63 @@ class SparkConnectPlanner(val session: SparkSession) {
 }
   }
 
+  private def splitConjunctivePredicates(condition: Expression): 
Seq[Expression] = {
+condition match {
+  case UnresolvedFunction(Seq("and"), Seq(cond1, cond2), _, _, _) =>
+splitConjunctivePredicates(cond1) ++ splitConjunctivePredicates(cond2)
+  case other => other :: Nil
+}
+  }
+
   private def transformJoin(rel: proto.Join): LogicalPlan = {
 assert(rel.hasLeft && rel.hasRight, "Both join sides must be present")
 if (rel.hasJoinCondition && rel.getUsingColumnsCount > 0) {
   throw InvalidPlanInput(
 s"Using columns or join conditions cannot be set at the same time in 
Join")
 }
-val joinCondition =
-  if (rel.hasJoinCondition) 
Some(transformExpression(rel.getJoinCondition)) else None
+
 val catalystJointype = transformJoinType(
   if (rel.getJoinType != null) rel.getJoinType else 
proto.Join.JoinType.JOIN_TYPE_INNER)
 val joinType = if (rel.getUsingColumnsCount > 0) {
   UsingJoin(catalystJointype, rel.getUsingColumnsList.asScala.toSeq)
 } else {
   catalystJointype
 }
-logical.Join(
-  left = transformRelation(rel.getLeft),
-  right = transformRelation(rel.getRight),
-  joinType = joinType,
-  condition = joinCondition,
-  hint = logical.JoinHint.NONE)
+
+if (rel.hasJoinCondition) {
+  val leftDF = Dataset.ofRows(session, transformRelation(rel.getLeft))
+  val rightDF = Dataset.ofRows(session, transformRelation(rel.getRight))
+  val joinExprs = 
splitConjunctivePredicates(transformExpression(rel.getJoinCondition))
+.map {
+  case func @ UnresolvedFunction(Seq(f), Seq(l, r), _, _, _)
+  if Seq("==", "<=>").contains(f) =>
+val l2 = l match {
+  case UnresolvedAttribute(Seq(c)) => leftDF.apply(c).expr
+  case other => other
+}
+val r2 = r match {
+  case UnresolvedAttribute(Seq(c)) => rightDF.apply(c).expr
+  case other => other
+}
+func.copy(arguments = Seq(l2, r2))
+
+  case other => other
+}
+.reduce(And)
+
+  leftDF

Review Comment:
   we must use `DataFrame.join` here to make sure the DataFrame ID will not be 
changed.
   
   
https://github.com/apache/spark/blob/faedcd91d554a00fc76116a0c188752cf036f907/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L202-L203






[GitHub] [spark] otterc commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding

2023-01-24 Thread via GitHub


otterc commented on code in PR #39725:
URL: https://github.com/apache/spark/pull/39725#discussion_r1086246214


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1356,6 +1357,18 @@ private boolean isTooLate(
 
!appShuffleMergePartitionsInfo.shuffleMergePartitions.containsKey(reduceId);
 }
 
+/**
+ * For the block bytes in the deferred buffers that are ignored, capture 
them
+ * and update pushMergeMetrics's ignoredBlockBytes.
+ */
+private void updateIgnoredBytesWithDeferredBufs() {
+  if (deferredBufs != null && !deferredBufs.isEmpty()) {
+for (ByteBuffer buf : deferredBufs) {
+  
mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.remaining());
+}
+  }
+}
+

Review Comment:
   ```suggestion
   private void updateIgnoredBytes(long numBytes) {
 if (numBytes > 0) {
   mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(numBytes);
 }
   }
   
   ```



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1315,6 +1315,7 @@ private void freeDeferredBufs() {
  */
 private void abortIfNecessary() {
   if 
(partitionInfo.shouldAbort(mergeManager.ioExceptionsThresholdDuringMerge)) {
+updateIgnoredBytesWithDeferredBufs();
 freeDeferredBufs();

Review Comment:
   ```suggestion
   updateIgnoredBytes(freeDeferredBufs());
   ```



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1461,6 +1476,7 @@ public void onComplete(String streamId) throws 
IOException {
 AppShuffleMergePartitionsInfo info =
 
appShuffleInfo.shuffles.get(partitionInfo.appAttemptShuffleMergeId.shuffleId);
 if (isTooLate(info, partitionInfo.reduceId)) {
+  updateIgnoredBytesWithDeferredBufs();
   freeDeferredBufs();

Review Comment:
   ```suggestion
 updateIgnoredBytes(freeDeferredBufs());
   ```



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1393,6 +1407,7 @@ public void onData(String streamId, ByteBuffer buf) 
throws IOException {
   // the client in cases of duplicate even though no data is written.
   if (isDuplicateBlock()) {
 freeDeferredBufs();
+mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());

Review Comment:
   ```suggestion
   updateIgnoredBytes(freeDeferredBufs() + buf.remaining());
   ```



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1520,6 +1537,7 @@ public void onComplete(String streamId) throws 
IOException {
 partitionInfo.resetChunkTracker();
   }
 } else {
+  updateIgnoredBytesWithDeferredBufs();
   freeDeferredBufs();

Review Comment:
   ```suggestion
 updateIgnoredBytes(freeDeferredBufs());
   ```



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1480,6 +1496,7 @@ public void onComplete(String streamId) throws 
IOException {
   // Identify duplicate block generated by speculative tasks. We 
respond success to
   // the client in cases of duplicate even though no data is written.
   if (isDuplicateBlock()) {
+updateIgnoredBytesWithDeferredBufs();
 freeDeferredBufs();

Review Comment:
   ```suggestion
 updateIgnoredBytes(freeDeferredBufs());
   ```



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1382,6 +1395,7 @@ public void onData(String streamId, ByteBuffer buf) 
throws IOException {
   freeDeferredBufs();
   if (isTooLateBlockPush) {
 mergeManager.pushMergeMetrics.lateBlockPushes.mark();
+mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());

Review Comment:
   ```suggestion
 long deferredBytes = freeDeferredBufs();
 if (isTooLateBlockPush) {
   mergeManager.pushMergeMetrics.lateBlockPushes.mark();
   updateIgnoredBytes(deferredBytes + buf.remaining());
   ```



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1315,6 +1315,7 @@ private void freeDeferredBufs() {
  */

Review Comment:
   I wasn't able to comment on the `freeDeferredBufs()` lines since they are not changed. We can change it as below to apply my other suggestions.
   ```
   /**
* @return total number of deferred bytes
*/
   private long freeDeferredBufs() {
 if (deferredBufs == null || deferredBufs.isEmpty()) {
   deferredBufs = 

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39734: [SPARK-41812][SPARK-41823][CONNECT][PYTHON] Fix ambiguous columns in Join

2023-01-24 Thread via GitHub


zhengruifeng commented on code in PR #39734:
URL: https://github.com/apache/spark/pull/39734#discussion_r1086250404


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -1174,27 +1174,63 @@ class SparkConnectPlanner(val session: SparkSession) {
 }
   }
 
+  private def splitConjunctivePredicates(condition: Expression): 
Seq[Expression] = {
+condition match {
+  case UnresolvedFunction(Seq("and"), Seq(cond1, cond2), _, _, _) =>
+splitConjunctivePredicates(cond1) ++ splitConjunctivePredicates(cond2)
+  case other => other :: Nil
+}
+  }
+
   private def transformJoin(rel: proto.Join): LogicalPlan = {
 assert(rel.hasLeft && rel.hasRight, "Both join sides must be present")
 if (rel.hasJoinCondition && rel.getUsingColumnsCount > 0) {
   throw InvalidPlanInput(
 s"Using columns or join conditions cannot be set at the same time in 
Join")
 }
-val joinCondition =
-  if (rel.hasJoinCondition) 
Some(transformExpression(rel.getJoinCondition)) else None
+
 val catalystJointype = transformJoinType(
   if (rel.getJoinType != null) rel.getJoinType else 
proto.Join.JoinType.JOIN_TYPE_INNER)
 val joinType = if (rel.getUsingColumnsCount > 0) {
   UsingJoin(catalystJointype, rel.getUsingColumnsList.asScala.toSeq)
 } else {
   catalystJointype
 }
-logical.Join(
-  left = transformRelation(rel.getLeft),
-  right = transformRelation(rel.getRight),
-  joinType = joinType,
-  condition = joinCondition,
-  hint = logical.JoinHint.NONE)
+
+if (rel.hasJoinCondition) {
+  val leftDF = Dataset.ofRows(session, transformRelation(rel.getLeft))
+  val rightDF = Dataset.ofRows(session, transformRelation(rel.getRight))
+  val joinExprs = 
splitConjunctivePredicates(transformExpression(rel.getJoinCondition))
+.map {
+  case func @ UnresolvedFunction(Seq(f), Seq(l, r), _, _, _)
+  if Seq("==", "<=>").contains(f) =>
+val l2 = l match {
+  case UnresolvedAttribute(Seq(c)) => leftDF.apply(c).expr
+  case other => other
+}
+val r2 = r match {
+  case UnresolvedAttribute(Seq(c)) => rightDF.apply(c).expr
+  case other => other
+}
+func.copy(arguments = Seq(l2, r2))
+
+  case other => other
+}
+.reduce(And)
+
+  leftDF

Review Comment:
   we must use `DataFrame.join` here to make sure the DataFrame ID is not 
changed.
   
   
https://github.com/apache/spark/blob/faedcd91d554a00fc76116a0c188752cf036f907/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L202-L203






[GitHub] [spark] zhengruifeng commented on a diff in pull request #39734: [SPARK-41812][SPARK-41823][CONNECT][PYTHON] Fix ambiguous columns in Join

2023-01-24 Thread via GitHub


zhengruifeng commented on code in PR #39734:
URL: https://github.com/apache/spark/pull/39734#discussion_r1086248363


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -1174,27 +1174,63 @@ class SparkConnectPlanner(val session: SparkSession) {
 }
   }
 
+  private def splitConjunctivePredicates(condition: Expression): 
Seq[Expression] = {
+condition match {
+  case UnresolvedFunction(Seq("and"), Seq(cond1, cond2), _, _, _) =>
+splitConjunctivePredicates(cond1) ++ splitConjunctivePredicates(cond2)
+  case other => other :: Nil
+}
+  }
+
   private def transformJoin(rel: proto.Join): LogicalPlan = {
 assert(rel.hasLeft && rel.hasRight, "Both join sides must be present")
 if (rel.hasJoinCondition && rel.getUsingColumnsCount > 0) {
   throw InvalidPlanInput(
 s"Using columns or join conditions cannot be set at the same time in 
Join")
 }
-val joinCondition =
-  if (rel.hasJoinCondition) 
Some(transformExpression(rel.getJoinCondition)) else None
+
 val catalystJointype = transformJoinType(
   if (rel.getJoinType != null) rel.getJoinType else 
proto.Join.JoinType.JOIN_TYPE_INNER)
 val joinType = if (rel.getUsingColumnsCount > 0) {
   UsingJoin(catalystJointype, rel.getUsingColumnsList.asScala.toSeq)
 } else {
   catalystJointype
 }
-logical.Join(
-  left = transformRelation(rel.getLeft),
-  right = transformRelation(rel.getRight),
-  joinType = joinType,
-  condition = joinCondition,
-  hint = logical.JoinHint.NONE)
+
+if (rel.hasJoinCondition) {
+  val leftDF = Dataset.ofRows(session, transformRelation(rel.getLeft))
+  val rightDF = Dataset.ofRows(session, transformRelation(rel.getRight))
+  val joinExprs = 
splitConjunctivePredicates(transformExpression(rel.getJoinCondition))
+.map {
+  case func @ UnresolvedFunction(Seq(f), Seq(l, r), _, _, _)
+  if Seq("==", "<=>").contains(f) =>

Review Comment:
   currently, both `a == a` and `a <=> a` are taken into account






[GitHub] [spark] zhengruifeng commented on a diff in pull request #39734: [SPARK-41812][SPARK-41823][CONNECT][PYTHON] Fix ambiguous columns in Join

2023-01-24 Thread via GitHub


zhengruifeng commented on code in PR #39734:
URL: https://github.com/apache/spark/pull/39734#discussion_r1086248363


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -1174,27 +1174,63 @@ class SparkConnectPlanner(val session: SparkSession) {
 }
   }
 
+  private def splitConjunctivePredicates(condition: Expression): 
Seq[Expression] = {
+condition match {
+  case UnresolvedFunction(Seq("and"), Seq(cond1, cond2), _, _, _) =>
+splitConjunctivePredicates(cond1) ++ splitConjunctivePredicates(cond2)
+  case other => other :: Nil
+}
+  }
+
   private def transformJoin(rel: proto.Join): LogicalPlan = {
 assert(rel.hasLeft && rel.hasRight, "Both join sides must be present")
 if (rel.hasJoinCondition && rel.getUsingColumnsCount > 0) {
   throw InvalidPlanInput(
 s"Using columns or join conditions cannot be set at the same time in 
Join")
 }
-val joinCondition =
-  if (rel.hasJoinCondition) 
Some(transformExpression(rel.getJoinCondition)) else None
+
 val catalystJointype = transformJoinType(
   if (rel.getJoinType != null) rel.getJoinType else 
proto.Join.JoinType.JOIN_TYPE_INNER)
 val joinType = if (rel.getUsingColumnsCount > 0) {
   UsingJoin(catalystJointype, rel.getUsingColumnsList.asScala.toSeq)
 } else {
   catalystJointype
 }
-logical.Join(
-  left = transformRelation(rel.getLeft),
-  right = transformRelation(rel.getRight),
-  joinType = joinType,
-  condition = joinCondition,
-  hint = logical.JoinHint.NONE)
+
+if (rel.hasJoinCondition) {
+  val leftDF = Dataset.ofRows(session, transformRelation(rel.getLeft))
+  val rightDF = Dataset.ofRows(session, transformRelation(rel.getRight))
+  val joinExprs = 
splitConjunctivePredicates(transformExpression(rel.getJoinCondition))
+.map {
+  case func @ UnresolvedFunction(Seq(f), Seq(l, r), _, _, _)
+  if Seq("==", "<=>").contains(f) =>

Review Comment:
   currently, only `a == a` and `a <=> a` are taken into account






[GitHub] [spark] zhengruifeng opened a new pull request, #39734: [SPARK-41812][SPARK-41823][CONNECT][PYTHON] Fix ambiguous columns in Join

2023-01-24 Thread via GitHub


zhengruifeng opened a new pull request, #39734:
URL: https://github.com/apache/spark/pull/39734

   ### What changes were proposed in this pull request?
   
   PySpark's `DataFrame.__getattr__` and `DataFrame.__getitem__` invoke `jc = self._jdf.apply(name)` in the JVM, which resolves the column name and attaches the DataFrame id via `addDataFrameIdToCol` to handle ambiguous columns; see 
https://github.com/apache/spark/blob/faedcd91d554a00fc76116a0c188752cf036f907/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1472-L1481
   
   But in Connect, the output of `DataFrame.__getattr__` and `DataFrame.__getitem__` is not bound to the input `DataFrame`; it is just an `UnresolvedAttribute`.
   
   This PR aims to fix this issue by switching to a DataFrame-API-based implementation.
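   As a rough illustration (hypothetical column names, and using the Scala Dataset API rather than the Python client), this is the disambiguation that `Dataset.apply(colName)` provides in classic Spark and that this change brings to Connect:
   ```scala
   import org.apache.spark.sql.SparkSession

   object AmbiguousJoinSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().master("local[1]").getOrCreate()
       import spark.implicits._

       val left = Seq((1, "a")).toDF("id", "v1")
       val right = Seq((1, "b")).toDF("id", "v2")

       // left("id") and right("id") remember which DataFrame they came from,
       // so this join condition resolves even though both sides have an "id" column.
       left.join(right, left("id") === right("id")).show()

       spark.stop()
     }
   }
   ```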
   
   ### Why are the changes needed?
   for parity
   
   ### Does this PR introduce _any_ user-facing change?
   yes
   
   ### How was this patch tested?
   enabled doctests and added UT
   





[GitHub] [spark] otterc commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding

2023-01-24 Thread via GitHub


otterc commented on code in PR #39725:
URL: https://github.com/apache/spark/pull/39725#discussion_r1086238663


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1382,6 +1395,7 @@ public void onData(String streamId, ByteBuffer buf) 
throws IOException {
   freeDeferredBufs();

Review Comment:
   When it comes here because of `isTooLateBlockPush`, the whole block is going to be ignored. That means the deferredBufs that are being freed will be ignored as well, so we need to add those bytes to `ignoredBlockBytes` too, correct?






[GitHub] [spark] LuciferYang commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread via GitHub


LuciferYang commented on code in PR #39732:
URL: https://github.com/apache/spark/pull/39732#discussion_r1086234357


##
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##
@@ -31,6 +34,32 @@ import org.apache.spark.ui.scope.{RDDOperationEdge, 
RDDOperationNode}
 class KVStoreProtobufSerializerSuite extends SparkFunSuite {
   private val serializer = new KVStoreProtobufSerializer()
 
+  test("All the string fields must be optional to avoid NPE") {
+val protoFile = getWorkspaceFilePath(
+  "core", "src", "main", "protobuf", "org", "apache", "spark", "status", 
"protobuf",
+  "store_types.proto")
+
+val containsStringRegex = "\\s*string .*"
+val invalidDefinition = new mutable.ArrayBuffer[(String, Int)]()
+var lineNumber = 1
+Source.fromFile(protoFile.toFile.getCanonicalPath).getLines().foreach { 
line =>
+  if (line.matches(containsStringRegex)) {

Review Comment:
   `org.apache.spark.util.Utils.tryWithResource` should be used to ensure `Source.fromFile(protoFile.toFile.getCanonicalPath)` is closed after use.
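   A minimal sketch of the suggested pattern (reusing the `protoFile` and regex from the test above; `zipWithIndex` is 0-based, hence the `+ 1`):
   ```scala
   import scala.io.Source

   import org.apache.spark.util.Utils

   // tryWithResource closes the Source even if the line scan throws.
   val invalidDefinition =
     Utils.tryWithResource(Source.fromFile(protoFile.toFile.getCanonicalPath)) { source =>
       source.getLines().zipWithIndex.collect {
         case (line, idx) if line.matches("\\s*string .*") => (line, idx + 1)
       }.toList
     }
   ```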
   
   






[GitHub] [spark] LuciferYang commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread via GitHub


LuciferYang commented on code in PR #39732:
URL: https://github.com/apache/spark/pull/39732#discussion_r1086234357


##
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##
@@ -31,6 +34,32 @@ import org.apache.spark.ui.scope.{RDDOperationEdge, 
RDDOperationNode}
 class KVStoreProtobufSerializerSuite extends SparkFunSuite {
   private val serializer = new KVStoreProtobufSerializer()
 
+  test("All the string fields must be optional to avoid NPE") {
+val protoFile = getWorkspaceFilePath(
+  "core", "src", "main", "protobuf", "org", "apache", "spark", "status", 
"protobuf",
+  "store_types.proto")
+
+val containsStringRegex = "\\s*string .*"
+val invalidDefinition = new mutable.ArrayBuffer[(String, Int)]()
+var lineNumber = 1
+Source.fromFile(protoFile.toFile.getCanonicalPath).getLines().foreach { 
line =>
+  if (line.matches(containsStringRegex)) {

Review Comment:
   `org.apache.spark.util.Utils.tryWithResource` should be used to ensure `Source.fromFile(protoFile.toFile.getCanonicalPath)` is closed.
   
   






[GitHub] [spark] xinrong-meng opened a new pull request, #39733: Setting version to 3.5.0-SNAPSHOT

2023-01-24 Thread via GitHub


xinrong-meng opened a new pull request, #39733:
URL: https://github.com/apache/spark/pull/39733

   ### What changes were proposed in this pull request?
   This PR aims to update the `master` branch version to 3.5.0-SNAPSHOT.
   
   ### Why are the changes needed?
   Start preparing for Apache Spark 3.5.0; the published snapshot version should not conflict with `branch-3.4`.
   
   ### Does this PR introduce _any_ user-facing change?
   N/A.
   
   ### How was this patch tested?
   CI.
   





[GitHub] [spark] otterc commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding

2023-01-24 Thread via GitHub


otterc commented on code in PR #39725:
URL: https://github.com/apache/spark/pull/39725#discussion_r1086008666


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1382,6 +1395,7 @@ public void onData(String streamId, ByteBuffer buf) 
throws IOException {
   freeDeferredBufs();
   if (isTooLateBlockPush) {
 mergeManager.pushMergeMetrics.lateBlockPushes.mark();
+mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());

Review Comment:
   should be` buf.remaining()`



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1356,6 +1357,18 @@ private boolean isTooLate(
 
!appShuffleMergePartitionsInfo.shuffleMergePartitions.containsKey(reduceId);
 }
 
+/**
+ * For the block bytes in the deferred buffers that are ignored, capture 
them
+ * and update pushMergeMetrics's ignoredBlockBytes.
+ */
+private void updateIgnoredBytesWithDeferredBufs() {
+  if (deferredBufs != null && !deferredBufs.isEmpty()) {
+for (ByteBuffer buf : deferredBufs) {
+  
mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.remaining());

Review Comment:
   I made a mistake with the earlier comment. For deferredBufs, we can use 
`buf.limit`.






[GitHub] [spark] gengliangwang commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread via GitHub


gengliangwang commented on code in PR #39732:
URL: https://github.com/apache/spark/pull/39732#discussion_r1086223751


##
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##
@@ -31,6 +34,32 @@ import org.apache.spark.ui.scope.{RDDOperationEdge, 
RDDOperationNode}
 class KVStoreProtobufSerializerSuite extends SparkFunSuite {
   private val serializer = new KVStoreProtobufSerializer()
 
+  test("All the string fields must be optional to avoid NPE") {

Review Comment:
   This test checks the protobuf file to make sure all the string fields are 
defined as `optional string`






[GitHub] [spark] gengliangwang commented on pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread via GitHub


gengliangwang commented on PR #39732:
URL: https://github.com/apache/spark/pull/39732#issuecomment-1403122822

   cc @LuciferYang 





[GitHub] [spark] gengliangwang opened a new pull request, #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread via GitHub


gengliangwang opened a new pull request, #39732:
URL: https://github.com/apache/spark/pull/39732

   
   
   ### What changes were proposed in this pull request?
   
   * Similar to https://github.com/apache/spark/pull/39666, handle remaining null string values in the UI protobuf serializer, including `RDDStorageInfo` and `ResourceInformation`.
   * Add a test to make sure all the string fields are defined as `optional string`.
   
   ### Why are the changes needed?
   
   
   Properly handles null string values in the protobuf serializer.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   ### How was this patch tested?
   
   New UTs
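   
   For context, a rough sketch (not necessarily the exact implementation) of the null-handling pattern the serializers rely on, with helper names mirroring the ones used in the code:
   ```scala
   object NullSafeStringHelpers {
     // Invoke the builder's setter only for non-null values, e.g.
     // setStringField(info.name, builder.setName).
     def setStringField(input: String, setter: String => Any): Unit = {
       if (input != null) {
         setter(input)
       }
     }

     // Read the field only when the proto reports it as present, e.g.
     // getStringField(proto.hasName, () => proto.getName); otherwise return null.
     def getStringField(hasField: Boolean, get: () => String): String = {
       if (hasField) get() else null
     }
   }
   ```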





[GitHub] [spark] LuciferYang commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests

2023-01-24 Thread via GitHub


LuciferYang commented on code in PR #39712:
URL: https://github.com/apache/spark/pull/39712#discussion_r1086208244


##
connector/connect/client/jvm/pom.xml:
##
@@ -75,6 +76,13 @@
   mockito-core
   test
 
+

Review Comment:
   cc @dongjoon-hyun, also cc @pan3793. Do you have any suggestions for this?
   
   






[GitHub] [spark] LuciferYang commented on pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-24 Thread via GitHub


LuciferYang commented on PR #39642:
URL: https://github.com/apache/spark/pull/39642#issuecomment-1403096069

   done. Thank you for reviewing the code during the holiday :) @gengliangwang 





[GitHub] [spark] gengliangwang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-24 Thread via GitHub


gengliangwang commented on code in PR #39642:
URL: https://github.com/apache/spark/pull/39642#discussion_r1086194955


##
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setJMapField, 
setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+.addModule(DefaultScalaModule)
+.build()
+
+  def serialize(process: StreamingQueryProgress): 
StoreTypes.StreamingQueryProgress = {
+val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+if (process.id != null) {
+  builder.setId(process.id.toString)
+}
+if (process.runId != null) {
+  builder.setRunId(process.runId.toString)
+}
+setStringField(process.name, builder.setName)
+setStringField(process.timestamp, builder.setTimestamp)
+builder.setBatchId(process.batchId)
+builder.setBatchDuration(process.batchDuration)
+setJMapField(process.durationMs, builder.putAllDurationMs)
+setJMapField(process.eventTime, builder.putAllEventTime)
+process.stateOperators.foreach(
+  s => 
builder.addStateOperators(StateOperatorProgressSerializer.serialize(s)))
+process.sources.foreach(
+  s => builder.addSources(SourceProgressSerializer.serialize(s))
+)
+builder.setSink(SinkProgressSerializer.serialize(process.sink))
+setJMapField(process.observedMetrics, putAllObservedMetrics(builder, _))
+builder.build()
+  }
+
+  def deserialize(process: StoreTypes.StreamingQueryProgress): 
StreamingQueryProgress = {
+val id = if (process.hasId) {
+  UUID.fromString(process.getId)
+} else null
+val runId = if (process.hasId) {
+  UUID.fromString(process.getRunId)
+} else null
+new StreamingQueryProgress(
+  id = id,
+  runId = runId,
+  name = getStringField(process.hasName, () => process.getName),
+  timestamp = getStringField(process.hasTimestamp, () => 
process.getTimestamp),
+  batchId = process.getBatchId,
+  batchDuration = process.getBatchDuration,
+  durationMs = new JHashMap(process.getDurationMsMap),
+  eventTime = new JHashMap(process.getEventTimeMap),
+  stateOperators =
+
StateOperatorProgressSerializer.deserializeToArray(process.getStateOperatorsList),
+  sources = 
SourceProgressSerializer.deserializeToArray(process.getSourcesList),
+  sink = SinkProgressSerializer.deserialize(process.getSink),
+  observedMetrics = convertToObservedMetrics(process.getObservedMetricsMap)
+)
+  }
+
+  private def putAllObservedMetrics(
+  builder: StoreTypes.StreamingQueryProgress.Builder,
+  observedMetrics: JMap[String, Row]): Unit = {
+observedMetrics.forEach {
+  case (k, v) => builder.putObservedMetrics(k, 
mapper.writeValueAsString(v))

Review Comment:
nit: Let's add a one-line comment to mention why we choose to encode the row object with JSON.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-24 Thread via GitHub


gengliangwang commented on code in PR #39642:
URL: https://github.com/apache/spark/pull/39642#discussion_r1086190132


##
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+.addModule(DefaultScalaModule)
+.build()
+
+  def serialize(process: StreamingQueryProgress): 
StoreTypes.StreamingQueryProgress = {
+import org.apache.spark.status.protobuf.Utils.setJMapField
+val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+if (process.id != null) {

Review Comment:
Never mind, let's keep the current code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sadikovi commented on pull request #39660: [SPARK-42128][SQL] Support TOP (N) for MS SQL Server dialect as an alternative to Limit pushdown

2023-01-24 Thread via GitHub


sadikovi commented on PR #39660:
URL: https://github.com/apache/spark/pull/39660#issuecomment-1403067310

   @beliefer @srowen @dongjoon-hyun Could you please check the following 
comments: 
   - https://github.com/apache/spark/pull/39660#discussion_r1084578229
   - https://github.com/apache/spark/pull/39660#discussion_r1084580060
   
Do you prefer to redesign the JdbcDialects API for this work as a follow-up? Or should I close this PR and just work on the redesign directly? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #39731: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread via GitHub


HyukjinKwon closed pull request #39731: [SPARK-42177][INFRA][3.4] Change master 
to branch-3.4 in GitHub Actions 
URL: https://github.com/apache/spark/pull/39731


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #39731: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread via GitHub


HyukjinKwon commented on PR #39731:
URL: https://github.com/apache/spark/pull/39731#issuecomment-1403053335

   Merged to branch-3.4.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #39731: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread via GitHub


HyukjinKwon commented on PR #39731:
URL: https://github.com/apache/spark/pull/39731#issuecomment-1403053271

I'm going to push this to recover the build.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #39729: [SPARK-42176][SQL] Fix cast of a boolean value to timestamp

2023-01-24 Thread via GitHub


HyukjinKwon closed pull request #39729: [SPARK-42176][SQL] Fix cast of a 
boolean value to timestamp
URL: https://github.com/apache/spark/pull/39729


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #39729: [SPARK-42176][SQL] Fix cast of a boolean value to timestamp

2023-01-24 Thread via GitHub


HyukjinKwon commented on PR #39729:
URL: https://github.com/apache/spark/pull/39729#issuecomment-1403052158

   Merged to master, branch-3.4, and branch-3.3.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #39731: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread via GitHub


HyukjinKwon commented on PR #39731:
URL: https://github.com/apache/spark/pull/39731#issuecomment-1403048032

   cc @xinrong-meng @dongjoon-hyun @Yikun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon opened a new pull request, #39731: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread via GitHub


HyukjinKwon opened a new pull request, #39731:
URL: https://github.com/apache/spark/pull/39731

   ### What changes were proposed in this pull request?
   
We have now cut `branch-3.4`, but `branch-3.4` still points to the `master` reference (see https://github.com/apache/spark/blob/branch-3.4/.github/workflows/build_and_test.yml), which makes the CI fail: https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029
   
   ### Why are the changes needed?
   
   To recover the CI
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, test-only.
   
   ### How was this patch tested?
   
   CI should be monitored after this PR gets merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #39730: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread via GitHub


HyukjinKwon closed pull request #39730: [SPARK-42177][INFRA][3.4] Change master 
to branch-3.4 in GitHub Actions
URL: https://github.com/apache/spark/pull/39730


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon opened a new pull request, #39730: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread via GitHub


HyukjinKwon opened a new pull request, #39730:
URL: https://github.com/apache/spark/pull/39730

   ### What changes were proposed in this pull request?
   
   ### Why are the changes needed?
   
   To recover the CI, see 
https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, test-only.
   
   ### How was this patch tested?
   
   CI should be monitored after this PR gets merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhenlineo commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests

2023-01-24 Thread via GitHub


zhenlineo commented on code in PR #39712:
URL: https://github.com/apache/spark/pull/39712#discussion_r1086087241


##
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala:
##
@@ -44,7 +45,7 @@ import org.apache.spark.sql.functions.lit
  *
  * @since 3.4.0
  */
-class Column private[sql] (private[sql] val expr: proto.Expression) {
+class Column private[sql] (private[sql] val expr: proto.Expression) extends 
Logging {

Review Comment:
The Logging trait is needed for binary compatibility: the class type must be exactly the same.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-24 Thread via GitHub


LuciferYang commented on code in PR #39642:
URL: https://github.com/apache/spark/pull/39642#discussion_r1086143970


##
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:
##
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setJMapField, 
setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+.addModule(DefaultScalaModule)
+.build()
+
+  def serialize(process: StreamingQueryProgress): 
StoreTypes.StreamingQueryProgress = {
+val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+if (process.id != null) {
+  builder.setId(process.id.toString)
+}
+if (process.runId != null) {
+  builder.setRunId(process.runId.toString)
+}
+setStringField(process.name, builder.setName)
+setStringField(process.timestamp, builder.setTimestamp)
+builder.setBatchId(process.batchId)
+builder.setBatchDuration(process.batchDuration)
+setJMapField(process.durationMs, builder.putAllDurationMs)
+setJMapField(process.eventTime, builder.putAllEventTime)
+process.stateOperators.foreach(
+  s => 
builder.addStateOperators(StateOperatorProgressSerializer.serialize(s)))
+process.sources.foreach(
+  s => builder.addSources(SourceProgressSerializer.serialize(s))
+)
+builder.setSink(SinkProgressSerializer.serialize(process.sink))
+setJMapField(process.observedMetrics, putAllObservedMetrics(builder, _))
+builder.build()
+  }
+
+  def deserialize(process: StoreTypes.StreamingQueryProgress): 
StreamingQueryProgress = {
+val id = if (process.hasId) {
+  UUID.fromString(process.getId)
+} else null
+val runId = if (process.hasId) {
+  UUID.fromString(process.getRunId)
+} else null
+new StreamingQueryProgress(
+  id = id,
+  runId = runId,
+  name = getStringField(process.hasName, () => process.getName),
+  timestamp = getStringField(process.hasTimestamp, () => 
process.getTimestamp),
+  batchId = process.getBatchId,
+  batchDuration = process.getBatchDuration,
+  durationMs = new JHashMap(process.getDurationMsMap),
+  eventTime = new JHashMap(process.getEventTimeMap),
+  stateOperators =
+
StateOperatorProgressSerializer.deserializeToArray(process.getStateOperatorsList),
+  sources = 
SourceProgressSerializer.deserializeToArray(process.getSourcesList),
+  sink = SinkProgressSerializer.deserialize(process.getSink),
+  observedMetrics = convertToObservedMetrics(process.getObservedMetricsMap)
+)
+  }
+
+  private def putAllObservedMetrics(

Review Comment:
Changed this to define a class-level private function.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sadikovi commented on a diff in pull request #39729: [SPARK-42176][SQL] Fix cast of a boolean value to timestamp

2023-01-24 Thread via GitHub


sadikovi commented on code in PR #39729:
URL: https://github.com/apache/spark/pull/39729#discussion_r1086142703


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala:
##
@@ -608,6 +608,11 @@ class CastWithAnsiOffSuite extends CastSuiteBase {
 checkEvaluation(cast(input, StringType), "1.23E-7")
   }
 
+  test("SPARK-42176: cast boolean to timestamp") {

Review Comment:
   Oh, thanks for fixing it!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39729: [SPARK-42175][SQL] Fix cast of a boolean value to timestamp

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39729:
URL: https://github.com/apache/spark/pull/39729#discussion_r1086138284


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala:
##
@@ -608,6 +608,11 @@ class CastWithAnsiOffSuite extends CastSuiteBase {
 checkEvaluation(cast(input, StringType), "1.23E-7")
   }
 
+  test("SPARK-42176: cast boolean to timestamp") {

Review Comment:
   The PR title seems to show a wrong JIRA ID, `SPARK-42175`. Let me fix it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-24 Thread via GitHub


LuciferYang commented on code in PR #39642:
URL: https://github.com/apache/spark/pull/39642#discussion_r1086137202


##
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+.addModule(DefaultScalaModule)
+.build()
+
+  def serialize(process: StreamingQueryProgress): 
StoreTypes.StreamingQueryProgress = {
+import org.apache.spark.status.protobuf.Utils.setJMapField
+val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+if (process.id != null) {

Review Comment:
   
https://github.com/apache/spark/blob/c43be4a1b23b599ef9c7f81c2cba619bcf7e/core/src/main/scala/org/apache/spark/status/protobuf/Utils.scala#L27-L31
   
   Or should we refactor `setStringField` to 
   
   ```scala
  def setStringField[T](input: T, f: String => Any): Unit = {
    if (input != null) {
      f(input.toString)
    }
  }
   ```
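
A self-contained sketch of how the generic variant could then be reused for UUID-typed fields; the `StringBuilder` stands in for the protobuf builder and is purely illustrative:

```scala
import java.util.UUID

// Assumed generic helper from the suggestion above.
def setStringField[T](input: T, f: String => Any): Unit =
  if (input != null) f(input.toString)

val sb = new StringBuilder
setStringField(UUID.randomUUID(), s => sb.append("id=").append(s)) // UUID handled like a String
setStringField(null, s => sb.append("runId=").append(s))           // null input: setter is skipped
```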



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39726:
URL: https://github.com/apache/spark/pull/39726#discussion_r1086136677


##
sql/core/src/test/resources/sql-tests/inputs/describe.sql:
##
@@ -97,3 +97,25 @@ DROP VIEW temp_v;
 DROP VIEW temp_Data_Source_View;
 
 DROP VIEW v;
+
+-- Show column default values
+CREATE TABLE d (a STRING DEFAULT 'show-create-table.sql', b INT DEFAULT 42) 
USING parquet COMMENT 'table_comment';

Review Comment:
Oh, I meant to refer to the `show-create-table.sql` file of this PR instead of mentioning the literal `show-create-table.sql`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-24 Thread via GitHub


LuciferYang commented on code in PR #39642:
URL: https://github.com/apache/spark/pull/39642#discussion_r1086118621


##
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setJMapField, 
setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+.addModule(DefaultScalaModule)
+.build()
+
+  def serialize(process: StreamingQueryProgress): 
StoreTypes.StreamingQueryProgress = {
+val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+if (process.id != null) {
+  builder.setId(process.id.toString)
+}
+if (process.runId != null) {
+  builder.setRunId(process.runId.toString)
+}
+setStringField(process.name, builder.setName)
+setStringField(process.timestamp, builder.setTimestamp)
+builder.setBatchId(process.batchId)
+builder.setBatchDuration(process.batchDuration)
+setJMapField(process.durationMs, builder.putAllDurationMs)
+setJMapField(process.eventTime, builder.putAllEventTime)
+process.stateOperators.foreach(
+  s => 
builder.addStateOperators(StateOperatorProgressSerializer.serialize(s)))
+process.sources.foreach(
+  s => builder.addSources(SourceProgressSerializer.serialize(s))
+)
+builder.setSink(SinkProgressSerializer.serialize(process.sink))
+def serializeMetrics(metrics: JMap[String, Row]): Unit = {

Review Comment:
Using `setJMapField` here requires defining an additional function or using an anonymous function, because this is not a simple `putAll` scenario.

Is there a better way to write this? @gengliangwang 
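
For illustration, a self-contained sketch of the shape being discussed, with an assumed `setJMapField` signature (not the actual `Utils` code):

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

// Assumed shape of the helper: apply f only when the map is non-null.
def setJMapField[K, V](input: JMap[K, V], f: JMap[K, V] => Unit): Unit =
  if (input != null) f(input)

val out = new JHashMap[String, String]()
val in = new JHashMap[String, Integer]()
in.put("a", 1)

// Not a plain putAll: each value is transformed first, hence the anonymous function.
setJMapField(in, (m: JMap[String, Integer]) => m.forEach((k, v) => out.put(k, v.toString)))
```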
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dtenedor commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657

2023-01-24 Thread via GitHub


dtenedor commented on code in PR #39726:
URL: https://github.com/apache/spark/pull/39726#discussion_r1086099890


##
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:
##
@@ -645,6 +645,17 @@ case class DescribeTableCommand(
   } else if (isExtended) {
 describeFormattedTableInfo(metadata, result)
   }
+
+  // If any columns have default values, append them to the result.
+  if (metadata.schema.fields.exists(_.metadata.contains(
+ResolveDefaultColumns.CURRENT_DEFAULT_COLUMN_METADATA_KEY))) {

Review Comment:
   Done.



##
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:
##
@@ -645,6 +645,17 @@ case class DescribeTableCommand(
   } else if (isExtended) {
 describeFormattedTableInfo(metadata, result)
   }
+
+  // If any columns have default values, append them to the result.
+  if (metadata.schema.fields.exists(_.metadata.contains(
+ResolveDefaultColumns.CURRENT_DEFAULT_COLUMN_METADATA_KEY))) {
+append(result, "", "", "")
+append(result, "# Column Default Information", "", "")

Review Comment:
   Sounds good, done.



##
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala:
##
@@ -160,7 +164,7 @@ case class StructField(
*/
   def toDDL: String = {
 val nullString = if (nullable) "" else " NOT NULL"
-s"${quoteIfNeeded(name)} ${dataType.sql}${nullString}$getDDLComment"
+s"${quoteIfNeeded(name)} 
${dataType.sql}${nullString}$getDDLDefault$getDDLComment"

Review Comment:
   Good Q, I added a test case covering this.



##
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowCreateTableSuite.scala:
##
@@ -161,6 +161,30 @@ trait ShowCreateTableSuiteBase extends 
command.ShowCreateTableSuiteBase
   assert(cause.getMessage.contains("Use `SHOW CREATE TABLE` without `AS 
SERDE` instead"))
 }
   }
+
+  test("show create table with default column values") {
+withNamespaceAndTable(ns, table) { t =>
+  sql(
+s"""
+   |CREATE TABLE $t (
+   |  a bigint NOT NULL,
+   |  b bigint DEFAULT 42,
+   |  c string DEFAULT 'abc' COMMENT 'comment'
+   |)
+   |using parquet

Review Comment:
   Done



##
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala:
##
@@ -141,6 +141,10 @@ case class StructField(
 }
   }
 
+  private def getDDLDefault = getCurrentDefaultValue()
+.map(" DEFAULT " + _)

Review Comment:
Good question, I was originally confused about this as well :) The default value is exactly the string that appeared in the CREATE TABLE or ALTER TABLE statement. So, for example, if we had `CREATE TABLE t (a STRING DEFAULT 'str')`, then this column's metadata has the value `'str'` (including the single quotes). So we have to print it back exactly as-is here (e.g. I tried adding `escapeSingleQuotedString` like `getDDLComment` below, but it added extra escaping which we did not want).
   
   The extra test cases you suggested should show that this works even for 
different types of string-typed defaults.
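
A hedged sketch of that behavior, assuming the `withCurrentDefaultValue` setter that pairs with the `getCurrentDefaultValue` call shown in the diff above (illustration only, not the PR's test code):

```scala
import org.apache.spark.sql.types.{StringType, StructField}

// The DEFAULT text is stored verbatim, single quotes included, so toDDL is
// expected to emit it back without extra escaping.
val f = StructField("a", StringType).withCurrentDefaultValue("'str'")
// f.toDDL  =>  a STRING DEFAULT 'str'
```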



##
sql/core/src/test/resources/sql-tests/inputs/describe.sql:
##
@@ -97,3 +97,14 @@ DROP VIEW temp_v;
 DROP VIEW temp_Data_Source_View;
 
 DROP VIEW v;
+
+-- Show column default values
+CREATE TABLE d (a STRING, b INT DEFAULT 42) USING parquet COMMENT 
'table_comment';

Review Comment:
   Sounds good, done.



##
sql/core/src/test/resources/sql-tests/inputs/show-create-table.sql:
##
@@ -45,6 +45,14 @@ SHOW CREATE TABLE tbl;
 DROP TABLE tbl;
 
 
+-- default column values
+CREATE TABLE tbl (a INT, b STRING DEFAULT 'abc', c INT DEFAULT 42) USING 
parquet

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] itholic commented on pull request #39721: [SPARK-42171][PYSPARK][TESTS] Fix `pyspark-errors` module and enable it in GitHub Action

2023-01-24 Thread via GitHub


itholic commented on PR #39721:
URL: https://github.com/apache/spark/pull/39721#issuecomment-1402916066

   Late LGTM, thanks for fixing!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #39728: [SPARK-42173][CORE] RpcAddress equality can fail

2023-01-24 Thread via GitHub


dongjoon-hyun commented on PR #39728:
URL: https://github.com/apache/spark/pull/39728#issuecomment-1402912399

   cc @mridulm , too


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on a diff in pull request #39728: [SPARK-42173][CORE] RpcAddress equality can fail

2023-01-24 Thread via GitHub


holdenk commented on code in PR #39728:
URL: https://github.com/apache/spark/pull/39728#discussion_r1086106561


##
core/src/main/scala/org/apache/spark/rpc/RpcAddress.scala:
##
@@ -23,30 +23,37 @@ import org.apache.spark.util.Utils
 /**
  * Address for an RPC environment, with hostname and port.
  */
-private[spark] case class RpcAddress(_host: String, port: Int) {
-
-  lazy val host: String = Utils.addBracketsIfNeeded(_host)
+private[spark] case class RpcAddress(host: String, port: Int) {
 
   def hostPort: String = host + ":" + port
 
   /** Returns a string in the form of "spark://host:port". */
   def toSparkURL: String = "spark://" + hostPort
 
   override def toString: String = hostPort
+
+

Review Comment:
   Sure :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-24 Thread via GitHub


LuciferYang commented on code in PR #39642:
URL: https://github.com/apache/spark/pull/39642#discussion_r1086105668


##
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+.addModule(DefaultScalaModule)
+.build()
+
+  def serialize(process: StreamingQueryProgress): 
StoreTypes.StreamingQueryProgress = {
+import org.apache.spark.status.protobuf.Utils.setJMapField
+val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+if (process.id != null) {
+  builder.setId(process.id.toString)
+}
+if (process.runId != null) {
+  builder.setRunId(process.runId.toString)

Review Comment:
ditto, `process.runId` is also a UUID
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-24 Thread via GitHub


LuciferYang commented on code in PR #39642:
URL: https://github.com/apache/spark/pull/39642#discussion_r1086105668


##
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+.addModule(DefaultScalaModule)
+.build()
+
+  def serialize(process: StreamingQueryProgress): 
StoreTypes.StreamingQueryProgress = {
+import org.apache.spark.status.protobuf.Utils.setJMapField
+val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+if (process.id != null) {
+  builder.setId(process.id.toString)
+}
+if (process.runId != null) {
+  builder.setRunId(process.runId.toString)

Review Comment:
ditto, `process.runId` is also a UUID
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-24 Thread via GitHub


LuciferYang commented on code in PR #39642:
URL: https://github.com/apache/spark/pull/39642#discussion_r1086105619


##
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+.addModule(DefaultScalaModule)
+.build()
+
+  def serialize(process: StreamingQueryProgress): 
StoreTypes.StreamingQueryProgress = {
+import org.apache.spark.status.protobuf.Utils.setJMapField
+val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+if (process.id != null) {

Review Comment:
`process.id` is a UUID, so it cannot be the first parameter of `setStringField`.
   



##
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+.addModule(DefaultScalaModule)
+.build()
+
+  def serialize(process: StreamingQueryProgress): 
StoreTypes.StreamingQueryProgress = {
+import org.apache.spark.status.protobuf.Utils.setJMapField
+val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+if (process.id != null) {
+  builder.setId(process.id.toString)
+}
+if (process.runId != null) {
+  builder.setRunId(process.runId.toString)

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39728: [SPARK-42173][CORE] RpcAddress equality can fail

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39728:
URL: https://github.com/apache/spark/pull/39728#discussion_r1086103013


##
core/src/main/scala/org/apache/spark/rpc/RpcAddress.scala:
##
@@ -23,30 +23,37 @@ import org.apache.spark.util.Utils
 /**
  * Address for an RPC environment, with hostname and port.
  */
-private[spark] case class RpcAddress(_host: String, port: Int) {
-
-  lazy val host: String = Utils.addBracketsIfNeeded(_host)
+private[spark] case class RpcAddress(host: String, port: Int) {
 
   def hostPort: String = host + ":" + port
 
   /** Returns a string in the form of "spark://host:port". */
   def toSparkURL: String = "spark://" + hostPort
 
   override def toString: String = hostPort
+
+

Review Comment:
   Could you revert this empty line addition?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #39727: [SPARK-42174][PYTHON][INFRA] Use `scikit-learn` instead of `sklearn`

2023-01-24 Thread via GitHub


dongjoon-hyun commented on PR #39727:
URL: https://github.com/apache/spark/pull/39727#issuecomment-1402900556

   Thank you, @HyukjinKwon !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sadikovi commented on pull request #39729: [SPARK-42175][SQL] Fix cast of a boolean value to timestamp

2023-01-24 Thread via GitHub


sadikovi commented on PR #39729:
URL: https://github.com/apache/spark/pull/39729#issuecomment-1402899890

   @dongjoon-hyun @srowen Can you review this PR? Thanks. 
It is a small fix for the `Cast` expression.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sadikovi opened a new pull request, #39729: [SPARK-42175] Fix cast of a boolean value to timestamp

2023-01-24 Thread via GitHub


sadikovi opened a new pull request, #39729:
URL: https://github.com/apache/spark/pull/39729

   
   
   ### What changes were proposed in this pull request?
   
   
   The PR fixes an issue when casting a boolean to timestamp. 
   
   While `select cast(true as timestamp)` works and returns 
`1970-01-01T00:00:00.000+`, casting `false` to timestamp fails with the 
following error:
   ```
   IllegalArgumentException: requirement failed: Literal must have a 
corresponding value to timestamp, but class Integer found.
   ```
   
   SBT test also fails with this error:
   ```
   [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
java.lang.Long
   [info]   at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
   [info]   at 
org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178)
   [info]   at 
org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178)
   ```
   
The issue was that we need to return `0L` instead of `0` when converting `false` to a long.
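
A minimal repro sketch of the underlying typing issue (not the actual `Cast` code): with an expected result type of `Any`, the `Int` branch is not widened to `Long`, so the `false` case produces a boxed `Integer` that later fails `unboxToLong`.

```scala
// Buggy shape: the else branch stays an Int and is boxed as java.lang.Integer.
def toMicrosBuggy(b: Boolean): Any = if (b) 1L else 0

// Fixed shape: both branches are Long, so the boxed value is java.lang.Long.
def toMicrosFixed(b: Boolean): Any = if (b) 1L else 0L
```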
   
   ### Why are the changes needed?
   
   
   Fixes a small bug in cast.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No.
   
   ### How was this patch tested?
   
   
   I added a unit test to verify the fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests

2023-01-24 Thread via GitHub


HyukjinKwon commented on code in PR #39712:
URL: https://github.com/apache/spark/pull/39712#discussion_r1086096820


##
connector/connect/client/jvm/pom.xml:
##
@@ -75,6 +76,13 @@
   mockito-core
   test
 
+

Review Comment:
Gotcha. Let's probably add a couple of comments here and there to make it clear; I am sure this is confusing to other developers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhenlineo commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests

2023-01-24 Thread via GitHub


zhenlineo commented on code in PR #39712:
URL: https://github.com/apache/spark/pull/39712#discussion_r1086095693


##
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CompatibilitySuite.scala:
##
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.client
+
+import java.io.File
+import java.net.URLClassLoader
+import java.util.regex.Pattern
+
+import com.typesafe.tools.mima.core._
+import com.typesafe.tools.mima.lib.MiMaLib
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+import org.apache.spark.sql.connect.client.util.IntegrationTestUtils._
+
+/**
+ * This test requires the following artifacts built before running the tests:
+ * {{{
+ * spark-sql
+ * spark-connect-client-jvm
+ * }}}
+ * To build the above artifact, use e.g. `sbt package` or `mvn clean install 
-DskipTests`.
+ *
+ * When debugging this test, if any changes to the client API, the client jar 
need to be built
+ * before re-running the test. An example workflow with SBT for this test:
+ *   1. Compatibility test reported an unexpected client API change.
+ *   1. Fix the wrong client API.
+ *   1. Build the client jar: `sbt package`
+ *   1. Run the test again: `sbt "testOnly
+ *  org.apache.spark.sql.connect.client.CompatibilitySuite"`
+ */
+class CompatibilitySuite extends AnyFunSuite { // scalastyle:ignore funsuite
+
+  private lazy val clientJar: File =
+findJar(
+  "connector/connect/client/jvm",
+  "spark-connect-client-jvm-assembly",
+  "spark-connect-client-jvm")
+
+  private lazy val sqlJar: File = findJar("sql/core", "spark-sql", "spark-sql")
+
+  test("compatibility mima tests") {
+val mima = new MiMaLib(Seq(clientJar, sqlJar))
+val allProblems = mima.collectProblems(sqlJar, clientJar, List.empty)
+val includedRules = Seq(
+  IncludeByName("org.apache.spark.sql.Column"),
+  IncludeByName("org.apache.spark.sql.Column$"),
+  IncludeByName("org.apache.spark.sql.Dataset"),
+  // TODO Add the Dataset object definition

Review Comment:
   https://issues.apache.org/jira/browse/SPARK-42175 This was skipped as I do 
not want to include too much of the API implementation in the compatibility test PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhenlineo commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests

2023-01-24 Thread via GitHub


zhenlineo commented on code in PR #39712:
URL: https://github.com/apache/spark/pull/39712#discussion_r1086093964


##
connector/connect/client/jvm/pom.xml:
##
@@ -75,6 +76,13 @@
   mockito-core
   test
 
+

Review Comment:
   The SBT MiMa check has some limitations when run as an SBT rule:
   it works best for a stable API, e.g. current vs. previous release, and it is not very 
friendly to configure for comparing e.g. the scala-client against sql while we are actively 
working on the scala-client API.
   To be more specific, the problems I hit were:
   1. I cannot configure the MiMa rule to find the current SQL SNAPSHOT jar.
   2. I cannot use a ClassLoader correctly in the SBT rule to load all methods in 
the client API.
   
   As a result, I ended up with this test, where we have more freedom to grow the API 
test coverage along with the client API; see the sketch below for the usual SBT setup this avoids.
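   For contrast, a minimal sketch of the usual sbt-mima-plugin wiring, assuming only the
   standard `mimaPreviousArtifacts` key (hypothetical fragment, not Spark's actual build
   config): the plugin is built around comparing a module against its own previously
   published artifact, which is why pointing it at a locally built spark-sql SNAPSHOT
   jar is awkward.

   ```scala
   // Hypothetical build.sbt fragment (assumes sbt-mima-plugin is on the plugin classpath):
   // the plugin checks the current compile output against a previously *released*
   // artifact of the same module.
   mimaPreviousArtifacts := Set(
     "org.apache.spark" %% "spark-connect-client-jvm" % "3.4.0"
   )
   // There is no first-class setting for "compare this module against the spark-sql
   // SNAPSHOT jar built in this workspace", which is what CompatibilitySuite does
   // directly with MiMaLib.collectProblems.
   ```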



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk opened a new pull request, #39728: [SPARK-42173][CORE] RpcAddress equality can fail

2023-01-24 Thread via GitHub


holdenk opened a new pull request, #39728:
URL: https://github.com/apache/spark/pull/39728

   ### What changes were proposed in this pull request?
   
   When constructing an RpcAddress, use InetUtils to get a consistently 
formatted IPv6 address when the environment uses an IPv6 address.
   
   ### Why are the changes needed?
   
   We use RpcAddress equality for various tasks involving executors, and an 
equality mismatch can cause interesting errors.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Log messages might change from sometimes showing all of the zeros in a v6 address 
to showing the shortened form.
   
   ### How was this patch tested?
   
   Existing tests + new unit test showing that [::0:1] is formatted to [::1]
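   A hedged illustration of the kind of canonicalization involved, using Guava's
   `InetAddresses` (the PR's actual helper may differ):

   ```scala
   import com.google.common.net.InetAddresses

   // Hypothetical sketch, not the PR's code: put IPv6 literals into their RFC 5952
   // canonical form so that textually different but equivalent addresses yield the
   // same host string (and therefore equal RpcAddress values).
   def canonicalizeHost(host: String): String = {
     if (host.contains(":")) {
       InetAddresses.toAddrString(InetAddresses.forString(host)) // "::0:1" -> "::1"
     } else {
       host // IPv4 addresses and host names are left untouched
     }
   }
   ```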


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #39656: [SPARK-42119][SQL] Add built-in table-valued functions inline and inline_outer

2023-01-24 Thread via GitHub


HyukjinKwon closed pull request #39656: [SPARK-42119][SQL] Add built-in 
table-valued functions inline and inline_outer
URL: https://github.com/apache/spark/pull/39656


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #39656: [SPARK-42119][SQL] Add built-in table-valued functions inline and inline_outer

2023-01-24 Thread via GitHub


HyukjinKwon commented on PR #39656:
URL: https://github.com/apache/spark/pull/39656#issuecomment-1402878991

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhenlineo commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests

2023-01-24 Thread via GitHub


zhenlineo commented on code in PR #39712:
URL: https://github.com/apache/spark/pull/39712#discussion_r1086087241


##
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala:
##
@@ -44,7 +45,7 @@ import org.apache.spark.sql.functions.lit
  *
  * @since 3.4.0
  */
-class Column private[sql] (private[sql] val expr: proto.Expression) {
+class Column private[sql] (private[sql] val expr: proto.Expression) extends 
Logging {

Review Comment:
   The Logging trait is needed for binary compatibility: the class type has to be 
exactly the same.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #39727: [SPARK-42174][PYTHON][INFRA] Use `scikit-learn` instead of `sklearn`

2023-01-24 Thread via GitHub


HyukjinKwon closed pull request #39727: [SPARK-42174][PYTHON][INFRA] Use 
`scikit-learn` instead of `sklearn`
URL: https://github.com/apache/spark/pull/39727


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #39727: [SPARK-42174][PYTHON][INFRA] Use `scikit-learn` instead of `sklearn`

2023-01-24 Thread via GitHub


HyukjinKwon commented on PR #39727:
URL: https://github.com/apache/spark/pull/39727#issuecomment-1402877171

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #39717: [SPARK-42168][3.3][SQL][PYTHON] Fix required child distribution of FlatMapCoGroupsInPandas (as in CoGroup)

2023-01-24 Thread via GitHub


HyukjinKwon commented on PR #39717:
URL: https://github.com/apache/spark/pull/39717#issuecomment-1402875318

   cc @sunchao 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests

2023-01-24 Thread via GitHub


HyukjinKwon commented on code in PR #39712:
URL: https://github.com/apache/spark/pull/39712#discussion_r1086085628


##
connector/connect/client/jvm/pom.xml:
##
@@ -75,6 +76,13 @@
   mockito-core
   test
 
+

Review Comment:
   Can we use SBT to check this instead of Maven? We have one place for MiMa so 
far in SBT (See also  `project/MimaBuild.scala`, and `dev/mima`)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests

2023-01-24 Thread via GitHub


HyukjinKwon commented on code in PR #39712:
URL: https://github.com/apache/spark/pull/39712#discussion_r1086082677


##
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala:
##
@@ -44,7 +45,7 @@ import org.apache.spark.sql.functions.lit
  *
  * @since 3.4.0
  */
-class Column private[sql] (private[sql] val expr: proto.Expression) {
+class Column private[sql] (private[sql] val expr: proto.Expression) extends 
Logging {

Review Comment:
   Seems like we're not using this Logging



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests

2023-01-24 Thread via GitHub


HyukjinKwon commented on code in PR #39712:
URL: https://github.com/apache/spark/pull/39712#discussion_r1086082530


##
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CompatibilitySuite.scala:
##
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect.client
+
+import java.io.File
+import java.net.URLClassLoader
+import java.util.regex.Pattern
+
+import com.typesafe.tools.mima.core._
+import com.typesafe.tools.mima.lib.MiMaLib
+import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+import org.apache.spark.sql.connect.client.util.IntegrationTestUtils._
+
+/**
+ * This test requires the following artifacts built before running the tests:
+ * {{{
+ * spark-sql
+ * spark-connect-client-jvm
+ * }}}
+ * To build the above artifacts, use e.g. `sbt package` or `mvn clean install -DskipTests`.
+ *
+ * When debugging this test, if there are any changes to the client API, the client jar
+ * needs to be rebuilt before re-running the test. An example workflow with SBT for this test:
+ *   1. Compatibility test reported an unexpected client API change.
+ *   1. Fix the wrong client API.
+ *   1. Build the client jar: `sbt package`
+ *   1. Run the test again: `sbt "testOnly
+ *  org.apache.spark.sql.connect.client.CompatibilitySuite"`
+ */
+class CompatibilitySuite extends AnyFunSuite { // scalastyle:ignore funsuite
+
+  private lazy val clientJar: File =
+findJar(
+  "connector/connect/client/jvm",
+  "spark-connect-client-jvm-assembly",
+  "spark-connect-client-jvm")
+
+  private lazy val sqlJar: File = findJar("sql/core", "spark-sql", "spark-sql")
+
+  test("compatibility mima tests") {
+val mima = new MiMaLib(Seq(clientJar, sqlJar))
+val allProblems = mima.collectProblems(sqlJar, clientJar, List.empty)
+val includedRules = Seq(
+  IncludeByName("org.apache.spark.sql.Column"),
+  IncludeByName("org.apache.spark.sql.Column$"),
+  IncludeByName("org.apache.spark.sql.Dataset"),
+  // TODO Add the Dataset object definition

Review Comment:
   Let's probably file a JIRA



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #39724: [SPARK-41775][PYTHON][FOLLOWUP] Fix stdout rerouting

2023-01-24 Thread via GitHub


HyukjinKwon closed pull request #39724: [SPARK-41775][PYTHON][FOLLOWUP] Fix 
stdout rerouting
URL: https://github.com/apache/spark/pull/39724


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #39724: [SPARK-41775][PYTHON][FOLLOWUP] Fix stdout rerouting

2023-01-24 Thread via GitHub


HyukjinKwon commented on PR #39724:
URL: https://github.com/apache/spark/pull/39724#issuecomment-1402862371

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #39375: [SPARK-36124][SQL] Support subqueries with correlation through UNION

2023-01-24 Thread via GitHub


HyukjinKwon closed pull request #39375: [SPARK-36124][SQL] Support subqueries 
with correlation through UNION
URL: https://github.com/apache/spark/pull/39375


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #39375: [SPARK-36124][SQL] Support subqueries with correlation through UNION

2023-01-24 Thread via GitHub


HyukjinKwon commented on PR #39375:
URL: https://github.com/apache/spark/pull/39375#issuecomment-1402860722

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #39721: [SPARK-42171][PYSPARK][TESTS] Fix `pyspark-errors` module and enable it in GitHub Action

2023-01-24 Thread via GitHub


HyukjinKwon commented on PR #39721:
URL: https://github.com/apache/spark/pull/39721#issuecomment-1402853084

   Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-24 Thread via GitHub


gengliangwang commented on code in PR #39642:
URL: https://github.com/apache/spark/pull/39642#discussion_r1086049823


##
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+.addModule(DefaultScalaModule)
+.build()
+
+  def serialize(process: StreamingQueryProgress): 
StoreTypes.StreamingQueryProgress = {
+import org.apache.spark.status.protobuf.Utils.setJMapField
+val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+if (process.id != null) {
+  builder.setId(process.id.toString)
+}
+if (process.runId != null) {
+  builder.setRunId(process.runId.toString)
+}
+setStringField(process.name, builder.setName)
+setStringField(process.timestamp, builder.setTimestamp)
+builder.setBatchId(process.batchId)
+builder.setBatchDuration(process.batchDuration)
+setJMapField(process.durationMs, builder.putAllDurationMs)
+setJMapField(process.eventTime, builder.putAllEventTime)
+process.stateOperators.foreach(
+  s => 
builder.addStateOperators(StateOperatorProgressSerializer.serialize(s)))
+process.sources.foreach(
+  s => builder.addSources(SourceProgressSerializer.serialize(s))
+)
+builder.setSink(SinkProgressSerializer.serialize(process.sink))
+if (process.observedMetrics != null && !process.observedMetrics.isEmpty) {

Review Comment:
   use setJMapField?
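   For context, a minimal sketch of what such null-safe setter helpers typically look like
   (hypothetical shape; the actual `Utils` helpers may have different signatures):

   ```scala
   // Hypothetical null-safe helpers mirroring the suggested pattern: skip the protobuf
   // setter entirely when the source value is null (or an empty map).
   def setStringField(input: String, f: String => Any): Unit = {
     if (input != null) f(input)
   }

   def setJMapField[K, V](input: java.util.Map[K, V], f: java.util.Map[K, V] => Any): Unit = {
     if (input != null && !input.isEmpty) f(input)
   }
   ```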



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-24 Thread via GitHub


gengliangwang commented on code in PR #39642:
URL: https://github.com/apache/spark/pull/39642#discussion_r1086049614


##
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+.addModule(DefaultScalaModule)
+.build()
+
+  def serialize(process: StreamingQueryProgress): 
StoreTypes.StreamingQueryProgress = {
+import org.apache.spark.status.protobuf.Utils.setJMapField
+val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+if (process.id != null) {

Review Comment:
   use setStringField?



##
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+.addModule(DefaultScalaModule)
+.build()
+
+  def serialize(process: StreamingQueryProgress): 
StoreTypes.StreamingQueryProgress = {
+import org.apache.spark.status.protobuf.Utils.setJMapField
+val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+if (process.id != null) {
+  builder.setId(process.id.toString)
+}
+if (process.runId != null) {
+  builder.setRunId(process.runId.toString)

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun opened a new pull request, #39727: Use scikit-learn instead of sklearn

2023-01-24 Thread via GitHub


dongjoon-hyun opened a new pull request, #39727:
URL: https://github.com/apache/spark/pull/39727

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] otterc commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding

2023-01-24 Thread via GitHub


otterc commented on code in PR #39725:
URL: https://github.com/apache/spark/pull/39725#discussion_r1086007667


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1393,6 +1405,7 @@ public void onData(String streamId, ByteBuffer buf) 
throws IOException {
   // the client in cases of duplicate even though no data is written.
   if (isDuplicateBlock()) {
 freeDeferredBufs();
+mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());
 return;
   }
   abortIfNecessary();

Review Comment:
   Here it should be `buf.remaining` 
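   A quick illustration of why the two differ once part of the buffer has already been
   consumed (hedged sketch; how much of `buf` has been read at this point depends on the
   surrounding code):

   ```scala
   import java.nio.ByteBuffer

   // Hedged illustration: limit() reports the full payload size even after reads, while
   // remaining() reports only the bytes left unconsumed, which is what the ignored-bytes
   // metric should count.
   val buf = ByteBuffer.wrap(new Array[Byte](100))
   buf.get(new Array[Byte](40)) // consume 40 bytes
   println(buf.limit())         // 100
   println(buf.remaining())     // 60
   ```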



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] rmcyang commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding

2023-01-24 Thread via GitHub


rmcyang commented on code in PR #39725:
URL: https://github.com/apache/spark/pull/39725#discussion_r1086006244


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1393,6 +1405,7 @@ public void onData(String streamId, ByteBuffer buf) 
throws IOException {
   // the client in cases of duplicate even though no data is written.
   if (isDuplicateBlock()) {
 freeDeferredBufs();
+mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());
 return;
   }
   abortIfNecessary();

Review Comment:
   Makes sense; I have updated the PR to call 
`updateIgnoredBytesWithDeferredBufs()` in `abortIfNecessary()`.



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1356,6 +1356,17 @@ private boolean isTooLate(
 
!appShuffleMergePartitionsInfo.shuffleMergePartitions.containsKey(reduceId);
 }
 
+/**
+ * If necessary, look for the ignored block bytes and capture them in 
pushMergeMetrics.
+ */
+private void lookForIgnoredBlockBytesIfNecessary() {
+  if (deferredBufs != null && !deferredBufs.isEmpty()) {
+for (ByteBuffer buf : deferredBufs) {
+  mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());

Review Comment:
   Sounds good, I have updated the PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] rmcyang commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding

2023-01-24 Thread via GitHub


rmcyang commented on code in PR #39725:
URL: https://github.com/apache/spark/pull/39725#discussion_r1086005947


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1356,6 +1356,17 @@ private boolean isTooLate(
 
!appShuffleMergePartitionsInfo.shuffleMergePartitions.containsKey(reduceId);
 }
 
+/**
+ * If necessary, look for the ignored block bytes and capture them in 
pushMergeMetrics.
+ */
+private void lookForIgnoredBlockBytesIfNecessary() {

Review Comment:
   Thanks for the suggestions, Chandni. Renamed it to 
`updateIgnoredBytesWithDeferredBufs` based on the former, as it sounds 
smoother to me. Hopefully it looks better. Please let me know otherwise.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39726:
URL: https://github.com/apache/spark/pull/39726#discussion_r1085985386


##
sql/core/src/test/resources/sql-tests/inputs/show-create-table.sql:
##
@@ -45,6 +45,14 @@ SHOW CREATE TABLE tbl;
 DROP TABLE tbl;
 
 
+-- default column values
+CREATE TABLE tbl (a INT, b STRING DEFAULT 'abc', c INT DEFAULT 42) USING 
parquet

Review Comment:
   Shall we use a more complex string like `'abc, def'`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39726:
URL: https://github.com/apache/spark/pull/39726#discussion_r1085985386


##
sql/core/src/test/resources/sql-tests/inputs/show-create-table.sql:
##
@@ -45,6 +45,14 @@ SHOW CREATE TABLE tbl;
 DROP TABLE tbl;
 
 
+-- default column values
+CREATE TABLE tbl (a INT, b STRING DEFAULT 'abc', c INT DEFAULT 42) USING 
parquet

Review Comment:
   Shall we use a more complex string like `abc, def`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39726:
URL: https://github.com/apache/spark/pull/39726#discussion_r1085984751


##
sql/core/src/test/resources/sql-tests/inputs/describe.sql:
##
@@ -97,3 +97,14 @@ DROP VIEW temp_v;
 DROP VIEW temp_Data_Source_View;
 
 DROP VIEW v;
+
+-- Show column default values
+CREATE TABLE d (a STRING, b INT DEFAULT 42) USING parquet COMMENT 
'table_comment';

Review Comment:
   Since this is a test case for DEFAULT value, please add a default value for 
column `a`, too, like `show-create-table.sql`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39726:
URL: https://github.com/apache/spark/pull/39726#discussion_r1085984751


##
sql/core/src/test/resources/sql-tests/inputs/describe.sql:
##
@@ -97,3 +97,14 @@ DROP VIEW temp_v;
 DROP VIEW temp_Data_Source_View;
 
 DROP VIEW v;
+
+-- Show column default values
+CREATE TABLE d (a STRING, b INT DEFAULT 42) USING parquet COMMENT 
'table_comment';

Review Comment:
   Since this is a test case for DEFAULT value, please add a default value for 
column `a`, too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39726:
URL: https://github.com/apache/spark/pull/39726#discussion_r1085984055


##
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala:
##
@@ -141,6 +141,10 @@ case class StructField(
 }
   }
 
+  private def getDDLDefault = getCurrentDefaultValue()
+.map(" DEFAULT " + _)

Review Comment:
   This looks a little weak. In the case of STRING, don't we add quotation marks? If not, 
it looks like `DEFAULT THIS IS A DEFAULT VALUE`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39726:
URL: https://github.com/apache/spark/pull/39726#discussion_r1085981834


##
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala:
##
@@ -160,7 +164,7 @@ case class StructField(
*/
   def toDDL: String = {
 val nullString = if (nullable) "" else " NOT NULL"
-s"${quoteIfNeeded(name)} ${dataType.sql}${nullString}$getDDLComment"
+s"${quoteIfNeeded(name)} 
${dataType.sql}${nullString}$getDDLDefault$getDDLComment"

Review Comment:
   Just curious, what happens if we have a multi-line default value string like 
`a\n b\n c\n d`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39726:
URL: https://github.com/apache/spark/pull/39726#discussion_r1085980466


##
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowCreateTableSuite.scala:
##
@@ -161,6 +161,30 @@ trait ShowCreateTableSuiteBase extends 
command.ShowCreateTableSuiteBase
   assert(cause.getMessage.contains("Use `SHOW CREATE TABLE` without `AS 
SERDE` instead"))
 }
   }
+
+  test("show create table with default column values") {
+withNamespaceAndTable(ns, table) { t =>
+  sql(
+s"""
+   |CREATE TABLE $t (
+   |  a bigint NOT NULL,
+   |  b bigint DEFAULT 42,
+   |  c string DEFAULT 'abc' COMMENT 'comment'
+   |)
+   |using parquet

Review Comment:
   `using` -> `USING`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39726:
URL: https://github.com/apache/spark/pull/39726#discussion_r1085978481


##
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:
##
@@ -645,6 +645,17 @@ case class DescribeTableCommand(
   } else if (isExtended) {
 describeFormattedTableInfo(metadata, result)
   }
+
+  // If any columns have default values, append them to the result.
+  if (metadata.schema.fields.exists(_.metadata.contains(
+ResolveDefaultColumns.CURRENT_DEFAULT_COLUMN_METADATA_KEY))) {
+append(result, "", "", "")
+append(result, "# Column Default Information", "", "")

Review Comment:
   Maybe, `Default` -> `Default Value`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657

2023-01-24 Thread via GitHub


dongjoon-hyun commented on code in PR #39726:
URL: https://github.com/apache/spark/pull/39726#discussion_r1085977201


##
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:
##
@@ -645,6 +645,17 @@ case class DescribeTableCommand(
   } else if (isExtended) {
 describeFormattedTableInfo(metadata, result)
   }
+
+  // If any columns have default values, append them to the result.
+  if (metadata.schema.fields.exists(_.metadata.contains(
+ResolveDefaultColumns.CURRENT_DEFAULT_COLUMN_METADATA_KEY))) {

Review Comment:
   indentation?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vinodkc commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-24 Thread via GitHub


vinodkc commented on code in PR #39449:
URL: https://github.com/apache/spark/pull/39449#discussion_r1085963701


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala:
##
@@ -257,19 +271,272 @@ case class Mask(
   otherChar = newChildren(4))
 }
 
-case class MaskArgument(maskChar: Char, ignore: Boolean)
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+"""_FUNC_(input[, charCount, upperChar, lowerChar, digitChar, otherChar]) 
- masks the first n characters of given string value.
+   The function masks the first n characters of the value with 'X' or 'x', 
and numbers with 'n'.
+   This can be useful for creating copies of tables with sensitive 
information removed.
+   Error behavior: null value as replacement argument will throw 
AnalysisError.
+  """,
+  arguments = """
+Arguments:
+  * input  - string value to mask. Supported types: STRING, VARCHAR, 
CHAR
+  * charCount  - number of characters to be masked. Default value: 4
+  * upperChar  - character to replace upper-case characters with. Specify 
NULL to retain original character. Default value: 'X'
+  * lowerChar  - character to replace lower-case characters with. Specify 
NULL to retain original character. Default value: 'x'
+  * digitChar  - character to replace digit characters with. Specify NULL 
to retain original character. Default value: 'n'
+  * otherChar  - character to replace all other characters with. Specify 
NULL to retain original character. Default value: NULL
+  """,
+  examples = """
+Examples:

Review Comment:
   This case already exists: `SELECT _FUNC_('abcd-EFGH-8765-@$#', 20, 'x', 'X', 
'n', 'o');` has an input string of length 18 with a request to mask 20 chars.
   Result: `oo`
   It considers all 18 chars in the input string for masking; see the clamping sketch below.
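   A hypothetical sketch of that clamping behavior (not the PR's actual implementation):
   the count is capped at the input length, so asking for more characters than exist
   simply masks the whole string.

   ```scala
   // Hypothetical clamping sketch: mask at most the first n characters, never more
   // than the string actually contains.
   def maskFirstN(input: String, n: Int, mask: Char => Char): String = {
     val k = math.min(math.max(n, 0), input.length)
     input.take(k).map(mask) + input.drop(k)
   }
   ```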



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] otterc commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding

2023-01-24 Thread via GitHub


otterc commented on code in PR #39725:
URL: https://github.com/apache/spark/pull/39725#discussion_r1085930450


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1356,6 +1356,17 @@ private boolean isTooLate(
 
!appShuffleMergePartitionsInfo.shuffleMergePartitions.containsKey(reduceId);
 }
 
+/**
+ * If necessary, look for the ignored block bytes and capture them in 
pushMergeMetrics.
+ */
+private void lookForIgnoredBlockBytesIfNecessary() {

Review Comment:
   nit: rename to `updatedIgnoreBytesWithDeferredBufs` or `deferredBufsIgnored` 
or something similar.



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1393,6 +1405,7 @@ public void onData(String streamId, ByteBuffer buf) 
throws IOException {
   // the client in cases of duplicate even though no data is written.
   if (isDuplicateBlock()) {
 freeDeferredBufs();
+mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());
 return;
   }
   abortIfNecessary();

Review Comment:
   Even when the server aborts, it ignores the bytes.



##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -1356,6 +1356,17 @@ private boolean isTooLate(
 
!appShuffleMergePartitionsInfo.shuffleMergePartitions.containsKey(reduceId);
 }
 
+/**
+ * If necessary, look for the ignored block bytes and capture them in 
pushMergeMetrics.
+ */
+private void lookForIgnoredBlockBytesIfNecessary() {
+  if (deferredBufs != null && !deferredBufs.isEmpty()) {
+for (ByteBuffer buf : deferredBufs) {
+  mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());

Review Comment:
   I think we should use `buf.remaining` here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


