[GitHub] [spark] LuciferYang commented on a diff in pull request #39733: Setting version to 3.5.0-SNAPSHOT
LuciferYang commented on code in PR #39733: URL: https://github.com/apache/spark/pull/39733#discussion_r1086307422

```diff
## docs/_config.yml @@ -19,8 +19,8 @@ include:
 # These allow the documentation to be updated with newer releases
 # of Spark, Scala, and Mesos.
-SPARK_VERSION: 3.4.0-SNAPSHOT
-SPARK_VERSION_SHORT: 3.4.0
+SPARK_VERSION: 3.5.0-SNAPSHOT
+SPARK_VERSION_SHORT: 3.5.0
 SCALA_BINARY_VERSION: "2.12"
 SCALA_VERSION: "2.12.16"
```

Review Comment: When should `SCALA_VERSION` be changed? The current master uses `2.12.17` instead of `2.12.16`. Maybe branch-3.4 also needs to be changed?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] itholic commented on a diff in pull request #39700: [SPARK-41490][SQL] Assign name to _LEGACY_ERROR_TEMP_2441
itholic commented on code in PR #39700: URL: https://github.com/apache/spark/pull/39700#discussion_r1086300683

```diff
## core/src/main/resources/error/error-classes.json @@ -1432,6 +1432,12 @@
   },
   "sqlState" : "0A000"
 },
+"UNSUPPORTED_EXPR_FOR_OPERATOR" : {
+  "message" : [
+    "The query operator `` contains one or more unsupported expression types Aggregate, Window or Generate.",
```

Review Comment: Thanks! Just applied the comment.
[GitHub] [spark] LuciferYang commented on pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests
LuciferYang commented on PR #39732: URL: https://github.com/apache/spark/pull/39732#issuecomment-1403214341

Another approach I can think of is to check each `field` in `StoreTypes.getDescriptor.toProto.getMessageTypeList`, but it is probably not better than the current one.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests
dongjoon-hyun commented on code in PR #39732: URL: https://github.com/apache/spark/pull/39732#discussion_r1086294697

```diff
## core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala @@ -463,7 +492,7 @@ class KVStoreProtobufSerializerSuite extends SparkFunSuite {
       name = null,
       numPartitions = 8,
       numCachedPartitions = 5,
-      storageLevel = "IN_MEMORY",
+      storageLevel = null,
```

Review Comment: Got it~
[GitHub] [spark] peter-toth commented on pull request #39722: [WIP][SPARK-42162] Introduce MultiAdd expression as a memory optimization for canonicalizing large trees of Add expressions
peter-toth commented on PR #39722: URL: https://github.com/apache/spark/pull/39722#issuecomment-1403212470

> With the https://github.com/apache/spark/pull/37851 in the expression canonicalization, a complex query with a large number of Add operations could end up consuming significantly more (sometimes > 10X) memory on the executors.

@db-scnakandala, can you please explain this issue a bit more? Does https://github.com/apache/spark/pull/37851 cause a performance regression? Why exactly? And why on the executors?
[GitHub] [spark] gengliangwang commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests
gengliangwang commented on code in PR #39732: URL: https://github.com/apache/spark/pull/39732#discussion_r1086294168

```diff
## core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala @@ -463,7 +492,7 @@ class KVStoreProtobufSerializerSuite extends SparkFunSuite {
       name = null,
       numPartitions = 8,
       numCachedPartitions = 5,
-      storageLevel = "IN_MEMORY",
+      storageLevel = null,
```

Review Comment: Yeah, we need to test null string input. There is another object with `storageLevel` as `IN_MEMORY` already. Let's simply change the third object.
[GitHub] [spark] dongjoon-hyun commented on pull request #39735: [SPARK-42179][BUILD][SQL][3.3] Upgrade ORC to 1.7.8
dongjoon-hyun commented on PR #39735: URL: https://github.com/apache/spark/pull/39735#issuecomment-1403211478

Could you review this PR, @gengliangwang ?
[GitHub] [spark] gengliangwang commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests
gengliangwang commented on code in PR #39732: URL: https://github.com/apache/spark/pull/39732#discussion_r1086293127

```diff
## core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala @@ -31,6 +34,32 @@ import org.apache.spark.ui.scope.{RDDOperationEdge, RDDOperationNode}
 class KVStoreProtobufSerializerSuite extends SparkFunSuite {
   private val serializer = new KVStoreProtobufSerializer()
+  test("All the string fields must be optional to avoid NPE") {
```

Review Comment: Example failure output:
```
ArrayBuffer(("    string storage_level = 5;", 267), ("    string name = 1;", 363)) was not empty
All the string fields should be defined as `optional string` for handling null string.
Please update the following fields:
line #267:     string storage_level = 5;
line #363:     string name = 1;
```
[GitHub] [spark] gengliangwang commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests
gengliangwang commented on code in PR #39732: URL: https://github.com/apache/spark/pull/39732#discussion_r1086292734

```diff
## core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala @@ -31,6 +34,32 @@ import org.apache.spark.ui.scope.{RDDOperationEdge, RDDOperationNode}
 class KVStoreProtobufSerializerSuite extends SparkFunSuite {
   private val serializer = new KVStoreProtobufSerializer()
+  test("All the string fields must be optional to avoid NPE") {
+    val protoFile = getWorkspaceFilePath(
+      "core", "src", "main", "protobuf", "org", "apache", "spark", "status", "protobuf",
+      "store_types.proto")
+
+    val containsStringRegex = "\\s*string .*"
+    val invalidDefinition = new mutable.ArrayBuffer[(String, Int)]()
+    var lineNumber = 1
+    Source.fromFile(protoFile.toFile.getCanonicalPath).getLines().foreach { line =>
+      if (line.matches(containsStringRegex)) {
```

Review Comment: ok, updated
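The idea of the Scala test above (scan `store_types.proto` line by line and flag any `string` field not declared `optional`, because serializing a null Scala String into a plain proto3 `string` field throws an NPE) can be sketched as a small self-contained Python check. The function name and the sample proto snippet are illustrative, not part of the Spark code base:

```python
import re

def find_non_optional_strings(proto_text):
    """Return (line_number, line) pairs for `string` fields not marked `optional`."""
    bad = []
    for lineno, line in enumerate(proto_text.splitlines(), start=1):
        stripped = line.strip()
        # a plain `string foo = N;` field; `optional string foo = N;` does not match
        if re.match(r"^string\s+\w+\s*=", stripped):
            bad.append((lineno, stripped))
    return bad

sample = """
message RDDStorageInfo {
  optional string name = 1;
  string storage_level = 5;
}
"""
print(find_non_optional_strings(sample))  # [(4, 'string storage_level = 5;')]
```

A failing result lists exactly the fields that still need the `optional` keyword, mirroring the example failure output quoted in this thread.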
[GitHub] [spark] EnricoMi commented on pull request #39717: [SPARK-42168][3.2][SQL][PYTHON] Fix required child distribution of FlatMapCoGroupsInPandas (as in CoGroup)
EnricoMi commented on PR #39717: URL: https://github.com/apache/spark/pull/39717#issuecomment-1403210336

@sunchao good catch! I have renamed the PR and added the Python example as a unit test.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests
dongjoon-hyun commented on code in PR #39732: URL: https://github.com/apache/spark/pull/39732#discussion_r1086292404

```diff
## core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala @@ -463,7 +492,7 @@ class KVStoreProtobufSerializerSuite extends SparkFunSuite {
       name = null,
       numPartitions = 8,
       numCachedPartitions = 5,
-      storageLevel = "IN_MEMORY",
+      storageLevel = null,
```

Review Comment: Is this change required in this PR?
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests
dongjoon-hyun commented on code in PR #39732: URL: https://github.com/apache/spark/pull/39732#discussion_r1086292024

```diff
## core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala @@ -31,6 +34,32 @@ import org.apache.spark.ui.scope.{RDDOperationEdge, RDDOperationNode}
 class KVStoreProtobufSerializerSuite extends SparkFunSuite {
   private val serializer = new KVStoreProtobufSerializer()
+  test("All the string fields must be optional to avoid NPE") {
+    val protoFile = getWorkspaceFilePath(
+      "core", "src", "main", "protobuf", "org", "apache", "spark", "status", "protobuf",
+      "store_types.proto")
+
+    val containsStringRegex = "\\s*string .*"
+    val invalidDefinition = new mutable.ArrayBuffer[(String, Int)]()
+    var lineNumber = 1
+    Source.fromFile(protoFile.toFile.getCanonicalPath).getLines().foreach { line =>
+      if (line.matches(containsStringRegex)) {
```

Review Comment: +1 for the above comment.
[GitHub] [spark] dongjoon-hyun opened a new pull request, #39735: [SPARK-42179][BUILD][SQL][3.3] Upgrade ORC to 1.7.8
dongjoon-hyun opened a new pull request, #39735: URL: https://github.com/apache/spark/pull/39735

### What changes were proposed in this pull request?
This PR aims to upgrade ORC to 1.7.8 for Apache Spark 3.3.2.

### Why are the changes needed?
Apache ORC 1.7.8 is a maintenance release with important bug fixes.
- https://orc.apache.org/news/2023/01/21/ORC-1.7.8/

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.
[GitHub] [spark] itholic commented on a diff in pull request #39702: [SPARK-41487][SQL] Assign name to _LEGACY_ERROR_TEMP_1020
itholic commented on code in PR #39702: URL: https://github.com/apache/spark/pull/39702#discussion_r1086279876

```diff
## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala @@ -1050,15 +1052,17 @@ class AnalysisErrorSuite extends AnalysisTest {
     val t2 = LocalRelation(b, c).as("t2")

     // SELECT * FROM t1 WHERE a = (SELECT sum(c) FROM t2 WHERE t1.* = t2.b)
-    assertAnalysisError(
+    assertAnalysisErrorClass(
       Filter(EqualTo(a, ScalarSubquery(t2.select(sum(c)).where(star("t1") === b))), t1),
-      "Invalid usage of '*' in Filter" :: Nil
+      expectedErrorClass = "INVALID_USAGE_OF_STAR",
+      expectedMessageParameters = Map("elem" -> "'*'", "prettyName" -> "Filter")
     )

     // SELECT * FROM t1 JOIN t2 ON (EXISTS (SELECT 1 FROM t2 WHERE t1.* = b))
-    assertAnalysisError(
+    assertAnalysisErrorClass(
       t1.join(t2, condition = Some(Exists(t2.select(1).where(star("t1") === b)))),
-      "Invalid usage of '*' in Filter" :: Nil
+      expectedErrorClass = "INVALID_USAGE_OF_STAR",
+      expectedMessageParameters = Map("elem" -> "'*'", "prettyName" -> "Filter")
     )
```

Review Comment: I tried `operator.expressions.map(toPrettySQL).mkString(",")` to build the SQL expression, but it returns `"(unresolvedstar() = b)"`.
[GitHub] [spark] itholic commented on a diff in pull request #39701: [SPARK-41489][SQL] Assign name to _LEGACY_ERROR_TEMP_2415
itholic commented on code in PR #39701: URL: https://github.com/apache/spark/pull/39701#discussion_r1086276342

```diff
## core/src/main/resources/error/error-classes.json @@ -933,6 +933,12 @@
     ],
     "sqlState" : "42604"
   },
+  "INVALID_TYPE_FOR_FILTER_EXPR" : {
```

Review Comment: Yes, I agree that it's not really related to an input parameter. Let me put it in `DATATYPE_MISMATCH` and rename it to `FILTER_NOT_BOOLEAN` for now.
[GitHub] [spark] dongjoon-hyun commented on pull request #39731: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions
dongjoon-hyun commented on PR #39731: URL: https://github.com/apache/spark/pull/39731#issuecomment-1403185289

Thank you, @HyukjinKwon !
[GitHub] [spark] zhengruifeng commented on a diff in pull request #39695: [SPARK-42156] SparkConnectClient supports RetryPolicies now
zhengruifeng commented on code in PR #39695: URL: https://github.com/apache/spark/pull/39695#discussion_r1086269322

```diff
## python/pyspark/sql/connect/client.py @@ -551,37 +592,37 @@ def _execute_and_fetch(
         logger.info("ExecuteAndFetch")

         m: Optional[pb2.ExecutePlanResponse.Metrics] = None
-        batches: List[pa.RecordBatch] = []
         try:
-            for b in self._stub.ExecutePlan(req, metadata=self._builder.metadata()):
-                if b.client_id != self._session_id:
-                    raise SparkConnectException(
-                        "Received incorrect session identifier for request."
-                    )
-                if b.metrics is not None:
-                    logger.debug("Received metric batch.")
-                    m = b.metrics
-                if b.HasField("arrow_batch"):
-                    logger.debug(
-                        f"Received arrow batch rows={b.arrow_batch.row_count} "
-                        f"size={len(b.arrow_batch.data)}"
-                    )
-
-                    with pa.ipc.open_stream(b.arrow_batch.data) as reader:
-                        for batch in reader:
-                            assert isinstance(batch, pa.RecordBatch)
-                            batches.append(batch)
+            for attempt in Retrying(
+                can_retry=SparkConnectClient.retry_exception, **self._retry_policy
+            ):
+                with attempt:
```

Review Comment: Before each attempt, should we set `m = None` and `batches = []`?
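The review point above (per-call state must be reset inside each attempt, or a failed attempt leaves partial batches behind) can be illustrated with a minimal sketch. `Retrying` and the per-attempt context manager here are hypothetical stand-ins for the client's retry helper, not the actual `pyspark.sql.connect` implementation:

```python
class _Attempt:
    """Context manager for one attempt; swallows retryable errors while retries remain."""
    def __init__(self, outer):
        self._outer = outer

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc is None:
            self._outer._succeeded = True
            return False
        # returning True suppresses the exception, so the loop retries
        return self._outer._can_retry(exc) and self._outer._tries_left > 0


class Retrying:
    def __init__(self, can_retry, max_retries=3):
        self._can_retry = can_retry
        self._tries_left = max_retries
        self._succeeded = False

    def __iter__(self):
        while not self._succeeded and self._tries_left >= 0:
            yield _Attempt(self)
            self._tries_left -= 1


calls = {"n": 0}

def flaky_stream():
    # yields one batch, then fails transiently on the first two calls
    calls["n"] += 1
    yield "batch-1"
    if calls["n"] < 3:
        raise ConnectionError("transient")
    yield "batch-2"

batches = []
for attempt in Retrying(can_retry=lambda e: isinstance(e, ConnectionError)):
    with attempt:
        batches = []  # reset per-attempt state, as the review suggests
        for b in flaky_stream():
            batches.append(b)

print(batches)  # ['batch-1', 'batch-2']
```

Without the `batches = []` reset inside the `with` block, the two failed attempts would each leave a stale `batch-1` behind and the final result would contain duplicates.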
[GitHub] [spark] zhengruifeng commented on a diff in pull request #39734: [SPARK-41812][SPARK-41823][CONNECT][PYTHON] Fix ambiguous columns issue in Join
zhengruifeng commented on code in PR #39734: URL: https://github.com/apache/spark/pull/39734#discussion_r1086264760

```diff
## connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectPlannerSuite.scala @@ -260,21 +260,21 @@ class SparkConnectPlannerSuite extends SparkFunSuite with SparkConnectPlanTest {
         .addArguments(unresolvedAttribute)
         .build())

-    val simpleJoin = proto.Relation.newBuilder
-      .setJoin(
-        proto.Join.newBuilder
-          .setLeft(readRel)
-          .setRight(readRel)
-          .setJoinType(proto.Join.JoinType.JOIN_TYPE_INNER)
-          .setJoinCondition(joinCondition)
-          .build())
-      .build()
-
-    val res = transform(simpleJoin)
-    assert(res.nodeName == "Join")
-    assert(res != null)
+    val e0 = intercept[AnalysisException] {
```

Review Comment: After this PR, a Join with a `JoinCondition` will be eagerly analyzed.
[GitHub] [spark] itholic commented on a diff in pull request #39702: [SPARK-41487][SQL] Assign name to _LEGACY_ERROR_TEMP_1020
itholic commented on code in PR #39702: URL: https://github.com/apache/spark/pull/39702#discussion_r1086259893

```diff
## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala @@ -1050,15 +1052,17 @@ class AnalysisErrorSuite extends AnalysisTest {
     val t2 = LocalRelation(b, c).as("t2")

     // SELECT * FROM t1 WHERE a = (SELECT sum(c) FROM t2 WHERE t1.* = t2.b)
-    assertAnalysisError(
+    assertAnalysisErrorClass(
       Filter(EqualTo(a, ScalarSubquery(t2.select(sum(c)).where(star("t1") === b))), t1),
-      "Invalid usage of '*' in Filter" :: Nil
+      expectedErrorClass = "INVALID_USAGE_OF_STAR",
+      expectedMessageParameters = Map("elem" -> "'*'", "prettyName" -> "Filter")
     )

     // SELECT * FROM t1 JOIN t2 ON (EXISTS (SELECT 1 FROM t2 WHERE t1.* = b))
-    assertAnalysisError(
+    assertAnalysisErrorClass(
       t1.join(t2, condition = Some(Exists(t2.select(1).where(star("t1") === b)))),
-      "Invalid usage of '*' in Filter" :: Nil
+      expectedErrorClass = "INVALID_USAGE_OF_STAR",
+      expectedMessageParameters = Map("elem" -> "'*'", "prettyName" -> "Filter")
     )
```

Review Comment: Thanks @MaxGekk for the review. I tried to output a SQL statement here, but unfortunately I couldn't find a proper way to expose the SQL statement rather than the node name. The code path is here: https://github.com/apache/spark/blob/866343c7be47d71b88ae9a6b4dda26f8c4f5964b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L247-L250 The type of `operator` is `LogicalPlan`, and `s` is `Star` here. Can we extract the SQL statement from these objects? Could you help with outputting a SQL statement here?
[GitHub] [spark] zhengruifeng commented on a diff in pull request #39734: [SPARK-41812][SPARK-41823][CONNECT][PYTHON] Fix ambiguous columns issue in Join
zhengruifeng commented on code in PR #39734: URL: https://github.com/apache/spark/pull/39734#discussion_r1086250404

```diff
## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala @@ -1174,27 +1174,63 @@ class SparkConnectPlanner(val session: SparkSession) {
     }
   }

+  private def splitConjunctivePredicates(condition: Expression): Seq[Expression] = {
+    condition match {
+      case UnresolvedFunction(Seq("and"), Seq(cond1, cond2), _, _, _) =>
+        splitConjunctivePredicates(cond1) ++ splitConjunctivePredicates(cond2)
+      case other => other :: Nil
+    }
+  }
+
   private def transformJoin(rel: proto.Join): LogicalPlan = {
     assert(rel.hasLeft && rel.hasRight, "Both join sides must be present")
     if (rel.hasJoinCondition && rel.getUsingColumnsCount > 0) {
       throw InvalidPlanInput(
         s"Using columns or join conditions cannot be set at the same time in Join")
     }
-    val joinCondition =
-      if (rel.hasJoinCondition) Some(transformExpression(rel.getJoinCondition)) else None
     val catalystJointype = transformJoinType(
       if (rel.getJoinType != null) rel.getJoinType else proto.Join.JoinType.JOIN_TYPE_INNER)
     val joinType = if (rel.getUsingColumnsCount > 0) {
       UsingJoin(catalystJointype, rel.getUsingColumnsList.asScala.toSeq)
     } else {
       catalystJointype
     }
-    logical.Join(
-      left = transformRelation(rel.getLeft),
-      right = transformRelation(rel.getRight),
-      joinType = joinType,
-      condition = joinCondition,
-      hint = logical.JoinHint.NONE)
+
+    if (rel.hasJoinCondition) {
+      val leftDF = Dataset.ofRows(session, transformRelation(rel.getLeft))
+      val rightDF = Dataset.ofRows(session, transformRelation(rel.getRight))
+      val joinExprs = splitConjunctivePredicates(transformExpression(rel.getJoinCondition))
+        .map {
+          case func @ UnresolvedFunction(Seq(f), Seq(l, r), _, _, _)
+              if Seq("==", "<=>").contains(f) =>
+            val l2 = l match {
+              case UnresolvedAttribute(Seq(c)) => leftDF.apply(c).expr
+              case other => other
+            }
+            val r2 = r match {
+              case UnresolvedAttribute(Seq(c)) => rightDF.apply(c).expr
+              case other => other
+            }
+            func.copy(arguments = Seq(l2, r2))
+
+          case other => other
+        }
+        .reduce(And)
+
+      leftDF
```

Review Comment: We must use `DataFrame.join` here to make sure the DataFrame ID will not be changed. https://github.com/apache/spark/blob/faedcd91d554a00fc76116a0c188752cf036f907/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L202-L203
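The `splitConjunctivePredicates` helper quoted above flattens a nested tree of `and` expressions into its individual conjuncts. The same recursion can be modeled with plain Python tuples standing in for Catalyst expression nodes (a toy model for illustration, not the planner's actual types):

```python
def split_conjunctive_predicates(cond):
    """Flatten a nested ("and", left, right) tree into a list of conjuncts."""
    if isinstance(cond, tuple) and cond[0] == "and":
        _, left, right = cond
        return split_conjunctive_predicates(left) + split_conjunctive_predicates(right)
    # any non-"and" node is a single conjunct
    return [cond]

cond = ("and", ("and", "a == b", "c == d"), "e <=> f")
print(split_conjunctive_predicates(cond))  # ['a == b', 'c == d', 'e <=> f']
```

Each resulting conjunct can then be rewritten independently (e.g. resolving each side of an `==` or `<=>` against the left or right DataFrame) before the pieces are re-joined with `And`, which is exactly the shape of the Scala code above.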
[GitHub] [spark] otterc commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding
otterc commented on code in PR #39725: URL: https://github.com/apache/spark/pull/39725#discussion_r1086246214

```diff
## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java @@ -1356,6 +1357,18 @@
+    /**
+     * For the block bytes in the deferred buffers that are ignored, capture them
+     * and update pushMergeMetrics's ignoredBlockBytes.
+     */
+    private void updateIgnoredBytesWithDeferredBufs() {
+      if (deferredBufs != null && !deferredBufs.isEmpty()) {
+        for (ByteBuffer buf : deferredBufs) {
+          mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.remaining());
+        }
+      }
+    }
+
```

Review Comment:
```suggestion
    private void updateIgnoredBytes(long numBytes) {
      if (numBytes > 0) {
        mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(numBytes);
      }
    }
```

```diff
@@ -1315,6 +1315,7 @@
     private void abortIfNecessary() {
       if (partitionInfo.shouldAbort(mergeManager.ioExceptionsThresholdDuringMerge)) {
+        updateIgnoredBytesWithDeferredBufs();
         freeDeferredBufs();
```

Review Comment:
```suggestion
        updateIgnoredBytes(freeDeferredBufs());
```

```diff
@@ -1461,6 +1476,7 @@
       AppShuffleMergePartitionsInfo info =
         appShuffleInfo.shuffles.get(partitionInfo.appAttemptShuffleMergeId.shuffleId);
       if (isTooLate(info, partitionInfo.reduceId)) {
+        updateIgnoredBytesWithDeferredBufs();
         freeDeferredBufs();
```

Review Comment:
```suggestion
        updateIgnoredBytes(freeDeferredBufs());
```

```diff
@@ -1393,6 +1407,7 @@
       // the client in cases of duplicate even though no data is written.
       if (isDuplicateBlock()) {
         freeDeferredBufs();
+        mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());
```

Review Comment:
```suggestion
        updateIgnoredBytes(freeDeferredBufs() + buf.remaining());
```

```diff
@@ -1520,6 +1537,7 @@
           partitionInfo.resetChunkTracker();
         }
       } else {
+        updateIgnoredBytesWithDeferredBufs();
         freeDeferredBufs();
```

Review Comment:
```suggestion
        updateIgnoredBytes(freeDeferredBufs());
```

```diff
@@ -1480,6 +1496,7 @@
       // Identify duplicate block generated by speculative tasks. We respond success to
       // the client in cases of duplicate even though no data is written.
       if (isDuplicateBlock()) {
+        updateIgnoredBytesWithDeferredBufs();
        freeDeferredBufs();
```

Review Comment:
```suggestion
        updateIgnoredBytes(freeDeferredBufs());
```

```diff
@@ -1382,6 +1395,7 @@
       freeDeferredBufs();
       if (isTooLateBlockPush) {
         mergeManager.pushMergeMetrics.lateBlockPushes.mark();
+        mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());
```

Review Comment:
```suggestion
      long deferredBytes = freeDeferredBufs();
      if (isTooLateBlockPush) {
        mergeManager.pushMergeMetrics.lateBlockPushes.mark();
        updateIgnoredBytes(deferredBytes + buf.remaining());
```

```diff
@@ -1315,6 +1315,7 @@
     private void freeDeferredBufs() {
```

Review Comment: Wasn't able to comment on the `freeDeferredBufs()` lines since it is not changed. We can change it to below to apply my other suggestions.
```
    /**
     * @return total number of deferred bytes
     */
    private long freeDeferredBufs() {
      if (deferredBufs == null || deferredBufs.isEmpty()) {
        deferredBufs =
```
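otterc's suggested refactor above (a `freeDeferredBufs()` that returns the total number of deferred bytes, fed into a single `updateIgnoredBytes` call) can be modeled with a small Python sketch. The class and attribute names mirror the Java suggestion but are illustrative; the metrics meter is replaced by a plain counter:

```python
class PushMergeMetrics:
    def __init__(self):
        self.ignored_block_bytes = 0  # stands in for the ignoredBlockBytes meter


class BlockPushCallback:
    def __init__(self, metrics):
        self.metrics = metrics
        self.deferred_bufs = [b"abc", b"defgh"]  # pending deferred chunks

    def free_deferred_bufs(self):
        """Release deferred buffers and return the total number of deferred bytes."""
        total = sum(len(buf) for buf in (self.deferred_bufs or []))
        self.deferred_bufs = None
        return total

    def update_ignored_bytes(self, num_bytes):
        if num_bytes > 0:
            self.metrics.ignored_block_bytes += num_bytes


metrics = PushMergeMetrics()
cb = BlockPushCallback(metrics)
# e.g. a too-late block push: the deferred bytes plus the incoming buffer are all ignored
incoming = b"xy"
cb.update_ignored_bytes(cb.free_deferred_bufs() + len(incoming))
print(metrics.ignored_block_bytes)  # 10
```

Returning the byte count from the free call keeps each call site to a single line, which is the point of the suggestions in this review.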
[GitHub] [spark] zhengruifeng commented on a diff in pull request #39734: [SPARK-41812][SPARK-41823][CONNECT][PYTHON] Fix ambiguous columns in Join
zhengruifeng commented on code in PR #39734: URL: https://github.com/apache/spark/pull/39734#discussion_r1086250404 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1174,27 +1174,63 @@ class SparkConnectPlanner(val session: SparkSession) { } } + private def splitConjunctivePredicates(condition: Expression): Seq[Expression] = { +condition match { + case UnresolvedFunction(Seq("and"), Seq(cond1, cond2), _, _, _) => +splitConjunctivePredicates(cond1) ++ splitConjunctivePredicates(cond2) + case other => other :: Nil +} + } + private def transformJoin(rel: proto.Join): LogicalPlan = { assert(rel.hasLeft && rel.hasRight, "Both join sides must be present") if (rel.hasJoinCondition && rel.getUsingColumnsCount > 0) { throw InvalidPlanInput( s"Using columns or join conditions cannot be set at the same time in Join") } -val joinCondition = - if (rel.hasJoinCondition) Some(transformExpression(rel.getJoinCondition)) else None + val catalystJointype = transformJoinType( if (rel.getJoinType != null) rel.getJoinType else proto.Join.JoinType.JOIN_TYPE_INNER) val joinType = if (rel.getUsingColumnsCount > 0) { UsingJoin(catalystJointype, rel.getUsingColumnsList.asScala.toSeq) } else { catalystJointype } -logical.Join( - left = transformRelation(rel.getLeft), - right = transformRelation(rel.getRight), - joinType = joinType, - condition = joinCondition, - hint = logical.JoinHint.NONE) + +if (rel.hasJoinCondition) { + val leftDF = Dataset.ofRows(session, transformRelation(rel.getLeft)) + val rightDF = Dataset.ofRows(session, transformRelation(rel.getRight)) + val joinExprs = splitConjunctivePredicates(transformExpression(rel.getJoinCondition)) +.map { + case func @ UnresolvedFunction(Seq(f), Seq(l, r), _, _, _) + if Seq("==", "<=>").contains(f) => +val l2 = l match { + case UnresolvedAttribute(Seq(c)) => leftDF.apply(c).expr + case other => other +} +val r2 = r match { + case UnresolvedAttribute(Seq(c)) => 
rightDF.apply(c).expr + case other => other +} +func.copy(arguments = Seq(l2, r2)) + + case other => other +} +.reduce(And) + + leftDF Review Comment: we must use `DataFrame.join` here to make sure the DataFrame ID is not changed. https://github.com/apache/spark/blob/faedcd91d554a00fc76116a0c188752cf036f907/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L202-L203 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
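The `splitConjunctivePredicates` helper in the diff above flattens nested `and` calls into a flat list of conjuncts. The same recursion can be sketched stand-alone with a minimal hypothetical `Expr` tree (not Spark's Catalyst `Expression` API):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitConjuncts {
    // Minimal expression tree standing in for Catalyst's Expression.
    static abstract class Expr {}
    static class And extends Expr {
        final Expr left, right;
        And(Expr l, Expr r) { left = l; right = r; }
    }
    static class Leaf extends Expr {
        final String name;
        Leaf(String n) { name = n; }
        @Override public String toString() { return name; }
    }

    // Recursively flatten nested ANDs into a list of conjuncts,
    // mirroring the shape of splitConjunctivePredicates above.
    static List<Expr> split(Expr cond) {
        List<Expr> out = new ArrayList<>();
        if (cond instanceof And) {
            And a = (And) cond;
            out.addAll(split(a.left));
            out.addAll(split(a.right));
        } else {
            out.add(cond);
        }
        return out;
    }

    public static void main(String[] args) {
        Expr cond = new And(new And(new Leaf("a"), new Leaf("b")), new Leaf("c"));
        System.out.println(split(cond));  // prints [a, b, c]
    }
}
```

Each conjunct can then be rewritten independently (as the diff does for `==` and `<=>`) before the list is reduced back with `And`.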
[GitHub] [spark] zhengruifeng commented on a diff in pull request #39734: [SPARK-41812][SPARK-41823][CONNECT][PYTHON] Fix ambiguous columns in Join
zhengruifeng commented on code in PR #39734: URL: https://github.com/apache/spark/pull/39734#discussion_r1086248363 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1174,27 +1174,63 @@ class SparkConnectPlanner(val session: SparkSession) { } } + private def splitConjunctivePredicates(condition: Expression): Seq[Expression] = { +condition match { + case UnresolvedFunction(Seq("and"), Seq(cond1, cond2), _, _, _) => +splitConjunctivePredicates(cond1) ++ splitConjunctivePredicates(cond2) + case other => other :: Nil +} + } + private def transformJoin(rel: proto.Join): LogicalPlan = { assert(rel.hasLeft && rel.hasRight, "Both join sides must be present") if (rel.hasJoinCondition && rel.getUsingColumnsCount > 0) { throw InvalidPlanInput( s"Using columns or join conditions cannot be set at the same time in Join") } -val joinCondition = - if (rel.hasJoinCondition) Some(transformExpression(rel.getJoinCondition)) else None + val catalystJointype = transformJoinType( if (rel.getJoinType != null) rel.getJoinType else proto.Join.JoinType.JOIN_TYPE_INNER) val joinType = if (rel.getUsingColumnsCount > 0) { UsingJoin(catalystJointype, rel.getUsingColumnsList.asScala.toSeq) } else { catalystJointype } -logical.Join( - left = transformRelation(rel.getLeft), - right = transformRelation(rel.getRight), - joinType = joinType, - condition = joinCondition, - hint = logical.JoinHint.NONE) + +if (rel.hasJoinCondition) { + val leftDF = Dataset.ofRows(session, transformRelation(rel.getLeft)) + val rightDF = Dataset.ofRows(session, transformRelation(rel.getRight)) + val joinExprs = splitConjunctivePredicates(transformExpression(rel.getJoinCondition)) +.map { + case func @ UnresolvedFunction(Seq(f), Seq(l, r), _, _, _) + if Seq("==", "<=>").contains(f) => Review Comment: currently, both `a == a` and `a <=> a` are taken into account
[GitHub] [spark] zhengruifeng opened a new pull request, #39734: [SPARK-41812][SPARK-41823][CONNECT][PYTHON] Fix ambiguous columns in Join
zhengruifeng opened a new pull request, #39734: URL: https://github.com/apache/spark/pull/39734 ### What changes were proposed in this pull request? PySpark's `DataFrame.__getattr__` and `DataFrame.__getitem__` invoke `jc = self._jdf.apply(name)` in the JVM, which resolves the column name and attaches the dataframe id via `addDataFrameIdToCol` to handle ambiguous columns, see https://github.com/apache/spark/blob/faedcd91d554a00fc76116a0c188752cf036f907/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1472-L1481 But in Connect, the output of `DataFrame.__getattr__` and `DataFrame.__getitem__` is not bound to the input `DataFrame`; it is just an `UnresolvedAttribute`. This PR aims to fix this issue by switching to the DataFrame API based implementation. ### Why are the changes needed? for parity ### Does this PR introduce _any_ user-facing change? yes ### How was this patch tested? enabled doctests and added UT
[GitHub] [spark] otterc commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding
otterc commented on code in PR #39725: URL: https://github.com/apache/spark/pull/39725#discussion_r1086238663 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -1382,6 +1395,7 @@ public void onData(String streamId, ByteBuffer buf) throws IOException { freeDeferredBufs(); Review Comment: When it comes here because of `isTooLateBlockPush`, the whole block is going to be ignored. That means the deferredBufs being freed will be ignored as well, so we need to add those bytes to `ignoredBlockBytes` too, correct?
[GitHub] [spark] LuciferYang commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests
LuciferYang commented on code in PR #39732: URL: https://github.com/apache/spark/pull/39732#discussion_r1086234357 ## core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala: ## @@ -31,6 +34,32 @@ import org.apache.spark.ui.scope.{RDDOperationEdge, RDDOperationNode} class KVStoreProtobufSerializerSuite extends SparkFunSuite { private val serializer = new KVStoreProtobufSerializer() + test("All the string fields must be optional to avoid NPE") { +val protoFile = getWorkspaceFilePath( + "core", "src", "main", "protobuf", "org", "apache", "spark", "status", "protobuf", + "store_types.proto") + +val containsStringRegex = "\\s*string .*" +val invalidDefinition = new mutable.ArrayBuffer[(String, Int)]() +var lineNumber = 1 +Source.fromFile(protoFile.toFile.getCanonicalPath).getLines().foreach { line => + if (line.matches(containsStringRegex)) { Review Comment: `org.apache.spark.util.Utils.tryWithResource` should be used to ensure `Source.fromFile(protoFile.toFile.getCanonicalPath)` is closed after use
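Spark's `Utils.tryWithResource` is a loan-pattern analogue of Java's try-with-resources. The closing behavior the reviewer is asking for can be sketched in plain Java (a hypothetical helper, shown against an in-memory reader rather than a real file):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class TryWithResourceDemo {
    // Count lines matching a regex; the reader is closed automatically
    // when the try block exits, even if an exception is thrown.
    static long countMatches(String text, String regex) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.lines().filter(line -> line.matches(regex)).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Only the non-optional `string` field matches "\s*string .*".
        String proto = "  string name = 1;\n  optional string host = 2;\n";
        System.out.println(countMatches(proto, "\\s*string .*"));  // prints 1
    }
}
```

Without the try-with-resources block (or `Utils.tryWithResource` in Scala), the underlying `Source` stays open until garbage collection, which can leak file handles in a test suite.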
[GitHub] [spark] xinrong-meng opened a new pull request, #39733: Setting version to 3.5.0-SNAPSHOT
xinrong-meng opened a new pull request, #39733: URL: https://github.com/apache/spark/pull/39733 ### What changes were proposed in this pull request? This PR aims to update `master` branch version to 3.5.0-SNAPSHOT. ### Why are the changes needed? Start to prepare Apache Spark 3.5.0 and the published snapshot version should not conflict with `branch-3.4`. ### Does this PR introduce _any_ user-facing change? N/A. ### How was this patch tested? CI.
[GitHub] [spark] otterc commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding
otterc commented on code in PR #39725: URL: https://github.com/apache/spark/pull/39725#discussion_r1086008666 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -1382,6 +1395,7 @@ public void onData(String streamId, ByteBuffer buf) throws IOException { freeDeferredBufs(); if (isTooLateBlockPush) { mergeManager.pushMergeMetrics.lateBlockPushes.mark(); +mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit()); Review Comment: should be `buf.remaining()` ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -1356,6 +1357,18 @@ private boolean isTooLate( !appShuffleMergePartitionsInfo.shuffleMergePartitions.containsKey(reduceId); } +/** + * For the block bytes in the deferred buffers that are ignored, capture them + * and update pushMergeMetrics's ignoredBlockBytes. + */ +private void updateIgnoredBytesWithDeferredBufs() { + if (deferredBufs != null && !deferredBufs.isEmpty()) { +for (ByteBuffer buf : deferredBufs) { + mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.remaining()); Review Comment: I made a mistake with the earlier comment. For deferredBufs, we can use `buf.limit`.
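The `buf.remaining()` vs `buf.limit()` distinction behind this review thread: for a freshly wrapped, unread buffer the two agree (which is why `limit` is fine for deferred buffers), but once the position has advanced only `remaining()` reports the unconsumed bytes. A minimal illustration:

```java
import java.nio.ByteBuffer;

public class RemainingVsLimit {
    // Wrap `capacity` bytes, consume `consumed` of them, and report
    // {limit, remaining} of the resulting buffer.
    static int[] limitAndRemaining(int capacity, int consumed) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[capacity]);
        buf.get(new byte[consumed]);  // advances position by `consumed`
        return new int[] { buf.limit(), buf.remaining() };
    }

    public static void main(String[] args) {
        // Unread buffer: limit and remaining agree.
        int[] fresh = limitAndRemaining(10, 0);
        System.out.println(fresh[0] + " " + fresh[1]);          // prints 10 10
        // After consuming 4 bytes, only remaining() reflects unread data.
        int[] partlyRead = limitAndRemaining(10, 4);
        System.out.println(partlyRead[0] + " " + partlyRead[1]); // prints 10 6
    }
}
```

This matches the reviewer's correction: a buffer arriving in `onData` may already be partially consumed, so `remaining()` is the safe count there, while the deferred buffers are stored unread.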
[GitHub] [spark] gengliangwang commented on a diff in pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests
gengliangwang commented on code in PR #39732: URL: https://github.com/apache/spark/pull/39732#discussion_r1086223751 ## core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala: ## @@ -31,6 +34,32 @@ import org.apache.spark.ui.scope.{RDDOperationEdge, RDDOperationNode} class KVStoreProtobufSerializerSuite extends SparkFunSuite { private val serializer = new KVStoreProtobufSerializer() + test("All the string fields must be optional to avoid NPE") { Review Comment: This test checks the protobuf file to make sure all the string fields are defined as `optional string`
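The check described here — scan the `.proto` source and flag any `string` field not declared `optional` (optional fields get `hasX` accessors the serializer can guard on) — can be sketched as a small scanner. Field names below are hypothetical; the real test walks `store_types.proto` and collects offenders in an `invalidDefinition` buffer:

```java
import java.util.ArrayList;
import java.util.List;

public class ProtoStringFieldCheck {
    // Collect "lineNumber: line" entries for plain `string` fields.
    // Lines starting with `optional string` do not match the regex.
    static List<String> findInvalid(List<String> lines) {
        List<String> invalid = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.matches("\\s*string .*")) {
                invalid.add((i + 1) + ": " + line.trim());
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "message RDDStorageInfo {",
            "  string name = 1;",           // flagged: not optional
            "  optional string host = 2;",  // ok
            "}");
        System.out.println(findInvalid(lines));  // prints [2: string name = 1;]
    }
}
```

An empty result means every string field is `optional`, which is what the suite asserts.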
[GitHub] [spark] gengliangwang commented on pull request #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests
gengliangwang commented on PR #39732: URL: https://github.com/apache/spark/pull/39732#issuecomment-1403122822 cc @LuciferYang
[GitHub] [spark] gengliangwang opened a new pull request, #39732: [SPARK-42178][UI] Handle remaining null string values in ui protobuf serializer and add tests
gengliangwang opened a new pull request, #39732: URL: https://github.com/apache/spark/pull/39732 ### What changes were proposed in this pull request? * Similar to https://github.com/apache/spark/pull/39666, handle remaining null string values in ui protobuf serializer, including `RDDStorageInfo` and `ResourceInformation` * Add test to make sure all the string fields are defined as `optional string`. ### Why are the changes needed? Properly handles null string values in the protobuf serializer. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New UTs
[GitHub] [spark] LuciferYang commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests
LuciferYang commented on code in PR #39712: URL: https://github.com/apache/spark/pull/39712#discussion_r1086208244 ## connector/connect/client/jvm/pom.xml: ## @@ -75,6 +76,13 @@ mockito-core test + Review Comment: cc @dongjoon-hyun, also cc @pan3793 Do you have any suggestions for this?
[GitHub] [spark] LuciferYang commented on pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`
LuciferYang commented on PR #39642: URL: https://github.com/apache/spark/pull/39642#issuecomment-1403096069 done. Thank you for reviewing the code during the holiday :) @gengliangwang
[GitHub] [spark] gengliangwang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`
gengliangwang commented on code in PR #39642: URL: https://github.com/apache/spark/pull/39642#discussion_r1086194955 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.status.protobuf.sql + +import java.util.{HashMap => JHashMap, Map => JMap, UUID} + +import com.fasterxml.jackson.databind.json.JsonMapper +import com.fasterxml.jackson.module.scala.DefaultScalaModule + +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema +import org.apache.spark.sql.streaming.StreamingQueryProgress +import org.apache.spark.status.protobuf.StoreTypes +import org.apache.spark.status.protobuf.Utils.{getStringField, setJMapField, setStringField} + +private[protobuf] object StreamingQueryProgressSerializer { + + private val mapper: JsonMapper = JsonMapper.builder() +.addModule(DefaultScalaModule) +.build() + + def serialize(process: StreamingQueryProgress): StoreTypes.StreamingQueryProgress = { +val builder = StoreTypes.StreamingQueryProgress.newBuilder() +if (process.id != null) { + builder.setId(process.id.toString) +} +if (process.runId != null) { + builder.setRunId(process.runId.toString) +} +setStringField(process.name, builder.setName) +setStringField(process.timestamp, builder.setTimestamp) +builder.setBatchId(process.batchId) +builder.setBatchDuration(process.batchDuration) +setJMapField(process.durationMs, builder.putAllDurationMs) +setJMapField(process.eventTime, builder.putAllEventTime) +process.stateOperators.foreach( + s => builder.addStateOperators(StateOperatorProgressSerializer.serialize(s))) +process.sources.foreach( + s => builder.addSources(SourceProgressSerializer.serialize(s)) +) +builder.setSink(SinkProgressSerializer.serialize(process.sink)) +setJMapField(process.observedMetrics, putAllObservedMetrics(builder, _)) +builder.build() + } + + def deserialize(process: StoreTypes.StreamingQueryProgress): StreamingQueryProgress = { +val id = if (process.hasId) { + UUID.fromString(process.getId) +} else null +val runId = if (process.hasId) { + UUID.fromString(process.getRunId) +} else null +new StreamingQueryProgress( + id = id, + runId = runId, + name = 
getStringField(process.hasName, () => process.getName), + timestamp = getStringField(process.hasTimestamp, () => process.getTimestamp), + batchId = process.getBatchId, + batchDuration = process.getBatchDuration, + durationMs = new JHashMap(process.getDurationMsMap), + eventTime = new JHashMap(process.getEventTimeMap), + stateOperators = + StateOperatorProgressSerializer.deserializeToArray(process.getStateOperatorsList), + sources = SourceProgressSerializer.deserializeToArray(process.getSourcesList), + sink = SinkProgressSerializer.deserialize(process.getSink), + observedMetrics = convertToObservedMetrics(process.getObservedMetricsMap) +) + } + + private def putAllObservedMetrics( + builder: StoreTypes.StreamingQueryProgress.Builder, + observedMetrics: JMap[String, Row]): Unit = { +observedMetrics.forEach { + case (k, v) => builder.putObservedMetrics(k, mapper.writeValueAsString(v)) Review Comment: nit: Let's add one line comment to mention why we choose to encode the row object with json.
[GitHub] [spark] gengliangwang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`
gengliangwang commented on code in PR #39642: URL: https://github.com/apache/spark/pull/39642#discussion_r1086190132 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.status.protobuf.sql + +import java.util.{HashMap => JHashMap, Map => JMap, UUID} + +import com.fasterxml.jackson.databind.json.JsonMapper +import com.fasterxml.jackson.module.scala.DefaultScalaModule + +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema +import org.apache.spark.sql.streaming.StreamingQueryProgress +import org.apache.spark.status.protobuf.StoreTypes +import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField} + +private[protobuf] object StreamingQueryProgressSerializer { + + private val mapper: JsonMapper = JsonMapper.builder() +.addModule(DefaultScalaModule) +.build() + + def serialize(process: StreamingQueryProgress): StoreTypes.StreamingQueryProgress = { +import org.apache.spark.status.protobuf.Utils.setJMapField +val builder = StoreTypes.StreamingQueryProgress.newBuilder() +if (process.id != null) { Review Comment: nvm, let's keep the current code
[GitHub] [spark] sadikovi commented on pull request #39660: [SPARK-42128][SQL] Support TOP (N) for MS SQL Server dialect as an alternative to Limit pushdown
sadikovi commented on PR #39660: URL: https://github.com/apache/spark/pull/39660#issuecomment-1403067310 @beliefer @srowen @dongjoon-hyun Could you please check the following comments: - https://github.com/apache/spark/pull/39660#discussion_r1084578229 - https://github.com/apache/spark/pull/39660#discussion_r1084580060 Do you prefer to redesign JdbcDialects API for this work as a follow-up? Or should I close this PR and just work on redesign directly? Thanks.
[GitHub] [spark] HyukjinKwon closed pull request #39731: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions
HyukjinKwon closed pull request #39731: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions URL: https://github.com/apache/spark/pull/39731
[GitHub] [spark] HyukjinKwon commented on pull request #39731: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions
HyukjinKwon commented on PR #39731: URL: https://github.com/apache/spark/pull/39731#issuecomment-1403053335 Merged to branch-3.4.
[GitHub] [spark] HyukjinKwon commented on pull request #39731: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions
HyukjinKwon commented on PR #39731: URL: https://github.com/apache/spark/pull/39731#issuecomment-1403053271 I'm gonna push this to recover the build.
[GitHub] [spark] HyukjinKwon closed pull request #39729: [SPARK-42176][SQL] Fix cast of a boolean value to timestamp
HyukjinKwon closed pull request #39729: [SPARK-42176][SQL] Fix cast of a boolean value to timestamp URL: https://github.com/apache/spark/pull/39729
[GitHub] [spark] HyukjinKwon commented on pull request #39729: [SPARK-42176][SQL] Fix cast of a boolean value to timestamp
HyukjinKwon commented on PR #39729: URL: https://github.com/apache/spark/pull/39729#issuecomment-1403052158 Merged to master, branch-3.4, and branch-3.3.
[GitHub] [spark] HyukjinKwon commented on pull request #39731: [SPARK-42177][INFRA][3.4] Change master to branch-3.4 in GitHub Actions
HyukjinKwon commented on PR #39731: URL: https://github.com/apache/spark/pull/39731#issuecomment-1403048032 cc @xinrong-meng @dongjoon-hyun @Yikun
[GitHub] [spark] HyukjinKwon opened a new pull request, #39731: [SPARK-42177][INFRA][3.4] Change master to brach-3.4 in GitHub Actions
HyukjinKwon opened a new pull request, #39731: URL: https://github.com/apache/spark/pull/39731 ### What changes were proposed in this pull request? We have now cut `branch-3.4`, but `branch-3.4` still points to the `master` reference (see https://github.com/apache/spark/blob/branch-3.4/.github/workflows/build_and_test.yml), which makes the CI fail: https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029 ### Why are the changes needed? To recover the CI. ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? CI should be monitored after this PR gets merged.
[GitHub] [spark] HyukjinKwon closed pull request #39730: [SPARK-42177][INFRA][3.4] Change master to brach-3.4 in GitHub Actions
HyukjinKwon closed pull request #39730: [SPARK-42177][INFRA][3.4] Change master to brach-3.4 in GitHub Actions URL: https://github.com/apache/spark/pull/39730
[GitHub] [spark] HyukjinKwon opened a new pull request, #39730: [SPARK-42177][INFRA][3.4] Change master to brach-3.4 in GitHub Actions
HyukjinKwon opened a new pull request, #39730: URL: https://github.com/apache/spark/pull/39730 ### What changes were proposed in this pull request? ### Why are the changes needed? To recover the CI, see https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029. ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? CI should be monitored after this PR gets merged.
[GitHub] [spark] zhenlineo commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests
zhenlineo commented on code in PR #39712: URL: https://github.com/apache/spark/pull/39712#discussion_r1086087241 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -44,7 +45,7 @@ import org.apache.spark.sql.functions.lit * * @since 3.4.0 */ -class Column private[sql] (private[sql] val expr: proto.Expression) { +class Column private[sql] (private[sql] val expr: proto.Expression) extends Logging { Review Comment: Extending `Logging` is needed for binary compatibility: the class type must be exactly the same.
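The compatibility constraint described above can be sketched in isolation. This is a standalone approximation (the `Logging` trait and `Column` classes here are stand-ins, not the actual Spark source): MiMa-style binary compatibility checks compare class signatures between artifacts, so dropping a mixed-in trait changes the signature even when the Scala source API looks unchanged.

```scala
// Minimal stand-in for a logging mix-in such as org.apache.spark.internal.Logging.
trait Logging {
  protected def logInfo(msg: => String): Unit = ()
}

// The original class mixes in Logging, so its compiled signature
// records Logging as a parent interface.
class ColumnV1(val expr: String) extends Logging

// A re-implementation that omits the trait has a different class
// signature and would fail a binary-compatibility check, even though
// the visible Scala API is identical.
class ColumnV2(val expr: String)

object CompatCheck {
  // Inspect the parent interfaces recorded in the compiled class.
  def parents(c: Class[_]): Set[String] =
    c.getInterfaces.map(_.getName).toSet

  def main(args: Array[String]): Unit = {
    println(parents(classOf[ColumnV1])) // includes the Logging interface
    println(parents(classOf[ColumnV2])) // does not
  }
}
```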
[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`
LuciferYang commented on code in PR #39642: URL: https://github.com/apache/spark/pull/39642#discussion_r1086143970 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.status.protobuf.sql + +import java.util.{HashMap => JHashMap, Map => JMap, UUID} + +import com.fasterxml.jackson.databind.json.JsonMapper +import com.fasterxml.jackson.module.scala.DefaultScalaModule + +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema +import org.apache.spark.sql.streaming.StreamingQueryProgress +import org.apache.spark.status.protobuf.StoreTypes +import org.apache.spark.status.protobuf.Utils.{getStringField, setJMapField, setStringField} + +private[protobuf] object StreamingQueryProgressSerializer { + + private val mapper: JsonMapper = JsonMapper.builder() +.addModule(DefaultScalaModule) +.build() + + def serialize(process: StreamingQueryProgress): StoreTypes.StreamingQueryProgress = { +val builder = StoreTypes.StreamingQueryProgress.newBuilder() +if (process.id != null) { + builder.setId(process.id.toString) +} +if (process.runId != null) { + builder.setRunId(process.runId.toString) +} +setStringField(process.name, builder.setName) +setStringField(process.timestamp, builder.setTimestamp) +builder.setBatchId(process.batchId) +builder.setBatchDuration(process.batchDuration) +setJMapField(process.durationMs, builder.putAllDurationMs) +setJMapField(process.eventTime, builder.putAllEventTime) +process.stateOperators.foreach( + s => builder.addStateOperators(StateOperatorProgressSerializer.serialize(s))) +process.sources.foreach( + s => builder.addSources(SourceProgressSerializer.serialize(s)) +) +builder.setSink(SinkProgressSerializer.serialize(process.sink)) +setJMapField(process.observedMetrics, putAllObservedMetrics(builder, _)) +builder.build() + } + + def deserialize(process: StoreTypes.StreamingQueryProgress): StreamingQueryProgress = { +val id = if (process.hasId) { + UUID.fromString(process.getId) +} else null +val runId = if (process.hasRunId) { + UUID.fromString(process.getRunId) +} else null +new StreamingQueryProgress( + id = id, + runId = runId, + name =
getStringField(process.hasName, () => process.getName), + timestamp = getStringField(process.hasTimestamp, () => process.getTimestamp), + batchId = process.getBatchId, + batchDuration = process.getBatchDuration, + durationMs = new JHashMap(process.getDurationMsMap), + eventTime = new JHashMap(process.getEventTimeMap), + stateOperators = + StateOperatorProgressSerializer.deserializeToArray(process.getStateOperatorsList), + sources = SourceProgressSerializer.deserializeToArray(process.getSourcesList), + sink = SinkProgressSerializer.deserialize(process.getSink), + observedMetrics = convertToObservedMetrics(process.getObservedMetricsMap) +) + } + + private def putAllObservedMetrics( Review Comment: Changed to define a class-level private function.
[GitHub] [spark] sadikovi commented on a diff in pull request #39729: [SPARK-42176][SQL] Fix cast of a boolean value to timestamp
sadikovi commented on code in PR #39729: URL: https://github.com/apache/spark/pull/39729#discussion_r1086142703 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala: ## @@ -608,6 +608,11 @@ class CastWithAnsiOffSuite extends CastSuiteBase { checkEvaluation(cast(input, StringType), "1.23E-7") } + test("SPARK-42176: cast boolean to timestamp") { Review Comment: Oh, thanks for fixing it!
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39729: [SPARK-42175][SQL] Fix cast of a boolean value to timestamp
dongjoon-hyun commented on code in PR #39729: URL: https://github.com/apache/spark/pull/39729#discussion_r1086138284 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOffSuite.scala: ## @@ -608,6 +608,11 @@ class CastWithAnsiOffSuite extends CastSuiteBase { checkEvaluation(cast(input, StringType), "1.23E-7") } + test("SPARK-42176: cast boolean to timestamp") { Review Comment: The PR title seems to show a wrong JIRA ID, `SPARK-42175`. Let me fix it.
[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`
LuciferYang commented on code in PR #39642: URL: https://github.com/apache/spark/pull/39642#discussion_r1086137202 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.status.protobuf.sql + +import java.util.{HashMap => JHashMap, Map => JMap, UUID} + +import com.fasterxml.jackson.databind.json.JsonMapper +import com.fasterxml.jackson.module.scala.DefaultScalaModule + +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema +import org.apache.spark.sql.streaming.StreamingQueryProgress +import org.apache.spark.status.protobuf.StoreTypes +import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField} + +private[protobuf] object StreamingQueryProgressSerializer { + + private val mapper: JsonMapper = JsonMapper.builder() +.addModule(DefaultScalaModule) +.build() + + def serialize(process: StreamingQueryProgress): StoreTypes.StreamingQueryProgress = { +import org.apache.spark.status.protobuf.Utils.setJMapField +val builder = StoreTypes.StreamingQueryProgress.newBuilder() +if (process.id != null) { Review Comment: https://github.com/apache/spark/blob/c43be4a1b23b599ef9c7f81c2cba619bcf7e/core/src/main/scala/org/apache/spark/status/protobuf/Utils.scala#L27-L31 Or should we refactor `setStringField` to ```scala def setStringField[T](input: T, f: String => Any): Unit = { if (input != null) { f(input.toString) } } ```
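A minimal, standalone sketch of the refactoring proposed in the comment above (the `Utils` object here is a stand-in, not Spark's actual `org.apache.spark.status.protobuf.Utils`): widening `setStringField` to a generic parameter lets it stringify any non-null input, which would cover `UUID` fields such as `process.id` without the explicit null checks.

```scala
import java.util.UUID
import scala.collection.mutable

object Utils {
  // Proposed generic variant: accept any reference type and call
  // toString on it only when it is non-null.
  def setStringField[T](input: T, f: String => Any): Unit = {
    if (input != null) {
      f(input.toString)
    }
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val out = mutable.Buffer.empty[String]
    // A UUID field: stringified and passed through.
    Utils.setStringField(UUID.fromString("00000000-0000-0000-0000-000000000001"), out += _)
    // A null field: the setter is never invoked.
    Utils.setStringField(null, out += _)
    // A plain string field works unchanged.
    Utils.setStringField("plain string", out += _)
    println(out) // two entries; the null input was skipped
  }
}
```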
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657
dongjoon-hyun commented on code in PR #39726: URL: https://github.com/apache/spark/pull/39726#discussion_r1086136677 ## sql/core/src/test/resources/sql-tests/inputs/describe.sql: ## @@ -97,3 +97,25 @@ DROP VIEW temp_v; DROP VIEW temp_Data_Source_View; DROP VIEW v; + +-- Show column default values +CREATE TABLE d (a STRING DEFAULT 'show-create-table.sql', b INT DEFAULT 42) USING parquet COMMENT 'table_comment'; Review Comment: Oh, I meant to refer to the `show-create-table.sql` file of this PR instead of mentioning the literal `show-create-table.sql`.
[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`
LuciferYang commented on code in PR #39642: URL: https://github.com/apache/spark/pull/39642#discussion_r1086118621 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.status.protobuf.sql + +import java.util.{HashMap => JHashMap, Map => JMap, UUID} + +import com.fasterxml.jackson.databind.json.JsonMapper +import com.fasterxml.jackson.module.scala.DefaultScalaModule + +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema +import org.apache.spark.sql.streaming.StreamingQueryProgress +import org.apache.spark.status.protobuf.StoreTypes +import org.apache.spark.status.protobuf.Utils.{getStringField, setJMapField, setStringField} + +private[protobuf] object StreamingQueryProgressSerializer { + + private val mapper: JsonMapper = JsonMapper.builder() +.addModule(DefaultScalaModule) +.build() + + def serialize(process: StreamingQueryProgress): StoreTypes.StreamingQueryProgress = { +val builder = StoreTypes.StreamingQueryProgress.newBuilder() +if (process.id != null) { + builder.setId(process.id.toString) +} +if (process.runId != null) { + builder.setRunId(process.runId.toString) +} +setStringField(process.name, builder.setName) +setStringField(process.timestamp, builder.setTimestamp) +builder.setBatchId(process.batchId) +builder.setBatchDuration(process.batchDuration) +setJMapField(process.durationMs, builder.putAllDurationMs) +setJMapField(process.eventTime, builder.putAllEventTime) +process.stateOperators.foreach( + s => builder.addStateOperators(StateOperatorProgressSerializer.serialize(s))) +process.sources.foreach( + s => builder.addSources(SourceProgressSerializer.serialize(s)) +) +builder.setSink(SinkProgressSerializer.serialize(process.sink)) +def serializeMetrics(metrics: JMap[String, Row]): Unit = { Review Comment: Using `setJMapField` here would require defining an additional function or using an anonymous function, since this is not a simple `putAll` scenario. Is there a better way to write this? @gengliangwang
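The pattern under discussion can be sketched standalone (the `setJMapField` signature below is an assumption modeled on the description above, not Spark's actual `Utils` helper): when the target is not a plain `putAll`, the second argument becomes an anonymous function that adapts each entry before storing it.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

object Utils {
  // Assumed shape of the helper: apply f only when the map is non-null.
  def setJMapField[K, V](input: JMap[K, V], f: JMap[K, V] => Any): Unit = {
    if (input != null) {
      f(input)
    }
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val metrics = new JHashMap[String, Int]()
    metrics.put("rowsPerSecond", 100)

    // Not a simple putAll: the anonymous function transforms each
    // value (here Int -> String) before writing it to the target map.
    val serialized = new JHashMap[String, String]()
    Utils.setJMapField(metrics, (m: JMap[String, Int]) => {
      m.forEach((k, v) => serialized.put(k, v.toString))
    })
    println(serialized) // {rowsPerSecond=100}
  }
}
```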
[GitHub] [spark] dtenedor commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657
dtenedor commented on code in PR #39726: URL: https://github.com/apache/spark/pull/39726#discussion_r1086099890 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala: ## @@ -645,6 +645,17 @@ case class DescribeTableCommand( } else if (isExtended) { describeFormattedTableInfo(metadata, result) } + + // If any columns have default values, append them to the result. + if (metadata.schema.fields.exists(_.metadata.contains( +ResolveDefaultColumns.CURRENT_DEFAULT_COLUMN_METADATA_KEY))) { Review Comment: Done. ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala: ## @@ -645,6 +645,17 @@ case class DescribeTableCommand( } else if (isExtended) { describeFormattedTableInfo(metadata, result) } + + // If any columns have default values, append them to the result. + if (metadata.schema.fields.exists(_.metadata.contains( +ResolveDefaultColumns.CURRENT_DEFAULT_COLUMN_METADATA_KEY))) { +append(result, "", "", "") +append(result, "# Column Default Information", "", "") Review Comment: Sounds good, done. ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala: ## @@ -160,7 +164,7 @@ case class StructField( */ def toDDL: String = { val nullString = if (nullable) "" else " NOT NULL" -s"${quoteIfNeeded(name)} ${dataType.sql}${nullString}$getDDLComment" +s"${quoteIfNeeded(name)} ${dataType.sql}${nullString}$getDDLDefault$getDDLComment" Review Comment: Good Q, I added a test case covering this. 
## sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowCreateTableSuite.scala: ## @@ -161,6 +161,30 @@ trait ShowCreateTableSuiteBase extends command.ShowCreateTableSuiteBase assert(cause.getMessage.contains("Use `SHOW CREATE TABLE` without `AS SERDE` instead")) } } + + test("show create table with default column values") { +withNamespaceAndTable(ns, table) { t => + sql( +s""" + |CREATE TABLE $t ( + | a bigint NOT NULL, + | b bigint DEFAULT 42, + | c string DEFAULT 'abc' COMMENT 'comment' + |) + |using parquet Review Comment: Done ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala: ## @@ -141,6 +141,10 @@ case class StructField( } } + private def getDDLDefault = getCurrentDefaultValue() +.map(" DEFAULT " + _) Review Comment: Good question, I was originally confused about this as well :) the default value is exactly the string that appeared in the CREATE TABLE or ALTER TABLE statement. So, for example, if we had `CREATE TABLE t (a STRING DEFAULT 'str')`, then this column metadata has the value `'str'` (including the single-quotes). So we have to print them back exactly as-is here (e.g. I tried adding `escapeSingleQuotedString` like `getDDLComment` below, but it added extra escaping which we did not want). The extra test cases you suggested should show that this works even for different types of string-typed defaults. ## sql/core/src/test/resources/sql-tests/inputs/describe.sql: ## @@ -97,3 +97,14 @@ DROP VIEW temp_v; DROP VIEW temp_Data_Source_View; DROP VIEW v; + +-- Show column default values +CREATE TABLE d (a STRING, b INT DEFAULT 42) USING parquet COMMENT 'table_comment'; Review Comment: Sounds good, done. ## sql/core/src/test/resources/sql-tests/inputs/show-create-table.sql: ## @@ -45,6 +45,14 @@ SHOW CREATE TABLE tbl; DROP TABLE tbl; +-- default column values +CREATE TABLE tbl (a INT, b STRING DEFAULT 'abc', c INT DEFAULT 42) USING parquet Review Comment: Done. 
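The DDL round-trip discussed above can be illustrated with a simplified stand-in (this `SimpleField` is a standalone approximation, not Spark's `StructField`): the default-value string is stored verbatim, quotes included, and replayed as-is when rendering `DEFAULT`, which is why no extra escaping is applied.

```scala
// Simplified model of a column with an optional default-value string.
// Mirrors the behavior described above: the DEFAULT expression is kept
// exactly as written in the original CREATE/ALTER statement.
final case class SimpleField(
    name: String,
    dataType: String,
    nullable: Boolean = true,
    default: Option[String] = None,
    comment: Option[String] = None) {

  def toDDL: String = {
    val nullString = if (nullable) "" else " NOT NULL"
    // The stored default already contains its own quoting, e.g. "'abc'".
    val defaultString = default.map(" DEFAULT " + _).getOrElse("")
    val commentString = comment.map(c => s" COMMENT '$c'").getOrElse("")
    s"$name $dataType$nullString$defaultString$commentString"
  }
}

object DdlDemo {
  def main(args: Array[String]): Unit = {
    println(SimpleField("a", "BIGINT", nullable = false).toDDL)
    // a BIGINT NOT NULL
    println(SimpleField("c", "STRING", default = Some("'abc'"), comment = Some("comment")).toDDL)
    // c STRING DEFAULT 'abc' COMMENT 'comment'
  }
}
```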
[GitHub] [spark] itholic commented on pull request #39721: [SPARK-42171][PYSPARK][TESTS] Fix `pyspark-errors` module and enable it in GitHub Action
itholic commented on PR #39721: URL: https://github.com/apache/spark/pull/39721#issuecomment-1402916066 Late LGTM, thanks for fixing!
[GitHub] [spark] dongjoon-hyun commented on pull request #39728: [SPARK-42173][CORE] RpcAddress equality can fail
dongjoon-hyun commented on PR #39728: URL: https://github.com/apache/spark/pull/39728#issuecomment-1402912399 cc @mridulm , too
[GitHub] [spark] holdenk commented on a diff in pull request #39728: [SPARK-42173][CORE] RpcAddress equality can fail
holdenk commented on code in PR #39728: URL: https://github.com/apache/spark/pull/39728#discussion_r1086106561 ## core/src/main/scala/org/apache/spark/rpc/RpcAddress.scala: ## @@ -23,30 +23,37 @@ import org.apache.spark.util.Utils /** * Address for an RPC environment, with hostname and port. */ -private[spark] case class RpcAddress(_host: String, port: Int) { - - lazy val host: String = Utils.addBracketsIfNeeded(_host) +private[spark] case class RpcAddress(host: String, port: Int) { def hostPort: String = host + ":" + port /** Returns a string in the form of "spark://host:port". */ def toSparkURL: String = "spark://" + hostPort override def toString: String = hostPort + + Review Comment: Sure :)
[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`
LuciferYang commented on code in PR #39642: URL: https://github.com/apache/spark/pull/39642#discussion_r1086105668 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.status.protobuf.sql + +import java.util.{HashMap => JHashMap, Map => JMap, UUID} + +import com.fasterxml.jackson.databind.json.JsonMapper +import com.fasterxml.jackson.module.scala.DefaultScalaModule + +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema +import org.apache.spark.sql.streaming.StreamingQueryProgress +import org.apache.spark.status.protobuf.StoreTypes +import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField} + +private[protobuf] object StreamingQueryProgressSerializer { + + private val mapper: JsonMapper = JsonMapper.builder() +.addModule(DefaultScalaModule) +.build() + + def serialize(process: StreamingQueryProgress): StoreTypes.StreamingQueryProgress = { +import org.apache.spark.status.protobuf.Utils.setJMapField +val builder = StoreTypes.StreamingQueryProgress.newBuilder() +if (process.id != null) { + builder.setId(process.id.toString) +} +if (process.runId != null) { + builder.setRunId(process.runId.toString) Review Comment: ditto, `process.runId` is also a UUID
[GitHub] [spark] LuciferYang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`
LuciferYang commented on code in PR #39642: URL: https://github.com/apache/spark/pull/39642#discussion_r1086105619 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.status.protobuf.sql + +import java.util.{HashMap => JHashMap, Map => JMap, UUID} + +import com.fasterxml.jackson.databind.json.JsonMapper +import com.fasterxml.jackson.module.scala.DefaultScalaModule + +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema +import org.apache.spark.sql.streaming.StreamingQueryProgress +import org.apache.spark.status.protobuf.StoreTypes +import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField} + +private[protobuf] object StreamingQueryProgressSerializer { + + private val mapper: JsonMapper = JsonMapper.builder() +.addModule(DefaultScalaModule) +.build() + + def serialize(process: StreamingQueryProgress): StoreTypes.StreamingQueryProgress = { +import org.apache.spark.status.protobuf.Utils.setJMapField +val builder = StoreTypes.StreamingQueryProgress.newBuilder() +if (process.id != null) { Review Comment: `process.id` is UUID, cannot be the first parameter of `setStringField ` ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.status.protobuf.sql + +import java.util.{HashMap => JHashMap, Map => JMap, UUID} + +import com.fasterxml.jackson.databind.json.JsonMapper +import com.fasterxml.jackson.module.scala.DefaultScalaModule + +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema +import org.apache.spark.sql.streaming.StreamingQueryProgress +import org.apache.spark.status.protobuf.StoreTypes +import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField} + +private[protobuf] object StreamingQueryProgressSerializer { + + private val mapper: JsonMapper = JsonMapper.builder() +.addModule(DefaultScalaModule) +.build() + + def serialize(process: StreamingQueryProgress): StoreTypes.StreamingQueryProgress = { +import org.apache.spark.status.protobuf.Utils.setJMapField +val builder = StoreTypes.StreamingQueryProgress.newBuilder() +if (process.id != null) { + builder.setId(process.id.toString) +} +if (process.runId != null) { + builder.setRunId(process.runId.toString) Review Comment: ditto -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39728: [SPARK-42173][CORE] RpcAddress equality can fail
dongjoon-hyun commented on code in PR #39728: URL: https://github.com/apache/spark/pull/39728#discussion_r1086103013 ## core/src/main/scala/org/apache/spark/rpc/RpcAddress.scala: ## @@ -23,30 +23,37 @@ import org.apache.spark.util.Utils /** * Address for an RPC environment, with hostname and port. */ -private[spark] case class RpcAddress(_host: String, port: Int) { - - lazy val host: String = Utils.addBracketsIfNeeded(_host) +private[spark] case class RpcAddress(host: String, port: Int) { def hostPort: String = host + ":" + port /** Returns a string in the form of "spark://host:port". */ def toSparkURL: String = "spark://" + hostPort override def toString: String = hostPort + + Review Comment: Could you revert this empty line addition?
[GitHub] [spark] dongjoon-hyun commented on pull request #39727: [SPARK-42174][PYTHON][INFRA] Use `scikit-learn` instead of `sklearn`
dongjoon-hyun commented on PR #39727: URL: https://github.com/apache/spark/pull/39727#issuecomment-1402900556 Thank you, @HyukjinKwon !
[GitHub] [spark] sadikovi commented on pull request #39729: [SPARK-42175][SQL] Fix cast of a boolean value to timestamp
sadikovi commented on PR #39729: URL: https://github.com/apache/spark/pull/39729#issuecomment-1402899890 @dongjoon-hyun @srowen Can you review this PR? Thanks. It is a small fix for `Cast` expression.
[GitHub] [spark] sadikovi opened a new pull request, #39729: [SPARK-42175] Fix cast of a boolean value to timestamp
sadikovi opened a new pull request, #39729: URL: https://github.com/apache/spark/pull/39729 ### What changes were proposed in this pull request? The PR fixes an issue when casting a boolean to timestamp. While `select cast(true as timestamp)` works and returns `1970-01-01T00:00:00.000+`, casting `false` to timestamp fails with the following error: ``` IllegalArgumentException: requirement failed: Literal must have a corresponding value to timestamp, but class Integer found. ``` SBT test also fails with this error: ``` [info] java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long [info] at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107) [info] at org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178) [info] at org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178) ``` The issue was that we need to return `0L` instead of 0 when converting `false` to a long. ### Why are the changes needed? Fixes a small bug in cast. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I added a unit test to verify the fix.
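The `ClassCastException` in the report above is the classic symptom of a value autoboxed as `java.lang.Integer` reaching code that unboxes a `java.lang.Long`. A minimal Java sketch of the failure mode (the method names here are illustrative, not Spark's actual `Cast` code):

```java
public class BoxingMismatch {

    // Buggy variant: the false branch returns an int literal,
    // which autoboxes to Integer rather than Long.
    static Object buggyBooleanToLong(boolean b) {
        if (b) {
            return 1L; // boxes to Long -- this is why casting true works
        } else {
            return 0;  // boxes to Integer -- this is why casting false fails
        }
    }

    // Fixed variant: both branches produce a long, so both box to Long.
    static Object fixedBooleanToLong(boolean b) {
        return b ? 1L : 0L;
    }

    public static void main(String[] args) {
        Object falseAsLong = buggyBooleanToLong(false);
        System.out.println(falseAsLong.getClass().getSimpleName()); // Integer

        // A consumer that expects a boxed Long blows up, mirroring
        // "java.lang.Integer cannot be cast to java.lang.Long".
        try {
            long unboxed = (Long) falseAsLong;
            System.out.println(unboxed);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException caught");
        }

        System.out.println(fixedBooleanToLong(false).getClass().getSimpleName()); // Long
    }
}
```

This is why the one-character `0` → `0L` change is enough to fix the cast: it makes both branches box to the same wrapper type.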
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests
HyukjinKwon commented on code in PR #39712: URL: https://github.com/apache/spark/pull/39712#discussion_r1086096820 ## connector/connect/client/jvm/pom.xml: ## @@ -75,6 +76,13 @@ mockito-core test + Review Comment: Gotya. Let's probably add a couple of comments here and there to make it clear .. I am sure this is confusing to other developers.
[GitHub] [spark] zhenlineo commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests
zhenlineo commented on code in PR #39712: URL: https://github.com/apache/spark/pull/39712#discussion_r1086095693 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CompatibilitySuite.scala: ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.connect.client + +import java.io.File +import java.net.URLClassLoader +import java.util.regex.Pattern + +import com.typesafe.tools.mima.core._ +import com.typesafe.tools.mima.lib.MiMaLib +import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite +import org.apache.spark.sql.connect.client.util.IntegrationTestUtils._ + +/** + * This test requires the following artifacts built before running the tests: + * {{{ + * spark-sql + * spark-connect-client-jvm + * }}} + * To build the above artifact, use e.g. `sbt package` or `mvn clean install -DskipTests`. + * + * When debugging this test, if any changes to the client API, the client jar need to be built + * before re-running the test. An example workflow with SBT for this test: + * 1. Compatibility test reported an unexpected client API change. + * 1. Fix the wrong client API. + * 1. Build the client jar: `sbt package` + * 1. 
Run the test again: `sbt "testOnly + * org.apache.spark.sql.connect.client.CompatibilitySuite"` */ +class CompatibilitySuite extends AnyFunSuite { // scalastyle:ignore funsuite + + private lazy val clientJar: File = +findJar( + "connector/connect/client/jvm", + "spark-connect-client-jvm-assembly", + "spark-connect-client-jvm") + + private lazy val sqlJar: File = findJar("sql/core", "spark-sql", "spark-sql") + + test("compatibility mima tests") { +val mima = new MiMaLib(Seq(clientJar, sqlJar)) +val allProblems = mima.collectProblems(sqlJar, clientJar, List.empty) +val includedRules = Seq( + IncludeByName("org.apache.spark.sql.Column"), + IncludeByName("org.apache.spark.sql.Column$"), + IncludeByName("org.apache.spark.sql.Dataset"), + // TODO Add the Dataset object definition Review Comment: https://issues.apache.org/jira/browse/SPARK-42175 This was skipped as I do not want to include too much API impl with the compatibility test PR.
[GitHub] [spark] zhenlineo commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests
zhenlineo commented on code in PR #39712: URL: https://github.com/apache/spark/pull/39712#discussion_r1086093964 ## connector/connect/client/jvm/pom.xml: ## @@ -75,6 +76,13 @@ mockito-core test + Review Comment: The SBT MiMa check has some limitations when run as an SBT rule: it works best for a stable API, e.g. current vs. previous. It is not very friendly to configure for testing e.g. scala-client vs. sql while we are actively working on the scala-client API. To be more specific, the problems I hit were: 1. I cannot configure the MiMa rule to find the current SQL SNAPSHOT jar. 2. I cannot use the ClassLoader correctly in the SBT rule to load all methods in the client API. As a result, I ended up with this test, where we have more freedom to grow the API test coverage along with the client API.
[GitHub] [spark] holdenk opened a new pull request, #39728: [SPARK-42173][CORE] RpcAddress equality can fail
holdenk opened a new pull request, #39728: URL: https://github.com/apache/spark/pull/39728 ### What changes were proposed in this pull request? When constructing an RpcAddress use InetUtils to get a consistently formatted IPv6 address if the env is for an IPv6 address. ### Why are the changes needed? We use RpcAddress equality for various tasks involving executors and a mismatch of equality can cause interesting errors. ### Does this PR introduce _any_ user-facing change? Log messages might change from sometimes having all the 0s in a v6 address present to not. ### How was this patch tested? Existing tests + new unit test showing that [::0:1] is formatted to [::1]
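For context on why equality can fail here: the same IPv6 address has many textual spellings, so two `RpcAddress` values can refer to the same endpoint yet compare unequal on their host strings. A small sketch with plain `java.net.InetAddress` (not the InetUtils helper the PR actually uses):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class V6Equality {
    public static void main(String[] args) throws UnknownHostException {
        // Two spellings of the loopback address. getByName on an address
        // literal does no DNS lookup; it just parses the address bytes.
        InetAddress a = InetAddress.getByName("::0:1");
        InetAddress b = InetAddress.getByName("::1");

        // The raw strings differ, so a host-string comparison fails...
        System.out.println("::0:1".equals("::1")); // false

        // ...but the parsed addresses are byte-for-byte identical.
        System.out.println(a.equals(b)); // true

        // Normalizing both sides to one canonical textual form
        // restores string equality -- the idea behind the fix.
        System.out.println(a.getHostAddress().equals(b.getHostAddress())); // true
    }
}
```

Since `RpcAddress` is a case class compared field by field on its host string, normalizing the host at construction time is what makes the equality reliable.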
[GitHub] [spark] HyukjinKwon closed pull request #39656: [SPARK-42119][SQL] Add built-in table-valued functions inline and inline_outer
HyukjinKwon closed pull request #39656: [SPARK-42119][SQL] Add built-in table-valued functions inline and inline_outer URL: https://github.com/apache/spark/pull/39656
[GitHub] [spark] HyukjinKwon commented on pull request #39656: [SPARK-42119][SQL] Add built-in table-valued functions inline and inline_outer
HyukjinKwon commented on PR #39656: URL: https://github.com/apache/spark/pull/39656#issuecomment-1402878991 Merged to master.
[GitHub] [spark] zhenlineo commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests
zhenlineo commented on code in PR #39712: URL: https://github.com/apache/spark/pull/39712#discussion_r1086087241 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -44,7 +45,7 @@ import org.apache.spark.sql.functions.lit * * @since 3.4.0 */ -class Column private[sql] (private[sql] val expr: proto.Expression) { +class Column private[sql] (private[sql] val expr: proto.Expression) extends Logging { Review Comment: The Logging is needed for binary compatibility: the class type must be exactly the same.
[GitHub] [spark] HyukjinKwon closed pull request #39727: [SPARK-42174][PYTHON][INFRA] Use `scikit-learn` instead of `sklearn`
HyukjinKwon closed pull request #39727: [SPARK-42174][PYTHON][INFRA] Use `scikit-learn` instead of `sklearn` URL: https://github.com/apache/spark/pull/39727
[GitHub] [spark] HyukjinKwon commented on pull request #39727: [SPARK-42174][PYTHON][INFRA] Use `scikit-learn` instead of `sklearn`
HyukjinKwon commented on PR #39727: URL: https://github.com/apache/spark/pull/39727#issuecomment-1402877171 Merged to master.
[GitHub] [spark] HyukjinKwon commented on pull request #39717: [SPARK-42168][3.3][SQL][PYTHON] Fix required child distribution of FlatMapCoGroupsInPandas (as in CoGroup)
HyukjinKwon commented on PR #39717: URL: https://github.com/apache/spark/pull/39717#issuecomment-1402875318 cc @sunchao
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests
HyukjinKwon commented on code in PR #39712: URL: https://github.com/apache/spark/pull/39712#discussion_r1086085628 ## connector/connect/client/jvm/pom.xml: ## @@ -75,6 +76,13 @@ mockito-core test + Review Comment: Can we use SBT to check this instead of Maven? We have one place for MiMa so far in SBT (See also `project/MimaBuild.scala`, and `dev/mima`)
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests
HyukjinKwon commented on code in PR #39712: URL: https://github.com/apache/spark/pull/39712#discussion_r1086082677 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -44,7 +45,7 @@ import org.apache.spark.sql.functions.lit * * @since 3.4.0 */ -class Column private[sql] (private[sql] val expr: proto.Expression) { +class Column private[sql] (private[sql] val expr: proto.Expression) extends Logging { Review Comment: Seems like we're not using this Logging
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39712: [SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests
HyukjinKwon commented on code in PR #39712: URL: https://github.com/apache/spark/pull/39712#discussion_r1086082530 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CompatibilitySuite.scala: ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.connect.client + +import java.io.File +import java.net.URLClassLoader +import java.util.regex.Pattern + +import com.typesafe.tools.mima.core._ +import com.typesafe.tools.mima.lib.MiMaLib +import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite +import org.apache.spark.sql.connect.client.util.IntegrationTestUtils._ + +/** + * This test requires the following artifacts built before running the tests: + * {{{ + * spark-sql + * spark-connect-client-jvm + * }}} + * To build the above artifact, use e.g. `sbt package` or `mvn clean install -DskipTests`. + * + * When debugging this test, if any changes to the client API, the client jar need to be built + * before re-running the test. An example workflow with SBT for this test: + * 1. Compatibility test reported an unexpected client API change. + * 1. Fix the wrong client API. + * 1. Build the client jar: `sbt package` + * 1. 
Run the test again: `sbt "testOnly + * org.apache.spark.sql.connect.client.CompatibilitySuite"` */ +class CompatibilitySuite extends AnyFunSuite { // scalastyle:ignore funsuite + + private lazy val clientJar: File = +findJar( + "connector/connect/client/jvm", + "spark-connect-client-jvm-assembly", + "spark-connect-client-jvm") + + private lazy val sqlJar: File = findJar("sql/core", "spark-sql", "spark-sql") + + test("compatibility mima tests") { +val mima = new MiMaLib(Seq(clientJar, sqlJar)) +val allProblems = mima.collectProblems(sqlJar, clientJar, List.empty) +val includedRules = Seq( + IncludeByName("org.apache.spark.sql.Column"), + IncludeByName("org.apache.spark.sql.Column$"), + IncludeByName("org.apache.spark.sql.Dataset"), + // TODO Add the Dataset object definition Review Comment: Let's probably file a JIRA
[GitHub] [spark] HyukjinKwon closed pull request #39724: [SPARK-41775][PYTHON][FOLLOWUP] Fix stdout rerouting
HyukjinKwon closed pull request #39724: [SPARK-41775][PYTHON][FOLLOWUP] Fix stdout rerouting URL: https://github.com/apache/spark/pull/39724
[GitHub] [spark] HyukjinKwon commented on pull request #39724: [SPARK-41775][PYTHON][FOLLOWUP] Fix stdout rerouting
HyukjinKwon commented on PR #39724: URL: https://github.com/apache/spark/pull/39724#issuecomment-1402862371 Merged to master.
[GitHub] [spark] HyukjinKwon closed pull request #39375: [SPARK-36124][SQL] Support subqueries with correlation through UNION
HyukjinKwon closed pull request #39375: [SPARK-36124][SQL] Support subqueries with correlation through UNION URL: https://github.com/apache/spark/pull/39375
[GitHub] [spark] HyukjinKwon commented on pull request #39375: [SPARK-36124][SQL] Support subqueries with correlation through UNION
HyukjinKwon commented on PR #39375: URL: https://github.com/apache/spark/pull/39375#issuecomment-1402860722 Merged to master.
[GitHub] [spark] HyukjinKwon commented on pull request #39721: [SPARK-42171][PYSPARK][TESTS] Fix `pyspark-errors` module and enable it in GitHub Action
HyukjinKwon commented on PR #39721: URL: https://github.com/apache/spark/pull/39721#issuecomment-1402853084 Thank you!
[GitHub] [spark] gengliangwang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`
gengliangwang commented on code in PR #39642: URL: https://github.com/apache/spark/pull/39642#discussion_r1086049823 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.status.protobuf.sql + +import java.util.{HashMap => JHashMap, Map => JMap, UUID} + +import com.fasterxml.jackson.databind.json.JsonMapper +import com.fasterxml.jackson.module.scala.DefaultScalaModule + +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema +import org.apache.spark.sql.streaming.StreamingQueryProgress +import org.apache.spark.status.protobuf.StoreTypes +import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField} + +private[protobuf] object StreamingQueryProgressSerializer { + + private val mapper: JsonMapper = JsonMapper.builder() +.addModule(DefaultScalaModule) +.build() + + def serialize(process: StreamingQueryProgress): StoreTypes.StreamingQueryProgress = { +import org.apache.spark.status.protobuf.Utils.setJMapField +val builder = StoreTypes.StreamingQueryProgress.newBuilder() +if (process.id != null) { + builder.setId(process.id.toString) +} +if (process.runId != null) { + builder.setRunId(process.runId.toString) +} +setStringField(process.name, builder.setName) +setStringField(process.timestamp, builder.setTimestamp) +builder.setBatchId(process.batchId) +builder.setBatchDuration(process.batchDuration) +setJMapField(process.durationMs, builder.putAllDurationMs) +setJMapField(process.eventTime, builder.putAllEventTime) +process.stateOperators.foreach( + s => builder.addStateOperators(StateOperatorProgressSerializer.serialize(s))) +process.sources.foreach( + s => builder.addSources(SourceProgressSerializer.serialize(s)) +) +builder.setSink(SinkProgressSerializer.serialize(process.sink)) +if (process.observedMetrics != null && !process.observedMetrics.isEmpty) { Review Comment: use setJMapField? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [spark] gengliangwang commented on a diff in pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`
gengliangwang commented on code in PR #39642: URL: https://github.com/apache/spark/pull/39642#discussion_r1086049614

## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:

@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+    .addModule(DefaultScalaModule)
+    .build()
+
+  def serialize(process: StreamingQueryProgress): StoreTypes.StreamingQueryProgress = {
+    import org.apache.spark.status.protobuf.Utils.setJMapField
+    val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+    if (process.id != null) {

Review Comment: use setStringField?

## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/StreamingQueryProgressSerializer.scala:

@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf.sql
+
+import java.util.{HashMap => JHashMap, Map => JMap, UUID}
+
+import com.fasterxml.jackson.databind.json.JsonMapper
+import com.fasterxml.jackson.module.scala.DefaultScalaModule
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
+import org.apache.spark.sql.streaming.StreamingQueryProgress
+import org.apache.spark.status.protobuf.StoreTypes
+import org.apache.spark.status.protobuf.Utils.{getStringField, setStringField}
+
+private[protobuf] object StreamingQueryProgressSerializer {
+
+  private val mapper: JsonMapper = JsonMapper.builder()
+    .addModule(DefaultScalaModule)
+    .build()
+
+  def serialize(process: StreamingQueryProgress): StoreTypes.StreamingQueryProgress = {
+    import org.apache.spark.status.protobuf.Utils.setJMapField
+    val builder = StoreTypes.StreamingQueryProgress.newBuilder()
+    if (process.id != null) {
+      builder.setId(process.id.toString)
+    }
+    if (process.runId != null) {
+      builder.setRunId(process.runId.toString)

Review Comment: ditto

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
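The reviewer's suggestion is to replace the repeated null checks with a null-guard helper. A minimal Java sketch of such a helper (hypothetical signature and class name, not necessarily matching Spark's actual `Utils.setStringField`) could look like this:

```java
import java.util.function.Consumer;

public class StringFieldUtil {
    // Hypothetical null-safe setter: only invokes the builder's setter when
    // the value is non-null, collapsing repeated
    // `if (x != null) builder.setX(x.toString())` blocks into one call.
    public static void setStringField(Object value, Consumer<String> setter) {
        if (value != null) {
            setter.accept(value.toString());
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        setStringField("run-1", out::append); // value appended
        setStringField(null, out::append);    // silently skipped, no NPE
        System.out.println(out);
    }
}
```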
[GitHub] [spark] dongjoon-hyun opened a new pull request, #39727: Use scikit-learn instead of sklearn
dongjoon-hyun opened a new pull request, #39727: URL: https://github.com/apache/spark/pull/39727

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] otterc commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding
otterc commented on code in PR #39725: URL: https://github.com/apache/spark/pull/39725#discussion_r1086007667

## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:

@@ -1393,6 +1405,7 @@ public void onData(String streamId, ByteBuffer buf) throws IOException {
       // the client in cases of duplicate even though no data is written.
       if (isDuplicateBlock()) {
         freeDeferredBufs();
+        mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());
         return;
       }
       abortIfNecessary();

Review Comment: Here it should be `buf.remaining`
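The distinction behind this review comment can be seen in a small standalone Java snippet (illustrative only, not Spark code): `ByteBuffer.limit()` reports the end of the readable region, while `remaining()` reports only the bytes not yet consumed, so the two diverge once part of the buffer has been read, and `limit()` would over-count the ignored bytes.

```java
import java.nio.ByteBuffer;

public class BufMetrics {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put(new byte[10]);   // 10 bytes written
        buf.flip();              // switch to read mode: position=0, limit=10
        buf.get(new byte[4]);    // 4 bytes already consumed
        // limit() still says 10, but only 6 unread bytes remain to be ignored
        System.out.println(buf.limit());      // 10
        System.out.println(buf.remaining());  // 6
    }
}
```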
[GitHub] [spark] rmcyang commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding
rmcyang commented on code in PR #39725: URL: https://github.com/apache/spark/pull/39725#discussion_r1086006244

## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:

@@ -1393,6 +1405,7 @@ public void onData(String streamId, ByteBuffer buf) throws IOException {
       // the client in cases of duplicate even though no data is written.
       if (isDuplicateBlock()) {
         freeDeferredBufs();
+        mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());
         return;
       }
       abortIfNecessary();

Review Comment: Makes sense, have updated the PR to call `updateIgnoredBytesWithDeferredBufs()` in `abortIfNecessary()`.

## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:

@@ -1356,6 +1356,17 @@ private boolean isTooLate(
         !appShuffleMergePartitionsInfo.shuffleMergePartitions.containsKey(reduceId);
     }

+    /**
+     * If necessary, look for the ignored block bytes and capture them in pushMergeMetrics.
+     */
+    private void lookForIgnoredBlockBytesIfNecessary() {
+      if (deferredBufs != null && !deferredBufs.isEmpty()) {
+        for (ByteBuffer buf : deferredBufs) {
+          mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());

Review Comment: Sounds good, have updated the PR.
[GitHub] [spark] rmcyang commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding
rmcyang commented on code in PR #39725: URL: https://github.com/apache/spark/pull/39725#discussion_r1086005947

## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:

@@ -1356,6 +1356,17 @@ private boolean isTooLate(
         !appShuffleMergePartitionsInfo.shuffleMergePartitions.containsKey(reduceId);
     }

+    /**
+     * If necessary, look for the ignored block bytes and capture them in pushMergeMetrics.
+     */
+    private void lookForIgnoredBlockBytesIfNecessary() {

Review Comment: Thanks for the suggestions, Chandni. Renamed it to `updateIgnoredBytesWithDeferredBufs` based on the former, as it sounds smoother to me. Hopefully it looks better. Please let me know otherwise.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657
dongjoon-hyun commented on code in PR #39726: URL: https://github.com/apache/spark/pull/39726#discussion_r1085985386

## sql/core/src/test/resources/sql-tests/inputs/show-create-table.sql:

@@ -45,6 +45,14 @@ SHOW CREATE TABLE tbl;

 DROP TABLE tbl;

+-- default column values
+CREATE TABLE tbl (a INT, b STRING DEFAULT 'abc', c INT DEFAULT 42) USING parquet

Review Comment: Shall we use a more complex string like `'abc, def'`?
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657
dongjoon-hyun commented on code in PR #39726: URL: https://github.com/apache/spark/pull/39726#discussion_r1085984751

## sql/core/src/test/resources/sql-tests/inputs/describe.sql:

@@ -97,3 +97,14 @@ DROP VIEW temp_v;

 DROP VIEW temp_Data_Source_View;

 DROP VIEW v;
+
+-- Show column default values
+CREATE TABLE d (a STRING, b INT DEFAULT 42) USING parquet COMMENT 'table_comment';

Review Comment: Since this is a test case for DEFAULT value, please add a default value for column `a`, too, like `show-create-table.sql`.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657
dongjoon-hyun commented on code in PR #39726: URL: https://github.com/apache/spark/pull/39726#discussion_r1085984055

## sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala:

@@ -141,6 +141,10 @@ case class StructField(
     }
   }

+  private def getDDLDefault = getCurrentDefaultValue()
+    .map(" DEFAULT " + _)

Review Comment: This looks a little weak. In case of STRING, we don't add quotation? If not, it looks like `DEFAULT THIS IS A DEFAULT VALUE`.
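The quoting concern above can be sketched with a tiny Java helper (hypothetical name, not the actual Spark fix): string-typed defaults are wrapped in single quotes with embedded quotes doubled, following standard SQL literal escaping, so the rendered DDL stays unambiguous and parseable.

```java
public class DdlQuote {
    // Hypothetical helper: renders a string default value as a SQL literal,
    // doubling any embedded single quotes per standard SQL escaping.
    public static String quoteStringDefault(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        // Without quoting, the DDL would read: DEFAULT THIS IS A DEFAULT VALUE
        System.out.println("DEFAULT " + quoteStringDefault("THIS IS A DEFAULT VALUE"));
        System.out.println("DEFAULT " + quoteStringDefault("it's"));
    }
}
```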
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657
dongjoon-hyun commented on code in PR #39726: URL: https://github.com/apache/spark/pull/39726#discussion_r1085981834

## sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala:

@@ -160,7 +164,7 @@ case class StructField(
    */
   def toDDL: String = {
     val nullString = if (nullable) "" else " NOT NULL"
-    s"${quoteIfNeeded(name)} ${dataType.sql}${nullString}$getDDLComment"
+    s"${quoteIfNeeded(name)} ${dataType.sql}${nullString}$getDDLDefault$getDDLComment"

Review Comment: Just curious, what happens if we have a multi-line default value string like `a\n b\n c\n d`?
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657
dongjoon-hyun commented on code in PR #39726: URL: https://github.com/apache/spark/pull/39726#discussion_r1085980466

## sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowCreateTableSuite.scala:

@@ -161,6 +161,30 @@ trait ShowCreateTableSuiteBase extends command.ShowCreateTableSuiteBase
       assert(cause.getMessage.contains("Use `SHOW CREATE TABLE` without `AS SERDE` instead"))
     }
   }
+
+  test("show create table with default column values") {
+    withNamespaceAndTable(ns, table) { t =>
+      sql(
+        s"""
+           |CREATE TABLE $t (
+           |  a bigint NOT NULL,
+           |  b bigint DEFAULT 42,
+           |  c string DEFAULT 'abc' COMMENT 'comment'
+           |)
+           |using parquet

Review Comment: `using` -> `USING`
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657
dongjoon-hyun commented on code in PR #39726: URL: https://github.com/apache/spark/pull/39726#discussion_r1085978481

## sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:

@@ -645,6 +645,17 @@ case class DescribeTableCommand(
     } else if (isExtended) {
       describeFormattedTableInfo(metadata, result)
     }
+
+    // If any columns have default values, append them to the result.
+    if (metadata.schema.fields.exists(_.metadata.contains(
+      ResolveDefaultColumns.CURRENT_DEFAULT_COLUMN_METADATA_KEY))) {
+      append(result, "", "", "")
+      append(result, "# Column Default Information", "", "")

Review Comment: Maybe, `Default` -> `Default Value`?
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39726: [SPARK-42123][SQL] Include column default values in DESCRIBE and SHOW CREATE TABLE output #39657
dongjoon-hyun commented on code in PR #39726: URL: https://github.com/apache/spark/pull/39726#discussion_r1085977201

## sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala:

@@ -645,6 +645,17 @@ case class DescribeTableCommand(
     } else if (isExtended) {
       describeFormattedTableInfo(metadata, result)
     }
+
+    // If any columns have default values, append them to the result.
+    if (metadata.schema.fields.exists(_.metadata.contains(
+      ResolveDefaultColumns.CURRENT_DEFAULT_COLUMN_METADATA_KEY))) {

Review Comment: indentation?
[GitHub] [spark] vinodkc commented on a diff in pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'
vinodkc commented on code in PR #39449: URL: https://github.com/apache/spark/pull/39449#discussion_r1085963701

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala:

@@ -257,19 +271,272 @@ case class Mask(
       otherChar = newChildren(4))
   }

-case class MaskArgument(maskChar: Char, ignore: Boolean)
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage =
+    """_FUNC_(input[, charCount, upperChar, lowerChar, digitChar, otherChar]) - masks the first n characters of given string value.
+      The function masks the first n characters of the value with 'X' or 'x', and numbers with 'n'.
+      This can be useful for creating copies of tables with sensitive information removed.
+      Error behavior: null value as replacement argument will throw AnalysisError.
+    """,
+  arguments = """
+    Arguments:
+      * input - string value to mask. Supported types: STRING, VARCHAR, CHAR
+      * charCount - number of characters to be masked. Default value: 4
+      * upperChar - character to replace upper-case characters with. Specify NULL to retain original character. Default value: 'X'
+      * lowerChar - character to replace lower-case characters with. Specify NULL to retain original character. Default value: 'x'
+      * digitChar - character to replace digit characters with. Specify NULL to retain original character. Default value: 'n'
+      * otherChar - character to replace all other characters with. Specify NULL to retain original character. Default value: NULL
+  """,
+  examples = """
+    Examples:

Review Comment: This case already exists: `SELECT _FUNC_('abcd-EFGH-8765-@$#', 20, 'x', 'X', 'n', 'o');`, input string length 18, request to mask 20 chars, Result: `oo`. Considers all chars (18) in the input string for masking.
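The capping behavior described in this reply (a request to mask 20 characters against an 18-character input masks all 18) can be illustrated with a short Java sketch. This is a simplified stand-in for the proposed `mask_first_n`, under the assumption that `charCount` is capped at the input length; it ignores the NULL-to-retain option and is not the PR's actual implementation.

```java
public class MaskSketch {
    // Simplified mask_first_n: masks at most charCount leading characters,
    // capped at the input length; the character class picks the replacement.
    public static String maskFirstN(String input, int charCount,
                                    char upper, char lower, char digit, char other) {
        StringBuilder sb = new StringBuilder(input);
        int n = Math.min(charCount, input.length());  // cap at input length
        for (int i = 0; i < n; i++) {
            char c = input.charAt(i);
            if (Character.isUpperCase(c))      sb.setCharAt(i, upper);
            else if (Character.isLowerCase(c)) sb.setCharAt(i, lower);
            else if (Character.isDigit(c))     sb.setCharAt(i, digit);
            else                               sb.setCharAt(i, other);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // charCount 20 exceeds the 18-character input, so every char is masked
        System.out.println(maskFirstN("abcd-EFGH-8765-@$#", 20, 'x', 'X', 'n', 'o'));
    }
}
```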
[GitHub] [spark] otterc commented on a diff in pull request #39725: [SPARK-33573][FOLLOW-UP] Increment ignoredBlockBytes when shuffle push blocks are late or colliding
otterc commented on code in PR #39725: URL: https://github.com/apache/spark/pull/39725#discussion_r1085930450

## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:

@@ -1356,6 +1356,17 @@ private boolean isTooLate(
         !appShuffleMergePartitionsInfo.shuffleMergePartitions.containsKey(reduceId);
     }

+    /**
+     * If necessary, look for the ignored block bytes and capture them in pushMergeMetrics.
+     */
+    private void lookForIgnoredBlockBytesIfNecessary() {

Review Comment: nit: rename to `updatedIgnoreBytesWithDeferredBufs` or `deferredBufsIgnored` or something similar.

## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:

@@ -1393,6 +1405,7 @@ public void onData(String streamId, ByteBuffer buf) throws IOException {
       // the client in cases of duplicate even though no data is written.
       if (isDuplicateBlock()) {
         freeDeferredBufs();
+        mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());
         return;
       }
       abortIfNecessary();

Review Comment: Even when the server aborts, it ignores the bytes.

## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:

@@ -1356,6 +1356,17 @@ private boolean isTooLate(
         !appShuffleMergePartitionsInfo.shuffleMergePartitions.containsKey(reduceId);
     }

+    /**
+     * If necessary, look for the ignored block bytes and capture them in pushMergeMetrics.
+     */
+    private void lookForIgnoredBlockBytesIfNecessary() {
+      if (deferredBufs != null && !deferredBufs.isEmpty()) {
+        for (ByteBuffer buf : deferredBufs) {
+          mergeManager.pushMergeMetrics.ignoredBlockBytes.mark(buf.limit());

Review Comment: I think we should use `buf.remaining` here