[GitHub] [spark] gengliangwang commented on pull request #39100: [SPARK-41422][UI] Protobuf serializer for ExecutorStageSummaryWrapper

2022-12-16 Thread GitBox


gengliangwang commented on PR #39100:
URL: https://github.com/apache/spark/pull/39100#issuecomment-1356109099

   @techaddict Yes, it is for ExecutorStageSummaryWrapper. I just updated the 
title.





[GitHub] [spark] MaxGekk commented on a diff in pull request #38861: [SPARK-41294][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1203 / 1168

2022-12-16 Thread GitBox


MaxGekk commented on code in PR #38861:
URL: https://github.com/apache/spark/pull/38861#discussion_r1051345163


##
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala:
##
@@ -2018,15 +2007,29 @@ private[sql] object QueryCompilationErrors extends 
QueryErrorsBase {
 "tableColumns" -> expected.map(c => s"'${c.name}'").mkString(", "),
 "dataColumns" -> query.output.map(c => s"'${c.name}'").mkString(", ")))
   }
+  
+  def numColumnsMismatchError(
+  operator: LogicalPlan,
+  firstNumColumns: Int,
+  invalidOrdinalNum: Int,
+  invalidNumColumns: Int): Throwable = {
+
+def ordinalNumber(i: Int): String = i match {
+  case 0 => "first"
+  case 1 => "second"
+  case 2 => "third"
+  case i => s"${i + 1}th"
+}
 
-  def cannotWriteNotEnoughColumnsToTableError(
-  tableName: String, expected: Seq[Attribute], query: LogicalPlan): 
Throwable = {
 new AnalysisException(
-  errorClass = "_LEGACY_ERROR_TEMP_1203",
+  errorClass = "NUM_COLUMNS_MISMATCH",
   messageParameters = Map(
-"tableName" -> tableName,
-"tableColumns" -> expected.map(c => s"'${c.name}'").mkString(", "),
-"dataColumns" -> query.output.map(c => s"'${c.name}'").mkString(", ")))
+"operator" -> toSQLStmt(operator.nodeName),

Review Comment:
   > It's weird, maybe we should pin it to INSERT INTO?
   
   So far, let's do that.
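
   For context, a minimal hypothetical sketch of the kind of query this new error class covers. It assumes, as the `firstNumColumns`/`invalidOrdinalNum` parameters above suggest, that the error is raised for set operations whose inputs have different column counts; the class name and message wording are still under review, so treat this as illustration only.

   ```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

object NumColumnsMismatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    try {
      // The first UNION branch has one column, the second has two.
      spark.sql("SELECT 1 AS a UNION ALL SELECT 1 AS a, 2 AS b").collect()
    } catch {
      case e: AnalysisException =>
        // Expected to carry the NUM_COLUMNS_MISMATCH error class:
        // the second input has 2 columns while the first has 1.
        println(e.getMessage)
    } finally {
      spark.stop()
    }
  }
}
   ```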






[GitHub] [spark] techaddict opened a new pull request, #39105: [SPARK-41426] Protobuf serializer for ResourceProfileWrapper

2022-12-16 Thread GitBox


techaddict opened a new pull request, #39105:
URL: https://github.com/apache/spark/pull/39105

   ### What changes were proposed in this pull request?
   Add Protobuf serializer for ResourceProfileWrapper
   
   ### Why are the changes needed?
   Support fast and compact serialization/deserialization for 
ResourceProfileWrapper over RocksDB.
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   New UT





[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-16 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1051328356


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##
@@ -38,6 +38,14 @@ private[sql] class ProtobufOptions(
 
   val parseMode: ParseMode =
 parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)
+
+  // Setting the `recursive.fields.max.depth` to 0 allows the field to be 
recurse once,

Review Comment:
   @rangadi Thank you for your suggestion. I have implemented it by adding a 
comment and a unit test to make the example more clear to users.






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-16 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1051328177


##
core/src/main/resources/error/error-classes.json:
##
@@ -1016,7 +1016,7 @@
   },
   "RECURSIVE_PROTOBUF_SCHEMA" : {
 "message" : [
-  "Found recursive reference in Protobuf schema, which can not be 
processed by Spark: "
+  "Found recursive reference in Protobuf schema, which can not be 
processed by Spark by default: . try setting the option 
`recursive.fields.max.depth` as 0 or 1 or 2. Going beyond 3 levels of recursion 
is not allowed."

Review Comment:
   @rangadi agree. 






[GitHub] [spark] techaddict opened a new pull request, #39104: [SPARK-41425] Protobuf serializer for RDDStorageInfoWrapper

2022-12-16 Thread GitBox


techaddict opened a new pull request, #39104:
URL: https://github.com/apache/spark/pull/39104

   ### What changes were proposed in this pull request?
   Add Protobuf serializer for RDDStorageInfoWrapper
   
   ### Why are the changes needed?
   Support fast and compact serialization/deserialization for 
RDDStorageInfoWrapper over RocksDB.
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   New UT





[GitHub] [spark] techaddict commented on pull request #39100: [SPARK-41422][UI] Protobuf serializer for ExecutorStageSummary

2022-12-16 Thread GitBox


techaddict commented on PR #39100:
URL: https://github.com/apache/spark/pull/39100#issuecomment-1356054846

   @gengliangwang is this not for 
https://issues.apache.org/jira/browse/SPARK-41427 ?
   This one looks like the PR for `ExecutorStageSummaryWrapper`, whereas issue 
[SPARK-41422] was for `ExecutorSummaryWrapper`.





[GitHub] [spark] techaddict commented on pull request #39093: [SPARK-41420][UI] Protobuf serializer for ApplicationInfoWrapper

2022-12-16 Thread GitBox


techaddict commented on PR #39093:
URL: https://github.com/apache/spark/pull/39093#issuecomment-1356050497

   @gengliangwang thanks for the review, updated the PR





[GitHub] [spark] shuyouZZ commented on pull request #38983: [SPARK-41447][CORE] Reduce the number of `doMergeApplicationListing` invocations

2022-12-16 Thread GitBox


shuyouZZ commented on PR #38983:
URL: https://github.com/apache/spark/pull/38983#issuecomment-1356044762

   > @shuyouZZ . I added you to the Apache Spark contributor group and assigned 
[SPARK-41447](https://issues.apache.org/jira/browse/SPARK-41447) to you. 
Welcome to the Apache Spark community!
   
   Thanks for your review. @dongjoon-hyun @mridulm @thejdeep 





[GitHub] [spark] shardulm94 commented on pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

2022-12-16 Thread GitBox


shardulm94 commented on PR #38676:
URL: https://github.com/apache/spark/pull/38676#issuecomment-1356030772

   I tried looking into this a bit
   > @EnricoMi @cloud-fan Could we fix the DeduplicateRelations? It did not 
generate different expression IDs for all conflicting attributes:
   
   As @EnricoMi said `DeduplicateRelations` only considers the output attrs of 
the left and right, which do not conflict here. Also the `Project` case in 
`PushDownLeftSemiAntiJoin` calls [this 
method](https://github.com/apache/spark/blob/45d9daa2ecf6081ef1d031065a9c0e9a3a7f7a58/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushDownLeftSemiAntiJoin.scala#L119)
 which seems to check for self-join case based on conflicting expression ids. 
This makes me believe the duplicate expression IDs are expected here and hence 
DeduplicateRelations may not be at fault.
   
   Similar to the `Project` case, should we add a check like 
`canPushThroughCondition(Seq(agg.child), joinCond, rightOp)` to ensure that it 
is safe to push the join down an `Aggregate` node too?
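
   For illustration only, a minimal sketch (not taken from the PR or its tests) of a left-anti self-join over an Aggregate — `distinct()` produces an Aggregate node — which is the shape of plan where pushing the join condition below the aggregate can bind to attributes with colliding expression IDs:

   ```scala
import org.apache.spark.sql.SparkSession

object AntiJoinOverAggregateSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Both join sides originate from the same relation.
    val ids = Seq(1, 2, 3).toDF("id").distinct()   // Aggregate node
    val shifted = ids.withColumn("id", $"id" + 1)

    // Left-anti join: rows of `shifted` with no match in `ids`.
    val result = shifted.join(ids, Seq("id"), "left_anti")
    result.explain(true)   // check whether the join was pushed below the Aggregate
    result.show()
    spark.stop()
  }
}
   ```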





[GitHub] [spark] wankunde commented on pull request #38560: [SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-12-16 Thread GitBox


wankunde commented on PR #38560:
URL: https://github.com/apache/spark/pull/38560#issuecomment-1356030075

   Hi, @yabola @mridulm , I will update SPARK-40480 this weekend. 





[GitHub] [spark] wankunde commented on pull request #38496: [SPARK-40708][SQL] Auto update table statistics based on write metrics

2022-12-16 Thread GitBox


wankunde commented on PR #38496:
URL: https://github.com/apache/spark/pull/38496#issuecomment-1356027243

   Hi, @jackylee-ch @melin any update ?





[GitHub] [spark] HeartSaVioR commented on a diff in pull request #39082: [SPARK-41539][SQL] Remap stats and constraints against output in logical plan for LogicalRDD

2022-12-16 Thread GitBox


HeartSaVioR commented on code in PR #39082:
URL: https://github.com/apache/spark/pull/39082#discussion_r1051306854


##
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala:
##
@@ -183,16 +172,80 @@ object LogicalRDD {
   }
 }
 
+val logicalPlan = originDataset.logicalPlan
 val optimizedPlan = originDataset.queryExecution.optimizedPlan
 val executedPlan = originDataset.queryExecution.executedPlan
 
+val (stats, constraints) = rewriteStatsAndConstraints(logicalPlan, 
optimizedPlan)
+
 LogicalRDD(
   originDataset.logicalPlan.output,
   rdd,
   firstLeafPartitioning(executedPlan.outputPartitioning),
   executedPlan.outputOrdering,
   isStreaming
-)(originDataset.sparkSession, Some(optimizedPlan.stats), 
Some(optimizedPlan.constraints))
+)(originDataset.sparkSession, stats, constraints)
+  }
+
+  private[sql] def buildOutputAssocForRewrite(
+  source: Seq[Attribute],
+  destination: Seq[Attribute]): Option[Map[Attribute, Attribute]] = {
+// We check the name and type, allowing nullability, exprId, metadata, 
qualifier be different
+// E.g. This could happen during optimization phase.
+val rewrite = source.zip(destination).flatMap { case (attr1, attr2) =>
+  if (attr1.name == attr2.name && attr1.dataType == attr2.dataType) {
+Some(attr1 -> attr2)
+  } else {
+None
+  }
+}.toMap
+
+if (rewrite.size == source.size) {
+  Some(rewrite)
+} else {
+  None
+}
+  }
+
+  private[sql] def rewriteStatsAndConstraints(
+  logicalPlan: LogicalPlan,
+  optimizedPlan: LogicalPlan): (Option[Statistics], Option[ExpressionSet]) 
= {
+val rewrite = buildOutputAssocForRewrite(optimizedPlan.output, 
logicalPlan.output)
+
+rewrite.map { rw =>
+  val rewrittenStatistics = rewriteStatistics(optimizedPlan.stats, rw)
+  val rewrittenConstraints = rewriteConstraints(optimizedPlan.constraints, 
rw)
+
+  (Some(rewrittenStatistics), Some(rewrittenConstraints))
+}.getOrElse {
+  // can't rewrite stats and constraints, give up
+  logWarning("The output columns are expected to the same (for name and 
type) for output " +

Review Comment:
   Yeah, my understanding is that the final output after optimization should be 
semantically the same. It looks like exprId can change, but I'm not sure what else we 
allow to be changed. I'm also just speculating.
   
   My feeling is that changes to type or nullability could also break a DSv1 
sink if the change is not done in a "compatible" way, e.g. a table in the sink has a 
column's nullability set to "false", and the optimizer somehow (it shouldn't happen, but) 
changes the nullability of that column from "false" to "true" (again, it shouldn't 
happen). The same applies to data types as well. I feel column order also matters, 
but I'm not sure whether we can assume the consumer accesses columns by name.






[GitHub] [spark] HeartSaVioR commented on a diff in pull request #39082: [SPARK-41539][SQL] Remap stats and constraints against output in logical plan for LogicalRDD

2022-12-16 Thread GitBox


HeartSaVioR commented on code in PR #39082:
URL: https://github.com/apache/spark/pull/39082#discussion_r1051306854


##
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala:
##
@@ -183,16 +172,80 @@ object LogicalRDD {
   }
 }
 
+val logicalPlan = originDataset.logicalPlan
 val optimizedPlan = originDataset.queryExecution.optimizedPlan
 val executedPlan = originDataset.queryExecution.executedPlan
 
+val (stats, constraints) = rewriteStatsAndConstraints(logicalPlan, 
optimizedPlan)
+
 LogicalRDD(
   originDataset.logicalPlan.output,
   rdd,
   firstLeafPartitioning(executedPlan.outputPartitioning),
   executedPlan.outputOrdering,
   isStreaming
-)(originDataset.sparkSession, Some(optimizedPlan.stats), 
Some(optimizedPlan.constraints))
+)(originDataset.sparkSession, stats, constraints)
+  }
+
+  private[sql] def buildOutputAssocForRewrite(
+  source: Seq[Attribute],
+  destination: Seq[Attribute]): Option[Map[Attribute, Attribute]] = {
+// We check the name and type, allowing nullability, exprId, metadata, 
qualifier be different
+// E.g. This could happen during optimization phase.
+val rewrite = source.zip(destination).flatMap { case (attr1, attr2) =>
+  if (attr1.name == attr2.name && attr1.dataType == attr2.dataType) {
+Some(attr1 -> attr2)
+  } else {
+None
+  }
+}.toMap
+
+if (rewrite.size == source.size) {
+  Some(rewrite)
+} else {
+  None
+}
+  }
+
+  private[sql] def rewriteStatsAndConstraints(
+  logicalPlan: LogicalPlan,
+  optimizedPlan: LogicalPlan): (Option[Statistics], Option[ExpressionSet]) 
= {
+val rewrite = buildOutputAssocForRewrite(optimizedPlan.output, 
logicalPlan.output)
+
+rewrite.map { rw =>
+  val rewrittenStatistics = rewriteStatistics(optimizedPlan.stats, rw)
+  val rewrittenConstraints = rewriteConstraints(optimizedPlan.constraints, 
rw)
+
+  (Some(rewrittenStatistics), Some(rewrittenConstraints))
+}.getOrElse {
+  // can't rewrite stats and constraints, give up
+  logWarning("The output columns are expected to the same (for name and 
type) for output " +

Review Comment:
   Yeah, my understanding is that the final output after optimization should be 
semantically the same. It looks like exprId can change, but I'm not sure what else we 
allow to be changed. 
   
   My feeling is that changes to type or nullability could also break a DSv1 
sink if the change is not done in a "compatible" way, e.g. a table in the sink has a 
column's nullability set to "false", and the optimizer somehow (it shouldn't happen, but) 
changes the nullability of that column from "false" to "true" (again, it shouldn't 
happen). The same applies to data types as well. I feel column order also matters, 
but I'm not sure whether we can assume the consumer accesses columns by name.






[GitHub] [spark] dengziming commented on a diff in pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-16 Thread GitBox


dengziming commented on code in PR #38984:
URL: https://github.com/apache/spark/pull/38984#discussion_r1051305047


##
python/pyspark/sql/connect/dataframe.py:
##
@@ -875,6 +875,30 @@ def to_jcols(
 
 melt = unpivot
 
+def hint(self, name: str, *params: Any) -> "DataFrame":
+"""
+Specifies some hint on the current DataFrame. As an example, the 
following code specifies
+that one of the plan can be broadcasted: 
`df1.join(df2.hint("broadcast"))`
+
+.. versionadded:: 3.4.0
+
+Parameters
+--
+name: str
+the name of the hint, for example, "broadcast", "SHUFFLE_MERGE" 
and "shuffle_hash".
+params: tuple
+the parameters of the hint

Review Comment:
   Yeah, it's worth improving the docs of the parameters.
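
   For reference, a short sketch of the hint semantics being documented, using the Scala API that this Connect method mirrors; the extra `params` are hint-specific, e.g. a partition count for the `repartition` hint:

   ```scala
import org.apache.spark.sql.SparkSession

object HintSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val df1 = Seq((1, "a"), (2, "b")).toDF("id", "v1")
    val df2 = Seq((1, "x"), (3, "y")).toDF("id", "v2")

    // Ask the planner to broadcast df2 for this join.
    val joined = df1.join(df2.hint("broadcast"), "id")
    joined.explain()   // the plan should show a broadcast join

    // A hint that takes a parameter: repartition to 10 partitions.
    val repartitioned = df1.hint("repartition", 10)
    repartitioned.explain()
    spark.stop()
  }
}
   ```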



##
python/pyspark/sql/connect/plan.py:
##
@@ -343,6 +343,51 @@ def _repr_html_(self) -> str:
 """
 
 
+class Hint(LogicalPlan):
+"""Logical plan object for a Hint operation."""
+
+def __init__(self, child: Optional["LogicalPlan"], name: str, params: 
List[Any]) -> None:
+super().__init__(child)
+self.name = name
+self.params = params
+
+def _convert_value(self, v: Any) -> proto.Expression.Literal:
+value = proto.Expression.Literal()
+if v is None:
+value.null = True
+elif isinstance(v, int):
+value.integer = v
+else:
+value.string = v
+return value

Review Comment:
   I improved the error handling logic here. There are 4 occurrences of this 
similar logic, and I'm planning to refactor it to reuse the existing code.



##
python/pyspark/sql/tests/connect/test_connect_basic.py:
##
@@ -829,6 +829,13 @@ def test_with_columns(self):
 .toPandas(),
 )
 
+def test_hint(self):
+# SPARK-41349: Test hint

Review Comment:
   Good catch, I added 4 more test cases for it.






[GitHub] [spark] dengziming commented on pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-16 Thread GitBox


dengziming commented on PR #38984:
URL: https://github.com/apache/spark/pull/38984#issuecomment-1356014591

   @grundprinzip Thank you for your reviews, I have resolved these comments.





[GitHub] [spark] gengliangwang commented on a diff in pull request #39093: [SPARK-41420][UI] Protobuf serializer for ApplicationInfoWrapper

2022-12-16 Thread GitBox


gengliangwang commented on code in PR #39093:
URL: https://github.com/apache/spark/pull/39093#discussion_r1051304099


##
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##
@@ -176,4 +176,59 @@ class KVStoreProtobufSerializerSuite extends SparkFunSuite 
{
 assert(result.stageId == input.stageId)
 assert(result.stageAttemptId == input.stageAttemptId)
   }
+
+  test("Application Info") {
+val attempts: Seq[ApplicationAttemptInfo] = Seq(
+  ApplicationAttemptInfo(
+attemptId = Some("001"),
+startTime = new Date(1),
+endTime = new Date(10),
+lastUpdated = new Date(10),

Review Comment:
   Let's have different values in the test case to avoid mistakes.






[GitHub] [spark] beliefer commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement DataFrame.observe

2022-12-16 Thread GitBox


beliefer commented on PR #39091:
URL: https://github.com/apache/spark/pull/39091#issuecomment-1356003824

   > @beliefer thanks for working on this. I have one question how are we going 
to get the observed metrics to the client? This seems to be missing from the 
implementation. One of the approaches would be to send it in a similar way as 
the metrics in the result code path.
   
   Good question. The result of a Dataset can be passed back through the gRPC 
server, but the ObservationListener runs on the server, so it seems we need another 
way to get the observed metrics to the client.
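
   For context, a short sketch of how the existing (non-Connect) observe API surfaces these metrics; the Observation helper registers a query execution listener on the driver side, which is why getting the values back to a remote Connect client needs a separate mechanism:

   ```scala
import org.apache.spark.sql.{Observation, SparkSession}
import org.apache.spark.sql.functions._

object ObserveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val observation = Observation("stats")
    val df = Seq(1, 2, 3, 4).toDF("id")
      .observe(observation, count(lit(1)).as("rows"), max($"id").as("max_id"))

    df.collect()              // the metrics are produced when an action runs
    println(observation.get)  // e.g. Map(rows -> 4, max_id -> 4), available on the driver only
    spark.stop()
  }
}
   ```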





[GitHub] [spark] mridulm commented on a diff in pull request #39093: [SPARK-41420][UI] Protobuf serializer for ApplicationInfoWrapper

2022-12-16 Thread GitBox


mridulm commented on code in PR #39093:
URL: https://github.com/apache/spark/pull/39093#discussion_r1051302962


##
core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto:
##
@@ -106,3 +106,28 @@ message TaskDataWrapper {
   int64 stage_id = 40;
   int32 stage_attempt_id = 41;
 }
+
+message ApplicationAttemptInfo {
+  optional string attempt_id = 1;
+  int64 start_time = 2;
+  int64 end_time = 3;
+  int64 last_updated = 4;
+  int64 duration = 5;
+  string spark_user = 6;
+  bool completed = 7;
+  string app_spark_version = 8;
+}
+
+message ApplicationInfo {
+  string id = 1;
+  string name = 2;
+  optional int32 cores_granted = 3;
+  optional int32 max_cores = 4;
+  optional int32 cores_per_executor = 5;
+  optional int32 memory_per_executor_mb = 6;
+  repeated ApplicationAttemptInfo attempts = 7;
+}
+
+message ApplicationInfoWrapper {
+  ApplicationInfo info = 1;

Review Comment:
   My bad, ended up looking at the wrong file :) Thx for clarifying !






[GitHub] [spark] beliefer commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement DataFrame.observe

2022-12-16 Thread GitBox


beliefer commented on code in PR #39091:
URL: https://github.com/apache/spark/pull/39091#discussion_r1051302794


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -338,6 +340,22 @@ class SparkConnectPlanner(session: SparkSession) {
 }
   }
 
+  private def transformCollectMetrics(rel: proto.CollectMetrics): LogicalPlan 
= {
+val metrics = rel.getMetricsList.asScala.map { expr =>
+  Column(transformExpression(expr))
+}
+
+if (rel.getIsObservation) {

Review Comment:
   The Observation registers `ObservationListener` on 
`ExecutionListenerManager`.






[GitHub] [spark] beliefer commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement DataFrame.observe

2022-12-16 Thread GitBox


beliefer commented on code in PR #39091:
URL: https://github.com/apache/spark/pull/39091#discussion_r1051302541


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala:
##
@@ -126,6 +126,20 @@ package object dsl {
   
Expression.UnresolvedFunction.newBuilder().setFunctionName("min").addArguments(e))
 .build()
 
+def proto_max(e: Expression): Expression =

Review Comment:
   I just followed the existing `proto_min`.



##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala:
##
@@ -126,6 +126,20 @@ package object dsl {
   
Expression.UnresolvedFunction.newBuilder().setFunctionName("min").addArguments(e))
 .build()
 
+def proto_max(e: Expression): Expression =

Review Comment:
   I just followed the existing `proto_min`.






[GitHub] [spark] beliefer commented on a diff in pull request #39084: [SPARK-41464][CONNECT][PYTHON] Implement `DataFrame.to`

2022-12-16 Thread GitBox


beliefer commented on code in PR #39084:
URL: https://github.com/apache/spark/pull/39084#discussion_r1051301969


##
python/pyspark/sql/connect/dataframe.py:
##
@@ -1631,6 +1632,47 @@ def inputFiles(self) -> List[str]:
 query = self._plan.to_proto(self._session.client)
 return self._session.client._analyze(query).input_files
 
+def to(self, schema: Union[DataType, str]) -> "DataFrame":

Review Comment:
   I have created https://github.com/apache/spark/pull/39103 to let 
`pyspark_types_to_proto_types` support `StructType`.






[GitHub] [spark] beliefer opened a new pull request, #39103: [SPARK-41546][CONNECT][PYTHON] `pyspark_types_to_proto_types` should support StructType.

2022-12-16 Thread GitBox


beliefer opened a new pull request, #39103:
URL: https://github.com/apache/spark/pull/39103

   ### What changes were proposed in this pull request?
   Currently, `pyspark_types_to_proto_types` is used to transform PySpark 
data types into protobuf data types.
   But it only supports primitive data types, not struct types.
   Many Connect APIs need to transform a PySpark struct type into a protobuf struct 
type, for example `createDataFrame`, `DataFrame.to`, and so on.
   
   
   ### Why are the changes needed?
   This PR let `pyspark_types_to_proto_types` support `StructType`.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   New feature.
   
   
   ### How was this patch tested?
   New tests.
   





[GitHub] [spark] yabola commented on pull request #39087: [SPARK-41365][UI][3.3] Stages UI page fails to load for proxy in specific yarn environment

2022-12-16 Thread GitBox


yabola commented on PR #39087:
URL: https://github.com/apache/spark/pull/39087#issuecomment-1355977687

   @dongjoon-hyun Thank you very much!





[GitHub] [spark] rangadi commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-16 Thread GitBox


rangadi commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1051293916


##
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:
##
@@ -693,4 +693,435 @@ class ProtobufFunctionsSuite extends QueryTest with 
SharedSparkSession with Prot
   errorClass = "CANNOT_CONSTRUCT_PROTOBUF_DESCRIPTOR",
   parameters = Map("descFilePath" -> testFileDescriptor))
   }
+
+  test("Verify OneOf field between from_protobuf -> to_protobuf and struct -> 
from_protobuf") {
+val descriptor = ProtobufUtils.buildDescriptor(testFileDesc, "OneOfEvent")
+val oneOfEvent = OneOfEvent.newBuilder()
+  .setKey("key")
+  .setCol1(123)
+  .setCol3(109202L)
+  .setCol2("col2value")
+  .addCol4("col4value").build()
+
+val df = Seq(oneOfEvent.toByteArray).toDF("value")
+
+checkWithFileAndClassName("OneOfEvent") {
+  case (name, descFilePathOpt) =>
+val fromProtoDf = df.select(
+  from_protobuf_wrapper($"value", name, descFilePathOpt) as 'sample)
+val toDf = fromProtoDf.select(
+  to_protobuf_wrapper($"sample", name, descFilePathOpt) as 'toProto)
+val toFromDf = toDf.select(
+  from_protobuf_wrapper($"toProto", name, descFilePathOpt) as 
'fromToProto)
+checkAnswer(fromProtoDf, toFromDf)
+val actualFieldNames = 
fromProtoDf.select("sample.*").schema.fields.toSeq.map(f => f.name)
+descriptor.getFields.asScala.map(f => {
+  assert(actualFieldNames.contains(f.getName))
+})
+
+val eventFromSpark = OneOfEvent.parseFrom(
+  toDf.select("toProto").take(1).toSeq(0).getAs[Array[Byte]](0))
+// OneOf field: the last set value(by order) will overwrite all 
previous ones.
+assert(eventFromSpark.getCol2.equals("col2value"))
+assert(eventFromSpark.getCol3 == 0)
+val expectedFields = descriptor.getFields.asScala.map(f => f.getName)
+eventFromSpark.getDescriptorForType.getFields.asScala.map(f => {
+  assert(expectedFields.contains(f.getName))
+})
+
+val jsonSchema =
+  s"""
+ |{
+ |  "type" : "struct",
+ |  "fields" : [ {
+ |"name" : "sample",
+ |"type" : {
+ |  "type" : "struct",
+ |  "fields" : [ {
+ |"name" : "key",
+ |"type" : "string",
+ |"nullable" : true
+ |  }, {
+ |"name" : "col_1",
+ |"type" : "integer",
+ |"nullable" : true
+ |  }, {
+ |"name" : "col_2",
+ |"type" : "string",
+ |"nullable" : true
+ |  }, {
+ |"name" : "col_3",
+ |"type" : "long",
+ |"nullable" : true
+ |  }, {
+ |"name" : "col_4",
+ |"type" : {
+ |  "type" : "array",
+ |  "elementType" : "string",
+ |  "containsNull" : false
+ |},
+ |"nullable" : false
+ |  } ]
+ |},
+ |"nullable" : true
+ |  } ]
+ |}
+ |{
+ |  "type" : "struct",
+ |  "fields" : [ {
+ |"name" : "sample",
+ |"type" : {
+ |  "type" : "struct",
+ |  "fields" : [ {
+ |"name" : "key",
+ |"type" : "string",
+ |"nullable" : true
+ |  }, {
+ |"name" : "col_1",
+ |"type" : "integer",
+ |"nullable" : true
+ |  }, {
+ |"name" : "col_2",
+ |"type" : "string",
+ |"nullable" : true
+ |  }, {
+ |"name" : "col_3",
+ |"type" : "long",
+ |"nullable" : true
+ |  }, {
+ |"name" : "col_4",
+ |"type" : {
+ |  "type" : "array",
+ |  "elementType" : "string",
+ |  "containsNull" : false
+ |},
+ |"nullable" : false
+ |  } ]
+ |},
+ |"nullable" : true
+ |  } ]
+ |}
+ |""".stripMargin
+val schema = DataType.fromJson(jsonSchema).asInstanceOf[StructType]
+val data = Seq(Row(Row("key", 123, "col2value", 109202L, 
Seq("col4value"
+val dataDf = 
spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
+val dataDfToProto = dataDf.select(
+  to_protobuf_wrapper

[GitHub] [spark] rangadi commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-16 Thread GitBox


rangadi commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1051292604


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##
@@ -38,6 +38,14 @@ private[sql] class ProtobufOptions(
 
   val parseMode: ParseMode =
 parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)
+
+  // Setting the `recursive.fields.max.depth` to 0 allows the field to be 
recurse once,

Review Comment:
   '0' disables recursion, right? Why once? This might be a difference in 
terminology. That's why giving a quick example is better. Could you add this 
example?
   
   Consider a simple recursive proto: `message Person { string name = 1; Person bff = 2; }`
   
   > What would the Spark schema be for recursion depth 0, 1, and 2? I think:
   
 - 0: `struct<name: string>`
 - 1: `struct<name: string, bff: struct<name: string>>`
 - 2: `struct<name: string, bff: struct<name: string, bff: struct<name: string>>>`
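
   A hypothetical usage sketch (file paths and the `Person` message name are made up, and the depth semantics follow the interpretation proposed above, which is still being settled in this review):

   ```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.protobuf.functions.from_protobuf

object RecursiveProtobufSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical descriptor file containing the Person message above.
    val descFile = "/tmp/person.desc"
    val options = new java.util.HashMap[String, String]()
    options.put("recursive.fields.max.depth", "1")

    val raw = spark.read.format("binaryFile").load("/tmp/person-binaries")
    val parsed = raw.select(
      from_protobuf($"content", "Person", descFile, options).as("person"))
    // With depth 1, the schema would keep one nested level, e.g.
    // struct<name: string, bff: struct<name: string>>.
    parsed.printSchema()
    spark.stop()
  }
}
   ```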
 
   






[GitHub] [spark] rangadi commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-16 Thread GitBox


rangadi commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1051292604


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##
@@ -38,6 +38,14 @@ private[sql] class ProtobufOptions(
 
   val parseMode: ParseMode =
 parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)
+
+  // Setting the `recursive.fields.max.depth` to 0 allows the field to be 
recurse once,

Review Comment:
   '0' disables recursion, right? Why once? This might be a difference in 
terminology. That's why giving a quick example is better. Could you add this 
example?
   
   Consider a simple recursive proto: `message Person { string name = 1; Person bff = 2; }`
   
   > What would the Spark schema be for recursion depth 0, 1, and 2? I think:
   
 - 0: `struct<name: string>`
 - 1: `struct<name: string, bff: struct<name: string>>`
 - 2: `struct<name: string, bff: struct<name: string, bff: struct<name: string>>>`
 
   






[GitHub] [spark] gengliangwang closed pull request #39096: [SPARK-41421][UI] Protobuf serializer for ApplicationEnvironmentInfoWrapper

2022-12-16 Thread GitBox


gengliangwang closed pull request #39096: [SPARK-41421][UI] Protobuf serializer 
for ApplicationEnvironmentInfoWrapper
URL: https://github.com/apache/spark/pull/39096





[GitHub] [spark] gengliangwang commented on pull request #39096: [SPARK-41421][UI] Protobuf serializer for ApplicationEnvironmentInfoWrapper

2022-12-16 Thread GitBox


gengliangwang commented on PR #39096:
URL: https://github.com/apache/spark/pull/39096#issuecomment-1355950671

   Thanks, merging to master





[GitHub] [spark] monkeyboy123 opened a new pull request, #39102: [SPARK-41555][SQL] Multi sparkSession should share single SQLAppStatusStore

2022-12-16 Thread GitBox


monkeyboy123 opened a new pull request, #39102:
URL: https://github.com/apache/spark/pull/39102

   
   
   ### What changes were proposed in this pull request?
   
   Fix a bug: multiple SparkSessions should share a single SQLAppStatusStore 
instead of each creating its own.
   
   
   ### Why are the changes needed?
   
   It wastes memory and causes frequent GC.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Before this PR, multiple SQL tabs appear in the UI and multiple 
SQLAppStatusListener instances exist in Spark, which wastes memory.
   See the code in the JIRA: 
[SPARK-41555](https://issues.apache.org/jira/browse/SPARK-41555).
   
   
   ### How was this patch tested?
   
   Existing UTs.
   





[GitHub] [spark] monkeyboy123 closed pull request #39101: [SPARK-41555][SQL] Multi sparkSession should share single SQLAppStatusStore

2022-12-16 Thread GitBox


monkeyboy123 closed pull request #39101: [SPARK-41555][SQL] Multi sparkSession 
should share single SQLAppStatusStore
URL: https://github.com/apache/spark/pull/39101





[GitHub] [spark] zhenlineo commented on a diff in pull request #39078: [SPARK-41534][CONNECT][SQL] Setup initial client module for Spark Connect

2022-12-16 Thread GitBox


zhenlineo commented on code in PR #39078:
URL: https://github.com/apache/spark/pull/39078#discussion_r1051273721


##
project/SparkBuild.scala:
##
@@ -1433,7 +1492,10 @@ object CopyDependencies {
   if (jar.getName.contains("spark-connect") &&
 !SbtPomKeys.profiles.value.contains("noshade-connect")) {
 Files.copy(fid.toPath, destJar.toPath)
-  } else if (jar.getName.contains("spark-protobuf") &&
+  } else if (jar.getName.contains("connect-client") &&
+!SbtPomKeys.profiles.value.contains("noshade-protobuf")) {

Review Comment:
   Just double-checking: did you really mean to use `spark-protobuf`, or is it a 
copy-paste typo?
   The values for these three jars are a bit magical.






[GitHub] [spark] monkeyboy123 opened a new pull request, #39101: [SPARK-41555][SQL] Multi sparkSession should share single SQLAppStatusStore

2022-12-16 Thread GitBox


monkeyboy123 opened a new pull request, #39101:
URL: https://github.com/apache/spark/pull/39101

   
   
   ### What changes were proposed in this pull request?
   
   Fix a bug: multiple SparkSessions should share a single SQLAppStatusStore 
instead of each creating its own.
   
   
   ### Why are the changes needed?
   
   It wastes memory and causes frequent GC.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Before this PR, multiple SQL tabs appear in the UI and multiple 
SQLAppStatusListener instances exist in Spark, which wastes memory.
   See the code in the JIRA: 
[SPARK-41555](https://issues.apache.org/jira/browse/SPARK-41555).
   
   
   ### How was this patch tested?
   
   Existing UTs.
   





[GitHub] [spark] dongjoon-hyun commented on pull request #38357: [SPARK-40887][K8S] Allow Spark on K8s to integrate w/ Log Service

2022-12-16 Thread GitBox


dongjoon-hyun commented on PR #38357:
URL: https://github.com/apache/spark/pull/38357#issuecomment-1355886073

   Hi, @pan3793 . I'm interested in this PR for Apache Spark 3.4.0. To get there, 
we need to split this PR so we can discuss and test it in smaller pieces. I propose 
the following first. Please let me know if you can revise.
   - Spin off the pod env variable contributions (`SPARK_DRIVER_POD_NAME`).
   - Reduce the code change (e.g. avoid a whole new class like 
`KubernetesCoarseGrainedExecutorBackend`).





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38357: [SPARK-40887][K8S] Allow Spark on K8s to integrate w/ Log Service

2022-12-16 Thread GitBox


dongjoon-hyun commented on code in PR #38357:
URL: https://github.com/apache/spark/pull/38357#discussion_r1051268736


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala:
##
@@ -65,6 +65,7 @@ private[spark] object Constants {
   val ENV_JAVA_OPT_PREFIX = "SPARK_JAVA_OPT_"
   val ENV_CLASSPATH = "SPARK_CLASSPATH"
   val ENV_DRIVER_BIND_ADDRESS = "SPARK_DRIVER_BIND_ADDRESS"
+  val ENV_DRIVER_POD_NAME = "SPARK_DRIVER_POD_NAME"

Review Comment:
   We had better put this line before `ENV_DRIVER_BIND_ADDRESS`.
   
   BTW, could you spin off the `KUBERNETES_POD_NAME` and `SPARK_DRIVER_POD_NAME` 
additions to a new PR, @pan3793?






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38357: [SPARK-40887][K8S] Allow Spark on K8s to integrate w/ Log Service

2022-12-16 Thread GitBox


dongjoon-hyun commented on code in PR #38357:
URL: https://github.com/apache/spark/pull/38357#discussion_r1051268736


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala:
##
@@ -65,6 +65,7 @@ private[spark] object Constants {
   val ENV_JAVA_OPT_PREFIX = "SPARK_JAVA_OPT_"
   val ENV_CLASSPATH = "SPARK_CLASSPATH"
   val ENV_DRIVER_BIND_ADDRESS = "SPARK_DRIVER_BIND_ADDRESS"
+  val ENV_DRIVER_POD_NAME = "SPARK_DRIVER_POD_NAME"

Review Comment:
   We had better put this line before `ENV_DRIVER_BIND_ADDRESS`. Could 
you spin off this `ENV_DRIVER_BIND_ADDRESS` addition to a new PR, @pan3793?






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38357: [SPARK-40887][K8S] Allow Spark on K8s to integrate w/ Log Service

2022-12-16 Thread GitBox


dongjoon-hyun commented on code in PR #38357:
URL: https://github.com/apache/spark/pull/38357#discussion_r1051268075


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBackend.scala:
##
@@ -49,7 +49,7 @@ private[spark] object KubernetesExecutorBackend extends 
Logging {
   def main(args: Array[String]): Unit = {
 val createFn: (RpcEnv, Arguments, SparkEnv, ResourceProfile, String) =>
   CoarseGrainedExecutorBackend = { case (rpcEnv, arguments, env, 
resourceProfile, execId) =>
-new CoarseGrainedExecutorBackend(rpcEnv, arguments.driverUrl, execId,
+new KubernetesCoarseGrainedExecutorBackend(rpcEnv, 
arguments.driverUrl, execId,
 arguments.bindAddress, arguments.hostname, arguments.cores,
 env, arguments.resourcesFileOpt, resourceProfile)

Review Comment:
   Instead of creating a new class `KubernetesCoarseGrainedExecutorBackend`, 
can we do the following simply?
   ```scala
new CoarseGrainedExecutorBackend(rpcEnv, arguments.driverUrl, 
execId,
arguments.bindAddress, arguments.hostname, arguments.cores,
   -env, arguments.resourcesFileOpt, resourceProfile)
   +env, arguments.resourcesFileOpt, resourceProfile) {
   +  override def getDriverAttributes: Option[Map[String, String]] = 
Some(
   +super.getDriverAttributes.getOrElse(Map.empty) ++ Map(
   +  "APP_ID" -> System.getenv(ENV_APPLICATION_ID),
   +  "KUBERNETES_NAMESPACE" -> conf.get(KUBERNETES_NAMESPACE),
   +  "KUBERNETES_POD_NAME" -> System.getenv(ENV_DRIVER_POD_NAME)))
   +}
   ```
   
   Then, we can remove the following file from this PR.
   - 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/executor/KubernetesCoarseGrainedExecutorBackend.scala






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38357: [SPARK-40887][K8S] Allow Spark on K8s to integrate w/ Log Service

2022-12-16 Thread GitBox


dongjoon-hyun commented on code in PR #38357:
URL: https://github.com/apache/spark/pull/38357#discussion_r1051268075


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBackend.scala:
##
@@ -49,7 +49,7 @@ private[spark] object KubernetesExecutorBackend extends 
Logging {
   def main(args: Array[String]): Unit = {
 val createFn: (RpcEnv, Arguments, SparkEnv, ResourceProfile, String) =>
   CoarseGrainedExecutorBackend = { case (rpcEnv, arguments, env, 
resourceProfile, execId) =>
-new CoarseGrainedExecutorBackend(rpcEnv, arguments.driverUrl, execId,
+new KubernetesCoarseGrainedExecutorBackend(rpcEnv, 
arguments.driverUrl, execId,
 arguments.bindAddress, arguments.hostname, arguments.cores,
 env, arguments.resourcesFileOpt, resourceProfile)

Review Comment:
   Instead of creating a new class `KubernetesCoarseGrainedExecutorBackend`, 
can we do the following simply?
   ```scala
new CoarseGrainedExecutorBackend(rpcEnv, arguments.driverUrl, 
execId,
arguments.bindAddress, arguments.hostname, arguments.cores,
   -env, arguments.resourcesFileOpt, resourceProfile)
   +env, arguments.resourcesFileOpt, resourceProfile) {
   +  override def getDriverAttributes: Option[Map[String, String]] = 
Some(
   +super.getDriverAttributes.getOrElse(Map.empty) ++ Map(
   +  "APP_ID" -> System.getenv(ENV_APPLICATION_ID),
   +  "KUBERNETES_NAMESPACE" -> conf.get(KUBERNETES_NAMESPACE),
   +  "KUBERNETES_POD_NAME" -> System.getenv(ENV_DRIVER_POD_NAME)))
   +}
   ```






[GitHub] [spark] gengliangwang commented on a diff in pull request #39100: [SPARK-41422][UI] Protobuf serializer for ExecutorStageSummary

2022-12-16 Thread GitBox


gengliangwang commented on code in PR #39100:
URL: https://github.com/apache/spark/pull/39100#discussion_r1051265349


##
core/src/main/scala/org/apache/spark/status/protobuf/ExecutorMetricsSerializer.scala:
##
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf
+
+import org.apache.spark.executor.ExecutorMetrics
+import org.apache.spark.metrics.ExecutorMetricType
+
+object ExecutorMetricsSerializer {
+  def serialize(e: ExecutorMetrics): StoreTypes.ExecutorMetrics = {

Review Comment:
   This follows ExecutorMetricsJsonSerializer



##
core/src/main/scala/org/apache/spark/status/protobuf/ExecutorMetricsSerializer.scala:
##
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.status.protobuf
+
+import org.apache.spark.executor.ExecutorMetrics
+import org.apache.spark.metrics.ExecutorMetricType
+
+object ExecutorMetricsSerializer {
+  def serialize(e: ExecutorMetrics): StoreTypes.ExecutorMetrics = {
+val builder = StoreTypes.ExecutorMetrics.newBuilder()
+ExecutorMetricType.metricToOffset.foreach { case (metric, _) =>
+  builder.putMetrics(metric, e.getMetricValue(metric))
+  metric -> e.getMetricValue(metric)
+}
+builder.build()
+  }
+
+  def deserialize(binary: StoreTypes.ExecutorMetrics): ExecutorMetrics = {

Review Comment:
   This follows ExecutorMetricsJsonDeserializer



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38357: [SPARK-40887][K8S] Allow Spark on K8s to integrate w/ Log Service

2022-12-16 Thread GitBox


dongjoon-hyun commented on code in PR #38357:
URL: https://github.com/apache/spark/pull/38357#discussion_r1051265233


##
core/src/main/scala/org/apache/spark/scheduler/SchedulerBackend.scala:
##
@@ -74,14 +76,24 @@ private[spark] trait SchedulerBackend {
* Executors tab for the driver.
* @return Map containing the log names and their respective URLs
*/
-  def getDriverLogUrls: Option[Map[String, String]] = None
+  def getDriverLogUrls: Option[Map[String, String]] = {
+val prefix = "SPARK_DRIVER_LOG_URL_"
+val driverLogUrls = sys.env.filterKeys(_.startsWith(prefix))
+  .map(e => (e._1.substring(prefix.length).toLowerCase(Locale.ROOT), 
e._2)).toMap
+if (driverLogUrls.nonEmpty) Some(driverLogUrls) else None
+  }
 
   /**
* Get the attributes on driver. These attributes are used to replace log 
URLs when
* custom log url pattern is specified.
* @return Map containing attributes on driver.
*/
-  def getDriverAttributes: Option[Map[String, String]] = None
+  def getDriverAttributes: Option[Map[String, String]] = {
+val prefix = "SPARK_DRIVER_ATTRIBUTE_"
+val driverAttributes = sys.env.filterKeys(_.startsWith(prefix))
+  .map(e => (e._1.substring(prefix.length).toUpperCase(Locale.ROOT), 
e._2)).toMap
+if (driverAttributes.nonEmpty) Some(driverAttributes) else None

Review Comment:
   This logic looks duplicated between `getDriverLogUrls` and `getDriverAttributes`,
except for the variable names. Could you try to refactor it further?
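
   For illustration, a minimal sketch of one way the duplication could be factored out.
The helper name and the exact key normalization are assumptions, not the final patch,
and it assumes `java.util.Locale` is already imported in this file:

   ```scala
   private def envMapWithPrefix(
       prefix: String,
       upperCaseKeys: Boolean): Option[Map[String, String]] = {
     val entries = sys.env.filterKeys(_.startsWith(prefix)).map { case (k, v) =>
       val key = k.substring(prefix.length)
       val normalized =
         if (upperCaseKeys) key.toUpperCase(Locale.ROOT) else key.toLowerCase(Locale.ROOT)
       normalized -> v
     }.toMap
     if (entries.nonEmpty) Some(entries) else None
   }

   def getDriverLogUrls: Option[Map[String, String]] =
     envMapWithPrefix("SPARK_DRIVER_LOG_URL_", upperCaseKeys = false)

   def getDriverAttributes: Option[Map[String, String]] =
     envMapWithPrefix("SPARK_DRIVER_ATTRIBUTE_", upperCaseKeys = true)
   ```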



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang commented on pull request #39100: [SPARK-41422][UI] Protobuf serializer for ExecutorStageSummary

2022-12-16 Thread GitBox


gengliangwang commented on PR #39100:
URL: https://github.com/apache/spark/pull/39100#issuecomment-1355874010

   cc @techaddict


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang opened a new pull request, #39100: [SPARK-41422][UI] Protobuf serializer for ExecutorStageSummary

2022-12-16 Thread GitBox


gengliangwang opened a new pull request, #39100:
URL: https://github.com/apache/spark/pull/39100

   ### What changes were proposed in this pull request?
   Add Protobuf serializer for ExecutorStageSummary
   
   ### Why are the changes needed?
   Support fast and compact serialization/deserialization for 
ExecutorStageSummary over RocksDB.
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   New UT


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] github-actions[bot] closed pull request #37411: [SPARK-39984][CORE] Check workerLastHeartbeat with master before HeartbeatReceiver expires an executor

2022-12-16 Thread GitBox


github-actions[bot] closed pull request #37411: [SPARK-39984][CORE] Check 
workerLastHeartbeat with master before HeartbeatReceiver expires an executor
URL: https://github.com/apache/spark/pull/37411


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] github-actions[bot] closed pull request #37790: [SPARK-40288][SQL] After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use

2022-12-16 Thread GitBox


github-actions[bot] closed pull request #37790:  [SPARK-40288][SQL] After 
`RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to 
avoid attribute missing when use complex expression 
URL: https://github.com/apache/spark/pull/37790


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] github-actions[bot] commented on pull request #37532: [SPARK-39989][SQL][FollowUp] Improve foldable expression stats estimate for string and binary

2022-12-16 Thread GitBox


github-actions[bot] commented on PR #37532:
URL: https://github.com/apache/spark/pull/37532#issuecomment-1355860756

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] github-actions[bot] closed pull request #37793: add sparksql wirte mysql support update ,the design from replace into…

2022-12-16 Thread GitBox


github-actions[bot] closed pull request #37793: add sparksql wirte mysql 
support update ,the design from replace into…
URL: https://github.com/apache/spark/pull/37793


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39078: [SPARK-41534][CONNECT][SQL] Setup initial client module for Spark Connect

2022-12-16 Thread GitBox


dongjoon-hyun commented on code in PR #39078:
URL: https://github.com/apache/spark/pull/39078#discussion_r1051253194


##
connector/connect/client/src/main/scala/org/apache/spark/sql/connect/client/SparkSession.scala:
##
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connect.client
+
+import scala.language.existentials
+
+import io.grpc.{ManagedChannel, ManagedChannelBuilder}
+import org.apache.arrow.memory.RootAllocator
+
+import org.apache.spark.SPARK_VERSION
+import org.apache.spark.connect.proto
+
+
+class SparkSession(
+private val userContext: proto.UserContext,
+private val channel: ManagedChannel)
+  extends AutoCloseable {
+  private[this] val stub = 
proto.SparkConnectServiceGrpc.newBlockingStub(channel)
+
+  private[this] val allocator = new RootAllocator()
+
+  /**
+   * The version of Spark on which this application is running.
+   */
+  def version: String = SPARK_VERSION
+
+  /**
+   * Returns a `DataFrame` with no rows or columns.
+   *
+   * @since 3.4.0
+   */
+  @transient
+  lazy val emptyDataFrame: Dataset = newDataset { builder =>
+builder.getLocalRelationBuilder
+  }
+
+  /**
+   * Creates a [[Dataset]] with a single `LongType` column named `id`, 
containing elements
+   * in a range from `start` to `end` (exclusive) with a step value, with 
partition number
+   * specified.
+   *
+   * @since 2.0.0
+   */
+  def range(start: Long, end: Long, step: Long, numPartitions: Int): Dataset = 
{
+range(start, end, step, Option(numPartitions))
+  }
+
+  private def range(start: Long, end: Long, step: Long, numPartitions: 
Option[Int]): Dataset = {
+newDataset { builder =>
+  val rangeBuilder = builder.getRangeBuilder
+.setStart(start)
+.setEnd(end)
+.setStep(step)
+  numPartitions.foreach(rangeBuilder.setNumPartitions)
+}
+  }
+
+  /**
+   * Executes a SQL query using Spark, returning the result as a `DataFrame`.
+   * This API eagerly runs DDL/DML commands, but not for SELECT queries.
+   *
+   * @since 2.0.0
+   */
+  def sql(query: String): Dataset = newDataset { builder =>
+builder.setSql(proto.SQL.newBuilder().setQuery(query))
+  }
+
+  private[client] def newDataset(f: proto.Relation.Builder => Unit): Dataset = 
{
+val builder = proto.Relation.newBuilder()
+f(builder)
+val plan = proto.Plan.newBuilder().setRoot(builder).build()
+new Dataset(this, plan)
+  }
+
+  private[client] def analyze(plan: proto.Plan): proto.AnalyzePlanResponse = {
+val request = proto.AnalyzePlanRequest.newBuilder()
+  .setPlan(plan)
+  .setUserContext(userContext)
+  .build()
+stub.analyzePlan(request)
+  }
+
+  override def close(): Unit = {
+channel.shutdownNow()
+allocator.close()
+  }
+}
+
+object SparkSession {
+  def builder(): Builder = new Builder()
+
+  class Builder() {
+private val userContextBuilder = proto.UserContext.newBuilder()
+private var _host: String = "localhost"
+private var _port: Int = 15002

Review Comment:
   Please avoid a magic number. If we cannot use the existing 
`CONNECT_GRPC_BINDING_PORT`, we may need to move the config definition from 
`server` to `common` first.
   
https://github.com/apache/spark/blob/f3be74e3a421224bb65a01e658de96006de2a4ac/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala#L24-L28
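
   A sketch of one way to avoid the literal, assuming the default can be exposed from a
shared module; the object and field names below are hypothetical, not the actual config:

   ```scala
   // Hypothetical: a named default shared between server and client, replacing the bare
   // 15002. Ideally this would be the default of the existing CONNECT_GRPC_BINDING_PORT
   // config once its definition moves to the common module.
   object ConnectClientDefaults {
     val DEFAULT_GRPC_PORT: Int = 15002
   }

   // In Builder:
   // private var _port: Int = ConnectClientDefaults.DEFAULT_GRPC_PORT
   ```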



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39078: [SPARK-41534][CONNECT][SQL] Setup initial client module for Spark Connect

2022-12-16 Thread GitBox


dongjoon-hyun commented on code in PR #39078:
URL: https://github.com/apache/spark/pull/39078#discussion_r1051252600


##
project/SparkBuild.scala:
##
@@ -1433,7 +1492,10 @@ object CopyDependencies {
   if (jar.getName.contains("spark-connect") &&
 !SbtPomKeys.profiles.value.contains("noshade-connect")) {
 Files.copy(fid.toPath, destJar.toPath)
-  } else if (jar.getName.contains("spark-protobuf") &&
+  } else if (jar.getName.contains("connect-client") &&
+!SbtPomKeys.profiles.value.contains("noshade-protobuf")) {
+Files.copy(fidClient.toPath, destJar.toPath)
+  }  else if (jar.getName.contains("spark-protobuf") &&

Review Comment:
   Please remove an extra space, `}  else` -> `} else`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39078: [SPARK-41534][CONNECT][SQL] Setup initial client module for Spark Connect

2022-12-16 Thread GitBox


dongjoon-hyun commented on code in PR #39078:
URL: https://github.com/apache/spark/pull/39078#discussion_r1051251797


##
connector/connect/client/pom.xml:
##
@@ -0,0 +1,112 @@
+
+
+
+http://maven.apache.org/POM/4.0.0"; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd";>
+4.0.0
+
+org.apache.spark
+spark-parent_2.12
+3.4.0-SNAPSHOT
+../../../pom.xml
+
+
+spark-connect-client_2.12
+jar
+Spark Project Connect Client
+https://spark.apache.org/
+
+connect-client
+31.0.1-jre
+1.0.1
+1.47.0
+6.0.53
+

Review Comment:
   This seems to be copied from `common` module. Could you try to avoid this 
kind of duplication?
   
https://github.com/apache/spark/blob/f3be74e3a421224bb65a01e658de96006de2a4ac/connector/connect/common/pom.xml#L33-L39



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] anchovYu commented on a diff in pull request #39040: [SPARK-27561][SQL][FOLLOWUP] Support implicit lateral column alias resolution on Aggregate

2022-12-16 Thread GitBox


anchovYu commented on code in PR #39040:
URL: https://github.com/apache/spark/pull/39040#discussion_r1051251048


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAliasReference.scala:
##
@@ -129,6 +163,45 @@ object ResolveLateralColumnAliasReference extends 
Rule[LogicalPlan] {
   child = Project(innerProjectList.toSeq, child)
 )
   }
+
+case agg @ Aggregate(groupingExpressions, aggregateExpressions, _) if 
agg.resolved
+&& 
aggregateExpressions.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) 
=>
+
+  val newAggExprs = collection.mutable.Set.empty[NamedExpression]
+  val expressionMap = 
collection.mutable.LinkedHashMap.empty[Expression, NamedExpression]
+  val projectExprs = aggregateExpressions.map { exp =>
+exp.transformDown {
+  case aggExpr: AggregateExpression =>
+// Doesn't support referencing a lateral alias in aggregate 
function
+if (aggExpr.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) {
+  aggExpr.collectFirst {
+case lcaRef: LateralColumnAliasReference =>
+  throw 
QueryCompilationErrors.lateralColumnAliasInAggFuncUnsupportedError(
+lcaRef.nameParts, aggExpr)
+  }
+}
+val ne = expressionMap.getOrElseUpdate(aggExpr.canonicalized, 
assignAlias(aggExpr))
+newAggExprs += ne
+ne.toAttribute
+  case e if groupingExpressions.exists(_.semanticEquals(e)) =>
+// TODO one concern here, is condition here be able to match 
all grouping

Review Comment:
   Surprisingly, I found that this existing query fails to analyze:
   ```
   select 1 + dept + 10 from $testTable group by dept + 10
   -- error: [MISSING_AGGREGATION] The non-aggregating expression "dept" is 
based on columns which are not participating in the GROUP BY clause
   ```
   It seems that in our checkAnalysis we don't canonicalize the expressions before
comparing them. The expression is structured as (1 + dept) + 10, so it can't match the
grouping expression (dept + 10).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39078: [SPARK-41534][CONNECT][SQL] Setup initial client module for Spark Connect

2022-12-16 Thread GitBox


dongjoon-hyun commented on code in PR #39078:
URL: https://github.com/apache/spark/pull/39078#discussion_r1051250909


##
connector/connect/client/pom.xml:
##
@@ -0,0 +1,112 @@
+
+
+
+http://maven.apache.org/POM/4.0.0"; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd";>
+4.0.0

Review Comment:
   If you don't mind, please use 2-space indentation in `pom.xml` like the 
parent pom file.
   
https://github.com/apache/spark/blob/f3be74e3a421224bb65a01e658de96006de2a4ac/pom.xml#L19-L27



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] techaddict commented on a diff in pull request #39093: [SPARK-41420][UI] Protobuf serializer for ApplicationInfoWrapper

2022-12-16 Thread GitBox


techaddict commented on code in PR #39093:
URL: https://github.com/apache/spark/pull/39093#discussion_r1051244979


##
core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto:
##
@@ -106,3 +106,28 @@ message TaskDataWrapper {
   int64 stage_id = 40;
   int32 stage_attempt_id = 41;
 }
+
+message ApplicationAttemptInfo {
+  optional string attempt_id = 1;
+  int64 start_time = 2;
+  int64 end_time = 3;
+  int64 last_updated = 4;
+  int64 duration = 5;
+  string spark_user = 6;
+  bool completed = 7;
+  string app_spark_version = 8;
+}
+
+message ApplicationInfo {
+  string id = 1;
+  string name = 2;
+  optional int32 cores_granted = 3;
+  optional int32 max_cores = 4;
+  optional int32 cores_per_executor = 5;
+  optional int32 memory_per_executor_mb = 6;
+  repeated ApplicationAttemptInfo attempts = 7;
+}
+
+message ApplicationInfoWrapper {
+  ApplicationInfo info = 1;

Review Comment:
   @mridulm there are two `ApplicationInfoWrapper` classes: one in
`org.apache.spark.status` (this serializer is for that one) and another in
`org.apache.spark.deploy.history`; they have different signatures.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39090: [SPARK-41334][CONNECT][PYTHON] Move `SortOrder` proto from relations to expressions

2022-12-16 Thread GitBox


dongjoon-hyun commented on code in PR #39090:
URL: https://github.com/apache/spark/pull/39090#discussion_r1051243980


##
python/pyspark/sql/connect/column.py:
##
@@ -437,35 +437,38 @@ def to_plan(self, session: "SparkConnectClient") -> 
proto.Expression:
 
 
 class SortOrder(Expression):
-def __init__(self, child: Expression, ascending: bool = True, nullsLast: 
bool = False) -> None:
+def __init__(self, child: Expression, ascending: bool = True, nullsFirst: 
bool = True) -> None:
 super().__init__()
 self._child = child
 self._ascending = ascending
-self._nullsLast = nullsLast
+self._nullsFirst = nullsFirst
 
 def __repr__(self) -> str:
 return (
 str(self._child)
 + (" ASC" if self._ascending else " DESC")
-+ (" NULLS LAST" if self._nullsLast else " NULLS FIRST")
++ (" NULLS FIRST" if self._nullsFirst else " NULLS LAST")
 )
 
 def to_plan(self, session: "SparkConnectClient") -> proto.Expression:
-# TODO(SPARK-41334): move SortField from relations.proto to 
expressions.proto
-sort = proto.Sort.SortField()
-sort.expression.CopyFrom(self._child.to_plan(session))
+sort = proto.Expression()
+sort.sort_order.child.CopyFrom(self._child.to_plan(session))
 
 if self._ascending:
-sort.direction = proto.Sort.SortDirection.SORT_DIRECTION_ASCENDING
+sort.sort_order.direction = (
+
proto.Expression.SortOrder.SortDirection.SORT_DIRECTION_ASCENDING
+)
 else:
-sort.direction = proto.Sort.SortDirection.SORT_DIRECTION_DESCENDING
+sort.sort_order.direction = (
+
proto.Expression.SortOrder.SortDirection.SORT_DIRECTION_DESCENDING
+)
 
-if self._nullsLast:
-sort.nulls = proto.Sort.SortNulls.SORT_NULLS_LAST
+if self._nullsFirst:
+sort.sort_order.null_ordering = 
proto.Expression.SortOrder.NullOrdering.SORT_NULLS_FIRST
 else:
-sort.nulls = proto.Sort.SortNulls.SORT_NULLS_FIRST
+sort.sort_order.null_ordering = 
proto.Expression.SortOrder.NullOrdering.SORT_NULLS_LAST
 
-return cast(proto.Expression, sort)

Review Comment:
   This seems to cause `unused import` linter failure. Could you fix it?
   ```
   ./python/pyspark/sql/connect/column.py:18:1: F401 'typing.cast' imported but 
unused
   from typing import (
F401 'typing.cast' imported but unused
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #39076: [SPARK-41530][CORE] Rename MedianHeap to PercentileMap and support percentile

2022-12-16 Thread GitBox


dongjoon-hyun commented on PR #39076:
URL: https://github.com/apache/spark/pull/39076#issuecomment-1355840094

   Merged to master for Apache Spark 3.4. Thank you, @cloud-fan and @mridulm .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #39076: [SPARK-41530][CORE] Rename MedianHeap to PercentileMap and support percentile

2022-12-16 Thread GitBox


dongjoon-hyun closed pull request #39076: [SPARK-41530][CORE] Rename MedianHeap 
to PercentileMap and support percentile
URL: https://github.com/apache/spark/pull/39076


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #39054: [SPARK-27561][SQL][FOLLOWUP] Move the two rules for Later column alias into one file

2022-12-16 Thread GitBox


dongjoon-hyun commented on PR #39054:
URL: https://github.com/apache/spark/pull/39054#issuecomment-1355838611

   Could you make a PR, @anchovYu ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] anchovYu commented on pull request #39054: [SPARK-27561][SQL][FOLLOWUP] Move the two rules for Later column alias into one file

2022-12-16 Thread GitBox


anchovYu commented on PR #39054:
URL: https://github.com/apache/spark/pull/39054#issuecomment-1355828789

   This PR changed the code to use the resolver from SimpleAnalyzer, which is always
the case-sensitive one, instead of the one determined by the conf. This makes lateral
column alias (LCA) resolution unable to resolve aliases case-insensitively. I fix this
issue in the new PR
https://github.com/apache/spark/pull/39040/commits/136a9308e623fee5b1303103e6397f96d8bb6788
by refactoring the code.
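
   A minimal sketch of the fix direction (assuming the rule can reach the active
`SQLConf`); this is illustrative only, not the actual commit:

   ```scala
   import org.apache.spark.sql.catalyst.analysis.Resolver
   import org.apache.spark.sql.internal.SQLConf

   // Honors spark.sql.caseSensitive, unlike SimpleAnalyzer's always-case-sensitive resolver.
   val resolver: Resolver = SQLConf.get.resolver
   ```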


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] grundprinzip commented on a diff in pull request #39068: [SPARK-41434][CONNECT][PYTHON] Initial `LambdaFunction` implementation

2022-12-16 Thread GitBox


grundprinzip commented on code in PR #39068:
URL: https://github.com/apache/spark/pull/39068#discussion_r1051232035


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -565,6 +565,47 @@ class SparkConnectPlanner(session: SparkSession) {
 val children = 
fun.getArgumentsList.asScala.toSeq.map(transformExpression)
 Some(In(children.head, children.tail))
 
+  case "___lambda_function___" =>
+// UnresolvedFunction[___lambda_function___, ["x, y -> x < y", "x", 
"y"]]

Review Comment:
   This branch deserves its own function, please.
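
   For illustration, a sketch of how the branch could be pulled into its own helper.
The method name, the final `LambdaFunction` construction, and the error messages are
assumptions based only on the diff above, not the actual change:

   ```scala
   private def transformLambdaFunction(fun: proto.Expression.UnresolvedFunction): Expression = {
     if (fun.getArgumentsCount < 2) {
       throw InvalidPlanInput(
         "LambdaFunction requires at least 2 child expressions: LambdaFunction, Arguments")
     }
     val children = fun.getArgumentsList.asScala.toSeq.map(transformExpression)
     val variableNames = children.tail.map {
       case variable: UnresolvedAttribute if variable.nameParts.length == 1 =>
         variable.nameParts.head
       case other =>
         throw InvalidPlanInput(
           s"LambdaFunction requires all arguments to be single-part UnresolvedAttributes, but got $other")
     }
     // Assumed construction: wrap each name as an unresolved lambda variable.
     LambdaFunction(children.head, variableNames.map(n => UnresolvedNamedLambdaVariable(Seq(n))))
   }
   ```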



##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -565,6 +565,47 @@ class SparkConnectPlanner(session: SparkSession) {
 val children = 
fun.getArgumentsList.asScala.toSeq.map(transformExpression)
 Some(In(children.head, children.tail))
 
+  case "___lambda_function___" =>
+// UnresolvedFunction[___lambda_function___, ["x, y -> x < y", "x", 
"y"]]
+
+if (fun.getArgumentsCount < 2) {
+  throw InvalidPlanInput(
+"LambdaFunction requires at least 2 child expressions: 
LamdaFunction, Arguments")
+}
+
+val children = 
fun.getArgumentsList.asScala.toSeq.map(transformExpression)
+
+val function = children.head
+
+val variableNames = children.tail.map {
+  case variable: UnresolvedAttribute if variable.nameParts.length == 1 
=>
+variable.nameParts.head
+  case other =>
+throw InvalidPlanInput(
+  "LambdaFunction requires all arguments to be UnresolvedAttribute 
with " +
+s"single name part, but got $other")

Review Comment:
   There are two interesting issues here:
   
1. When someone submits an expression to the API that does not transform into
`UnresolvedExpression`, this would throw a weird error message about the name parts,
when the real problem is that the type does not match.
2. Why the restriction to single-part names? Is this a Spark limitation?



##
python/pyspark/sql/connect/column.py:
##
@@ -543,6 +543,39 @@ def __repr__(self) -> str:
 return f"({self._col} ({self._data_type}))"
 
 
+class LambdaFunction(Expression):
+def __init__(
+self,
+function: Expression,
+arguments: Sequence[Expression],
+) -> None:
+super().__init__()
+
+assert isinstance(function, Expression)
+
+assert (
+isinstance(arguments, list)
+and len(arguments) > 0
+and all(isinstance(arg, ColumnReference) for arg in arguments)
+)

Review Comment:
   Adding these assertions here is helpful in the Python client, but the server side
does not perform the same assertion. If we drop the assertion on `ColumnReference`,
what would happen on the server?
   
   Is the analysis exception not better than the Python assertion?



##
python/pyspark/sql/connect/functions.py:
##
@@ -80,6 +84,80 @@ def _invoke_binary_math_function(name: str, col1: Any, col2: 
Any) -> Column:
 return _invoke_function(name, *_cols)
 
 
+def _get_lambda_parameters(f: Callable) -> ValuesView[inspect.Parameter]:
+signature = inspect.signature(f)
+parameters = signature.parameters.values()
+
+# We should exclude functions that use
+# variable args and keyword argnames
+# as well as keyword only args
+supported_parameter_types = {
+inspect.Parameter.POSITIONAL_OR_KEYWORD,
+inspect.Parameter.POSITIONAL_ONLY,
+}
+
+# Validate that
+# function arity is between 1 and 3

Review Comment:
   ```suggestion
   # Validate that the function arity is between 1 and 3.
   ```



##
python/pyspark/sql/connect/functions.py:
##
@@ -80,6 +84,80 @@ def _invoke_binary_math_function(name: str, col1: Any, col2: 
Any) -> Column:
 return _invoke_function(name, *_cols)
 
 
+def _get_lambda_parameters(f: Callable) -> ValuesView[inspect.Parameter]:
+signature = inspect.signature(f)
+parameters = signature.parameters.values()
+
+# We should exclude functions that use
+# variable args and keyword argnames
+# as well as keyword only args
+supported_parameter_types = {
+inspect.Parameter.POSITIONAL_OR_KEYWORD,
+inspect.Parameter.POSITIONAL_ONLY,
+}
+
+# Validate that
+# function arity is between 1 and 3

Review Comment:
   one line?



##
python/pyspark/sql/connect/functions.py:
##
@@ -80,6 +84,80 @@ def _invoke_binary_math_function(name: str, col1: Any, col2: 
Any) -> Column:
 return _invoke_function(name, *_cols)
 
 
+def _get_lambda_parameters(f: Callable) -> ValuesView[inspect.Parameter]:
+signature = inspect.signature(f)
+parameters = signature.parameters.values()
+
+# We should exclude functions that use
+# variable args and keyword 

[GitHub] [spark] fe2s opened a new pull request, #39099: [SPARK-41554] fix changing of Decimal scale when scale decreased by m…

2022-12-16 Thread GitBox


fe2s opened a new pull request, #39099:
URL: https://github.com/apache/spark/pull/39099

   …ore than 18
   
   
   
   ### What changes were proposed in this pull request?
   Fix `Decimal` scaling for values stored internally as a compact long when the scale
is decreased by more than 18. For example,
   ```
   Decimal(1, 38, 19).changePrecision(38, 0)
   ```
   produces an exception
   ```
   java.lang.ArrayIndexOutOfBoundsException: 19
   at org.apache.spark.sql.types.Decimal.changePrecision(Decimal.scala:377)
   at org.apache.spark.sql.types.Decimal.changePrecision(Decimal.scala:328)
   ```
   Another way to reproduce it with SQL query
   ```
   sql("select cast(cast(cast(cast(id as decimal(38,15)) as decimal(38,30)) as 
decimal(38,37)) as decimal(38,17)) from range(3)").show
   ```
   
   
   The bug exists only for Decimal values stored as a compact long; it works fine with
Decimal values that use `scala.math.BigDecimal` internally.
   
   
   
   
   ### Why are the changes needed?
   Without the fix, the SQL query mentioned above cannot be executed. Please note that
in my use case the SQL query is generated programmatically, so I cannot optimize it
manually.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it allows scaling Decimal values properly, which is not currently possible due
to the exception.
   
   
   ### How was this patch tested?
   Tests were added. The fix affects scale decreases only, but I decided to also
include tests for scale increases since I didn't find any existing ones.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] anchovYu commented on pull request #39040: [SPARK-27561][SQL][FOLLOWUP] Support implicit lateral column alias resolution on Aggregate

2022-12-16 Thread GitBox


anchovYu commented on PR #39040:
URL: https://github.com/apache/spark/pull/39040#issuecomment-1355792680

   I will shortly address the remaining TODOs in this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] grundprinzip commented on a diff in pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-16 Thread GitBox


grundprinzip commented on code in PR #38984:
URL: https://github.com/apache/spark/pull/38984#discussion_r1051222085


##
python/pyspark/sql/tests/connect/test_connect_basic.py:
##
@@ -829,6 +829,13 @@ def test_with_columns(self):
 .toPandas(),
 )
 
+def test_hint(self):
+# SPARK-41349: Test hint

Review Comment:
   Please add additional tests for:
   
 * unsupported param types
 * unsupported hint name
 * invalid combination of hint and param
   
   Please check if there is additional coverage needed.



##
python/pyspark/sql/connect/plan.py:
##
@@ -343,6 +343,51 @@ def _repr_html_(self) -> str:
 """
 
 
+class Hint(LogicalPlan):
+"""Logical plan object for a Hint operation."""
+
+def __init__(self, child: Optional["LogicalPlan"], name: str, params: 
List[Any]) -> None:
+super().__init__(child)
+self.name = name
+self.params = params
+
+def _convert_value(self, v: Any) -> proto.Expression.Literal:
+value = proto.Expression.Literal()
+if v is None:
+value.null = True
+elif isinstance(v, int):
+value.integer = v
+else:
+value.string = v
+return value

Review Comment:
   This code has weird error behavior if `v` is not `None` or `int`. If, for example,
I were to assign a float, I would receive an error message from protobuf that is not
actionable for the user. I think it would be good to either use the existing
Python-to-Literal conversion code that we have or throw an exception.



##
python/pyspark/sql/connect/dataframe.py:
##
@@ -875,6 +875,30 @@ def to_jcols(
 
 melt = unpivot
 
+def hint(self, name: str, *params: Any) -> "DataFrame":
+"""
+Specifies some hint on the current DataFrame. As an example, the 
following code specifies
+that one of the plan can be broadcasted: 
`df1.join(df2.hint("broadcast"))`
+
+.. versionadded:: 3.4.0
+
+Parameters
+--
+name: str
+the name of the hint, for example, "broadcast", "SHUFFLE_MERGE" 
and "shuffle_hash".
+params: tuple
+the parameters of the hint

Review Comment:
   I know that the documentation is most likely taken directly from PySpark, but I'm
wondering if we can add more context around what types the params can have. Reading
through the code, it can be `Any` here but later only
`Optional[Union[str, int]]`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mridulm commented on a diff in pull request #39093: [SPARK-41420][UI] Protobuf serializer for ApplicationInfoWrapper

2022-12-16 Thread GitBox


mridulm commented on code in PR #39093:
URL: https://github.com/apache/spark/pull/39093#discussion_r1051020503


##
core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto:
##
@@ -106,3 +106,28 @@ message TaskDataWrapper {
   int64 stage_id = 40;
   int32 stage_attempt_id = 41;
 }
+
+message ApplicationAttemptInfo {
+  optional string attempt_id = 1;
+  int64 start_time = 2;
+  int64 end_time = 3;
+  int64 last_updated = 4;
+  int64 duration = 5;
+  string spark_user = 6;
+  bool completed = 7;
+  string app_spark_version = 8;
+}
+
+message ApplicationInfo {
+  string id = 1;
+  string name = 2;
+  optional int32 cores_granted = 3;
+  optional int32 max_cores = 4;
+  optional int32 cores_per_executor = 5;
+  optional int32 memory_per_executor_mb = 6;
+  repeated ApplicationAttemptInfo attempts = 7;
+}
+
+message ApplicationInfoWrapper {
+  ApplicationInfo info = 1;

Review Comment:
   Missing `AttemptInfoWrapper` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] anchovYu commented on a diff in pull request #39040: [SPARK-27561][SQL][FOLLOWUP] Support implicit lateral column alias resolution on Aggregate

2022-12-16 Thread GitBox


anchovYu commented on code in PR #39040:
URL: https://github.com/apache/spark/pull/39040#discussion_r1051207085


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAliasReference.scala:
##
@@ -129,6 +163,45 @@ object ResolveLateralColumnAliasReference extends 
Rule[LogicalPlan] {
   child = Project(innerProjectList.toSeq, child)
 )
   }
+
+case agg @ Aggregate(groupingExpressions, aggregateExpressions, _) if 
agg.resolved
+&& 
aggregateExpressions.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) 
=>
+
+  val newAggExprs = collection.mutable.Set.empty[NamedExpression]
+  val expressionMap = 
collection.mutable.LinkedHashMap.empty[Expression, NamedExpression]
+  val projectExprs = aggregateExpressions.map { exp =>
+exp.transformDown {
+  case aggExpr: AggregateExpression =>
+// Doesn't support referencing a lateral alias in aggregate 
function
+if (aggExpr.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) {
+  aggExpr.collectFirst {
+case lcaRef: LateralColumnAliasReference =>
+  throw 
QueryCompilationErrors.lateralColumnAliasInAggFuncUnsupportedError(
+lcaRef.nameParts, aggExpr)
+  }
+}
+val ne = expressionMap.getOrElseUpdate(aggExpr.canonicalized, 
assignAlias(aggExpr))
+newAggExprs += ne

Review Comment:
   Yes, we do. This is the general strategy of this design: extract **all** aggregate
functions and grouping expressions. It keeps the rule simple and avoids many extra
code-path considerations (e.g., marking whether an alias is used as a lateral alias by
later expressions).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] techaddict commented on pull request #39096: [SPARK-41421][UI] Protobuf serializer for ApplicationEnvironmentInfoWrapper

2022-12-16 Thread GitBox


techaddict commented on PR #39096:
URL: https://github.com/apache/spark/pull/39096#issuecomment-1355722437

   @gengliangwang Thanks for the review, I addressed the comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #38983: [SPARK-41447][CORE] Reduce the number of `doMergeApplicationListing` invocations

2022-12-16 Thread GitBox


dongjoon-hyun commented on PR #38983:
URL: https://github.com/apache/spark/pull/38983#issuecomment-1355707241

   @shuyouZZ . I added you to the Apache Spark contributor group and assigned 
SPARK-41447 to you.
   Welcome to the Apache Spark community!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #38983: [SPARK-41447][CORE] Reduce the number of `doMergeApplicationListing` invocations

2022-12-16 Thread GitBox


dongjoon-hyun closed pull request #38983: [SPARK-41447][CORE] Reduce the number 
of `doMergeApplicationListing` invocations
URL: https://github.com/apache/spark/pull/38983


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang commented on pull request #39096: [SPARK-41421][UI] Protobuf serializer for ApplicationEnvironmentInfoWrapper

2022-12-16 Thread GitBox


gengliangwang commented on PR #39096:
URL: https://github.com/apache/spark/pull/39096#issuecomment-1355701397

   @techaddict LGTM except for two minor comments. Thanks for working on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang commented on a diff in pull request #39096: [SPARK-41421][UI] Protobuf serializer for ApplicationEnvironmentInfoWrapper

2022-12-16 Thread GitBox


gengliangwang commented on code in PR #39096:
URL: https://github.com/apache/spark/pull/39096#discussion_r1051192373


##
core/src/test/scala/org/apache/spark/status/protobuf/KVStoreProtobufSerializerSuite.scala:
##
@@ -176,4 +177,83 @@ class KVStoreProtobufSerializerSuite extends SparkFunSuite 
{
 assert(result.stageId == input.stageId)
 assert(result.stageAttemptId == input.stageAttemptId)
   }
+
+  test("Application Environment Info") {
+val input = new ApplicationEnvironmentInfoWrapper(
+  new ApplicationEnvironmentInfo(
+runtime = new RuntimeInfo(
+  javaVersion = "1.8",
+  javaHome = "/tmp/java",
+  scalaVersion = "2.13"),
+sparkProperties = Seq(("1", "2")),
+hadoopProperties = Seq(("1", "2")),

Review Comment:
   I would suggest setting different values for these fields, so that mistakes in the
field mapping between the serializer and deserializer are caught.
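
   For example, a sketch of what distinct per-field test values could look like (the
property names are illustrative), so that swapping two fields in either the serializer
or the deserializer would fail the assertions:

   ```scala
   val sparkProperties  = Seq("spark.conf.1" -> "spark-value-1")
   val hadoopProperties = Seq("hadoop.conf.2" -> "hadoop-value-2")
   val systemProperties = Seq("sys.prop.3" -> "sys-value-3")
   // A mixed-up mapping can no longer pass an equality check unnoticed.
   assert(sparkProperties != hadoopProperties && hadoopProperties != systemProperties)
   ```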



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang commented on a diff in pull request #39096: [SPARK-41421][UI] Protobuf serializer for ApplicationEnvironmentInfoWrapper

2022-12-16 Thread GitBox


gengliangwang commented on code in PR #39096:
URL: https://github.com/apache/spark/pull/39096#discussion_r1051188448


##
core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto:
##
@@ -106,3 +106,45 @@ message TaskDataWrapper {
   int64 stage_id = 40;
   int32 stage_attempt_id = 41;
 }
+
+message ExecutorResourceRequest {
+  string resource_name = 1;
+  int64 amount = 2;
+  string discoveryScript = 3;
+  string vendor = 4;
+}
+
+message TaskResourceRequest {
+  string resource_name = 1;
+  double amount = 2;
+}
+
+message ResourceProfileInfo {
+  int32 id = 1;
+  map<string, ExecutorResourceRequest> executor_resources = 2;
+  map<string, TaskResourceRequest> task_resources = 3;
+}
+
+message RuntimeInfo {
+  string java_version = 1;
+  string java_home = 2;
+  string scala_version = 3;
+}
+
+message ApplicationEnvironmentInfo {
+  message PairSS {

Review Comment:
   Shall we name it `PairStrings` and move it out of 
`ApplicationEnvironmentInfo`? In the future this might be reused.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tomvanbussel commented on pull request #38941: [SPARK-41498] Propagate metadata through Union

2022-12-16 Thread GitBox


tomvanbussel commented on PR #38941:
URL: https://github.com/apache/spark/pull/38941#issuecomment-1355603538

   @cloud-fan Could you help us find someone to take a look at this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #39094: [SPARK-41552][BUILD][K8S] Upgrade `kubernetes-client` to 6.3.1

2022-12-16 Thread GitBox


dongjoon-hyun closed pull request #39094: [SPARK-41552][BUILD][K8S] Upgrade 
`kubernetes-client` to 6.3.1
URL: https://github.com/apache/spark/pull/39094


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #39094: [SPARK-41552][BUILD][K8S] Upgrade `kubernetes-client` to 6.3.1

2022-12-16 Thread GitBox


dongjoon-hyun commented on PR #39094:
URL: https://github.com/apache/spark/pull/39094#issuecomment-1355597529

   Merged to master for Apache Spark 3.4.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] bjornjorgensen opened a new pull request, #39098: [SPARK-41553][PS] Change `num_files` to `repartition`

2022-12-16 Thread GitBox


bjornjorgensen opened a new pull request, #39098:
URL: https://github.com/apache/spark/pull/39098

   ### What changes were proposed in this pull request?
   Change num_files to repartition
   
   ### Why are the changes needed?
   
   `num_files` has been deprecated and might be removed in a future version.
   Use `DataFrame.spark.repartition` instead.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Pass GA


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] NarekDW opened a new pull request, #39097: [SPARK-41049][SQL] make to_csv function deterministic

2022-12-16 Thread GitBox


NarekDW opened a new pull request, #39097:
URL: https://github.com/apache/spark/pull/39097

   ### What changes were proposed in this pull request?
   This PR adds a `doGenCode` implementation to the `StructsToCsv` class instead of
extending the `CodegenFallback` trait, in order to make it deterministic.
   
   Example:
   ```scala
   import org.apache.spark.sql._
   import org.apache.spark.sql.types._
   import org.apache.spark.sql.functions._
   
   val sparkSession = SparkSession.builder().getOrCreate()
   val df = sparkSession.sparkContext.parallelize(1 to 5).toDF("x")
   val v1 = rand().*(lit(1)).cast(IntegerType)
   val v2 = to_csv(struct(v1.as("a")))
   df.select(v1, v1, v2, v2).show() 
   ```
   
   Before these changes, the result was something like:
   ```scala
   +----+----+----+----+
   |   a|   b|   c|   d|
   +----+----+----+----+
   | 922| 922| 922|2028|
   |5571|5571|5571|1640|
   |5612|5612|5612| 769|
   |2068|2068|2068|8924|
   |5755|5755|5755|2731|
   +----+----+----+----+
   ```
   
   With current changes, the result looks like:
   ```scala
   +----+----+----+----+
   |   a|   b|   c|   d|
   +----+----+----+----+
   | 922| 922| 922| 922|
   |5571|5571|5571|5571|
   |5612|5612|5612|5612|
   |2068|2068|2068|2068|
   |5755|5755|5755|5755|
   +----+----+----+----+
   ```
   
   
   ### Why are the changes needed?
   To make the `to_csv` function deterministic.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   An additional test case was added to 
[CsvFunctionsSuite](https://github.com/NarekDW/spark/blob/13ab259815ad2b9351010df15ab23aacfa7ee14e/sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala)
   ```scala 
   test("SPARK-41049: make to_csv function deterministic") {
   ...
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] techaddict opened a new pull request, #39096: [SPARK-41421][UI] Protobuf serializer for ApplicationEnvironmentInfoWrapper

2022-12-16 Thread GitBox


techaddict opened a new pull request, #39096:
URL: https://github.com/apache/spark/pull/39096

   ### What changes were proposed in this pull request?
   Add Protobuf serializer for ApplicationEnvironmentInfoWrapper
   
   ### Why are the changes needed?
   Support fast and compact serialization/deserialization for 
ApplicationEnvironmentInfoWrapper over RocksDB.
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   New UT


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #39094: [SPARK-41552][BUILD][K8S] Upgrade `kubernetes-client` to 6.3.1

2022-12-16 Thread GitBox


dongjoon-hyun commented on PR #39094:
URL: https://github.com/apache/spark/pull/39094#issuecomment-1355493567

   Thank you, @viirya and @bjornjorgensen .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #39087: [SPARK-41365][UI][3.3] Stages UI page fails to load for proxy in specific yarn environment

2022-12-16 Thread GitBox


dongjoon-hyun commented on PR #39087:
URL: https://github.com/apache/spark/pull/39087#issuecomment-1355492859

   I added you to the Apache Spark contributor group and assigned SPARK-41365 to you.
   Welcome to the Apache Spark community, @yabola .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #39087: [SPARK-41365][UI][3.3] Stages UI page fails to load for proxy in specific yarn environment

2022-12-16 Thread GitBox


dongjoon-hyun closed pull request #39087: [SPARK-41365][UI][3.3] Stages UI page 
fails to load for proxy in specific yarn environment
URL: https://github.com/apache/spark/pull/39087


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #39078: [SPARK-41534][CONNECT][SQL] Setup initial client module for Spark Connect

2022-12-16 Thread GitBox


amaliujia commented on code in PR #39078:
URL: https://github.com/apache/spark/pull/39078#discussion_r1051078343


##
connector/connect/client/src/main/scala/org/apache/spark/sql/connect/client/SparkSession.scala:
##
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connect.client
+
+import scala.language.existentials
+
+import io.grpc.{ManagedChannel, ManagedChannelBuilder}
+import org.apache.arrow.memory.RootAllocator
+
+import org.apache.spark.SPARK_VERSION
+import org.apache.spark.connect.proto
+
+
+class SparkSession(
+private val userContext: proto.UserContext,
+private val channel: ManagedChannel)
+  extends AutoCloseable {
+  private[this] val stub = 
proto.SparkConnectServiceGrpc.newBlockingStub(channel)
+
+  private[this] val allocator = new RootAllocator()
+
+  /**
+   * The version of Spark on which this application is running.
+   */
+  def version: String = SPARK_VERSION
+
+  /**
+   * Returns a `DataFrame` with no rows or columns.
+   *
+   * @since 3.4.0
+   */
+  @transient
+  lazy val emptyDataFrame: Dataset = newDataset { builder =>
+builder.getLocalRelationBuilder
+  }
+
+  /**
+   * Creates a [[Dataset]] with a single `LongType` column named `id`, 
containing elements
+   * in a range from `start` to `end` (exclusive) with a step value, with 
partition number
+   * specified.
+   *
+   * @since 2.0.0

Review Comment:
   The client starts from 3.4.0?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #39084: [SPARK-41464][CONNECT][PYTHON] Implement `DataFrame.to`

2022-12-16 Thread GitBox


amaliujia commented on code in PR #39084:
URL: https://github.com/apache/spark/pull/39084#discussion_r1051076834


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -270,6 +271,24 @@ class SparkConnectPlanner(session: SparkSession) {
   .logicalPlan
   }
 
+  private def transformToSchema(rel: proto.ToSchema): LogicalPlan = {
+val schemaType = if (rel.hasDatatype) {
+  DataTypeProtoConverter.toCatalystType(rel.getDatatype)
+} else {
+  parseDatatypeString(rel.getDatatypeStr)
+}
+
+val schemaStruct = schemaType match {
+  case s: StructType => s
+  case d => StructType(Seq(StructField("value", d)))

Review Comment:
   so it is not guaranteed that the client side always sends a StructType as 
schema when it is not a string?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #39084: [SPARK-41464][CONNECT][PYTHON] Implement `DataFrame.to`

2022-12-16 Thread GitBox


amaliujia commented on code in PR #39084:
URL: https://github.com/apache/spark/pull/39084#discussion_r1051075228


##
connector/connect/common/src/main/protobuf/spark/connect/relations.proto:
##
@@ -601,3 +602,19 @@ message Unpivot {
   // (Required) Name of the value column.
   string value_column_name = 5;
 }
+
+message ToSchema {
+  // (Required) The input relation.
+  Relation input = 1;
+
+  // (Required) The user provided schema.
+  //
+  // The Server side will update the dataframe with this schema.
+  oneof schema {

Review Comment:
   This requires the client side to implement more. For example, the client 
side would always convert a StructType to its string representation, and then 
Spark would convert it back.
   
   This basically asks clients to understand Spark's protocol for the 
string-based schema and then implement it, right?
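   For illustration, a minimal Scala sketch of the two schema representations 
being contrasted here, using only the public org.apache.spark.sql.types API 
(this is not code from the PR):

import org.apache.spark.sql.types._

// A DDL string and the equivalent structured schema. If the protocol only
// carried the string form, a client holding a StructType would first have to
// render it as DDL and rely on the server parsing it back.
val fromDdl: StructType = StructType.fromDDL("id BIGINT, name STRING")
val structured: StructType = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))
assert(fromDdl == structured)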



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #39084: [SPARK-41464][CONNECT][PYTHON] Implement `DataFrame.to`

2022-12-16 Thread GitBox


amaliujia commented on code in PR #39084:
URL: https://github.com/apache/spark/pull/39084#discussion_r1051074565


##
connector/connect/common/src/main/protobuf/spark/connect/relations.proto:
##
@@ -601,3 +602,19 @@ message Unpivot {
   // (Required) Name of the value column.
   string value_column_name = 5;
 }
+
+message ToSchema {
+  // (Required) The input relation.
+  Relation input = 1;
+
+  // (Required) The user provided schema.
+  //
+  // The Server side will update the dataframe with this schema.
+  oneof schema {
+
+DataType datatype = 2;

Review Comment:
   Maybe comment here to say this must be a StructType?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #39084: [SPARK-41464][CONNECT][PYTHON] Implement `DataFrame.to`

2022-12-16 Thread GitBox


amaliujia commented on code in PR #39084:
URL: https://github.com/apache/spark/pull/39084#discussion_r1051074168


##
python/pyspark/sql/connect/dataframe.py:
##
@@ -1631,6 +1632,47 @@ def inputFiles(self) -> List[str]:
 query = self._plan.to_proto(self._session.client)
 return self._session.client._analyze(query).input_files
 
+def to(self, schema: Union[DataType, str]) -> "DataFrame":

Review Comment:
   yes we should have StructType in `pyspark_types_to_proto_types`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] bjornjorgensen commented on pull request #39094: [SPARK-41552][BUILD][K8S] Upgrade `kubernetes-client` to 6.3.1

2022-12-16 Thread GitBox


bjornjorgensen commented on PR #39094:
URL: https://github.com/apache/spark/pull/39094#issuecomment-1355444859

   Yes, then I can close this one 
https://github.com/bjornjorgensen/spark/pull/94 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement DataFrame.observe

2022-12-16 Thread GitBox


amaliujia commented on code in PR #39091:
URL: https://github.com/apache/spark/pull/39091#discussion_r1051070657


##
sql/core/src/main/scala/org/apache/spark/sql/Observation.scala:
##
@@ -45,7 +45,7 @@ import org.apache.spark.sql.util.QueryExecutionListener
  * @param name name of the metric
  * @since 3.3.0
  */
-class Observation(name: String) {
+class Observation(val name: String) {

Review Comment:
   Ah, without `val`, `name` is only a constructor parameter and is not 
accessible as a member from outside the class. Nice catch on this.
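   For reference, a minimal standalone Scala illustration of the difference 
(not Spark code):

// Without `val`, the constructor parameter is not a public member of the
// class; with `val`, Scala generates a field plus an accessor for it.
class WithoutVal(name: String)
class WithVal(val name: String)

object ValDemo extends App {
  // new WithoutVal("metrics").name  // does not compile: `name` is not a member
  println(new WithVal("metrics").name)  // prints "metrics"
}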



##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala:
##
@@ -126,6 +126,20 @@ package object dsl {
   
Expression.UnresolvedFunction.newBuilder().setFunctionName("min").addArguments(e))
 .build()
 
+def proto_max(e: Expression): Expression =

Review Comment:
   Same for below



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement DataFrame.observe

2022-12-16 Thread GitBox


amaliujia commented on code in PR #39091:
URL: https://github.com/apache/spark/pull/39091#discussion_r1051069651


##
python/pyspark/sql/connect/dataframe.py:
##
@@ -891,6 +892,75 @@ def to_jcols(
 
 melt = unpivot
 
+def observe(
+self,
+observation: Union["Observation", str],
+*exprs: Column,
+) -> "DataFrame":
+"""Define (named) metrics to observe on the DataFrame. This method 
returns an 'observed'
+DataFrame that returns the same result as the input, with the 
following guarantees:
+
+* It will compute the defined aggregates (metrics) on all the data 
that is flowing through
+the Dataset at that point.
+
+* It will report the value of the defined aggregate columns as soon as 
we reach a completion
+point. A completion point is either the end of a query (batch 
mode) or the end of a
+streaming epoch. The value of the aggregates only reflects the 
data processed since
+the previous completion point.
+
+The metrics columns must either contain a literal (e.g. lit(42)), or 
should contain one or
+more aggregate functions (e.g. sum(a) or sum(a + b) + avg(c) - 
lit(1)). Expressions that
+contain references to the input Dataset's columns must always be 
wrapped in an aggregate
+function.
+
+A user can observe these metrics by adding
+Python's :class:`~pyspark.sql.streaming.StreamingQueryListener`,
+Scala/Java's ``org.apache.spark.sql.streaming.StreamingQueryListener`` 
or Scala/Java's
+``org.apache.spark.sql.util.QueryExecutionListener`` to the spark 
session.
+
+.. versionadded:: 3.3.0

Review Comment:
   3.4.0?
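   For reference, a sketch of how the existing Scala API is used today; the 
Connect client in this PR aims to mirror this behaviour (the data and metric 
names below are made up):

import org.apache.spark.sql.{Observation, SparkSession}
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val df = spark.range(10).toDF("id")

// Attach named metrics; they are computed while the data flows through the plan.
val observation = Observation("my_metrics")
val observed = df.observe(observation,
  count(lit(1)).as("cnt"), max(col("id")).as("max_id"))

observed.collect()        // an action must run before the metrics are available
println(observation.get)  // Map(cnt -> 10, max_id -> 9)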



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement DataFrame.observe

2022-12-16 Thread GitBox


amaliujia commented on code in PR #39091:
URL: https://github.com/apache/spark/pull/39091#discussion_r1051068680


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala:
##
@@ -126,6 +126,20 @@ package object dsl {
   
Expression.UnresolvedFunction.newBuilder().setFunctionName("min").addArguments(e))
 .build()
 
+def proto_max(e: Expression): Expression =

Review Comment:
   Is there a need to add the `proto_` prefix? Just call it `max`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #39094: [SPARK-41552][BUILD][K8S] Upgrade `kubernetes-client` to 6.3.1

2022-12-16 Thread GitBox


dongjoon-hyun commented on PR #39094:
URL: https://github.com/apache/spark/pull/39094#issuecomment-1355430439

   cc @panbingkun , @Yikun , @viirya 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk opened a new pull request, #39095: [WIP][SQL] Add the error class `UNRESOLVED_ROUTINE`

2022-12-16 Thread GitBox


MaxGekk opened a new pull request, #39095:
URL: https://github.com/apache/spark/pull/39095

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #39065: [SPARK-41521][BUILD][K8S] Upgrade `kubernetes-client` to 6.3.0

2022-12-16 Thread GitBox


dongjoon-hyun commented on PR #39065:
URL: https://github.com/apache/spark/pull/39065#issuecomment-1355403342

   The upstream released a fixed version.
   - https://github.com/fabric8io/kubernetes-client/releases/tag/v6.3.1
   
   I made a PR for that.
   - https://github.com/apache/spark/pull/39094


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun opened a new pull request, #39094: [SPARK-41552][BUILD][K8S] Upgrade `kubernetes-client` to 6.3.1

2022-12-16 Thread GitBox


dongjoon-hyun opened a new pull request, #39094:
URL: https://github.com/apache/spark/pull/39094

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srielau commented on a diff in pull request #38864: [SPARK-41271][SQL] Support parameterized SQL queries by `sql()`

2022-12-16 Thread GitBox


srielau commented on code in PR #38864:
URL: https://github.com/apache/spark/pull/38864#discussion_r1051043683


##
core/src/main/resources/error/error-classes.json:
##
@@ -795,6 +795,11 @@
   }
 }
   },
+  "INVALID_SQL_ARG" : {
+"message" : [
+  "The argument  of `sql()` is invalid. Consider to replace it by a 
SQL literal statement."

Review Comment:
   That's an expression then. Statements are top level (like SELECT, UPDATE, 
CREATE, SET).
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] juliuszsompolski commented on a diff in pull request #38941: [SPARK-41498] Propagate metadata through Union

2022-12-16 Thread GitBox


juliuszsompolski commented on code in PR #38941:
URL: https://github.com/apache/spark/pull/38941#discussion_r1051033080


##
sql/core/src/test/scala/org/apache/spark/sql/connector/MetadataColumnSuite.scala:
##
@@ -232,4 +237,175 @@ class MetadataColumnSuite extends DatasourceV2SQLBase {
   )
 }
   }
+
+  test("SPARK-41498: Metadata column is propagated through union") {

Review Comment:
   could we also add a positive example with more complicated columns?
   e.g. selecting _partition and _metadata; or projecting _metadata to only 
select file_name so that nested column pruning kicks in?
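   A sketch of the kind of positive case being asked for, assuming a SparkSession 
named spark and a file-based source with the standard _metadata column (the 
paths and the `id` column are placeholders):

import org.apache.spark.sql.functions.col

// Keep only _metadata.file_name after the union, so nested column pruning on
// the metadata struct can kick in.
val df1 = spark.read.parquet("/tmp/t1")
val df2 = spark.read.parquet("/tmp/t2")
df1.union(df2)
  .select(col("id"), col("_metadata.file_name"))
  .show()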



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] anchovYu commented on pull request #39040: [SPARK-27561][SQL][FOLLOWUP] Support implicit lateral column alias resolution on Aggregate

2022-12-16 Thread GitBox


anchovYu commented on PR #39040:
URL: https://github.com/apache/spark/pull/39040#issuecomment-1355358289

   @gengliangwang Sure, will do today. FYI, I was sick yesterday.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] hvanhovell commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement DataFrame.observe

2022-12-16 Thread GitBox


hvanhovell commented on PR #39091:
URL: https://github.com/apache/spark/pull/39091#issuecomment-1355312686

   @beliefer thanks for working on this. I have one question: how are we going 
to get the observed metrics to the client? This seems to be missing from the 
implementation. One approach would be to send them in a similar way to the 
metrics in the result code path.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] hvanhovell commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement DataFrame.observe

2022-12-16 Thread GitBox


hvanhovell commented on code in PR #39091:
URL: https://github.com/apache/spark/pull/39091#discussion_r1051008772


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -338,6 +340,22 @@ class SparkConnectPlanner(session: SparkSession) {
 }
   }
 
+  private def transformCollectMetrics(rel: proto.CollectMetrics): LogicalPlan 
= {
+val metrics = rel.getMetricsList.asScala.map { expr =>
+  Column(transformExpression(expr))
+}
+
+if (rel.getIsObservation) {

Review Comment:
   What is the difference between the code paths?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #39092: [SPARK-41548][CONNECT][TESTS] Disable ANSI mode in pyspark.sql.tests.connect.test_connect_functions

2022-12-16 Thread GitBox


dongjoon-hyun closed pull request #39092: [SPARK-41548][CONNECT][TESTS] Disable 
ANSI mode in pyspark.sql.tests.connect.test_connect_functions
URL: https://github.com/apache/spark/pull/39092


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a diff in pull request #38861: [SPARK-41294][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1203 / 1168

2022-12-16 Thread GitBox


MaxGekk commented on code in PR #38861:
URL: https://github.com/apache/spark/pull/38861#discussion_r1050943156


##
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala:
##
@@ -2018,15 +2007,29 @@ private[sql] object QueryCompilationErrors extends 
QueryErrorsBase {
 "tableColumns" -> expected.map(c => s"'${c.name}'").mkString(", "),
 "dataColumns" -> query.output.map(c => s"'${c.name}'").mkString(", ")))
   }
+  
+  def numColumnsMismatchError(
+  operator: LogicalPlan,
+  firstNumColumns: Int,
+  invalidOrdinalNum: Int,
+  invalidNumColumns: Int): Throwable = {
+
+def ordinalNumber(i: Int): String = i match {
+  case 0 => "first"
+  case 1 => "second"
+  case 2 => "third"
+  case i => s"${i + 1}th"
+}
 
-  def cannotWriteNotEnoughColumnsToTableError(
-  tableName: String, expected: Seq[Attribute], query: LogicalPlan): 
Throwable = {
 new AnalysisException(
-  errorClass = "_LEGACY_ERROR_TEMP_1203",
+  errorClass = "NUM_COLUMNS_MISMATCH",
   messageParameters = Map(
-"tableName" -> tableName,
-"tableColumns" -> expected.map(c => s"'${c.name}'").mkString(", "),
-"dataColumns" -> query.output.map(c => s"'${c.name}'").mkString(", ")))
+"operator" -> toSQLStmt(operator.nodeName),

Review Comment:
   Maybe we should introduce a `prettyName()` similar to `prettyName` in 
`Expression`. @cloud-fan @HyukjinKwon WDYT?
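   For context, an example of a query that would now report this error class 
(assuming a SparkSession named spark; the exact message text depends on the 
final wording in this PR):

// The second branch of the UNION has three columns instead of two, so analysis
// fails with NUM_COLUMNS_MISMATCH.
spark.sql("SELECT 1 AS a, 2 AS b UNION ALL SELECT 1, 2, 3")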



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] techaddict opened a new pull request, #39093: [SPARK-41420][UI] Protobuf serializer for ApplicationInfoWrapper

2022-12-16 Thread GitBox


techaddict opened a new pull request, #39093:
URL: https://github.com/apache/spark/pull/39093

   ### What changes were proposed in this pull request?
   Add Protobuf serializer for ApplicationInfoWrapper
   
   ### Why are the changes needed?
   Support fast and compact serialization/deserialization for 
ApplicationInfoWrapper over RocksDB.
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   New UT


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] roczei commented on pull request #38828: [SPARK-35084][CORE] Spark 3: supporting --packages in k8s cluster mode

2022-12-16 Thread GitBox


roczei commented on PR #38828:
URL: https://github.com/apache/spark/pull/38828#issuecomment-1354933514

   @holdenk, @HyukjinKwon, @dongjoon-hyun
   
   Could you please take a look when you have some time? 
   
   This fixes a k8s --packages issue that has been present in Spark 3 since 
3.0.0. It would be nice to solve it. Here you can see the old branch-3.0, where 
the conditional code of the "if" / "else if" / "else" branches is identical to 
the latest master version: 
   
   
https://github.com/apache/spark/blob/branch-3.0/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L316-L328
   
   vs.
   
   
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L317-L330
   
   These conditions were restructured a bit by @ocworld. I have already added my 
test results above. The fix works.
   
   This K8S PR is a follow-up to #32397, which was closed by github-actions 
because it hadn't been updated in a while and had no unit test. The requested 
unit test has now been added, so we just need someone from the Spark committer 
team to review it again. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


