robreeves commented on code in PR #40637:
URL: https://github.com/apache/spark/pull/40637#discussion_r1158959568


##########
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:
##########
@@ -1162,7 +1165,11 @@ private[spark] class Client(
       val state = report.getYarnApplicationState
 
       if (logApplicationReport) {
-        logInfo(s"Application report for $appId (state: $state)")
+        if (reportsSinceLastLog == reportsTillNextLog || lastState != state) {

Review Comment:
   Right now this can't happen, but I'm usually paranoid about using equality 
checks in case there is some bug where we don't hit this line of code, but do 
increment `reportsSinceLastLog`. A more defensive way to code this is 
`reportsSinceLastLog >= reportsTillNextLog`.



##########
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:
##########
@@ -1142,8 +1142,11 @@ private[spark] class Client(
       logApplicationReport: Boolean = true,
       interval: Long = sparkConf.get(REPORT_INTERVAL)): YarnAppReport = {
     var lastState: YarnApplicationState = null
+    val reportsTillNextLog: Int = sparkConf.get(REPORT_LOG_FREQUENCY)
+    var reportsSinceLastLog: Int = 0
     while (true) {
       Thread.sleep(interval)
+      reportsSinceLastLog += 1

Review Comment:
   A bit of a nit, but I prefer to keep modification of `reportsSinceLastLog` 
as close to the logging section as possible (line 1166). Right now there is a 
big section of unrelated code before where it is set and used that requires the 
reader to jump around more.



##########
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala:
##########
@@ -225,6 +225,15 @@ package object config extends Logging {
     .timeConf(TimeUnit.MILLISECONDS)
     .createWithDefaultString("1s")
 
+  private[spark] val REPORT_LOG_FREQUENCY = {
+    ConfigBuilder("spark.yarn.report.logging.frequency")
+      .doc("Number of application reports processed until the next application 
status is logged." +

Review Comment:
   To be more precise, it is the **maximum** number of reports processed 
because a state change will cause it to be logged right away. I think the state 
change behavior is worth mentioning.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to