[GitHub] [spark] tgravescs commented on a change in pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI

GitBox Thu, 15 Apr 2021 07:02:33 -0700


tgravescs commented on a change in pull request #31974:
URL: https://github.com/apache/spark/pull/31974#discussion_r614079032




##########
File path: 
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##########
@@ -213,6 +213,11 @@ class CoarseGrainedSchedulerBackend(scheduler: 
TaskSchedulerImpl, val rpcEnv: Rp
           data.freeCores = data.totalCores
         }
         makeOffers(executorId)
+
+      case MiscellaneousProcessInfo(time: Long, info: 
MiscellaneousProcessDetails) =>
+        listenerBus.post(
+          MiscellaneousProcessInfoEvent(time, info))

Review comment:
       one line here if it will fit

##########
File path: 
core/src/main/scala/org/apache/spark/scheduler/MiscellaneousProcessDetails.scala
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.scheduler
+
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * :: DeveloperApi ::
+ * Stores information about an Miscellaneous Process to pass from the 
scheduler to SparkListeners.
+ */
+
+@DeveloperApi
+private[spark] class MiscellaneousProcessDetails(

Review comment:
       this should not be private.  Its marked as developer api and I think the 
idea here is that if someone implements their own listener they can access it.

##########
File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
##########
@@ -333,6 +333,13 @@ private[spark] abstract class YarnSchedulerBackend(
           logWarning(s"Requesting driver to remove executor $executorId for 
reason $reason")
           driverEndpoint.send(r)
         }
+
+      // In case of yarn Miscellaneous Process is Spark AM Container
+      // Launched for the deploy mode client
+      case processInfo @ MiscellaneousProcessInfo(_, _) =>
+        logInfo(s"Sending the Spark AM info for yarn client mode")
+        driverEndpoint.send(processInfo)
+

Review comment:
       remove extra newline

##########
File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
##########
@@ -333,6 +333,13 @@ private[spark] abstract class YarnSchedulerBackend(
           logWarning(s"Requesting driver to remove executor $executorId for 
reason $reason")
           driverEndpoint.send(r)
         }
+
+      // In case of yarn Miscellaneous Process is Spark AM Container
+      // Launched for the deploy mode client
+      case processInfo @ MiscellaneousProcessInfo(_, _) =>
+        logInfo(s"Sending the Spark AM info for yarn client mode")

Review comment:
       Doesn't seem like this needs to be info, perhaps change to debug unless 
you have specific use case here

##########
File path: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala
##########
@@ -227,6 +227,13 @@ case class SparkListenerUnschedulableTaskSetRemoved(
 @DeveloperApi
 case class SparkListenerBlockUpdated(blockUpdatedInfo: BlockUpdatedInfo) 
extends SparkListenerEvent
 
+@DeveloperApi
+case class MiscellaneousProcessInfoEvent(time: Long,

Review comment:
       so I would rather have this like I requested with time, processId and 
then MiscellaneousProcessDetails. I don't expect time and processId fields to 
change or go away and they can be used to easily look to see if its an event 
you care about. It also mirrors how other events are done here.  Also I think 
we should rename like I requested to MiscellaneousProcessAdded, if something 
else happens to that process we could add corresponding Removed event.

##########
File path: core/src/main/scala/org/apache/spark/status/LiveEntity.scala
##########
@@ -915,3 +915,34 @@ private class RDDPartitionSeq extends 
Seq[v1.RDDPartitionInfo] {
   }
 
 }
+
+private[spark] class LiveMiscellaneousProcess(val processId: String,
+    creationTime: Long) extends LiveEntity {
+
+  var hostPort: String = null
+  var host: String = null
+  var isActive = true
+  var totalCores = 0
+  val addTime = new Date(creationTime)
+  var removeTime: Date = null
+  var memoryUsed = 0L
+  var maxMemory = 0L
+  var processLogs = Map[String, String]()
+
+  def hostname: String = if (host != null) host else 
Utils.parseHostPort(hostPort)._1

Review comment:
       why do we need this, I don't see it used?

##########
File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
##########
@@ -1353,4 +1363,22 @@ private[spark] class AppStatusListener(
     }
   }
 
+  /**
+   *
+   * @param processInfoEvent Miscellaneous Process Info Event
+   *
+   */
+  private def onMiscellaneousProcessAdded(
+      processInfoEvent: MiscellaneousProcessInfoEvent): Unit = {
+    val processInfo = processInfoEvent.info
+    val miscellaneousProcess =
+      getOrCreateOtherProcess(processInfo.processName, processInfoEvent.time)

Review comment:
       we are mixing processId and processName, lets stick to one consistently 
-> processId

##########
File path: core/src/main/scala/org/apache/spark/status/LiveEntity.scala
##########
@@ -915,3 +915,34 @@ private class RDDPartitionSeq extends 
Seq[v1.RDDPartitionInfo] {
   }
 
 }
+
+private[spark] class LiveMiscellaneousProcess(val processId: String,
+    creationTime: Long) extends LiveEntity {
+
+  var hostPort: String = null

Review comment:
       why do we need both host and hostPort?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] tgravescs commented on a change in pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI

Reply via email to