Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/6166#discussion_r30837452
--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClusterSchedulerBackend.scala ---
@@ -53,4 +62,65 @@ private[spark] class YarnClusterSchedulerBackend(
logError("Application attempt ID is not set.")
super.applicationAttemptId
}
+
+ override def getDriverLogUrls: Option[Map[String, String]] = {
+ var yarnClientOpt: Option[YarnClient] = None
+ var driverLogs: Option[Map[String, String]] = None
+ try {
+ val yarnConf = new YarnConfiguration(sc.hadoopConfiguration)
+ val containerId = YarnSparkHadoopUtil.get.getContainerId
+ yarnClientOpt = Some(YarnClient.createYarnClient())
+ yarnClientOpt.foreach { yarnClient =>
+ yarnClient.init(yarnConf)
+ yarnClient.start()
+
+ // For newer versions of YARN, we can find the HTTP address for a given node by getting a
+ // container report for a given container. But container reports came only in Hadoop 2.4,
+ // so we basically have to get the node reports for all nodes and find the one which runs
+ // this container. For that we have to compare the node's host against the current host.
+ // Since the host can have multiple addresses, we need to compare against all of them to
+ // find out if one matches.
+
+ // Get all the addresses of this node.
+ val addresses =
+ NetworkInterface.getNetworkInterfaces.asScala
+ .flatMap(_.getInetAddresses.asScala)
+ .toSeq
+
+ // Find a node report that matches one of the addresses
+ val nodeReport =
+ yarnClient.getNodeReports(NodeState.RUNNING).asScala.find { x =>
+ val host = x.getNodeId.getHost
+ addresses.exists { address =>
+ address.getHostAddress == host ||
+ address.getHostName == host ||
+ address.getCanonicalHostName == host
+ }
+ }
+
+ // Build the HTTP address for the node and build the URL for the logs.
+ nodeReport.foreach { report =>
+ val httpAddress = report.getHttpAddress
+ // lookup appropriate http scheme for container log urls
+ val yarnHttpPolicy = yarnConf.get(
+ YarnConfiguration.YARN_HTTP_POLICY_KEY,
+ YarnConfiguration.YARN_HTTP_POLICY_DEFAULT
+ )
+ val user = Utils.getCurrentUserName()
+ val httpScheme = if (yarnHttpPolicy == "HTTPS_ONLY") "https://" else "http://"
+ val baseUrl = s"$httpScheme$httpAddress/node/containerlogs/$containerId/$user"
--- End diff --
Hari and I discussed offline how this works when there are multiple
containers on a node. It is a bit confusing, so I suggested adding a comment
here, something like: "The nodeReport gives us the httpAddress of the
NodeManager, which may be shared by more than one container on that node. But
we know the URL points at the driver's container, because the URL also
includes the containerId."
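
To illustrate the point the comment should make, here is a minimal standalone
sketch. It is not the PR's code: the helper name `containerLogBaseUrl`, the
host name, the container IDs, and the user name are all hypothetical, and only
the URL layout is taken from the diff above. It shows two containers on the
same NodeManager sharing one httpAddress while the containerId keeps their log
URLs distinct.

```scala
object DriverLogUrlSketch {
  // Hypothetical helper (not in the PR): same URL layout as the diff's baseUrl.
  def containerLogBaseUrl(
      httpPolicy: String,
      nodeHttpAddress: String,
      containerId: String,
      user: String): String = {
    // Same scheme selection as in the diff: only HTTPS_ONLY switches to https.
    val scheme = if (httpPolicy == "HTTPS_ONLY") "https://" else "http://"
    s"$scheme$nodeHttpAddress/node/containerlogs/$containerId/$user"
  }

  def main(args: Array[String]): Unit = {
    // Made-up values: one NodeManager address shared by two containers.
    val nodeManager = "nm-host.example.com:8042"
    val driverUrl =
      containerLogBaseUrl("HTTP_ONLY", nodeManager, "container_1_0001_01_000001", "alice")
    val executorUrl =
      containerLogBaseUrl("HTTP_ONLY", nodeManager, "container_1_0001_01_000002", "alice")

    // Same host and port, but the containerId path segment differs.
    println(driverUrl)
    println(executorUrl)
  }
}
```

The node lookup only has to find the right NodeManager. Picking the driver's
logs out of everything served by that NodeManager is done by the containerId
in the path.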