Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/718#discussion_r13401102
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
@@ -290,8 +217,88 @@ object HistoryServer {
}
}
+ private def parse(args: List[String]): Unit = {
+ args match {
+ case ("--dir" | "-d") :: value :: tail =>
+ set("fs.logDirectory", value)
+ parse(tail)
+
+ case ("--port" | "-p") :: value :: tail =>
+ set("ui.port", value)
+ parse(tail)
+
+ case ("-D") :: opt :: value :: tail =>
+ set(opt, value)
+ parse(tail)
+
+ case ("--help" | "-h") :: tail =>
+ printUsageAndExit(0)
+
+ case Nil =>
+
+ case _ =>
+ printUsageAndExit(1)
+ }
+ }
+
+ private def set(name: String, value: String) = {
+ conf.set("spark.history." + name, value)
+ }
+
+ private def printUsageAndExit(exitCode: Int) {
+ System.err.println(
+ """
+ |Usage: HistoryServer [options]
+ |
+ |Options are set by passing "-D option value" command line arguments to the class.
+ |Command line options will override the Spark configuration file and system properties.
+ |History Server options are always available; additional options depend on the provider.
+ |
+ |History Server options:
+ |
+ | ui.port Port where server will listen for connections (default 18080)
+ | ui.acls.enable Whether to enable view acls for all applications (default false)
+ | provider Name of history provider class (defaults to file system-based provider)
+ |
+ |FsHistoryProvider options:
+ |
+ | fs.logDirectory Directory where app logs are stored (required)
+ | fs.updateInterval How often to reload log data from storage (seconds, default 10)
+ |""".stripMargin)
+ System.exit(exitCode)
+ }
+
}
+private[spark] abstract class ApplicationHistoryProvider {
+
+ /**
+ * This method should return a list of applications available for the history server to
+ * show. The listing is assumed to be in descending time order.
+ *
+ * An adjusted offset should be returned if the app list has changed and the request
+ * references an invalid start offset. Otherwise, the provided offset should be returned.
+ *
+ * @param offset Starting offset for returned objects.
+ * @param count Max number of objects to return.
+ * @return 3-tuple (requested app list, adjusted offset, count of all available apps)
+ */
+ def getListing(offset: Int, count: Int): (Seq[ApplicationHistoryInfo], Int, Int)
--- End diff --
If we did want to optimize this, wouldn't the way to do it be to have one
function that returns a listing of applications and another function that
returns the full in-memory representation of a given application? Then it
wouldn't restrict the optimizations to be tied to sequential access (in fact,
the one existing optimization we have is based on caching and is not tied to
sequential access at all).
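A rough sketch of what that split could look like, purely for illustration:
the `getApplication` method, the `LoadedApplication` type, the toy
`InMemoryProvider`, and the simplified fields on `ApplicationHistoryInfo` are
all assumptions made up here, not code from the PR.

```scala
// Hypothetical sketch. Only the names ApplicationHistoryProvider and
// ApplicationHistoryInfo appear in the PR; everything else is assumed.

// Simplified stand-in for the summary shown on the listing page.
case class ApplicationHistoryInfo(id: String, name: String, startTime: Long)

// Stand-in for the full in-memory representation of one application.
case class LoadedApplication(info: ApplicationHistoryInfo, events: Seq[String])

abstract class ApplicationHistoryProvider {
  /** Lightweight summaries of every application the provider knows about. */
  def getListing(): Seq[ApplicationHistoryInfo]

  /** Full representation of one application, or None if the id is unknown. */
  def getApplication(appId: String): Option[LoadedApplication]
}

// Toy provider backed by an in-memory map, only to show the shape is usable.
class InMemoryProvider(apps: Map[String, LoadedApplication])
  extends ApplicationHistoryProvider {

  // Newest first; the ordering is the provider's choice in this sketch.
  override def getListing(): Seq[ApplicationHistoryInfo] =
    apps.values.map(_.info).toSeq.sortBy(-_.startTime)

  override def getApplication(appId: String): Option[LoadedApplication] =
    apps.get(appId)
}
```

With this shape the caller can page over the listing itself (for example
`provider.getListing().slice(offset, offset + count)`), and any caching can
sit behind `getApplication` without the interface knowing about offsets.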
The offset is also a bit weird to have here because it implies an ordering,
but there isn't any discussion of how the ordering is defined and whether it
should be stable over time.
Anyways, why not just keep this simple rather than make it more clunky for
some optimization we haven't even written yet?