Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21221#discussion_r203455379
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
    @@ -160,11 +160,29 @@ case class SparkListenerBlockUpdated(blockUpdatedInfo: BlockUpdatedInfo) extends
      * Periodic updates from executors.
      * @param execId executor id
      * @param accumUpdates sequence of (taskId, stageId, stageAttemptId, accumUpdates)
     + * @param executorUpdates executor level metrics updates
      */
     @DeveloperApi
     case class SparkListenerExecutorMetricsUpdate(
         execId: String,
    -    accumUpdates: Seq[(Long, Int, Int, Seq[AccumulableInfo])])
    +    accumUpdates: Seq[(Long, Int, Int, Seq[AccumulableInfo])],
    +    executorUpdates: Option[Array[Long]] = None)
    +  extends SparkListenerEvent
    +
    +/**
    + * Peak metric values for the executor for the stage, written to the history log at stage
    + * completion.
    + * @param execId executor id
    + * @param stageId stage id
    + * @param stageAttemptId stage attempt
    + * @param executorMetrics executor level metrics, indexed by MetricGetter.values
    + */
    +@DeveloperApi
    +case class SparkListenerStageExecutorMetrics(
    +    execId: String,
    +    stageId: Int,
    +    stageAttemptId: Int,
    +    executorMetrics: Array[Long])
--- End diff --
    re: compatibility -- the json already uses names, so it is not dependent on
    enum order. We always require the same code version for the values sent between
    the driver & executor, so that isn't a concern. For a user accessing these
    values with `getMetricValue()` -- deleting an enum value would be the same as
    deleting a field, so yes, it would break compatibility. That's technically
    allowed for a "DeveloperApi", but something we should avoid. Adding enum values
    should be OK. If the enum values are re-ordered in the spark code but the user
    compiles against an older version ... I *think* that should be OK, since we'd
    look up the index in the actual spark code at runtime.
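    To make the "json uses names" point concrete, here's a toy sketch (not the
    actual spark JsonProtocol code, and the metric names are just examples): the
    writer emits a json object keyed by metric name, and the reader rebuilds its
    own `Array[Long]` by looking names up, so reordering (or adding) constants
    between writer and reader doesn't shift any values.

    ```scala
    // Toy illustration only -- not the actual spark JsonProtocol.
    val writerOrder = Seq("JVMHeapMemory", "JVMOffHeapMemory")
    val peaks = Array(512L * 1024 * 1024, 64L * 1024 * 1024)

    // write: pair each value with its name, so position in the log is irrelevant
    val written: Map[String, Long] = writerOrder.zip(peaks).toMap

    // read: the reader may order (or extend) its constants differently; it looks
    // each one up by name and defaults when a name is missing from an older log
    val readerOrder = Seq("JVMOffHeapMemory", "JVMHeapMemory", "SomeNewMetric")
    val rebuilt: Array[Long] = readerOrder.map(n => written.getOrElse(n, 0L)).toArray
    ```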
    btw, I'm using "enum" loosely -- I don't actually mean a java enum, I mean
    the general concept as it's implemented in the current diff: a fixed set of
    constants, with a helper to get all of them in order. We could switch to using
    java enums -- I actually started that
    (https://github.com/apache/spark/pull/21221/commits/8b74ba8fff21b499e7cc9d93f9864831aa29773e),
    but changed it in the end. Honestly I forget why -- I think it was because I
    wanted scala scoping constructs and more flexible code organization, and java's
    enum didn't really buy me much. The `executorMetrics` field here is basically
    an `EnumMap`, except it can hold primitives. That matters more for the internal
    messages than here in the public spark listener api.
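    For reference, the "fixed set of constants with a helper to get all of them in
    order" pattern looks roughly like this -- a sketch with made-up metric names
    and a plain JMX sampler, not the exact MetricGetter code in this PR:

    ```scala
    import java.lang.management.ManagementFactory

    // Sketch of the home-grown enum pattern -- illustrative only.
    sealed abstract class MetricKind(val name: String) {
      // each constant knows how to sample its own current value
      def sample(): Long
    }

    case object JvmHeapUsed extends MetricKind("JVMHeapMemory") {
      def sample(): Long =
        ManagementFactory.getMemoryMXBean.getHeapMemoryUsage.getUsed
    }

    case object JvmOffHeapUsed extends MetricKind("JVMOffHeapMemory") {
      def sample(): Long =
        ManagementFactory.getMemoryMXBean.getNonHeapMemoryUsage.getUsed
    }

    object MetricKind {
      // the helper that returns all constants in a fixed order; this order is
      // what an Array[Long] of metric values is indexed by
      val values: IndexedSeq[MetricKind] = IndexedSeq(JvmHeapUsed, JvmOffHeapUsed)

      def sampleAll(): Array[Long] = values.map(_.sample()).toArray
    }
    ```

    The `Array[Long]` in the events above is then just indexed by that `values`
    order, which is what I mean by it being an `EnumMap` that can hold primitives.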
    anyway, I think there *does* need to be some developer api which exposes
    all of the MetricGetter/ExecutorMetricType values. I don't really care whether
    that is a java enum or the home-grown version here. I'm flexible on specifics,
    but my suggestion: make `ExecutorMetricType` an enum that is a developer api;
    have the SparkListener api expose the values as `executorMetrics:
    EnumMap<ExecutorMetricType, Long>`; internally, keep using `Array[Long]`; and
    have `MetricGetter` contain the code which knows how to take an
    executorMetricType and determine its current value. That would make the public
    api more self-documenting and clean, keep things efficient internally, and let
    us separate the public list of metrics from the internal logic that determines
    current values, without playing too many games with package-private access.
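    To be concrete about that split, something along these lines (scala sketch
    with illustrative names; the real version would be a java enum exposed through
    an `EnumMap<ExecutorMetricType, Long>`, and the internal pieces would be
    package-private rather than public):

    ```scala
    import java.lang.management.ManagementFactory

    // Sketch only -- illustrative names, plain Map instead of a java EnumMap.

    // Public, developer-api list of metric types.
    sealed trait ExecutorMetricType
    case object JvmHeapMemory extends ExecutorMetricType
    case object JvmOffHeapMemory extends ExecutorMetricType

    object ExecutorMetricType {
      // the ordering is an internal detail: it only defines the Array[Long] layout
      val ordered: IndexedSeq[ExecutorMetricType] =
        IndexedSeq(JvmHeapMemory, JvmOffHeapMemory)
    }

    // Internal: knows how to compute the current value for each metric type, so
    // internal messages can stay a compact Array[Long].
    object MetricGetterSketch {
      def currentValue(t: ExecutorMetricType): Long = t match {
        case JvmHeapMemory =>
          ManagementFactory.getMemoryMXBean.getHeapMemoryUsage.getUsed
        case JvmOffHeapMemory =>
          ManagementFactory.getMemoryMXBean.getNonHeapMemoryUsage.getUsed
      }

      def snapshot(): Array[Long] =
        ExecutorMetricType.ordered.map(currentValue).toArray
    }

    // Public listener event: metrics keyed by type rather than by array index.
    case class StageExecutorMetricsSketch(
        execId: String,
        stageId: Int,
        stageAttemptId: Int,
        executorMetrics: Map[ExecutorMetricType, Long])

    object StageExecutorMetricsSketch {
      // conversion from the compact internal representation to the public one
      def fromSnapshot(
          execId: String,
          stageId: Int,
          stageAttemptId: Int,
          snapshot: Array[Long]): StageExecutorMetricsSketch = {
        val byType = ExecutorMetricType.ordered.zip(snapshot).toMap
        StageExecutorMetricsSketch(execId, stageId, stageAttemptId, byType)
      }
    }
    ```

    The point of the split is that the public api only ever sees the type-keyed
    view, while the array layout stays a private detail of the internal messages.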
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]