Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21221#discussion_r203455379
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
    @@ -160,11 +160,29 @@ case class SparkListenerBlockUpdated(blockUpdatedInfo: BlockUpdatedInfo) extends
      * Periodic updates from executors.
      * @param execId executor id
      * @param accumUpdates sequence of (taskId, stageId, stageAttemptId, accumUpdates)
    + * @param executorUpdates executor level metrics updates
      */
     @DeveloperApi
     case class SparkListenerExecutorMetricsUpdate(
         execId: String,
    -    accumUpdates: Seq[(Long, Int, Int, Seq[AccumulableInfo])])
    +    accumUpdates: Seq[(Long, Int, Int, Seq[AccumulableInfo])],
    +    executorUpdates: Option[Array[Long]] = None)
    +  extends SparkListenerEvent
    +
    +/**
    + * Peak metric values for the executor for the stage, written to the history log at stage
    + * completion.
    + * @param execId executor id
    + * @param stageId stage id
    + * @param stageAttemptId stage attempt
    + * @param executorMetrics executor level metrics, indexed by MetricGetter.values
    + */
    +@DeveloperApi
    +case class SparkListenerStageExecutorMetrics(
    +    execId: String,
    +    stageId: Int,
    +    stageAttemptId: Int,
    +    executorMetrics: Array[Long])
    --- End diff --
    
    re: compatibility -- the JSON already uses names, so it is not dependent on enum order.  We always require the same code for the values sent between the driver & executor, so that isn't a concern.  For the user accessing these values with `getMetricValue()` -- deleting an enum value would be the same as deleting a field, so yeah, it would break compatibility.  That's technically allowed for a "DeveloperApi", but something we should avoid.  Adding enum values should be OK.  If the enum values are re-ordered in the Spark code, but the user compiles against an older version ... I *think* it should be OK, as we'd look up the index in the actual Spark code at runtime.
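    To make the ordering point concrete, here's a rough sketch (names like `MetricNames` and `ExecutorMetricsSketch` are purely illustrative, not the code in this PR) of name-keyed serialization plus an index lookup that is resolved against the running Spark code:

    ```scala
    // Illustrative sketch, not the PR's actual classes: the JSON view is keyed by
    // metric name, while the in-memory lookup resolves the index at runtime, so
    // re-ordering the constants doesn't change the serialized format.
    object MetricNames {
      // ordered list of metric names; only the running Spark version's order matters
      val values: IndexedSeq[String] = IndexedSeq("JVMHeapMemory", "JVMOffHeapMemory")
    }

    class ExecutorMetricsSketch(private val metrics: Array[Long]) {
      // name-based accessor, analogous to the getMetricValue() discussed above
      def getMetricValue(name: String): Long = metrics(MetricNames.values.indexOf(name))

      // name-keyed view, conceptually what the JSON in the history log looks like
      def toNamedMap: Map[String, Long] = MetricNames.values.zip(metrics).toMap
    }
    ```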
    
    btw, I'm using "enum" loosely -- I don't actually mean a Java enum, but more the general concept, as it's implemented in the current diff: a fixed set of constants, with a helper to get all of them in order.  We could switch to using Java enums -- I actually started that (https://github.com/apache/spark/pull/21221/commits/8b74ba8fff21b499e7cc9d93f9864831aa29773e), but changed it in the end.  Honestly I forget why -- I think it was because I decided I wanted Scala scoping constructs and more flexible code organization, or something along those lines, and Java's enum didn't really buy me much more.  The `executorMetrics` field here is basically an `EnumMap`, but it can actually hold primitives.  That matters more for the internal messages than here in the public SparkListener API.
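    To be concrete about the pattern, something like this is what I mean by the home-grown version (again just a sketch with made-up names, not the actual diff):

    ```scala
    // "Home-grown enum": a sealed set of constants plus a helper returning them all in a
    // fixed order, so an Array[Long] indexed by that order acts as a primitive-valued EnumMap.
    sealed abstract class HomeGrownMetric(val name: String)
    case object OnHeapStorageMemory extends HomeGrownMetric("OnHeapStorageMemory")
    case object OffHeapStorageMemory extends HomeGrownMetric("OffHeapStorageMemory")

    object HomeGrownMetric {
      // the helper that knows all of the constants, in a stable order
      val values: IndexedSeq[HomeGrownMetric] =
        IndexedSeq(OnHeapStorageMemory, OffHeapStorageMemory)

      // one slot per constant; stores primitives directly, unlike java.util.EnumMap
      def emptyMetrics: Array[Long] = new Array[Long](values.length)
    }
    ```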
    
    Anyway, I think there *does* need to be some developer API which exposes all of the MetricGetter/ExecutorMetricType values.  I don't really care whether that is a Java enum or the home-grown version here.  I'm flexible on specifics, but my suggestion: an `ExecutorMetricType` enum that is a DeveloperApi; make the SparkListener API expose the values as `executorMetrics: EnumMap<ExecutorMetricType, Long>`; internally, still use `Array[Long]`, and have `MetricGetter` contain the code which knows how to take an `ExecutorMetricType` and determine its current value.  That would make the public API more self-documenting, keep things efficient and clean internally, and also let us separate the public list of metrics from the internal logic that determines current values, without having to play too many games with package-private access.
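    Roughly the shape I have in mind, as a sketch with hypothetical names (written with a sealed trait and a plain `Map`, since Scala 2 can't declare a Java enum; a real Java enum plus `java.util.EnumMap<ExecutorMetricType, Long>` would be the direct equivalent on the public side):

    ```scala
    // Hypothetical sketch of the suggested split: a public metric-type "enum", a listener
    // event keyed by it, and an internal Array[Long] plus the sampling logic in MetricGetter.
    sealed trait ExecutorMetricTypeSketch
    case object JVMHeapMemory extends ExecutorMetricTypeSketch
    case object JVMOffHeapMemory extends ExecutorMetricTypeSketch

    object ExecutorMetricTypeSketch {
      val values: IndexedSeq[ExecutorMetricTypeSketch] =
        IndexedSeq(JVMHeapMemory, JVMOffHeapMemory)
    }

    // hypothetical public event shape: metrics keyed by type rather than a raw Array[Long]
    case class StageExecutorMetricsSketch(
        execId: String,
        stageId: Int,
        stageAttemptId: Int,
        executorMetrics: Map[ExecutorMetricTypeSketch, Long])

    // internal side: compact Array[Long] indexed by ExecutorMetricTypeSketch.values,
    // plus the code that knows how to sample each metric's current value
    object MetricGetterSketch {
      private def currentValue(m: ExecutorMetricTypeSketch): Long = m match {
        case JVMHeapMemory    => Runtime.getRuntime.totalMemory - Runtime.getRuntime.freeMemory
        case JVMOffHeapMemory => 0L // placeholder; real code would query the memory manager
      }

      def sampleAll(): Array[Long] =
        ExecutorMetricTypeSketch.values.map(currentValue).toArray

      // convert the internal array into the public, type-keyed view
      def asMap(metrics: Array[Long]): Map[ExecutorMetricTypeSketch, Long] =
        ExecutorMetricTypeSketch.values.zip(metrics).toMap
    }
    ```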

