Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/7774#issuecomment-127429075
Review status: 0 of 22 files reviewed at latest revision, 66 unresolved
discussions, some commit checks failed.
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala,
line 71
\[r4\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvpjDe0mLMQg8-alRGn)**
([raw
file](https://github.com/apache/spark/blob/94065929603633714929c5ecbd43c2a65182552a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L71)):</sup>
shall we keep this `private[spark]` for now? We can always expose it in the
future if we want. Same in the accessor several lines down.
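For illustration, keeping both the member and its accessor package-private would look something like this (the member names here are hypothetical, not the actual ones in SparkPlan.scala):
```
package org.apache.spark.example

class Plan {
  // Hypothetical members: the point is the `private[spark]` visibility,
  // which we can always widen later without breaking compatibility.
  private[spark] var planDescription: String = ""
  private[spark] def description: String = planDescription
}
```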
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala,
line 58
\[r3\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvokJQh1eM3HWsvMBR6-r3-58)**
([raw
file](https://github.com/apache/spark/blob/ca1811f5beb035db74e932e683716bd09e1f7302/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala#L58)):</sup>
I should add though that (1) allows us to write these custom events to the
event log, and thus lets us see the information in the history server, which
is very valuable.
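As a rough sketch of what a loggable custom event could look like, assuming the scheduler's `SparkListenerEvent` trait can be extended (the event name and fields are invented, not what this PR defines):
```
import org.apache.spark.scheduler.SparkListenerEvent

// Invented event: anything posted to the listener bus as a SparkListenerEvent
// can be written to the event log and replayed by the history server.
case class SQLExecutionStart(executionId: Long, description: String, startTime: Long)
  extends SparkListenerEvent
```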
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/ui/AllExecutionsPage.scala,
line 133
\[r4\]](https://reviewable.io:443/reviews/apache/spark/7774#-Jvpk6HAav0HqRcnGqcD)**
([raw
file](https://github.com/apache/spark/blob/94065929603633714929c5ecbd43c2a65182552a/sql/core/src/main/scala/org/apache/spark/sql/ui/AllExecutionsPage.scala#L133)):</sup>
do you need `else { Nil }` here?
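For context, whether the `else { Nil }` can be dropped depends on the expected type of the expression; a standalone sketch:
```
val condition = true
// With the else branch, the expression is a Seq[Int]:
val withElse: Seq[Int] = if (condition) Seq(1, 2) else Nil
// Without it, the if-expression widens (the missing branch is Unit),
// so this would not type-check:
// val withoutElse: Seq[Int] = if (condition) Seq(1, 2)
```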
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/ui/AllExecutionsPage.scala,
line 154
\[r4\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvpkGDIsvkYxcFl-p1P)**
([raw
file](https://github.com/apache/spark/blob/94065929603633714929c5ecbd43c2a65182552a/sql/core/src/main/scala/org/apache/spark/sql/ui/AllExecutionsPage.scala#L154)):</sup>
`+detail` (without space) to be consistent with above
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/ui/SparkPlanGraph.scala,
line 52
\[r4\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvpkizmZ3pVvaR_iuiG)**
([raw
file](https://github.com/apache/spark/blob/94065929603633714929c5ecbd43c2a65182552a/sql/core/src/main/scala/org/apache/spark/sql/ui/SparkPlanGraph.scala#L52)):</sup>
does this need to be an `AtomicLong`? Seems a little overkill. Can this just
be `nextNodeId = 0` or something?
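A minimal sketch of the suggestion, assuming the graph is only ever built from a single thread:
```
// Plain counter instead of an AtomicLong; fine as long as
// SparkPlanGraph construction is single-threaded.
private var nextNodeId = 0L

private def nextId(): Long = {
  val id = nextNodeId
  nextNodeId += 1
  id
}
```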
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/ui/SQLListener.scala,
line 139
\[r4\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvplbxtEcRT_dHYezAz)**
([raw
file](https://github.com/apache/spark/blob/94065929603633714929c5ecbd43c2a65182552a/sql/core/src/main/scala/org/apache/spark/sql/ui/SQLListener.scala#L139)):</sup>
use the named argument `finishTask = true`; same in `onExecutorMetricsUpdate`
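That is, spell the boolean out at the call site so it is self-documenting; a hypothetical call (parameter names invented, using the rename suggested in the next comment):
```
// A bare `true` says nothing at the call site; `finishTask = true` does.
updateTaskAccumulatorValues(taskId, stageId, accumulatorUpdates, finishTask = true)
```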
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/ui/SQLListener.scala,
line 142
\[r4\]](https://reviewable.io:443/reviews/apache/spark/7774#-Jvpm5Q5pWW4XPL9JV6S)**
([raw
file](https://github.com/apache/spark/blob/94065929603633714929c5ecbd43c2a65182552a/sql/core/src/main/scala/org/apache/spark/sql/ui/SQLListener.scala#L142)):</sup>
I would rename this `updateTaskAccumulatorValues`, then add a Scaladoc:
```
/**
 * Update the accumulator values of a task with the latest metrics for this task. This is
 * called every time we receive an executor heartbeat or when a task finishes.
 */
```
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/ui/SQLListener.scala,
line 198
\[r4\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvpnK83TUW9ZFVHe00-)**
([raw
file](https://github.com/apache/spark/blob/94065929603633714929c5ecbd43c2a65182552a/sql/core/src/main/scala/org/apache/spark/sql/ui/SQLListener.scala#L198)):</sup>
The `toMap` here doesn't seem safe. It turns out Scala silently collapses
items with the same key when you call `toMap`:
```
scala> Seq((1, 1), (1, 2), (1, 3)).toMap
res5: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3)
```
What are the semantics here? My understanding is that we want to display
the accumulators for each of the operators in this plan, not just for the last
one that has non-zero accumulator values.
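If the intent is to keep every operator's values, one option is to group by key before building the map; a sketch with made-up data:
```
val updates = Seq((1L, 10L), (1L, 20L), (2L, 5L))
// groupBy preserves all values per key, unlike toMap, which keeps only the last pair:
val perKey: Map[Long, Seq[Long]] =
  updates.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
// perKey: Map(1 -> Seq(10, 20), 2 -> Seq(5))
```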
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/ui/SQLListener.scala,
line 252
\[r4\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvppYcO2JgB6qKiIEyF)**
([raw
file](https://github.com/apache/spark/blob/94065929603633714929c5ecbd43c2a65182552a/sql/core/src/main/scala/org/apache/spark/sql/ui/SQLListener.scala#L252)):</sup>
this could be the Scaladoc itself, actually
---
Comments from the [review on
Reviewable.io](https://reviewable.io:443/reviews/apache/spark/7774)