Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/7774#issuecomment-127527279
Review status: 5 of 26 files reviewed at latest revision, 71 unresolved
discussions, all commit checks successful.
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala,
line 48
\[r4\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvokJQehEcEYVaSyCrk-r4-48)**
([raw
file](https://github.com/apache/spark/blob/94065929603633714929c5ecbd43c2a65182552a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala#L48)):</sup>
Removed it. Agreed that we only need to track the nodes that change the
number of rows.
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/commands.scala,
line 135
\[r16\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvrpmHC5pR7cLKsOh5r)**
([raw
file](https://github.com/apache/spark/blob/cc1c73645f82a56e899cdb44c2e84ed68bfc7a46/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/commands.scala#L135)):</sup>
Found that the `save` action cannot track the metrics: it creates a new
DataFrame along with a new `QueryExecution`, and a new `QueryExecution`
means the accumulator ids are completely different. I cannot use the new
DataFrame here because its plan contains only a single SparkPlan,
`PhysicalRDD`. So I modified `withNewExecutionId` to accept a
`QueryExecution`, so that I can pass in the one that will actually be
executed.
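To illustrate the idea, here is a minimal self-contained sketch (not Spark's actual code; `QueryExecution`, `SQLExecution`, and the field names are simplified stand-ins): taking the `QueryExecution` as an explicit parameter lets a caller such as `save` install the one whose accumulators will really be updated, instead of the one attached to a freshly created DataFrame.

```scala
// Simplified stand-in for Spark's QueryExecution: only the accumulator
// ids matter for this sketch.
final case class QueryExecution(accumulatorIds: Set[Long])

object SQLExecution {
  // Tracks which QueryExecution is active on the current thread.
  private val current = new ThreadLocal[Option[QueryExecution]] {
    override def initialValue(): Option[QueryExecution] = None
  }

  // Accepts the QueryExecution explicitly, so callers can pass the one
  // that will actually be executed rather than a newly derived one with
  // different accumulator ids.
  def withNewExecutionId[T](qe: QueryExecution)(body: => T): T = {
    val previous = current.get()
    current.set(Some(qe))
    try body finally current.set(previous)
  }

  def currentExecution: Option[QueryExecution] = current.get()
}
```

With this shape, the metrics machinery reads `currentExecution` and sees the accumulator ids of the plan that is really running, which is the point of the change described above.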
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala,
line 65
\[r13\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvokJQehEcEYVaSyCrj-r13-65)**
([raw
file](https://github.com/apache/spark/blob/b8d5605b5432e26322190896637139a1b051c7d5/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L65)):</sup>
Done.
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala,
line 78
\[r16\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvrpXkePxsdEMQXAViA)**
([raw
file](https://github.com/apache/spark/blob/cc1c73645f82a56e899cdb44c2e84ed68bfc7a46/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L78)):</sup>
Added this method to make it easy for subclasses to track the number of
rows they output.
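A rough sketch of what such a helper can look like (hypothetical names; `LongSQLMetric`, `RowTracking`, and `trackNumOfRows` here are simplified stand-ins, not the actual `SparkPlan` API): the helper wraps an operator's output iterator so every row that passes through is counted into a metric with a single call.

```scala
import java.util.concurrent.atomic.AtomicLong

// Stand-in for a long-valued SQL metric backed by an accumulator.
class LongSQLMetric {
  private val counter = new AtomicLong(0L)
  def add(n: Long): Unit = counter.addAndGet(n)
  def value: Long = counter.get()
}

trait RowTracking {
  // Wraps an iterator so each row emitted is counted as a side effect,
  // letting a subclass track its number of output rows in one place.
  protected def trackNumOfRows[T](metric: LongSQLMetric)(iter: Iterator[T]): Iterator[T] =
    iter.map { row => metric.add(1L); row }
}

// Example "operator" using the helper on a small in-memory iterator.
object ExampleOperator extends RowTracking {
  def execute(metric: LongSQLMetric): List[Int] =
    trackNumOfRows(metric)(Iterator(1, 2, 3)).toList
}
```

Because the counting happens inside `map`, the metric is only updated as rows are actually consumed, which matches how operators pull rows through iterators.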
---
Comments from the [review on
Reviewable.io](https://reviewable.io:443/reviews/apache/spark/7774)