Github user zsxwing commented on the pull request:
https://github.com/apache/spark/pull/7774#issuecomment-127527279
Review status: 5 of 26 files reviewed at latest revision, 71 unresolved
discussions, all commit checks successful.
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala,
line 48
\[r4\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvokJQehEcEYVaSyCrk-r4-48)**
([raw
file](https://github.com/apache/spark/blob/94065929603633714929c5ecbd43c2a65182552a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala#L48)):</sup>
Removed it. Agreed that we only need to track the nodes that change the
number of rows.
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/commands.scala,
line 135
\[r16\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvrpmHC5pR7cLKsOh5r)**
([raw
file](https://github.com/apache/spark/blob/cc1c73645f82a56e899cdb44c2e84ed68bfc7a46/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/commands.scala#L135)):</sup>
Found that the `save` action cannot track the metrics: it creates a new
DataFrame along with a new `QueryExecution`, and a new `QueryExecution`
means the accumulator ids are completely different. I cannot use the new
DataFrame here because its plan contains only a single SparkPlan,
`PhysicalRDD`. So I modified `withNewExecutionId` to accept a
`QueryExecution`, so that I can pass in the one that will actually be
executed.
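To illustrate the idea, here is a minimal self-contained sketch (not Spark's actual code; `QueryExecution`, `SQLExecution`, and the field names are simplified stand-ins): taking the `QueryExecution` as an explicit parameter lets a caller such as `save` install the one whose accumulators will really be updated, instead of the one attached to a freshly created DataFrame.

```scala
// Simplified stand-in for Spark's QueryExecution: only the accumulator
// ids matter for this sketch.
final case class QueryExecution(accumulatorIds: Set[Long])

object SQLExecution {
  // Tracks which QueryExecution is active on the current thread.
  private val current = new ThreadLocal[Option[QueryExecution]] {
    override def initialValue(): Option[QueryExecution] = None
  }

  // Accepts the QueryExecution explicitly, so callers can pass the one
  // that will actually be executed rather than a newly derived one with
  // different accumulator ids.
  def withNewExecutionId[T](qe: QueryExecution)(body: => T): T = {
    val previous = current.get()
    current.set(Some(qe))
    try body finally current.set(previous)
  }

  def currentExecution: Option[QueryExecution] = current.get()
}
```

With this shape, the metrics machinery reads `currentExecution` and sees the accumulator ids of the plan that is really running, which is the point of the change described above.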
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala,
line 65
\[r13\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvokJQehEcEYVaSyCrj-r13-65)**
([raw
file](https://github.com/apache/spark/blob/b8d5605b5432e26322190896637139a1b051c7d5/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L65)):</sup>
Done.
---
<sup>**[sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala,
line 78
\[r16\]](https://reviewable.io:443/reviews/apache/spark/7774#-JvrpXkePxsdEMQXAViA)**
([raw
file](https://github.com/apache/spark/blob/cc1c73645f82a56e899cdb44c2e84ed68bfc7a46/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L78)):</sup>
Added this method to make it easy for subclasses to track the number of
rows they output.
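A rough sketch of what such a helper can look like (hypothetical names; `LongSQLMetric`, `RowTracking`, and `trackNumOfRows` here are simplified stand-ins, not the actual `SparkPlan` API): the helper wraps an operator's output iterator so every row that passes through is counted into a metric with a single call.

```scala
import java.util.concurrent.atomic.AtomicLong

// Stand-in for a long-valued SQL metric backed by an accumulator.
class LongSQLMetric {
  private val counter = new AtomicLong(0L)
  def add(n: Long): Unit = counter.addAndGet(n)
  def value: Long = counter.get()
}

trait RowTracking {
  // Wraps an iterator so each row emitted is counted as a side effect,
  // letting a subclass track its number of output rows in one place.
  protected def trackNumOfRows[T](metric: LongSQLMetric)(iter: Iterator[T]): Iterator[T] =
    iter.map { row => metric.add(1L); row }
}

// Example "operator" using the helper on a small in-memory iterator.
object ExampleOperator extends RowTracking {
  def execute(metric: LongSQLMetric): List[Int] =
    trackNumOfRows(metric)(Iterator(1, 2, 3)).toList
}
```

Because the counting happens inside `map`, the metric is only updated as rows are actually consumed, which matches how operators pull rows through iterators.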
---
Comments from the [review on
Reviewable.io](https://reviewable.io:443/reviews/apache/spark/7774)