Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/7774#issuecomment-127438028
@zsxwing I started leaving comments on GitHub and then later moved over to
[reviewable](https://reviewable.io/reviews/apache/spark/7774). Hopefully it's
not too confusing.
Most of my comments can be addressed quickly. The high-level question I
still have is when we should wrap something in `withNewExecutionId`. Let's say
I have something like the following:
```scala
// in DataFrame.scala
def countTwice(): Long = { count() + count() }
```
This currently shows up in the UI as two separate queries, which may be
confusing because the user only ran one DataFrame operation. The two counts are
really just an implementation detail of `countTwice`, so I wonder if we should
merge them into one item in the execution table (rough sketch below). One
tricky thing with merging them is that we'd end up with two visualizations on
the same page, which may be a lot of work to support.
![screen shot 2015-08-03 at 4 47 11 pm](https://cloud.githubusercontent.com/assets/2133137/9050142/532d6e5a-39ff-11e5-80af-01b3af919692.png)
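To make the merging idea concrete, here's a rough sketch of what I have in mind. This is purely illustrative and assumes `withNewExecutionId` keeps roughly the shape it has in this PR (taking the `SQLContext` and the `QueryExecution`); the inner `count()` calls would also need to notice that an execution ID is already set and reuse it rather than start a new one:

```scala
// Hypothetical sketch only, not an actual change to DataFrame.scala.
// Assumes `sqlContext` and `queryExecution` are the DataFrame's own fields
// and that SQLExecution.withNewExecutionId has the signature from this PR.
import org.apache.spark.sql.execution.SQLExecution

def countTwice(): Long = {
  // Wrap both counts in a single execution so they show up as one
  // entry in the execution table instead of two.
  SQLExecution.withNewExecutionId(sqlContext, queryExecution) {
    count() + count()
  }
}
```

With something like this, the jobs triggered by both `count()` calls would be attributed to the same execution ID, which is exactly what raises the two-visualizations-on-one-page question above.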
By the way, I'm totally fine with the existing semantics and potentially
addressing it later. I'd just like to point out a potential source of confusion
so we don't forget about it.