Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/7770#issuecomment-127077823
It seems the main source of outstanding debate is whether to display this
metric only when unsafe is enabled, which is the proposal of this patch.
The motivation is as follows: some SQL operators have memory usage that is
both high and difficult to track without sacrificing performance (e.g. `Sort`
simply uses `Arrays.sort`, and `BroadcastHashJoin` with `GeneralHashedRelation`
uses a Java hash map under the hood). It would be misleading to report low
memory usage for these operators simply because we do not track them.
There are two ways to fix this: (1) only display the metric when unsafe is
enabled, or (2) detect cases where jobs use operators we don't track. Since (2)
has a significantly wider scope than this patch intends, we went with (1) here.
Note that this means we don't expose the metric even for simple non-SQL
aggregations (e.g. `reduceByKey`); working around that would require detecting
whether a job is a SQL job, which is being implemented in #7774 but is outside
the scope of this individual patch.
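To illustrate why the untracked operators are a problem, here is a minimal toy sketch (not Spark's actual API; `MemoryTracker` is a hypothetical stand-in for a task memory manager). An operator that acquires memory through the tracker before allocating has an observable footprint, while one that delegates to `java.util.HashMap` allocates entries directly on the JVM heap where no accounting ever sees them:

```java
import java.util.HashMap;
import java.util.Map;

public class TrackingSketch {
    // Hypothetical stand-in for a task memory manager: counts acquired bytes.
    static class MemoryTracker {
        private long acquired = 0;
        long acquire(long bytes) { acquired += bytes; return bytes; }
        long acquiredBytes() { return acquired; }
    }

    public static void main(String[] args) {
        MemoryTracker tracker = new MemoryTracker();

        // Tracked ("unsafe"-style) path: the operator asks the tracker
        // before allocating, so its peak memory is observable.
        long pageSize = 1 << 20; // 1 MiB page
        tracker.acquire(pageSize);
        byte[] page = new byte[(int) pageSize];
        page[0] = 1; // touch the page

        // Untracked path: a plain Java hash map. Every put() allocates
        // internal entry objects; none of it goes through the tracker,
        // so the reported metric for this operator would read near zero.
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            map.put(i, i);
        }

        System.out.println("tracked bytes: " + tracker.acquiredBytes());
        System.out.println("map entries (untracked): " + map.size());
    }
}
```

This is why displaying the metric only when unsafe is enabled avoids reporting a misleadingly low number for operators on the untracked path.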