Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/7770#issuecomment-127077823
It seems the main source of outstanding debate is whether to display this
metric only when unsafe is enabled, which is the proposal of this patch.
The motivation is as follows: some SQL operators have memory usage that is
both high and difficult to track without sacrificing performance (e.g. `Sort`
simply uses `Arrays.sort`, and `BroadcastHashJoin` with `GeneralHashedRelation`
uses a Java hash map under the hood). It would be misleading to report low
memory usage for these operators simply because we do not track them.
There are two ways to fix this: (1) only display the metric when unsafe is
enabled, or (2) detect cases where jobs use operators we don't track. Since (2)
has a significantly wider scope than this patch intends, we went with (1) here.
Note that this means we don't expose the metric even for simple non-SQL
aggregations (e.g. `reduceByKey`); working around that would require detecting
whether a job is a SQL job, which is being implemented in #7774 but is outside
the scope of this individual patch.
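To illustrate why the untracked operators are a problem, here is a minimal toy sketch (not Spark's actual API; `MemoryTracker` is a hypothetical stand-in for a task memory manager). An operator that acquires memory through the tracker before allocating has an observable footprint, while one that delegates to `java.util.HashMap` allocates entries directly on the JVM heap where no accounting ever sees them:

```java
import java.util.HashMap;
import java.util.Map;

public class TrackingSketch {
    // Hypothetical stand-in for a task memory manager: counts acquired bytes.
    static class MemoryTracker {
        private long acquired = 0;
        long acquire(long bytes) { acquired += bytes; return bytes; }
        long acquiredBytes() { return acquired; }
    }

    public static void main(String[] args) {
        MemoryTracker tracker = new MemoryTracker();

        // Tracked ("unsafe"-style) path: the operator asks the tracker
        // before allocating, so its peak memory is observable.
        long pageSize = 1 << 20; // 1 MiB page
        tracker.acquire(pageSize);
        byte[] page = new byte[(int) pageSize];
        page[0] = 1; // touch the page

        // Untracked path: a plain Java hash map. Every put() allocates
        // internal entry objects; none of it goes through the tracker,
        // so the reported metric for this operator would read near zero.
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            map.put(i, i);
        }

        System.out.println("tracked bytes: " + tracker.acquiredBytes());
        System.out.println("map entries (untracked): " + map.size());
    }
}
```

This is why displaying the metric only when unsafe is enabled avoids reporting a misleadingly low number for operators on the untracked path.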